ComputerVisionFoundation Videos
  • 5 139
  • 3 234 745
Elastic Shape-From-Template With Spatially Sparse Deforming Forces | Spotlight 1-1B
Abed Malti; Cédric Herzet
Current elastic SfT (Shape-from-Template) methods are based on l2-norm minimization. None can accurately recover the spatial location of the acting forces, since l2-norm minimization tends to find the best tradeoff among noisy data when fitting an elastic model. In this work, we study shapes that are deformed by a spatially sparse set of forces. We propose two formulations for a new class of SfT problems, dubbed here SLE-SfT (Sparse Linear Elastic-SfT). The first, ideal formulation uses an l0-norm to minimize the number of non-zero components of the deforming forces. The second, relaxed formulation uses an l1-norm to minimize the sum of absolute values of the force components. These new formulations do not use Solid Boundary Constraints (SBC), which are usually needed to rigidly position the shape in the frame of the deformed image. We introduce the Projective Elastic Space Property (PESP), which jointly encodes the reprojection constraint and the elastic model. We prove that satisfying this property is necessary and sufficient for the relaxed formulation to (i) retrieve the ground-truth 3D deformed shape and (ii) recover the right spatial domain of non-zero deforming forces; (iii) it also shows that we can rigidly place the deformed shape in the image frame without using SBC. Finally, we prove that when PESP is satisfied, solving the relaxed formulation yields the same ground-truth solution as the ideal formulation. Results with simulated and real data show substantial improvements in recovering the deformed shapes as well as the spatial location of the deforming forces.
Views: 205
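
The contrast the abstract draws between l2- and l1-norm minimization can be illustrated with a toy sketch. This is not the authors' SLE-SfT method: it assumes a single linear constraint a·f = y standing in for the combined reprojection and elastic model, and the function names and numbers are hypothetical. With one constraint, the minimum l2-norm solution spreads the force over every component, while a minimum l1-norm solution places all of it on the component with the largest |a_i|.

```python
def min_l2_solution(a, y):
    """Minimum l2-norm f with sum(a_i * f_i) == y (pseudoinverse of a row vector)."""
    scale = y / sum(ai * ai for ai in a)
    return [ai * scale for ai in a]


def min_l1_solution(a, y):
    """A minimum l1-norm f with sum(a_i * f_i) == y.

    For a single constraint, Hoelder's inequality gives
    |y| <= max|a_i| * ||f||_1, so putting all weight on the
    coordinate with the largest |a_i| attains the lower bound.
    """
    k = max(range(len(a)), key=lambda i: abs(a[i]))
    f = [0.0] * len(a)
    f[k] = y / a[k]
    return f


def count_nonzero(f, tol=1e-12):
    """Number of components whose magnitude exceeds tol (a proxy for the l0 'norm')."""
    return sum(1 for v in f if abs(v) > tol)


# Hypothetical toy data: three force components, one measurement.
a = [1.0, 2.0, 1.0]
y = 2.0

f_l2 = min_l2_solution(a, y)  # dense: roughly [0.33, 0.67, 0.33]
f_l1 = min_l1_solution(a, y)  # sparse: all force on the second component

print("l2 solution:", f_l2, "non-zeros:", count_nonzero(f_l2))
print("l1 solution:", f_l1, "non-zeros:", count_nonzero(f_l1))
```

Both vectors satisfy the constraint, but only the l1 solution localizes the force on a single component, which is the behavior the relaxed formulation relies on. The real problem has many coupled constraints and would need a proper convex solver rather than this closed form.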

Video

Consistent-Aware Deep Learning for Person Re-Identification in a Camera Network | Spotlight 2-2B
Views: 169 · 3 months ago
Authors: Ji Lin; Liangliang Ren; Jiwen Lu; Jianjiang Feng; Jie Zhou Description: In this paper, we propose a consistent-aware deep learning (CADL) framework for person re-identification in a camera network. Unlike most existing person re-identification methods which identify whether two body images are from the same person, our approach aims to obtain the maximal correct matches for the whole camera network. Differe...
Deeply Aggregated Alternating Minimization for Image Restoration | Spotlight 1-1C
Views: 142 · 3 months ago
Authors: Youngjung Kim; Hyungjoo Jung; Dongbo Min; Kwanghoon Sohn Description: Regularization-based image restoration has remained an active research topic in image processing and computer vision. It often leverages a guidance signal captured in different fields as an additional cue. In this work, we present a general framework for image restoration, called deeply aggregated alternating minimization (DeepAM). We pro...
Probabilistic Temporal Subspace Clustering | Spotlight 3-1A
Views: 66 · 3 months ago
Authors: Behnam Gholami; Vladimir Pavlovic Description: Subspace clustering is a common modeling paradigm used to identify constituent modes of variation in data with locally linear structure. These structures are common to many problems in computer vision, including modeling time series of complex human motion. However, classical subspace clustering algorithms learn the relationships within a set of data without con...
Panelformer: Sewing Pattern Reconstruction From 2D Garment Images
Views: 161 · 3 months ago
Authors: Cheng-Hsiu Chen; Jheng-Wei Su; Min-Chun Hu; Chih-Yuan Yao; Hung-Kuo Chu Description: In this paper, we present a novel approach for reconstructing garment sewing patterns from 2D garment images. Our method addresses the challenge of handling occlusion in 2D images by leveraging the symmetric and correlated nature of garment panels. We introduce a transformer-based deep neural network c...
Guided Distillation for Semi-Supervised Instance Segmentation
Views: 365 · 3 months ago
Authors: Tariq Berrada; Camille Couprie; Karteek Alahari; Jakob Verbeek Description: Although instance segmentation methods have improved considerably, the dominant paradigm is to rely on fully annotated training images, which are tedious to obtain. To alleviate this reliance, and boost results, semi-supervised approaches leverage unlabeled data as an additional training signal that limits over...
MetaSeg: MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation
Views: 155 · 3 months ago
Authors: Beoungwoo Kang; Seunghun Moon; Yubin Cho; Hyunwoo Yu; Suk-Ju Kang Description: Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. Unlike previous studies, we explore the capacity of the M...
Gradient-Guided Knowledge Distillation for Object Detectors
Views: 233 · 3 months ago
Authors: Qizhen Lan; Qing Tian Description: Deep learning models have demonstrated remarkable success in object detection, yet their complexity and computational intensity pose a barrier to deploying them in real-world applications (e.g., self-driving perception). Knowledge Distillation (KD) is an effective way to derive efficient models. However, only a small number of KD methods tackle object...
Small Objects Matters in Weakly-Supervised Semantic Segmentation
Views: 167 · 3 months ago
Authors: Cheolhyun Mun; Sanghuk Lee; Youngjung Uh; Junsuk Choe; Hyeran Byun Description: Weakly-supervised semantic segmentation (WSSS) performs pixel-wise classification given only image-level labels for training. Despite the difficulty of this task, the research community has achieved promising results over the last five years. Still, current WSSS literature misses the detailed sense of how w...
BPKD: Boundary Privileged Knowledge Distillation for Semantic Segmentation
Views: 103 · 3 months ago
Authors: Liyang Liu; Zihan Wang; Minh Hieu Phan; Bowen Zhang; Jinchao Ge; Yifan Liu Description: Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally. However, for dense prediction, students' predictions on edge regions are highly uncertain due to contextual information leakage, requiring higher spatial se...
Query-Guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch
Views: 77 · 3 months ago
TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding
Views: 49 · 3 months ago
High-Fidelity Pseudo-Labels for Boosting Weakly-Supervised Segmentation
Views: 141 · 3 months ago
Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents
Views: 108 · 3 months ago
Real-Time User-Guided Adaptive Colorization With Vision Transformer
Views: 192 · 3 months ago
Authors: Gwanghan Lee; Saebyeol Shin; Taeyoung Na; Simon S. Woo Description: Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively utilized in colorization. Vision transformer uses multi-head self attention to effectively propagate user hints to distant relevant areas in the image. However, despite the success of vision transfo...
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
Views: 36 · 3 months ago
Authors: Shangbang Long; Siyang Qin; Yasuhisa Fujii; Alessandro Bissacco; Michalis Raptis Description: We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis. HTS can recognize text in an image and identify its 4-level hierarchical structure: characters, words, lines, and paragraphs. The proposed HTS is characteriz...
Towards Domain-Aware Knowledge Distillation for Continual Model Generalization
Views: 60 · 3 months ago
Authors: Nikhil Reddy; Mahsa Baktashmotlagh; Chetan Arora Description: Generalization on unseen domains is critical for Deep Neural Networks (DNNs) to perform well in real-world applications such as autonomous navigation. However, catastrophic forgetting limits the ability of domain generalization and unsupervised domain adaptation approaches to adapt to constantly changing target domains. To ove...
Let’s Observe Them Over Time: An Improved Pedestrian Attribute Recognition Approach
Views: 139 · 3 months ago
Authors: Kamalakar Vijay Thakare; Debi Prosad Dogra; Heeseung Choi; Haksub Kim; Ig-Jae Kim Description: Despite poor image quality, occlusions, and small training datasets, recent pedestrian attribute recognition (PAR) methods have achieved considerable performance. However, leveraging only spatial information of different attributes limits their reliability and generalizability. This paper int...
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Views: 44 · 3 months ago
Authors: Cheng Yang; Rui Xu; Ye Guo; Peixiang Huang; Yiru Chen; Wenkui Ding; Zhongyuan Wang; Hong Zhou Description: Visual commonsense reasoning (VCR) is a challenging multi-modal task, which requires high-level cognition and commonsense reasoning ability about the real world. In recent years, large-scale pre-training approaches have been developed and have advanced the state-of-the-art performance ...
Beyond Fusion: Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection
Views: 39 · 3 months ago
Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths
Views: 32 · 3 months ago
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation
Views: 78 · 3 months ago
Understanding Dark Scenes by Contrasting Multi-Modal Observations
Views: 26 · 3 months ago
Can Vision-Language Models Be a Good Guesser? Exploring VLMs for Times and Location Reasoning
Views: 37 · 3 months ago
Patch-Based Selection and Refinement for Early Object Detection
Views: 23 · 3 months ago
Implicit Neural Representation for Change Detection
Views: 57 · 3 months ago
Foundation Model Assisted Weakly Supervised Semantic Segmentation
Views: 84 · 3 months ago
Unsupervised Graphic Layout Grouping With Transformers
Views: 20 · 3 months ago
Beyond Classification: Definition and Density-Based Estimation of Calibration in Object Detection
Views: 25 · 3 months ago
Universal Semi-Supervised Model Adaptation via Collaborative Consistency Training
Views: 27 · 3 months ago

Comments

  • @safaridah-rn1yi
    @safaridah-rn1yi 6 days ago

    I want to contact you, email address please

  • @GirishKumar-de5op
    @GirishKumar-de5op 7 days ago

    we need any dataset or csv file to get height

  • @usernotfound778
    @usernotfound778 12 days ago

    That's awesome I have to implement this for my university graduation project can i have any way to contact with you i have some questions

  • @arvindaa
    @arvindaa 20 days ago

    hello sir can you please made how to implement the project

  • @pinkomega4106
    @pinkomega4106 23 days ago

    No calibration for the model in the experiment? The MLLS only performs well with proper calibration

  • @arvindaa
    @arvindaa 23 days ago

    good job bro nice😃

  • @chenliangwang7882
    @chenliangwang7882 25 days ago

    I also particularly need a version with sound. It's a pity that such a great tutorial doesn't have any sound.

  • @abrahamvj3672
    @abrahamvj3672 27 days ago

    Can i get the codebase for the same if possible. Thank You

  • @shrikantAmbre-oq1kh
    @shrikantAmbre-oq1kh 1 month ago

    is it any code have you

  • @titirbiswas956
    @titirbiswas956 1 month ago

    Where can I find the code ? If I could use it , it would be really helpful to me

  • @enestemel9490
    @enestemel9490 1 month ago

    A word of thanks to the speakers would be appreciated before continuing with the next speaker right away. Thank you to all the speakers for your contributions.

  • @bugdary
    @bugdary 1 month ago

    any github ?

  • @nobdan123
    @nobdan123 1 month ago

    It’s either mute or black screen.

  • @rahulsinghgulia6666
    @rahulsinghgulia6666 1 month ago

    Is there any reference Github project on this guided VAE model ?

  • @kesavasriramkothamasu3797

    Nice presentation

  • @kaelthunderhoof5619
    @kaelthunderhoof5619 2 months ago

    Can this be used for a web application for a salon virtual try on?

  • @omarh.l.l8127
    @omarh.l.l8127 2 months ago

    Thanks for the presentation. Is the code open sourced? Can you share the repository if so? Thank you

  • @daimsharif1615
    @daimsharif1615 2 months ago

    Daym

  • @iMaTzzz
    @iMaTzzz 2 months ago

    Really great video and paper ! I'm quite surprised it hasn't been looked much before as I haven't seen a lot of papers in this topic. But I believe it may be quite useful in generic cases and could easily be used to train on a particular dataset so thank you, I really appreciate your work !

  • @JeracVideo
    @JeracVideo 2 months ago

    Hi, I'm searching for software that can detect eyes, nose, and ears on a human face in a point cloud and return the coordinates of those features. Then, I'd like to merge a second point cloud according to these coordinates. I'm not sure where to start. Can someone give me some hints? Thanks!

  • @abdulmuqtadir6470
    @abdulmuqtadir6470 2 months ago

    since test dataset have no public gt, how you have measured ANLS and accuracy? can you please! explain.

  • @TheNitroPython
    @TheNitroPython 2 months ago

    could you plz post any links to a github/paper to this?

  • @user-nx8nq7gj5g
    @user-nx8nq7gj5g 2 months ago

    code implementation ??

  • @danieleneh3193
    @danieleneh3193 2 months ago

    Please can you help with the githu repo

  • @paone9851
    @paone9851 2 months ago

    What is this crap

  • @DH0809
    @DH0809 2 months ago

    🎉🎉🎉 great job guy! Are you have any plan to open source layerDoc pre-train model?

  • @anish2099
    @anish2099 2 months ago

    HII Do you have the implementation of this paper I need

  • @user-kb9ft3es8u
    @user-kb9ft3es8u 2 months ago

    please give me video data set link

  • @danibot3000
    @danibot3000 2 months ago

    It would be great to have a program to execute which then reads chart images and spits out a chart in excel etc.

  • @ZEENATFATIMA-jk1zl
    @ZEENATFATIMA-jk1zl 2 months ago

    sir your code is not fully available could you kindly guide about training and evaluation . and how raw* images obtained. it Would be helpful

  • @itgoons
    @itgoons 2 months ago

    can you pass on the source code link or any related material

  • @tiancaixinxin
    @tiancaixinxin 2 months ago

    Thanks for watching the video about our paper!

  • @chrisrosch4731
    @chrisrosch4731 3 months ago

    How well would you say does this work for example to determine whether a person looks creepy in an image. E.g. creepy stare

  • @dialectricStudios
    @dialectricStudios 3 months ago

    Sick 😤

  • @waniubaid7718
    @waniubaid7718 3 months ago

    That's great, boss❤

  • @yyongfan
    @yyongfan 3 months ago

    Is there a GEThib project?

  • @ashoknp
    @ashoknp 3 months ago

    Thank you for sharing this algorithm and making the video. This is a very useful algorithm.

  • @pengfeichu7009
    @pengfeichu7009 3 months ago

    can you show your code

  • @user-ud8lp9gl6l
    @user-ud8lp9gl6l 3 months ago

    Where can I get the dataset?

  • @BatashIs
    @BatashIs 3 months ago

    Can you share the code and make a video on image augmention without changing the value

  • @improvement_developer8995
    @improvement_developer8995 3 months ago

    Graft

  • @dreamzdziner8484
    @dreamzdziner8484 3 months ago

    This looks super cool. Wish this gets a local install for windows😊

  • @rajpulapakura001
    @rajpulapakura001 3 months ago

    Hi Harsh, thanks for the awesome explanation and paper. I have a question. I am building a model for brain tumor segmentation. Each brain MRI scan contains 4 modalities: FLAIR, T1, T1CE, and T2, each providing complementary information about the subject's brain. Obviously, the reason I watched this video is because I wanted to create a segmentation model which could perform well even if some modalities were not available during inference. My question relates to the learnable token/embedding you use when you randomly mask the modalities during training. In the video you are using a dataset with 2 modalities, RGB and Depth. In my case, I have 4 modalities, so should I learn a "mask embedding" for each modality? Also, how do you learn this "mask embedding"? I would love to have a chat and discuss the details of your approach. Again, awesome paper and contributions. Hope to talk to you soon. Cheers, Raj, 16 y.o. ML Engineer from Australia

  • @user-ud1mv8zh2w
    @user-ud1mv8zh2w 3 months ago

    please how can i implement this with efficientnetV2 in keras

  • @dhirendrakumar5158
    @dhirendrakumar5158 3 months ago

    The work is very interesting. Please upload the code I could not access the code for the work.

  • @fadwa2325
    @fadwa2325 3 months ago

    Can I get the code please?

  • @user-wc2qm9sw9n
    @user-wc2qm9sw9n 3 months ago

    I actually like this topic

  • @mukeshkhanna305
    @mukeshkhanna305 3 months ago

    Can I get the code for it?

  • @Harshit-cv4ie
    @Harshit-cv4ie 3 months ago

    why is this not presented by authors??