Can Vision-Language Models Be a Good Guesser? Exploring VLMs for Times and Location Reasoning

ComputerVisionFoundation Videos

43 views

Added: 28 January 2024
Authors: Gengyuan Zhang; Yurui Zhang; Kerui Zhang; Volker Tresp
Description: Vision-Language Models (VLMs) are expected to reason with commonsense knowledge as humans do. For example, humans can infer where and when an image was taken based on their knowledge. This raises the question of whether VLMs pre-trained on large-scale image-text resources can match or even surpass human capability in times and location reasoning from visual cues. To address this question, we propose a two-stage Recognition & Reasoning probing task, applied to discriminative and generative VLMs, to uncover whether VLMs can recognize times- and location-relevant features and further reason about them. To facilitate these studies, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In extensive evaluation experiments, we find that although VLMs effectively retain times- and location-relevant features in their visual encoders, they still fail to reason perfectly with context-conditioned visual features. The dataset is available at github.com/gengyuanmax/WikiTiLo.
Category: Science and Technology
