MetaSeg: MetaFormer-Based Global Contexts-Aware Network for Efficient Semantic Segmentation

ComputerVisionFoundation Videos

168 views

Added: 28 January 2024
Authors: Beoungwoo Kang; Seunghun Moon; Yubin Cho; Hyunwoo Yu; Suk-Ju Kang
Description: Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. In contrast, we explore the capacity of the MetaFormer architecture more extensively in the semantic segmentation task. We propose a powerful semantic segmentation network, MetaSeg, which leverages the MetaFormer architecture from the backbone to the decoder. MetaSeg shows that the MetaFormer architecture plays a significant role in capturing useful contexts for the decoder as well as for the backbone. In addition, recent segmentation methods have shown that using a CNN-based backbone to extract spatial information and a decoder to extract global information is more effective than using a transformer-based backbone with a CNN-based decoder. This motivates us to adopt a CNN-based backbone built from MetaFormer blocks and to design a MetaFormer-based decoder, which contains a novel self-attention module for capturing global contexts. To balance global context extraction against the computational cost of self-attention in semantic segmentation, we propose a Channel Reduction Attention (CRA) module that reduces the channel dimension of the query and key to one. As a result, MetaSeg outperforms previous state-of-the-art methods at lower computational cost on popular semantic segmentation benchmarks and a medical image segmentation benchmark: ADE20K, Cityscapes, COCO-Stuff, and Synapse.
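The channel-reduction idea in the description can be illustrated with a small NumPy sketch. This is not the authors' implementation: it only shows self-attention in which the query and key are projected to a single channel while the value keeps all channels. The function name `channel_reduction_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the single-head layout, and the token and channel counts are all assumptions made for illustration; any pooling, normalization, or multi-head structure the real module may use is left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_reduction_attention(x, w_q, w_k, w_v):
    """Hypothetical sketch of attention with one-channel query and key.

    x:   (N, C) token features
    w_q: (C, 1) query projection (assumed: reduces C channels to 1)
    w_k: (C, 1) key projection   (assumed: reduces C channels to 1)
    w_v: (C, C) value projection (assumed: keeps all C channels)
    """
    q = x @ w_q                      # (N, 1)
    k = x @ w_k                      # (N, 1)
    v = x @ w_v                      # (N, C)
    # With one channel, each score is a single scalar product, so the
    # score matrix costs N*N multiplications instead of N*N*C.
    scores = q @ k.T                 # (N, N)
    attn = softmax(scores, axis=-1)  # each row sums to 1
    out = attn @ v                   # (N, C)
    return out, attn


# Illustrative sizes only; not taken from the source.
rng = np.random.default_rng(0)
n_tokens, channels = 16, 8
x = rng.standard_normal((n_tokens, channels))
w_q = rng.standard_normal((channels, 1))
w_k = rng.standard_normal((channels, 1))
w_v = rng.standard_normal((channels, channels))

out, attn = channel_reduction_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

In this sketch the output keeps the input shape, so the block could stand in for ordinary self-attention wherever a token mixer is expected; only the cost of forming the score matrix changes.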
Category: Science and Technology
