An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

The Vision Transformer (ViT) splits an image into a sequence of patches and feeds that sequence to a pure Transformer, achieving state-of-the-art image classification performance without a CNN.
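The patch-splitting step can be illustrated with a minimal NumPy sketch. The function name `image_to_patches`, the 224x224 RGB input, and the zero-filled image are assumptions made for illustration only, not code from the paper. The sketch covers just the reshaping of an image into flattened 16x16 patches; the learned linear projection, class token, and position embeddings that a full ViT applies before the Transformer encoder are left out.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Hypothetical helper for illustration: returns an array of shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (H, W, C) -> (rows, P, cols, P, C): split each spatial axis into blocks.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two block indices to the front: (rows, cols, P, P, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector, one "word" per patch.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: a 224x224 RGB image filled with zeros.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

With these assumed sizes, the image becomes 14 x 14 = 196 patches, each a vector of 16 x 16 x 3 = 768 values, which is the sequence a Transformer would then consume after projection.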

Created by 0xaF71AE76...
on 2/21/2026