An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) splits an image into a sequence of patches and feeds that sequence to a pure Transformer, achieving state-of-the-art image classification performance without a CNN.
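The patch-splitting step can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name `patchify`, the 224x224 RGB input, and the channels-last layout are assumptions made for the example. The 16x16 patch size comes from the title.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into a sequence of flattened patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C), in row-major patch order.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size  # patch grid dimensions
    # (H, W, C) -> (gh, P, gw, P, C): separate grid indices from in-patch indices
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # -> (gh, gw, P, P, C): bring the two grid axes to the front
    patches = patches.transpose(0, 2, 1, 3, 4)
    # -> (gh * gw, P * P * C): one flattened vector per patch
    return patches.reshape(gh * gw, patch_size * patch_size * c)

# Assumed input size: a 224x224 RGB image gives a 14x14 grid of patches.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = patchify(image)
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` plays the role of one "word" in the input sequence. In a full model, these rows would typically be linearly projected and combined with position information before entering the Transformer; that part is omitted here.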
Created by 0xaF71AE76...
on 2/21/2026