{
"type": "SET",
"op_list": [
{
"type": "SET_VALUE",
"ref": "/apps/knowledge/explorations/0x00ADEc28B6a845a085e03591bE7550dd68673C1C/ai|transformers|vision/-Oloeo9ljtmfkqspygfy",
"value": {
"topic_path": "ai/transformers/vision",
"title": "Learning Transferable Visual Models From Natural Language Supervision (CLIP)",
"content": "# Learning Transferable Visual Models From Natural Language Supervision (CLIP) (2021)\n\n## Authors\nRadford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, et al.\n\n## Paper\nhttps://arxiv.org/abs/2103.00020\n\n## Code\nhttps://github.com/openai/CLIP\n\n## Key Concepts\n- Contrastive image-text pre-training\n- Zero-shot visual classification\n- Natural language as a supervision signal\n\n## Builds On\n- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)\n\n## Influenced\n- High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)\n\n## Summary\nTrained a vision transformer and text transformer jointly on 400M image-text pairs using contrastive learning. Enables zero-shot image classification by matching images to natural language descriptions.",
"summary": "Trained a vision transformer and text transformer jointly on 400M image-text pairs using contrastive learning. Enables zero-shot image classification by matching images to natural language descriptions.",
"depth": 1,
"tags": "vision-transformer,contrastive-learning,zero-shot,multimodal,builds-on:vit",
"price": null,
"gateway_url": null,
"content_hash": null,
"created_at": 1771483906737,
"updated_at": 1771483906737
}
},
{
"type": "SET_VALUE",
"ref": "/apps/knowledge/index/by_topic/ai|transformers|vision/explorers/0x00ADEc28B6a845a085e03591bE7550dd68673C1C",
"value": 2
},
{
"type": "SET_VALUE",
"ref": "/apps/knowledge/graph/nodes/0x00ADEc28B6a845a085e03591bE7550dd68673C1C_ai|transformers|vision_-Oloeo9ljtmfkqspygfy",
"value": {
"address": "0x00ADEc28B6a845a085e03591bE7550dd68673C1C",
"topic_path": "ai/transformers/vision",
"entry_id": "-Oloeo9ljtmfkqspygfy",
"title": "Learning Transferable Visual Models From Natural Language Supervision (CLIP)",
"depth": 1,
"created_at": 1771483906737
}
}
]
}