Awesome Unified Multimodal Models
Updated Mar 24, 2026
A curated list of foundation models for vision and language tasks
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
A curated list of Awesome Personalized Large Multimodal Models resources
Video Search with CLIP
The official implementation of the paper "Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts" (ICLR 2026).
The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".
[CVPR 2026] ConsistCompose: Unified Multimodal Layout Control for Image Composition
Implementation of the paper "Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning", arXiv, 2025
Multimodal Bi-Transformers (MMBT) in Biomedical Text/Image Classification
NanoOWL Detection System enables real-time open-vocabulary object detection in ROS 2 using a TensorRT-optimized OWL-ViT model. Describe objects in natural language and detect them instantly on panoramic images. Optimized for NVIDIA GPUs with .engine acceleration.
RAPID: A Reproducible Multi-Agent Pipeline for Interpretable Disaster Damage Assessment from Satellite and Street-View Imagery
Model Mondays is a weekly livestreamed series on Microsoft Reactor that helps you make informed model choice decisions with timely updates and model deep-dives. Watch live for the content. Join Discord for the discussions.
camroll • an AI assistant for your personal camera roll 🎞️ • Personal VQA
General-purpose vs. domain-specific models for diabetic retinopathy — Diagnostics 2026
Text-dominant reasoning failure in multimodal LLMs — JAAD