Stars
Get started with native image generation and editing using Gemini 2.0 and Next.js
Standing on the Giants: Informative Messenger Prompts with Self-adapter for Image Restoration
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"
Official implementation of Unified Reward Model for Multimodal Understanding and Generation.
[CVPR 2025] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
The Next Step Forward in Multimodal LLM Alignment
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
The official code of "Weak-to-Strong Diffusion with Reflection".
Investigating CoT Reasoning in Autoregressive Image Generation
[AAAI 2025] "AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement".
Evaluating text-to-image/video/3D models with VQAScore
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
The code of our work "Golden Noise for Diffusion Models: A Learning Framework".
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
CAR: Controllable AutoRegressive Modeling for Visual Generation
Illumination Drawing Tools for Text-to-Image Diffusion Models
SEED-Voken: A Series of Powerful Visual Tokenizers
Liquid: Language Models are Scalable and Unified Multi-modal Generators
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
[NeurIPS 2024 D&B Track] Implementation for "FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models"
The code and models for the paper "Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis"
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"