- KAIST
- Daejeon, South Korea
- https://phillipinseoul.github.io/
- @yuseungleee
- in/yuseung-lee-6b085223a
Stars
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Official PyTorch implementation of "SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering"
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Solve Visual Understanding with Reinforced VLMs
Official implementation of "Reangle-A-Video: 4D Video Generation as Video-to-Video Translation"
Official implementation of Inductive Moment Matching
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
Simple and readable code for training and sampling from diffusion models
Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronization. (ICLR 2025)
SPHERE - a hierarchical evaluation for spatial reasoning in vision-language models.
Wan: Open and Advanced Large-Scale Video Generative Models
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
The Superposition of Diffusion Models Using the Itô Density Estimator
Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.
Official PyTorch Implementation of "History-Guided Video Diffusion"
Official implementation for Rare-to-Frequent (R2F), ICLR'25, Spotlight
A Vision-Language Model for Spatial Affordance Prediction in Robotics
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
A fork to add multimodal model training to open-r1