A Library for Large Vision-Language Models
GLaMM is a multimodal model for visual grounding and grounded conversation generation.