VLMEvalKit Evaluation Results Collection
Generate talking avatars from text-to-speech
Generate realistic talking heads from an image and audio