@hakunamatata1997 on Hugging Face: "I'm working on talking head generation that takes audio and video as input…"


hakunamatata1997

posted an update May 29


I'm working on talking head generation that takes audio and video as input. Can someone suggest a good existing architecture that generates video with low latency, or is there a way to make it run in real time?

Jaward

May 29

•

edited May 29

I think most existing OSS talking-head architectures take only audio and an image as input. You can check out SadTalker (https://sadtalker.github.io/), which works that way. As for streaming, you'll have to do that via an API over WebSocket; check out D-ID's stream API: https://docs.d-id.com/reference/createstream

hakunamatata1997

May 29

I tried SadTalker, but it takes too much time. D-ID is proprietary, and I'm looking for something open source. I also tried Wav2Lip and enhancing its output with GFPGAN; the output is good, but I want something fast.
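One way to put a number on "fast" when comparing pipelines like these is the real-time factor (processing time divided by audio duration); a value below 1.0 means generation keeps up with playback. Below is a minimal sketch of measuring it over fixed-size audio chunks. The names `split_into_chunks`, `real_time_factor`, `measure_rtf`, and the `generate_frames` callable are hypothetical placeholders for illustration, not part of SadTalker, Wav2Lip, or GFPGAN.

```python
import time
from typing import Callable, List, Sequence


def split_into_chunks(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float
) -> List[Sequence[float]]:
    """Split an audio buffer into consecutive chunks (the last may be shorter)."""
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float,
    generate_frames: Callable[[Sequence[float]], object],
) -> float:
    """Time a caller-supplied generator over every chunk and return the overall RTF.

    generate_frames is a stand-in (assumption) for whatever model call turns an
    audio chunk into video frames.
    """
    chunks = split_into_chunks(samples, sample_rate, chunk_seconds)
    start = time.perf_counter()
    for chunk in chunks:
        generate_frames(chunk)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

To use it, pass a function that wraps the model's inference call as `generate_frames`. Smaller chunks reduce the delay before the first frame appears, but they usually add per-call overhead, so the chunk size is worth tuning per model.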

In this post

hakunamatata1997 Akhil B
Jaward Jaward Sesay
umair894 Muhammad Umair