Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

Chris Remington@beehaw.org · 4 months ago


flora_explora@beehaw.org · 4 months ago

Wouldn’t you then have to either run the AI locally (which probably needs a lot of power and memory) or use it via the cloud (which depends on bandwidth, just like a video call)? I don’t really see where this technology could actually be useful. Sure, it would work if it were only a minor computation, like when you take a picture or video with any modern smartphone. But computing an entire face and voice seems much more complicated than that and not really feasible for the usual home device.

barsoap@lemm.ee · 4 months ago

A model that can only generate frontal-to-profile views of heads would be quite small; I can totally see that kind of thing running on current consumer GPUs in real time. Near real time is already possible with SDXL-based models with some speedup tricks applied, as long as you have a mid-range gaming GPU, and those models are significantly more general. It’s not like the model would need to generate spaghetti and sports cars alongside the head.