Microsoft introduces a new AI model, VASA-1
Microsoft has introduced a new model, VASA-1, that can generate realistic videos from a single photo or image of a person. However, Microsoft does not intend to release a product or API based on the model and says it will be used only to create virtual characters.

Microsoft has unveiled a new artificial intelligence model that can create realistic videos from photos of people. VASA-1 can generate video from a single photo and a speech audio track.
In a post on the Microsoft Research announcements page, the company detailed how its model works and highlighted its capabilities. Microsoft claims that VASA-1 can generate 512×512-pixel video at up to 40 frames per second, and that it supports online video generation with only a slight startup delay.
The company says the generated videos feature synchronized lip movements, facial expressions, and head movements, making the result look natural.
VASA-1 also gives the user detailed control over various aspects of the video, such as main eye gaze direction, head-to-camera distance, and emotion offset. These attribute controls help fine-tune the result beyond what the input photo and audio alone determine, as illustrated by the sketch below.
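Microsoft is not exposing these controls programmatically (no API is being released), but a rough sketch can make their meaning concrete. Everything below is hypothetical: the class name, field names, and value ranges are assumptions drawn only from the attributes listed above, not from any real Microsoft interface.

```python
# Purely illustrative: VASA-1 has no public API, so this structure and its
# fields are hypothetical, based only on the controls described in the article.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GenerationControls:
    """Hypothetical conditioning signals for a talking-face generator."""
    gaze_direction: Tuple[float, float] = (0.0, 0.0)  # assumed yaw/pitch of main eye gaze, in degrees
    head_distance: float = 1.0                        # assumed relative head-to-camera distance (1.0 = default framing)
    emotion_offset: float = 0.0                       # assumed shift of overall expression, e.g. -1.0 to 1.0


# Example: a slightly more distant framing with the subject glancing to the left
controls = GenerationControls(gaze_direction=(-15.0, 0.0), head_distance=1.2, emotion_offset=0.3)
print(controls)
```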
In addition, the AI model can also create videos from artistic photographs, singing audio, and non-English speech. Microsoft researchers note that these kinds of data were not present in the training set, suggesting the model can generalize beyond what it was trained on.
Generating realistic videos of real people with artificial intelligence is impressive, but it also raises concerns about potential misuse, particularly for creating deepfakes. For this reason, Microsoft has no plans to release a product or API for VASA-1 and states that the model will be used only to create virtual characters.