Social media giant Meta has introduced its latest artificial intelligence (AI) models for content editing and generation, according to a blog post on Nov. 16.
The company is introducing two AI-powered generative models. The first, Emu Video, which leverages Meta's previous Emu model, can generate video clips from text and image inputs. The second, Emu Edit, focuses on image manipulation and promises greater precision in image editing.
The models are still in the research stage, but Meta says their initial results show potential use cases for creators, artists and animators alike.
According to Meta's blog post, Emu Video was trained with a "factorized" approach that splits generation into two steps so the model can respond to different kinds of input: it first produces an image conditioned on a text prompt, then generates a video conditioned on both the text and the generated image.
Based on a text prompt, the same model can also "animate" existing images. According to Meta, instead of relying on a "deep cascade of models," Emu Video uses just two diffusion models to generate 512x512, four-second videos at 16 frames per second.
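To make the factorized pipeline concrete, here is a minimal sketch in Python. Meta has not released Emu Video code, so the class names, method signatures and placeholder outputs below are hypothetical stand-ins for the two diffusion steps the post describes.

```python
# Hypothetical sketch of Emu Video's factorized two-step generation.
# None of these classes come from Meta; they are illustrative stubs.
import numpy as np

class TextToImageDiffusion:
    """Step 1: a diffusion model that turns a text prompt into an image."""
    def generate(self, prompt: str) -> np.ndarray:
        # A real model would run iterative denoising here; we return a stub.
        return np.zeros((512, 512, 3), dtype=np.uint8)

class ImageToVideoDiffusion:
    """Step 2: a diffusion model conditioned on the text AND the image."""
    def generate(self, prompt: str, image: np.ndarray,
                 seconds: int = 4, fps: int = 16) -> np.ndarray:
        # Four seconds at 16 fps gives 64 frames at the same resolution.
        return np.zeros((seconds * fps, *image.shape), dtype=np.uint8)

prompt = "a dog running through a field of flowers"
keyframe = TextToImageDiffusion().generate(prompt)         # text -> image
clip = ImageToVideoDiffusion().generate(prompt, keyframe)  # text + image -> video
print(clip.shape)  # (64, 512, 512, 3)
```

Splitting the problem this way means the same second-stage model that extends a generated image into a clip can also animate a user-supplied image, which is how a single architecture handles text-only, image-only and combined inputs.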
Emu Edit, meanwhile, will allow users to remove or add backgrounds, perform color and geometry transformations, and carry out both local and global edits to an image.
"We argue that the primary objective shouldn’t just be about producing a “believable” image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request," Meta noted, claiming its model is able to precisely follow instructions:
Meta trained Emu Edit on computer vision tasks using a dataset of 10 million synthesized samples, each containing an input image, a description of the task to perform, and the targeted output image. "We believe it’s the largest dataset of its kind to date," the company said.
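Structurally, each training sample pairs an image, an instruction and an expected result. The small sketch below shows one plausible way to represent such a record; the field names are assumptions, since Meta has not published the dataset schema.

```python
# Hypothetical record layout for an instruction-based editing sample.
# Field names are assumptions; Meta has not released the dataset schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class EditSample:
    input_image: np.ndarray   # image before the edit
    instruction: str          # e.g. "remove the background"
    target_image: np.ndarray  # expected image after the edit

sample = EditSample(
    input_image=np.zeros((512, 512, 3), dtype=np.uint8),
    instruction="replace the sky with a sunset",
    target_image=np.zeros((512, 512, 3), dtype=np.uint8),
)
print(sample.instruction)
```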