Amazon Launches Six New AI Models for Text, Video and Imagery

Amazon CEO takes to the stage to announce six new AI models - the Nova family - with a wide range of abilities.

Published on December 4, 2024

Six new AI models are incoming from Amazon, including text, video and image generating offerings.

Amazon Web Services (AWS) unveiled the Nova family at its re:Invent conference with CEO, Andy Jassy, making the announcement on stage.

Only last week, Amazon revealed that it has made a second $4 billion investment in AI startup, Anthropic, as it incorporates AI into every aspect of its business from Audible to shopping.

Text-Generation Options

There are four text-generation models in the Nova family, and three of them – Micro, Lite and Pro – are available from today for AWS customers in Amazon Bedrock, the company’s AI development platform. The fourth model – Premier – will arrive in early 2025.

All of the models are optimized for 15 languages (but primarily English), explains Amazon. What differentiates them from each other is the size and capabilities. Let’s break it down.

Micro is the smallest offering and has a 128,000-token context window, which equates to a processing ability of up to around 100,000 words. It takes in and outputs text only, and it is the fastest option in the Nova text-generating group because it has the lowest latency.

Next up is Lite, which can analyze text, images, and video. Like Pro, it has a 300,000-token context window. This means it can process around 225,000 words, 15,000 lines of computer code or 30 minutes of footage.
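The word counts quoted for these context windows line up with a common rule of thumb of roughly 0.75 English words per token. That ratio is an assumption used here for illustration, not a figure published by Amazon, and real counts vary with the tokenizer and the text. A minimal Python sketch of the arithmetic:

```python
# Assumed rule of thumb: about 0.75 English words per token.
# This ratio is illustrative only; it is not an Amazon-published figure.
WORDS_PER_TOKEN = 0.75


def approx_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a context window of the given size."""
    return round(context_tokens * words_per_token)


# Context window sizes as reported in the article.
context_windows = {"Micro": 128_000, "Lite": 300_000, "Pro": 300_000}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens is roughly {approx_words(tokens):,} words")
```

Under that assumption, Micro's window works out to about 96,000 words, which the article rounds to 100,000, and the 300,000-token window comes to about 225,000 words.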

Pro is faster, while Premier is designed as a base for building custom models. This means Premier is best suited to complex workloads.

“We’ve optimized these models to work with proprietary systems and APIs, so that you can do multiple orchestrated automatic steps — agent behavior — much more easily with these models,” said Jassy, adding: “So I think these are very compelling.”

Amazon is already promising upgrades and says that early next year, the context windows in some of the Nova models will expand to more than two million tokens.

Image and Video Generation

There are two options here whose names explain what they do – Canvas and Reel. Canvas will let users both generate and edit images. It offers a range of options for changing color schemes and layouts.

Reel will allow users to create videos up to six seconds in length, and a version for creating two-minute videos is incoming. Videos are created either from a prompt or from reference images. As Amazon showed in a video, tools include changing the camera motion with pans, 360-degree rotations, and zoom.

Responsible AI at Amazon

Jassy emphasized that both of these tools have “built-in” controls for responsible use. “[We’re trying] to limit the generation of harmful content,” he said. Tools include content moderation abilities and a watermarking option.

In a blog post, Amazon stated that this new family of AI models “extends [its] safety measures to combat the spread of misinformation, child sexual abuse material, and chemical, biological, radiological, or nuclear risks.”

More To Come from Amazon AI

As well as the updates to the six current Nova models, Amazon is promising a speech-to-speech model and a native multimodal-to-multimodal, or “any-to-any”, model. Both are set to be released early next year.

The speech-to-speech model will “…understand streaming speech input in natural language, interpreting verbal and nonverbal cues (like tone and cadence), and delivering natural humanlike interactions,” shares Amazon.

The any-to-any model, meanwhile, will be able to handle text, images, audio, and video as “both input and output”. This means it can be used as the baseline to develop applications for many different tasks, including “translating content from one modality to another, editing content, and powering AI agents that can understand and generate all modalities”, the company explains. “This is the future of how frontier models are going to be built and consumed,” added Jassy.