Developers Get Peace of Mind with Dataset from Getty Images

Enterprise developers will soon be able to access a sample of images from Getty Images via the Hugging Face Hub.

The initial dataset includes 3,750 images from 15 categories, including healthcare, nature and travel.

The announcement comes at a time when developers are working to boost public confidence in AI-generated content by tackling AI hallucinations.

Responsible Sourcing of Training Sets

Getty Images holds and licenses more than 572 million “visual assets”, and more than 200 million of these are available for licensing, either for free or with a paid subscription.

These images — which include some of the earliest photographs taken — have been vetted legally and are of commercial quality.

In an announcement, the Hugging Face team explains that this open dataset can “enhance the performance of your machine learning and AI models.” It adds that each image has an average of 50 keywords as well as human-written captions.

A Better Deal for Creators

It also promises that the content is “commercially safe” and that the creators will be compensated for its use. This will be a welcome statement for creators after several publicized spats between artists and AI ventures over the widespread mining of copyrighted works.

The company clarified that the deal is about getting creators a fair deal and delivering high-quality training datasets that can be used with confidence.

“Imagine building or enhancing your AI/ML capabilities with data that’s not only diverse and high quality but also comes with the peace of mind that it’s responsibly sourced. That’s what we’re bringing to the table,” Andrea Gagliano, head of data science and AI/ML at Getty Images, told VentureBeat.

Changing How Developers Get Data

She said she hopes the move will encourage AI companies to adopt officially licensed content as standard practice, which would help avoid copyright disputes. It would also make AI technology far more reliable, because this data is high-quality, legally sound and vetted.

From a developer’s point of view, this will mean far less time spent deleting low-quality data, filtering out copyrighted content, and filling in the blanks when metadata is missing.

The dataset is open and free to use, but developers will have to abide by rules on redistributing the dataset and on creating products or services that would compete directly with Getty Images.

Gagliano said she hopes the deal will have far-reaching implications. “Our goal is to show that it is possible to accommodate licensing for all the content required to train functional AI models – developing business models that enable the creation of high-quality AI models while respecting creator IP,” she said.

Written by:
Katie has been a journalist for more than twenty years. At 18 years old, she started her career at the world's oldest photography magazine before joining the launch team at Wired magazine as News Editor. After a spell in Hong Kong writing for Cathay Pacific's inflight magazine about the Asian startup scene, she is now back in the UK. Writing from Sussex, she covers everything from nature restoration to data science for a beautiful array of magazines and websites.