Artificial Intelligence has been a buzzword across all industries lately. Think of all the fuss around self-driving cars, Google’s updated Assistant and the general talk of how conversational interfaces are the future of tech.
Around 54 percent of retailers already use or plan to add artificial intelligence technology to their toolkits, with 20 percent planning to introduce some form of AI within the next 12 months, according to the latest report from SLI Systems.
The increased adoption of AI in retail can be attributed in large part to advances in deep learning.
What is Deep Learning?
Deep learning is a specific machine learning approach to building and training neural networks. A neural network, in turn, is a system of hardware and/or software modeled after the way neurons operate in the human brain.
In a nutshell, deep learning assumes that you first “feed” your network enough samples (data) so that it can make decisions about similar data based on what it has “learned” so far.
For instance, a deep learning network can be used to distinguish between different types of product – say, a dress and a pair of trousers. It could use differences between the light and dark areas of an image to establish roughly how a dress looks. That’s only the initial step. Next, it will take into account other factors such as shape, angles and colors to understand precisely how a dress looks and learn to tell it apart from all other garments.
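To make the “feed it samples” idea concrete, here is a minimal sketch of the learning loop behind a single artificial neuron, using NumPy and two made-up features (a light/dark area ratio and an aspect ratio – hypothetical stand-ins for the image cues mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical features per image:
# [light/dark area ratio, height/width aspect ratio]
dresses  = rng.normal(loc=[0.7, 1.8], scale=0.1, size=(50, 2))
trousers = rng.normal(loc=[0.3, 1.2], scale=0.1, size=(50, 2))
X = np.vstack([dresses, trousers])
y = np.array([1] * 50 + [0] * 50)  # 1 = dress, 0 = trousers

# A single "neuron": one weight per feature plus a bias,
# nudged toward better answers on every pass over the samples.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid prediction
    w -= 1.0 * (X.T @ (p - y)) / len(y)  # gradient of the log-loss
    b -= 1.0 * (p - y).mean()

preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print(f"training accuracy: {(preds == y).mean():.2f}")
```

A real image classifier stacks many such neurons into deep layers and learns the features themselves from raw pixels, but the principle is the same: show the network labeled samples, nudge the weights, repeat.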
Why Deep Learning Isn’t Mainstream in Retail Yet
However, the process of training a network isn’t that simple. All the steps described above run inside a black box and are controlled by the network’s memory – arrays of numbers (weights) that determine how inputs are combined and recombined to produce the result.
An example like dress recognition requires arrays that can contain over 60 million numbers, and working with data at that scale is complicated.
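Where do those 60 million numbers come from? A quick sketch, using the standard layer shapes of an AlexNet-style image network (and ignoring details such as grouped convolutions), shows how quickly the weight arrays add up:

```python
# Rough parameter count for an AlexNet-style network - the kind of
# model behind "over 60 million numbers". Shapes are the standard ones.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

layers = [
    conv_params(11, 3, 96),
    conv_params(5, 96, 256),
    conv_params(3, 256, 384),
    conv_params(3, 384, 384),
    conv_params(3, 384, 256),
    dense_params(6 * 6 * 256, 4096),  # fully connected layers dominate
    dense_params(4096, 4096),
    dense_params(4096, 1000),
]
total = sum(layers)
print(f"{total:,} parameters")  # roughly 62 million
```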
Next, there’s the problem of data samples. If we are talking about image recognition, not all training samples can be used efficiently. For instance, scientists from the University of Washington have identified a curious failure mode of neural networks – when the training data includes undesirable correlations, the network picks them up and its results become highly biased.
They tried to teach a network to distinguish between photos of wolves and huskies. Intentionally, all the pictures of wolves had snow in the background, whereas the husky photos did not. They then examined the first max-pooling layer of Google’s pre-trained Inception neural network – and found that the network classified any picture with snow or a light background at the bottom as a wolf, even when that was not the case.
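This snow bias is easy to reproduce on synthetic data. The sketch below (a toy logistic-regression stand-in for a real network, with made-up “animal cue” and “background brightness” features) trains on data where brightness perfectly tracks the wolf label, then evaluates on data where it does not:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_split(n, snow_follows_label):
    """Label 1 = wolf, 0 = husky; 'snow' brightness may track the label."""
    y = rng.integers(0, 2, n)
    animal_cue = y + rng.normal(0, 2.0, n)  # weak true signal
    snow = (y if snow_follows_label else rng.integers(0, 2, n))
    snow = snow + rng.normal(0, 0.1, n)
    return np.column_stack([animal_cue, snow]), y

X_train, y_train = make_split(400, snow_follows_label=True)   # biased
X_test,  y_test  = make_split(400, snow_follows_label=False)  # unbiased

# Train a simple logistic-regression "network" by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(1000):
    p = 1 / (1 + np.exp(-(X_train @ w + b)))
    w -= 0.5 * (X_train.T @ (p - y_train)) / len(y_train)
    b -= 0.5 * (p - y_train).mean()

def acc(X, y):
    return (((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()

print(f"train accuracy: {acc(X_train, y_train):.2f}")  # high
print(f"test accuracy:  {acc(X_test, y_test):.2f}")    # near chance
```

The model looks excellent on the biased training set and falls apart on unbiased data – exactly the failure mode the Washington researchers observed.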
Now, let’s get back to the retail space. To train a deep learning algorithm to distinguish between different goods, you will need between 1,000 and 5,000 images of the object in question (say, one model of a dress). In each photo, the object should be precisely enclosed within a bounding box and accurately labeled with its name and attributes.
The costs of obtaining such images manually can be exorbitant. For instance, a retail catalog of some 170,000 goods will require around 1 billion labeled images. If you hire someone to do the labeling manually, for instance via Amazon Mechanical Turk, at roughly $0.20 per image the total cost would come to around $240 million.
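The arithmetic is simple to reproduce. Using the quoted per-image price and the upper end of the 1,000-5,000 images-per-object range (both assumptions you can swap for your own figures):

```python
# Back-of-the-envelope labeling cost, using the figures quoted above.
catalog_size    = 170_000   # distinct goods in the catalog
images_per_good = 5_000     # upper end of the 1,000-5,000 range
cost_per_label  = 0.20      # USD per image, e.g. on Amazon Mechanical Turk

total_images = catalog_size * images_per_good
total_cost   = total_images * cost_per_label
print(f"{total_images:,} images -> ${total_cost:,.0f}")
```

Depending on exactly how many labeled images each object needs, the bill lands in the hundreds of millions of dollars – the same ballpark as the estimate above.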
Scientists from Neuromation have come up with an interesting solution to this problem. Instead of collecting and labeling huge datasets by hand, businesses could use synthetic data samples generated from a much smaller set of originals. Specifically, the team generates a 3D replica of a supermarket shelf containing exact digital copies of the goods (already labeled with 100 percent accuracy). Millions of images can then be generated from combinations of these objects – in different lighting conditions, from different shooting angles and so on.
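Neuromation’s actual pipeline renders full 3D scenes, but the core idea – derive many perfectly labeled variants from one known object – can be sketched in 2D with simple image transforms (all parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# One perfectly labeled "digital copy" of a product: a small grayscale image.
base = rng.uniform(0.2, 0.8, size=(32, 32))
label = {"name": "cereal box", "sku": "ABC-123"}  # hypothetical attributes

def synthesize(img, rng):
    """One synthetic training sample: random lighting + flip + camera noise."""
    out = img * rng.uniform(0.6, 1.4)            # lighting change
    if rng.random() < 0.5:                       # mirror = another viewpoint
        out = out[:, ::-1]
    out = out + rng.normal(0, 0.02, out.shape)   # sensor noise
    return np.clip(out, 0.0, 1.0)

# Every generated image inherits the label with 100 percent accuracy, for free.
dataset = [(synthesize(base, rng), label) for _ in range(1_000)]
print(len(dataset), dataset[0][0].shape)
```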
The Early Use Cases of Deep Learning in Retail
Susana Zoghbi from KU Leuven has developed a deep learning technique that offers better image recognition results for fashion e-commerce retailers.
In her recent paper, Susana describes a problem familiar to both consumers and business owners – product descriptions often do not mention all of the garment’s attributes that are visible in the accompanying picture. So when you search for a “blue cotton shirt with golden buttons”, you get no results even if such a product is available on the website.
Susana has worked on developing a cross-modal search tool that tackles this issue in two ways:
- Given an image, it provides suitable textual descriptions;
- Given a text query, it retrieves images matching the described characteristics.
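A common way to implement such cross-modal search (not necessarily Zoghbi’s exact method) is to train a text encoder and an image encoder that map matching pairs close together in one shared vector space, then answer queries by nearest-neighbor search. The sketch below fakes the encoders’ output with random vectors just to show the retrieval step:

```python
import numpy as np

# Hypothetical pre-computed embeddings: in a cross-modal model, a text
# encoder and an image encoder place matching text/image pairs near each
# other in one shared vector space. Here we fake the encoders' output.
rng = np.random.default_rng(3)
catalog = ["blue cotton shirt", "red silk dress", "black leather jacket"]
image_embs = rng.normal(size=(3, 8))
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)

# A matching text query would land near its product's image embedding;
# simulate a query for item 0 as its image vector plus a little noise.
query = image_embs[0] + rng.normal(0, 0.1, 8)
query /= np.linalg.norm(query)

scores = image_embs @ query   # cosine similarity (unit vectors)
best = int(np.argmax(scores))
print("best match:", catalog[best])
```

The same index works in both directions: embed an image to retrieve nearby text descriptions, or embed a text query to retrieve nearby images.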
The best part is this – a neural network powers the entire process. Specifically, the network should be capable of answering questions such as “What does a V-neck look like?” or “What is this skirt shape called?”.
Another curious project Susana has worked on involved teaching a neural network to link Pinterest pins, with their free-form textual descriptions, to relevant products on e-commerce websites such as Amazon. As a result, she managed to match identical or similar-looking products on Pinterest (with different descriptions and no links to the original points of purchase) to their counterparts on Amazon.
This technology could be used to build advanced personalized recommendation engines and revolutionize the way people discover goods online.
Read more about artificial intelligence and retail at TechCo