January 22, 2018 may well go down in history as the day Amazon changed the entire retail industry. On that day, the first cashier-less, register-less store opened to the public in Seattle. It’s called Amazon Go, and it has a real chance to reshape the retail market and take us to a future of queue-less shopping.
Amazon Go is a project the company has been working on for the last five years. The idea is brilliant in its simplicity: a store with no cashiers, no registers, and no time-consuming lines that only annoy customers. Just an app on your smartphone. This is a dream come true for anyone who hates wasting time waiting in a queue to pay for purchases.
The idea is simple, but its execution is extremely complex and requires sophisticated technology as well as out-of-the-box thinking. How is it possible that you can walk into a grocery store, take any product you like, and simply leave, without being stopped by anyone or anything? The answer is obvious: advanced technology.
The whole 1,800-square-foot mini-market is packed with cameras and sensors. Hundreds of electronic eyes register every move you make and every item you pick up. They use computer vision, which allows machines to “see” an object and determine what it is. The purpose of this process is to distinguish items from people, and to recognize when an article has been taken off the shelf and who has taken it. Weight sensors support the camera system in this process and help to track the products. Both sensors and cameras gather an enormous set of data. Of course, all this data is useless if there is no serious coding behind it. And that is where the magic happens.
Amazon has developed proprietary algorithms that use deep learning to analyze the data provided by the in-store camera and sensor system, recognizing customer behavior patterns and making predictions. Consequently, the Amazon Go store does not need cash register lines; the system knows what products have been taken by the customer and bills their account via the app.
According to Amazon, algorithms can distinguish who takes what, even when a shop floor is crowded, or two or more customers have a similar body build.
What’s the experience from a customer’s perspective? You just need the Amazon Go app. When you approach the store, you scan the QR code generated by the app and then start shopping. The system recognizes what you have taken, and when you are leaving, the store bills you. Simple. Brilliant. Time-saving.
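Amazon hasn’t published how the entry QR code works, but one plausible design is a short-lived signed token that the app encodes as a QR code and the entry gate verifies. A minimal sketch, assuming a hypothetical shared secret, payload layout, and expiry window (none of this is Amazon’s actual scheme):

```python
import base64
import hashlib
import hmac
import time

# Hypothetical key shared by the app backend and the store gates.
SECRET = b"store-shared-secret"

def make_entry_token(customer_id, ts=None):
    """Build a short-lived signed token the app could encode as a QR code."""
    ts = int(time.time()) if ts is None else ts
    payload = f"{customer_id}:{ts}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()[:16]
    return base64.urlsafe_b64encode(payload + b":" + sig.encode()).decode()

def verify_entry_token(token, max_age=300, now=None):
    """Return the customer id if the token is authentic and fresh, else None."""
    customer_id, ts, sig = base64.urlsafe_b64decode(token).decode().rsplit(":", 2)
    expected = hmac.new(SECRET, f"{customer_id}:{ts}".encode(),
                        hashlib.sha256).hexdigest()[:16]
    now = int(time.time()) if now is None else now
    if hmac.compare_digest(sig, expected) and now - int(ts) <= max_age:
        return customer_id
    return None
```

A signed, expiring token means a screenshot of someone else’s QR code stops working after a few minutes, which is one reason a scheme like this is plausible.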
OK, but how does it actually work?
Cameras, sensors, algorithms, deep learning. All of these pieces make Amazon Go tick. But how does it work? I asked Paweł Krynicki, Senior Software Developer at XSolve, who has experience in developing Machine Learning models [check his last piece about ML and content moderation], for a comment. This is his explanation of what Amazon probably did:
The main challenge
If we think of the main problem Amazon had to struggle with, the first thing that comes to mind is recognition. The system needs to recognize the customer, the product, the action performed with the product (taking it from or putting it back on the shelf) and the fact of the customer leaving the store. This way Amazon Go can adequately register articles as they are added to or removed from the shopping cart and then charge the customer’s card with the correct amount.
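Whatever the recognition pipeline looks like, its output ultimately has to drive a per-customer virtual shopping cart. A minimal sketch of such a cart (the class name, event names, and SKUs are hypothetical illustrations, not Amazon’s design):

```python
from collections import Counter

class VirtualCart:
    """Per-customer cart kept server-side; items flow in and out as shelf events arrive."""

    def __init__(self, customer_id, prices):
        self.customer_id = customer_id
        self.prices = prices          # {sku: unit price}
        self.items = Counter()        # {sku: quantity currently held}

    def on_take(self, sku):
        """Customer took an item off a shelf."""
        self.items[sku] += 1

    def on_return(self, sku):
        """Customer put an item back on a shelf."""
        if self.items[sku] > 0:
            self.items[sku] -= 1

    def total(self):
        return round(sum(self.prices[sku] * n for sku, n in self.items.items()), 2)

    def checkout(self):
        """Called when the customer is detected leaving; returns the amount to charge."""
        amount = self.total()
        self.items.clear()
        return amount
```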
Although Amazon hasn’t revealed any detailed specifications for the technologies used to build Amazon Go, we can make some assumptions. We can presume that automatically recognizing customers and properly assigning their shopping was made possible by advanced deep learning combined with computer vision algorithms and the data collected by a set of different sensors, similar to the solutions used in self-driving cars. But how exactly might this work?
Recognizing the customer
The mobile app generates a QR code; a customer scans it when they enter the store and is immediately registered. What happens next? With this information we only know that the customer is inside the store. But the system must also register when that specific customer is performing an action with a product. In order to recognize who is, for instance, picking up a product from the shelf, Amazon could have used Amazon Rekognition Video.
Once a customer enters the store and scans the QR code, the cameras take dozens of pictures from different angles. These images are then indexed in a newly created collection (which could be retained to improve tracking efficiency over time).
Assuming the footage from all the cameras is continuously streamed to Amazon Rekognition via Amazon Kinesis Video Streams, when a product is picked up the system obtains the identifiers of all the people surrounding it. It can then match the person by simply finding which face is closest to the product – the response from Amazon Rekognition, together with the identifiers of all the recognized faces, contains their bounding-box coordinates.
Obviously, the actual system might be more sophisticated than that, and in the case of multiple people being near a product at the same time, the decision could be based, for example, on customer preferences, previous shopping history, or tracking the customer’s cell phone location via wi-fi triangulation. We should keep in mind that the results predicted by machine learning models always come with an associated probability, so Amazon could use hybrid methods to improve confidence in the results.
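The “closest face wins” matching step described above can be sketched in a few lines. This assumes face bounding boxes in the shape Amazon Rekognition returns ({'Left', 'Top', 'Width', 'Height'}, normalized to [0, 1]) and a product position already mapped into the same camera coordinates – the latter is purely our assumption:

```python
import math

def box_center(box):
    """Center of a Rekognition-style bounding box, normalized to [0, 1]."""
    return (box["Left"] + box["Width"] / 2, box["Top"] + box["Height"] / 2)

def closest_customer(faces, product_xy):
    """faces: list of (customer_id, bounding_box).

    Returns the id of the recognized face whose center is nearest
    to the product's position in the same frame.
    """
    return min(faces, key=lambda f: math.dist(box_center(f[1]), product_xy))[0]
```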
Recognizing the product
The problem of identifying a product that is either picked up or put back on a shelf is a little different. In general, there are some similarities to the process used for recognizing customers, but this time the trigger for identification might be, for example, a change in the weight of products on a shelf. With enough sufficiently precise weight sensors, it might even be possible to finish the job right here: identify a product by its weight, and assign the action to the customer standing closest to the weight sensor.
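Identifying a product purely from a shelf weight change could look something like this sketch (the catalog, SKUs, and tolerance value are made up for illustration):

```python
def identify_product(weight_delta, catalog, tolerance=3.0):
    """Match an observed shelf weight change (in grams) to the closest catalog item.

    catalog: {sku: unit weight in grams}. Returns the matching sku, or None
    if nothing falls within the tolerance (e.g. two items grabbed at once,
    or plain sensor noise).
    """
    magnitude = abs(weight_delta)
    best = min(catalog, key=lambda sku: abs(catalog[sku] - magnitude))
    return best if abs(catalog[best] - magnitude) <= tolerance else None
```

In practice a single shelf usually holds only a few distinct products, so even coarse sensors can disambiguate most events; the ambiguous cases are what the vision system is for.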
But we can imagine situations which are less clear, such as dropping a product on the floor or handing it to another person, and the solution must be prepared for them as well. So we can presume that computer vision algorithms are also in use here. There is no ready-made Amazon Web Services solution for recognizing objects in streamed video (although there is one for recorded footage – label detection in Amazon Rekognition – which Amazon could have adapted), so Amazon Go most probably uses a custom solution based on a convolutional neural network for object detection.
This might be, for instance, Faster R-CNN which could be implemented using the OpenCV open source library. After training such a neural network with enough labeled images of the products available in the store, it could be possible to recognize them via streamed real-time video content.
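Detectors like Faster R-CNN emit many overlapping candidate boxes per object; a standard post-processing step, non-maximum suppression (NMS), keeps only the highest-confidence box for each object. A minimal greedy NMS sketch (box format and threshold are conventional, not anything Amazon has disclosed):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_threshold=0.5):
    """detections: list of (box, score, label).

    Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it too much, repeat.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```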
Recognizing the actions
The next thing the Amazon Go store must be able to do is understand what the customer does with a product. Should the system add it to or remove it from the customer’s virtual shopping cart, and ultimately charge the customer’s card with the correct amount?
Again, the simple solution would be to use the output from the weight sensors: when the weight on a shelf decreases it means a customer took a product, if it increases, the product was put back. This might work if every customer followed the ideal scenario, entering the store, taking products, and leaving, preferably without anyone else doing their shopping at the same time. But obviously, Amazon’s engineers had to devise a system for all possible scenarios, including high levels of traffic. Most likely, there’s a solution based, yet again, on machine learning algorithms under the hood.
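That simple weight-based rule translates directly into code; a sketch, with a made-up noise threshold:

```python
def classify_shelf_event(weight_delta, noise_floor=2.0):
    """Map a shelf weight change (in grams) to a cart action.

    Negative delta: an item left the shelf ("take" -> add to cart).
    Positive delta: an item came back ("return" -> remove from cart).
    Changes below the noise floor are treated as sensor jitter.
    """
    if abs(weight_delta) <= noise_floor:
        return "noise"
    return "take" if weight_delta < 0 else "return"
```

This is exactly the fragile part the text describes: it works for the ideal scenario but says nothing about who performed the action or what happens when items move between customers, which is where the learned models have to take over.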
There are many methods for detecting human actions in real-time video. In order to build an efficient solution, Amazon may have used, for instance, an implementation of Hidden Two-Stream Convolutional Networks for Action Recognition. This algorithm classifies human actions in video in a single step, in contrast to older methods, which mostly required preprocessing the footage to extract motion data first.
Recognizing when the customer leaves
Lastly, Amazon Go must be able to notice that the customer leaves the store. Assuming that cameras cover every inch of the premises, the system could easily check if the customer doesn’t appear for a certain amount of time in the dataset generated by the customer recognition solution. If so, it can assume the customer has left the store and deduct payment.
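The “hasn’t been seen for a while” check could be as simple as a timeout over last-seen timestamps from the tracking system (the function name and threshold are illustrative):

```python
def departed_customers(last_seen, now, timeout=60.0):
    """last_seen: {customer_id: last timestamp the tracker observed them}.

    Customers missing from every camera for longer than `timeout` seconds
    are assumed to have left the store and can be charged.
    """
    return sorted(cid for cid, ts in last_seen.items() if now - ts > timeout)
```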
It could be such a good business
What Amazon is trying to achieve here is an entirely new shopping experience. However, this new approach has consequences. The lack of cash registers means there is no need to hire cashiers. In the US alone, 3.5 million people work as cashiers. If Amazon’s new system were to go global, their jobs would be in jeopardy.
The company says that its solution won’t affect the labor market so severely, because workers will simply be assigned to different roles and tasks, and up to a point that’s true. Even the Amazon Go store requires some human staff, for restocking shelves, checking IDs in the alcohol section, and so on. However, it doesn’t take rocket science to predict that the number of tasks will be limited, and there will not be as many roles for employees as in a traditional store.
Besides, the position of the cashier is already at risk from automation. According to the Guardian, due to technological progress in fields like AI and robotics, some positions won’t require humans to perform them anymore, and one of those positions is cashier – a role which has a 97% chance of automation. Amazon’s experimental store just confirms this forecast.
Amazon Go, at least for now, is just a first glimpse of the future. The automation of jobs in the retail industry won’t happen overnight. It will be years before the number of register-less shops reaches the critical mass required to impact the labor market. Not so much a sudden strike, more a creeping evolution.
Nevertheless, it is an ongoing process, and Amazon won’t stop with just one mini-market. The company has been investing heavily in physical retail stores since 2015 when it decided to open a small chain of bookstores. In 2017, Amazon took over the American supermarket chain Whole Foods for $13.7 billion, with 473 stores in North America and the UK. If the experimental store in Seattle is a success, Amazon has no shortage of locations to implement its solution more widely.
The potential success of Amazon Go will be a spark, and if it turns out to be less costly and more profitable than a regular store, it won’t be too long before other players are picking up the ball and running with it, starting their own staff-less stores. There is already a Tao Cafe owned by Chinese e-commerce company Taobao which uses something very similar to Amazon’s solution.
The retail pie is huge. In 2016 alone, total retail sales worldwide reached USD 22.05 trillion and by the end of 2020 that figure is predicted to rise to USD 27.73 trillion. Automation will only make this pie even bigger.
The next big thing?
The launch of Amazon Go is most definitely a signal of changes to come. If the idea proves to be successful from a business point of view, Amazon has an unprecedented opportunity to change the entire retail industry and take automation, and time and cost optimization to the next level.
Still, Amazon Go does raise some questions and concerns, including its potential negative impact on the labor market, the level of personal surveillance (Amazon will harvest an enormous set of data about customer behavior – what will that data be used for?), and the economic and social costs of implementing automation in the retail industry on a global scale.
One thing is for sure: we are living in an era which will be shaped by the Internet of Things, artificial intelligence, and robotics. Amazon is currently at the forefront of these changes.