Image labeling at scale: Disruption and innovation in times of COVID-19
Author: Martin Etchart, Senior Computer Vision Scientist at Ulta Beauty
As we faced the challenge of working from home amid a pandemic, and while Ulta Beauty temporarily became an e-commerce-only business, we recognized an opportunity: boost our AI experiences by accelerating a cross-team, at-scale image labeling project. Being prepared, prioritizing quickly, and embracing failure as a possible outcome are part of what we do in Digital Innovation at Ulta Beauty.
For 8+ weeks, we brought the expertise of 60+ beauty associates from Ulta Beauty stores into the domain of image labeling for machine learning. We named this internal, international, cross-functional collaboration the #TagSquad.
We collected more than 1M expert labels from 300K unique face images. Throughout the process, we consolidated and tested our in-house image labeling workflow and our labeling tool, CVAT, the open-source project that helped make this possible.
What is image labeling, and why?
Within Digital Innovation, we develop technology that provides personalized experiences for our guests, such as Foundation Shade Matcher and Skin Advisor in the Ulta Beauty app. Our team has strong backgrounds in computer vision and machine learning, covering the research, training, and deployment of deep learning models that bring personalized AI experiences to life. Labeled image datasets are critical for training supervised and semi-supervised deep learning models to perform specific, human-like tasks.
Let’s talk about labeling! Image labeling, annotation, and tagging are synonymous terms. One of the most common image labeling tasks is full-image attribute labeling: the labeler is presented with an image and selects an attribute from a set. For example, is the image of a cat or a dog? Other forms consist of drawing a box or a closed shape in the image and selecting an attribute for the enclosed object. Some labeling tasks are more subjective than others and require each image to be labeled several times by multiple labelers to reach a consensus. Once the labeling process is complete for all images, you have a labeled dataset for your AI algorithm to learn from.
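To make the difference between these label types concrete, here is a minimal sketch of how they might be represented in Python. The class and function names (`ImageTag`, `BoxAnnotation`, `PolygonAnnotation`, `validate_tag`, `validate_box`, `validate_polygon`) are hypothetical illustrations, not CVAT's data model or export format.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ImageTag:
    """Full-image attribute label: one value chosen from a fixed set."""
    image_id: str
    attribute: str   # e.g. "animal"
    value: str       # e.g. "cat"
    labeler: str


@dataclass
class BoxAnnotation:
    """Box around an object as (x_min, y_min, x_max, y_max) in pixels, plus a label."""
    image_id: str
    box: Tuple[float, float, float, float]
    label: str
    labeler: str


@dataclass
class PolygonAnnotation:
    """Closed shape given by its (x, y) vertices, plus a label for the enclosed object."""
    image_id: str
    points: List[Tuple[float, float]]
    label: str
    labeler: str


def validate_tag(tag: ImageTag, allowed: Dict[str, Sequence[str]]) -> bool:
    """A tag is valid only if its value belongs to the attribute's allowed set."""
    return tag.value in allowed.get(tag.attribute, ())


def validate_box(ann: BoxAnnotation) -> bool:
    """A box must have positive width and height."""
    x_min, y_min, x_max, y_max = ann.box
    return x_min < x_max and y_min < y_max


def validate_polygon(ann: PolygonAnnotation) -> bool:
    """A closed shape needs at least three vertices."""
    return len(ann.points) >= 3
```

Full-image tags need only a closed set of allowed values per attribute, which is part of why they are the quickest form to train new labelers on.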
Leveraging beauty expertise
With the #TagSquad in place, we decided to focus our efforts on subjective, full-image labeling tasks for facial attributes. While full-image labeling is technically the simplest form of image labeling, it allowed us to minimize training time and make the most of our subject matter experts’ existing beauty knowledge.
The workflow began with an image pre-filtering stage, itself a labeling task, in which attributes such as light intensity, light direction, and sharpness were annotated once per image. These attributes made the subsequent beauty-oriented labeling more efficient, covering attributes such as eye characteristics, skin complexion, and skin redness. These tasks are far more subjective, so up to six different experts labeled each image to reach a reliable consensus label.
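The two-stage flow above can be sketched as follows. The post does not say how the consensus label was computed, so this sketch assumes a simple strict-majority vote; the attribute names and accepted values in `ACCEPTED_QUALITY`, and the functions `passes_prefilter` and `consensus`, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical quality gate applied after the pre-filtering stage.
# Attribute names and accepted values are illustrative, not the real task schema.
ACCEPTED_QUALITY: Dict[str, Set[str]] = {
    "light_intensity": {"normal"},
    "light_direction": {"frontal"},
    "sharpness": {"sharp"},
}


def passes_prefilter(quality: Dict[str, str]) -> bool:
    """Keep an image only if every gated attribute was labeled with an accepted value."""
    return all(quality.get(attr) in ok for attr, ok in ACCEPTED_QUALITY.items())


def consensus(labels: List[str], min_agreement: float = 0.5) -> Tuple[Optional[str], float]:
    """Majority vote over the labels several experts gave one image for one attribute.

    Returns (winner, agreement), where agreement is the fraction of labelers who chose
    the most frequent label. The winner is None when the top label is tied or its
    agreement does not exceed min_agreement (the default requires a strict majority),
    so the image can be sent back for review.
    """
    if not labels:
        return None, 0.0
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(labels)
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or agreement <= min_agreement:
        return None, agreement
    return top_label, agreement
```

For example, six labels split four to two give an agreement of 4/6 and are accepted, while a three-three split returns `None`. A strict majority is only one possible rule; a minimum vote count or per-labeler weighting would fit the same structure.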
Scaling up (fast!)
Quickly shifting our Computer Vision team’s focus to plan and manage this project was a challenge. In less than a week, we scaled our labeling operation 20-fold: setting up tools, providing secure access, and preparing ongoing labeling tasks, training materials, and training sessions. Over the following 8+ weeks, the team executed the project while continuously improving the process and adapting to the new circumstances.
We split the 60+ associates into four teams of roughly 15, each with a leader to help manage and track progress. The labeling schedule provided support during working hours and daily progress reports. Beauty experts performed tasks they had never done before, moving out of their comfort zones, adapting to new work environments and circumstances, and amazing us with the quality and pace of the labels they produced. Team leaders were vital to managing and tracking progress: they provided an essential human touch, closed the gap between beauty and technology, maintained a constant feedback loop, reported roadblocks and progress, and managed schedules.
Setting up the tools
A fundamental component of this project was the labeling tool itself. We used our in-house tool of choice, CVAT: a free, online, interactive video and image annotation tool for computer vision, maintained as an open-source project by the OpenVINO Project.
We set up the tool quickly and securely. To make this possible at scale, we developed additional functionality around CVAT’s existing REST API and Python CLI. These extra features let us programmatically create users and tasks, pre-load labels, track progress, dump labels, analyze quality, and detect patterns. We also built a task tool that automatically assigned new tasks at the start of each day and let labelers request a new task on their own. We generated daily reports for team leaders and upper management, with dashboards that visualized progress at set checkpoints to keep the work consistent and objectives aligned.
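The daily assignment, on-demand requests, and per-team progress reporting described above could look roughly like the following in-memory sketch. `TaskDispatcher` and its methods are hypothetical; the integer IDs stand in for tasks that, in a real setup, would be created and assigned through CVAT’s REST API or CLI, which this sketch does not call.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set


@dataclass
class TaskDispatcher:
    """Hypothetical in-memory task pool; integer IDs stand in for CVAT task IDs."""

    pending: Deque[int]          # tasks not yet handed to anyone, in FIFO order
    team_of: Dict[str, str]      # labeler name -> team name
    assigned: Dict[str, Set[int]] = field(default_factory=lambda: defaultdict(set))
    completed: Dict[str, List[int]] = field(default_factory=lambda: defaultdict(list))

    def request_task(self, labeler: str) -> Optional[int]:
        """On-demand request: give the labeler the next pending task, or None if empty."""
        if labeler not in self.team_of:
            raise KeyError(f"unknown labeler: {labeler}")
        if not self.pending:
            return None
        task_id = self.pending.popleft()
        self.assigned[labeler].add(task_id)
        return task_id

    def assign_daily(self) -> Dict[str, int]:
        """Start-of-day pass: one new task for every labeler with no open task."""
        handed_out: Dict[str, int] = {}
        for labeler in sorted(self.team_of):
            if self.assigned[labeler]:
                continue  # still working on an open task
            task_id = self.request_task(labeler)
            if task_id is None:
                break  # pool exhausted
            handed_out[labeler] = task_id
        return handed_out

    def complete(self, labeler: str, task_id: int) -> None:
        """Move a task from the labeler's open set to their completed list."""
        self.assigned[labeler].remove(task_id)
        self.completed[labeler].append(task_id)

    def team_report(self) -> Dict[str, Dict[str, int]]:
        """Per-team counts of completed and open tasks for a daily progress report."""
        report: Dict[str, Dict[str, int]] = {}
        for labeler, team in self.team_of.items():
            counts = report.setdefault(team, {"completed": 0, "open": 0})
            counts["completed"] += len(self.completed[labeler])
            counts["open"] += len(self.assigned[labeler])
        return report
```

Keeping assignment and reporting in one place means the same state that hands out work also feeds the team leaders’ daily numbers; a production version would persist this state rather than hold it in memory.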
As a byproduct of these efforts, we are working to improve our data models, polish our in-house labeling and training workflow, and give back to the open-source community by contributing code improvements to the CVAT project.
The outcome for Ulta Beauty:
- Collected more than 1M expert labels from a total of 300K unique face images.
- Efficiently repurposed the in-store associate workforce and subject matter expertise within Ulta Beauty.
- Created a unique partnership between beauty and technology, strengthening internal collaboration.
- Consolidated an in-house labeling tool and workflow.
This project was successful thanks to the collaboration of many. Thanks to the more than 60 labeling associates, who deserve a special shout-out for their pace, quality, and collaboration. Special thanks also to the members of the AR Innovation team who pushed themselves out of their comfort zones to set up the labeling tool, scripts, and training materials, and who provided daily support and extensive reporting.