Illinois team wins first prize in OpenCV AI competition
Manually creating a dataset of labeled images is costly and labor-intensive. To reduce both the cost and the effort, a team of four students from the University of Illinois Urbana-Champaign developed software that automates the generation and annotation of data for training deep learning-based computer vision models.
The team won first prize in the Core Track of the 2022 OpenCV AI Competition. Their software, called COCOpen, follows the image labeling approach introduced in the Microsoft "Common Objects in Context" dataset.
The software produces image data used to train models that identify and outline specific objects in a scene, which may contain multiple objects of the same category.
In the example use case in their code repository, the team generates images that contain multiple objects of the wire and Ethernet device categories. These synthetic images can be used to train a deep learning model to detect these categories of objects in new images the model has never seen.
Automated creation and labeling of these training images significantly reduce the time and expense associated with this process. The code can be used in a variety of applications like manufacturing, logistics, autonomous driving, and domestic services.
Holly Dinkel, a Ph.D. student in the Department of Aerospace Engineering at UIUC, explained that COCOpen works by taking simple, unlabeled images of single objects against a black background.
The software uses OpenCV to create masks for these individual objects based on their color. It then combines multiple object images into a single image using the copy-paste method of data augmentation. Additionally, OpenCV is used to apply enhancements including randomizing an object's orientation or altering its color.
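The masking and copy-paste steps described above can be sketched in a few lines. The following is a minimal illustration, not COCOpen's actual code: it uses plain Python lists instead of OpenCV arrays so it runs without dependencies, and the function names (color_mask, paste), the brightness threshold of 30, and the tiny 3x4 images are all assumptions made for the example.

```python
# Illustrative sketch only; names, threshold, and image sizes are hypothetical.
# Images are nested lists of (R, G, B) tuples on a black background.

BLACK = (0, 0, 0)
RED = (200, 30, 30)
BLUE = (20, 40, 220)


def color_mask(image, threshold=30):
    """Mark pixels whose brightest channel exceeds the threshold (1 = object)."""
    return [[1 if max(pixel) > threshold else 0 for pixel in row] for row in image]


def paste(canvas, instance_map, obj, mask, instance_id):
    """Copy masked object pixels onto the canvas and record which object owns them."""
    for y, row in enumerate(obj):
        for x, pixel in enumerate(row):
            if mask[y][x]:
                canvas[y][x] = pixel
                instance_map[y][x] = instance_id


HEIGHT, WIDTH = 3, 4

# A horizontal "wire" along the middle row and a vertical "device" in column 0.
wire = [[RED if y == 1 else BLACK for x in range(WIDTH)] for y in range(HEIGHT)]
device = [[BLUE if x == 0 else BLACK for x in range(WIDTH)] for y in range(HEIGHT)]

canvas = [[BLACK] * WIDTH for _ in range(HEIGHT)]
instance_map = [[0] * WIDTH for _ in range(HEIGHT)]

# Later pastes overwrite earlier ones, so occluded pixels change owner.
for instance_id, obj in enumerate([wire, device], start=1):
    paste(canvas, instance_map, obj, color_mask(obj), instance_id)

for row in instance_map:
    print(row)
```

Because each pasted pixel is written into instance_map at the same moment it is copied onto the canvas, the per-object label falls out of the compositing step itself, with no hand annotation.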
The data generated by the COCOpen library is validated by training a Detectron2 Mask R-CNN model to detect Ethernet wires and networking devices for a robotic manipulation application.
Yash Rathod, a junior in the Department of Computer Science, said his vision for COCOpen was to take research from a lab and build a user-friendly data generation experience for machine learning practitioners.
“The idea was to build a pipeline where we pull thousands of images from the cloud, preprocess them and apply the data generation techniques studied in the lab, to produce COCO-formatted data ready for training computer vision models,” he said.
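To make "COCO-formatted" concrete, the sketch below turns one binary object mask into a COCO-style annotation record. The top-level keys (images, annotations, categories) and the [x, y, width, height] bounding-box layout follow the published COCO format; the helper names, the file name, the category names, and the choice of uncompressed run-length encoding for the mask are assumptions for illustration rather than a description of what COCOpen writes.

```python
import json


def mask_to_bbox(mask):
    """Return [x, y, width, height]; assumes the mask has a nonzero pixel."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return [min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1]


def mask_to_rle(mask):
    """Uncompressed COCO-style RLE: column-major runs, starting with zeros."""
    height, width = len(mask), len(mask[0])
    counts, current, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            if mask[y][x] != current:
                counts.append(run)  # close the run that just ended
                current, run = mask[y][x], 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def make_annotation(annotation_id, image_id, category_id, mask):
    """Build one annotation entry for a single object instance."""
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": mask_to_bbox(mask),
        "area": sum(map(sum, mask)),  # number of object pixels
        "segmentation": mask_to_rle(mask),
        "iscrowd": 0,
    }


# Hypothetical 3x4 mask of a wire occupying part of the middle row.
wire_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dataset = {
    "images": [{"id": 1, "file_name": "synthetic_0001.png", "height": 3, "width": 4}],
    "categories": [{"id": 1, "name": "wire"}, {"id": 2, "name": "device"}],
    "annotations": [make_annotation(1, 1, 1, wire_mask)],
}

print(json.dumps(dataset, indent=2))
```

Each additional object pasted into the image would add one more entry to the annotations list, all sharing the same image_id.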
Rathod used his semester-long experience with the Promoting Undergraduate Research in Engineering Program at UIUC to develop and test software for interfacing with cloud data storage resources—originally Microsoft Azure, then Box.
“Automated data generation means users can simply clone a code repository and follow minimal installation and run instructions. We want to save users time and valuable compute resources by leveraging the cloud,” Rathod said.
Harry Zhao, who graduated this past May with a B.S. in aerospace engineering, highlighted COCOpen’s ability to solve real-world computer vision problems using OpenCV, with applications in many disciplines. The other 45 entries in their category included solutions for medical, environmental, and construction challenges.
“Creating the original Microsoft COCO dataset required 55,000 total worker hours, not all by one person of course,” Zhao said. “But there can be a lot of inconsistencies. Some labels may be inaccurate and have to be rejected or refined which wastes even more time. COCOpen puts data into a format people can use to automatically generate labels in images.”
Zhao said COCOpen is inspired by code and data that he and Dinkel created two years ago during his internship with the Illinois Space Grant Consortium Undergraduate Research Opportunity Program.
About the complexity of labeling, Zhao said, “If we only cared about detecting or classifying wires we would just say, this is a wire, and this is not a wire. It's zero or one. Binary. Semantic segmentation is when you know what the pixels represent.
“Say you had two wires, and you cared to distinguish between both, because, say, we wanted a robot to pick up the blue wire,” Zhao said. “Using the simplest semantic segmentation, we would use instance segmentation which considers multiple instances of an object. In a good instance segmentation algorithm, there are no specific number of objects. You could have many wires. You don't have to specify.”
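The distinction Zhao draws can be shown with a toy example. In this sketch (the masks, the class ID, and the helper name to_semantic are made up for illustration), two wires are stored as separate instance masks; collapsing them into a single semantic label map keeps the class of each pixel but loses which wire it belongs to.

```python
# Toy illustration of semantic vs. instance segmentation; all values are hypothetical.

BACKGROUND, WIRE = 0, 1

# Instance segmentation: one mask per object, with no fixed number of objects.
instances = [
    {"name": "blue wire", "category": WIRE,
     "mask": [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]},
    {"name": "red wire", "category": WIRE,
     "mask": [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]},
]


def to_semantic(instances, height, width):
    """Collapse instance masks into one per-pixel class map."""
    semantic = [[BACKGROUND] * width for _ in range(height)]
    for instance in instances:
        for y, row in enumerate(instance["mask"]):
            for x, value in enumerate(row):
                if value:
                    semantic[y][x] = instance["category"]
    return semantic


semantic = to_semantic(instances, height=3, width=4)
labels = {value for row in semantic for value in row}

# The semantic map knows only "wire" vs. "background".
print("classes in semantic map:", sorted(labels))
# The instance list still distinguishes the two wires.
print("objects in instance list:", [instance["name"] for instance in instances])
```

A robot asked to pick up the blue wire needs the instance list: in the semantic map, both wires carry the same label.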
Jingyi Xiang, a senior in the Department of Electrical and Computer Engineering, began studying automatic data generation during her experience with the Undergraduate Research Apprenticeship Program. Building on Zhao's work, Xiang implemented copy-paste data augmentation, a core feature of COCOpen.
“During my first two weeks of research, I spent a total of 16 hours hand-labeling images we previously collected,” Xiang said. “Labeling one image took me about 10 minutes on average. Some cluttered ones took up to an hour per image. The data augmentation techniques in COCOpen allowed us to scale our dataset and drastically reduce human labor time.”
Xiang also said that Dinkel and Rathod did a great job ensuring the COCOpen library is as user-friendly as possible. “I learned a lot from them during this experience. In the future, I will try to match the high quality of COCOpen when open sourcing my own research work.”
Dinkel said the success of the project hinged on the incredible effort of every member of the team.
“Although COCOpen as a product came together in the course of a few weeks, it represents two years of effort researching problems in computer vision,” Dinkel said. “This project would not have been possible without each member’s commitment to the project and to developing habits of achievement. Yash, Jingyi, and Harry are each audacious in their own way. This project was successful because they each adopted an attitude of ‘trying things,’ of jumping into the sand box and building something from nothing.”
Watch a video demonstration at: https://www.youtube.com/watch?v=H16CpeIdEHY
The code is available at: https://rmdlo.github.io/COCOpen-OpenCV/
The Illinois team, nicknamed COCONuts, was advised by AE’s Tim Bretl and by NASA’s Brian Coltin and Trey Smith. All team members are part of the UIUC/NASA Representing and Manipulating Deformable Linear Objects project (https://github.com/RMDLO).
The research effort was supported by the NASA Space Technology Graduate Research Opportunity award 80NSSC21K1292, the U.S. Department of Education Graduate Assistance in Areas of National Need award P200A180050-19, and the Coordinated Science Laboratory at UIUC.