In the News
Synthetic Training Data Used to Find Galaxies in Different Stages of Formation
In this study, accepted for publication in The Astrophysical Journal and available online, researchers used computer simulations of galaxy formation to generate synthetic training data for a deep learning algorithm, which then proved surprisingly good at analyzing Hubble Space Telescope images of galaxies.
In the image shown, the top row contains high-resolution renderings from a state-of-the-art galaxy simulation program (VELA). The second row shows the same images, modified to more closely resemble what Hubble's optical system would observe; these images were used to train the neural network. The bottom row shows Hubble galaxy photos correctly classified by the deep learning system the researchers developed.
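The researchers' actual degradation pipeline is not described here. As a rough illustration only, turning a clean simulated rendering into a "mock observation" is commonly approximated in three steps: blur with a point-spread function (PSF), resample to the detector's coarser pixel scale, and add noise. The sketch below assumes a Gaussian PSF, integer block rebinning, and Gaussian noise; the function `mock_observe` and all of its parameters are hypothetical and are not tuned to Hubble's real optics.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at +/-3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur standing in for a telescope PSF (an assumption).

    Assumes each image side is longer than the kernel, so mode="same"
    keeps the original shape.
    """
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def rebin(image: np.ndarray, factor: int) -> np.ndarray:
    """Sum factor x factor blocks of pixels to mimic coarser detector pixels."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor  # drop edge pixels that don't fill a block
    trimmed = image[:h, :w]
    return trimmed.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))


def mock_observe(image: np.ndarray, psf_sigma: float = 2.0, factor: int = 4,
                 noise_std: float = 0.01, seed: int = 0) -> np.ndarray:
    """Hypothetical mock-observation step: PSF blur -> rebin -> additive noise."""
    rng = np.random.default_rng(seed)
    degraded = rebin(blur(image, psf_sigma), factor)
    return degraded + rng.normal(0.0, noise_std, size=degraded.shape)


# Usage: a single bright pixel spreads into a blurred, lower-resolution blob.
point = np.zeros((128, 128))
point[64, 64] = 1.0
print(mock_observe(point).shape)  # (32, 32)
```

Because the kernel is normalized and rebinning sums pixels, total flux from a source away from the image edges is preserved before noise is added.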
“We were not expecting it to be all that successful. I’m amazed at how powerful this is,” said coauthor Joel Primack, professor emeritus of physics and a member of the Santa Cruz Institute for Particle Physics (SCIPP) at UC Santa Cruz. “We know the simulations have limitations, so we don’t want to make too strong a claim. But we don’t think this is just a lucky fluke.”
Read the full post by Tim Stephens.
Synthetic Training Data Used for Retail Merchandising Audit System
In this example created by Deep Vision Data, a deep learning model based on the ResNet101 architecture was trained to classify product SKUs, stock-outs, and mis-merchandised products for a retail store merchandising audit system. The model was trained on 20,000 synthetic product images, using a 50-50 split between structured and unstructured domain-randomized subsets and an 80-20 training/validation split. Validation was likewise performed entirely on synthetic data. The test set consisted of real photos; a sample of the labeled result images is shown to the right.
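The split sizes above work out as follows: a 50-50 split of 20,000 images gives 10,000 structured and 10,000 unstructured images, and an 80-20 split then gives 16,000 training and 4,000 validation images. The sketch below assumes the 80-20 split was applied separately within each subset (a stratified split), which the description does not state; the file names and `stratified_split` are hypothetical. The real-photo test set would be held out entirely apart from these lists.

```python
import random


def stratified_split(subsets: dict[str, list[str]], train_frac: float = 0.8,
                     seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle each subset and split it train_frac / (1 - train_frac).

    Splitting per subset keeps the structured/unstructured ratio the same
    in training and validation (an assumption, not a documented detail).
    """
    rng = random.Random(seed)
    train: list[str] = []
    val: list[str] = []
    for items in subsets.values():
        items = list(items)
        rng.shuffle(items)
        cut = int(round(len(items) * train_frac))
        train += items[:cut]
        val += items[cut:]
    return train, val


TOTAL = 20_000
half = TOTAL // 2  # 50-50 structured / unstructured
subsets = {
    "structured": [f"structured_{i:05d}.png" for i in range(half)],
    "unstructured": [f"unstructured_{i:05d}.png" for i in range(TOTAL - half)],
}
train, val = stratified_split(subsets)
print(len(train), len(val))  # 16000 4000
```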
Domain randomization (DR) is a powerful tool available with synthetic data: it creates data variability that encompasses both expected and unexpected real-world input, forcing the model to focus on the features most important to the task. DR is far more costly and difficult to implement with physical data. For example, a dataset of thousands of products, with each product shown in thousands of poses on dozens of backgrounds, requires many millions of labeled images. Such a dataset is easily created synthetically but virtually impossible to create from physical product photos.
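To make the scale concrete, take 1,000 products, 1,000 poses, and 24 backgrounds as illustrative stand-ins for "thousands" and "dozens": covering every combination takes 1,000 × 1,000 × 24 = 24,000,000 images. A synthetic pipeline instead samples randomized scene parameters and renders each one, and the label comes for free because the scene specifies which product is present. The sketch below shows only the sampling step; `SceneParams`, `sample_scene`, the chosen factors, and their uniform ranges are all hypothetical, and the renderer itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical parameters for one randomized render of one product."""
    product_id: int          # which product (SKU) appears; doubles as the class label
    yaw_deg: float           # rotation about the vertical axis
    pitch_deg: float         # camera elevation angle
    background_id: int       # index into a library of background images
    light_intensity: float   # relative brightness multiplier
    camera_distance_m: float


def sample_scene(rng: random.Random, n_products: int, n_backgrounds: int) -> SceneParams:
    """Draw each nuisance factor independently and uniformly (an assumed DR policy)."""
    return SceneParams(
        product_id=rng.randrange(n_products),
        yaw_deg=rng.uniform(0.0, 360.0),
        pitch_deg=rng.uniform(-30.0, 60.0),
        background_id=rng.randrange(n_backgrounds),
        light_intensity=rng.uniform(0.2, 2.0),
        camera_distance_m=rng.uniform(0.5, 3.0),
    )


def exhaustive_image_count(n_products: int, n_poses: int, n_backgrounds: int) -> int:
    """Images needed to show every product in every pose on every background."""
    return n_products * n_poses * n_backgrounds


rng = random.Random(7)
scenes = [sample_scene(rng, n_products=1_000, n_backgrounds=24) for _ in range(5)]
print(exhaustive_image_count(1_000, 1_000, 24))  # 24000000
```

Random sampling lets a training set cover the variation space without enumerating the full grid, which is what makes DR practical at this scale.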
Synthetic training data can be used for almost any machine learning application, either to augment a physical dataset or to replace it completely. With effective domain randomization, the model treats real-world data as just another variation within the randomized data, so synthetic and physical inputs become effectively indistinguishable to it. Synthetic training data is inherently less costly and faster to create, is perfectly annotated, and isn't constrained by availability, time, or even the physics of the natural world.
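The choice between augmenting and replacing a physical dataset can be expressed as a single mixing ratio. The sketch below is a hypothetical helper, not Deep Vision Data's method: `synthetic_fraction=1.0` replaces physical data entirely, while smaller values blend synthetic samples into a physical dataset at a fixed proportion.

```python
import random


def build_training_set(synthetic: list[str], physical: list[str],
                       synthetic_fraction: float, size: int,
                       seed: int = 0) -> list[str]:
    """Hypothetical helper: draw `size` samples with the given synthetic share.

    synthetic_fraction == 1.0 -> synthetic data replaces physical data.
    0.0 < synthetic_fraction < 1.0 -> synthetic data augments physical data.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_syn = round(size * synthetic_fraction)
    n_phys = size - n_syn
    if n_syn > len(synthetic) or n_phys > len(physical):
        raise ValueError("not enough samples for the requested mix")
    # Sample without replacement from each pool, then interleave randomly.
    mixed = rng.sample(synthetic, n_syn) + rng.sample(physical, n_phys)
    rng.shuffle(mixed)
    return mixed


syn = [f"syn_{i}" for i in range(2000)]
phys = [f"phys_{i}" for i in range(500)]
augmented = build_training_set(syn, phys, synthetic_fraction=0.75, size=1000)
replaced = build_training_set(syn, phys, synthetic_fraction=1.0, size=1000)
```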
Our Vertical Markets
Deep Vision Data is the industry leader in synthetic training data for a range of markets.