Synthetic Training Data

Deep Vision Data® specializes in creating synthetic training data for supervised and unsupervised training of machine learning systems such as deep neural networks, and in developing XR environments as reinforcement and imitation learning platforms. A lack of machine learning datasets is often cited as the major obstacle to developing deep learning systems, and creating and labeling sufficient data from physical testing and other non-algorithmic methods such as photography can be extremely time-consuming or impossible. The problem is compounded when the product or process being studied is still under development and no physical data exist. Synthetic training data also mitigate privacy concerns associated with medical data and other private information. Learn more about synthetic data at Wikipedia.

Our synthetic training data are created using a variety of proprietary methods, can be multi-class, and can be developed for both regression and classification problems. Labeling and semantic segmentation are automatic and 100% accurate. Our machine learning datasets are delivered in a database and labeling schema designed for your requirements. Contact us to discuss your particular machine learning data needs.

In the News

Synthetic Training Data Used to Find Galaxies in Different Stages of Formation

In this study, accepted for publication in the Astrophysical Journal and available online, researchers used computer simulations of galaxy formation to create synthetic training data for a deep learning algorithm, which then proved surprisingly good at analyzing images of galaxies from the Hubble Space Telescope.

In the image shown, the top row shows high-resolution renderings from a state-of-the-art galaxy simulation program (VELA). The second row shows the same images, modified to more closely resemble what the Hubble optical systems would observe; these images were used to train the neural network. The bottom row shows Hubble galaxy photos that were correctly classified by the deep learning system the researchers developed.

“We were not expecting it to be all that successful. I’m amazed at how powerful this is,” said coauthor Joel Primack, professor emeritus of physics and a member of the Santa Cruz Institute for Particle Physics (SCIPP) at UC Santa Cruz. “We know the simulations have limitations, so we don’t want to make too strong a claim. But we don’t think this is just a lucky fluke.”

Read the entire post by author Tim Stephens here.

Synthetic Training Data Used for Retail Merchandising Audit System

In this example created by Deep Vision Data, a deep learning model based on the ResNet101 architecture was trained to classify product SKUs, stock-outs and mis-merchandised products for a retail store merchandising audit system. The model was trained with 20,000 synthetic product images using a 50-50 split of structured and unstructured domain-randomized subsets and an 80-20 training/validation data split. Validation was also performed entirely with synthetic data. The test set consisted of actual photos; a sample of labeled result images is shown.
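
The data partitioning described above can be sketched as follows. This is a minimal illustration only, not Deep Vision Data's pipeline: the record layout, the function name `split_synthetic_dataset`, and the choice to apply the 80-20 split separately within each domain-randomization subset (a stratified split) are assumptions made for the example.

```python
import random

def split_synthetic_dataset(n_images=20_000, structured_fraction=0.5,
                            train_fraction=0.8, seed=0):
    """Return (train, val) lists of image records (hypothetical layout)."""
    rng = random.Random(seed)
    n_structured = round(n_images * structured_fraction)
    # Tag each synthetic image with the DR subset it belongs to.
    records = [
        {"image_id": i,
         "dr_subset": "structured" if i < n_structured else "unstructured"}
        for i in range(n_images)
    ]
    train, val = [], []
    # Assumption: split each subset separately so both splits keep the 50-50 mix.
    for subset in ("structured", "unstructured"):
        group = [r for r in records if r["dr_subset"] == subset]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val

train, val = split_synthetic_dataset()
print(len(train), len(val))
```

With the default arguments this yields 16,000 training records and 4,000 validation records, each half structured and half unstructured. The real photos used for testing would be kept entirely outside this split.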

Domain randomization (DR) is a powerful tool available with synthetic data; it enables the creation of data variability that encompasses both expected and unexpected real-world input, forcing the model to focus on the features that matter most for the problem. DR is much more costly and difficult to implement with physical data. For example, a dataset of thousands of products, each shown in thousands of poses on dozens of backgrounds, requires many millions of labeled images. Such a dataset is easily created synthetically but virtually impossible to create from physical product photos.
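
To make the scale concrete, the sketch below counts the images in such a dataset and draws one randomized scene description. The specific counts (1,000 products, 1,000 poses, 24 backgrounds), the `SceneParams` fields, and the sampling ranges are illustrative assumptions, not values from an actual project.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneParams:
    """Hypothetical parameters that a renderer would randomize per image."""
    product_id: int
    yaw_deg: float
    pitch_deg: float
    light_intensity: float
    background_id: int

def dataset_size(n_products, poses_per_product, n_backgrounds):
    # Every product is rendered in every pose on every background.
    return n_products * poses_per_product * n_backgrounds

def sample_scene(rng, n_products, n_backgrounds):
    # Draw each factor independently; the ranges are assumed for illustration.
    return SceneParams(
        product_id=rng.randrange(n_products),
        yaw_deg=rng.uniform(0.0, 360.0),
        pitch_deg=rng.uniform(-90.0, 90.0),
        light_intensity=rng.uniform(0.2, 1.5),
        background_id=rng.randrange(n_backgrounds),
    )

print(dataset_size(1_000, 1_000, 24))  # 24000000
print(sample_scene(random.Random(0), 1_000, 24))
```

Because each image is defined by its parameters, the label (here `product_id`) is known before the image is rendered, which is what makes annotation automatic.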

Synthetic training data can be used for almost any machine learning application, either to augment a physical dataset or to replace it completely. When domain randomization is used effectively, the model treats the difference between synthetic and physical data as just another randomized variation, so the two become indistinguishable to it. Synthetic training data are inherently less costly, faster to create and perfectly annotated, and they are not constrained by availability, time or even the physics of the natural world.

Product Images

Synthetic product images are 100% virtual and can include variability in pose, lighting, material finish and many other factors. Portions of the images can be occluded to simulate handling or other situations, and the renderings can be photorealistic, grey-scale or silhouette. Need a million images in a few days? No problem!

Hybrid Images

Need thousands of training images of your product presented in a specific physical environment, and need them fast? Mobile devices running augmented reality (AR) apps can be used to quickly create hybrid images of virtual objects in physical environments. This approach is useful when the environment contains important class information, or for domain randomization.

People and Environments

Synthetic environment images can be interior or exterior and include people, plants/trees and any inanimate object. They are used to train systems that monitor safety, compliance, stocking, manufacturing process deviations, traffic incidents, human behavior and many other conditions. Scenes are database-driven to enable rapid configuration variation.

Real-Time XR Environments

Real-time simulators use game engines and custom software to replicate physical products, systems and processes. These Digital Twins are implemented in Extended Reality (XR) environments to create fully immersive training experiences. Multiple simultaneous instances of the real-time simulation environment can be run to speed up the training process.
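
The idea of running several simulator instances at once can be sketched with a toy example. Everything here is a stand-in: `run_episode` is a hypothetical one-dimensional simulator driven by a random policy, not a game-engine environment, and a thread pool is used only to show the collection pattern. A real deployment would more likely launch each simulator instance as a separate process or on a separate machine.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_episode(seed, target=3, max_steps=50):
    """Run one episode in a toy 1-D simulator with a random policy."""
    rng = random.Random(seed)  # each instance gets its own seeded generator
    position, transitions = 0, []
    for _ in range(max_steps):
        action = rng.choice((-1, 1))
        next_position = position + action
        done = next_position == target
        reward = 1.0 if done else -0.01
        transitions.append((position, action, reward, next_position, done))
        position = next_position
        if done:
            break
    return transitions

def collect_rollouts(seeds, max_workers=4):
    # Each seed stands for one independent simulator instance.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_episode, seeds))

rollouts = collect_rollouts(range(8))
print(sum(len(r) for r in rollouts), "transitions from", len(rollouts), "instances")
```

The learner consumes the pooled transitions, so the amount of experience gathered per unit of wall-clock time grows with the number of instances that can truly run in parallel.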

Modeling and Simulation

Modeling and simulation are used to explore how product life cycle, performance and other characteristics respond to variation in component dimensions, production quality, process parameters and other real-world variances. These data can be used to create intelligent systems that guide users toward product or system designs optimized for cost, performance or weight.
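
As a rough illustration of generating training data from dimensional variation, the sketch below runs a simple Monte Carlo tolerance study. The assembly (two parts stacked inside a housing), the nominal dimensions, and the standard deviations are all invented for the example; a real study would use a validated simulation model in place of the one-line gap calculation.

```python
import random

def simulate_assemblies(n_samples=1000, seed=0):
    """Sample part dimensions and record the resulting clearance gap."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        # Assumed nominal sizes (mm) with normally distributed variation.
        housing = rng.gauss(50.00, 0.05)
        part_a = rng.gauss(20.00, 0.03)
        part_b = rng.gauss(29.90, 0.03)
        gap = housing - (part_a + part_b)
        rows.append({
            "housing_mm": housing,
            "part_a_mm": part_a,
            "part_b_mm": part_b,
            "gap_mm": gap,      # regression target
            "fits": gap > 0.0,  # classification label
        })
    return rows

rows = simulate_assemblies()
print(sum(r["fits"] for r in rows), "of", len(rows), "simulated assemblies fit")
```

Each row pairs input dimensions with a computed outcome, so the same table can serve as a regression dataset (predict `gap_mm`) or a classification dataset (predict `fits`).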

Design Variation

Computer-aided design (CAD) tools are used to algorithmically vary component dimensions or assembly parameters. The resulting data can be utilized to train intelligent production vision systems to identify manufacturing or quality issues that are too complex for traditional in-line processes. Coupled with industrial 3D scanning, machine learning creates the next evolution of part inspection.

Our Vertical Markets

Deep Vision Data is the industry leader in synthetic training data for a range of markets

Our Expertise

  • Multi-disciplinary expertise in design, engineering, modeling and simulation, software/app/simulator development, digital art and industrial 3D scanning
  • Proprietary software infrastructure for data creation, labeling and warehousing
  • Ability to rapidly create product 3D models from physical samples or digital data
  • Expertise with all major computer-aided design (CAD) and digital modeling software
  • Expertise with all major modeling and simulation software systems
  • Ability to create labeled training, validation and testing data sets for neural networks

Contact Us