The World is Increasingly Turning to Synthetic Data

2022-05-06 18:32:21 By: Mr. Blank Cai

Rendered.ai provides a platform as a service (PaaS) that lets data scientists and computer vision (CV) engineers build large, configurable synthetic CV datasets at scale in the cloud for training artificial intelligence (AI) and machine learning (ML) systems.

The company recently announced two significant partnerships to expand its capabilities and scope: Esri will provide data for training algorithms, and the Rochester Institute of Technology's Digital Imaging and Remote Sensing (DIRS) Laboratory will provide expanded, high-quality synthetic imagery.

Pools of unbiased data can be hard, and expensive, to find, yet they are required to train AI and ML algorithms. By drawing on Esri data, the Rendered.ai Platform makes the engineered data it adds to real datasets more accurate and more powerful. This can reduce bias, support innovation, and improve model accuracy.

The DIRS Laboratory's DIRSIG (Digital Imaging and Remote Sensing Image Generation) model can produce passive single-band, multi-spectral, or hyper-spectral imagery in both visible and non-visible wavelengths. The model also has a very mature active laser (LIDAR) capability and an evolving active radio-frequency (RADAR) capability. Together, the DIRS Laboratory and Rendered.ai provide a cloud-based platform for high-volume synthetic data generation.

After successful Earth Observation (EO) projects with customers such as the National Geospatial-Intelligence Agency, Rendered.ai intends to deepen its capabilities and technology for synthetic satellite and aerial training data, using simulation that ranges from visible-light imagery to Synthetic Aperture Radar (SAR).

At GEOINT 22 in Denver (April 24-27), Geospatial World caught up with Rendered.ai's CEO, Nathan Kundtz, to hear about the company's latest partnerships:

There's so much data being generated from geospatial sources now that humans simply can't analyze it all. So going from data to insight has become a huge bottleneck, and really, the only viable way to address that is with artificial intelligence algorithms. It turns out that the behavior and performance of those algorithms depend entirely on your ability to feed them not just data, but annotated, effective data for training.

That's where the shortage comes in, and it isn't because there's a lack of imagery out there. It's because imagery is expensive to annotate, it's rarely annotated fully or correctly, and much of what you actually need to find is quite rare.

The world is increasingly turning to what's called synthetic data to close that gap. With synthetic data, we're essentially using physics-based simulations to ask, "What would it look like if…" You can do that across many types of imaging, but we found that it's really hard to do well.

Just as having a good camera does not necessarily make a good film, having a good simulator is not enough to create useful datasets. So we started breaking down the steps required to make synthetic data effective at scale, and iterable, so that it could really be integrated into an engineering process.

Those steps might involve large libraries of 3D models and procedural generation tools. Data aggregation, data provenance, librarianship of those datasets, quality assessment, domain adaptation and, ultimately, pipelining into AI are all parts of the process, and no one had built a platform that addressed all of those elements.

Rendered.ai has built that platform, and we are now layering capabilities on top of it. That brings us to the partnerships.

Esri has a tremendous amount of real-world data in GIS layers. Our partnership with them allows us to ingest that data onto the platform and then use it to produce synthetic datasets, which you can then set up to train algorithms.

As for DIRSIG, they have some of the best, very high-fidelity multi-spectral and hyper-spectral simulation tools on the planet. The partnership allows us to start integrating their imagery into the platform and to use it to generate, we hope, more accurate simulations.

Think about the types of things you can do with Esri's layers. Let's imagine that you're trying to detect pipeline leaks, say natural gas leaks. Well, there's an Esri GIS layer showing where all the pipelines are.

So you could take that real data and then synthetically introduce what it would look like if there were a fire.

From a DIRSIG standpoint, what they bring is largely simulation fidelity, with a long history of validating the accuracy of their output. We can use those real-world environments with real information, then layer on the variations that we want to detect.

For us, those are two important partnerships, and they're also some of our first partnerships bringing third-party content onto the platform. It's been exciting here, because a lot of what we hear is, "Oh, so can I put my content there and start using this as a platform to bring it to the world of synthetic data?" The answer is yes, and we're here to help make that happen. If you go around the booths here, you'll find a lot of companies with tremendous sensor technology and tremendous simulation technology, and you'll find a number of other companies with a huge need for synthetic data and data generation.

So part of what we're doing is matchmaking, and also providing the fabric to make it work.

There is a lot of simulation capability throughout this sector, and there is massive demand for synthetic data generation. For the two to work together, there needs to be a platform that respects the capabilities of each and also allows you to engineer and iterate your way to value.

Our partnerships with DIRSIG and Esri are a great example of how we can leverage existing technologies and capabilities that have been built up over decades to address the new problem of synthetic data production.

This is the first time that we're announcing third-party content coming onto the platform. We have built a number of our own simulators on the platform, and we do use those, but the platform was designed to bring all sorts of people together on it. Now we're really starting to see that vision take shape.

© Geospatial Media and Communications. All Rights Reserved.