Free the Data!

Development of AI depends heavily on access to deep reservoirs of clean, structured data from which to learn. The giant firms that have it aren't sharing.

By Chad Steelberg


Opinions expressed by Entrepreneur contributors are their own.

"What good are wings without the courage to fly?" These words of wisdom come to mind as I consider the open-source craze among leading artificial-intelligence technology providers.

Top firms, including IBM, Google and Facebook, have opened the source code of their artificial intelligence software tools, making them available for developers to use in their own devices and applications. This is most certainly a good thing, for the companies themselves and for the AI business generally.

However, open source is only part of the equation. Unlike previous generations of software, AI algorithms are worthless without a dataset to work on. And in contrast to their open-source code policies, these companies maintain a closed-data stance, hoarding their vast information repositories as a competitive advantage for developing better AI technology.

Essentially these companies have given us wings -- but have denied us the sky. What the top tech firms need is the courage to stop hoarding information and embrace open data, giving the rest of the world access to the information required for AI cognitive engines to attain their full potential.

The data-rich get richer.

In the age of AI, a new 1 percent is arising. This upper, upper crust consists of companies blessed both with machine-learning technology and with large quantities of information.

A handful of companies, including Google, Facebook, Amazon and Microsoft, have been dubbed "the Superrich" of the AI business. Though there are very few of them, they reportedly hold a massive advantage over everyone else in machine learning because they have access to vast amounts of clean, structured data.

Such data is needed to train machine-learning algorithms, giving them the basic information they need to function on their own in the real world. For example, an object-recognition algorithm designed to recognize cats in photos will be trained by reviewing massive numbers of images depicting felines. These images need to have some structure, i.e., they must be tagged with keywords that properly indicate they are depicting cats.
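To make "structured" training data concrete, here is a minimal Python sketch, assuming each image simply carries a set of keyword tags added by a human annotator. The `TaggedImage` class, the file names and the `to_training_pairs` helper are hypothetical; real labeling pipelines are far more involved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaggedImage:
    path: str        # hypothetical file location
    tags: set[str]   # keywords an annotator attached to the image


def to_training_pairs(images: list[TaggedImage], target: str = "cat") -> list[tuple[str, bool]]:
    """Turn tagged images into (path, label) pairs for a yes/no classifier."""
    pairs = []
    for img in images:
        if not img.tags:
            # An untagged image carries no label, so it cannot be used for supervised training.
            continue
        pairs.append((img.path, target in img.tags))
    return pairs


dataset = [
    TaggedImage("img_001.jpg", {"cat", "sofa"}),
    TaggedImage("img_002.jpg", {"dog", "park"}),
    TaggedImage("img_003.jpg", set()),  # raw, unstructured: no tags at all
]

print(to_training_pairs(dataset))
# prints: [('img_001.jpg', True), ('img_002.jpg', False)]
```

The untagged third image is simply dropped: without structure, it contributes nothing to training.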

In general, the more training data an algorithm has, the better it performs, because more examples give it more material from which to find patterns. Conversely, inadequate training data can produce algorithms that deliver substandard results, sometimes to the extreme embarrassment of their creators.
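The link between data volume and accuracy can be shown with a deliberately tiny experiment. The sketch below swaps in a 1-nearest-neighbor classifier over a single made-up numeric feature, with a hypothetical rule standing in for "is this a cat?". It is not how production image recognizers work, and extra data does not always help this neatly, but it shows the same algorithm improving as it sees more labeled examples.

```python
from __future__ import annotations


# Hypothetical ground truth: an item counts as a "cat" if its one feature is >= 30.
def true_label(x: int) -> bool:
    return x >= 30


def predict(train: list[tuple[int, bool]], x: int) -> bool:
    """1-nearest-neighbor: copy the label of the closest training example.
    On a tie, min() keeps the first example in the list."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def accuracy(n_train: int) -> float:
    """Train on n_train evenly spaced labeled points in 0..99, then test on every
    point in 0..99. Assumes n_train divides 100."""
    step = 100 // n_train
    train = [(x, true_label(x)) for x in range(0, 100, step)]
    test = range(100)
    correct = sum(predict(train, x) == true_label(x) for x in test)
    return correct / len(test)


for n in (2, 20, 100):
    print(n, accuracy(n))
# prints:
# 2 0.96
# 20 0.98
# 100 1.0
```

With only two examples, the model's decision boundary sits at 25, so it mislabels 26 through 29; with 20 examples the boundary moves to 27.5, and with every point labeled it matches the true rule exactly.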

Because of this, the usefulness of an AI algorithm is intrinsically tied to the availability of high-quality data. In this regard, AI algorithms are fundamentally different from other types of software, whose code is valuable on its own without any additional data.

Thus, when a company open-sources an AI cognitive engine such as a translation tool, it's not the same as open-sourcing a piece of traditional software, like a spreadsheet. Without also providing access to the data, open isn't really open.


Such data-denial is no accident. Rather, it's part of a deliberate strategy to maintain a competitive advantage. With AI models well known and well distributed, the data set is the one commodity that can be locked away and kept from rivals.

That's why top technology players are hoarding data. For example, IBM didn't buy The Weather Channel's data operations because it wanted to know if it's going to rain in Tallahassee tomorrow.

Weather is widely regarded as one of the biggest factors affecting global GDP. By combining The Weather Channel's vast repository of climate-related information with its Watson AI, IBM can take the lead in forecasting the weather for private businesses, allowing it to do everything from predicting winter energy demand to forecasting crop yields.

This gives IBM a huge market impact and a built-in advantage that will be hard for other companies to match.

Google, Facebook and others hold similar advantages in their respective areas, possessing vast quantities of consumer and social-media data that can be used to train AI for highly valuable tasks, from sentiment analysis for marketing to object recognition in photos to natural-language processing for user interfaces.

Open season.

Examples of open AI software tools offered by technology powerhouses include:

  • Google's TensorFlow, which is designed for building and training neural networks.

  • Microsoft's Computational Network Toolkit, which can be used for applications including machine translation, image recognition, image captioning, text processing, language understanding and language modeling.

  • IBM's SystemML, which can be used to create customized machine learning software.

  • Facebook's deep-learning technologies, which the company has donated to an open-source software project known as Torch.

With such initiatives, these companies are essentially giving away software that's the product of enormous investments in manpower and intellectual property. However, these efforts are far from altruistic; by proliferating their technologies, companies aim to build large communities of developers accustomed to using their tools, establishing them as standards in the AI market.

Furthermore, with the real value of AI locked up in their proprietary data, these companies have little to lose by giving away their software tools.

Data Dump.

So how can companies be convinced to give up their prized data for the greater good of the AI business? One example can be found in an Uber initiative called Movement, which opens up the urban traffic data the company collects from trips on its platform. Via the Uber Movement website, city planners can gather information to help improve traffic conditions.

What's in it for Uber? The company doesn't plan or build roads itself, so sharing this information lets planners make changes that improve driving conditions. That, in turn, improves the experience for Uber riders and drivers.

For AI tech companies with large treasure troves of data, there may be other opportunities to open up access to information in order to stimulate broad societal benefits. These benefits could indirectly boost demand for their technologies.

The AI market is ready to take wing -- now all the big players need to do is give clearance for takeoff by having the courage to open up their data.

Chad Steelberg

CEO and chairman of Veritone

Chad Steelberg is an entrepreneur who co-founded several successful internet software companies. He serves as chairman and CEO of Veritone.
