Compute for AI Models Is Overrated - Data Quality Is the Real Bottleneck
You're reading Entrepreneur Georgia, an international franchise of Entrepreneur Media.
There is a pattern that plays out in tech companies more often than anyone publicly admits. A team spends months - sometimes over a year - building an AI system. Engineers are hired, infrastructure is set up, a model is carefully chosen and tested. Everything looks promising. Then the system goes live, and something is quietly, persistently off. The results do not hold up the way they did in testing, and the decisions the system produces start looking unreliable. The team goes back to the model, adjusts it, retrains it, tries a different approach - and the whole cycle starts over again.
The actual problem, in many of these cases, has nothing to do with the model. It turns out the data the system was trained on had mislabeled entries, inconsistent information pulled from mismatched sources, and records that were months out of date. Nobody had thought to look there first.
This is the part of AI that rarely makes the news. Not the model, not the hardware, but the work that happens before all of that: deciding which data gets collected, from where, cleaned how, structured in what format, and by whose standards.
Every founder dreams of building a game-changing AI product, but the reality behind the scenes is usually chaotic and incredibly expensive. We have seen the same story play out across dozens of industries: companies burn through massive budgets on the latest tech and hardware, only to watch the whole project stall out before it ever delivers results.
The breakdown almost never happens because of the software itself - it happens because the underlying data is all over the place.
The question is not whether something is broken - it usually is. The harder question is why nobody catches it before it becomes expensive. To get to the bottom of it, we spoke to the experts at Datamam. As a data infrastructure company, they operate at a critical layer of the tech world that rarely gets the spotlight, handling the heavy lifting that has to happen before anything else: collecting the right data, ensuring its accuracy, and structuring it so it is actually usable.
As entrepreneurs trying to navigate this landscape, we asked them to cut through the hype and tell us why companies keep missing the mark, and what it actually takes to fix it.
The Myth That More Compute Solves Everything
The model gets the attention because it is the tangible, demonstrable part of an AI system, but by the time anyone is debating which model to use, the most consequential decisions have already been made - what data went in, where it came from, and whether any of it was actually reliable.
"One of the biggest myths in AI right now is that good results come from good models", - says Sandro Shubladze, Founder & CEO of Datamam. "In fact, they originate much further upstream - in the sourcing, the structuring, the labeling, the metadata, and the timing. Obviously, compute is important. But compute is something you can always buy. A trusted picture of reality is not.
"Only a few years ago, it was enough that the data existed; now it needs to be recent, well-organized, traceable, and usable in production systems. One reason is that model performance is evolving much more quickly than originally expected. For example, Stanford University's 2025 AI Index reports that the gap between the top-ranked and 10th-ranked models on Chatbot Arena narrowed from 11.9% to 5.4%, and the gap between open and closed models narrowed from 8.04% to 1.70%. The gap is closing. As models get closer, the differentiator moves towards data quality, integration, and governance".
Think of it the way you would think about building a house. You can hire the best architect, the most experienced crew, use the finest materials for the walls and the roof - but if the foundation was poured wrong, none of that saves you. The model is the roof everyone admires. The data is the foundation nobody sees until it cracks.
As Datamam puts it, the fundamental issue lies in the data layer - in quality, consistency, origin, and the way those things get measured and maintained. Models have gained visibility and commercial attention because there is much to discuss and market around them, but the underlying reality of bad labeling, poor metadata management, and difficult-to-track data pipelines affects everything beneath the surface. A widely cited academic paper examined ten of the most commonly used AI benchmark datasets and found that label errors averaged at least 3.3% across them, with at least 6% of labels incorrect in the ImageNet validation set alone. Even more importantly, the authors demonstrated that with enough labeling noise, smaller networks can outperform larger ones - which means the problem is not just cosmetic. It changes the outcome.
"The bulk of the work actually comes beforehand. The architecture is what you see, but the difficult part is typically determining which aspects of the world can be observed by the model and in what format. This is why much of the groundbreaking work happening today isn't necessarily focused on designing new architectures - rather, it is concerned with how the data is preprocessed. While there may be an inclination to consider inference (running the model) to be the exciting part of the process, the critical decisions are actually made well in advance", - explains the respondent.
When the Problem Is Already Years Old Before Anyone Notices
Usually, things go wrong in the places nobody thinks to look until something has already broken.
Amazon built a recruiting tool that was meant to make hiring smarter and faster. They eventually had to shut it down entirely - not because the model was poorly designed, but because it had been trained on a decade of the company's own hiring history, which heavily reflected a male-dominated industry. The model did not malfunction. It learned exactly what it was taught. The problem had been baked in years before the tool ever launched, and it took real-world use before anyone fully traced where it had come from.
That is the uncomfortable reality about data problems: they tend to be invisible right up until they are not. "Whenever performance fails, people are quick to point fingers at the model because that is the most obvious culprit. In my experience, however, the issue is rarely the model and far more often the data - for example, outdated data, inconsistent labels, duplicated entries, inadequate metadata, or misaligned data sources. In most cases, the issue with the data outweighs the issue with the model", - Sandro Shubladze notes. "The NeurIPS 2021 paper Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks makes this clear. It identified an average of at least 3.3% label errors across 10 major datasets, including at least 6% in the ImageNet validation set, and showed that with enough label noise, a smaller model can actually outperform a larger one".
Part of why this keeps going undetected is that testing environments are almost always tidier than real life. When a team builds and tests something internally, they work with a carefully prepared sample. Then the system goes live, where the data is messier, comes from more sources, and moves faster than anything in the controlled environment. By the time the gap becomes visible, the natural instinct is to fix the thing most recently built - which is usually the model - rather than go digging through a data pipeline that a dozen people touched over two years.
"Normally, the problem occurs at the interfaces: between systems, teams, definitions, or time states. For example, System X defines an item, consumer, or event differently from System Y, and no one notices until the decisions start to contradict each other. These problems go undetected because pre-production tests are always conducted on smaller and more controlled samples than anything seen in production", - Entrepreneur was told.
Three Departments, Three Different Answers - and Nobody Knows Which One Is Right
For a company that is not building AI research tools - just trying to use data to make better day-to-day decisions - the problem looks ordinary, which is exactly why it is so easy to miss.
Three departments in the same company are each tracking customers, but each team built their systems slightly differently, so "customer" means something subtly different depending on which report you are reading. The CRM gives one number, the finance system gives another, the analytics dashboard gives a third. Every meeting where those figures come up turns into an argument about whose number is right, and the actual decision either gets delayed or gets made on the wrong information.
Or a pricing team is making calls based on competitor data that is two months old, because someone is manually downloading reports and copying numbers into a spreadsheet every few weeks. By the time that information reaches the person making the decision, the market has already shifted.
"In such an environment, business decisions are based on subjective impressions - 'what we remember' or 'what we saw last' - rather than the full picture", - the CEO of Datamam explains. "This directly impacts pricing strategy, risk management, and competitive positioning".
None of this feels catastrophic in the moment. It creates a slow, steady drag - the kind of inefficiency that does not show up in a single bad quarter but quietly shapes how a company performs over years.
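The three-departments scenario can be made concrete with a small sketch. It counts "customers" under three different definitions and lists the IDs the systems disagree on. The records, the field names, and all three definitions are hypothetical, invented purely for illustration.

```python
# Hypothetical account records; none of these fields come from a real system.
accounts = [
    {"id": "c1", "signed_up": True, "paid_invoices": 3, "active_last_30d": True},
    {"id": "c2", "signed_up": True, "paid_invoices": 0, "active_last_30d": True},
    {"id": "c3", "signed_up": True, "paid_invoices": 1, "active_last_30d": True},
    {"id": "c4", "signed_up": True, "paid_invoices": 0, "active_last_30d": False},
]

# Three assumed definitions of "customer", one per system.
DEFINITIONS = {
    "crm": lambda a: a["signed_up"],
    "finance": lambda a: a["paid_invoices"] > 0,
    "analytics": lambda a: a["active_last_30d"],
}


def customers_by_system(records):
    """Return the set of IDs each system would count as a customer."""
    return {
        name: {r["id"] for r in records if rule(r)}
        for name, rule in DEFINITIONS.items()
    }


def disputed_ids(sets_by_system):
    """IDs counted by at least one system but not by all of them."""
    all_sets = list(sets_by_system.values())
    return sorted(set.union(*all_sets) - set.intersection(*all_sets))


by_system = customers_by_system(accounts)
counts = {name: len(ids) for name, ids in by_system.items()}
disputed = disputed_ids(by_system)
print(counts)
print(disputed)
```

Listing the disputed IDs, rather than only the three totals, turns the argument over whose number is right into a question about which definition the decision actually needs.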
"When a company approaches us with an AI performance issue, we always start with the data, not the model. First, we want to know everything about the input: its source, freshness, structure, labeling, and metadata, and whether the production environment is receiving the same kind of data the model was trained on. Next, we look at normalization, enrichment, timing, and delivery. It is only then that we consider the model itself", - the experts at Datamam say.
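One common way to check whether production is receiving the same kind of data the model was trained on is to compare the distribution of a field at training time with its distribution in production. The sketch below uses the population stability index (PSI) on a categorical field. It is not Datamam's method; the field, the sample values, and the 0.25 cutoff are assumptions chosen for illustration.

```python
import math
from collections import Counter


def category_shares(values, categories, eps=1e-6):
    """Share of each category, floored at eps so the logarithm stays defined."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, eps) for c in categories}


def population_stability_index(train_values, prod_values):
    """PSI = sum of (q - p) * ln(q / p), with p from training and q from production."""
    categories = set(train_values) | set(prod_values)
    p = category_shares(train_values, categories)
    q = category_shares(prod_values, categories)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in categories)


# Hypothetical "channel" field as seen at training time and in production.
train_channels = ["web"] * 80 + ["store"] * 20
prod_channels = ["web"] * 40 + ["store"] * 40 + ["app"] * 20

psi_same = population_stability_index(train_channels, train_channels)
psi_shifted = population_stability_index(train_channels, prod_channels)
DRIFT_THRESHOLD = 0.25  # rule-of-thumb cutoff, an assumption rather than a standard
print(psi_same, psi_shifted, psi_shifted > DRIFT_THRESHOLD)
```

A category that never appeared in training, such as "app" here, dominates the score, which is usually the desired behavior: the model has never seen that kind of input.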
What Changes When a Law Firm Stops Reading Last Month's News
The difference that proper data infrastructure makes becomes most visible in cases where the before and after are concrete.
One large international law firm needed to track competitor activity - which firms were involved in which deals, and when. The way they had been doing it involved people manually scanning websites and compiling information by hand, which meant everything they knew was already old by the time it reached someone who could act on it. Datamam built a system that pulled from dozens of live sources simultaneously, structured everything into a single database, and flagged relevant transactions in real time. The business development team went from reading summaries of last month's activity to watching the market as it actually moved.
A real estate company faced a similar problem - listings scattered across multiple platforms, each updating at different times and in different formats, making it nearly impossible to get a consistent picture of what was available and at what price. After building a system that aggregated and normalized listings daily, the company reduced the time spent on property searches by 60% and closed around 60 additional high-value deals within the first three months.
"You normalize your fields, eliminate duplicates, enhance your timestamping, optimize your enrichment rules, fix your labels, and you find that your system starts working much more effectively - all without having to change your infrastructure. This is why I believe the industry consistently undervalues data management. The difference between good inputs and poor inputs makes a model appear more intelligent than it truly is", - according to Sandro Shubladze.
In neither case was a new or better model involved. The improvement came entirely from having accurate, timely, well-structured information that people could trust and act on without spending half their day questioning it.
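Two of the cleanup steps Shubladze lists, normalizing fields and eliminating duplicates, can be sketched in a few lines. The record layout, the sample values, and the choice of email as the deduplication key are all assumptions made for illustration; a real pipeline would need rules suited to its own sources.

```python
def normalize(record):
    """Collapse whitespace and title-case the name, lowercase the email,
    and keep only the digits of the phone number."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


# Hypothetical contact records from two sources with different formatting.
raw = [
    {"name": "  ana  beridze ", "email": "Ana@Example.com ", "phone": "+995 555 12-34"},
    {"name": "Ana Beridze", "email": "ana@example.com", "phone": "995555 1234"},
    {"name": "Luka Kapanadze", "email": "luka@example.com", "phone": "(995) 555 9876"},
]

clean = deduplicate([normalize(r) for r in raw])
print(clean)
```

The order matters: the first two records only become recognizable as duplicates after normalization, which is why deduplicating raw input tends to miss them.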
Why "We Will Sort the Data Later" Is a Promise That Almost Never Gets Kept
Many companies operate on a collect-now, sort-it-out-later basis, and the difficult thing about that approach is that what you lose by moving fast is not always something you can recover. The thing that tends to get skipped in the rush is metadata - which, in plain terms, means the context around your data. Where did this information come from? When exactly was it collected? Has it been changed, and if so, how and by whom? That context is what allows you to verify data, trace errors back to their source, and understand why a system is behaving the way it is. Without it, you have numbers - but no reliable way to know whether those numbers mean what you think they mean.
"It may be possible to improve upon it at a later date", - Datamam says, "but more often than not, once it is lost, it is gone forever. This is one reason why data debt can be more serious than technical debt. MIT Sloan recently discussed this exact problem with artificial intelligence training data".
It is a bit like taking research notes without writing down the sources. The notes still exist, but the moment someone asks you to verify them or build seriously on top of them, you are stuck. With large datasets built under time pressure, that gap tends to multiply quietly until it becomes a structural problem that touches everything built on top of it.
"This is where scale breaks things," - Sandro Shubladze explains. "When a dataset becomes too large and too fast-moving to review manually, organizations need to stop thinking about validating individual rows and start validating the entire pipeline - applying validation rules, detecting anomalies, maintaining data lineage, monitoring drift, and establishing clear protocols for quarantining or rejecting data that does not meet standards."
The consistency problem runs even deeper when data is coming from hundreds of sources at once, each with different structures and update frequencies. "No source will be stable forever," - Entrepreneur learned from Datamam. "The most common failure does not happen at the initial build. It happens later, when a source quietly changes its formatting, slows down, or introduces more noise - and nobody realizes it in time".
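Validating the pipeline rather than individual rows can start with something as simple as checking every incoming batch against the schema recorded when the source was first integrated, and quarantining the whole batch when the source has quietly changed. The sketch below assumes a hypothetical listings feed; the schema, field names, and source name are invented for illustration.

```python
# Schema recorded when the source was first integrated (hypothetical).
EXPECTED_SCHEMA = {"listing_id": str, "price": float, "updated_at": str}


def schema_drift(batch, expected=EXPECTED_SCHEMA):
    """Report missing fields, unexpected fields, and type changes in a batch."""
    report = {"missing": set(), "unexpected": set(), "type_changed": set()}
    for record in batch:
        report["missing"] |= expected.keys() - record.keys()
        report["unexpected"] |= record.keys() - expected.keys()
        for field, expected_type in expected.items():
            if field in record and not isinstance(record[field], expected_type):
                report["type_changed"].add(field)
    return {kind: sorted(fields) for kind, fields in report.items()}


def route_batch(source, batch):
    """Accept a batch only if no drift was found; otherwise quarantine it."""
    report = schema_drift(batch)
    status = "quarantined" if any(report.values()) else "accepted"
    return {"source": source, "status": status, "report": report}


# The same source before and after it quietly changed its format.
old_batch = [{"listing_id": "a1", "price": 1200.0, "updated_at": "2025-01-30"}]
new_batch = [{"listing_id": "a1", "price": "1,200", "updated": "2025-01-30"}]

ok = route_batch("feed_a", old_batch)
flagged = route_batch("feed_a", new_batch)
print(ok["status"], flagged["status"], flagged["report"])
```

Keeping the report alongside the quarantined batch preserves a trace of what changed and when, so the fix can be made at the source rather than patched downstream.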
Models Are Getting Closer to Each Other - Which Means Something Else Has to Be the Edge
One of the more telling signs that the industry is approaching an inflection point is what is happening with model performance itself. According to Stanford's 2025 AI Index, the difference between the top-ranked and tenth-ranked models on the Chatbot Arena leaderboard dropped from nearly 12% to just over 5% in a single year, and the gap between open and closed models fell to under 2%. When models become that similar in what they can do, the competitive advantage has to come from somewhere else.
Recent research has started pointing directly at data quality as that next differentiator. Carefully curating training data - removing duplicates, improving labeling consistency, tracking where information came from - has been shown to produce better benchmark results than simply using more data or more compute. In one widely discussed study, a smaller model trained on well-prepared data matched a much larger model across dozens of tasks, at a fraction of the cost.
"We believe that model scaling will continue to dominate the discourse, because it is simple to explain and to market", - the founder of Datamam says. "In practice, however, major companies will turn towards data-centric approaches. Large-scale AI is setting higher expectations across the board. Merely having vast amounts of data is not sufficient - better filtering, de-duplication, source tracking, annotation, and synchronization are needed. For instance, in the case of DataComp-LM, a curated open-data training set yielded a gain of 6.6 points on the MMLU benchmark over the previous state of the art among open-data models, using 40% less compute. The 7B model trained on that set performed similarly to Llama 3 8B across 53 tasks, using roughly a sixth of the compute. The FineWeb team reached the same conclusion when they built a 15-trillion-token dataset from 96 Common Crawl snapshots - curation consistently outperformed simply having more data".
And the quality of data is not just about what is in it - it is also about when it was collected. "Time must be treated as an integral part of the data model rather than a technical detail", - Datamam adds. "Precise time windows, accurate timestamps, strict freshness constraints, and versioning are all essential for making sure that what you have is a true representation of a consistent state. Otherwise, you risk assembling pieces of information from slightly different moments into a picture that has never actually been real".
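As a rough illustration of treating time as part of the data model, the sketch below checks whether the pieces assembled into one snapshot are both fresh enough and close enough to each other in time to describe a single consistent state. The function name, the thresholds, and the timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def snapshot_is_consistent(timestamps, as_of, max_age, max_spread):
    """True if no timestamp is in the future, the oldest one is no older
    than max_age relative to as_of, and the oldest and newest timestamps
    lie within max_spread of each other."""
    oldest, newest = min(timestamps), max(timestamps)
    fresh = as_of - oldest <= max_age
    aligned = newest - oldest <= max_spread
    return newest <= as_of and fresh and aligned


as_of = datetime(2025, 1, 31, 12, 0)

# Two pieces collected five minutes apart, shortly before as_of.
recent = [datetime(2025, 1, 31, 11, 50), datetime(2025, 1, 31, 11, 55)]
# One piece is two days old: the combined picture was never real.
mixed = [datetime(2025, 1, 29, 9, 0), datetime(2025, 1, 31, 11, 55)]

limits = {"max_age": timedelta(hours=1), "max_spread": timedelta(minutes=10)}
recent_ok = snapshot_is_consistent(recent, as_of, **limits)
mixed_ok = snapshot_is_consistent(mixed, as_of, **limits)
print(recent_ok, mixed_ok)
```

Separating freshness (age relative to now) from alignment (spread between the pieces) reflects the two distinct failures in the quote: data that is simply old, and data stitched together from different moments.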
The public conversation will keep circling around models for a while - they are easier to announce and easier to understand as a headline. But the companies quietly doing the less visible work of making sure their data is accurate, structured, and trustworthy before it ever reaches a model are likely the ones building something that actually holds up over time.
"From our perspective, an example of a mature data architecture would entail more than just gathering the data - it would also include ensuring its quality, understanding its meaning, auditing its accuracy, and being able to leverage it right away. It would feature robust provenance, good observability, data validation capabilities, data drift detection, scoring of sources, and delivery that was ready for analytics and AI out-of-the-box. Most critically, it would be measured in terms of its business benefits such as speedier decision-making, reduced manual work required to clean up data, and reduced surprises in production", - Datamam suggests.