Gulf's AI Ambitions Risk Leaving Arabic LLMs Behind
Unlocking the full potential of Arabic AI starts with building the collective infrastructure to support it.
Opinions expressed by Entrepreneur contributors are their own.
You're reading Entrepreneur Middle East, an international franchise of Entrepreneur Media.
The Gulf is adopting AI faster than almost anywhere else. According to Microsoft's recent AI Diffusion Report, the United Arab Emirates ranks first globally in AI adoption, with nearly 60% of working-age adults using AI tools. Qatar ranks tenth, and Saudi Arabia sits comfortably in the top 20%. Governments and businesses across the region are steadily integrating AI into daily operations and backing it with heavy investment.
But much of this money and momentum is going towards infrastructure that trains and powers English-language AI models. As the UAE and Saudi Arabia strike multibillion-dollar deals with US tech giants like Oracle and Nvidia, Arabic, the fifth-most-spoken language in the world, remains chronically underrepresented in AI.
While their sights are set on global growth, Gulf nations must not let the AI race eclipse the task of building high-performance Arabic large language models (LLMs). Unless Arabic-language AI becomes as much of a strategic priority as constructing data centres or funding chip plants, the region risks creating an AI economy that doesn't fully serve its needs.
The most effective way to address this is through an Arabic AI consortium: a technical partnership that unites sovereign wealth funds, research institutions, and businesses from across the Arab world.
From importers to exporters
Gulf nations have led the charge on AI adoption. But now we're seeing a shift in focus from importing AI to exporting it, building major AI infrastructure that serves the rest of the world.
The region wants to become a global AI hub, and it's backing its ambitions with serious capital. Take the 500-megawatt data centre being built by Saudi Arabia's HUMAIN AI and Elon Musk's xAI. It could make the Gulf home to one of the largest computing clusters on the planet (NBC).
Don't get me wrong, these initiatives are critical to ensuring the Gulf has a stake in the global AI landscape. But as the region orients itself toward the global market, we risk unintentionally shifting away from developing Arabic-language LLMs.
It's only natural. If your target market is global, you optimise for the world's lingua franca: English. It allows for scaling, aligns with competitors, and attracts international investors. But as a result, Arabic becomes secondary.
Viewing Arabic AI development as peripheral is a strategic misstep, not only because Arab countries need Arabic-first AI, but also because the English-language AI market is already so crowded. Where Middle Eastern countries can make an outsized difference in the AI landscape is by building an Arabic LLM of equal scale and performance to the largest English-language models.
AI's Arabic gap
When it comes to Arabic, LLMs struggle. The performance of any LLM is dictated by the quality and quantity of the data it is trained on: the more language-specific data a model has, the better it performs. Although Arabic is spoken by over 450 million people (UNESCO), AI systems are trained on roughly the same volume of Arabic text as Czech, a language spoken by just 12 million people.
This underrepresentation results in tools that misinterpret Arabic text, mishandle tone, or generate culturally inappropriate outputs. That is especially risky in sensitive contexts such as hiring, loan processing, or critical infrastructure delivery.
Beyond the linguistic case for Arabic-first AI, closing the gap would unlock value across the entire MENA market. Instead of repackaging predominantly English LLMs for Arabic users, entrepreneurs could develop native, region-specific tools.
Gulf businesses and governments have recognised this opportunity. But while models like Saudi Arabia's ALLaM, Qatar's Fanar, and the UAE's Falcon are major steps toward Arabic-proficient AI, they're still built at a fraction of the scale of frontier English-language models.
Take Jais 2, the latest advance in Arabic AI. It was trained on 1.6 trillion Arabic, English, and code tokens, the pieces of words and punctuation that models break language down into (Middle East AI News). That is the largest Arabic-first training dataset to date, but it pales in comparison to the more than 15 trillion tokens Meta's Llama 3 was trained on (Meta).
The challenge isn't a lack of ambition or funding. It's that a limited and fragmented supply of Arabic-language data is capping the performance of domestic models.
These models need more than investment from individual nations. They need a shared data and training pipeline. Without a coordinated, pan-Arabic data and research ecosystem, Arabic-focused models will struggle to match the scale and performance of their English counterparts.
Why we need an AI consortium
No country, however well-resourced, can capture the full range of dialects and cultural contexts needed to build a truly representative Arabic LLM. The UAE and Saudi Arabia may have the infrastructure and capital, but every Arab state holds valuable linguistic and cultural data that are missing from today's models.
A coordinated, pan-Arab effort is the only way to unlock that data. Similar in ethos to the Arab League, an Arabic AI consortium would unite the region's capital, compute, talent, and data to accelerate the development of competitive Arabic-language AI.
By pooling datasets from businesses and public bodies across all Arabic-speaking countries, it could create a unified Arabic data commons: a rich and representative collection of dialects and cultural contexts.
At the same time, shared investment would make it more economically viable to build an Arabic LLM capable of matching the scale of leading US and Chinese models.
None of this diminishes the impressive progress already made by regional AI developers, nor does it detract from the value of Gulf investments in global AI infrastructure. It's about complementing outward investments with an AI ecosystem capable of meeting the linguistic and cultural needs of the Arabic-speaking world.
Right now, there isn't an LLM that fully understands the depth and diversity of the Arabic language. Until regional developers have access to more data, even their strongest models will struggle to keep pace with global leaders. And as AI development accelerates, the window to build an equally powerful Arabic LLM is closing.
A pan-Arab AI consortium is the most effective way of changing that. It would ensure that every dialect and culture is represented, laying the foundations for truly world-class Arabic LLMs.
The Gulf's global AI ambitions and the development of Arabic-native models aren't competing priorities. They are mutually reinforcing. Unlocking the full potential of Arabic AI starts with building the collective infrastructure to support it.