Gulf's AI Ambitions Risk Leaving Arabic LLMs Behind

Unlocking the full potential of Arabic AI starts with building the collective infrastructure to support it.

By Dr Zaid Al-Fagih

Opinions expressed by Entrepreneur contributors are their own.

You're reading Entrepreneur Middle East, an international franchise of Entrepreneur Media.

The Gulf is adopting AI faster than almost anywhere else. According to Microsoft's recent AI Diffusion Report, the United Arab Emirates ranks first globally in AI adoption, with nearly 60% of working-age adults using AI tools, while Qatar ranks tenth and countries like Saudi Arabia rank comfortably in the top 20%. Governments and businesses across the region are steadily integrating AI into daily operations, backing it with heavy investment.

But much of this money and momentum is going towards infrastructure that trains and powers English-language AI models. As the UAE and Saudi Arabia strike up multi-billion-dollar deals with US tech giants like Oracle and Nvidia, Arabic, the fifth-most-spoken language in the world, remains chronically underrepresented in AI.

While their sights are set on global growth, Gulf nations must not let the AI race eclipse the task of building high-performance Arabic large language models (LLMs). Unless Arabic-language AI becomes as much of a strategic priority as constructing data centres or funding chip plants, the region risks creating an AI economy that doesn't fully serve its needs.

The most effective way to address this is through an Arabic AI consortium: a technical partnership that unites sovereign wealth funds, research institutions, and businesses from across the entire Arab World.

From importers to exporters

Gulf nations have led the charge on AI adoption. But now we're seeing a shift in focus from importing AI to exporting it, building major AI infrastructure that serves the rest of the world.

The region wants to become a global AI hub, and it's backing its ambitions with serious capital. Take the 500-megawatt data centre being built by Saudi Arabia's HUMAIN AI and Elon Musk's xAI. It could make the Gulf home to one of the largest computing clusters on the planet (NBC).

Don't get me wrong, these initiatives are critical to ensuring the Gulf has a stake in the global AI landscape. But as the region orients toward the global market, it risks unintentionally shifting away from developing Arabic-language LLMs.

It's only natural. If your target market is global, you optimise for the world's lingua franca: English. It allows for scaling, aligns with competitors, and attracts international investors. But as a result, Arabic becomes secondary.

Viewing Arabic AI development as peripheral is a strategic misstep. Not only because Arabic-speaking countries need Arabic-first AI, but also because the English-language AI market is already so crowded. Middle Eastern countries can make an outsized difference in the AI landscape by building an Arabic LLM of equal scale and performance to the largest English-language models.

AI's Arabic gap

When it comes to Arabic, LLMs struggle. The performance of any LLM is dictated by the quality and quantity of the data it is trained on. The more language-specific data a model has, the better it performs. Yet although Arabic is spoken by over 450 million people (UNESCO), AI systems are trained on roughly as much Arabic text as Czech, a language spoken by just 12 million people.

This underrepresentation results in tools that misinterpret Arabic text, mishandle tone, or generate culturally inappropriate outputs. It's especially risky in sensitive contexts, like employment, loan processing, or critical infrastructure delivery.

Beyond the linguistic case for Arabic-first AI, closing the gap would unlock value across the entire MENA market. Instead of repackaging predominantly English LLMs for Arabic users, entrepreneurs could develop native, region-specific tools.

Gulf businesses and governments have recognised this opportunity. But while models like Saudi Arabia's ALLaM, Qatar's Fanar, and the UAE's Falcon are major steps toward Arabic-proficient AI, they're still built at a fraction of the scale of frontier English-language models.

Take Jais 2, the latest advance in Arabic AI. It was trained on 1.6 trillion Arabic, English, and code tokens: the pieces of words and punctuation that models break language down into (Middle East AI News). It's the largest Arabic-first dataset, but it pales in comparison to the more than 15 trillion tokens Meta's Llama 3 was trained on (Meta).

The challenge isn't a lack of ambition or funding. It's that a limited and fragmented supply of Arabic-language data is capping the performance of domestic models.

These models need more than investment from individual nations. They need a shared data and training pipeline. Without a coordinated, pan-Arabic data and research ecosystem, Arabic-focused models will struggle to match the scale and performance of their English counterparts.

Why we need an AI consortium

No country, however well-resourced, can capture the full range of dialects and cultural contexts needed to build a truly representative Arabic LLM. The UAE and Saudi Arabia may have the infrastructure and capital, but every Arab state holds valuable linguistic and cultural data that are missing from today's models.

A coordinated, pan-Arab effort is the only way to unlock that data. Similar in ethos to the Arab League, an Arabic AI consortium would unite the region's capital, compute, talent, and data to accelerate the development of competitive Arabic-language AI.

By pooling datasets from businesses and public bodies across all Arabic-speaking countries, it could create a unified Arabic data commons: a rich and representative collection of dialects and cultural contexts.

At the same time, shared investment would make it more economically viable to build an Arabic LLM capable of matching the scale of leading US and Chinese models.

None of this diminishes the impressive progress already made by regional AI developers. Nor does it detract from the value of Gulf investments in global AI infrastructure. It's about complementing outward investments with an AI ecosystem capable of meeting the linguistic and cultural needs of the Arabic-speaking world.

Right now, there isn't an LLM that fully understands the depth and diversity of the Arabic language. Until they have access to more data, even the strongest regional models will struggle to keep pace with global leaders. And as the pace of AI development accelerates, the window to develop an equally powerful Arabic LLM begins to close.

A pan-Arab AI consortium is the most effective way of changing that. It would ensure that every dialect and culture is represented, laying the foundations for truly world-class Arabic LLMs.

The Gulf's global AI ambitions and the development of Arabic-native models aren't competing priorities. They are mutually reinforcing. Unlocking the full potential of Arabic AI starts with building the collective infrastructure to support it.

Dr Zaid Al-Fagih

Co-founder and CEO, Rhazes AI

Dr Zaid Al-Fagih is the co-founder and CEO of Rhazes AI, an award-winning AI-powered virtual assistant.

The tool empowers doctors by boosting clinical productivity, reducing medical errors and burnout, and restoring the human connection in medicine.

Prior to founding Rhazes AI, Dr Al-Fagih practised full-time as a medical doctor in the NHS, and was a voluntary first responder and first aid trainer on humanitarian missions during the Syrian conflict.

He has published research in leading journals on applying emerging technologies to healthcare, most recently in the Emergency Medicine Journal.
