India's Road Map Towards Building Foundation LLMs

Building LLMs presents several challenges: the need for extensive computing power, large datasets, and specialized expertise

By Paromita Gupta

You're reading Entrepreneur India, an international franchise of Entrepreneur Media.

India has recorded significant growth in artificial intelligence over the past two years, after the large language model (LLM) ChatGPT became the talk of the town and set off a global AI race. However, India is a unique country, with many cultures, thousands of dialects, and numerous languages spoken across it. This diversity can make new, innovative policies harder to implement, and the process consumes considerable time and resources.

Leveraging Open Source for Building Scalable Foundation Models in India

Sharing insights on how India can leverage open-source best practices to build foundation models and LLMs, Pratyush Kumar, co-founder of Sarvam AI, said that the technology is very new and will take time to improve and become more useful. Building LLMs presents several challenges: the need for extensive computing power, large datasets, and specialized expertise. Open source is essential to democratizing AI technology, and it presents unique challenges and opportunities. India has done well, but it still has a long journey ahead before these technologies mature and become more universally useful.

"And that's where open source as a movement in computing has been very strong. And kudos to the government of India. I think the Bhashini project has been a big success in demonstrating how to do open-source Indian language AI at scale," said Kumar.

The Complexity of Indian Languages

Speaking on the innovative approaches and technologies India can use to address the challenges of building a foundation model, Mohit Sewak, AI researcher and developer relations, South Asia, NVIDIA, said that India's linguistic diversity is immense, with 23 official languages, over 10,500 unique dialects, and 123 unique languages. LLMs such as GPT currently support only up to 100 languages, with a tokenizer vocabulary size of 254,000. India's diverse linguistic landscape, however, requires models with an even larger tokenizer vocabulary to handle its many languages and dialects effectively.

"That means we are talking about tens of trillions of tokens of data across these languages if we want a real Indian LLM that can actually do the type of tasks that we expect it to do," said Sewak.

Alignment with Cultural Sensitivities: The Need for an Insider's View

Asked whether present language models accurately represent Indian cultural nuances and traditions in languages other than English, Dr. Kalika Bali, principal researcher at Microsoft, said that LLMs currently possess what can be described as an "outsider's view" of culture. While these models are not entirely ignorant of cultural contexts, their understanding can be superficial.

Indian culture and sensitivities are vastly different from those of the Western world, where most current models are trained. To create effective models for India, it is crucial to incorporate alignment techniques that make models more attuned to Indian cultural nuances.

"I do not think that we can ever have a bias-free system. We can only hope to mitigate the bias as far as possible," Bali further added.

Public-Private Partnerships Are Needed

Speaking on how various stakeholders can collaborate on the journey towards making Bharat GPT, Professor Ganesh Ramakrishnan began by highlighting the role of public-private partnerships in the initiative. The project is supported by the Department of Science and Technology under the NMICPS program and involves several IITs and IIMs.

He feels that India needs more skilled people to foster an open-source culture. He also stressed the importance of algorithmic innovation, particularly in resource-constrained settings: given the limited availability of data across Indian languages, innovative algorithms can play a crucial role in optimizing the use of available data. "A lot more can be done there, and that's where, through this academic-industrial collaboration," said Ramakrishnan.

Measuring the Impact of AI Solutions in India

Addressing the impact of AI solutions and how to measure it, Shalini Kapoor, chief technologist APJ, AWS, said that the impact will be defined largely by Indian citizens, because they are the ones who will use these solutions, and usage comes only when there is a need. One of the primary metrics is the business value derived from AI solutions, which includes immediate benefits as well as long-term sustainability.

Another metric is cost-effectiveness, which includes not only the initial investment but also ongoing operational costs. "People don't have that much time and energy, cost, effort to waste," said Kapoor. A successful AI solution should also integrate multiple components rather than relying solely on LLMs. Mitigating bias and ensuring the ethical use of AI are essential metrics as well.

All the speakers shared their views at the Global India AI Summit 2024.
