
India's Road Map Towards Building Foundation LLMs in India

Building LLMs presents several challenges: the need for extensive computing power, large datasets, and specialized expertise

By Paromita Gupta

You're reading Entrepreneur India, an international franchise of Entrepreneur Media.


India has recorded significant growth in artificial intelligence over the past two years, after ChatGPT, a chatbot built on a Large Language Model (LLM), became the talk of the town and set off a global AI race. However, India is a highly diverse country, with many cultures, thousands of dialects, and numerous distinct languages. This diversity can make new policies and innovations harder to implement, and the process consumes considerable time and resources.

Leveraging Open Source for Building Scalable Foundation Models in India

Sharing insights on how India can leverage open-source best practices to build foundation models and LLMs, Pratyush Kumar, co-founder of Sarvam AI, said that the technology is very new and will take time to improve and become more useful. Building LLMs presents several challenges: the need for extensive computing power, large datasets, and specialized expertise. Open source is essential to democratizing AI technology, and it brings its own challenges and opportunities. India has done well, but it still has a long way to go before these technologies mature and become more universally useful.

"And that's where open source as a movement in computing has been very strong. And kudos to the government of India. I think the Bhashini project has been a big success in demonstrating how to do open-source Indian language AI at scale," said Kumar.

The Complexity of Indian Languages

Speaking on the innovative approaches and technologies India can use to address the challenges of building foundation models, Mohit Sewak, AI researcher and developer relations, South Asia, NVIDIA, said that India's linguistic diversity is immense, with 23 official languages, over 10,500 unique dialects, and 123 unique languages. LLMs such as GPT, he said, currently support only up to 100 languages, with a tokenizer vocabulary size of 254,000. India's diverse linguistic landscape requires models with an even larger tokenizer vocabulary to handle its many languages and dialects effectively.

"That means we are talking about tens of trillions of tokens of data across these languages if we want a real Indian LLM that can actually do the type of tasks that we expect it to do," said Sewak.
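One way to see why tokenizer vocabulary matters for Indian languages is to look at how the text is encoded. The sketch below is illustrative only and is not drawn from any speaker's remarks. It assumes a byte-level tokenizer that, lacking learned vocabulary entries for a script, falls back to roughly one token per UTF-8 byte. The helper name `utf8_bytes_per_char` and the sample words are hypothetical choices made for this example.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Sample words chosen for illustration (assumption, not from the article).
english = "language"
hindi = "भाषा"  # Hindi word for "language", written in Devanagari

# ASCII letters take 1 byte each; Devanagari code points take 3 bytes each.
print(utf8_bytes_per_char(english))  # 1.0
print(utf8_bytes_per_char(hindi))    # 3.0
```

Under that worst-case assumption, the same amount of Devanagari text costs about three times as many tokens as ASCII text, which is why dedicated vocabulary entries for Indian scripts can make a model cheaper and more effective to run.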

Alignment with Cultural Sensitivities: Need Insider's View

Asked whether current language models accurately represent Indian cultural nuances and traditions beyond English, Dr. Kalika Bali, principal researcher at Microsoft, said that LLMs currently possess what can be described as an "outsider's view" of culture. While these models are not entirely ignorant of cultural contexts, their understanding can be superficial.

Indian culture and sensitivities are vastly different from those of the Western world, where most current models are trained. To create effective models for India, it is crucial to incorporate alignment techniques that make models more attuned to Indian cultural nuances.

"I do not think that we can ever have a bias-free system. We can only hope to mitigate the bias as far as possible," Bali added.

Public-Private Partnerships Are Needed

Speaking on how various stakeholders can collaborate in building Bharat GPT, Professor Ganesh Ramakrishnan highlighted the role of public-private partnerships in the initiative. The project is supported by the Department of Science and Technology under the NMICPS program and involves several IITs and IIMs.

He feels that India needs more skilled people to foster an open-source culture. He also stressed the importance of algorithmic innovation, particularly in resource-constrained settings: given the limited availability of data across Indian languages, innovative algorithms can play a crucial role in optimizing the use of what data is available. "A lot more can be done there, and that's where, through this academic-industrial collaboration," said Ramakrishnan.

Measuring the Impact of AI Solutions in India

Addressing the impact of AI solutions and how to measure it, Shalini Kapoor, chief technologist, APJ, AWS, said that impact will be defined largely by Indian citizens, because they are the ones who will use these solutions, and usage comes only when there is a need. One of the primary metrics is the business value derived from AI solutions, which includes immediate benefits as well as long-term sustainability.

Another metric is cost-effectiveness, which includes not only the initial investment but also ongoing operational costs. "People don't have that much time and energy, cost, effort to waste," said Kapoor. A successful AI solution should also integrate multiple components rather than relying solely on LLMs. Mitigating bias and ensuring the ethical use of AI are essential metrics as well.

All the speakers shared their views at the Global India AI Summit 2024.
