Date: 2024-07-24

With this initiative, Meta joins the likes of Tech Mahindra, Krutrim, Bhashini, and Sarvam AI in creating large language models (LLMs) tailored for Indian languages

On Tuesday, Meta announced new AI features for WhatsApp and said it is making Meta AI available in several more languages, including Hindi.



"Meta AI can help you with answers, ideas and inspiration. It's now available in 22 countries, with the newest additions rolling out now including Argentina, Chile, Colombia, Ecuador, Mexico, Peru and Cameroon in several new languages including French, German, Hindi, Italian, Portuguese and Spanish with more to come," read the official blog.

Meta also rolled out its Imagine Edit feature in English and its Imagine Yourself feature in the US market.

Jumping on the native boat

With 22 officially recognized languages, more than 1,600 dialects, and 19,200 unofficial dialects, India's market cannot be served by English alone.

With this initiative, Meta joins the likes of Tech Mahindra, Krutrim, Bhashini, and Sarvam AI in creating large language models (LLMs) tailored for Indian languages.

Tech Mahindra, with its 'The Indus Project', aims to build a culturally rooted language model that also excels on prevailing benchmarks. Meanwhile, Ola's Krutrim aims to bridge the gap between conventional AI and the specific needs arising from Indian languages and culture.

In late 2023, Sarvam AI launched the OpenHathi Series. Through this initiative, run in partnership with AI4Bharat, it aims to contribute to the ecosystem by providing open models and datasets that foster innovation in AI for Indian languages.

But why do LLMs need to cater to Indian languages? The answer lies in cultural bias. English-based LLMs can show bias in various ways when dealing with Indian languages or contexts.

"Suppose a user inputs a sentence in English, asking for a summary of a traditional Indian festival like Diwali. An English-based LLM, trained primarily on English data, may generate a response that likens Diwali to a Western holiday, such as Christmas, to make it more relatable to an English-speaking audience. The response could be something like: 'Diwali, often referred to as the "Indian Christmas," is a festival of lights celebrated in India.' This comparison, while potentially helpful for some audiences, could also be seen as culturally biased, as it inadvertently undermines the unique cultural significance of Diwali. Diwali has its own rich history, customs, and meanings that are not directly comparable to Christmas. By equating the two, the LLM might inadvertently propagate the notion that non-Western cultures and traditions can only be understood or appreciated through a Western lens," shared Shobhit Mathur, co-founder and vice-chancellor, Rishihood University.

Currently, the majority of online content and services are available only in English, making them inaccessible to a large portion of the Indian population. Indian-language models will enable and simplify communication across the country and help preserve our native languages and dialects.

However, datasets remain a crucial concern. The challenge is to create Indian-language datasets and train models in a way that captures the nuances of these languages. To that end, Bhashini, under MeitY, has created Bhasha Daan, a crowdsourcing platform that is building an open repository of data from Indian languages. Through Bolo India, Suno India, Likho India, and Dekho India, Bhashini is encouraging users to contribute sentences in different languages, validate text or audio transcribed by others, and enrich their language by typing the audio they hear.