Understanding the scope of regional languages through artificial intelligence
You're reading Entrepreneur India, an international franchise of Entrepreneur Media.
India is home to more than 19,500 languages or dialects, and 96.71 per cent of the population has one of the 22 scheduled languages as its mother tongue, according to Census 2011. Yet the language of technology, even in a country as diverse as ours, largely remains just one: English. How, then, can we use sophisticated artificial intelligence (AI) technology, which has taken over the world in just a few years, to bridge the gap between India’s billion Internet users and technology? To answer this, let us look at the non-English-speaking population, the largest underserved market, through the lens of AI.
Decoding regional languages
Getting any bot to understand a user's natural language is always difficult and remains a big hurdle for all the players in the segment. The scarcity of relevant data in Indic languages is another challenge. In many cases, either the parsing of the utterance (extracting its precise meaning) or the next action to take is ambiguous.
AI not only resolves that ambiguity but also chooses the most probable path in the dialogue, given the context. Meanwhile, ‘Transformer networks’ make for efficient machine translation, as they can achieve high BLEU scores (BLEU is a metric for evaluating the quality of text that has been machine-translated from one natural language to another) even when parallel corpora for a language pair are scarce.
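To make the BLEU score mentioned above concrete, here is a minimal sketch of a sentence-level BLEU computation against a single reference. It is an illustration under simplifying assumptions (whitespace tokens, one reference, no smoothing), not the scoring code of any particular translation system; the function names are chosen here for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of the 1..max_n precisions times a brevity penalty.
    Assumption: no smoothing, so any zero precision gives a score of 0."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
score = bleu(candidate, reference, max_n=2)
```

A candidate identical to its reference scores 1.0, and one that shares no words with it scores 0. Production evaluations typically score a whole test set with multiple references and smoothing rather than single sentences.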
Market leaders are now building natural language processing (NLP) focused on Indian languages, accompanied by the collection and indexing of data in Indian languages along with their English counterparts. One example is pairing Hindi text with parallel English text for machine translation and transliteration.
The algorithms behind the whole ‘understanding and then responding accordingly’ process in Indic languages are quite complicated and make use of the most sophisticated research in AI, natural language processing and machine learning.
How it works
Language has been the biggest roadblock in the adoption of online commerce among India’s non-English population. Most of them still rely on offline methods for managing their household expenses and continue to stand in long queues for their bill payments or depend on offline agents for things as simple as booking tickets for religious trips.
Regional language-first AI solutions can provide multilingual support as well as hand-holding, unlocking assistive, voice-enabled commerce for first-time Internet users emerging from tier II and III cities. They can become a gateway that makes technology accessible to all, regardless of language, socio-economic background, age or gender.
This is where AI plays an instrumental role. The mechanism is based on a framework called ‘concept entity grammar’, whose core is language-agnostic and is more of an abstraction over the syntax rules of a given language. Over and above this, a combination of statistical and deep learning models is employed for intent and entity extraction.
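The idea of a language-agnostic core can be illustrated with a toy example. The sketch below is not the ‘concept entity grammar’ framework itself, whose internals the article does not describe. It swaps in a simple hand-written lookup in place of the statistical and deep learning models, and every name in it (`CONCEPT_LEXICON`, `INTENT_RULES`, the concept and intent labels) is hypothetical. It only shows how words from different languages can map to shared concepts, so that intent rules are written once.

```python
# Hypothetical lexicon: surface words in English, romanized Hindi and
# Devanagari Hindi all map to the same language-independent concept.
CONCEPT_LEXICON = {
    "pay": "ACTION_PAY",
    "भरना": "ACTION_PAY",
    "bill": "OBJECT_BILL",
    "बिल": "OBJECT_BILL",
    "electricity": "UTILITY_ELECTRICITY",
    "bijli": "UTILITY_ELECTRICITY",
    "बिजली": "UTILITY_ELECTRICITY",
}

# Hypothetical intent rules, expressed over concepts rather than words.
INTENT_RULES = {
    "pay_electricity_bill": {"ACTION_PAY", "OBJECT_BILL", "UTILITY_ELECTRICITY"},
}


def to_concepts(utterance):
    """Map each known token to its concept; unknown tokens are ignored."""
    tokens = utterance.lower().split()
    return {CONCEPT_LEXICON[t] for t in tokens if t in CONCEPT_LEXICON}


def match_intent(utterance):
    """Return the first intent whose required concepts are all present."""
    concepts = to_concepts(utterance)
    for intent, required in INTENT_RULES.items():
        if required <= concepts:
            return intent
    return None


english = match_intent("please pay my electricity bill")
mixed = match_intent("bijli ka bill pay karna hai")
hindi = match_intent("बिजली का बिल भरना है")
```

All three utterances resolve to the same intent because the rule refers only to concepts. A real system would replace the lookup table with learned models, as the article notes.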
This is why, when a user interacts in a mix of dialects or languages and states multiple requirements in a single sentence, the NLP engine, backed by conversational data, can process the instructions and provide personalized solutions. Here, code-mixing becomes an important aspect to account for, as typed and spoken inputs tend to be incomplete and grammatically incorrect.
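One small, common preprocessing step for code-mixed input is to tag each token by the script it is written in. The sketch below assumes only two scripts of interest (Devanagari and Latin) and whitespace tokenization. It is an illustrative helper, not part of any system named in this article. Note that script is not the same as language, since romanized Hindi is written in Latin script.

```python
def script_of(token):
    """Classify a token by the first letter that identifies its script."""
    for ch in token:
        # The Devanagari Unicode block spans U+0900 to U+097F.
        if "\u0900" <= ch <= "\u097F":
            return "devanagari"
        if ch.isascii() and ch.isalpha():
            return "latin"
    return "other"  # digits, punctuation, or scripts not handled here


def tag_scripts(utterance):
    """Pair every whitespace-separated token with its script label."""
    return [(token, script_of(token)) for token in utterance.split()]


tagged = tag_scripts("mera बिल pay करो 500")
```

Downstream components can use these labels to route tokens to transliteration or to a language-specific model.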
Learning bilingual embeddings with a focus on cross-lingual transfer also helps in building NLP models for low-resource languages and can even be leveraged for code-mixing tasks.
Scope of AI in regional markets
The scope of building for Bharat’s regional language-first audience is massive. In 2016, close to 60% of the 409 million Internet users in India were Indic language users, and this number is growing rapidly. While only a handful of AI startups are currently working towards empowering Indians to use technology in their first language, a KPMG report estimates that out of the next 326 million Internet users in India, 93% are expected to be local language-first users.
Apart from this, redesigning products to cater to the largest underserved market is equally important, especially as ‘voice-enabled’ interfaces are seeing swift adoption. As natural language processing and understanding technologies evolve and voice-based applications become the norm, language AI and ML technologies will also evolve towards speech-to-text and natural language understanding (NLU) engines. Moreover, conversation is the most natural way of interacting with any entity and getting work done.
With or without technology, customers instinctively expect businesses to understand their preferences. Cleaning the data and building analytics and reporting tools on top of it is the first step in truly understanding a customer’s needs. As the next step, businesses will have to use AI platforms to apply the insights developed from all that data and serve their customers better. Having a vast amount of data about one’s customers and the way they interact is a clear advantage, and it is necessary to become a true one-stop solution provider.
With such massive scope to serve the fastest-growing Internet population, AI can bridge the gap between language and technology while making India truly digital.