Editorial

Breaking barriers: AI and the Assamese language

Sentinel Digital Desk

Dimpi Sarma

(Project Associate, Assamese Team, AI4Bharat, IIT Madras. She can be reached at dimpi.sarma43@gmail.com.)

Almost every day, we come across phrases like these in news articles and across digital media: ‘Artificial Intelligence is taking the world by storm’, ‘Will machines replace human beings?’, ‘Artificial Intelligence to replace human intelligence within half a century.’ However alarming these sound, we cannot deny that artificial intelligence has also made life easier for everyone; many everyday tasks now take just the click of a button. In a world where digital media and the internet have become basic necessities for almost everyone, it is essential to make the best use of technological innovations like artificial intelligence.

As more and more people become familiar with technology and the internet, there is a need to make technology accessible to all. Beyond its technical applications, AI also plays a remarkable role in communication: it enables smooth interaction between humans and machines and also helps break down barriers in communication between humans. This communication is not limited to technical commands for a machine to carry out; it also encompasses the natural form of human communication, the language that we speak and that connects us to the world.

The close relationship between language and artificial intelligence can be understood through two of its aspects: on the one hand, language itself, its words and syntax, supplies the data from which machines learn; on the other, that learning is used to improve the platforms we use every day, from social media platforms such as Facebook and Instagram to virtual assistants such as Siri, Google Assistant, and Alexa. For all of these to work well, AI needs to be able to understand and generate human language and to grasp the emotions behind it. Artificial intelligence learns language through fields and techniques such as natural language processing (NLP), machine learning, and deep learning.

The Assamese language is one of the oldest languages spoken in the Indian subcontinent and, according to the research publication Ethnologue, currently has 15 million speakers. Assamese is also considered the lingua franca of the region and is one of the languages listed in the Eighth Schedule of the Indian Constitution. Yet although Assamese has been widely spoken for centuries, its technological development is still in its early stages. Examining the written or oral literature in Assamese is not enough; it is also critical to assess how well the language can adapt to changing technological standards.

The shortcomings are easy to spot. On Facebook, I once came across an AI-generated translation of a photo caption about a very famous political figure from Assam that came out as ‘Grapefruit’. Recently, I saw a photo circulating on WhatsApp in which a variety of rice labelled ‘Economy Boiled Rice’ had been mistranslated into an absurd term. We often click the ‘Translate’ button or speak into voice recognition tools, only to laugh at and mock the output. Then there are the spelling errors and the inaccurate pronunciations. To take another personal example of mispronunciation by voice-generating tools: my cousin lives in Pub Sarania, Guwahati, and whenever I visit her, I enter her address in Google Maps. The app’s AI voice mispronounces the locality as ‘Pab Sarani’, which is quite jarring to hear.

Why is the technological advancement of the Assamese language the need of the hour? With rapid progress in technology and globalisation, people all over the world are constantly trying to keep pace and to learn common languages that break down barriers to communication. Assamese, too, must keep up with technology in the age of AI. For Assamese to be embraced by new technologies like artificial intelligence, it must be at the forefront of technology, and it must have large datasets from which these technologies can learn. Sufficient resources should be made available so that the language is accessible to all through modern technology. Despite demands for Assamese to be included in Unicode in its own right, its speakers received only a proposal that the Bengali script, renamed ‘Bengali-Assamese’, be used for the purpose. In this respect, greater inclusivity is needed for the language to become part of the world’s technological database. Another challenge is that many terms that emerged with the arrival of new technology have no native counterparts. Terms like ‘computer’, ‘desktop’, ‘drone’, and ‘smartphone’ have no specific native equivalents. Such words therefore need to be transliterated, or pronounced correctly, when they are given as input to a machine.

Despite these challenges, we must not overlook the positive progress that the Assamese language is making in the field of artificial intelligence, with the help of institutions, language experts, machine developers, and AI programmers. Online resources, YouTube videos, and Instagram reels are small steps that anyone can take from home towards this development. Today, almost all apps and platforms offer a choice of languages, and Assamese features in these lists. Premier institutions in the state, such as Tezpur University and Gauhati University, and institutes from across India, such as the IITs and the Central Institute of Indian Languages (CIIL), Mysore, have been working hard to develop resources such as text corpora, synsets, WordNets, and machine translation (MT) systems, and to make online resources available for the Assamese language.

The National Education Policy (NEP) 2020, as adopted in Assam, aims to include Assamese as a medium of learning, which should mean more online resources in Assamese, a positive step on the future roadmap of the language. We must also not forget the Central Government’s initiatives to integrate vernacular languages with language technology, such as the Technology Development for Indian Languages (TDIL) programme, which aims to facilitate human-machine interaction without a language barrier. The National Language Translation Mission, known as ‘Bhashini’, under the Ministry of Electronics and Information Technology, aims ‘to enable all Indians easy access to the Internet and digital services in their own languages and increase the content in Indian Languages’.

Under the Digital India-Bhashini mission, IIT Madras has developed a project named ‘AI4Bharat’ to empower Indian languages by creating more datasets for translation, transcription, and speech recognition in 22 languages across India. The Assamese team under this project currently includes seven annotators and language experts, around nine transcriptionists and quality analysts, and numerous speech data coordinators. The team has contributed substantially to all the significant milestones achieved by AI4Bharat, such as IndicCorp, BPCC, Shrutilipi, Kathbath, IndicBERT, IndicTrans, IndicXlit, IndicWav2Vec, IndicWhisper, and text-to-speech (TTS), as well as translating content for the Supreme Court and improving speech recognition for NPCI payments to enable effortless voice-driven transactions. These datasets have been developed to train machine learning models for tasks such as sentiment analysis and machine translation.

Thus, despite the socio-political challenges in the state, the Assamese language has made significant progress through the collective efforts of researchers, linguists, and the government. In artificial intelligence especially, everyone is working hard to build new datasets and tools for the natural language processing of Assamese. A language can be embraced by new technologies like artificial intelligence only if it is at the forefront of technology. Only then will these technologies be able to learn the language and communicate with the people who speak it: answering questions, translating, and creating content in both speech and writing.