Playing it by hear: How does one preserve an accent?
It takes a lot of footwork, and audio footage. See how a new archive is mapping spoken tongues, and aiming to hear from all of India by the time it’s done.
The different kinds of speech in the world are like the colours of the rainbow, says Prasanta Ghosh, 44. “No one stripe should ever be allowed to fade away.”
How does one keep the fade from happening, though? A language is hard enough to preserve; how exactly does one preserve an accent?
Ghosh, an associate professor of electrical engineering at the Indian Institute of Science (IISc), Bengaluru, has found an interesting way, using machine learning and artificial intelligence programs.
His Project Vaani (Hindi for Voice) — built with assistance from the IISc deep-tech incubation programme ARTPARK (AI and Robotics Technology Park), and with funding from Google — is an open-source archive of the many ways in which Indians speak at this point in our history. (So much has been lost, but at least we can save traces of what remains, he says.)
More than 14,000 hours of recordings from across 80 districts in 12 states are currently available on vaani.iisc.ac.in. The site can be searched by region; the data can be downloaded and filtered by language. The aim is to eventually have at least 150,000 hours of recordings from a million people across all 806 districts of India.
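Filtering a downloaded slice of such an archive by language is straightforward to sketch. The snippet below is purely illustrative: the field names (`language`, `district`, `duration_hrs`) and sample rows are assumptions for the sake of the example, not Vaani's actual schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata dump; column names and rows are illustrative
# assumptions, not the actual schema used on vaani.iisc.ac.in.
SAMPLE_CSV = """\
recording_id,district,language,duration_hrs
r001,Cooch Behar,Rajbanshi,0.5
r002,Cooch Behar,Bangla,0.3
r003,Sindhudurg,Malvani,0.7
r004,Jaipur,Jaipuri,0.4
"""

def hours_by_language(csv_text):
    """Sum recorded hours per language from a metadata dump."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["language"]] += float(row["duration_hrs"])
    return dict(totals)

print(hours_by_language(SAMPLE_CSV))
```

On the sample rows above, this groups the four recordings into per-language totals, the same kind of aggregation a researcher might run after downloading a district's data.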
Phase One — the collection of 200 hours of recordings from each of the 80 districts — will soon be complete. It has taken over two years.
How is it all done? The recordings, Ghosh says, are gathered by ground agents who tend to be locals in each district, selected in collaboration with private data-collection agencies. So far, these agents have recorded voice notes from 83,000 speakers of 59 languages.
“The ground agents are instructed to collect voices from as many pin codes in each district as possible, to ensure greater linguistic, urban-rural and educational diversity.”
Their interaction with each subject follows a simple format. An image of a common object, such as a chair, a market or a building, or of something specific to that district, such as a monument, is shown to the subject, who is then asked to name and describe it in a language of their choice. Their description is recorded as a voice note. Each person names and describes 60 objects, on average.
The fact that the surveyors are usually locals matters, Ghosh says, because this helps the more niche dialects emerge.
“A person’s speech often changes according to who their audience is. If you are from Kolkata and go to North Bengal, the locals there will speak to you in a ‘standard’ Bangla, but with fellow locals, they’ll switch to Rajbanshi. No one asks them to do that; it happens instinctively,” Ghosh says. Scores of subjects ended up speaking in Rajbanshi, he adds, because the interviewer was from North Bengal.
In this way, the project has already documented dialects and languages the team had not expected to encounter.
These include variants of Malvani in Maharashtra, Badayuni in Uttar Pradesh, Kudmali and Khortha in Jharkhand, Bajjika and Angika in Bihar, Shekhawati and Jaipuri in Rajasthan, and Halbi in states ranging from Madhya Pradesh and Andhra Pradesh to Odisha.
Tongue ties
It was in 2004, while studying at IISc for a Master’s in engineering with a focus on speech-signal processing, that Ghosh first began to think about using machine learning to enhance archiving in the field of oral traditions, spoken language and linguistics.
The aim of his project extends far beyond archiving.
There is considerable potential for a project like Vaani in the med-tech and software-services world, he says, because it could help refine training modules for AI and machine learning programs in areas such as automatic speech recognition and speech-to-speech translation. This could change the game for technology-driven services and platforms in India, which famously struggle to decode voice commands.
Project Vaani could also help the speech-impaired be better understood.
Since 2016, for instance, Ghosh has been working with the National Institute of Mental Health and Neurosciences (NIMHANS), Bengaluru, on ways to use technology to assist people with dysarthric speech, a condition caused by weakness or impaired control of the muscles used for speaking.
“There are different types of assistance we can lend,” Ghosh says. “Voice-based services could make their speech more intelligible. Are they asking for water? To go to the toilet? Do they need food? I would like to make it easier for others to know what they want.”