Democratising AI has long been a goal of many research groups and startups in India, and the government-led effort to create GenAI for and by Bharat has now moved a step closer with the formal launch of BharatGen, a desi counterpart to tools like ChatGPT and Gemini.
Spearheaded by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology (DST), the initiative aims to build generative AI systems that can produce high-quality text and multimodal (speech and computer vision) content in various Indian languages. The project is being implemented by the TIH Foundation for IoT and IoE at IIT Bombay, together with academic partners that include IIT Bombay, IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIT Hyderabad, IIM Indore, and IIT Madras.
Vision Language Model For India
Unlike Large Language Models, which process vast amounts of textual data to understand and generate human language, Vision Language Models are multimodal in nature, using both text and images to tackle a wide range of tasks.
Prof. Ravi Kiran Sarvadevabhatla of IIIT Hyderabad (IIITH), who leads the computer vision efforts of the BharatGen initiative, explains that one of the first use cases for the vision language model was in the Indian e-commerce sector. “Typically, we are buyers in the online space and it is just a matter of selecting a product, adding it to the cart and clicking ‘Buy’. But it’s a completely different experience as a seller on the same platform.” Beyond the initial registration process, sellers who want to list products on a platform must upload multiple images of each product along with its details and features. “There’s a form that needs to be filled; it involves a lot of writing and can be daunting for non-English speakers,” says Prof. Ravi Kiran. To automate and simplify this process, the group created a model that eliminates tedious manual entry. The model takes the product image uploaded by the seller, analyses it, assigns the product to a category and automatically generates a suitable description. “What typically takes around 6-8 minutes manually now gets done in 30 seconds,” he remarks.
Enhancing Accessibility
What’s perhaps more interesting is the model’s ability to translate the generated content and read it aloud in a language of the user’s choice. “Our technology might be generating the product description automatically, but it is important to communicate this content to the sellers in an Indic language of their choice so that they know exactly how their product is being described,” explains Prof. Ravi Kiran. This kind of accessibility across Indian languages is precisely what BharatGen aims for.
The e-commerce use case, called e-vikrAI, was selected as an exhibit at the prestigious India Mobile Congress (IMC) 2024. The technology drew considerable attention and interest from visitors, including prominent government officials and tech entrepreneurs.