The government has named eight organisations to build large language models under the IndiaAI Mission. The announcement came from IT Minister Ashwini Vaishnaw at the AI Impact Summit in New Delhi on September 18. The chosen entities include IIT Bombay’s BharatGen consortium, Tech Mahindra, Fractal Analytics, Avataar AI, Zeinteiq Aitech Innovations, Genloop Intelligence, NeuroDX, and Shodh AI.
A Trillion-Parameter Goal
IIT Bombay’s BharatGen has been tasked with training a trillion-parameter large language model. Backed by over Rs 900 crore in funding, the project will be among the largest AI models under development globally.
Parameters are the numerical weights a model learns during training, and they help it identify patterns in data. Higher counts generally let a model capture more complex language patterns. BharatGen’s model will not be used directly by end users. Instead, it will act as a foundation for lighter, specialised models.
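To give a rough sense of what a trillion parameters means in hardware terms, the sketch below estimates the memory needed just to store the weights. It is back-of-the-envelope arithmetic under stated assumptions, not a description of BharatGen's setup: the 2 bytes per parameter (16-bit precision) and the 80 GB per accelerator are illustrative figures, and the function names are hypothetical.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to store the weights alone (assumes 16-bit precision by default)."""
    return num_params * bytes_per_param


def min_devices_for_weights(
    num_params: int,
    bytes_per_param: int = 2,
    device_memory_bytes: int = 80 * 10**9,  # assumed 80 GB accelerator, illustrative only
) -> int:
    """Smallest number of devices whose combined memory holds the weights."""
    total = weight_memory_bytes(num_params, bytes_per_param)
    return -(-total // device_memory_bytes)  # ceiling division


if __name__ == "__main__":
    trillion = 10**12
    terabytes = weight_memory_bytes(trillion) / 10**12
    print(f"Weights alone: {terabytes:.1f} TB")
    print(f"Devices needed just to hold them: {min_devices_for_weights(trillion)}")
```

Training needs considerably more memory than this, since gradients, optimizer state, and activations must be stored as well. That is one reason a model of this size is a poor fit for direct deployment and is better suited to serving as a base for smaller models.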
Distilled Models for Indian Needs
BharatGen leaders say the trillion-parameter system will serve as a base to produce leaner, faster models for specific fields. These could include agriculture assistants in regional languages, legal advisory systems, and sector-specific financial tools.
“Once trained, distilled versions will support lawyers, farmers, and businesses with targeted AI applications,” said BharatGen’s Rishi Bal.
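The article does not say how BharatGen plans to carry out distillation. As a general illustration only, the sketch below shows the widely used soft-label formulation, in which a small "student" model is trained to match the temperature-softened output distribution of a large "teacher". The function names, the temperature value, and the toy logits are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Scaled by temperature squared, as is conventional, so the loss magnitude
    stays comparable across temperature settings.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature**2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # toy scores over three candidate tokens
    close_student = [3.8, 1.1, 0.4]
    far_student = [0.5, 4.0, 1.0]
    print(distillation_loss(teacher, close_student))
    print(distillation_loss(teacher, far_student))
```

A student whose outputs track the teacher's yields a loss near zero, while one that disagrees is penalised more heavily. In practice this term is usually combined with an ordinary training loss on labelled data.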
Building Datasets with Indian Context
To reduce dependence on foreign data, BharatGen is building a sovereign dataset. Strategies for this purpose include:
• Licensing archives from publishers
• Offering OCR services to digitise regional language texts
• Gathering annotations from sources that capture cultural and language details
The aim is to ensure the model reflects India’s linguistic diversity and local contexts.
Hardware Bottlenecks and GPU Access
To meet the vast computing demands of training a trillion-parameter model, BharatGen will rely on GPUs made available through the IndiaAI Mission. Nearly 40,000 GPUs have been allocated for national AI projects.
Bal acknowledged that supply delays remain a challenge, but said the team is working to optimise large-scale training runs.
Focus on Real-World Use
BharatGen’s leaders stress that success is not about chasing scale for its own sake. Ganesh Ramakrishnan of IIT Bombay said the goal is reliability and practical use.
Distilled models will be released to developers, enabling startups and enterprises to build applications without retraining massive systems. “This is national infrastructure,” noted Ramakrishnan, adding that the ecosystem will generate value by building on top of BharatGen’s foundation.
Distributed Approach for India
BharatGen will follow a hub-and-spoke model, with teams spread across academic institutes such as IIT Madras, IIIT Hyderabad, IIT Kanpur, IIT Hyderabad, IIT Mandi, and IIM Indore. This structure brings together engineers, scientists, and domain experts from varied backgrounds.
Its distributed approach mirrors India itself: large, multilingual, and full of local contexts.
Setting India Apart from Global AI Leaders
Global firms like OpenAI and Google have also attempted trillion-parameter systems. BharatGen’s leaders argue India’s model is different. Trained on Indian data and languages, it is expected to perform in ways global models cannot.
“This is about relevance,” said Ramakrishnan. “An Indian-trained system will behave in tune with Indian culture and needs.”
The Road Ahead
The immediate targets for BharatGen and the other firms include refining datasets, building hardware infrastructure, training foundation models, and creating distilled versions for practical deployment.
Minister Vaishnaw added that an AI policy framework is in the works to guide responsible development and deployment in India.

