Convergence India
Sarvam Translate Unlocks Paragraph-Level AI-Powered Translation in 22 Indian Languages, From Bengali to Sanskrit
The Sarvam Translate model can translate across formats, including textbooks, HTML webpages, digital images, and even LaTeX documents.

By Kumar Harshit

on June 9, 2025

Sarvam AI has launched Sarvam Translate, an AI-powered translation model that supports paragraph-level translation for the 22 Scheduled Languages of India and also translates diverse structured content for 15 languages. It handles inputs ranging from textbooks with equations and webpages with HTML content to digital images with blurry text.

The model supports 22 languages: Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Urdu, Kannada, Odia, Malayalam, Punjabi, Assamese, Maithili, Santali, Kashmiri, Nepali, Sindhi, Dogri, Konkani, Manipuri (Meitei), Bodo, and Sanskrit. Its USP lies in translating stylized long-form text naturally while also supporting structured text in different formats.

How does it translate webpages? 

Translating web content is a key application of language models, but the process of extracting, translating, and reinserting text often proves tedious and error-prone. Sarvam Translate simplifies this workflow by translating only the visible text while preserving all underlying HTML tags and structure. As a result, elements such as emphasis and formatting carry over intact to the translated output.
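For readers who want to see what the manual extract, translate, and reinsert workflow looks like, here is a minimal Python sketch. It is not Sarvam's implementation: the translate callback is a hypothetical stand-in for whichever translation service is used, and the class and function names are invented for illustration. It uses only the standard library's html.parser, copies every tag verbatim, and passes only visible text to the callback.

```python
from html.parser import HTMLParser
from typing import Callable, List


class TextOnlyTranslator(HTMLParser):
    """Rebuilds an HTML string: markup is copied as-is, text nodes are translated."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        # convert_charrefs=False keeps entities such as &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.translate = translate  # hypothetical callback, supplied by the caller
        self.out: List[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        core = data.strip()
        if self.skip_depth or not core:
            self.out.append(data)  # code or pure whitespace: leave untouched
            return
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        self.out.append(lead + self.translate(core) + trail)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    """Translate visible text in `html`, leaving tags and structure unchanged."""
    parser = TextOnlyTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    # str.upper stands in for a real translation call
    print(translate_html('<p class="x">Hello <b>world</b>!</p>', str.upper))
```

One limitation of this approach is that each text node is translated in isolation, so a sentence split by a bold or italic tag loses its context. That is part of why the manual workflow is error-prone, and why a model that accepts the HTML as a whole is attractive.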


How does it translate chemical equations?

Chemistry documents frequently combine specialized notations and chemical equations, making accurate translation a challenge. Sarvam Translate ensures precise translation of the surrounding text while preserving the integrity of chemical equations and formatting. Notations such as x and y are also retained in Roman characters to maintain scientific accuracy.
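Without a model that handles this natively, a common workaround is to mask equations before translation and restore them afterwards. The Python sketch below illustrates that idea only; it is not how Sarvam Translate works internally. The translate callback, the function name, and the @@n@@ placeholder format are assumptions made for illustration, and the pattern covers only inline $...$ math and mhchem-style \ce{...} spans without nested braces.

```python
import re
from typing import Callable, List

# Spans to protect: inline math ($...$) and \ce{...} without nested braces.
PROTECTED = re.compile(r"\$[^$]+\$|\\ce\{[^{}]*\}")
PLACEHOLDER = re.compile(r"@@(\d+)@@")


def translate_preserving_equations(text: str, translate: Callable[[str], str]) -> str:
    """Translate the prose in `text` while keeping equations byte-for-byte intact."""
    saved: List[str] = []

    def mask(match: "re.Match[str]") -> str:
        saved.append(match.group(0))
        return f"@@{len(saved) - 1}@@"

    masked = PROTECTED.sub(mask, text)   # equations -> numbered placeholders
    translated = translate(masked)       # hypothetical translation call
    # Restore each equation; a function replacement is inserted literally,
    # so backslashes in the LaTeX are not reinterpreted.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], translated)


if __name__ == "__main__":
    source = r"Water forms as \ce{2H2 + O2 -> 2H2O} when $x$ moles react."
    # str.upper stands in for a real translation call
    print(translate_preserving_equations(source, str.upper))
```

The sketch assumes the translation step returns the placeholders unchanged, which a real system would need to verify before restoring the equations.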

What are the challenges associated with this? 

The model supports 22 languages for various tasks, but performance varies with the amount of training data and how well each language is represented. Document translation works well overall, but quality may be lower for languages such as Bodo, Dogri, Kashmiri, Manipuri, Santali, Sanskrit, and Sindhi, where output is occasionally incomplete.

In addition, the company has occasionally observed outputs that include transliterations or code-mixed segments, particularly in low-resource or highly inflected languages.


With this development, India’s AI journey has reached a remarkable milestone. What stands out most is the model’s ability to handle real-world, mixed-format documents, ranging from Markdown and HTML to scientific notation and code. It goes beyond basic translation, preserving structure, respecting context, and capturing nuances in style and gender, resulting in translations that feel natural and authentic.