Bengaluru-based startup Sarvam AI has reported benchmark results that place it ahead of global competitors such as Google Gemini and OpenAI’s ChatGPT on specific tasks.
Sarvam’s in-house tools, including a vision model and a text-to-speech system, achieved higher accuracy on document- and voice-related tasks commonly used to evaluate artificial intelligence performance. The results have drawn attention from users and industry commentators globally.
Strong Benchmark Showing with Sarvam Vision and Bulbul V3
Sarvam AI’s document-reading model, Sarvam Vision, achieved high accuracy across major evaluation sets, scoring 84.3% on olmOCR-Bench and 93.28% on OmniDocBench v1.5.
According to the reported results, Sarvam Vision outperformed competitors on these benchmarks, including Google’s Gemini 3 Pro and other optical character recognition (OCR) systems.
The company says Vision excels at reading complex layouts, tables and formulas, areas where many models struggle.
Alongside this, Sarvam’s Bulbul V3 text-to-speech model now supports more than 35 natural-sounding voices across 11 Indian languages, with plans to expand to all 22 official Indian languages.
Bulbul V3 is built to generate expressive, human-like speech for India-specific voice applications and telephony use cases.
Sarvam AI’s co-founder, Pratyush Kumar, shared these results on social platforms, signalling growing confidence in domestic AI tools tailored to India’s linguistic diversity. Experts and users also noted that strong benchmark performance is important for demonstrating capability in document understanding and multilingual voice interfaces.

