Sber's groundbreaking release of new AI models promises to revolutionise the open-source landscape, featuring advanced capabilities for Russian-language tasks and speech recognition.
In a landmark move for the global open-source community, Sber has released the weights of two new flagship Mixture-of-Experts (MoE) models in its GigaChat series, Ultra Preview and Lightning, both trained from scratch for high-performance Russian-language tasks.
The release also includes the latest generation of open speech recognition models, GigaAM-v3, now capable of advanced punctuation and text normalisation.
In addition, Sber has made the entire Kandinsky 5.0 visual generation family publicly available: cutting-edge image and video models designed specifically for the Russian cultural context and for native generation of Cyrillic text.
The company also introduced its next-generation K-VAE 1.0 visual encoders and decoders, which it describes as among the best open-source models of their kind worldwide. All models are distributed under the MIT licence, which permits commercial use with minimal restrictions.
Andrey Belevtsev, Senior Vice President, Head of Technology & AI at Sberbank, emphasised the significance of the release:
“We believe creating world-class artificial intelligence requires two things: massive resources and world-class R&D teams. Sber has both. But what matters most is sharing, not locking down technology. Our strategy is to become an open foundation for innovation nationwide. That’s why we’re releasing model weights.
“This is a pivotal moment. Any company in Russia, whether a bank or a startup, can install these models within its closed systems, fine-tune them on sensitive internal datasets, and retain complete control over its confidential information.
“This approach reflects true technological sovereignty: AI belongs to the entire nation, driving business transformation and economic growth. I would also like to note that Ultra will soon be available to corporate clients, with an optimised cost of ownership for internal corporate deployments.”
The GigaChat range has expanded with the release of GigaChat Ultra Preview and GigaChat Lightning, each optimised for different performance needs.
GigaChat Ultra Preview is now the largest and most capable model in the GigaChat line-up. Although it is still in training, the model already performs strongly on several major international benchmarks: it outperforms DeepSeek V3.1 on Russian-language tasks and ranks first on the MERA benchmark.
Despite its size, it runs faster than GigaChat 2 Max, the previous flagship. Because its weights are freely available, organisations can fine-tune it offline within secure environments where data privacy is critical.
GigaChat Lightning, on the other hand, is compact and exceptionally fast, engineered for local execution on laptops while supporting rapid product development.
It outperforms Qwen3-4B in Russian-language tasks, matches it in dialogue and document processing, and runs almost as fast as Qwen3-1.7B despite being six times larger.
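Both GigaChat models use a Mixture-of-Experts design, which helps explain how Lightning can be six times larger than Qwen3-1.7B yet run almost as fast: in an MoE layer a learned gate sends each token to only a few "experts", so the compute spent per token depends on the active experts rather than on the total parameter count. The toy sketch below illustrates generic top-k routing only; the sizes, the single-matrix experts and the routing details are illustrative assumptions, not GigaChat's actual architecture.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Random weights stand in for trained parameters (illustration only).
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    # Plain matrix-vector product: one multiply-add per weight.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def moe_forward(x, experts, gate, k):
    """Toy top-k MoE layer: route x to the k highest-scoring experts.

    Returns the mixed output and the number of multiply-adds performed.
    """
    dim = len(x)
    scores = matvec(gate, x)  # one routing score per expert
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected experts only (shifted by the max for stability).
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    weights = [e / sum(exps) for e in exps]
    out = [0.0] * dim
    for weight, i in zip(weights, top):
        y = matvec(experts[i], x)  # only the chosen experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    macs = len(gate) * dim + k * dim * dim  # gate cost + k active experts
    return out, macs


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_experts, k = 16, 8, 2  # hypothetical toy sizes
    experts = [make_matrix(dim, dim, rng) for _ in range(n_experts)]
    gate = make_matrix(n_experts, dim, rng)
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    y, macs = moe_forward(x, experts, gate, k)
    expert_params = n_experts * dim * dim
    print(f"expert parameters: {expert_params}, multiply-adds per token: {macs}")
    # expert parameters: 2048, multiply-adds per token: 640
```

In this toy layer only two of the eight experts run for each token, so the per-token work is roughly a quarter of what a dense layer holding all 2,048 expert weights would need.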
Both models also integrate powerful development tools.
Sber’s new GigaAM-v3 family introduces five advanced open-source models for Automatic Speech Recognition (ASR). Designed for large-scale commercial use, they support voice assistants, call centres, voice analytics, and multimodal applications.
The training data has grown dramatically, from 50,000 hours to 700,000 hours of audio.
With punctuation and text normalisation now added, GigaAM-v3 competes directly with OpenAI Whisper, which also produces punctuated text, while significantly surpassing it in recognition accuracy.
The model also serves as the foundation for a wide range of speech technologies inside Sber, including synthesis and audio/video processing for GigaChat.
The Kandinsky 5.0 family introduces a trio of visual generation models.
Of the three, Video Pro currently leads the global open-source field, outperforming models such as Wan-2.2-A14B and delivering quality comparable to proprietary frontier systems such as Veo 3.
The models were trained on an enormous dataset comprising one billion images, 300 million videos, and over one million curated multimedia assets.
Advanced training methodologies were developed specifically for Kandinsky 5.0, with final refinement done using a professionally curated dataset to ensure top-tier visual quality.
The family opens up major opportunities for consumer technologies and creative industries, including personalised video greetings, animated photo tools, marketing assets, and commercial digital content creation.
Many generative models operate in a latent space: a compressed internal representation that makes training faster and more efficient. A visual autoencoder's encoder maps pixels into this space, and its decoder maps generated latents back into images or video. Sber’s newly released K-VAE 1.0 models, built from scratch for both images (2D) and videos (3D), set a new open-source standard for fidelity and reconstruction accuracy.
Their release is expected to substantially elevate the quality of future generative AI tools and help developers build more powerful visual systems.
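To give a sense of how much smaller a latent representation can be, the sketch below counts the values in a clip before and after encoding. The downsampling factors (8× in height and width, 4× in time) and the 16 latent channels are hypothetical values commonly used by open video autoencoders; the article does not state K-VAE 1.0's actual configuration. Real causal video encoders often treat the first frame separately, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentSpec:
    spatial_factor: int   # hypothetical height/width downsampling factor
    temporal_factor: int  # hypothetical frame downsampling factor (1 = images)
    channels: int         # hypothetical number of latent channels


def pixel_elements(frames, height, width, rgb_channels=3):
    # Number of values in the raw RGB clip.
    return frames * height * width * rgb_channels


def latent_elements(frames, height, width, spec):
    # Number of values after encoding; requires exact divisibility.
    if (height % spec.spatial_factor or width % spec.spatial_factor
            or frames % spec.temporal_factor):
        raise ValueError("dimensions must be divisible by the downsampling factors")
    return ((frames // spec.temporal_factor)
            * (height // spec.spatial_factor)
            * (width // spec.spatial_factor)
            * spec.channels)


def compression_ratio(frames, height, width, spec):
    # How many raw pixel values correspond to one latent value.
    return pixel_elements(frames, height, width) / latent_elements(frames, height, width, spec)


if __name__ == "__main__":
    video = LatentSpec(spatial_factor=8, temporal_factor=4, channels=16)
    image = LatentSpec(spatial_factor=8, temporal_factor=1, channels=16)
    print(compression_ratio(32, 512, 512, video))   # 48.0
    print(compression_ratio(1, 1024, 1024, image))  # 12.0
```

Under these assumed factors a generator works on 48 times fewer values for a 32-frame 512×512 clip, which is why the quality of the encoder and decoder pair largely bounds the fidelity of everything generated on top of it.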
Partnered Content