Sber has released tokenizers, core components for building image and video generation models

Sber's Kandinsky team has publicly released KVAE-2.0, a family of tokenizers for diffusion models that generate images and video.

As Sber explained, tokenizers convert images and video into compact numerical codes, which reduces computational costs and improves the final quality of models. KVAE-2.0 outperforms counterparts from Tencent and Alibaba on key quality metrics and is distributed under the open MIT license.
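To give a rough sense of why compact codes reduce computational costs, the arithmetic can be sketched as below. This is an illustration only: the compression factors (4x in time, 8x in each spatial dimension) and the 16 latent channels are hypothetical values chosen for the example, not published KVAE-2.0 parameters, and the function names are invented here.

```python
# Illustrative sketch: how much smaller a video becomes once encoded
# into a latent code. All default factors are ASSUMPTIONS for the
# example, not KVAE-2.0's actual configuration.

def latent_shape(frames, height, width, t_factor=4, s_factor=8, channels=16):
    """Return (channels, frames, height, width) of the compact code."""
    return (channels, frames // t_factor, height // s_factor, width // s_factor)


def compression_ratio(frames, height, width, **factors):
    """Ratio of raw RGB values to latent values for one clip."""
    c, t, h, w = latent_shape(frames, height, width, **factors)
    raw_values = 3 * frames * height * width  # 3 color channels per pixel
    latent_values = c * t * h * w
    return raw_values / latent_values


if __name__ == "__main__":
    print(latent_shape(16, 256, 256))
    print(compression_ratio(16, 256, 256))
```

Under these assumed factors, a generation model trained on the codes processes far fewer values per clip than one trained on raw pixels, which is where the savings come from.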

Denis Dimitrov, head of the Kandinsky project, noted that the solution makes video generation accessible to startups, universities, and independent developers. It allows models to be created from scratch faster and more cheaply, without relying on foreign tokenizers, and delivers better quality results.

The key advantage of KVAE-2.0 is that it produces semantically robust representations that accurately reflect the meaning of an image. This matters for applied scenarios such as generating advertising materials and educational content. The models have also been trained to handle Russian text within the frame, which makes them more useful in these scenarios.

Sources:
Sber
