VK releases a dataset for advancing recommender systems

VK's AI researchers have publicly released VK-LSVD (Large Short-Video Dataset). According to the company's press service, the dataset will help engineers and researchers develop and improve recommendation algorithms and make services and products more personalized.

Generated by Midjourney neural network

The dataset is available on Hugging Face and contains 40 billion unique anonymized interactions between 10 million users and 20 million short videos, collected over six months (January-June 2025), including "aggregated likes, dislikes, shares, watch time, and playback context."

All data is represented by numerical identifiers, which preserves confidentiality. Each video comes with an embedding (a numerical representation of its content), and each user comes with socio-demographic attributes. VK explained:

"Short videos are a unique format for recommendation algorithms. Unlike music, podcasts, or long videos, they cannot be 'consumed' in the background, and each video shown receives some reaction from the user. Even if the user does not leave a like, skipping or watching the video is already considered feedback."
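To illustrate the point about implicit feedback, here is a minimal Python sketch of how a skip or a watch could be turned into a training label. The field names (`watch_time_s`, `duration_s`, `liked`) and the 0.5 watch-fraction threshold are assumptions made for illustration; they are not the dataset's actual schema or VK's method.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical fields; the real dataset schema may differ.
    user_id: int
    video_id: int
    watch_time_s: float
    duration_s: float
    liked: bool = False


def implicit_label(event: Interaction, min_fraction: float = 0.5) -> int:
    """Return 1 for positive feedback and 0 for a skip.

    An explicit like always counts as positive. Otherwise the label
    depends on the fraction of the video that was watched.
    """
    if event.liked:
        return 1
    if event.duration_s <= 0:
        # Guard against division by zero on malformed records.
        return 0
    watched_fraction = min(event.watch_time_s / event.duration_s, 1.0)
    return 1 if watched_fraction >= min_fraction else 0


events = [
    Interaction(1, 101, watch_time_s=2.0, duration_s=30.0),   # quick skip
    Interaction(1, 102, watch_time_s=28.0, duration_s=30.0),  # watched almost fully
    Interaction(2, 101, watch_time_s=5.0, duration_s=30.0, liked=True),
]
labels = [implicit_label(e) for e in events]
```

In this sketch every shown video yields a label, even without a like, which mirrors the idea that skipping or watching is itself feedback.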

Sources
VK
