VK unveils dataset for developing recommender systems

The dataset is available on Hugging Face and covers 40 billion unique, anonymized interactions between 10 million users and 20 million short videos over six months (January to June 2025), including "aggregated likes, dislikes, shares, watch time, and playback context."

All data is represented by numerical identifiers, which preserves confidentiality. Each video comes with an embedding (a numerical vector describing its content), and each user comes with socio-demographic characteristics. VK explained:

"Short videos are a unique format for recommendation algorithms. Unlike music, podcasts, or long videos, they cannot be 'consumed' in the background, and each video shown receives some reaction from the user. Even if the user does not leave a like, skipping or watching the video is already considered feedback."
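To illustrate how skipping or watching can serve as feedback, here is a minimal Python sketch that turns one interaction record into a score between 0 and 1. The field names, the `feedback_score` function, and the 20% skip threshold are all hypothetical choices made for this example; they are not taken from the dataset's actual schema or from VK's own methods.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical record layout, not the dataset's real schema.
    user_id: int
    video_id: int
    watch_seconds: float   # how long the user actually watched
    video_seconds: float   # total length of the video
    liked: bool = False
    disliked: bool = False
    shared: bool = False


def feedback_score(event: Interaction, skip_threshold: float = 0.2) -> float:
    """Map one interaction to a score in [0, 1].

    Explicit reactions take priority. Otherwise the watched fraction
    is used, and anything below skip_threshold (an assumed cutoff)
    counts as a skip.
    """
    if event.disliked:
        return 0.0
    if event.liked or event.shared:
        return 1.0
    if event.video_seconds <= 0:
        return 0.0
    # Cap at 1.0 so that rewatching does not push the score past the maximum.
    fraction = min(event.watch_seconds / event.video_seconds, 1.0)
    return 0.0 if fraction < skip_threshold else fraction


if __name__ == "__main__":
    skipped = Interaction(user_id=1, video_id=2, watch_seconds=1.0, video_seconds=20.0)
    half_watched = Interaction(user_id=1, video_id=3, watch_seconds=10.0, video_seconds=20.0)
    print(feedback_score(skipped), feedback_score(half_watched))
```

Under these assumptions, a video with no like or dislike still produces a usable signal, which is the property the quote highlights.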