T-ECD, a huge cross-domain dataset for recommender systems, has been published

T-ECD (T-Tech E-commerce Cross-Domain Dataset) is built from the anonymized actions of 44 million unique users of the "City" services ("Shopping" and "Supermarkets") and of the "T-Bank" advertising platform. It covers 30 million products and more than 135 billion interactions.

T-ECD stands out for its cross-domain nature and its versatility across different types of tasks. The benchmark consists of five interconnected, fully anonymized data sources: purchase history from transactions, receipts, reviews, interactions with recommendations for everyday goods (FMCG) and non-food products (non-FMCG) such as household appliances, clothing, electronics, and cosmetics, and the history of special-offer and cashback activations.

Each data source can be used as an independent dataset, or the sources can be linked by user, product, or store-brand keys, which makes it possible to build complete behavioral profiles and analyze complex personalization scenarios.
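To illustrate what linking sources by a shared key could look like, here is a minimal sketch in plain Python. The rows are toy data, and the field names (user_id, item_id, brand_id, rating) and the helper build_profiles are assumptions made for illustration, not the actual T-ECD schema.

```python
from collections import defaultdict

# Toy rows standing in for two of the sources; field names are assumptions,
# not the real T-ECD column names.
receipts = [
    {"user_id": "u1", "item_id": "milk", "brand_id": "b1"},
    {"user_id": "u2", "item_id": "bread", "brand_id": "b1"},
]
reviews = [
    {"user_id": "u1", "brand_id": "b1", "rating": 5},
    {"user_id": "u3", "brand_id": "b2", "rating": 2},
]


def build_profiles(sources):
    """Group events from several sources by the shared user key (hypothetical helper)."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, rows in sources.items():
        for row in rows:
            profiles[row["user_id"]][source_name].append(row)
    return profiles


profiles = build_profiles({"receipts": receipts, "reviews": reviews})

# Users that appear in more than one source are the cross-domain ones.
cross_domain_users = sorted(u for u, p in profiles.items() if len(p) > 1)
print(cross_domain_users)
```

At the scale of the real data, the same idea would more likely be expressed as a join in a dataframe or SQL engine; the dictionary version only shows the logic.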

The dataset is suitable for most types of recommendation tasks: next-item, next-basket, and session-based recommendation, general top-N recommendation, and others.
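As an example of how a next-item task might be set up on interaction logs, the sketch below applies a common leave-last-out split: each user's final interaction is held out as the prediction target and the rest is kept as history. The (user, item, timestamp) tuple layout and the function name leave_last_out are assumptions for illustration, not something defined by the dataset.

```python
def leave_last_out(events):
    """Split (user, item, timestamp) events into per-user history and a held-out next item."""
    by_user = {}
    # Sort by timestamp so each user's item list is in chronological order.
    for user, item, _ts in sorted(events, key=lambda e: e[2]):
        by_user.setdefault(user, []).append(item)

    history, target = {}, {}
    for user, items in by_user.items():
        if len(items) < 2:
            continue  # a single interaction leaves no history to predict from
        history[user] = items[:-1]
        target[user] = items[-1]
    return history, target


# Toy interaction log; values are made up.
events = [
    ("u1", "a", 1), ("u1", "b", 3), ("u2", "c", 2),
    ("u1", "c", 5), ("u3", "a", 4), ("u2", "a", 6),
]
history, target = leave_last_out(events)
print(history, target)
```

A next-basket or session-based setup would follow the same pattern, with the held-out unit being the last basket or the last session instead of the last item.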

The data covers periods of 1 to 3.5 years, which makes it possible to analyze both short-term and long-term user preferences, how they change over time, and seasonality and trends.

The T-ECD dataset is available on Hugging Face under the Apache 2.0 license, which allows free commercial use and modification.