The Artificial Intelligence Center of T-Technologies, the technology division of T-Bank (formerly Tinkoff Bank), has released one of the world's largest datasets for recommendation systems in e-commerce.
T-ECD (T-Tech E-commerce Cross-Domain Dataset) is built from the anonymized actions of 44 million unique users of the "City" services "Shopping" and "Supermarkets", as well as of the T-Bank advertising platform. It covers 30 million products and more than 135 billion interactions.
The distinctive features of T-ECD are its cross-domain nature and its versatility across different types of tasks. The benchmark consists of five interconnected and fully anonymized data sources: transaction-level purchase history; receipts; reviews; interactions with recommendations for everyday goods (FMCG) and non-food products (non-FMCG) such as household appliances, clothing, electronics, and cosmetics; and the activation history of special offers and cashback.
All data sources can be used as independent datasets, or linked by user, product, or store brand keys, allowing for the construction of complete behavioral profiles and analysis of complex personalization scenarios.
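As a rough illustration of linking sources by a shared user key, the sketch below groups events from two toy sources into per-user cross-domain profiles. The field names (`user_id`, `item_id`, `action`), the domain labels, and the helper `build_profiles` are hypothetical and chosen for illustration only; the actual T-ECD schema may differ.

```python
from collections import defaultdict

# Hypothetical toy records; real T-ECD field names and formats may differ.
shopping_events = [
    {"user_id": "u1", "item_id": "i10", "action": "view"},
    {"user_id": "u1", "item_id": "i11", "action": "purchase"},
    {"user_id": "u2", "item_id": "i10", "action": "view"},
]
supermarket_events = [
    {"user_id": "u1", "item_id": "r7", "action": "purchase"},
    {"user_id": "u3", "item_id": "r8", "action": "view"},
]


def build_profiles(sources):
    """Group item IDs from several named sources by the shared user key."""
    profiles = defaultdict(lambda: defaultdict(list))
    for domain, events in sources.items():
        for event in events:
            profiles[event["user_id"]][domain].append(event["item_id"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(domains) for user, domains in profiles.items()}


profiles = build_profiles(
    {"shopping": shopping_events, "supermarkets": supermarket_events}
)

# Users who appear in more than one source are candidates for
# cross-domain analysis.
cross_domain_users = sorted(u for u, d in profiles.items() if len(d) > 1)
print(cross_domain_users)
```

The same grouping idea would apply to product or store brand keys, assuming those identifiers are consistent across sources.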
The dataset is suitable for most types of recommendation tasks, including next-item recommendation, next-basket recommendation, session-based recommendation, and general top-N recommendation.
The data covers periods of 1 to 3.5 years, which makes it possible to analyze both short-term and long-term user preferences, how they change over time, and seasonality and trends.
The T-ECD dataset is available on Hugging Face under the Apache 2.0 license, which allows free commercial use and modification.
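To make the next-item setting concrete, here is a minimal sketch of a leave-last-out split, one common way to frame next-item evaluation: each user's most recent interaction is held out as the target and the earlier ones form the history. The `(user_id, item_id, timestamp)` tuples and the function `leave_last_out` are assumptions for illustration, not part of the dataset or its tooling.

```python
from collections import defaultdict

# Hypothetical (user_id, item_id, timestamp) tuples, not real T-ECD data.
interactions = [
    ("u1", "a", 3), ("u1", "b", 1), ("u1", "c", 2),
    ("u2", "d", 5), ("u2", "a", 4),
    ("u3", "e", 1),
]


def leave_last_out(rows):
    """Return {user: (history, target)} using each user's latest item as target."""
    by_user = defaultdict(list)
    for user, item, ts in rows:
        by_user[user].append((ts, item))

    examples = {}
    for user, events in by_user.items():
        # Sort chronologically, then keep only the item IDs.
        items = [item for _, item in sorted(events)]
        if len(items) < 2:
            continue  # No history to predict from; skip this user.
        examples[user] = (items[:-1], items[-1])
    return examples


examples = leave_last_out(interactions)
print(examples["u1"])
```

A next-basket variant would group items by receipt or order before splitting, and a session-based variant would first cut each user's timeline into sessions.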