A method for rapid adaptation of chatbots to new scenarios has been developed in Russia

The work has been included in the program of EACL 2026, one of the largest international conferences on natural language processing (NLP), which will be held in Rabat, Morocco, from March 24 to 29, 2026.

Russian scientists from MWS AI, ITMO University, and IITU have developed a method that improves the accuracy of dialogue state tracking in chatbots and voice assistants. The new approach lets the system better understand what the user is asking for at each turn of the conversation, which improves the quality of interaction.

The method is based on Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that does not require large computing resources or large datasets. Experiments showed that a model with 8 billion parameters trained using GRPO outperformed both GPT-4 and a model four times its size in dialogue state tracking accuracy. This opens up new opportunities for adapting systems to new scenarios without significant expenditure of time and resources.
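
For readers unfamiliar with GRPO, the core idea can be sketched in a few lines: the model samples several candidate answers for the same dialogue turn, each is scored, and each score is compared with the group's average instead of with a separately trained value model. The sketch below is illustrative only and is not taken from the authors' code. The exact-match reward, the slot names, and the function names `state_reward` and `group_relative_advantages` are assumptions made for this example.

```python
from statistics import mean, pstdev


def state_reward(predicted: dict, gold: dict) -> float:
    """Hypothetical reward: 1.0 if the predicted dialogue state matches
    the reference state exactly, otherwise 0.0 (an assumption; the
    article does not say which reward the authors use)."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and standard deviation of
    its own group of samples, which is the group-relative step in GRPO.
    Population standard deviation is used here for simplicity."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Reference dialogue state for one turn (illustrative slot names).
gold = {"restaurant-area": "centre", "restaurant-food": "italian"}

# Four candidate states sampled from the model for the same turn.
samples = [
    {"restaurant-area": "centre", "restaurant-food": "italian"},  # correct
    {"restaurant-area": "centre", "restaurant-food": "chinese"},  # wrong value
    {"restaurant-area": "centre"},                                # missing slot
    {"restaurant-area": "centre", "restaurant-food": "italian"},  # correct
]

rewards = [state_reward(s, gold) for s in samples]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Candidates that beat the group average receive a positive advantage and are reinforced during training, while those below it are discouraged. If every candidate in a group gets the same reward, all advantages are zero and that group contributes no learning signal.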

MWS AI research engineer Timur Ionov emphasized that GRPO lowers the barrier to entry for adapting a system to new scenarios and will be useful in customer support, voice assistants, and booking systems. The entire training and inference process fits on a single GPU. The code is openly available.