On April 5, users across Russia experienced disruptions in the Faster Payments System (FPS), as well as in the applications of Alfa-Bank and T-Bank. Against this backdrop, Nikita Letov, architect of fintech systems, Java Tech Lead, IEEE Senior Member, and Hackathon Raptors Fellow, shared his approaches to building reliable authorization and discussed the technological principles that shape the architecture of modern banking and payment services.
For large banking applications, uninterrupted operation is critical: any downtime or failure translates directly into the risk of financial losses. It is at such moments that the most vulnerable nodes reveal themselves, and one of them is authorization, the entry point for users into the application, says Nikita Letov, software engineer and expert in building fault-tolerant systems. As technical leader and head of Java development at Rosbank, he coordinated the work of several teams and was responsible for the architecture of the mobile bank and key elements of the system, from authorization to service availability. He significantly improved the stability of the application and cut user complaints about login difficulties tenfold. Nikita is a member of the international engineering associations IEEE and Hackathon Raptors, both of which hold candidates to exacting global standards. In this interview, he explains which architectural decisions help avoid the most common authorization problems and withstand load without failures, how development in fintech differs from other industries, and which technical areas will be in peak demand over the next few years.
Users expect banking applications to be fast and seamless: even a 10-second delay at login causes irritation. You designed the authorization architecture of an application with millions of users, so you know what lies behind that minimalist login screen: high load, security, integrations. Which tasks are considered the most challenging in this kind of development today?
The requirements for fintech solutions have indeed become extremely high, and IT teams face a number of serious challenges. One of the most important is ensuring high service availability: banking applications must operate 24/7 and withstand thousands, sometimes millions, of simultaneous user sessions.

At the same time, security remains a priority. Financial data is always of interest to attackers, and any vulnerability can lead to serious financial and reputational consequences.

Another important feature is that fintech products do not live in isolation. They are almost always connected to many external systems, from government agencies to insurance companies and partner platforms. These integrations must not merely be configured but designed so that data flows reliably across system boundaries and remains stable during updates.

A separate challenge is scalability. The user base grows constantly: every day, sometimes every minute, new customers arrive or old ones return, and a solution that handled yesterday's load perfectly may fail under tomorrow's. The architecture must therefore be designed for growth from the start: flexible, extensible, and grounded in a deep knowledge of system design.

And, of course, there is regulatory compliance, perhaps one of the hardest and least loved tasks. Banks, brokers, and other fintech organizations operate under close supervision by state regulators, so many legislative requirements have to be taken into account. Very often this sharply narrows the choice of technologies available for a given problem.
You named scalability and architectural stability among the key tasks. In your case we are talking about an application used by almost 3.5 million customers. What changes in the design of authorization under high load? What unusual situations do you run into, and how can protection against them be built in at the architecture stage?
When a system serves millions of users, authorization stops being a simple login-and-password check and becomes a full-fledged high-load service, subject to the same requirements as every other critical component: it must be fast, reliable, and secure.

The first thing that changes is the scale and storage of sessions. Classic approaches such as keeping user sessions in memory no longer work: they do not scale horizontally. Authorization therefore has to be stateless, with every request authenticated by a token, for example a JWT. Authorization also splits into two modes: full, with password and 2FA verification, and fast, when the user enters with a previously issued token. In both cases it is critical to exchange tokens securely so that an attacker cannot intercept them.

Another important point is unevenness of load. On a Monday morning, or on the last day of the month when salaries arrive, logins to the mobile application can spike far above normal traffic. Such peaks have to be absorbed without triggering a cascading failure of the service. Patterns like the Circuit Breaker, which isolates faulty components, and dynamic upscaling, which quickly increases the number of service replicas as load grows, help here.

And, of course, you cannot do without fallback mechanisms. Failures are possible even in a perfectly designed system, and it is important that they are handled transparently for the user: either through backup logic or, in fully critical cases, by offering alternative steps so the user can still complete the task.
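The Circuit Breaker and fallback ideas described above can be illustrated with a minimal hand-rolled sketch (an illustration only, not the bank's production code; class and parameter names are invented for the example). After a run of consecutive failures the circuit "opens" and calls are short-circuited to a fallback, protecting the struggling downstream service; after a cooldown, one trial call is allowed through again.

```java
import java.util.function.Supplier;

// Minimal illustrative circuit breaker: after `maxFailures` consecutive
// failures the circuit opens and calls go straight to a fallback; once
// `cooldownMillis` has passed, a single trial call is let through again.
public class SimpleCircuitBreaker<T> {
    private final int maxFailures;
    private final long cooldownMillis;
    private int failures = 0;
    private long openedAt = -1;   // -1 means the circuit is closed

    public SimpleCircuitBreaker(int maxFailures, long cooldownMillis) {
        this.maxFailures = maxFailures;
        this.cooldownMillis = cooldownMillis;
    }

    public synchronized T call(Supplier<T> primary, Supplier<T> fallback) {
        if (openedAt >= 0) {
            if (System.currentTimeMillis() - openedAt < cooldownMillis) {
                return fallback.get();          // open: do not touch the faulty service
            }
            openedAt = -1;                      // half-open: allow one trial call
        }
        try {
            T result = primary.get();
            failures = 0;                       // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            if (++failures >= maxFailures) {
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }
}
```

In production this role is usually played by a library such as Resilience4j, but the state machine (closed, open, half-open) is essentially the same.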
During your time at Rosbank, users regularly ran into problems logging into the application. You managed to solve this: complaints on the issue have almost disappeared. Which problems with the application's entry point did you identify, how did they manifest themselves, and what did you do?
It was one of those cases where everything seemed to work, yet customers hit errors every day. For me it was exactly the situation where I had to dig deep rather than accept the architecture as a given. The problems showed up on the user side: one person could not log in after updating the application, another's authorization took up to 15 seconds, and some customers saw nothing at all, just a white screen. Technical support received such requests constantly; they accounted for up to a third of all tickets.

The first thing I did was trace client requests. We implemented distributed tracing and improved logging at every key stage of authorization, from the gateway to the responsible backend services. This let us pinpoint exactly where the chain was breaking. It turned out that many problems arose during the authorization flow itself. One was inconsistent routing: the gateway was directing clients to mismatched versions of services. Another was a leak of security context during authorization. There were also problems with network availability of the cache that temporarily stored token metadata.

Most of the problems were fixed by rewriting the gateway code and the authorization service filters. The cache access failure turned out to have a mundane cause: missing network rules in the orchestrator of backend services. This case showed me how important it is to treat critical parts of the system not as a separate feature but as a full-fledged engineering platform, where everything from UX to SLA is thought through.
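The core idea behind the tracing described above can be sketched in a few lines (a simplified illustration with invented names, not the bank's implementation; real systems use tools like OpenTelemetry or Zipkin): a trace ID is assigned once at the entry point and carried through every stage of the login flow, so log lines from the gateway, the auth service, and the cache can be stitched into one chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative request-correlation sketch: one trace ID is created per
// request and attached to every log line along the authorization path,
// so a support engineer can follow a single login end to end.
public class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();
    public static final List<String> LOG = new ArrayList<>(); // stand-in for a real log sink

    public static String start() {
        String id = UUID.randomUUID().toString().substring(0, 8);
        TRACE_ID.set(id);
        return id;
    }

    public static void log(String stage, String message) {
        LOG.add("[" + TRACE_ID.get() + "] " + stage + ": " + message);
    }

    // Hypothetical login flow: every stage logs under the same trace ID.
    public static void handleLogin(String user) {
        start();
        log("gateway", "routing login for " + user);
        log("auth-service", "validating credentials");
        log("token-cache", "storing token metadata");
    }
}
```

With every stage tagged by the same ID, "where exactly the chain breaks" becomes a search for the last log line carrying that ID.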
During your work at the bank, its application became noticeably more stable: availability rose to 99.97%. What technologies did you use, and how did you manage to achieve such high stability?
After I ran into inconsistent routing and began building transparency into requests, I decided to adopt Spring Cloud Gateway. But it is important to understand that stability and high availability are not achieved with a single technology; there is no silver bullet that solves everything at once. Spring Cloud Gateway did not give an instant or magical jump in availability, but it played its part in reaching that figure. The result came from the coordinated, systematic work of all the development teams. From the very beginning, working as technical leader, I built processes to minimize the likelihood of errors and hidden defects and, when they did occur, to ensure the fastest possible response. That was when we decided to grow our own SRE engineers within the team, and we implemented that direction successfully. In the end, a combination of factors got us there: the spread of event-driven architecture across the backend of the remote banking platform, strong technical leadership, mature development and testing processes, and a competent choice of technologies. Together they allowed a high-load banking application to reach very high availability.
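For readers unfamiliar with Spring Cloud Gateway, routing there is declared as a set of routes with predicates and filters. The configuration sketch below is purely illustrative (the route ID, path, circuit-breaker name, and service URIs are invented, not taken from the bank's setup); it shows how a route to an authorization service can be declared with a circuit-breaker filter and a fallback, which also ties back to the resilience patterns discussed earlier.

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutesConfig {

    @Bean
    public RouteLocator authRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
            // Illustrative route: all /api/auth/** traffic goes to the
            // load-balanced auth-service instance.
            .route("auth-service", r -> r.path("/api/auth/**")
                .filters(f -> f.circuitBreaker(c -> c
                    .setName("authCircuitBreaker")
                    // If the service is down, forward to a local fallback
                    // endpoint instead of returning a raw error.
                    .setFallbackUri("forward:/fallback/auth")))
                .uri("lb://auth-service"))
            .build();
    }
}
```

Declaring routes centrally like this is one way to avoid the "clients routed to mismatched service versions" class of problems: there is a single, reviewable source of truth for routing.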
Nikita, you actively communicate with the professional community: you speak at specialized conferences and belong to the IEEE and Hackathon Raptors associations, which accept only highly qualified specialists with a proven track record. Are there any current trends in fintech development that you would call overrated, and what, on the contrary, is developing quietly but will soon take off?
Yes, that's an interesting question. In my opinion, the most overrated direction right now is embedding LLMs into applications without a clear understanding of goals and limitations. This is true not just for fintech but for the IT industry as a whole. Most often it sounds like "let's add AI because it's fashionable," while no real business problem gets solved. Yes, the technology is powerful, but it has to be applied deliberately, with regard for regulatory requirements, interpretability, and security. The result of such thoughtless use of AI is usually pretty demos and MVPs, and very little real benefit in production.

What I believe is underrated and will certainly take off soon, and in places already has, is the spread of event-driven architectures and asynchronous event processing. This is especially true in banks, where everything used to be built on synchronous REST requests. Now that load is growing and millions of operations run in parallel, moving to reactive approaches and message brokers such as Apache Kafka or Pulsar is becoming not just a nice-to-have but, in effect, a necessity. The trend is already visible: in many projects, event sourcing makes it possible to roll back to any recovery point, provide flexible auditing, and collect near-real-time analytics without crushing the databases. On top of that, this approach does not tie you to a specific database: you can at any moment point a new consumer at the broker and feed events into any suitable store, changing the data structure along the way.

Another quiet trend, related less to development than to infrastructure, is the growing popularity of serverless approaches inside large corporations. This used to seem the prerogative of startups, but now even large fintech companies actively use FaaS and lightweight containers for tasks ranging from AML scenarios to report generation. All of this shortens time-to-market and adds flexibility, especially with limited budgets and small teams. So noise is not always a sign of maturity, and vice versa: the most useful things often arrive quietly, but stay for a long time.
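The event-sourcing idea mentioned above, rolling back to any recovery point by replaying events, can be shown with a tiny in-memory sketch (an illustration with invented names, not a production ledger): instead of storing the current balance, we append immutable events and rebuild state by replaying a prefix of the log.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal event-sourcing sketch: the event log is the source of truth,
// and any historical state can be reconstructed by replaying a prefix.
public class AccountLedger {
    public record Event(String type, long amountCents) {}

    private final List<Event> log = new ArrayList<>();

    public void deposit(long cents)  { log.add(new Event("DEPOSIT", cents)); }
    public void withdraw(long cents) { log.add(new Event("WITHDRAW", cents)); }

    // Replay the first `upTo` events: the balance as of that recovery point.
    public long balanceAt(int upTo) {
        long balance = 0;
        for (Event e : log.subList(0, upTo)) {
            balance += e.type().equals("DEPOSIT") ? e.amountCents() : -e.amountCents();
        }
        return balance;
    }

    public long currentBalance() { return balanceAt(log.size()); }
}
```

In a real system the log lives in a broker or event store rather than in memory, but the property the interview highlights is the same: auditing and recovery points come for free, and new consumers can rebuild any projection from the same events.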