Today we will talk about the following:
- Problems of classic large data storage systems,
- How to organize data storage in companies with increased internal security requirements;
- How to easily migrate to a domestic solution;
- Advantages of Cloud Storage object storage from VK.
In the event program:
- Customer need;
- Presentation of the Cloud Storage product from VK;
- Product partnership program;
- Implemented cases.
Speakers: Khanetsky Dmitry, Head of Cloud Storage Sales Group, and Sorokin Georgy, Cloud Storage Architect.
The portrait of our client is the enterprise segment: large businesses in absolutely any industry, all companies that need to store unstructured content, i.e. data in different formats: files, video, scans, audio, etc. The second point is SLA requirements: there must be high availability and a high SLA. Usually this is the government segment or the banking sector, where critical-infrastructure requirements apply. Our solution is designed for highly critical SLAs for availability and storage security.
Regarding the storage security perimeter: storage is provided in more than one data center (in two or three); VK has four of them. We pay close attention to regulatory requirements, namely Government Decree No. 1236, the transition to import substitution, and certain requirements for the software to be classed as Russian.
What needs do our clients have? Long-term storage of files of various types, with a minimum storage volume of 50 TB. This is useful data, not raw storage capacity; below this threshold our solutions become less attractive in terms of pricing and functionally redundant.
Another point is constant, intensive read and write access to objects. If these three criteria are met, then you are on the right track in terms of our portfolio.
What do clients usually store? First of all, backups and archives, both hot and cold, with various types of data; also document management, and there are cases with Big Data, machine learning and multimedia content.
What should be clarified with the customer when you start a conversation about selling VK products? First of all, the willingness to use the standard S3 protocol. This is an international de facto standard created by Amazon in 2006, and we adhere to it strictly, working within the standard rules of the S3 protocol. Many of our customers already know and use S3 as their standard storage protocol, but some do not. It is therefore worth asking about this and taking their wishes into account at an early stage of the project.
The second point is the willingness to deploy the solution in the internal perimeter rather than in the cloud. In this case we are talking about an on-premise solution. The installation sits inside the customer's perimeter and is completely independent of the Internet in terms of external access. With such an installation we store no keys outside and impose no access restrictions. That is, it is a fully isolated solution that can be stretched across one, two or more data centers on the customer's side.
The third point is the availability of data-center capacity to host the hardware, because we often encounter situations where the customer has neither sufficient data-center capacity nor internal resources to place the equipment on their site. If the answers to all three questions are positive, we move on.
I want to draw your attention to how simple our implementation process is. We have a standard questionnaire of about 15 questions. The basic profile we compile from the answers gives a picture of the load profile and the volume needed to calculate the project's cost specification.
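As an illustration of how such a questionnaire might feed a cost specification, here is a hypothetical back-of-the-envelope sizing helper. The growth assumptions and the replication factor are invented for the example and are not VK's actual sizing methodology.

```python
def estimate_raw_capacity_tb(useful_tb: float,
                             replication_factor: int = 2,
                             growth_per_year: float = 0.2,
                             years: int = 3) -> float:
    """Translate useful data volume into raw disk capacity with growth headroom."""
    projected_useful = useful_tb * (1 + growth_per_year) ** years
    return projected_useful * replication_factor


# 100 TB of useful data, 20% annual growth over 3 years, 2 copies of each object
print(round(estimate_raw_capacity_tb(100), 1))  # prints 345.6
```

The point of the questionnaire is exactly this kind of translation: a handful of load and volume answers in, a concrete hardware and license specification out.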
The first step is choosing the delivery format, and several options are available. The first is software only: we transfer it to you for sale to the customer, and you can then supplement the solution with equipment yourself; we also attach a recommended hardware configuration and hardware platform.
The second option is the sale of a hardware and software complex (HSC). As a vendor, we also offer an HSC that we assemble, test and pre-configure ourselves and transfer to the customer as a ready-made product. It includes support, hardware and software: a full-fledged, ready-made vendor solution of the kind we are used to seeing on the market.
There is also a third implementation option, reserved for configurations under 50 TB: a sale within the VK public cloud.
The fourth step, before the sale and before closing the project, is producing the final configuration, either in the HSC version or in the software version. We sell it to you with the necessary prices and documentation for sale and transfer to the customer. Ultimately, if it is an HSC, we simply bring it directly to the data center and configure it ourselves. If you want to participate in this process, we can hand the reins to you, but as a rule we try to do the initial implementations ourselves to avoid the risks of a poorly assembled platform. Alternatively, we can audit your work: if you gain sufficient competence and are willing to learn in this direction, we provide such services and are ready to develop our partners. Under certain conditions we can transfer the implementation to you entirely. We already have several partners who hold the necessary competencies and certificates for implementing our solutions and have been doing this successfully on the market for some time.
These are the four main steps we define for a successful sale. I want to emphasize once again that we try to simplify the sales process and solution configuration as much as possible. This is a basic approach: we do not want to introduce complications here, and we try to make our partners' work with customers more productive, so that it takes not weeks but a day or two.
There is a small segment we would like to highlight: classic software-defined storage (SDS). For more than a year the market has faced problems with supplying foreign classic storage and with the transition to something Russian in block format. There are vendors producing such SDS, and there are customers switching from Western solutions to local alternative technologies.
As a rule, this entire setup is well supported by vendors, and any manipulation of it means wasted time plus extra costs. Stretching the storage across more than one data center also has nuances related to implementing an architecturally stretched installation, and, as you understand, those restrictions do not go away. In addition, block SDS requires creating a separate access circuit for structured content. All of these challenges are the basic things customers face before switching to the alternative S3 technology.
Now I will tell you how we deal with these challenges and what advantages we offer our clients over our colleagues in terms of approach and technology. The first thing I want to note is unlimited scaling. This is an x86 architecture on ordinary servers, and horizontally we have no ceiling, or at least we have not yet reached it in this architecture. We have installations of more than 350 PB of data; no one in Russia has installations like that except VK in our data centers. So we claim no scaling restrictions. All of this happens within a single large installation, and we scale by simply adding servers or disk shelves to the system without stopping it. That is, the customer, or you if you will be servicing it, can bring additional equipment to the customer's data center and connect it to the existing architecture online. All of this is done quickly and, most importantly, affordably: x86 servers are readily available on the market. I also want to note that we are not tied to the manufacturers of these servers. If we sell the software to the customer together with you, the servers only need to meet certain configuration requirements; in terms of the hardware vendor, its availability and other factors, there are no restrictions.
The second point is reducing TCO. We can deliver the solution as software, which almost no one else on the market does; the market mostly offers HSCs tied to the vendor's own hardware. Or it can be an HSC from us, if the customer wants everything from one window: full support with hot replacement. In the case of an HSC delivery, the customer receives a fully integrated solution. We reduce TCO relative to block storage because we are architecturally cheaper: the software itself and x86 components cost less than block alternatives. So our hands are free, and yours are untied too in terms of configuring solutions. As one business option, we will support you if, as partners, you offer to configure certain HSCs for customers on your side.
Another point is the flexibility of storage configuration, which depends on the task the customer is solving. One option is large blocks of data that need to be written, where the emphasis is not on long-term storage but on the input-output of hot data. The second option is cold data storage, where, for example, the customer has many objects of different kinds, small, medium and large, stored long term according to certain criteria. In that case we can reconfigure our solution and add storage nodes of the appropriate performance profile. Our solution is configured as flexibly as possible in terms of disk components, disk volume, processor power, etc. We are not limited in hardware configuration parameters, which is also very important for the customer.
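To make the hot-I/O vs cold-archive trade-off concrete, here is an illustrative pair of node profiles; every value below is invented for the example and is not a VK reference configuration.

```python
# Throughput-oriented profile: fast media, more CPU for request handling.
HOT_IO_NODE = {
    "media": "nvme_ssd",
    "cpu_cores": 32,
    "ram_gb": 256,
    "optimized_for": "read/write throughput of hot data",
}

# Capacity-oriented profile: dense spinning disks, minimal compute.
COLD_ARCHIVE_NODE = {
    "media": "high_capacity_hdd",
    "cpu_cores": 8,
    "ram_gb": 64,
    "optimized_for": "cost per stored terabyte",
}

print(HOT_IO_NODE["media"], COLD_ARCHIVE_NODE["media"])
```

A real deployment can mix both profiles in one installation, which is the flexibility being described.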
We provide a multi-data-center configuration, and these are not isolated installations: it is one large configuration stretched across many data centers, a system that fully covers the solution with a cluster and multi-data-center storage. And we have examples with several sites: VK has four sites where we store our data, and it is all one big installation.
And the last point is also important: this is our own development, built from scratch.
A little history. The story of the S3 protocol and technology began in 2006, when Amazon invented and released it for its data centers and cloud storage. VK joined this race in 2013, still as part of Mail, and developed its first object storage for its own cloud, storing content for our mail agent. Our mail and our Mail.ru cloud ran entirely on our VK Cloud solution. In 2016, full support for the S3 protocol was implemented in the VK object storage. In 2017, we started the first commercial sales of object storage as a separate product within our platforms. In 2019, we added the product to the Register of Russian Software as part of Mail.ru Cloud Solutions. And in 2023 we spun this product off as a standalone offering: a completely independent data storage solution with a new registry name and a new registry entry. It is now a fully independent project with almost 10 years of development history. During this time we have fixed a lot of errors, like all developers who travel this path, and we can safely say that we are as mature and polished as possible in terms of functionality. Our customers say so. We are in the Register, there is a registry entry, and you can find it on the website of the relevant state authority. All of this is publicly available, and we are ready to share this information with you if necessary.
Our solution is powerful and productive. There are no analogues on the market in terms of data volume in a single installation: today we store more than 350 PB of data across our four data centers, and we experience the operation of this solution ourselves every day. There were cases when data centers were switched off; there were fires in data centers in Russia, and one of those data centers was rented by us at that very moment. Our solution switched over seamlessly and continued to work even in an emergency. We store more than 30 billion objects in hot access, and another 90 billion objects in cold access. This does not mean you should sell equally large installations; it only means that our solution is as battle-tested as possible, both in terms of the risks of data loss and the scalability of the system.
Now let's talk about delivery options. The first option is software. In this case we sell perpetual licenses, i.e. not limited in duration: this is not a subscription but a full license, purchased once, transferred to the customer's balance sheet and fully owned by the customer with no time limit. Licensing is per terabyte of useful data. Roughly speaking, if the customer needs to store 100 TB of data, we offer a software license for 100 TB, regardless of the platform configuration or the replication factor involved. We offer support like most vendors, sold for fixed periods of one, two, three or five years; five years is the maximum. We can sell support to you as a certificate or as a service, depending on the customer's budget and the expenditure item under which they are ready to buy it.
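The per-useful-terabyte model can be sketched like this: the license size tracks the customer's data, while raw disk behind it grows with replication. The replication factors here are assumptions for illustration only.

```python
def license_size_tb(useful_tb: float) -> float:
    """Licenses are counted on useful data only."""
    return useful_tb


def raw_disk_tb(useful_tb: float, replication_factor: int) -> float:
    """Raw capacity grows with the replication factor; the license does not."""
    return useful_tb * replication_factor


# The same 100 TB license covers either platform configuration below.
print(license_size_tb(100), raw_disk_tb(100, 2), raw_disk_tb(100, 3))
```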
We sell the HSC in different variations. There are situations when the customer does not care what type of equipment is installed inside the HSC. The equipment can be either registry-listed or not. By default we do not offer registry-listed hardware, in order to reduce the cost of the configuration and give our HSC a more competitive market price. But if the customer has certain regulatory requirements, or an internal mandate to comply with the regulator's requirements on registry-listed components in the hardware complex, we can supply registry-listed components in our HSC as well. Any options are possible: we cooperate with all major vendors on the Russian market and can supply a wide variety of configurations in our HSC. At the same time, if the HSC comes from us, regardless of which components it contains, we provide centralized support for you and for our customers, for both hardware and software. And as with the software, so with the HSC, we provide commissioning and implementation services for the software, commissioning of the HSC, and support. We can also offer training: we can train both you and the customer's employees to work with our system.
The third implementation option is the public cloud, for cases when we understand that the customer really needs the product but is not ready to put the complex in their own data center, has no requirement for an isolated perimeter, and the storage volume is under 50 TB. In this case we can offer the Public Cloud, and you can resell it under your own contract and earn on it.
Then Georgy Sorokin, Cloud Storage architect, took the floor and began to explain how Cloud Storage is arranged. It is conceptually divided into three functional levels, each responsible for its own function: the Front server level, the Metadata level and the Storage server level. In large deployments we separate these levels, including physically: each level consists of entirely separate servers.
Front servers are the servers that handle all incoming requests: they process the load, calculate hashes and distribute incoming data across the Storage servers, which are the servers where the objects themselves are stored.
The Metaserver level is where meta-information about all objects is stored; it is written and scaled separately. This level runs on the in-memory Tarantool platform, and it is thanks to Tarantool that we can scale to 350 PB of data: even at these volumes, our response time does not degrade at all.
Regarding scaling, we can scale in different ways. Suppose there was initially a certain load, say 5,000 IOPS, and suddenly the customer needs to handle more: there are more clients, or the system has become more loaded. We can simply increase the number of Front servers responsible for processing incoming requests. If the amount of data grows, we usually just add Storage servers. And since more Storage servers usually means more objects, we also add Metaservers, which are responsible for object metadata, accordingly. Each of these layers can be scaled separately, which is very important: there is no need, as there would be if all these functions lived on one server, to grow all layers at once. We pick the parameter that needs boosting and add servers to match that need.
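The layer-by-layer scaling logic above can be sketched as a toy decision rule; the metric names and thresholds below are invented for illustration, not actual operational policy.

```python
def layers_to_grow(metrics: dict) -> list:
    """Decide which tier(s) to expand based on what is saturated."""
    grow = []
    if metrics["front_cpu_pct"] > 80:       # request handling is the bottleneck
        grow.append("front")
    if metrics["storage_used_pct"] > 85:    # running out of object capacity
        grow.append("storage")              # more data...
        grow.append("meta")                 # ...usually means more objects too
    return grow


print(layers_to_grow({"front_cpu_pct": 90, "storage_used_pct": 50}))
# a pure request-rate bottleneck grows only the Front tier
```

The key design property is that each condition maps to its own tier, so no tier is expanded just because another one is saturated.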
Now about multiple data centers. Here our solution works in distributed mode. This means we do not make one data center active and another passive with replication configured between them: our solution works distributed across several data centers, whether two, three, four or more. Work proceeds simultaneously with all servers and all objects located in these data centers.
Regarding the replication level: Storage servers are usually added in pairs, because the default replication factor is 2. Each object is stored in two copies, and each copy is always stored on a separate server and a separate disk. This is built into the architecture of the system, and objects cannot be placed any other way: situations where both copies of one object end up on the same server or the same disk are excluded. In a multi-data-center configuration, the copies will always automatically be placed in different data centers.
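The placement rule can be sketched as a constraint over failure domains: the two copies must never share a server or disk, and must land in different data centers when more than one exists. The data model below is invented for illustration and is not the actual placement algorithm.

```python
import itertools


def place_replicas(disks, rf=2):
    """disks: list of (dc, server, disk_id) tuples.
    Returns rf locations on pairwise-distinct servers, and in distinct
    data centers whenever at least rf data centers exist."""
    all_dcs = {dc for dc, _, _ in disks}
    for combo in itertools.combinations(disks, rf):
        servers = {srv for _, srv, _ in combo}
        dcs = {dc for dc, _, _ in combo}
        if len(servers) == rf and (len(all_dcs) < rf or len(dcs) == rf):
            return list(combo)
    raise RuntimeError("not enough failure domains for the replication factor")


disks = [("dc1", "srv1", 0), ("dc1", "srv1", 1),
         ("dc1", "srv2", 0), ("dc2", "srv3", 0)]
print(place_replicas(disks))  # two copies: different servers and different DCs
```

Note how the two same-server disks are skipped automatically: losing one server (or one whole data center) can cost at most one copy of any object.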
About meta-information. Thanks to Tarantool we can configure the replication factor, i.e. the number of copies in which meta-information is stored. By default we use 3 copies; for multiple (two or four) data centers we raise this factor to 4, so that the data centers can be arranged symmetrically. Tarantool also gives us sharding: we break one large database into several and place them on separate machines, which is why we can grow to a huge amount of information and a huge number of objects without performance degradation, since the data about them is stored on separate machines, in separate memory, handled by a separate Tarantool instance.
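Sharding by object key is a textbook technique that can be illustrated as follows; this is a generic sketch, not Tarantool's actual internal algorithm.

```python
import hashlib

N_SHARDS = 4  # number of metadata shards (illustrative)


def shard_for(object_key: str) -> int:
    """Map an object key deterministically to one metadata shard."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS


# The same key always lands on the same shard, spreading load across machines.
print(shard_for("backups/db/dump-001"))
```

Because the mapping is deterministic, any Front server can locate an object's metadata without consulting a central directory, which is what keeps lookups fast as the object count grows.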
Regarding replication and delays: the recommended standard delay between data centers is no more than 7 milliseconds. But that does not mean we do not support longer delays; we can safely work at 20 or 30 milliseconds. You just need to understand that we have a distributed system, writes are synchronous, and for an object to be considered written we must receive confirmation from every data center.
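A back-of-the-envelope consequence of synchronous writes: the write is acknowledged only after the slowest data center confirms, so inter-DC latency adds directly to write time. The numbers and the simple additive model are illustrative.

```python
def synchronous_write_ms(local_work_ms: float, inter_dc_rtt_ms: list) -> float:
    """Total write time: local processing plus the slowest inter-DC round trip."""
    return local_work_ms + max(inter_dc_rtt_ms, default=0.0)


print(synchronous_write_ms(2.0, [7.0, 7.0]))  # within the recommended 7 ms links
print(synchronous_write_ms(2.0, [30.0]))      # still works at 30 ms, just slower
```

This is why the 7 ms recommendation exists: the system keeps working on slower links, but every write pays for the worst link on the path.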
We have our own monitoring system with dashboards, all running on Grafana. We show all the basic information about the system's operation: load, RPS, the operation of individual services, bandwidth, and that is only a small fraction of the dashboards; there are actually ten times more. You can drill down into the smallest details, see what works and how, and catch and handle errors in advance, before they turn into critical events.
In terms of functionality, we support the standard S3 protocol developed by Amazon, and we support essentially all of its main functions. Some of the less critical ones are still under development. On the screen you can see what we support and a short roadmap for the coming quarters, including what is already in testing.