MIND tools for migrating and protecting virtual infrastructure

Suppose you need to "move" from the VMware platform to some other platform. IT specialists do not face this task often, which is why we build tools specifically for it.

To date, our solution consists of four products.

Mind Migrate is, strictly speaking, a tool for one-time migration of virtual servers. Mind Guard is a very recently released product; we are not officially presenting it yet, but we will talk about it today. Its purpose is to protect those same virtual servers.

Mind Universe - automatic creation and management of application infrastructure in any of the supported cloud platforms or virtualization systems.

Mind Swap is a software module for implementing various scenarios for migrating VMs and their disk resources at the hypervisor level without interrupting the operation of the virtual server.

Mind Migrate

What does Mind Migrate do? The task is to "transport" a virtual server, that is, a virtual machine and its OS together with its data. The product has been on the market for about two years. During that time it has been tested in production by customers from small to large, and we have studied the scenarios in which such a need arises. We build support for all of these scenarios into the product. Our main goal is a very simple product, easy to understand in its graphical interface and documentation and usable with minimal preparation, that can move virtual servers from anywhere to anywhere. In the example shown on the slide with the icons, we move a virtual machine from a VMware environment to BASIS.

The move itself is a live migration. What options are there for doing this without third-party tools? You can shut the virtual machine down, export it, convert it to the format the target platform requires, and import it there. Or you can take a backup with a backup system, store it in a backup repository, and restore it on the new platform from there. I call these "crutches": tools that generally work but were not designed for this job. With our tool, by contrast, you can, for example, organize a queue.

That is, we first move the virtual machines of one type or of one information system. After they have moved and been switched over, we can start another stage, for example a manual check. We can also fine-tune the process, for example by running scripts, because data migration requires maximum consistency.

That is, we want nothing to be lost during migration. Since our tool works at the block level, we cannot know what is in the RAM of the server or database.

So a move often requires running a script that puts the database into a consistent state, that is, asks it through its built-in tools to flush data from memory to disk. After that we do the final synchronization and can start the database in its new location. These and many other tools are part of the platform's functionality. Mind Migrate can handle more than a thousand virtual machines of completely different types, from different places to different places.
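The ordering described here (quiesce the database, run the final synchronization, start the database on the target) can be sketched in a few lines of Python. This is only an illustration of the sequence, not Mind Migrate's actual interface: the function `cut_over` and its three callables are hypothetical names introduced for the example.

```python
from typing import Callable, List


def cut_over(
    quiesce: Callable[[], None],       # hypothetical: script asking the database to flush memory to disk
    final_sync: Callable[[], None],    # hypothetical: last block-level synchronization
    start_target: Callable[[], None],  # hypothetical: start the database on the new platform
) -> List[str]:
    """Run the three cutover steps strictly in order and return the names of completed steps."""
    done = []
    steps = (("quiesce", quiesce), ("final_sync", final_sync), ("start_target", start_target))
    for name, step in steps:
        step()  # an exception here aborts the cutover before any later step runs
        done.append(name)
    return done
```

The point of the sketch is the ordering guarantee: the target is never started unless the flush and the final synchronization have both finished without error.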

How is the product built? We work at the operating system level. We allocate a virtual machine on which our Modul Control management server is installed. This virtual machine is quite demanding on resources: 4 vCPUs and 16 GB of memory, with Ubuntu or Astra as the operating system. Next, we need access to the virtual machines we plan to move, via SMB if it is a Windows machine, and we also need administrator accounts.

Using these accounts, Modul Control installs agents in the operating system. We do not interact with the underlying infrastructure in any way. It does not matter to us what the platform is: it can be a bare-metal server, VMware, Hyper-V, in principle anything. Later I will show the compatibility list that we have verified, but in practice compatibility with virtually any platform is ensured. On the target platform, for example Astra, virtual machines are created with the same configuration as the machines being migrated; they have to be created manually. The disk configuration must be exactly the same as in the source virtual machine. We only work with what the operating system sees inside the virtual machine, so we preserve exactly the same disk structure and recreate it on the target platform. A "blank" operating system is booted on these target virtual machines. Specifically, we support Ubuntu or Astra for migrating Linux machines and a "blank" OS for moving Windows machines. We give the controller access there as well, and the controller installs its agent. The operating system on the target machines is there so that the agent can run. We then create a migration task, from this virtual machine to that virtual machine, and start the move.
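Because the target machines are created by hand and the disk configuration has to match exactly, a pre-flight comparison is a natural check to run before starting a task. The sketch below is an assumption about how such a check could look, not part of the product: the `Disk` type and `layout_mismatches` function are hypothetical, and the disk lists would come from whatever inventory you already have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    index: int     # position of the disk as the guest OS enumerates it
    size_gib: int  # capacity as seen from inside the guest


def layout_mismatches(source: list[Disk], target: list[Disk]) -> list[str]:
    """Return human-readable differences; an empty list means the layouts match."""
    problems = []
    src = {d.index: d.size_gib for d in source}
    dst = {d.index: d.size_gib for d in target}
    for index in sorted(src.keys() | dst.keys()):
        if index not in dst:
            problems.append(f"disk {index}: missing on target")
        elif index not in src:
            problems.append(f"disk {index}: extra on target")
        elif src[index] != dst[index]:
            problems.append(
                f"disk {index}: {src[index]} GiB on source, {dst[index]} GiB on target"
            )
    return problems
```

An empty result means the manually created target has the same number of disks with the same sizes as the source, which is the condition the text asks for.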

That is, the only requirement for the source and target sites is network connectivity between the virtual machines and the controller. There are no other conditions. This lets you carry out migrations in almost any scenario you can think of.

What follows from working at the operating system level? Maximum compatibility, of course. Initially we considered an agentless design, that is, a product that would interact with the virtualization platform and perform the necessary actions through its API. That works well when there are only one or two virtualization platforms, as was largely the case in Russia two or three years ago. But today, when virtualization vendors in Russia number in the dozens, it is not feasible. So we decided to work at the operating system level.

The main criterion for whether migration is possible is a supported operating system. The slide shows the list: Ubuntu, CentOS, Debian and others. The slide contains an outdated entry: we do not support Windows 2008, only Windows 2012 and later. The platforms listed are all ones we are compatible with, meaning we tested them directly or had projects on them. Every brand on this list has been checked, the migration scenario has been run on it, and most likely it has even been used in production.

All domestic virtualization vendors are supported. We have a technological partnership with a number of Russian vendors. The import substitution scenario goes hand in hand with the functionality of our product.

Advantages

The product is entirely Russian, developed by a Russian team of about 90 people. At this stage of development, the team consists almost entirely of engineers.

Everything is developed almost from scratch. We do not use open source solutions; this is entirely our own development. We have our own snapshot technology for making a consistent copy of all data on the disk, a technology for transferring that data, and a management technology.

By default, customers do not think much about how to move. This showed in the fact that many import substitution projects in previous years were pilots or test zones. The plan was to deploy a virtualization environment, test non-critical services on it, and assess the stability of the product. Most of these products are new and developing dynamically. The workloads moved there were mostly simple ones that can be turned off and brought up in a new place within a week. Now that many companies have chosen their path, either staying on VMware or Hyper-V or moving to Russian solutions, the task of scaling arises. Companies are starting to ask how to move information systems that have a strict requirement for minimal downtime. That means choosing a switchover window and assessing how much data can be lost in the process.

And here companies increasingly find that manual transfer methods, through backups, conversions and so on, are not suitable. In fact, probably the most popular use of our software is precisely the migration of virtual machines that run complex services.

A typical request looks like this: there is an information system, and it can be down for no more than an hour. That is almost impossible to do manually, which is why customers turn to our tools. Migration takes place online: the virtual machine we want to move keeps running the whole time while it is synchronized. Say we pick some database. We install agents on its machines, for which we need full access. Synchronization starts and begins transferring data to the target platform. The impact on the performance of the source machines is almost nil (2-4% overhead), and that is with the option to compress and encrypt data in transit. Only two points matter. First, the migration can succeed only if we move data over the network faster than it accumulates. Second, while the data is moving we record all changes to it using Changed Block Tracking (CBT) technology, and these changes have to be stored somewhere on disk. For highly loaded systems, this usually means attaching an additional disk inside the source virtual machine to hold the changes during the move. That is probably the only thing that requires intervening in the operation of the virtual machine.
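The condition that data must move faster than it accumulates can be made concrete with a little arithmetic. The following is a back-of-the-envelope model, not a formula from the product: it assumes constant transfer and change rates, and the function name `estimate_sync` and its parameters are introduced here for illustration. Each synchronization pass ships what accumulated during the previous pass, so the pass durations form a geometric series with ratio change/transfer.

```python
def estimate_sync(data_gib: float, transfer_gib_per_h: float, change_gib_per_h: float):
    """Return (hours until the target catches up, GiB of changes journaled during the first pass).

    Simplified model with constant rates; real workloads are bursty.
    """
    if transfer_gib_per_h <= change_gib_per_h:
        raise ValueError("changes accumulate faster than the network moves them; sync never converges")
    # Sum of the geometric series (data/transfer) * (1 + r + r^2 + ...), r = change/transfer
    hours_to_converge = data_gib / (transfer_gib_per_h - change_gib_per_h)
    # Changes written while the initial full copy is running must be kept on disk
    first_pass_hours = data_gib / transfer_gib_per_h
    journal_gib = change_gib_per_h * first_pass_hours
    return hours_to_converge, journal_gib
```

For example, 900 GiB of data moved at 200 GiB/h while the system changes 20 GiB/h converges in about 5 hours and accumulates roughly 90 GiB of changes during the first pass, which gives a rough starting point for sizing the additional disk.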

Online migration proceeds: the first synchronization runs, then the system starts transferring changes and informs us that it is ready to switch. We choose the time of the move, say Thursday evening at the agreed hour. That evening the whole group gathers: partners, the project manager, IT specialists, and the owner of the information system.

The switchover begins. We stop the services on the virtual machine, and they flush their data to disk. We perform one last small synchronization. The new virtual machine is powered on in the target platform. Network settings can be changed in it if necessary, which is all configurable; additional disks can be added if necessary; and all the necessary drivers will already be installed for the new platform.

Licensing

There is really only one full license, Migrate Enterprise, which includes all the necessary functions: traffic encryption, channel bandwidth limiting, the ability to use the CBT log, a separate disk, and so on. Licensing is per virtual machine: if you want to move 400 virtual machines, you need to buy 400 licenses. There is a separate Migrate Enterprise Desk license for moving desktop operating systems.

If you purchase our software through the OEM channel, we have a number of special agreements, and new ones are about to appear. I think that within a year most virtualization platform vendors will be able to ship our solution as part of their own product, with licensing that matches the licensing of the virtualization platform itself: if the platform is licensed per host, then per host; if per virtual machine, then per virtual machine.

There are different purchase options. The license also includes warranty support, which covers consultations on the operability of the system.

If necessary, you can use extended technical support. If the customer decides not to migrate manually but to use a tool for it, then we are the ones to come to.

Moving to a new virtualization system is a complex, comprehensive project with many different tasks to solve. When there is an opportunity to exchange a small part of the budget for a ready-made solution and be rid of the headache, customers usually agree to buy not just software but also a service. We provide professional services ourselves by default, and within the partner program there are partners who are also ready to provide them.

The services are simple: implementation of a migration project, currently sold in packs of 100 machines. If the migration covers more than 100 machines, you take a multiple of 100. If it covers fewer than 100, you still take one pack of 100.
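The rounding rule for service packs, together with the per-machine licensing mentioned earlier, comes down to simple arithmetic. The helper names below are hypothetical and the sketch contains no prices, only counts.

```python
PACK_SIZE = 100  # migration services are sold per 100 machines, as stated in the talk


def licenses_needed(vm_count: int) -> int:
    """One Migrate license per virtual machine being moved."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return vm_count


def service_packs_needed(vm_count: int) -> int:
    """Round up to whole packs of 100 machines; fewer than 100 still means one pack."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return (vm_count + PACK_SIZE - 1) // PACK_SIZE
```

So a project of 250 machines needs 250 licenses and three service packs, while a project of 40 machines needs 40 licenses and one pack.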

As a rule, a customer who has decided to buy the software wants to go all the way, that is, to have everything migrated on a turnkey basis. They rarely buy extended support; usually they take the professional service right away so that we can share our expertise.

MIND Guard

We have a technology for creating snapshots and a technology for creating a copy, that is, for sending these snapshots to a new platform. If you stop this process and switch the machine over, you get a migration. One day we thought: what if we do not stop the process, but keep a copy on a remote site all the time? The result is a fault tolerance tool. We have not officially launched it yet, but we have released the first version of the MIND Guard product, which does three things.

First, it performs asynchronous data replication from one virtual machine to another. Second, it monitors and manages this process. Third, if the source machine fails, for example the data center is unavailable or the servers have burned down, we switch over, that is, we automate the launch of the emergency copy. This is what is known as disaster recovery.

How does it work? The first part is asynchronous data copying from the primary server to the backup site: first an initial copy, then applying changes. In theory, under ideal conditions (a wide communication channel, fast disks, fast servers), we can provide an RPO of as little as 5 seconds. Many customers consider losing 5 seconds of data acceptable.
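RPO (recovery point objective) is the maximum age of data you are prepared to lose. With asynchronous replication it can be checked as the gap between now and the newest change already applied on the backup site. The sketch below illustrates that check under stated assumptions: the function names are hypothetical, and the timestamp of the last applied change is assumed to come from your own monitoring rather than from any documented MIND Guard call.

```python
from datetime import datetime, timedelta


def replication_lag(last_applied_at: datetime, now: datetime) -> timedelta:
    """Age of the newest change already applied on the backup site."""
    return now - last_applied_at


def meets_rpo(last_applied_at: datetime, now: datetime,
              rpo: timedelta = timedelta(seconds=5)) -> bool:
    """True if a failure right now would lose no more than `rpo` worth of data."""
    return replication_lag(last_applied_at, now) <= rpo
```

The default of 5 seconds mirrors the best-case figure quoted in the talk; in practice the target would be whatever the owner of the information system agrees to.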

Consistency is at the level of the block storage device inside the guest OS, that is, the data that is on disk. Data that is in the memory of the virtual machine will be lost.

The second part is management: a controller constantly monitors the state of the site. The backup controller is located on the backup site. Parameters are monitored through the controller.

If an accident occurs, the controller enters a mode in which switchover becomes available. Switching is not automatic. The operator who decides that the main site really is down performs the switch manually. He launches the emergency plan, which involves powering on the machine on the backup site, assigning IP addresses, and executing scripts or API calls.
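The behavior described here, where a failure only unlocks the switchover and an operator must trigger it, is a small state machine. The sketch below is an assumption about how such logic could be modeled, not the controller's real implementation: `SiteState`, `BackupController` and the plan steps are hypothetical names.

```python
from enum import Enum


class SiteState(Enum):
    REPLICATING = "replicating"
    FAILOVER_AVAILABLE = "failover_available"
    FAILED_OVER = "failed_over"


class BackupController:
    def __init__(self, plan):
        self.state = SiteState.REPLICATING
        self.plan = list(plan)  # ordered callables: power on, assign IPs, run scripts
        self.log = []

    def report_primary_health(self, healthy: bool) -> None:
        """Monitoring only unlocks or locks the switchover; it never triggers it."""
        if self.state is SiteState.FAILED_OVER:
            return
        self.state = SiteState.REPLICATING if healthy else SiteState.FAILOVER_AVAILABLE

    def operator_failover(self) -> None:
        """Manual decision: run the emergency plan step by step."""
        if self.state is not SiteState.FAILOVER_AVAILABLE:
            raise RuntimeError("failover is not available while the primary looks healthy")
        for step in self.plan:
            self.log.append(step())
        self.state = SiteState.FAILED_OVER
```

Keeping the trigger in the operator's hands means a false alarm from monitoring, such as a short network outage between sites, cannot start the emergency copy by itself.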

All functionality of all our products that is available through the graphical interface is also available through the REST API, which lets the product fit into more complex systems, for example the service management panel of a cloud provider that offers disaster recovery services to its customers.

Failover

This product has just been released. It is in the early stages, so we can now guarantee operation only with a number of restrictions.

What tasks does Guard solve? Let me remind you that our architecture is platform-based: we install agents and do not require support from the underlying platform. The storage network does not matter to us either.

The first is the ability to organize replication between storage systems from different manufacturers, and likewise between different types of platforms. An interesting scenario now being worked out in the first production projects involves customers who are afraid to move their systems to new platforms and would rather wait until the new Russian products become more stable. We suggest they move the information system and configure replication in the opposite direction, back to the old reliable site. On the one hand, this fulfills the specific plans set for IT managers. On the other, there is insurance: if something goes wrong, say a patch fails or an administrator makes a mistake, you can switch back to Hyper-V, continue working, and fix your cluster in the meantime. Many do this even within a single data center.

Again, some applications, especially critical ones where no data loss is acceptable at all, usually have their own clustering tools, and for those our tools are not really needed. But for information systems and applications without such built-in tools, losing a few seconds or minutes of data is usually not critical, and we can provide a disaster recovery system for any such application.

Limitations

First, at the moment only the Linux operating system is supported. Windows is next in line and will be implemented in upcoming releases. Second, we can handle the failover situation when the main site goes down: we switch to the backup. But we do not yet automate the return to the main site or to a new one. That does not mean it cannot be done: you can apply Guard again, create a pair manually, and send the machine in the new direction. The process is simply not automated yet, though this is planned. Another important feature for this class of solutions is also planned: testing. As with backup tools, testing matters here, whereas in a migration we check the result almost immediately. For now, testing can only be done manually.

The slide shows the same compatibility list, but the Windows column has disappeared. Support is expected and has been announced for the second quarter of 2024. The reason is that we are building our own snapshot technology for Windows.