Cloud Spark provides distributed batch and stream processing of unstructured and semi-structured data from sources such as S3, ClickHouse, and Kafka. Thanks to query optimization and in-memory caching, the service can run analytical queries on data of virtually any size.
With Cloud Spark, companies can solve data science and analytics tasks quickly and with minimal infrastructure costs, from exploratory data analysis (EDA) to training machine learning models on company data. Analysts and data scientists get fast access to the data they need from multiple sources via SQL queries, and ML developers can use the built-in MLlib library for machine learning. Thanks to the built-in client library, all users can also manage the service from any convenient environment, including a local computer or JupyterHub.
Cloud Spark runs on VK Cloud's managed Kubernetes service, which automatically scales computing resources up and down with the current load. This lets companies flexibly manage and optimize the cost of the service, saving up to 60% of infrastructure costs. The VK Cloud platform handles the operation and administration of Cloud Spark, so companies do not have to spend their own specialists' time and resources on routine tasks.
"The Cloud Spark cloud service makes enterprise-level technologies accessible to companies of any size. Businesses get a scalable tool for working with big data without having to launch, configure, and administer Spark or Kubernetes themselves. The VK Cloud platform provides flexible resource scaling, security, and compliance with the requirements of 152-FZ, while the company's specialists can focus on analytics, research, and machine learning, that is, on extracting value from data for the business," notes Alexander Volynsky, technical product manager at VK Cloud.