Challenges for China Mobile Cloud's DBaaS system

Before diving into the technical details, let me introduce China Mobile Cloud's DBaaS system, which manages all of our cloud databases. It offers a comprehensive product line covering transactional, analytical, search, and NoSQL databases. We provide database services for some of the most popular open-source and third-party databases, and we also develop and offer services based on our own database engines. At present, we serve more than 35,000 customers across 9 major industries, including government, communications, finance, healthcare, and education.

We run more than 130,000 DB cluster instances in 15 Tier 1 and 31 Tier 2 regions. In addition to database configuration, we have built a robust ecosystem of management tools and systems that help customers manage their databases more efficiently, including data migration, database management consoles, and AIOps-enabled tools. China Mobile Cloud's DBaaS platform runs in a cloud-native manner, which means that most of our database instances run inside Kubernetes (K8s) clusters.

Managing database instances at this scale is a challenging task. Although we have established a DBaaS system that can manage different types of database instances, maintaining that system is itself a challenge. At present, our DBaaS system is broadly divided into an API layer and an Operator layer, of which the Operator layer is the core.

Our first challenge came from developing a different operator for each database engine. These operators differ greatly from one another, so a developer cannot quickly switch from the operator for engine A to the operator for engine B, which makes the allocation of development resources inflexible.

In addition, developing an operator is demanding for the developer. They need to understand the principles of the database engine itself, and they also need to be familiar with the Operator framework as a whole. Although some ready-made frameworks can be reused, they still place high demands on developers, and it can be difficult to bring productive developers onto the team quickly.

What's more, we are developing our own database engine and want to build a DBaaS system for it quickly. Because of the challenges above, we could not do so. We would need to assemble a team of highly skilled developers who understand both our database engine and the Operator framework, and these developers would then have to write a new Operator from scratch. Since this database engine is developed in-house, there is no off-the-shelf operator on the market. Writing one from scratch means a lot of redundant work: even though some of this engine's logic is very similar to that of other database engines, we still cannot reuse it effectively.

So we kept looking for solutions to these challenges. How do we give different Operators similar interfaces? How can we lower the requirements for DBaaS system developers? How can we quickly integrate a new database engine?

Why KubeBlocks

That search led us to the KubeBlocks project, which addressed our problems very well. KubeBlocks is a general-purpose operator framework designed specifically for database workloads. Developers write addons for different database engines to integrate them into the KubeBlocks system. The project attracted us with several features:

First of all, it is a general-purpose Operator framework. This means that a single Operator is enough to support different types of database engines, and developers only need to maintain one Operator and one set of CRDs. Knowledge at the Operator level is therefore easy to share, and developers can be assigned flexibly to different database engine teams.

Second, the framework uses a low-code development model. Different database engines are integrated by writing different addons, rather than by writing a dedicated operator from scratch. An addon is just a Helm chart containing CR objects of the KubeBlocks framework. When we developed our addon, we only needed to write the YAML files for the CR objects we needed and some functional scripts. We'll cover these in more detail later. The CR objects in an addon are defined declaratively. Just as with any other Kubernetes object, the developer only needs to describe the desired state of the DB cluster, and the KubeBlocks framework reconciles the cluster toward it. This low-code development model lowers the barrier to entry for building DBaaS systems for new database engines. Developers only need to understand how the database engine works to start developing.
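To make the declarative idea concrete, here is a minimal sketch in Python of what "describe the desired state and let the framework converge to it" means. It is not KubeBlocks code: `ClusterState`, `plan_actions`, and the image names are hypothetical, and a real operator compares far richer objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical, heavily simplified view of a database cluster."""
    replicas: int
    image: str


def plan_actions(desired: ClusterState, observed: ClusterState) -> list[str]:
    """Return the steps that would move the observed state to the desired one."""
    actions = []
    if observed.image != desired.image:
        actions.append(f"roll pods to image {desired.image}")
    if observed.replicas < desired.replicas:
        actions.append(f"scale out by {desired.replicas - observed.replicas}")
    elif observed.replicas > desired.replicas:
        actions.append(f"scale in by {observed.replicas - desired.replicas}")
    return actions


# The addon author only writes down `desired`; the framework derives the steps.
desired = ClusterState(replicas=3, image="h-db:2.0")
observed = ClusterState(replicas=2, image="h-db:1.0")
print(plan_actions(desired, observed))
```

When the observed state already matches the desired state, the plan is empty, which is why the same description can be applied repeatedly without side effects.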

Finally, less code means fewer potential bugs and faster integration of new database engines. This met our needs very well.

What's more, because KubeBlocks is designed specifically for database workloads, it covers the basic management of databases on Kubernetes. For example, KubeBlocks provides lifecycle management, backup and restore, configuration management, and high availability. In addition, KubeBlocks provides an extensibility mechanism that allows engine-specific management logic to be integrated seamlessly into the overall framework.

Building the H-DB addon on KubeBlocks at China Mobile Cloud

After much research, we decided to try KubeBlocks. At the time, we had an in-house database engine that needed to be integrated into a DBaaS system; we refer to this engine as H-DB. It was a great opportunity to validate the use of KubeBlocks in a DBaaS system.

First, let me give you a brief introduction to our H-DB. It is a self-developed cloud-native distributed database engine that separates storage and computing. Writing an operator for such a complex database system is often a huge challenge, and it is even more difficult to integrate it quickly. But with KubeBlocks, we can do that in a low-code way.

Here's how we built the KubeBlocks addon:

Design the cluster topology and build the addon skeleton. Typically, the initial addon contains only a rough ClusterDefinition and a very basic ClusterVersion, which specifies the images of all component containers. In our case, an H-DB cluster has two components: compute nodes and data nodes. So we define a ClusterDefinition object that contains both components. The image of each component is configured in the ClusterVersion, and a placeholder startup command is temporarily set in the ClusterDefinition. Then we write a simple Cluster CR object to verify that the addon installs successfully and that the Pods start.
Refine the ClusterDefinition, set the correct configuration parameters in the ConfigMap, and write a script to bootstrap the cluster. We tweak the configuration and scripts until the cluster is up and running. This step is important because it marks the completion of the first usable addon.
Support backup and restore. We script the backup and restore functionality and integrate it into the ActionSet CR object of KubeBlocks. We can then create a backup OpsRequest and a restore OpsRequest to test these features.
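Write a ConfigConstraint that controls which parameters can be modified, whether they can be configured dynamically, and the reload command. These settings allow the addon to modify some of the configuration parameters of the database engine.
Enable high availability and role detection, and add observability sidecars to our database clusters. These sidecars collect metrics and logs from the database instances.
Finally, we can add more ClusterVersion objects to support different kernel versions.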
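The first step above can be sketched as follows. This is an illustration in Python rather than the actual H-DB addon: the registry URL, image tags, and object names are placeholders, and the field names are modeled on the `apps.kubeblocks.io/v1alpha1` ClusterDefinition and ClusterVersion objects as an assumption, so they should be checked against the KubeBlocks version in use. The manifests are deliberately incomplete.

```python
import json

# Placeholder component names and images; not the real H-DB artifacts.
COMPONENTS = {
    "compute": "registry.example.com/h-db/compute:1.0.0",
    "data": "registry.example.com/h-db/data:1.0.0",
}


def cluster_definition(name: str) -> dict:
    """Rough ClusterDefinition with one component definition per component."""
    return {
        "apiVersion": "apps.kubeblocks.io/v1alpha1",  # assumed API group/version
        "kind": "ClusterDefinition",
        "metadata": {"name": name},
        "spec": {
            "componentDefs": [
                {
                    "name": comp,
                    "podSpec": {
                        "containers": [
                            # Placeholder command; a real boot script replaces it later.
                            {"name": comp, "command": ["sleep", "infinity"]}
                        ]
                    },
                }
                for comp in COMPONENTS
            ]
        },
    }


def cluster_version(name: str, definition: str) -> dict:
    """Very basic ClusterVersion that pins one image per component."""
    return {
        "apiVersion": "apps.kubeblocks.io/v1alpha1",  # assumed API group/version
        "kind": "ClusterVersion",
        "metadata": {"name": name},
        "spec": {
            "clusterDefinitionRef": definition,
            "componentVersions": [
                {
                    "componentDefRef": comp,
                    "versionsContext": {
                        "containers": [{"name": comp, "image": image}]
                    },
                }
                for comp, image in COMPONENTS.items()
            ],
        },
    }


def missing_images(definition: dict, version: dict) -> set:
    """Components declared in the definition that have no image in the version."""
    defined = {c["name"] for c in definition["spec"]["componentDefs"]}
    versioned = {
        v["componentDefRef"]
        for v in version["spec"]["componentVersions"]
        if all(c.get("image") for c in v["versionsContext"]["containers"])
    }
    return defined - versioned


definition = cluster_definition("h-db")
version = cluster_version("h-db-1.0.0", "h-db")
print(json.dumps(definition, indent=2))
print("components without an image:", missing_images(definition, version))
```

In a real addon these objects would be YAML templates inside the Helm chart; generating them from one component table, as done here, is only a convenient way to check that every component defined in the ClusterDefinition has an image in the ClusterVersion.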

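At this point, we had successfully developed a complete KubeBlocks addon for H-DB. With KubeBlocks, a single developer built the first DBaaS system for H-DB in just two months. The process could be even faster, because the later addon-building steps can proceed in parallel. This is the first integration between China Mobile Cloud and KubeBlocks.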

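Below is a summary comparison of developing a KubeBlocks addon versus developing a dedicated Operator. Here we compare the development of the KubeBlocks addon with the development of a dedicated Operator for a similar database engine. In terms of developer effort, writing the KubeBlocks addon took only 2 person-months (2,000+ effective lines of code), whereas writing a dedicated Operator for a similar product takes about 6 person-months (7,000+ effective lines of code). The H-DB case is a good starting point. It confirms that we can use KubeBlocks to solve the problems we currently face in our DBaaS system.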
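As a quick check of what those figures imply, the relative savings can be computed directly. The sketch below uses the numbers quoted above as given; since they are stated as "2,000+", "7,000+", and "about 6", the results are approximate, and the helper name `reduction` is ours.

```python
def reduction(baseline: float, actual: float) -> float:
    """Fractional saving of `actual` relative to `baseline`."""
    return 1 - actual / baseline


effort = reduction(baseline=6, actual=2)       # person-months
code = reduction(baseline=7000, actual=2000)   # effective lines of code
print(f"effort saved: {effort:.0%}, code saved: {code:.0%}")
# prints "effort saved: 67%, code saved: 71%"
```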

Looking ahead

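As a next step, we plan to integrate more database engines through KubeBlocks, carry out an in-depth evaluation, and try upgrading to newer versions of KubeBlocks to assess some of the features we are interested in.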

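At China Mobile Cloud, our ideal goal is to build a unified cloud-native DBaaS platform. On this platform, we aim to achieve a unified multi-cloud architecture, unified interfaces at the API and Operator layers, and support for database clusters of different architectures, with database instances deployable on serverless Kubernetes clusters on demand. This will form a unified database orchestration and general management platform that supports different infrastructures such as public cloud, private cloud, dedicated cloud, and edge cloud.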

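As KubeBlocks continues to develop and improve, we will consider refactoring our existing DBaaS system on top of the KubeBlocks framework. In the long run, we expect to save about 50% of development resources across different database engines.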

This talk was presented at KubeCon China 2024.

