Couchbase: Rebalance
Note: In this article, I assume you have basic knowledge of Couchbase.
A Couchbase cluster consists of one or more nodes, and each node can run various services. When a node is added to or removed from a cluster, data and indexes are redistributed among the remaining nodes. This process is called Rebalance. Rebalance is a major part of Failovers and Removals in Couchbase, so it is important to understand what is happening under the hood.
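A rebalance can be started from the UI, from the CLI, or over the REST API. As a minimal sketch, the snippet below only builds the HTTP request for the REST endpoint POST /controller/rebalance with its knownNodes and ejectedNodes parameters; it does not send it. The host, credentials, and node names are placeholders I made up for illustration, and 8091 is assumed to be the administration port.

```python
import base64
import urllib.parse
import urllib.request


def build_rebalance_request(host, known_nodes, ejected_nodes, user, password):
    """Build (but do not send) a POST /controller/rebalance request.

    host, user, password and the node names are placeholders (assumptions).
    Nodes are identified by their internal names, e.g. "ns_1@10.0.0.1".
    """
    body = urllib.parse.urlencode({
        "knownNodes": ",".join(known_nodes),      # every node in the cluster
        "ejectedNodes": ",".join(ejected_nodes),  # nodes to remove, may be empty
    }).encode("ascii")
    request = urllib.request.Request(
        f"http://{host}:8091/controller/rebalance", data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_rebalance_request(
        "127.0.0.1",
        known_nodes=["ns_1@10.0.0.1", "ns_1@10.0.0.2", "ns_1@10.0.0.3"],
        ejected_nodes=["ns_1@10.0.0.3"],
        user="Administrator", password="password")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
    # To actually start the rebalance: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending it makes it easy to inspect what would be submitted before touching a real cluster.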
Rebalance on Data Service
vBuckets are distributed evenly across the Data Service nodes. After a Data Service node is added or removed, rebalance redistributes them: it keeps the active vBuckets available and creates as many replica vBuckets as possible.
Rebalance on the Data Service proceeds sequentially, vBucket by vBucket, so if it stops for some reason, it can continue from the last vBucket that was moved.
During Rebalance, the Data Service keeps working, so applications can continue to read and write data. Therefore, you don’t have to worry about availability.
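To get a feeling for what "distributed evenly" means when a node joins or leaves, here is a toy planner. It is not Couchbase's real algorithm: it ignores replicas, and the function names (initial_layout, plan_moves, apply_moves) and node names are hypothetical. It only assumes the default of 1024 vBuckets per bucket and moves the minimum number of vBuckets needed to even things out.

```python
from collections import Counter, defaultdict

NUM_VBUCKETS = 1024  # default number of vBuckets per bucket


def initial_layout(nodes):
    """Toy starting point: vBucket i lives on nodes[i % len(nodes)]."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}


def plan_moves(layout, new_nodes):
    """Return (vbucket, source, destination) moves that even out the layout."""
    base, extra = divmod(len(layout), len(new_nodes))
    # The first `extra` nodes hold one vBucket more than the others.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(new_nodes)}
    owned = defaultdict(list)
    for vb, node in sorted(layout.items()):
        owned[node].append(vb)
    # vBuckets that must leave: all of a removed node's, plus any surplus.
    surplus = []
    for node, vbs in owned.items():
        keep = target.get(node, 0)
        surplus.extend((vb, node) for vb in vbs[keep:])
    moves = []
    for node in new_nodes:
        need = max(0, target[node] - len(owned.get(node, [])))
        for _ in range(need):
            vb, src = surplus.pop()
            moves.append((vb, src, node))
    return moves


def apply_moves(layout, moves):
    """Return a new layout with every planned move applied."""
    updated = dict(layout)
    for vb, _src, dst in moves:
        updated[vb] = dst
    return updated


if __name__ == "__main__":
    before = initial_layout(["n1", "n2", "n3"])
    moves = plan_moves(before, ["n1", "n2", "n3", "n4"])
    after = apply_moves(before, moves)
    print(len(moves), "vBuckets moved")
    print(Counter(after.values()))
```

In this toy model, adding a fourth node to a three-node cluster moves only a quarter of the vBuckets, which is why a rebalance does not need to reshuffle everything.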
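The idea of continuing from the last moved vBucket can be sketched as a loop that records each completed move and skips recorded ones on the next run. This is a simplified illustration, not Couchbase code: run_rebalance, the completed set, and the move_vbucket callback are all hypothetical names.

```python
def run_rebalance(moves, completed, move_vbucket):
    """Execute pending moves one at a time, recording each finished vBucket.

    moves: list of (vbucket, source, destination) tuples.
    completed: set of vBucket ids already moved; updated in place.
    move_vbucket: callable that performs one transfer; it may raise to
        simulate the rebalance being stopped.
    """
    for vb, src, dst in moves:
        if vb in completed:
            continue  # moved before the interruption; nothing to redo
        move_vbucket(vb, src, dst)
        completed.add(vb)


if __name__ == "__main__":
    moves = [(vb, "n1", "n2") for vb in range(4)]
    completed = set()
    state = {"failed_once": False}

    def flaky_move(vb, src, dst):
        # Simulate a single interruption while moving vBucket 2.
        if vb == 2 and not state["failed_once"]:
            state["failed_once"] = True
            raise RuntimeError("rebalance stopped")
        print(f"moved vBucket {vb} from {src} to {dst}")

    try:
        run_rebalance(moves, completed, flaky_move)
    except RuntimeError:
        print("interrupted, completed so far:", sorted(completed))
    run_rebalance(moves, completed, flaky_move)  # resumes at vBucket 2
```

Because the set of completed vBuckets survives the interruption, the second run does not repeat the work already done.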
Data Service Rebalance Phases
Data Service Rebalance has two sets of phases: one for replica vBuckets and one for active vBuckets. Now let’s go through these phases step by step.
Rebalance Phases for Replica vBuckets
Moving a replica vBucket has two phases: Backfill and Book-keeping.
The Backfill phase has two subphases. In the first subphase, the replica vBucket data moves from the source node to the destination node’s memory.
In the second subphase, the data moves from the destination node’s memory to the destination node’s disk. The time required for this subphase is called Persistence Time. The time required for the whole Backfill phase, including Persistence Time, is called Backfill Time.
In the Book-keeping phase, rebalance completes ancillary tasks such as updating the cluster map.
The time required for the entire move, that is, Backfill Time plus the time spent in the Book-keeping phase, is called Move Time.
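The relationship between these timings is simple arithmetic, which the following small sketch spells out. The class and field names are hypothetical, the numbers are invented, and I assume that Backfill Time covers both Backfill subphases.

```python
from dataclasses import dataclass


@dataclass
class ReplicaMove:
    """Durations (in seconds) of one replica vBucket move; values are invented."""
    transfer_s: float     # source node -> destination node's memory
    persistence_s: float  # destination memory -> destination disk (Persistence Time)
    bookkeeping_s: float  # ancillary tasks such as updating the cluster map

    @property
    def backfill_time(self) -> float:
        # Assumption: Backfill Time spans both Backfill subphases.
        return self.transfer_s + self.persistence_s

    @property
    def move_time(self) -> float:
        return self.backfill_time + self.bookkeeping_s


if __name__ == "__main__":
    move = ReplicaMove(transfer_s=4.0, persistence_s=1.5, bookkeeping_s=0.5)
    print("Backfill Time:", move.backfill_time)
    print("Move Time:", move.move_time)
```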
Rebalance Phases for Active vBuckets
Moving an active vBucket has four main phases: Backfill, Active Takeover, Persistence, and Book-keeping. Backfill and Book-keeping are the same as for replica vBuckets, with one small difference: here the Book-keeping phase includes additional Persistence Time.
After the Backfill phase, the relocated vBucket needs to be established as the new active copy. This is the Active Takeover phase.
The time required for this phase is called Takeover Time.
In the Persistence phase, rebalance makes sure the data is fully persisted. The final phase is Book-keeping, whose purpose is to complete ancillary tasks: Master Services update the cluster map, and client SDKs begin to access the new vBuckets.
The time required for the entire process is called Move Time.
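The handover of an active vBucket and the cluster map update can be pictured with a small simulation. Everything here is a simplified assumption for illustration: the Node and Client classes, the takeover function, and the "refresh the map and retry once" behaviour are not the SDK's real implementation.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.active = set()  # ids of the vBuckets this node serves as active

    def get(self, vb):
        if vb not in self.active:
            raise LookupError("not my vbucket")
        return f"{self.name} served vBucket {vb}"


def takeover(vb, source, destination, cluster_map):
    """Make `destination` the active owner and publish it in the cluster map."""
    source.active.discard(vb)
    destination.active.add(vb)
    cluster_map[vb] = destination.name


class Client:
    def __init__(self, cluster_map, nodes):
        self.cluster_map = dict(cluster_map)  # local copy, may become stale
        self.nodes = nodes

    def get(self, vb, fetch_map):
        try:
            return self.nodes[self.cluster_map[vb]].get(vb)
        except LookupError:
            # The local map is stale: fetch the current one and retry once.
            self.cluster_map = dict(fetch_map())
            return self.nodes[self.cluster_map[vb]].get(vb)


if __name__ == "__main__":
    n1, n2 = Node("n1"), Node("n2")
    n1.active.add(7)
    cluster_map = {7: "n1"}
    client = Client(cluster_map, {"n1": n1, "n2": n2})
    print(client.get(7, lambda: cluster_map))
    takeover(7, n1, n2, cluster_map)
    print(client.get(7, lambda: cluster_map))
```

The point of the sketch is that the application keeps calling the same client method; only the routing behind it changes once the new active copy takes over.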
That is how Rebalance works on the Data Service. Rebalance affects the other services too. Let’s look at some of them briefly:
Index Service: Rebalance does not move indexes or replicas; instead, it rebuilds them on their new nodes.
Eventing Service: Rebalance redistributes the work among the Eventing nodes, and the service continues to process mutations during and after the Rebalance. There are checkpoints, so no mutations are lost.
Analytics Service: The Analytics Service uses shadow data, a copy of some or all of the Data Service data. If a node is completely removed or replaced, all shadow data has to be rebuilt.
Query Service: When you add a Query Service node, it automatically starts to receive queries. When you remove one, it becomes unavailable, and any queries still running on it may be interrupted.
In this article, I tried to explain how Rebalance works in Couchbase. You can check the official documentation for more details. Thank you for reading.
May the force be with you!