Restarting a cluster

In a multi-master replication cluster, a reference point must be established before other nodes can join and form the cluster. This is called cluster bootstrapping and involves starting a single node as the primary component. Restarting a single node or reconnecting after a shutdown can be done normally.

In case of a full cluster shutdown, the server that was stopped last should be started first with the --new-cluster command line option or by running manticore_new_cluster through systemd. To ensure that the server is capable of being the reference point, the grastate.dat file located at the cluster path should be updated with a value of 1 for the safe_to_bootstrap option. Both conditions, --new-cluster and safe_to_bootstrap=1, must be met. If any other node is started without these options set, an error will occur. The --new-cluster-force command line option can be used to override this protection and start the cluster from another server forcibly. Alternatively, you can run manticore_new_cluster --force to use systemd.

In the event of a hard crash or an unclean shutdown of all servers in the cluster, the most advanced node, that is, the one with the largest seqno in the grastate.dat file located at the cluster path, must be identified and started with the --new-cluster-force command line option.

Cluster recovery

If the Manticore search daemon stops and no nodes remain in the cluster to serve requests, recovery is necessary. Because the Galera library used for replication is multi-master, a Manticore replication cluster is a single logical entity that maintains the consistency of its nodes and data, as well as the status of the entire cluster. This allows safe writes on multiple nodes simultaneously and ensures the integrity of the cluster.

However, this also poses challenges. Let's examine several scenarios, using a cluster of nodes A, B, and C, to see what needs to be done when some or all nodes become unavailable.

Case 1

When node A is stopped, the other nodes receive a "normal shutdown" message. The cluster size is reduced, and a quorum re-calculation takes place.

Upon starting node A, it joins the cluster and will not serve any write transactions until it is fully synchronized with the cluster. If the writeset cache on donor nodes B or C (which can be controlled with the Galera cluster's gcache.size) still contains all of the transactions missed at node A, node A will receive a fast incremental state transfer (IST), that is, a transfer of only missed transactions. If not, a snapshot state transfer (SST) will occur, which involves the transfer of table files.
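The choice between IST and SST described above can be illustrated with a small sketch. This is a simplified model, not Galera's actual implementation: the `DonorCache` class and `choose_state_transfer` function are hypothetical names, and the model only compares sequence numbers, ignoring any other checks a real donor performs.

```python
from dataclasses import dataclass


@dataclass
class DonorCache:
    """Hypothetical view of a donor's writeset cache as a seqno range."""
    first_seqno: int  # oldest transaction still held in the cache
    last_seqno: int   # newest transaction held in the cache


def choose_state_transfer(joiner_seqno: int, cache: DonorCache) -> str:
    """Return "IST" if the donor cache covers everything the joiner missed,
    otherwise "SST". A negative seqno is treated as "state unknown"."""
    if joiner_seqno < 0:
        return "SST"
    # The joiner needs every transaction after joiner_seqno, so the first
    # one it needs (joiner_seqno + 1) must still be in the cache.
    if joiner_seqno + 1 >= cache.first_seqno:
        return "IST"
    return "SST"
```

In this model, a larger cache keeps `first_seqno` lower for longer, so a node can stay offline longer and still rejoin with an IST.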

Case 2

In the scenario where nodes A and B are stopped, the cluster size is reduced to one, with node C forming the primary component to handle write transactions.

Nodes A and B can then be started as usual and will join the cluster after start-up. Node C acts as the donor, providing the state transfer to nodes A and B.

Case 3

All nodes are stopped as usual and the cluster is off.

The problem now is how to initialize the cluster. On a clean shutdown of searchd, each node writes the number of the last executed transaction to the grastate.dat file in the cluster directory, along with the safe_to_bootstrap flag. The node that was stopped last will have safe_to_bootstrap: 1 and the most advanced seqno.

This node must start first to form the cluster. To bootstrap the cluster, start the server on this node with the --new-cluster flag. On Linux, you can also run manticore_new_cluster, which starts Manticore in --new-cluster mode via systemd.

If another node starts first and bootstraps the cluster, the most advanced node joins that cluster, performs a full SST, and receives table files that lack some of the transactions its own table files contained. That is why the node that was shut down last, the one with safe_to_bootstrap: 1 in grastate.dat, must be started first.
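Choosing the node to start first can be sketched as a small script. It assumes the contents of each node's grastate.dat have already been collected by some means, and that the file uses the plain "key: value" layout with the seqno and safe_to_bootstrap fields mentioned above. The function names are hypothetical and not part of Manticore.

```python
from typing import Dict, Optional, Tuple


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse "key: value" lines, skipping blank lines and # comments."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pick_bootstrap_node(states: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """states maps a node name to the text of its grastate.dat.

    Returns (node, option) for the node to start first, or None when no
    file holds a usable seqno and the node must be chosen by other means.
    """
    parsed = {}
    for node, text in states.items():
        fields = parse_grastate(text)
        seqno = int(fields.get("seqno", "-1"))
        safe = fields.get("safe_to_bootstrap", "0") == "1"
        parsed[node] = (seqno, safe)

    # Clean shutdown: the node stopped last is flagged safe_to_bootstrap.
    safe_nodes = [n for n, (_, safe) in parsed.items() if safe]
    if safe_nodes:
        return max(safe_nodes, key=lambda n: parsed[n][0]), "--new-cluster"

    # No flagged node: fall back to the largest valid seqno and force it.
    valid = [n for n, (seqno, _) in parsed.items() if seqno >= 0]
    if valid:
        return max(valid, key=lambda n: parsed[n][0]), "--new-cluster-force"

    return None
```

The `None` result corresponds to the situation where every file lacks a valid seqno, in which case the node with the most recent data has to be identified manually.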

Case 4

In the event of a crash or network failure causing Node A to disappear from the cluster, nodes B and C will attempt to reconnect with Node A. Upon failure, they will remove Node A from the cluster. With two out of the three nodes still running, the cluster maintains its quorum and continues to operate normally.

When Node A is restarted, it will join the cluster automatically, as outlined in Case 1.

Case 5

Nodes A and B have gone offline. Node C cannot form a quorum on its own, because one node out of three is not a majority. As a result, the cluster on node C shifts to a non-primary state and rejects any write transactions with an error message.
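The quorum rule behind this behavior can be written as a one-line check. This is a minimal sketch that assumes every node has equal weight; `has_quorum` is a hypothetical helper, not a Manticore or Galera function. Here `total` is the last known cluster size, which shrinks after a normal shutdown (as in Case 2) but not after a crash or network failure.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """True when the reachable nodes are a strict majority of the cluster.

    Assumes all nodes carry equal weight.
    """
    return 2 * reachable > total


# Case 4: B and C remain out of three nodes.
print(has_quorum(2, 3))  # True
# Case 5: only C remains out of three nodes.
print(has_quorum(1, 3))  # False
# A four-node cluster split evenly in two: neither half is a majority.
print(has_quorum(2, 4))  # False
```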

Meanwhile, node C waits for the other nodes to connect and also tries to connect to them. Once the network is restored and nodes A and B are back online, the cluster re-forms automatically. If nodes A and B are only temporarily disconnected from node C but can still communicate with each other, they continue to operate as normal, since they still form a quorum.

However, if both nodes A and B have crashed or restarted due to a power failure, someone must activate the primary component on node C using the following command:

SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

It's important to note that before executing this command, you must confirm that the other nodes are truly unreachable. Otherwise, a split-brain scenario may occur and separate clusters may form.

Case 6

All nodes have crashed. In this situation, the grastate.dat file in the cluster directory has not been updated and does not contain a valid seqno (sequence number).

If this occurs, someone needs to locate the node with the most recent data and start the server on it using the --new-cluster-force command line option. All other nodes will start as normal, as described in Case 3. On Linux, you can also use the manticore_new_cluster --force command, which will start Manticore in --new-cluster-force mode via systemd.

Case 7

Split-brain can cause the cluster to transition into a non-primary state. For example, consider a cluster with an even number of nodes (four), such as two pairs of nodes located in different data centers. If a network failure interrupts the connection between the data centers, split-brain occurs: each group holds exactly half of the nodes, so neither has a quorum. As a result, both groups stop handling write transactions, since the Galera replication model prioritizes data consistency and the cluster cannot accept writes without a quorum. However, nodes in both groups keep trying to reconnect with the nodes from the other group in an effort to restore the cluster.

If someone wants to restore the cluster before the network is restored, the same steps outlined in Case 5 should be taken, but only on one group of nodes.

After the statement is executed, the group with the node that it was run on will be able to handle write transactions once again.

SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

However, it's important to note that if the statement is issued in both groups, two separate clusters will form, and the groups will not rejoin once the network recovers.

Connecting to the server

With the default configuration, Manticore listens for connections on:

  • port 9306 for MySQL clients
  • port 9308 for HTTP/HTTPS connections
  • port 9312 for HTTP/HTTPS, and connections from other Manticore nodes and clients based on Manticore binary API

For example, to connect with a MySQL client:

mysql -h0 -P9306
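Before connecting, it can be useful to confirm that the default ports are reachable. The following is a minimal sketch using only the Python standard library. It checks only that a TCP connection can be opened and does not speak any Manticore protocol. `DEFAULT_PORTS` and `port_is_open` are hypothetical names introduced for illustration.

```python
import socket

# Default ports as listed above; the descriptions are informal labels.
DEFAULT_PORTS = {
    9306: "MySQL clients",
    9308: "HTTP/HTTPS",
    9312: "HTTP/HTTPS, other Manticore nodes, binary API clients",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, calling `port_is_open("127.0.0.1", 9306)` on the machine running Manticore should return True when the server is up with the default configuration.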