The ALTER CLUSTER <cluster_name> UPDATE nodes statement updates the node lists on each node within the specified cluster to include all active nodes in the cluster. For more information on node lists, see Joining a cluster.
SQL:
ALTER CLUSTER posts UPDATE nodes

JSON:
POST /cli -d "
ALTER CLUSTER posts UPDATE nodes
"

PHP:
$params = [
    'cluster' => 'posts',
    'body' => [
        'operation' => 'update',
    ]
];
$response = $client->cluster()->alter($params);

Python:
utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')

Python-asyncio:
await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')

javascript:
res = await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes');

Java:
utilsApi.sql("ALTER CLUSTER posts UPDATE nodes");

C#:
utilsApi.Sql("ALTER CLUSTER posts UPDATE nodes");

Rust:
utils_api.sql("ALTER CLUSTER posts UPDATE nodes", Some(true)).await;

Python response:
{u'error': u'', u'total': 0, u'warning': u''}

Python-asyncio response:
{u'error': u'', u'total': 0, u'warning': u''}

javascript response:
{"total":0,"error":"","warning":""}

For instance, when the cluster was initially established, the list of nodes used to rejoin the cluster was 10.10.0.1:9312,10.10.1.1:9312. Since then, other nodes joined the cluster and now the active nodes are 10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312. However, the list of nodes used to rejoin the cluster has not been updated.
To rectify this, you can run the ALTER CLUSTER ... UPDATE nodes statement to copy the list of active nodes to the list of nodes used to rejoin the cluster. After this, the list of nodes used to rejoin the cluster will include all the active nodes in the cluster.
Both lists of nodes can be viewed using the Cluster status statement (cluster_post_nodes_set and cluster_post_nodes_view).
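To see whether the rejoin list has fallen behind, the two values can be compared directly. The following is only a minimal sketch: the helper names are hypothetical, the inputs are the strings reported as nodes_set and nodes_view, and it assumes that entries ending in ":replication" can be ignored for this comparison.

```python
def split_nodes(value):
    """Split a comma-separated node list, dropping empty entries."""
    return [node.strip() for node in value.split(",") if node.strip()]


def missing_from_nodes_set(nodes_set, nodes_view):
    """Return active nodes (from nodes_view) that are absent from nodes_set."""
    known = set(split_nodes(nodes_set))
    # Assumption: entries suffixed with ":replication" are not rejoin
    # addresses, so they are skipped in this comparison.
    active = [n for n in split_nodes(nodes_view) if not n.endswith(":replication")]
    return [n for n in active if n not in known]


# Values taken from the example above.
nodes_set = "10.10.0.1:9312,10.10.1.1:9312"
nodes_view = "10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312"
stale = missing_from_nodes_set(nodes_set, nodes_view)
print(stale)  # ['10.15.0.1:9312', '10.15.0.3:9312']
```

A non-empty result suggests that running ALTER CLUSTER ... UPDATE nodes would change the rejoin list.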
To remove a node from the replication cluster, follow these steps:
- Stop the node.
- Remove the information about the cluster from <data_dir>/manticore.json (usually /var/lib/manticore/manticore.json) on the node that has been stopped.
- Run ALTER CLUSTER cluster_name UPDATE nodes on any other node.
After these steps, the other nodes will forget about the detached node and the detached node will forget about the cluster. This action will not impact the tables in the cluster or on the detached node.
EXIT CLUSTER <cluster_name> is the online equivalent of the manual detach flow above. It removes the local node from the replication cluster, keeps the local tables intact as regular local tables, saves the local config, and then asks the surviving peers to refresh their persisted node lists using the existing ALTER CLUSTER ... UPDATE nodes machinery.
EXIT CLUSTER posts
Use EXIT CLUSTER when you want to detach only the current node. Use DELETE CLUSTER when you want to remove the cluster from every node.
EXIT CLUSTER is only allowed for a healthy local node in a primary cluster. If the command returns a warning, the local detach already succeeded, but some follow-up may still be required. In that case run ALTER CLUSTER <cluster_name> UPDATE nodes on any surviving node to finish refreshing the remaining cluster metadata.
You can view the cluster status information by checking the node status. This can be done using the Node status command, which displays various information about the node, including the cluster status variables.
The output format for the cluster status variables is as follows: cluster_name_variable_name variable_value. Most of the variables are described in the Galera Documentation Status Variables. In addition to these variables, Manticore Search also displays:
- cluster_name - the name of the cluster, as defined in the replication setup
- node_state - the current state of the node: closed, destroyed, joining, donor, synced
- indexes_count - the number of tables managed by the cluster
- indexes - a list of table names managed by the cluster
- nodes_set - the list of nodes in the cluster defined using the CREATE, JOIN or ALTER UPDATE commands
- nodes_view - the actual list of nodes in the cluster that the current node can see
- state_uuid - the UUID state of the cluster. If it matches the value in local_state_uuid, the local and cluster nodes are in sync.
- conf_id - the total number of cluster membership changes that have taken place
- status - the cluster component status. Possible values are primary (primary group configuration, quorum present), non_primary (non-primary group configuration, quorum lost), or disconnected (not connected to group, retrying).
- size - the number of nodes currently in the cluster
- local_index - the node's index in the cluster
- last_error - the last recorded error message related to a cluster operation. The message provides a high-level summary of the problem. For more detailed context, consult the searchd.log file.
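Because every variable is reported as cluster_name_variable_name, the values for one cluster can be collected by stripping that prefix. This is a small sketch with a hypothetical helper; it assumes the status rows have already been fetched as (name, value) pairs, and the sample rows are illustrative.

```python
def cluster_variables(rows, cluster):
    """Collect the status rows of one cluster, keyed by short variable name."""
    prefix = "cluster_" + cluster + "_"
    return {
        name[len(prefix):]: value
        for name, value in rows
        if name.startswith(prefix)
    }


# Illustrative rows; the last one stands for any unrelated counter.
rows = [
    ("cluster_name", "post"),
    ("cluster_post_status", "primary"),
    ("cluster_post_size", "5"),
    ("cluster_post_node_state", "donor"),
    ("uptime", "100"),
]
variables = cluster_variables(rows, "post")
print(variables)  # {'status': 'primary', 'size': '5', 'node_state': 'donor'}
```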
During a State Snapshot Transfer (SST), a node provisions another by transferring a full data copy. This happens when a new node joins the cluster (JOIN CLUSTER) or when new tables are added (ALTER CLUSTER ADD). While an SST is active, the following additional status variables will be available on both the donor and joiner nodes, with their progress kept in sync.
- cluster_name_sst_total - The overall progress of the entire SST operation, from 0 to 100. This is the primary counter to watch.
- cluster_name_sst_stage - The name of the current work phase. The process cycles through these stages for each table being transferred:
  - await nodes sync
  - block checksum calculate
  - analyze remote
  - send files
  - activate tables
- cluster_name_sst_stage_total - The progress of the current stage, from 0 to 100.
- cluster_name_sst_tables - The total number of tables being transferred in the SST.
- cluster_name_sst_table - The name and index of the table currently being processed (e.g., 3 (products)).
For most use cases, cluster_name_sst_total is sufficient. However, the other counters can be useful for investigating stalls or performance issues during a specific SST stage or on a particular table.
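As an illustration, the SST counters can be folded into a single progress line for logging. This is a sketch only: describe_sst is a hypothetical helper, and it assumes the status values were already loaded into a dict keyed by counter name.

```python
def describe_sst(status, cluster):
    """Summarize SST progress, or return None when no SST counters are present."""
    prefix = f"cluster_{cluster}_sst_"
    total = status.get(prefix + "total")
    if total is None:
        return None  # no SST is active on this node
    table = status.get(prefix + "table", "?")
    tables = status.get(prefix + "tables", "?")
    stage = status.get(prefix + "stage", "?")
    stage_total = status.get(prefix + "stage_total", "?")
    return f"SST {total}%: table {table} of {tables}, stage '{stage}' at {stage_total}%"


# Counter values as they might appear in the status output during a transfer.
status = {
    "cluster_post_sst_total": "65",
    "cluster_post_sst_stage": "send files",
    "cluster_post_sst_stage_total": "78",
    "cluster_post_sst_tables": "5",
    "cluster_post_sst_table": "3 (products)",
}
summary = describe_sst(status, "post")
print(summary)  # SST 65%: table 3 (products) of 5, stage 'send files' at 78%
```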
SQL:
SHOW STATUS

JSON:
POST /cli -d "
SHOW STATUS
"

PHP:
$params = [
    'body' => []
];
$response = $client->nodes()->status($params);

Python:
utilsApi.sql('SHOW STATUS')

Python-asyncio:
await utilsApi.sql('SHOW STATUS')

javascript:
res = await utilsApi.sql('SHOW STATUS');

Java:
utilsApi.sql("SHOW STATUS");

C#:
utilsApi.Sql("SHOW STATUS");

Rust:
utils_api.sql("SHOW STATUS", Some(true)).await;

SQL response:
+---------------------------------+-------------------------------------------------------------------------------------+
| Counter | Value |
+---------------------------------+-------------------------------------------------------------------------------------+
| cluster_name | post |
| cluster_post_state_uuid | fba97c45-36df-11e9-a84e-eb09d14b8ea7 |
| cluster_post_conf_id | 1 |
| cluster_post_status | primary |
| cluster_post_size | 5 |
| cluster_post_local_index | 0 |
| cluster_post_node_state | donor |
| cluster_post_indexes_count | 2 |
| cluster_post_indexes | pq1,pq_posts |
| cluster_post_nodes_set | 10.10.0.1:9312 |
| cluster_post_nodes_view | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
| cluster_post_sst_total | 65 |
| cluster_post_sst_stage | send files |
| cluster_post_sst_stage_total | 78 |
| cluster_post_sst_tables | 5 |
| cluster_post_sst_table | 3 (products) |
+---------------------------------+-------------------------------------------------------------------------------------+

JSON response:
{"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
"data":[
{"Counter":"cluster_name", "Value":"post"},
{"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Counter":"cluster_post_conf_id", "Value":"1"},
{"Counter":"cluster_post_status", "Value":"primary"},
{"Counter":"cluster_post_size", "Value":"5"},
{"Counter":"cluster_post_local_index", "Value":"0"},
{"Counter":"cluster_post_node_state", "Value":"donor"},
{"Counter":"cluster_post_indexes_count", "Value":"2"},
{"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
{"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
{"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"},
{"Counter":"cluster_post_sst_total", "Value":"65"},
{"Counter":"cluster_post_sst_stage", "Value":"send files"},
{"Counter":"cluster_post_sst_stage_total", "Value":"78"},
{"Counter":"cluster_post_sst_tables", "Value":"5"},
{"Counter":"cluster_post_sst_table", "Value":"3 (products)"}
],
"total":0,
"error":"",
"warning":""
}

PHP response:
(
"cluster_name" => "post",
"cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
"cluster_post_conf_id" => 1,
"cluster_post_status" => "primary",
"cluster_post_size" => 5,
"cluster_post_local_index" => 0,
"cluster_post_node_state" => "donor",
"cluster_post_indexes_count" => 2,
"cluster_post_indexes" => "pq1,pq_posts",
"cluster_post_nodes_set" => "10.10.0.1:9312",
"cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication",
"cluster_post_sst_total" => 65,
"cluster_post_sst_stage" => "send files",
"cluster_post_sst_stage_total" => 78,
"cluster_post_sst_tables" => 5,
"cluster_post_sst_table" => "3 (products)"
)

Python response:
{u'columns': [{u'Key': {u'type': u'string'}},
{u'Value': {u'type': u'string'}}],
u'data': [
{u'Key': u'cluster_name', u'Value': u'post'},
{u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
{u'Key': u'cluster_post_conf_id', u'Value': u'1'},
{u'Key': u'cluster_post_status', u'Value': u'primary'},
{u'Key': u'cluster_post_size', u'Value': u'5'},
{u'Key': u'cluster_post_local_index', u'Value': u'0'},
{u'Key': u'cluster_post_node_state', u'Value': u'donor'},
{u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
{u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
{u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
{u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'},
{u'Key': u'cluster_post_sst_total', u'Value': u'65'},
{u'Key': u'cluster_post_sst_stage', u'Value': u'send files'},
{u'Key': u'cluster_post_sst_stage_total', u'Value': u'78'},
{u'Key': u'cluster_post_sst_tables', u'Value': u'5'},
{u'Key': u'cluster_post_sst_table', u'Value': u'3 (products)'}],
u'error': u'',
u'total': 0,
u'warning': u''}

Python-asyncio response:
{u'columns': [{u'Key': {u'type': u'string'}},
{u'Value': {u'type': u'string'}}],
u'data': [
{u'Key': u'cluster_name', u'Value': u'post'},
{u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
{u'Key': u'cluster_post_conf_id', u'Value': u'1'},
{u'Key': u'cluster_post_status', u'Value': u'primary'},
{u'Key': u'cluster_post_size', u'Value': u'5'},
{u'Key': u'cluster_post_local_index', u'Value': u'0'},
{u'Key': u'cluster_post_node_state', u'Value': u'donor'},
{u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
{u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
{u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
{u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'},
{u'Key': u'cluster_post_sst_total', u'Value': u'65'},
{u'Key': u'cluster_post_sst_stage', u'Value': u'send files'},
{u'Key': u'cluster_post_sst_stage_total', u'Value': u'78'},
{u'Key': u'cluster_post_sst_tables', u'Value': u'5'},
{u'Key': u'cluster_post_sst_table', u'Value': u'3 (products)'}],
u'error': u'',
u'total': 0,
u'warning': u''}

javascript response:
{"columns": [{"Key": {"type": "string"}},
{"Value": {"type": "string"}}],
"data": [
{"Key": "cluster_name", "Value": "post"},
{"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Key": "cluster_post_conf_id", "Value": "1"},
{"Key": "cluster_post_status", "Value": "primary"},
{"Key": "cluster_post_size", "Value": "5"},
{"Key": "cluster_post_local_index", "Value": "0"},
{"Key": "cluster_post_node_state", "Value": "donor"},
{"Key": "cluster_post_indexes_count", "Value": "2"},
{"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
{"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
{"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"},
{"Key": "cluster_post_sst_total", "Value": "65"},
{"Key": "cluster_post_sst_stage", "Value": "send files"},
{"Key": "cluster_post_sst_stage_total", "Value": "78"},
{"Key": "cluster_post_sst_tables", "Value": "5"},
{"Key": "cluster_post_sst_table", "Value": "3 (products)"}],
"error": "",
"total": 0,
"warning": ""}

Java response:
{columns=[{ Key : { type=string }},
{ Value : { type=string }}],
data : [
{ Key=cluster_name, Value=post},
{ Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
{ Key=cluster_post_conf_id, Value=1},
{ Key=cluster_post_status, Value=primary},
{ Key=cluster_post_size, Value=5},
{ Key=cluster_post_local_index, Value=0},
{ Key=cluster_post_node_state, Value=donor},
{ Key=cluster_post_indexes_count, Value=2},
{ Key=cluster_post_indexes, Value=pq1,pq_posts},
{ Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
{ Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
{ Key=cluster_post_sst_total, Value=65},
{ Key=cluster_post_sst_stage, Value=send files},
{ Key=cluster_post_sst_stage_total, Value=78},
{ Key=cluster_post_sst_tables, Value=5},
{ Key=cluster_post_sst_table, Value=3 (products)}],
error= ,
total=0,
warning= }

C# response:
{columns=[{ Key : { type=String }},
{ Value : { type=String }}],
data : [
{ Key=cluster_name, Value=post},
{ Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
{ Key=cluster_post_conf_id, Value=1},
{ Key=cluster_post_status, Value=primary},
{ Key=cluster_post_size, Value=5},
{ Key=cluster_post_local_index, Value=0},
{ Key=cluster_post_node_state, Value=donor},
{ Key=cluster_post_indexes_count, Value=2},
{ Key=cluster_post_indexes, Value=pq1,pq_posts},
{ Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
{ Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
{ Key=cluster_post_sst_total, Value=65},
{ Key=cluster_post_sst_stage, Value=send files},
{ Key=cluster_post_sst_stage_total, Value=78},
{ Key=cluster_post_sst_tables, Value=5},
{ Key=cluster_post_sst_table, Value=3 (products)}],
error="" ,
total=0,
warning="" }

Rust response:
{columns=[{ Key : { type=String }},
{ Value : { type=String }}],
data : [
{ Key=cluster_name, Value=post},
{ Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
{ Key=cluster_post_conf_id, Value=1},
{ Key=cluster_post_status, Value=primary},
{ Key=cluster_post_size, Value=5},
{ Key=cluster_post_local_index, Value=0},
{ Key=cluster_post_node_state, Value=donor},
{ Key=cluster_post_indexes_count, Value=2},
{ Key=cluster_post_indexes, Value=pq1,pq_posts},
{ Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
{ Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
{ Key=cluster_post_sst_total, Value=65},
{ Key=cluster_post_sst_stage, Value=send files},
{ Key=cluster_post_sst_stage_total, Value=78},
{ Key=cluster_post_sst_tables, Value=5},
{ Key=cluster_post_sst_table, Value=3 (products)}],
error="" ,
total=0,
warning="" }

When a whole replication cluster is down, one node must be started first so the other nodes know which copy of the cluster to join.
That first-start decision is based on grastate.dat, the small replication state file stored in the cluster data directory. The most important fields are:
- seqno - the last transaction number known to that node
- safe_to_bootstrap - whether that node is marked as safe to start first after a clean shutdown
Example of what grastate.dat can look like after a clean shutdown:
# saved replication state
version: 2.1
uuid: <cluster-uuid>
seqno: 12345
safe_to_bootstrap: 1
In this example:
- seqno: 12345 means this node knows about transactions up to sequence number 12345
- safe_to_bootstrap: 1 means this node is marked as safe to start first
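When several nodes have to be compared, it can help to read these two fields programmatically. The sketch below is a minimal parser for the file format shown above; parse_grastate is a hypothetical helper, and it assumes missing fields should be treated as seqno -1 and safe_to_bootstrap 0.

```python
def parse_grastate(text):
    """Extract seqno and safe_to_bootstrap from the contents of grastate.dat."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {
        # Assumption: a missing field is treated like an unknown state.
        "seqno": int(fields.get("seqno", "-1")),
        "safe_to_bootstrap": fields.get("safe_to_bootstrap", "0") == "1",
    }


example = """# saved replication state
version: 2.1
uuid: <cluster-uuid>
seqno: 12345
safe_to_bootstrap: 1
"""
state = parse_grastate(example)
print(state)  # {'seqno': 12345, 'safe_to_bootstrap': True}
```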
If the whole cluster was shut down cleanly, start the node that was stopped last. In practice, this is usually the node with:
- the most advanced seqno
- safe_to_bootstrap: 1
Start that node first. This tells Manticore to start a new copy of the cluster from that node. After that, start the remaining nodes normally so they can rejoin.
Use this after a clean full cluster shutdown.
Bash:
searchd --new-cluster

Systemd:
manticore_new_cluster

If another node is started first without the required clean-shutdown state, startup is refused to protect the cluster from being restored from an older copy.
If all nodes crashed or were shut down uncleanly, grastate.dat may no longer be trustworthy for normal bootstrap selection. In that case, find the node with the most recent data, usually the one with the largest seqno, and start it with --new-cluster-force. This overrides the normal protection and forces the cluster to start from the chosen node.
Use this after a crash or unclean full cluster shutdown.
Bash:
searchd --new-cluster-force

Systemd:
manticore_new_cluster --force

If a replication node or an entire cluster becomes unavailable, the correct recovery procedure depends on how many nodes are still reachable and whether the shutdown was clean or abrupt.
A replication cluster should be treated as one logical system rather than a set of unrelated servers. That gives you multi-master writes and consistent data, but it also means you must recover quorum carefully. In particular, do not run the manual recovery command that restores writes on the surviving side until you are sure the missing nodes are really gone. That command is shown later in this page as SET CLUSTER <name> GLOBAL 'pc.bootstrap' = 1. If you run it too early, you can create split-brain and end up with two independent clusters.
For the examples below, assume a cluster with nodes A, B, and C unless noted otherwise.
First identify which situation you are in:
- Is at least one node still online?
- Was the node stopped cleanly, or did it crash? A clean stop means searchd was shut down normally and had time to save its replication state before exiting. A crash, power loss, or kill -9 is not a clean stop.
- Does the surviving part of the cluster still have quorum? Quorum means enough nodes can still see each other to safely remain the writable cluster.
- If all nodes are down, which node should be started first to bring the cluster back?
Useful checks:
- SHOW STATUS LIKE 'cluster_<name>_status'
- SHOW STATUS LIKE 'cluster_<name>_size'
- SHOW STATUS LIKE 'cluster_<name>_node_state'
- if all nodes are down, inspect grastate.dat, the small replication state file stored in the cluster data directory. Look especially at seqno and safe_to_bootstrap: on a clean shutdown, the best node to start first is usually the one with the most advanced seqno and safe_to_bootstrap: 1. For the full bootstrap procedure, see Restarting a cluster.
Example of what grastate.dat can look like after a clean shutdown:
# saved replication state
version: 2.1
uuid: <cluster-uuid>
seqno: 12345
safe_to_bootstrap: 1
In this example:
- seqno: 12345 means this node knows about transactions up to sequence number 12345
- safe_to_bootstrap: 1 means this node is marked as safe to start first
In a clean all-nodes-down recovery, this is usually the kind of node you start first with --new-cluster to bring the cluster back.
After recovery, wait until the restarted node reports cluster_<name>_status=primary and cluster_<name>_node_state=synced before treating it as fully writable again. You can check this with SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_node_state'. In local tests, restarted nodes sometimes spent a short time with cluster_<name>_node_state=joining and cluster_<name>_status=disconnected before reaching synced/primary.
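The wait described above can be sketched as a small polling loop. Everything here is illustrative: fetch_status is a hypothetical callable that returns the current status counters as a dict (for example by running SHOW STATUS through whichever client you use), and the timeout and interval defaults are arbitrary.

```python
import time


def is_ready(status, cluster):
    """True when the node reports primary status and synced node state."""
    return (
        status.get(f"cluster_{cluster}_status") == "primary"
        and status.get(f"cluster_{cluster}_node_state") == "synced"
    )


def wait_until_ready(fetch_status, cluster, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll fetch_status() until the node is ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready(fetch_status(), cluster):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Simulated snapshots: first still joining, then fully recovered.
snapshots = iter([
    {"cluster_posts_status": "disconnected", "cluster_posts_node_state": "joining"},
    {"cluster_posts_status": "primary", "cluster_posts_node_state": "synced"},
])
ready = wait_until_ready(lambda: next(snapshots), "posts", sleep=lambda seconds: None)
print(ready)  # True
```

Passing sleep as a parameter keeps the loop testable without real delays.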
If node A is stopped normally, nodes B and C keep serving writes. You can confirm that the cluster is still healthy on those nodes with SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_size'.
When node A starts again, it rejoins the cluster automatically. Until synchronization finishes, do not send writes to that node. Check SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_node_state' and wait for primary / synced.
If donor nodes B or C still have all the transactions that node A missed in their replication cache, node A can catch up using an incremental state transfer (IST): it receives only the transactions it missed, so recovery is usually faster and lighter. Otherwise, node A requires a state snapshot transfer (SST), which copies table files from another node instead of replaying the missing transactions. SST is heavier: it is usually slower, moves more data, and can make recovery more disruptive on large clusters.
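The IST-or-SST choice is made internally by the replication layer, but the condition can be modelled in a few lines. This is a simplified sketch, not the actual implementation: the function and its parameters (the joiner's last known seqno, the oldest seqno still held in the donor's cache, and the donor's latest seqno) are hypothetical names used for illustration.

```python
def transfer_type(joiner_seqno, donor_cache_first, donor_last):
    """Model which transfer a rejoining node would need (simplified)."""
    if joiner_seqno < 0:
        return "SST"  # the joiner's position is unknown
    if joiner_seqno >= donor_last:
        return "none"  # nothing was missed
    first_missed = joiner_seqno + 1
    # IST is only possible if the donor's cache still reaches back
    # to the first transaction the joiner missed.
    return "IST" if donor_cache_first <= first_missed else "SST"


print(transfer_type(100, 90, 120))   # IST
print(transfer_type(100, 105, 120))  # SST
```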
If nodes A and B are stopped cleanly and node C remains online, node C can continue accepting writes. Check SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_size' on node C if you want to confirm it is now the only active node.
When nodes A and B start again, they rejoin automatically and synchronize from node C. While they are rejoining, check SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'. Wait until all nodes show primary / synced and the expected cluster size before treating recovery as complete.
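To apply the same check across several nodes at once, the per-node results can be combined. This is a sketch under stated assumptions: pending_nodes is a hypothetical helper, and the status dicts are assumed to have been collected from each node beforehand.

```python
def pending_nodes(statuses_by_node, cluster, expected_size):
    """Return the names of nodes that have not finished rejoining yet."""
    prefix = f"cluster_{cluster}_"
    pending = []
    for node, status in statuses_by_node.items():
        ready = (
            status.get(prefix + "status") == "primary"
            and status.get(prefix + "node_state") == "synced"
            and int(status.get(prefix + "size", 0)) == expected_size
        )
        if not ready:
            pending.append(node)
    return pending


# Illustrative snapshot: node A is still rejoining.
synced = {
    "cluster_posts_status": "primary",
    "cluster_posts_node_state": "synced",
    "cluster_posts_size": "3",
}
statuses = {
    "A": {"cluster_posts_status": "disconnected", "cluster_posts_node_state": "joining"},
    "B": dict(synced),
    "C": dict(synced),
}
still_pending = pending_nodes(statuses, "posts", 3)
print(still_pending)  # ['A']
```

Recovery can be treated as complete once the returned list is empty.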
If all nodes were stopped normally, the cluster is fully offline. One node must then be started in a special way so that it becomes the first running node of the cluster.
On clean shutdown, each node writes its last transaction number to grastate.dat. The node that was stopped last is the safest node to start first:
- it has the most advanced seqno
- it has safe_to_bootstrap: 1
Start that node with --new-cluster. This tells Manticore to start a new copy of the cluster from that node. If you run Manticore via systemd on Linux, use manticore_new_cluster. It starts Manticore in --new-cluster mode for you.
After that, start the remaining nodes normally and let them rejoin. Verify recovery with SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'.
If you bootstrap a less advanced node first, a more advanced node may later join it and receive a full SST from an older state, which can discard transactions that existed only on the more advanced node. That is why the node with safe_to_bootstrap: 1 should be your first choice.
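The selection rule for a clean shutdown can be written down as a short helper. This is only a sketch: pick_bootstrap_node is a hypothetical function, and it assumes the seqno and safe_to_bootstrap values were already read from each node's grastate.dat.

```python
def pick_bootstrap_node(states):
    """Pick the node to start first after a clean full shutdown.

    states maps a node name to {"seqno": int, "safe_to_bootstrap": bool}.
    Returns None when no node is marked safe, which points to the
    crash-recovery procedure instead.
    """
    safe = {name: s for name, s in states.items() if s["safe_to_bootstrap"]}
    if not safe:
        return None
    # Among the safe nodes, prefer the most advanced seqno.
    return max(safe, key=lambda name: safe[name]["seqno"])


# Illustrative values: C was stopped last.
states = {
    "A": {"seqno": 12000, "safe_to_bootstrap": False},
    "B": {"seqno": 12300, "safe_to_bootstrap": False},
    "C": {"seqno": 12345, "safe_to_bootstrap": True},
}
first = pick_bootstrap_node(states)
print(first)  # C
```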
If node A disappears because of a crash or a network problem, nodes B and C will first try to reconnect to it. If that fails, they remove it from the cluster, recalculate quorum, and continue working as the primary cluster.
In local tests this peer removal was not immediate: the surviving nodes stayed at the old cluster size for a few seconds before dropping the failed peer and switching to the smaller primary cluster.
When node A is started again, it rejoins automatically and catches up the same way as after a clean one-node shutdown. Again, use SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size' to confirm recovery is finished.
If nodes A and B are lost and only node C is still running, node C no longer has quorum in a three-node cluster. It switches to non-primary and rejects writes.
The write error is explicit:
ERROR 1064 (42000): cluster '<name>' is not ready, not primary state (synced)
If nodes A and B are only temporarily disconnected but can still see each other, they may continue accepting writes while node C remains isolated. Use SHOW STATUS LIKE 'cluster_<name>_status' on each side if you need to see which side is still writable.
If nodes A and B really crashed and node C is the only surviving copy you want to keep working with, first confirm that the other nodes are truly offline, then run this command on node C to make it writable again:
SQL:
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

JSON:
POST /cli -d "
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
"

Important:
- run this only after you are sure the other nodes are unreachable
- run it only on the side that must survive
- after bootstrapping, the node can accept writes again and the other nodes can later rejoin from it
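If this step is scripted, the precautions above can be enforced before the statement is ever produced. The guard below is a hypothetical sketch: it only builds the statement text, and it assumes the operator supplies the local cluster status and an explicit confirmation that the other nodes are unreachable.

```python
def bootstrap_statement(cluster, local_status, peers_confirmed_down):
    """Return the recovery statement only when the preconditions hold."""
    if local_status != "non_primary":
        raise ValueError("node is not in non_primary state; bootstrap is not needed")
    if not peers_confirmed_down:
        raise ValueError("other nodes are not confirmed unreachable; risk of split-brain")
    return f"SET CLUSTER {cluster} GLOBAL 'pc.bootstrap' = 1"


statement = bootstrap_statement("posts", "non_primary", peers_confirmed_down=True)
print(statement)  # SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
```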
If every node crashed, grastate.dat is typically no longer trustworthy for normal bootstrap selection. In local tests, all nodes showed:
- seqno: -1
- safe_to_bootstrap: 0
In this situation, choose the node with the most recent data and start it with --new-cluster-force. This forces Manticore to start a new copy of the cluster from that node even though the usual clean-shutdown metadata is not trustworthy. If you run Manticore via systemd on Linux, use manticore_new_cluster --force. It starts Manticore in --new-cluster-force mode for you.
Then start the remaining nodes normally and let them rejoin. Verify recovery with SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'.
Split-brain risk is highest in even-sized clusters. For example, imagine four nodes split into two isolated pairs across two data centers. Each side has exactly half of the original members, so neither side has quorum and both sides stop accepting writes.
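The arithmetic behind this can be sketched as a majority check. The helper is hypothetical and assumes that every node counts equally and that the missing nodes disappeared abruptly (nodes that leave cleanly are removed from the membership first, which is why a single node can stay writable after clean shutdowns).

```python
def has_quorum(visible_nodes, previous_size):
    """True when the visible nodes form a strict majority of the previous cluster."""
    return 2 * visible_nodes > previous_size


print(has_quorum(2, 3))  # True: two of three nodes still see each other
print(has_quorum(1, 3))  # False: a single survivor of three
print(has_quorum(2, 4))  # False: an even split, neither half has quorum
```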
If you must restore writes before connectivity is fixed, choose only one side of the split and run the same recovery command there so that side becomes writable again. Before doing that, check SHOW STATUS LIKE 'cluster_<name>_status' on both sides so you know which side is currently non-primary.
Choose the side that should remain the writable cluster, then run:
SQL:
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

JSON:
POST /cli -d "
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
"

Never issue that statement on both sides. If you do, you will create two separate primary clusters, and they will not merge back automatically when the network recovers.
In local testing, abruptly losing half of a four-node cluster reproduced the same non-primary behavior and the same recovery command brought one surviving half back to primary.