Managing replication nodes

The ALTER CLUSTER <cluster_name> UPDATE nodes statement updates the node lists on each node within the specified cluster to include all active nodes in the cluster. For more information on node lists, see Joining a cluster.

SQL:

ALTER CLUSTER posts UPDATE nodes

JSON:

POST /cli -d "
ALTER CLUSTER posts UPDATE nodes
"

PHP:

$params = [
    'cluster' => 'posts',
    'body' => [
        'operation' => 'update',
    ]
];
$response = $client->cluster()->alter($params);

Python:

utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')

Python-asyncio:

await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')

JavaScript:

res = await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes');

Java:

utilsApi.sql("ALTER CLUSTER posts UPDATE nodes");

C#:

utilsApi.Sql("ALTER CLUSTER posts UPDATE nodes");

Rust:

utils_api.sql("ALTER CLUSTER posts UPDATE nodes", Some(true)).await;

Response

{u'error': u'', u'total': 0, u'warning': u''}

{u'error': u'', u'total': 0, u'warning': u''}

{"total":0,"error":"","warning":""}

If authentication and authorization are enabled, ALTER CLUSTER ... UPDATE nodes, ALTER CLUSTER ... ADD, and ALTER CLUSTER ... DROP use the stored cluster user. To change the stored cluster user, grant replication permission to the new user and run:

ALTER CLUSTER posts UPDATE user 'repl_user'

The new stored user must be provisioned with matching authentication data on the nodes that will participate in later cluster operations. Those operations fail if the stored user is missing, has different auth data, or loses replication permission.

For instance, when the cluster was initially established, the list of nodes used to rejoin the cluster was 10.10.0.1:9312,10.10.1.1:9312. Since then, other nodes have joined the cluster, and the active nodes are now 10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312. However, the list of nodes used to rejoin the cluster has not been updated.

To rectify this, you can run the ALTER CLUSTER ... UPDATE nodes statement to copy the list of active nodes to the list of nodes used to rejoin the cluster. After this, the list of nodes used to rejoin the cluster will include all the active nodes in the cluster.

Both lists of nodes can be viewed using the Cluster status statement (cluster_post_nodes_set and cluster_post_nodes_view).
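The comparison between the two lists can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that entries ending in `:replication` in nodes_view are replication-port addresses of nodes that are already listed by their main address.

```python
def parse_node_list(value):
    """Split a comma-separated node list, skipping ':replication' entries."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        # Assumption: ':replication' entries duplicate a node that is
        # already listed by its main address, so they are ignored here.
        if item and not item.endswith(":replication"):
            nodes.append(item)
    return nodes


def missing_from_nodes_set(nodes_set, nodes_view):
    """Return active nodes (nodes_view) absent from the rejoin list (nodes_set)."""
    known = set(parse_node_list(nodes_set))
    return [node for node in parse_node_list(nodes_view) if node not in known]


# Values taken from the example above.
nodes_set = "10.10.0.1:9312,10.10.1.1:9312"
nodes_view = "10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312"

stale = missing_from_nodes_set(nodes_set, nodes_view)
if stale:
    print("nodes_set is out of date, missing:", stale)
```

A non-empty result suggests that ALTER CLUSTER ... UPDATE nodes has not been run since those nodes joined.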

Removing node from cluster

To remove a node from the replication cluster, follow these steps:

1. Stop the node.
2. Remove the information about the cluster from <data_dir>/manticore.json (usually /var/lib/manticore/manticore.json) on the node that has been stopped.
3. Run ALTER CLUSTER cluster_name UPDATE nodes on any other node.

After these steps, the other nodes will forget about the detached node and the detached node will forget about the cluster. This action will not impact the tables in the cluster or on the detached node.

EXIT CLUSTER

EXIT CLUSTER <cluster_name> is the online equivalent of the manual detach flow above. It removes the local node from the replication cluster, keeps the local tables intact as regular local tables, saves the local config, and then asks the surviving peers to refresh their persisted node lists using the existing ALTER CLUSTER ... UPDATE nodes machinery.

EXIT CLUSTER posts

Use EXIT CLUSTER when you want to detach only the current node. Use DELETE CLUSTER when you want to remove the cluster from every node.

EXIT CLUSTER is only allowed for a healthy local node in a primary cluster. If the command returns a warning, the local detach already succeeded, but some follow-up may still be required. In that case run ALTER CLUSTER <cluster_name> UPDATE nodes on any surviving node to finish refreshing the remaining cluster metadata.

If the surviving side becomes a clean one-node cluster after EXIT CLUSTER, it remains a replication cluster. After a clean shutdown, that surviving node can be started normally and should return as primary / synced; you do not need --new-cluster for this self-only case.

Replication cluster status

Last modified: June 30, 2026

You can view the cluster status information by checking the node status. This can be done using the Node status command, which displays various information about the node, including the cluster status variables.

The output format for the cluster status variables is as follows: cluster_name_variable_name variable_value. Most of the variables are described in the Galera Documentation Status Variables. In addition to these variables, Manticore Search also displays:

cluster_name - the name of the cluster, as defined in the replication setup
node_state - the current state of the node: closed, destroyed, joining, donor, synced
indexes_count - the number of tables managed by the cluster
indexes - a list of table names managed by the cluster
nodes_set - the list of nodes in the cluster defined using the CREATE, JOIN or ALTER UPDATE commands
nodes_view - the actual list of nodes in the cluster that the current node can see.
state_uuid - UUID state of the cluster. If it matches the value in local_state_uuid, the local and cluster nodes are in sync.
conf_id - total number of cluster membership changes that have taken place.
status - cluster component status. Possible values are primary (primary group configuration, quorum present), non_primary (non-primary group configuration, quorum lost), disconnected (not connected to group, retrying), or recovery-required (cluster metadata is preserved, but the replication provider did not start and manual recovery is required).
size - number of nodes currently in the cluster.
local_index - the node's index in the cluster.
last_error - the last recorded error message related to a cluster operation. The message provides a high-level summary of the problem. For more detailed context, you should consult the searchd.log file.

When status is recovery-required, Manticore keeps the cluster metadata and table membership visible so you can inspect or delete the cluster, but the cluster is not writable. Galera provider status variables are not available until the cluster is recovered and started again.
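Because every cluster variable is reported as cluster_name_variable_name, a client usually wants to strip that prefix before inspecting values. The following Python sketch shows one way to do it, assuming the SHOW STATUS result has already been fetched as (counter, value) pairs; the function name and the input shape are illustrative, not part of any Manticore client.

```python
def cluster_variables(rows, cluster):
    """Return {variable_name: value} for one cluster from SHOW STATUS rows."""
    prefix = "cluster_" + cluster + "_"
    result = {}
    for counter, value in rows:
        # Keep only counters that belong to this cluster and drop the prefix.
        if counter.startswith(prefix):
            result[counter[len(prefix):]] = value
    return result


# A few rows shaped like the SHOW STATUS output for a cluster named "post".
rows = [
    ("cluster_name", "post"),
    ("cluster_post_status", "primary"),
    ("cluster_post_size", "5"),
    ("cluster_post_node_state", "donor"),
]

post = cluster_variables(rows, "post")
print(post["status"], post["node_state"], post["size"])
```

Note that the values stay strings, as they are in the status output; convert them explicitly where a number is needed.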

SST Progress Metrics

During a State Snapshot Transfer (SST), a node provisions another by transferring a full data copy. This happens when a new node joins the cluster (JOIN CLUSTER) or when new tables are added (ALTER CLUSTER ADD). While an SST is active, the following additional status variables are available on both the donor and joiner nodes, with their progress kept in sync.

cluster_name_sst_total - The overall progress of the entire SST operation, from 0 to 100. This is the primary counter to watch.
cluster_name_sst_stage - The name of the current work phase. The process cycles through these stages for each table being transferred:
- await nodes sync
- block checksum calculate
- analyze remote
- send files
- activate tables
cluster_name_sst_stage_total - The progress of the current stage, from 0 to 100.
cluster_name_sst_tables - The total number of tables being transferred in the SST.
cluster_name_sst_table - The name and index of the table currently being processed (e.g., 3 (products)).

For most use cases, cluster_name_sst_total is sufficient. However, the other counters can be useful for investigating stalls or performance issues during a specific SST stage or on a particular table.
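As an illustration of how these counters fit together, the Python sketch below turns them into a one-line progress summary. It assumes the counters were already collected into a dictionary with the cluster_name_ prefix removed; the function name and the output format are hypothetical.

```python
import re


def describe_sst(counters):
    """Build a short progress line from the SST status variables."""
    # sst_table looks like "3 (products)": table index, then table name.
    match = re.match(r"(\d+) \((.*)\)$", counters["sst_table"])
    if match:
        table = f"{match.group(1)}/{counters['sst_tables']} ({match.group(2)})"
    else:
        # Unknown shape: show the raw value rather than guess.
        table = counters["sst_table"]
    return (
        f"SST {counters['sst_total']}% | "
        f"stage '{counters['sst_stage']}' {counters['sst_stage_total']}% | "
        f"table {table}"
    )


# Values taken from the sample SHOW STATUS output on this page.
sample = {
    "sst_total": "65",
    "sst_stage": "send files",
    "sst_stage_total": "78",
    "sst_tables": "5",
    "sst_table": "3 (products)",
}
print(describe_sst(sample))
```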

SQL:

SHOW STATUS

JSON:

POST /cli -d "
SHOW STATUS
"

PHP:

$params = [
    'body' => []
];
$response = $client->nodes()->status($params);

Python:

utilsApi.sql('SHOW STATUS')

Python-asyncio:

await utilsApi.sql('SHOW STATUS')

JavaScript:

res = await utilsApi.sql('SHOW STATUS');

Java:

utilsApi.sql("SHOW STATUS");

C#:

utilsApi.Sql("SHOW STATUS");

Rust:

utils_api.sql("SHOW STATUS", Some(true)).await;

Response

+---------------------------------+-------------------------------------------------------------------------------------+
| Counter                         | Value                                                                               |
+---------------------------------+-------------------------------------------------------------------------------------+
| cluster_name                    | post                                                                                |
| cluster_post_state_uuid         | fba97c45-36df-11e9-a84e-eb09d14b8ea7                                                |
| cluster_post_conf_id            | 1                                                                                   |
| cluster_post_status             | primary                                                                             |
| cluster_post_size               | 5                                                                                   |
| cluster_post_local_index        | 0                                                                                   |
| cluster_post_node_state         | donor                                                                               |
| cluster_post_indexes_count      | 2                                                                                   |
| cluster_post_indexes            | pq1,pq_posts                                                                        |
| cluster_post_nodes_set          | 10.10.0.1:9312                                                                      |
| cluster_post_nodes_view         | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
| cluster_post_sst_total          | 65                                                                                  |
| cluster_post_sst_stage          | send files                                                                          |
| cluster_post_sst_stage_total    | 78                                                                                  |
| cluster_post_sst_tables         | 5                                                                                   |
| cluster_post_sst_table          | 3 (products)                                                                        |
+---------------------------------+-------------------------------------------------------------------------------------+

{"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
"data":[
{"Counter":"cluster_name", "Value":"post"},
{"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
{"Counter":"cluster_post_conf_id", "Value":"1"},
{"Counter":"cluster_post_status", "Value":"primary"},
{"Counter":"cluster_post_size", "Value":"5"},
{"Counter":"cluster_post_local_index", "Value":"0"},
{"Counter":"cluster_post_node_state", "Value":"donor"},
{"Counter":"cluster_post_indexes_count", "Value":"2"},
{"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
{"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
{"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"},
{"Counter":"cluster_post_sst_total", "Value":"65"},
{"Counter":"cluster_post_sst_stage", "Value":"send files"},
{"Counter":"cluster_post_sst_stage_total", "Value":"78"},
{"Counter":"cluster_post_sst_tables", "Value":"5"},
{"Counter":"cluster_post_sst_table", "Value":"3 (products)"}
],
"total":0,
"error":"",
"warning":""
}

(
"cluster_name" => "post",
"cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
"cluster_post_conf_id" => 1,
"cluster_post_status" => "primary",
"cluster_post_size" => 5,
"cluster_post_local_index" => 0,
"cluster_post_node_state" => "donor",
"cluster_post_indexes_count" => 2,
"cluster_post_indexes" => "pq1,pq_posts",
"cluster_post_nodes_set" => "10.10.0.1:9312",
"cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication",
"cluster_post_sst_total" => 65,
"cluster_post_sst_stage" => "send files",
"cluster_post_sst_stage_total" => 78,
"cluster_post_sst_tables" => 5,
"cluster_post_sst_table" => "3 (products)"
)

{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'cluster_name', u'Value': u'post'},
    {u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
    {u'Key': u'cluster_post_conf_id', u'Value': u'1'},
    {u'Key': u'cluster_post_status', u'Value': u'primary'},
    {u'Key': u'cluster_post_size', u'Value': u'5'},
    {u'Key': u'cluster_post_local_index', u'Value': u'0'},
    {u'Key': u'cluster_post_node_state', u'Value': u'donor'},
    {u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
    {u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
    {u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
    {u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'},
    {u'Key': u'cluster_post_sst_total', u'Value': u'65'},
    {u'Key': u'cluster_post_sst_stage', u'Value': u'send files'},
    {u'Key': u'cluster_post_sst_stage_total', u'Value': u'78'},
    {u'Key': u'cluster_post_sst_tables', u'Value': u'5'},
    {u'Key': u'cluster_post_sst_table', u'Value': u'3 (products)'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}

{u'columns': [{u'Key': {u'type': u'string'}},
              {u'Value': {u'type': u'string'}}],
 u'data': [
    {u'Key': u'cluster_name', u'Value': u'post'},
    {u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
    {u'Key': u'cluster_post_conf_id', u'Value': u'1'},
    {u'Key': u'cluster_post_status', u'Value': u'primary'},
    {u'Key': u'cluster_post_size', u'Value': u'5'},
    {u'Key': u'cluster_post_local_index', u'Value': u'0'},
    {u'Key': u'cluster_post_node_state', u'Value': u'donor'},
    {u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
    {u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
    {u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
    {u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'},
    {u'Key': u'cluster_post_sst_total', u'Value': u'65'},
    {u'Key': u'cluster_post_sst_stage', u'Value': u'send files'},
    {u'Key': u'cluster_post_sst_stage_total', u'Value': u'78'},
    {u'Key': u'cluster_post_sst_tables', u'Value': u'5'},
    {u'Key': u'cluster_post_sst_table', u'Value': u'3 (products)'}],
 u'error': u'',
 u'total': 0,
 u'warning': u''}

{"columns": [{"Key": {"type": "string"}},
              {"Value": {"type": "string"}}],
 "data": [
    {"Key": "cluster_name", "Value": "post"},
    {"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
    {"Key": "cluster_post_conf_id", "Value": "1"},
    {"Key": "cluster_post_status", "Value": "primary"},
    {"Key": "cluster_post_size", "Value": "5"},
    {"Key": "cluster_post_local_index", "Value": "0"},
    {"Key": "cluster_post_node_state", "Value": "donor"},
    {"Key": "cluster_post_indexes_count", "Value": "2"},
    {"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
    {"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
    {"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"},
    {"Key": "cluster_post_sst_total", "Value": "65"},
    {"Key": "cluster_post_sst_stage", "Value": "send files"},
    {"Key": "cluster_post_sst_stage_total", "Value": "78"},
    {"Key": "cluster_post_sst_tables", "Value": "5"},
    {"Key": "cluster_post_sst_table", "Value": "3 (products)"}],
 "error": "",
 "total": 0,
 "warning": ""}

{columns=[{ Key : { type=string }},
              { Value : { type=string }}],
  data : [
    { Key=cluster_name, Value=post},
    { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
    { Key=cluster_post_conf_id, Value=1},
    { Key=cluster_post_status, Value=primary},
    { Key=cluster_post_size, Value=5},
    { Key=cluster_post_local_index, Value=0},
    { Key=cluster_post_node_state, Value=donor},
    { Key=cluster_post_indexes_count, Value=2},
    { Key=cluster_post_indexes, Value=pq1,pq_posts},
    { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
    { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
    { Key=cluster_post_sst_total, Value=65},
    { Key=cluster_post_sst_stage, Value=send files},
    { Key=cluster_post_sst_stage_total, Value=78},
    { Key=cluster_post_sst_tables, Value=5},
    { Key=cluster_post_sst_table, Value=3 (products)}],
  error= ,
  total=0,
  warning= }

{columns=[{ Key : { type=String }},
              { Value : { type=String }}],
  data : [
    { Key=cluster_name, Value=post},
    { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
    { Key=cluster_post_conf_id, Value=1},
    { Key=cluster_post_status, Value=primary},
    { Key=cluster_post_size, Value=5},
    { Key=cluster_post_local_index, Value=0},
    { Key=cluster_post_node_state, Value=donor},
    { Key=cluster_post_indexes_count, Value=2},
    { Key=cluster_post_indexes, Value=pq1,pq_posts},
    { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
    { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
    { Key=cluster_post_sst_total, Value=65},
    { Key=cluster_post_sst_stage, Value=send files},
    { Key=cluster_post_sst_stage_total, Value=78},
    { Key=cluster_post_sst_tables, Value=5},
    { Key=cluster_post_sst_table, Value=3 (products)}],
  error="" ,
  total=0,
  warning="" }

{columns=[{ Key : { type=String }},
              { Value : { type=String }}],
  data : [
    { Key=cluster_name, Value=post},
    { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
    { Key=cluster_post_conf_id, Value=1},
    { Key=cluster_post_status, Value=primary},
    { Key=cluster_post_size, Value=5},
    { Key=cluster_post_local_index, Value=0},
    { Key=cluster_post_node_state, Value=donor},
    { Key=cluster_post_indexes_count, Value=2},
    { Key=cluster_post_indexes, Value=pq1,pq_posts},
    { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
    { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication},
    { Key=cluster_post_sst_total, Value=65},
    { Key=cluster_post_sst_stage, Value=send files},
    { Key=cluster_post_sst_stage_total, Value=78},
    { Key=cluster_post_sst_tables, Value=5},
    { Key=cluster_post_sst_table, Value=3 (products)}],
  error="" ,
  total=0,
  warning="" }

Restarting a cluster

When a whole multi-node replication cluster is down, one node must be started first so the other nodes know which copy of the cluster to join.

There is one normal-startup exception: if the cluster had already become a clean one-node cluster, for example after the other nodes left with EXIT CLUSTER, start that remaining node normally. After a clean shutdown, Manticore restores that self-only cluster as primary / synced without --new-cluster.

That first-start decision is based on grastate.dat, the small replication state file stored in the cluster data directory. The most important fields are:

seqno - the last transaction number known to that node
safe_to_bootstrap - whether that node is marked as safe to start first after a clean shutdown

Example of what grastate.dat can look like after a clean shutdown:

# saved replication state
version: 2.1
uuid:    <cluster-uuid>
seqno:   12345
safe_to_bootstrap: 1

In this example:

seqno: 12345 means this node knows about transactions up to sequence number 12345
safe_to_bootstrap: 1 means this node is marked as safe to start first
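When several nodes have to be compared, it can help to read these fields programmatically. The Python sketch below parses text in the format of the example above; it is a minimal illustration with a hypothetical function name, and it treats a missing seqno as -1.

```python
def parse_grastate(text):
    """Extract uuid, seqno and safe_to_bootstrap from grastate.dat content."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# saved replication state".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return {
        "uuid": state.get("uuid"),
        "seqno": int(state.get("seqno", "-1")),
        "safe_to_bootstrap": state.get("safe_to_bootstrap") == "1",
    }


sample = """# saved replication state
version: 2.1
uuid:    <cluster-uuid>
seqno:   12345
safe_to_bootstrap: 1
"""

print(parse_grastate(sample))
```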

If the whole cluster was shut down cleanly, start the node that was stopped last. In practice, this is usually the node with:

the most advanced seqno
safe_to_bootstrap: 1

Start that node first with --new-cluster. This tells Manticore to start a new copy of the cluster from that node. After that, start the remaining nodes normally so they can rejoin.
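The selection rule for a clean shutdown can be sketched as follows. This Python example assumes the grastate.dat values of every node were already collected into a dictionary keyed by a node label; the labels and the function name are hypothetical. It deliberately returns nothing when no node is marked safe, because that case is not a clean shutdown.

```python
def pick_first_node(states):
    """Pick the node to start first after a clean full shutdown.

    states maps a node label to {"seqno": int, "safe_to_bootstrap": bool}.
    Returns None when no node is marked safe to start first.
    """
    safe = [node for node, state in states.items() if state["safe_to_bootstrap"]]
    if not safe:
        return None
    # If several nodes are marked safe, prefer the most advanced seqno.
    return max(safe, key=lambda node: states[node]["seqno"])


# Illustrative values: node B was stopped last.
states = {
    "A": {"seqno": 12340, "safe_to_bootstrap": False},
    "B": {"seqno": 12345, "safe_to_bootstrap": True},
    "C": {"seqno": 12342, "safe_to_bootstrap": False},
}
print(pick_first_node(states))
```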

Use this after a clean full cluster shutdown.

searchd --new-cluster

If another node is started first without the required clean-shutdown state, startup is refused to protect the cluster from being restored from an older copy.

If all nodes crashed or were shut down uncleanly, grastate.dat may no longer be trustworthy for normal bootstrap selection. In that case, find the node with the most recent data, usually the one with the largest seqno, and start it with --new-cluster-force. This overrides the normal protection and forces the cluster to start from the chosen node.

Use this after a crash or unclean full cluster shutdown.

searchd --new-cluster-force

Cluster recovery

If a replication node or an entire cluster becomes unavailable, the correct recovery procedure depends on how many nodes are still reachable and whether the shutdown was clean or abrupt.

A replication cluster should be treated as one logical system rather than a set of unrelated servers. That gives you multi-master writes and consistent data, but it also means you must recover quorum carefully. In particular, do not run the manual recovery command that restores writes on the surviving side, SET CLUSTER <name> GLOBAL 'pc.bootstrap' = 1 (shown later on this page), until you are sure the missing nodes are really gone. If you run it too early, you can create split-brain and end up with two independent clusters.

For the examples below, assume a cluster with nodes A, B, and C unless noted otherwise.

Before you recover anything

First identify which situation you are in:

Is at least one node still online?
Was the node stopped cleanly, or did it crash? A clean stop means searchd was shut down normally and had time to save its replication state before exiting. A crash, power loss, or kill -9 is not a clean stop.
Does the surviving part of the cluster still have quorum? Quorum means enough nodes can still see each other to safely remain the writable cluster.
If all nodes are down, which node should be started first to bring the cluster back?

Useful checks:

SHOW STATUS LIKE 'cluster_<name>_status'
SHOW STATUS LIKE 'cluster_<name>_size'
SHOW STATUS LIKE 'cluster_<name>_node_state'
If all nodes are down, inspect grastate.dat, the small replication state file stored in the cluster data directory. Look especially at seqno and safe_to_bootstrap: on a clean shutdown, the best node to start first is usually the one with the most advanced seqno and safe_to_bootstrap: 1. For the full bootstrap procedure, see Restarting a cluster.

Example of what grastate.dat can look like after a clean shutdown:

# saved replication state
version: 2.1
uuid:    <cluster-uuid>
seqno:   12345
safe_to_bootstrap: 1

In this example:

seqno: 12345 means this node knows about transactions up to sequence number 12345
safe_to_bootstrap: 1 means this node is marked as safe to start first

In a clean all-nodes-down recovery, this is usually the kind of node you start first with --new-cluster to bring the cluster back.

After recovery, wait until the restarted node reports cluster_<name>_status=primary and cluster_<name>_node_state=synced before treating it as fully writable again. You can check this with SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_node_state'. In local tests, restarted nodes sometimes spent a short time with cluster_<name>_node_state=joining and cluster_<name>_status=disconnected before reaching synced/primary.
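The waiting step can be automated with a small polling loop. The Python sketch below is only an outline: fetch_status is a hypothetical callable that you would implement with your own client, returning the two status values with the cluster_<name>_ prefix removed.

```python
import time


def is_ready(status):
    """True once the node reports primary / synced."""
    return status.get("status") == "primary" and status.get("node_state") == "synced"


def wait_until_ready(fetch_status, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll fetch_status() until the node is ready or attempts run out."""
    for attempt in range(attempts):
        if is_ready(fetch_status()):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Simulated observations: a node passing through joining / disconnected first.
observed = iter([
    {"status": "disconnected", "node_state": "joining"},
    {"status": "primary", "node_state": "joining"},
    {"status": "primary", "node_state": "synced"},
])
print(wait_until_ready(lambda: next(observed), delay=0, sleep=lambda _: None))
```

In real use, keep writes away from the node until the function returns True, and treat False as a reason to check last_error and searchd.log.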

One node was shut down cleanly

If node A is stopped normally, nodes B and C keep serving writes. You can confirm that the cluster is still healthy on those nodes with SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_size'.

When node A starts again, it rejoins the cluster automatically. Until synchronization finishes, do not send writes to that node. Check SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_node_state' and wait for primary / synced.

If donor node B or C still has all the transactions that node A missed in its replication cache, node A can catch up using an incremental state transfer (IST): it receives only the transactions it missed, so recovery is usually faster and lighter. Otherwise, node A requires a state snapshot transfer (SST), which copies table files from another node instead of replaying the missing transactions. SST is heavier: it is usually slower, moves more data, and can make recovery more disruptive on large clusters.
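The choice between IST and SST can be pictured with a simplified model. The Python sketch below is not the actual decision logic of Manticore or the replication provider; the parameter names are hypothetical, and it assumes the donor's replication cache holds a contiguous range of transactions ending at its latest one.

```python
def transfer_type(joiner_seqno, donor_cache_first_seqno, donor_last_seqno):
    """Return "none", "IST" or "SST" under a simplified cache model."""
    if joiner_seqno < 0:
        # Unknown local position (seqno -1): only a full copy is possible.
        return "SST"
    if joiner_seqno >= donor_last_seqno:
        return "none"
    first_missed = joiner_seqno + 1
    # IST works only if the donor still caches every missed transaction.
    if donor_cache_first_seqno <= first_missed:
        return "IST"
    return "SST"


print(transfer_type(12300, 12000, 12345))  # cache still covers the gap
print(transfer_type(12300, 12320, 12345))  # oldest missed transactions are gone
```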

Two nodes were shut down cleanly and one node stayed online

If nodes A and B are stopped cleanly and node C remains online, node C can continue accepting writes. Check SHOW STATUS LIKE 'cluster_<name>_status' and SHOW STATUS LIKE 'cluster_<name>_size' on node C if you want to confirm it is now the only active node.

When nodes A and B start again, they rejoin automatically and synchronize from node C. While they are rejoining, check SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'. Wait until all nodes show primary / synced and the expected cluster size before treating recovery as complete.

If nodes A and B were intentionally removed from the cluster, for example with EXIT CLUSTER, and node C is now the only persisted node in the cluster, node C can also recover after its own clean restart with a normal daemon start. In that self-only case, --new-cluster is not required.

All nodes were shut down cleanly

If all nodes in a multi-node cluster were stopped normally, the cluster is fully offline and must be started again in a special way so one node can become the first running node of the cluster.

On clean shutdown, each node writes its last transaction number to grastate.dat. The node that was stopped last is the safest node to start first:

it has the most advanced seqno
it has safe_to_bootstrap: 1

Start that node with --new-cluster. This tells Manticore to start a new copy of the cluster from that node. If you run Manticore via systemd on Linux, use manticore_new_cluster. It starts Manticore in --new-cluster mode for you.

After that, start the remaining nodes normally and let them rejoin. Verify recovery with SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'.

If you bootstrap a less advanced node first, a more advanced node may later join it and receive a full SST from an older state, which can discard transactions that existed only on the more advanced node. That is why the node with safe_to_bootstrap: 1 should be your first choice.

Recreate a cluster from existing local tables

Use this procedure as a last resort when no node can restore the cluster through the normal recovery methods above, but the cluster's tables are still present in the local data directories. This can happen when saved cluster metadata is incomplete or incompatible with the running version, or when another startup failure leaves the old cluster membership unusable.

This procedure creates a new cluster. It discards the old cluster membership and replication history, then copies the tables from one selected node to the other nodes. Do not use it while any node still has a healthy primary / synced copy of the old cluster. In that case, keep the healthy cluster and recover the other nodes through JOIN CLUSTER.

1. Stop Manticore on every node that belonged to the failed cluster. Do not write to the local tables during this procedure.
2. Back up the full data_dir on every node. Keep a copy of the old cluster descriptor if it contains custom path, nodes, or provider options that you need when recreating the cluster.
3. Choose the node whose local tables contain the data you want to keep. This node will become the source for the new cluster. Adding its tables to the cluster later replaces tables with the same names on the other nodes.
4. On every node, edit <data_dir>/manticore.json. From the top-level clusters object, remove only the entry for the failed cluster. For example, remove clusters.posts for a cluster named posts. Leave the indexes object and every table directory in place. Do not delete or recreate the tables.
5. Start Manticore normally on every node. The former cluster tables should now appear as local tables. Check the tables on the selected source node before continuing. If they are missing or incomplete, stop and restore the backup.
6. On the selected source node, create the cluster. Include any saved path, nodes, or provider options if you still need them. If authentication is enabled, add '<replication-user>' AS user to this statement and the JOIN CLUSTER statements below. The account must have matching stored authentication data and replication permission on every node. See setting up replication.
```
CREATE CLUSTER posts
```
7. On every other node, join the new cluster through the first node:
```
JOIN CLUSTER posts AT 'node-a.example:9312'
```
8. On the selected source node, attach the existing local tables to the new cluster:
```
ALTER CLUSTER posts ADD table_a, table_b
```
9. Wait until every node reports cluster_posts_status=primary and cluster_posts_node_state=synced before resuming writes.
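The manticore.json edit in the procedure above can be scripted to avoid touching anything else in the file. The Python sketch below works on the file content as text and removes only one entry from the top-level clusters object. The function name is hypothetical, and the fields inside the sample entries are placeholders rather than the real schema. Run something like this only while Manticore is stopped and after the backup has been taken.

```python
import json


def drop_cluster_entry(config_text, cluster):
    """Return manticore.json content with one cluster entry removed."""
    config = json.loads(config_text)
    clusters = config.get("clusters", {})
    if cluster not in clusters:
        raise KeyError(f"cluster {cluster!r} not found in config")
    # Remove only the failed cluster; "indexes" and other keys stay untouched.
    del clusters[cluster]
    return json.dumps(config, indent=2)


# Placeholder content: the inner fields are illustrative, not the real schema.
sample = json.dumps({
    "clusters": {"posts": {"nodes": "node-a.example:9312", "path": ""}},
    "indexes": {"table_a": {"type": "rt", "path": "table_a"}},
})

print(drop_cluster_entry(sample, "posts"))
```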

One node crashed or became unreachable

If node A disappears because of a crash or a network problem, nodes B and C will first try to reconnect to it. If that fails, they remove it from the cluster, recalculate quorum, and continue working as the primary cluster.

In local tests this peer removal was not immediate: the surviving nodes stayed at the old cluster size for a few seconds before dropping the failed peer and switching to the smaller primary cluster.

When node A is started again, it rejoins automatically and catches up the same way as after a clean one-node shutdown. Again, use SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size' to confirm recovery is finished.

Two nodes are gone and only one node is still running

If nodes A and B are lost and only node C is still running, node C no longer has quorum in a three-node cluster. It switches to non-primary and rejects writes.

The write error is explicit:

ERROR 1064 (42000): cluster '<name>' is not ready, not primary state (synced)

If nodes A and B are only temporarily disconnected but can still see each other, they may continue accepting writes while node C remains isolated. Use SHOW STATUS LIKE 'cluster_<name>_status' on each side if you need to see which side is still writable.

If nodes A and B really crashed and node C is the only surviving copy you want to keep working with, first confirm that the other nodes are truly offline, then run this command on node C to make it writable again:

SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

Important:

run this only after you are sure the other nodes are unreachable
run it only on the side that must survive
after bootstrapping, the node can accept writes again and the other nodes can later rejoin from it

All nodes crashed

If every node crashed, grastate.dat is typically no longer trustworthy for normal bootstrap selection. In local tests, all nodes showed:

seqno: -1
safe_to_bootstrap: 0

In this situation, choose the node with the most recent data and start it with --new-cluster-force. This forces Manticore to start a new copy of the cluster from that node even though the usual clean-shutdown metadata is not trustworthy. If you run Manticore via systemd on Linux, use manticore_new_cluster --force. It starts Manticore in --new-cluster-force mode for you.
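The difference between the two startup modes can be summarized as a small decision helper. This Python sketch assumes the grastate.dat values of all nodes have been collected into a dictionary keyed by a hypothetical node label. It only picks the flag: when every node shows seqno -1, the file cannot tell you which node has the most recent data, so that choice remains with the operator.

```python
def first_start_flag(states):
    """Choose the searchd flag for the first node of a fully stopped cluster.

    states maps a node label to {"seqno": int, "safe_to_bootstrap": bool}.
    """
    if any(state["safe_to_bootstrap"] for state in states.values()):
        # A clean shutdown left a node marked as safe to start first.
        return "--new-cluster"
    # No node is marked safe: the normal protection has to be overridden.
    return "--new-cluster-force"


# State observed after every node crashed, as described above.
after_crash = {
    "A": {"seqno": -1, "safe_to_bootstrap": False},
    "B": {"seqno": -1, "safe_to_bootstrap": False},
    "C": {"seqno": -1, "safe_to_bootstrap": False},
}
print(first_start_flag(after_crash))
```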

Then start the remaining nodes normally and let them rejoin. Verify recovery with SHOW STATUS LIKE 'cluster_<name>_status', SHOW STATUS LIKE 'cluster_<name>_node_state', and SHOW STATUS LIKE 'cluster_<name>_size'.

An even-sized cluster lost quorum, for example after a split

Split-brain risk is highest in even-sized clusters. For example, imagine four nodes split into two isolated pairs across two data centers. Each side has exactly half of the original members, so neither side has quorum and both sides stop accepting writes.
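The arithmetic behind this can be checked with a simplified majority rule. The Python sketch below assumes every node carries equal weight and that cluster_size is the membership known just before the failure; the function name is hypothetical. Nodes that leave cleanly shrink the membership first, which is why this rule applies to abrupt losses only.

```python
def has_quorum(visible_nodes, cluster_size):
    """True when strictly more than half of the members can see each other."""
    return 2 * visible_nodes > cluster_size


# Three-node cluster: losing one node abruptly keeps quorum, losing two does not.
print(has_quorum(2, 3), has_quorum(1, 3))

# Four-node cluster split into two pairs: neither side has a majority.
print(has_quorum(2, 4), has_quorum(2, 4))
```

Under this rule, an odd-sized cluster cannot be split into two equal halves, so at most one side can lose quorum through a tie.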

If you must restore writes before connectivity is fixed, choose only one side of the split and run the same recovery command there so that side becomes writable again. Before doing that, check SHOW STATUS LIKE 'cluster_<name>_status' on both sides so you know which side is currently non-primary.

Choose the side that should remain the writable cluster, then run:

SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

Never issue that statement on both sides. If you do, you will create two separate primary clusters, and they will not merge back automatically when the network recovers.

In local testing, abruptly losing half of a four-node cluster reproduced the same non-primary behavior and the same recovery command brought one surviving half back to primary.
