Adding a new node

To add another node to a cluster just start another node of Manticore and make sure it's accessible by other nodes in your cluster. Then use a distributed table to connect one node with another and replication for data safety.

Adding a distributed table with remote agents

Please read the article about distributed tables for general overview of distributed tables. Here we focus on using a distributed table as a basis for creating a cluster of Manticore instances.

Here we have split the data over 4 servers, each serving one of the shards. If one of the servers fails, our distributed table will still work, but we would miss the results from the failed shard.

‹›
  • ini
ini
📋
table mydist {
          type  = distributed
          agent = box1:9312:shard1
          agent = box2:9312:shard2
          agent = box3:9312:shard3
          agent = box4:9312:shard4
}

Now that we've added mirrors, each shard is found on 2 servers. By default, the master (the searchd instance with the distributed table) will pick randomly one of the mirrors.

The mode used for picking mirrors can be set with ha_strategy. In addition to the default random mode there's also ha_strategy = roundrobin.

More interesting strategies are those based on latency-weighted probabilities. noerrors and nodeads: not only those take out mirrors with issues, but also monitor the response times and do balancing. If a mirror responds slower (for example due to some operations running on it), it will receive less requests. When the mirror recovers and provides better times, it will get more requests.

‹›
  • ini
ini
📋
table mydist {
          type  = distributed
          agent = box1:9312|box5:9312:shard1
          agent = box2:9312:|box6:9312:shard2
          agent = box3:9312:|box7:9312:shard3
          agent = box4:9312:|box8:9312:shard4
}

Mirroring

Agent mirrors can be used interchangeably when processing a search query. Manticore instance (can be multiple) hosting the distributed table where the mirrored agents are defined keeps track of mirror status (alive or dead) and response times, and does automatic failover and load balancing based on that.

Agent mirrors

agent = node1|node2|node3:9312:shard2

The above example declares that 'node1:9312', 'node2:9312', and 'node3:9312' all have a table called shard2, and can be used as interchangeable mirrors. If any single of those servers go down, the queries will be distributed between the other two. When it gets back up, master will detect that and begin routing queries to all three nodes again.

Mirror may also include individual table list, as:

agent = node1:9312:node1shard2|node2:9312:node2shard2

This works essentially the same as the previous example, but different table names will be used when querying different severs: node1shard2 when querying node1:9312, and node2shard when querying node2:9312.

By default, all queries are routed to the best of the mirrors. The best one is picked based on the recent statistics, as controlled by the ha_period_karma config directive. Master stores a number of metrics (total query count, error count, response time, etc) recently observed for every agent. It groups those by time spans, and karma is that time span length. The best agent mirror is then determined dynamically based on the last 2 such time spans. Specific algorithm that will be used to pick a mirror can be configured ha_strategy directive.

The karma period is in seconds and defaults to 60 seconds. Master stores up to 15 karma spans with per-agent statistics for instrumentation purposes (see SHOW AGENT STATUS statement). However, only the last 2 spans out of those are ever used for HA/LB logic.

When there are no queries, master sends a regular ping command every ha_ping_interval milliseconds in order to have some statistics and at least check, whether the remote host is still alive. ha_ping_interval defaults to 1000 msec. Setting it to 0 disables pings and statistics will only be accumulated based on actual queries.

Example:

# sharding table over 4 servers total
# in just 2 shards but with 2 failover mirrors for each shard
# node1, node2 carry shard1 as local
# node3, node4 carry shard2 as local

# config on node1, node2
agent = node3:9312|node4:9312:shard2

# config on node3, node4
agent = node1:9312|node2:9312:shard1