Distributed searching

To scale well, Manticore has distributed searching capabilities. Distributed searching is useful to improve query latency (i.e. search time) and throughput (ie. max queries/sec) in multi-server, multi-CPU or multi-core environments. This is essential for applications which need to search through huge amounts data (ie. billions of records and terabytes of text).

The key idea is to horizontally partition searched data across search nodes and then process it in parallel.

Partitioning is done manually. You should:

  • setup several instances of Manticore on different servers
  • put different parts of your dataset to different instances
  • configure a special distributed table on some of the searchd instances
  • and route your queries to the distributedtable

This kind of table only contains references to other local and remote tables - so it could not be directly reindexed, and you should reindex those tables which it references instead.

When Manticore receives a query against distributed table, it does the following:

  1. connects to configured remote agents
  2. issues the query to them
  3. at the same time searches configured local tables (while the remote agents are searching)
  4. retrieves remote agent's search results
  5. merges all the results together, removing the duplicates
  6. sends the merged results to client

From the application's point of view, there are no differences between searching through a regular table, or a distributed table at all. That is, distributed tables are fully transparent to the application, and actually there's no way to tell whether the table you queried was distributed or local.

Read more about remote nodes.