Searching > Multi-queries | Manticore Search Manual

Manticore Search returns by default the top 20 matched documents in the result set.

For SQL the navigation over the result set can be done with LIMIT clause.

LIMIT can accept one number as the size of the returned set using a zero offset or a pair of offset and size.

If HTTP JSON is used, the nodes offset and limit can control the offset of the result set and the size of the returned set. Alternatively the pair size and from can be used instead.

‹›

SQL
JSON

📋

SELECT  ... FROM ...  [LIMIT [offset,] row_count]
SELECT  ... FROM ...  [LIMIT row_count][ OFFSET offset]

By default, Manticore Search uses a result set window of 1000 best ranked documents that can be returned back in the result set. If the result set is paginated beyond this value, the query will end in error.

This limitation can be adjusted with the query option max_matches.

Increasing the max_matches to very high values should be made only if it's required for the navigation to reach such points. High max_matches value requires more memory used and can increase the query response time. One way to work with deep result sets is to set max_matches as a sum of offset and limit.

Lowering max_matches under 1000 has a benefit in reducing the memory used by the query. It can also reduce the query time, but in most cases it might not be a noticeable gain.

‹›

SQL
JSON

📋

SELECT  ... FROM ...   OPTION max_matches=<value>

Distributed searching

Last modified: January 18, 2023

To scale well, Manticore has distributed searching capabilities. Distributed searching is useful to improve query latency (i.e. search time) and throughput (ie. max queries/sec) in multi-server, multi-CPU or multi-core environments. This is essential for applications which need to search through huge amounts data (ie. billions of records and terabytes of text).

The key idea is to horizontally partition searched data across search nodes and then process it in parallel.

Partitioning is done manually. You should:

setup several instances of Manticore on different servers
put different parts of your dataset to different instances
configure a special distributed table on some of the searchd instances
and route your queries to the distributedtable

This kind of table only contains references to other local and remote tables - so it could not be directly reindexed, and you should reindex those tables which it references instead.

When Manticore receives a query against distributed table, it does the following:

connects to configured remote agents
issues the query to them
at the same time searches configured local tables (while the remote agents are searching)
retrieves remote agent's search results
merges all the results together, removing the duplicates
sends the merged results to client

From the application's point of view, there are no differences between searching through a regular table, or a distributed table at all. That is, distributed tables are fully transparent to the application, and actually there's no way to tell whether the table you queried was distributed or local.

Read more about remote nodes.

Pagination Multi-queries

Last modified: December 02, 2022

Multi-queries, or query batches, let you send multiple search queries to Manticore in one go (more formally, one network request).

👍 Why use multi-queries?

Generally, it all boils down to performance. First, by sending requests to Manticore in a batch instead of one by one, you always save a bit by doing less network round-trips. Second, and somewhat more important, sending queries in a batch enables Manticore to perform certain internal optimizations. In the case when there aren't any possible batch optimizations to apply, queries will be processed one by one internally.

⛔ When not to use multi-queries?

Multi-queries require all the search queries in a batch to be independent, and sometimes they aren't. That is, sometimes query B is based on query A results, and so can only be set up after executing query A. For instance, you might want to display results from a secondary index if and only if there were no results found in a primary table. Or maybe just specify offset into 2nd result set based on the amount of matches in the 1st result set. In that case, you will have to use separate queries (or separate batches).

You can run multiple search queries with SQL by just separating them with a semicolon. When Manticore receives a query formatted like that from a client all the inter-statement optimizations will be applied.

Multi-queries don't support queries with FACET. The number of multi-queries in one batch shoudln't exceed max_batch_queries.

‹›

SQL

SQL

📋

SELECT id, price FROM products WHERE MATCH('remove hair') ORDER BY price DESC; SELECT id, price FROM products WHERE MATCH('remove hair') ORDER BY price ASC

There are two major optimizations to be aware of: common query optimization and common subtree optimization.

Common query optimization means that searchd will identify all those queries in a batch where only the sorting and group-by settings differ, and only perform searching once. For instance, if a batch consists of 3 queries, all of them are for "ipod nano", but 1st query requests top-10 results sorted by price, 2nd query groups by vendor ID and requests top-5 vendors sorted by rating, and 3rd query requests max price, full-text search for "ipod nano" will only be performed once, and its results will be reused to build 3 different result sets.

Faceted search is a particularly important case that benefits from this optimization. Indeed, faceted searching can be implemented by running a number of queries, one to retrieve search results themselves, and a few other ones with same full-text query but different group-by settings to retrieve all the required groups of results (top-3 authors, top-5 vendors, etc). And as long as full-text query and filtering settings stay the same, common query optimization will trigger, and greatly improve performance.

Common subtree optimization is even more interesting. It lets searchd exploit similarities between batched full-text queries. It identifies common full-text query parts (subtrees) in all queries, and caches them between queries. For instance, look at the following query batch:

donald trump president
donald trump barack obama john mccain
donald trump speech

There's a common two-word part donald trump that can be computed only once, then cached and shared across the queries. And common subtree optimization does just that. Per-query cache size is strictly controlled by subtree_docs_cache and subtree_hits_cache directives (so that caching all sixteen gazillions of documents that match "i am" does not exhaust the RAM and instantly kill your server).

How to tell whether the queries in the batch were actually optimized? If they were, respective query log will have a "multiplier" field that specifies how many queries were processed together:

Note the "x3" field. It means that this query was optimized and processed in a sub-batch of 3 queries.

‹›

log

log

📋

[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/rel 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/ext 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.000 2009] 0.040 sec x3 [ext/0/ext 747541 (0,20)] [lj] the

For reference, this is how the regular log would look like if the queries were not batched:

‹›

log

log

📋

[Sun Jul 12 15:18:17.062 2009] 0.059 sec [ext/0/rel 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.156 2009] 0.091 sec [ext/0/ext 747541 (0,20)] [lj] the
[Sun Jul 12 15:18:17.250 2009] 0.092 sec [ext/0/ext 747541 (0,20)] [lj] the

Note how per-query time in multi-query case was improved by a factor of 1.5x to 2.3x, depending on a particular sorting mode.

Distributed searching Sub-selects

Last modified: December 02, 2022

Manticore supports SELECT subqueries via SQL in the following format:

SELECT * FROM (SELECT ... ORDER BY cond1 LIMIT X) ORDER BY cond2 LIMIT Y

The outer select allows only ORDER BY and LIMIT clauses. Sub-selects queries currently have 2 usage cases:

We have a query with 2 ranking UDFs, one very fast and the other slow and we perform a full-text search with a big match result set. Without subselect the query would look like
```
 SELECT id,slow_rank() as slow,fast_rank() as fast FROM index 
     WHERE MATCH(‘some common query terms’) ORDER BY fast DESC, slow DESC LIMIT 20 
     OPTION max_matches=1000;
```
With sub-selects the query can be rewritten as:
```
 SELECT * FROM
     (SELECT id,slow_rank() as slow,fast_rank() as fast FROM index WHERE 
         MATCH(‘some common query terms’)
         ORDER BY fast DESC LIMIT 100 OPTION max_matches=1000)
 ORDER BY slow DESC LIMIT 20;
```
In the initial query the slow_rank() UDF is computed for the entire match result set. With SELECT sub-queries only fast_rank() is computed for the entire match result set, while slow_rank() is only computed for a limited set.
The second case comes handy for large result set coming from a distributed table.

For this query:
```
 SELECT * FROM my_dist_index WHERE some_conditions LIMIT 50000;
```
If we have 20 nodes, each node can send back to master a number of 50K records, resulting in 20 x 50K = 1M records, however as the master sends back only 50K (out of 1M), it might be good enough for us for the nodes to send only the top 10K records. With sub-select we can rewrite the query as:
```
 SELECT * FROM 
      (SELECT * FROM my_dist_index WHERE some_conditions LIMIT 10000) 
  ORDER by some_attr LIMIT 50000;
```
In this case, the nodes receive only the inner query and execute. This means the master will receive only 20x10K=200K records. The master will take all the records received, reorder them by the OUTER clause and return the best 50K records. The sub-select help reducing the traffic between the master and the nodes and also reduce the master's computation time (as it process only 200K instead of 1M).

Multi-queries Grouping

Last modified: December 02, 2022

Pagination of search results

SQL

HTTP JSON

Result set window

Distributed searching

Multi-queries

Multi-queries optimizations

Sub-selects