Geo search

One of the greatest features of Manticore Search is the ability to combine full-text searching with geo-location. For example, a retailer can offer a search where a user looks for a product, and the result set can indicate the closest shop that has the product in stock, so the user can go in-store and pick it up. A travel site can provide results based on a search limited to a certain area and have the results sorted by the distance from a point (for example, 'search museums near a hotel').

To perform geo-searching, a document needs to contain pairs of latitude/longitude coordinates. The coordinates can be stored as float attributes. If the document has multiple locations, it may be convenient to use a JSON attribute to store coordinate pairs.

table myrt
{
    ...
    rt_attr_float = lat
    rt_attr_float = lon
    ...
}

The coordinates can be stored as degrees or radians.

Performing distance сalculation

To find out the distance between two points, the GEODIST() function can be used. GEODIST requires two pairs of coordinates as its first four parameters.

The 5th parameter in a simplified JSON format can configure certain aspects of the function. By default, GEODIST expects coordinates to be in radians, but in=degrees can be added to allow using degrees as input. The coordinates for which we perform the geo distance must have the same type (degrees or radians) as the ones stored in the table; otherwise, results will be misleading.

The calculated distance is by default in meters, but with the out option, it can be transformed to kilometers, feet, or miles. Lastly, by default, a calculation method called adaptive is used. An alternative method based on the haversine algorithm is available; however, this one is slower and less precise.

The result of the function - the distance - can be used in theORDER BY clause to sort the results:

SELECT *, GEODIST(40.7643929, -73.9997683, lat, lon, {in=degrees, out=miles}) AS distance FROM myindex WHERE MATCH('...') ORDER BY distance ASC, WEIGHT() DESC;

Or to limit the results to a radial area around the point:

SELECT *,GEODIST(40.7643929, -73.9997683, lat,lon, {in=degrees, out=miles}) AS distance FROM myindex WHERE MATCH('...') AND distance <1000 ORDER BY WEIGHT(), DISTANCE ASC;

Searching in polygons

Another geo search feature is the ability to determine if a location is within a specified area. A special function constructs a polygon object, which is then used by another function to test whether a set of coordinates is contained within that polygon or not.

There are two functions available for creating the polygon:

  • GEOPOLY2D() - creates a polygon that takes into account the Earth's curvature
  • POLY2D() - creates a simple polygon in flat space

POLY2D is suitable for geo searches when the area has sides shorter than 500km (for polygons with 3-4 sides; for polygons with more sides, lower values should be considered). For areas with longer sides, using GEOPOLY2D is required to maintain accurate results. GEOPOLY2D expects coordinates as latitude/longitude pairs in degrees; using radians will yield results in flat space (similar to POLY2D).

CONTAINS() takes a polygon and a set of coordinates as input and outputs 1 if the point is inside the polygon or 0 otherwise.

SELECT *,CONTAINS(GEOPOLY2D(40.76439, -73.9997, 42.21211, -73.999,  42.21211, -76.123, 40.76439, -76.123), 41.5445, -74.973) AS inside FROM myindex WHERE MATCH('...') AND inside=1;

Percolate Query

Percolate queries are also known as Persistent queries, Prospective search, document routing, search in reverse, and inverse search.

The traditional way of conducting searches involves storing documents and performing search queries against them. However, there are cases where we want to apply a query to a newly incoming document to signal a match. Some scenarios where this is desired include monitoring systems that collect data and notify users about specific events, such as reaching a certain threshold for a metric or a particular value appearing in the monitored data. Another example is news aggregation, where users may want to be notified only about certain categories or topics, or even specific "keywords."

In these situations, traditional search is not the best fit, as it assumes the desired search is performed over the entire collection. This process gets multiplied by the number of users, resulting in many queries running over the entire collection, which can cause significant additional load. The alternative approach described in this section involves storing the queries instead and testing them against an incoming new document or a batch of documents.

Google Alerts, AlertHN, Bloomberg Terminal, and other systems that allow users to subscribe to specific content utilize similar technology.

Performing a percolate query with CALL PQ

The key thing to remember about percolate queries is that your search queries are already in the table. What you need to provide are documents to check if any of them match any of the stored rules.

You can perform a percolate query via SQL or JSON interfaces, as well as using programming language clients. The SQL approach offers more flexibility, while the HTTP method is simpler and provides most of what you need. The table below can help you understand the differences.

Desired Behavior SQL HTTP PHP
Provide a single document CALL PQ('tbl', '{doc1}') query.percolate.document{doc1} $client->pq()->search([$percolate])
Provide a single document (alternative) CALL PQ('tbl', 'doc1', 0 as docs_json) -
Provide multiple documents CALL PQ('tbl', ('doc1', 'doc2'), 0 as docs_json) query.percolate.documents[{doc1}, {doc2}] $client->pq()->search([$percolate])
Provide multiple documents (alternative) CALL PQ('tbl', ('{doc1}', '{doc2}')) - -
Provide multiple documents (alternative) CALL PQ('tbl', '[{doc1}, {doc2}]') - -
Return matching document ids 0/1 as docs (disabled by default) Enabled by default Enabled by default
Use document's own id to show in the result 'id field' as docs_id (disabled by default) Not available Not available
Consider input documents are JSON 1 as docs_json (1 by default) Enabled by default Enabled by default
Consider input documents are plain text 0 as docs_json (1 by default) Not available Not available
Sparsed distribution mode default default default
Sharded distribution mode sharded as mode Not available Not available
Return all info about matching query 1 as query (0 by default) Enabled by default Enabled by default
Skip invalid JSON 1 as skip_bad_json (0 by default) Not available Not available
Extended info in SHOW META 1 as verbose (0 by default) Not available Not available
Define the number which will be added to document ids if no docs_id fields provided (mostly relevant in distributed PQ modes) 1 as shift (0 by default) Not available Not available

To demonstrate how this works, here are a few examples. Let's create a PQ table with two fields:

  • title (text)
  • color (string)

and three rules in it:

  • Just full-text. Query: @title bag
  • Full-text and filtering. Query: @title shoes. Filters: color='red'
  • Full-text and more complex filtering. Query: @title shoes. Filters: color IN('blue', 'green')
‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CREATE TABLE products(title text, color string) type='pq';
INSERT INTO products(query) values('@title bag');
INSERT INTO products(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');
select * from products;
‹›
Response
+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149635 | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149636 | @title shoes |      | color='red'               |
| 1657852401006149637 | @title bag   |      |                           |
+---------------------+--------------+------+---------------------------+
Just tell me what PQ rules match my single document

The first document doesn't match any rules. It could match the first two, but they require additional filters.

The second document matches one rule. Note that CALL PQ by default expects a document to be a JSON, but if you use 0 as docs_json, you can pass a plain string instead.

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CALL PQ('products', 'Beautiful shoes', 0 as docs_json);

CALL PQ('products', 'What a nice bag', 0 as docs_json);
CALL PQ('products', '{"title": "What a nice bag"}');
‹›
Response
+---------------------+
| id                  |
+---------------------+
| 1657852401006149637 |
+---------------------+

+---------------------+
| id                  |
+---------------------+
| 1657852401006149637 |
+---------------------+
I want to know complete PQ rules matching my document
‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CALL PQ('products', '{"title": "What a nice bag"}', 1 as query);
‹›
Response
+---------------------+------------+------+---------+
| id                  | query      | tags | filters |
+---------------------+------------+------+---------+
| 1657852401006149637 | @title bag |      |         |
+---------------------+------------+------+---------+
How about multiple documents?

Note that with CALL PQ, you can provide multiple documents in different ways:

  • as an array of plain documents in round brackets ('doc1', 'doc2'). This requires 0 as docs_json
  • as an array of JSONs in round brackets ('{doc1}', '{doc2}')
  • or as a standard JSON array '[{doc1}, {doc2}]'
‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CALL PQ('products', ('nice pair of shoes', 'beautiful bag'), 1 as query, 0 as docs_json);

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "red"}', '{"title": "beautiful bag"}'), 1 as query);

CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query);
‹›
Response
+---------------------+------------+------+---------+
| id                  | query      | tags | filters |
+---------------------+------------+------+---------+
| 1657852401006149637 | @title bag |      |         |
+---------------------+------------+------+---------+

+---------------------+--------------+------+-------------+
| id                  | query        | tags | filters     |
+---------------------+--------------+------+-------------+
| 1657852401006149636 | @title shoes |      | color='red' |
| 1657852401006149637 | @title bag   |      |             |
+---------------------+--------------+------+-------------+

+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149635 | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149637 | @title bag   |      |                           |
+---------------------+--------------+------+---------------------------+
I want to know what docs match what rules

Using the option 1 as docs allows you to see which documents of the provided ones match which rules.

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query, 1 as docs);
‹›
Response
+---------------------+-----------+--------------+------+---------------------------+
| id                  | documents | query        | tags | filters                   |
+---------------------+-----------+--------------+------+---------------------------+
| 1657852401006149635 | 1         | @title shoes |      | color IN ('blue, 'green') |
| 1657852401006149637 | 2         | @title bag   |      |                           |
+---------------------+-----------+--------------+------+---------------------------+

Static ids

By default, matching document ids correspond to their relative numbers in the list you provide. However, in some cases, each document already has its own id. For this case, there's an option 'id field name' as docs_id for CALL PQ.

Note that if the id cannot be found by the provided field name, the PQ rule will not be shown in the results.

This option is only available for CALL PQ via SQL.

‹›
  • SQL
SQL
📋
CALL PQ('products', '[{"id": 123, "title": "nice pair of shoes", "color": "blue"}, {"id": 456, "title": "beautiful bag"}]', 1 as query, 'id' as docs_id, 1 as docs);
‹›
Response
+---------------------+-----------+--------------+------+---------------------------+
| id                  | documents | query        | tags | filters                   |
+---------------------+-----------+--------------+------+---------------------------+
| 1657852401006149664 | 456       | @title bag   |      |                           |
| 1657852401006149666 | 123       | @title shoes |      | color IN ('blue, 'green') |
+---------------------+-----------+--------------+------+---------------------------+
I may have invalid JSONs, please skip them

When using CALL PQ with separate JSONs, you can use the option 1 as skip_bad_json to skip any invalid JSONs in the input. In the example below, the 2nd query fails due to an invalid JSON, but the 3rd query avoids the error by using 1 as skip_bad_json. Keep in mind that this option is not available when sending JSON queries over HTTP, as the whole JSON query must be valid in that case.

‹›
  • SQL
SQL
📋
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'));

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'));

CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'), 1 as skip_bad_json);
‹›
Response
+---------------------+
| id                  |
+---------------------+
| 1657852401006149635 |
| 1657852401006149637 |
+---------------------+

ERROR 1064 (42000): Bad JSON objects in strings: 2

+---------------------+
| id                  |
+---------------------+
| 1657852401006149635 |
+---------------------+
I want higher performance of a percolate query

Percolate queries are designed with high throughput and large data volumes in mind. To optimize performance for lower latency and higher throughput, consider the following.

There are two modes of distribution for a percolate table and how a percolate query can work against it:

  • Sparse (default). Ideal for: many documents, mirrored PQ tables. When your document set is large but the set of queries stored in the PQ table is small, the sparse mode is beneficial. In this mode, the batch of documents you pass will be divided among the number of agents, so each node processes only a portion of the documents from your request. Manticore splits your document set and distributes chunks among the mirrors. Once the agents have finished processing the queries, Manticore collects and merges the results, returning a final query set as if it came from a single table. Use replication to assist the process.
  • Sharded. Ideal for: many PQ rules, rules split among PQ tables. In this mode, the entire document set is broadcast to all tables of the distributed PQ table without initially splitting the documents. This is beneficial when pushing a relatively small set of documents, but the number of stored queries is large. In this case, it's more appropriate to store only a portion of PQ rules on each node and then merge the results returned from the nodes that process the same set of documents against different sets of PQ rules. This mode must be explicitly set, as it implies an increase in network payload and expects tables with different PQs, which replication cannot do out-of-the-box.

Assume you have table pq_d2 defined as:

table pq_d2
{
    type = distributed
    agent = 127.0.0.1:6712:pq
    agent = 127.0.0.1:6712:ptitle
}

Each of 'pq' and 'ptitle' contains:

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
SELECT * FROM pq;
‹›
Response
+------+-------------+------+-------------------+
| id   | query       | tags | filters           |
+------+-------------+------+-------------------+
|    1 | filter test |      | gid>=10           |
|    2 | angry       |      | gid>=10 OR gid<=3 |
+------+-------------+------+-------------------+
2 rows in set (0.01 sec)

And you execute CALL PQ on the distributed table with a couple of documents.

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • javascript
  • Java
  • C#
  • TypeScript
  • Go
📋
CALL PQ ('pq_d2', ('{"title":"angry test", "gid":3 }', '{"title":"filter test doc2", "gid":13}'), 1 AS docs);
‹›
Response
+------+-----------+
| id   | documents |
+------+-----------+
|    1 | 2         |
|    2 | 1         |
+------+-----------+

In the previous example, we used the default sparse mode. To demonstrate the sharded mode, let's create a distributed PQ table consisting of 2 local PQ tables and add 2 documents to "products1" and 1 document to "products2":

create table products1(title text, color string) type='pq';
create table products2(title text, color string) type='pq';
create table products_distributed type='distributed' local='products1' local='products2';

INSERT INTO products1(query) values('@title bag');
INSERT INTO products1(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products2(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');

Now, if you add 'sharded' as mode to CALL PQ, it will send the documents to all the agent's tables (in this case, just local tables, but they can be remote to utilize external hardware). This mode is not available via the JSON interface.

‹›
  • SQL
SQL
📋
CALL PQ('products_distributed', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 'sharded' as mode, 1 as query);
‹›
Response
+---------------------+--------------+------+---------------------------+
| id                  | query        | tags | filters                   |
+---------------------+--------------+------+---------------------------+
| 1657852401006149639 | @title bag   |      |                           |
| 1657852401006149643 | @title shoes |      | color IN ('blue, 'green') |
+---------------------+--------------+------+---------------------------+

Note that the syntax of agent mirrors in the configuration (when several hosts are assigned to one agent line, separated with |) has nothing to do with the CALL PQ query mode. Each agent always represents one node, regardless of the number of HA mirrors specified for that agent.

How can I learn more about performance?

In some cases, you might want to get more details about the performance of a percolate query. For that purpose, there is the option 1 as verbose, which is only available via SQL and allows you to save more performance metrics. You can see them using the SHOW META query, which you can run after CALL PQ. See SHOW META for more info.

‹›
  • 1 as verbose
  • 0 as verbose
📋
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 1 as verbose); show meta;
‹›
Response
+---------------------+
| id                  |
+---------------------+
| 1657852401006149644 |
| 1657852401006149646 |
+---------------------+
+-------------------------+-----------+
| Name                    | Value     |
+-------------------------+-----------+
| Total                   | 0.000 sec |
| Setup                   | 0.000 sec |
| Queries matched         | 2         |
| Queries failed          | 0         |
| Document matched        | 2         |
| Total queries stored    | 3         |
| Term only queries       | 3         |
| Fast rejected queries   | 0         |
| Time per query          | 27, 10    |
| Time of matched queries | 37        |
+-------------------------+-----------+

Autocomplete

Autocomplete (or word completion) is a feature in which an application predicts the rest of a word a user is typing. On websites, it's used in search boxes, where a user starts to type a word, and a dropdown with suggestions pops up so the user can select the ending from the list.

Autocomplete

There are a few ways you can do autocomplete in Manticore:

Autocomplete a sentence

To autocomplete a sentence, you can use infixed search. You can find endings of a document's field by providing its beginning and:

  • using full-text operators * to match anything it substitutes
  • and optionally ^ to start from the beginning of the field
  • and perhaps "" for phrase matching
  • and optionally highlight the results so you don't have to fetch them in full to your application

There is an article about it in our blog and an interactive course. A quick example is:

  • Let's assume you have a document: My cat loves my dog. The cat (Felis catus) is a domestic species of small carnivorous mammal.
  • Then you can use ^, "", and * so as the user is typing, you make queries like: ^"m*", ^"my *", ^"my c*", ^"my ca*" and so on
  • It will find the document, and if you also do highlighting, you will get something like: <strong>My cat</strong> loves my dog. The cat ( ...
Autocomplete a word

In some cases, all you need is to autocomplete a single word or a couple of words. In this case, you can use CALL KEYWORDS.

CALL KEYWORDS

CALL KEYWORDS is available through the SQL interface and offers a way to examine how keywords are tokenized or to obtain the tokenized forms of specific keywords. If the table enables infixes, it allows you to quickly find possible endings for given keywords, making it suitable for autocomplete functionality.

This is a great alternative to general infixed search, as it provides higher performance since it only needs the table's dictionary, not the documents themselves.

General syntax

CALL KEYWORDS(text, table [, options])

The CALL KEYWORDS statement divides text into keywords. It returns the tokenized and normalized forms of the keywords, and if desired, keyword statistics. Additionally, it provides the position of each keyword in the query and all forms of tokenized keywords when the table enables lemmatizers.

Parameter Description
text Text to break down to keywords
table Name of the table from which to take the text processing settings
0/1 as stats Show statistics of keywords, default is 0
0/1 as fold_wildcards Fold wildcards, default is 0
0/1 as fold_lemmas Fold morphological lemmas, default is 0
0/1 as fold_blended Fold blended words, default is 0
N as expansion_limit Override expansion_limit defined in the server configuration, default is 0 (use value from the configuration)
docs/hits as sort_mode Sort output results by either 'docs' or 'hits'. Default no sorting

The examples show how it works if assuming the user is trying to get an autocomplete for "my cat ...". So on the application side all you need to do is to suggest the user the endings from the column "normalized" for each new word. It often makes sense to sort by hits or docs using 'hits' as sort_mode or 'docs' as sort_mode.

‹›
  • Examples
Examples
📋
MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | m*        | my         | 1    | 2    |
| 1    | m*        | mammal     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | my*       | my         | 1    | 2    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized  | docs | hits |
+------+-----------+-------------+------+------+
| 1    | c*        | cat         | 1    | 2    |
| 1    | c*        | carnivorous | 1    | 1    |
| 1    | c*        | catus       | 1    | 1    |
+------+-----------+-------------+------+------+

MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized  | docs | hits |
+------+-----------+-------------+------+------+
| 1    | ca*       | cat         | 1    | 2    |
| 1    | ca*       | carnivorous | 1    | 1    |
| 1    | ca*       | catus       | 1    | 1    |
+------+-----------+-------------+------+------+

MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | cat*      | cat        | 1    | 2    |
| 1    | cat*      | catus      | 1    | 1    |
+------+-----------+------------+------+------+

There is a nice trick how you can improve the above algorithm - use bigram_index. When you have it enabled for the table what you get in it is not just a single word, but each pair of words standing one after another indexed as a separate token.

This allows to predict not just the current word's ending, but the next word too which is especially beneficial for the purpose of autocomplete.

‹›
  • Examples
Examples
📋
MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | m*        | my         | 1    | 2    |
| 1    | m*        | mammal     | 1    | 1    |
| 1    | m*        | my cat     | 1    | 1    |
| 1    | m*        | my dog     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | my*       | my         | 1    | 2    |
| 1    | my*       | my cat     | 1    | 1    |
| 1    | my*       | my dog     | 1    | 1    |
+------+-----------+------------+------+------+

MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized         | docs | hits |
+------+-----------+--------------------+------+------+
| 1    | c*        | cat                | 1    | 2    |
| 1    | c*        | carnivorous        | 1    | 1    |
| 1    | c*        | carnivorous mammal | 1    | 1    |
| 1    | c*        | cat felis          | 1    | 1    |
| 1    | c*        | cat loves          | 1    | 1    |
| 1    | c*        | catus              | 1    | 1    |
| 1    | c*        | catus is           | 1    | 1    |
+------+-----------+--------------------+------+------+

MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized         | docs | hits |
+------+-----------+--------------------+------+------+
| 1    | ca*       | cat                | 1    | 2    |
| 1    | ca*       | carnivorous        | 1    | 1    |
| 1    | ca*       | carnivorous mammal | 1    | 1    |
| 1    | ca*       | cat felis          | 1    | 1    |
| 1    | ca*       | cat loves          | 1    | 1    |
| 1    | ca*       | catus              | 1    | 1    |
| 1    | ca*       | catus is           | 1    | 1    |
+------+-----------+--------------------+------+------+

MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | cat*      | cat        | 1    | 2    |
| 1    | cat*      | cat felis  | 1    | 1    |
| 1    | cat*      | cat loves  | 1    | 1    |
| 1    | cat*      | catus      | 1    | 1    |
| 1    | cat*      | catus is   | 1    | 1    |
+------+-----------+------------+------+------+

CALL KEYWORDS supports distributed tables so no matter how big your data set you can benefit from using it.