Cost-based optimizer

When Manticore executes a full-scan query, it can either use a plain scan to check every document against the filters, or use additional data and/or algorithms to speed up query execution. To decide which approach to take, Manticore uses a query cost-based optimizer (the "CBO", also known as the "query optimizer").

The CBO may decide to replace one or more query filters with one of the following entities if it determines that it will improve performance:

  1. A docid index, which uses a special docid-only secondary index stored in files with the .spt extension. In addition to improving filters on document ids, the docid index is also used to speed up document id to row id lookups, and to speed up the application of large killlists on daemon startup.
  2. A columnar scan, which uses columnar storage and can only be used on a columnar attribute. It still scans every value and tests it against the filter, but it is heavily optimized and is usually faster than the default approach.
  3. Secondary indexes, which are generated for all attributes by default. They use the PGM index together with Manticore's built-in inverted index to retrieve the list of row ids corresponding to a value or range of values. The secondary indexes are stored in files with the .spidx extension.

The optimizer estimates the cost of each execution path using different attribute statistics, including:

  1. Information on the data distribution within an attribute (histograms, stored in .sphi files). Histograms are generated automatically when data is indexed and are the main source of information for the CBO.
  2. Information from PGM (secondary indexes), which is used to estimate the number of document lists to read. This helps to estimate doclist merge performance and to choose the correct merge algorithm (priority queue merge or bitmap merge).
  3. Columnar encoding statistics, which are used to estimate columnar data decompression performance.
  4. A columnar min-max tree. The CBO uses histograms to estimate the number of documents left after the filter was applied, but it also needs to estimate how many documents the filter had to process. For columnar attributes, partial evaluation of the min-max tree is used for that purpose.
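
As a rough illustration of how a histogram can drive such an estimate, the sketch below estimates how many documents pass a range filter from per-bucket counts. The equal-width bucket layout, the `Histogram` class, and the `estimate_matches` helper are hypothetical and chosen for readability; the actual contents of .sphi files and Manticore's estimator are not described here.

```python
from dataclasses import dataclass


@dataclass
class Histogram:
    """Equal-width histogram over one attribute (hypothetical layout)."""
    lo: float          # smallest attribute value covered
    width: float       # width of each bucket
    counts: list[int]  # number of documents per bucket


def estimate_matches(h: Histogram, a: float, b: float) -> float:
    """Estimate documents with a <= value < b, assuming values are
    spread uniformly inside each bucket."""
    total = 0.0
    for i, count in enumerate(h.counts):
        start = h.lo + i * h.width
        end = start + h.width
        # Length of the part of this bucket that the filter range covers.
        overlap = max(0.0, min(b, end) - max(a, start))
        total += count * overlap / h.width
    return total


hist = Histogram(lo=0, width=10, counts=[100, 50, 10, 40])
print(estimate_matches(hist, 5, 25))  # 105.0
```

Half of the first bucket, all of the second, and half of the third fall inside the range, giving 50 + 50 + 5 documents.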

The optimizer calculates the execution cost for every filter used in a query. Because certain filters can be replaced with several different entities (e.g., for a document id, Manticore can use a plain scan, a docid index lookup, a columnar scan (if the document id is columnar), and a secondary index), the optimizer evaluates every available combination. Note that there is a maximum limit of 1024 combinations.

To estimate query execution costs, the optimizer calculates the estimated costs of the most significant operations that are performed when the query is executed. It uses preset constants to represent the cost of each operation.

The optimizer compares the costs of each execution path and chooses the path with the lowest cost to execute the query.
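
The selection step can be pictured with the minimal sketch below. It assumes, purely for illustration, that the cost of a combination is the sum of independent per-filter costs; the real optimizer's cost model is more involved, and the cost numbers and the `cheapest_plan` helper are made up. Only the cap of 1024 combinations comes from the text above.

```python
from itertools import islice, product

MAX_COMBINATIONS = 1024  # limit mentioned in the text


def cheapest_plan(filter_costs):
    """filter_costs maps filter name -> {access path: estimated cost}.

    Returns (total cost, {filter: chosen path}) for the cheapest
    combination among the first MAX_COMBINATIONS that are enumerated.
    """
    names = list(filter_costs)
    options = [list(filter_costs[name].items()) for name in names]
    best = None
    # One access path per filter; every combination is a candidate plan.
    for combo in islice(product(*options), MAX_COMBINATIONS):
        cost = sum(path_cost for _, path_cost in combo)
        if best is None or cost < best[0]:
            best = (cost, {n: path for n, (path, _) in zip(names, combo)})
    return best


# Hypothetical cost estimates for two filters.
costs = {
    "id": {"plain scan": 100.0, "docid index": 5.0, "secondary index": 8.0},
    "price": {"plain scan": 100.0, "columnar scan": 40.0, "secondary index": 60.0},
}
print(cheapest_plan(costs))
# (45.0, {'id': 'docid index', 'price': 'columnar scan'})
```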

Another thing to consider is multithreaded query execution (when pseudo_sharding is enabled). The CBO knows that some queries can be executed in multiple threads and takes that into account. The CBO favors smaller query execution times (i.e., latency) over throughput. For example, if a query using a columnar scan can be executed in multiple threads (and occupy multiple CPU cores) and is faster than a query executed in a single thread using secondary indexes, multithreaded execution will be preferred.

Queries using secondary indexes and docid indexes always run in a single thread, as benchmarks show that there is little to no benefit in making them multithreaded.
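
The latency-first preference can be sketched as follows. The model assumes a perfect linear speed-up for parallelizable paths, which real workloads rarely achieve; the cost figures and both helper functions are hypothetical.

```python
def estimated_latency(cpu_cost: float, threads: int, parallelizable: bool) -> float:
    """Parallelizable paths split their CPU cost across threads;
    single-threaded paths pay the full cost."""
    return cpu_cost / threads if parallelizable else cpu_cost


def pick_path(paths, threads: int) -> str:
    """paths: list of (name, cpu_cost, parallelizable).
    Returns the name of the path with the lowest estimated latency."""
    return min(paths, key=lambda p: estimated_latency(p[1], threads, p[2]))[0]


paths = [
    ("columnar scan", 400.0, True),     # can be split across threads
    ("secondary index", 150.0, False),  # always runs in a single thread
]
print(pick_path(paths, threads=1))  # secondary index
print(pick_path(paths, threads=4))  # columnar scan
```

With four threads the columnar scan uses more total CPU (400 versus 150) but finishes sooner (100 versus 150), so it wins under a latency-first policy.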

Currently, the optimizer only uses CPU costs and does not consider memory or disk usage.

Updating table schema and settings

Updating table schema

Updating table schema in RT mode

ALTER TABLE table ADD COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING|TIMESTAMP|TEXT [INDEXED [ATTRIBUTE]]}] [engine='columnar']

ALTER TABLE table DROP COLUMN column_name

This feature supports adding only one field at a time for RT tables. Supported data types are:

  • int - integer attribute
  • timestamp - timestamp attribute
  • bigint - big integer attribute
  • float - float attribute
  • bool - boolean attribute
  • multi - multi-valued integer attribute
  • multi64 - multi-valued bigint attribute
  • json - json attribute
  • string / text attribute / string attribute - string attribute
  • text / text indexed stored / string indexed stored - full-text indexed field with original value stored in docstore
  • text indexed / string indexed - full-text indexed field, indexed only (the original value is not stored in docstore)
  • text indexed attribute / string indexed attribute - full text indexed field + string attribute (not storing the original value in docstore)
  • text stored / string stored - the value will be only stored in docstore, not full-text indexed, not a string attribute
  • adding engine='columnar' to any attribute (except for json) will make it stored in the columnar storage

Important notes:

  • ❗ It's recommended to back up the table files before running ALTER, to avoid data corruption in case of a sudden power interruption or a similar issue.
  • Querying a table is impossible while a column is being added.
  • Newly created attribute's values are set to 0.
  • ALTER will not work for distributed tables and tables without any attributes.
  • DROP COLUMN will fail if a table has only one field.
  • When dropping a field that is both a full-text field and a string attribute, the first ALTER DROP drops the attribute and the second one drops the full-text field.
  • Adding/dropping full-text field is only supported in the RT mode.
Example

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+

mysql> alter table rt add column test integer;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt drop column group_id;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt add column title text indexed;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

mysql> alter table rt add column title text attribute;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
| title      | string    |            |
+------------+-----------+------------+

mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

Updating table FT settings in RT mode

ALTER TABLE table ft_setting='value'[, ft_setting2='value']

You can also use ALTER to modify the full-text settings of your table in RT mode. Keep in mind that this does not affect existing documents; it only affects new ones. The example below does the following:

  • Create a table with a full-text field and a charset_table that allows only three searchable characters: a, b and c.
  • Insert the document 'abcd' and find it with the query abcd. The d is ignored because it is not in charset_table.
  • Decide that d should be searchable too, and add it with ALTER.
  • Run the same query, where match('abcd'). It still reports searching by abc, because the existing document was indexed with the previous charset_table.
  • Add another document abcd and search for abcd again.
  • The query now finds both documents, and show meta reports two keywords: abc (which finds the old document) and abcd (which finds the new one).
Example
mysql> create table rt(title text) charset_table='a,b,c';

mysql> insert into rt(title) values('abcd');

mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 1     |
| total_found   | 1     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
+---------------+-------+

mysql> alter table rt charset_table='a,b,c,d';

mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 1     |
| total_found   | 1     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
+---------------+-------+

mysql> insert into rt(title) values('abcd');

mysql> select * from rt where match('abcd');
+---------------------+-------+
| id                  | title |
+---------------------+-------+
| 1514630637682688055 | abcd  |
| 1514630637682688054 | abcd  |
+---------------------+-------+

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 2     |
| total_found   | 2     |
| time          | 0.000 |
| keyword[0]    | abc   |
| docs[0]       | 1     |
| hits[0]       | 1     |
| keyword[1]    | abcd  |
| docs[1]       | 1     |
| hits[1]       | 1     |
+---------------+-------+
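
The behaviour in this example can be modelled with a short sketch. It is a simplified illustration, not Manticore's tokenizer: it assumes that characters outside charset_table act as separators and that each chunk of the table keeps the charset it was indexed with. The `tokenize` and `search` helpers and the `chunks` structure are hypothetical.

```python
def tokenize(text: str, charset: set[str]) -> list[str]:
    """Split text into tokens; characters outside charset act as separators."""
    tokens, current = [], []
    for ch in text:
        if ch in charset:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Each chunk remembers the charset that was in effect when it was written.
chunks = [
    {"charset": set("abc"), "docs": {1: "abcd"}},   # indexed before ALTER
    {"charset": set("abcd"), "docs": {2: "abcd"}},  # indexed after ALTER
]


def search(query: str):
    """Return (matching doc ids, keywords used across all chunks)."""
    found, keywords = [], []
    for chunk in chunks:
        terms = tokenize(query, chunk["charset"])
        keywords.extend(t for t in terms if t not in keywords)
        for doc_id, text in chunk["docs"].items():
            doc_tokens = tokenize(text, chunk["charset"])
            if all(t in doc_tokens for t in terms):
                found.append(doc_id)
    return found, keywords


print(tokenize("abcd", set("abc")))  # ['abc']
print(search("abcd"))                # ([1, 2], ['abc', 'abcd'])
```

The query is tokenized once per chunk, which is why two different keywords appear for a single query string.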

Updating table FT settings in plain mode

ALTER TABLE table RECONFIGURE

ALTER can also reconfigure an RT table in plain mode, so that new tokenization, morphology and other text processing settings from the configuration file take effect for new documents. Note that existing documents are left intact. Internally, it forcibly saves the current RAM chunk as a new disk chunk and adjusts the table header, so that new documents are tokenized using the updated full-text settings.

Example
mysql> show table rt settings;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| settings      |       |
+---------------+-------+
1 row in set (0.00 sec)

mysql> alter table rt reconfigure;
Query OK, 0 rows affected (0.00 sec)

mysql> show table rt settings;
+---------------+----------------------+
| Variable_name | Value                |
+---------------+----------------------+
| settings      | morphology = stem_en |
+---------------+----------------------+
1 row in set (0.00 sec)

Rebuild secondary index

ALTER TABLE table REBUILD SECONDARY

ALTER can also be used to rebuild secondary indexes in a given table. A secondary index can sometimes be disabled for the whole table or for one or more of its attributes:

  • When an attribute is updated with UPDATE, its secondary index is disabled.
  • When Manticore loads a table whose secondary indexes are in an old format, secondary indexes are disabled for the whole table.

ALTER TABLE table REBUILD SECONDARY rebuilds secondary indexes from attribute data and enables them again.

Example
ALTER TABLE rt REBUILD SECONDARY;
Response
Query OK, 0 rows affected (0.00 sec)

Functions


Mathematical functions

ABS()

Returns the absolute value of the argument.

ATAN2()

Returns the arctangent function of two arguments, expressed in radians.

BITDOT()

BITDOT(mask, w0, w1, ...) returns the sum of the products of each bit of the mask and its weight: bit0*w0 + bit1*w1 + ...
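
The formula can be checked with a small Python model. The `bitdot` helper is hypothetical and only mirrors the arithmetic described above; it is not Manticore's implementation.

```python
def bitdot(mask: int, *weights: float) -> float:
    """Sum the weights whose bit is set in mask (bit 0 pairs with weights[0])."""
    return sum(w for i, w in enumerate(weights) if (mask >> i) & 1)


# mask 0b101 has bits 0 and 2 set, so the result is 10 + 30.
print(bitdot(0b101, 10, 20, 30))  # 40
```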

CEIL()

Returns the smallest integer value greater than or equal to the argument.

COS()

Returns the cosine of the argument.

CRC32()

Returns the CRC32 value of a string argument.

EXP()

Returns e (2.718...) raised to the power of the argument.

FIBONACCI()

Returns the N-th Fibonacci number, where N is the integer argument. That is, arguments of 0 and up generate the values 0, 1, 1, 2, 3, 5, 8, 13 and so on. Note that the computations are done using 32-bit integer math, so the 48th number and all later ones are returned modulo 2^32.
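
The wrap-around can be reproduced with a short sketch that masks every addition to 32 bits. The `fibonacci32` helper is a hypothetical model of the behaviour described above for non-negative arguments.

```python
def fibonacci32(n: int) -> int:
    """N-th Fibonacci number computed with unsigned 32-bit arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    return a


print([fibonacci32(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
print(fibonacci32(47))  # 2971215073, still fits in 32 bits
print(fibonacci32(48))  # 512559680, i.e. 4807526976 modulo 2^32
```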

FLOOR()

Returns the largest integer value less than or equal to the argument.

GREATEST()

GREATEST(attr_json.some_array) takes a JSON array as its argument and returns the greatest value in that array. It also works for MVA.

IDIV()

Returns the result of an integer division of the first argument by the second argument. Both arguments must be of an integer type.

LEAST()

LEAST(attr_json.some_array) takes a JSON array as its argument and returns the least value in that array. It also works for MVA.

LN()

Returns the natural logarithm of the argument (with the base of e=2.718...).

LOG10()

Returns the common logarithm of the argument (with the base of 10).

LOG2()

Returns the binary logarithm of the argument (with the base of 2).

MAX()

Returns the larger of two arguments.

MIN()

Returns the smaller of two arguments.

POW()

Returns the first argument raised to the power of the second argument.

RAND()

RAND(seed) returns a random float between 0 and 1. It optionally accepts a seed, which can be:

  • a constant integer
  • or the name of an integer attribute

If you use a seed, keep in mind that it resets rand()'s starting point separately for each plain table, RT disk chunk, RAM chunk and pseudo shard, so queries to a distributed table in any form can return multiple identical random values.
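
The effect of per-chunk seeding can be demonstrated with Python's own generator as a stand-in. Manticore's random number generator is different; only the pattern of restarting from the same seed in every chunk is modelled, and the `seeded_values_per_chunk` helper is hypothetical.

```python
import random


def seeded_values_per_chunk(seed: int, rows_per_chunk: list[int]) -> list[list[float]]:
    """Generate one random value per row, restarting the generator
    from the same seed at the beginning of every chunk."""
    result = []
    for rows in rows_per_chunk:
        rng = random.Random(seed)  # starting point is reset for each chunk
        result.append([rng.random() for _ in range(rows)])
    return result


values = seeded_values_per_chunk(42, [3, 2])
# The second chunk repeats the first values of the first chunk.
print(values[0][:2] == values[1])  # True
```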

SIN()

Returns the sine of the argument.

SQRT()

Returns the square root of the argument.
