Introduction

Manticore Search is a database designed specifically for search, including full-text search.

Manticore was born in 2017 as a continuation of Sphinx Search. We took the best from Sphinx (the C++ core and its focus on low-level data structures and fine-tuned algorithms), added a lot of new functionality, fixed hundreds of bugs, made it easier to use, kept it open source, and made Manticore Search an even more lightweight and extremely fast search database.

Our key features are:

Powerful and fast full-text searching

SQL-first

Manticore's native syntax is SQL. It speaks SQL over HTTP and the MySQL protocol. You can use your preferred MySQL client in any programming language to connect to the Manticore Search server via the SQL protocol.

JSON over HTTP

To provide a more programmatic way to manage your data and schemas, Manticore offers an HTTP JSON protocol. It is very similar to the one from Elasticsearch.
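
For example, a full-text search over HTTP might look like this (a sketch, assuming an index named products and the default HTTP port 9308):

curl -s localhost:9308/search -d '
{
    "index": "products",
    "query": { "match": { "title": "remove hair" } }
}'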

Declarative and imperative schema management

You can create / update / delete indexes online as well as provide schemas in a configuration file.

Power of C++

Being written fully in C++, Manticore Search starts fast and doesn't take much RAM. Low-level optimizations give good performance.

Real-time inserts

After a new document is added or updated, it can be read immediately.

Interactive courses

We provide interactive courses for easier learning.

ACID compliance

Manticore is not fully ACID-compliant, but it supports isolated transactions for atomic changes and binary logging for safe writes.

Built-in replication and load balancing

Data can be distributed across servers and data centers. Any Manticore Search node can be both a load balancer and a data node. Manticore implements synchronous multi-master replication using the Galera library, which guarantees consistency between all data nodes and no data loss.

Can sync from MySQL/PostgreSQL/ODBC/XML/CSV out of the box

The Manticore indexer tool and rich configuration syntax help you sync existing data from MySQL, PostgreSQL, any database that speaks ODBC, and any other technology that can generate simple XML or CSV.
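
For example, a minimal plain-mode source and index definition might look like this (connection credentials, names, and paths are placeholders):

source products_src {
    type = mysql
    sql_host = localhost
    sql_user = user
    sql_pass = pass
    sql_db = shop
    sql_query = SELECT id, title, price FROM products
    sql_attr_float = price
}

index products_plain {
    source = products_src
    path = /var/lib/manticore/products_plain
}

The data can then be pulled in with indexer products_plain --rotate.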

Integrations

You can integrate Manticore Search with a MySQL/MariaDB server via the FEDERATED engine or use Manticore through ProxySQL.

Stream filtering

Manticore has a special index type called "percolate" which implements search in reverse: you index your queries rather than your data. It's an extremely powerful tool for full-text data stream filtering: just put all your queries into the index, process your data stream by sending each batch of documents to Manticore Search, and you'll get back only those documents that match some of your stored queries.
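
A minimal sketch of the workflow (the index name and stored query are made up):

create table products_pq (title text, price float) type='pq';
insert into products_pq (query) values ('@title bag');
call pq ('products_pq', ('{"title": "Crossbody Bag with Tassel", "price": 19.85}'), 1 as docs_json);

Only the documents matching at least one of the stored queries are returned.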

Possible applications:

Manticore's possible applications include, but are not limited to:

Read this first

About this manual

The manual is arranged as a reflection of the most likely way you would use Manticore:

  • starting from some basic information about it and how to install and connect
  • through some essential things like adding documents and running searches
  • to some performance optimization tips and tricks and extending Manticore with the help of plugins and custom functions
Do not skip ✔️

Key sections of the manual are marked with the ✔️ sign in the menu for your convenience, since their corresponding functionality is the most used. If you are new to Manticore, we highly recommend not skipping them.

Quick start guide

If you are looking for a quick understanding of how Manticore works in general, the ⚡ Quick start guide section is a good place to start.

Using examples

Each query example has a little icon 📋 in the top-right corner:

Copy example

You can use it to copy examples to the clipboard. If the query is an HTTP request, it will be copied as a curl command. You can configure the host/port if you press ⚙️.

Search in this manual

We love search and we've done our best to make searching in this manual as convenient as possible. Of course it's backed by Manticore Search. Besides using the search bar, which requires opening the manual first, there is a very easy way to find something by just opening mnt.cr/your-search-keyword :

mnt.cr quick manual search

Best practices

There are a few things you need to understand about Manticore Search that can help you follow the best practices of using it.

Real-time index vs plain index

  • Real-time index allows adding, updating and deleting documents with immediate availability of the changes.
  • Plain index is a mostly immutable data structure and a basic element used by real-time indexes. A plain index stores a set of documents, their common dictionary and indexing settings. One real-time index can consist of multiple plain indexes (chunks), but besides that Manticore provides direct access to building plain indexes using the indexer tool. This makes sense when your data is mostly immutable, so you don't need a real-time index for it.

Real-time mode vs plain mode

Manticore Search works in two modes:

  • Real-time mode (RT mode). This is the default one and:
    • allows managing your data schema online using SQL commands CREATE/ALTER/DROP TABLE and their equivalents in non-SQL clients
    • in the configuration file you need to define only server-related settings
  • Plain mode allows you to define your data schemas in a configuration file. It makes sense in two cases:
    • when you only deal with plain indexes
    • or when your data schema is very stable and you don't need replication (as it's available only in the RT mode)

You cannot combine the two modes and need to decide which one you want to follow. If you are unsure, our recommendation is to follow the RT mode: even if you need a plain index, you can build it with a separate plain-index config and import it into your main Manticore instance, as shown below.
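
For example (a sketch; the path is a placeholder and must point to the index files built by the separate instance):

IMPORT TABLE products_plain FROM '/var/lib/manticore_import/products_plain';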

SQL vs JSON

Manticore provides multiple ways and interfaces to manage your schemas and data, but the two main ones are:

  • SQL. This is Manticore's native language, which enables all of Manticore's functionality. The best practice is to use SQL to:
    • manage your schemas and do other DBA routines, as it's the easiest way to do that
    • design your queries, as SQL is much closer to natural language than the JSON DSL, which is important when you design something new. You can use Manticore SQL via any MySQL client or /sql.
  • JSON. Most functionality is also available via a JSON domain-specific language. This is especially useful when you integrate Manticore with your application, as with JSON you can do it more programmatically than with SQL. The best practice is to first explore how to do something via SQL and then use JSON to integrate it into your application, as in the example below.
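
For instance, the same insert can be expressed both ways (a sketch, assuming the products index from the quick start guide):

insert into products(title, price) values ('Crossbody Bag with Tassel', 19.85);

and via HTTP JSON:

curl -s localhost:9308/insert -d '
{
    "index": "products",
    "doc": { "title": "Crossbody Bag with Tassel", "price": 19.85 }
}'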

Installation

Quick start guide

Install and start Manticore

You can install and start Manticore easily on Ubuntu, CentOS, Debian, Windows and macOS, or use Manticore as a Docker container.

📋
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update
sudo apt install manticore-bin
sudo systemctl start manticore

Connect to Manticore

By default Manticore waits for your connections on:

  • port 9306 for MySQL clients
  • port 9308 for HTTP/HTTPS connections
  • port 9312 for connections from other Manticore nodes and clients based on Manticore binary API
📋
mysql -h0 -P9306
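
If you prefer HTTP, SQL statements can also be sent to the /sql endpoint (a sketch, assuming the default HTTP port; mode=raw allows any type of statement):

curl -s "localhost:9308/sql?mode=raw" -d "query=show tables"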

Create an index

Let's now create an index called "products" with 2 fields:

  • title - full-text field which will contain our product's title
  • price - of type "float"
📋
create table products(title text, price float) morphology='stem_en';
Response
Query OK, 0 rows affected (0.02 sec)

Add documents

Let's now add a few documents to the index:

📋
insert into products(title,price) values ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Response
Query OK, 3 rows affected (0.01 sec)

Search

Let's find one of the documents. The query we will use is 'remove hair'. As you can see, it finds the document with the title 'Pet Hair Remover Glove' and highlights 'Hair Remover' in it even though the query has "remove", not "remover". This is because when we created the index we enabled English stemming (morphology "stem_en").

📋
select id, highlight(), price from products where match('remove hair');
Response
+---------------------+-----------------------------------------+----------+
| id                  | highlight()                             | price    |
+---------------------+-----------------------------------------+----------+
| 1513686608316989452 | Pet <strong>Hair Remover</strong> Glove | 7.990000 |
+---------------------+-----------------------------------------+----------+
1 row in set (0.00 sec)

Update

Let's assume we now want to update the document - change the price to 18.5. This can be done by filtering by any field, but normally you know the document id and update something based on that.

📋
update products set price=18.5 where id = 1513686608316989453;
Response
Query OK, 1 row affected (0.00 sec)

Delete

Let's now delete all documents with price lower than 10.

📋
delete from products where price < 10;
Response
Query OK, 1 row affected (0.00 sec)

Starting the server

Manticore Search server can be started in several ways, depending on how it was installed.

Creating an index

Listing indexes

Manticore Search has a single-level hierarchy of indexes.

There is no concept of grouping tables into databases as in other DBMSs. Still, Manticore accepts SHOW DATABASES statements for interoperability with the SQL dialect, but the statement doesn't return anything.
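
For example (a sketch of what a MySQL client would typically print):

mysql> SHOW DATABASES;
Empty set (0.00 sec)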

While the data collections in Manticore are called indexes, the statement that displays them is SHOW TABLES for compatibility with miscellaneous SQL clients.

SHOW TABLES

General syntax:

SHOW TABLES [ LIKE pattern ]

The SHOW TABLES statement enumerates all currently active indexes along with their types. Existing index types are local, distributed, rt and template.

📋
SHOW TABLES;
Response
+-------+-------------+
| Index | Type        |
+-------+-------------+
| dist1 | distributed |
| rt    | rt          |
| test1 | local       |
| test2 | local       |
+-------+-------------+
4 rows in set (0.00 sec)

An optional LIKE clause is supported for filtering indexes by name.

📋
SHOW TABLES LIKE '%4';
Response
+-------+-------------+
| Index | Type        |
+-------+-------------+
| dist4 | distributed |
+-------+-------------+
1 row in set (0.00 sec)

DESCRIBE

{DESC | DESCRIBE} index [ LIKE pattern ]

The DESCRIBE statement lists index columns and their associated types. Columns are the document ID, full-text fields, and attributes. The order matches the one in which fields and attributes are expected by INSERT and REPLACE statements. Column types are field, integer, timestamp, ordinal, bool, float, bigint, string, and mva. The ID column is typed as bigint. Example:

mysql> DESC rt;
+---------+---------+
| Field   | Type    |
+---------+---------+
| id      | bigint  |
| title   | field   |
| content | field   |
| gid     | integer |
+---------+---------+
4 rows in set (0.00 sec)

An optional LIKE clause is supported. Refer to SHOW META for its syntax details.

SHOW CREATE TABLE

SHOW CREATE TABLE name

Prints the CREATE TABLE statement that creates the named table.

SQL
📋
SHOW CREATE TABLE idx\G
Response
       Table: idx
Create Table: CREATE TABLE idx (
f text indexed stored
) charset_table='non_cjk,cjk' morphology='icu_chinese'
1 row in set (0.00 sec)

Percolate index schemas

If you apply the DESC statement to a percolate index it will show the outer index schema, i.e. the schema of the stored queries. It's static and the same for all local pq indexes:

mysql> DESC pq;
+---------+--------+
| Field   | Type   |
+---------+--------+
| id      | bigint |
| query   | string |
| tags    | string |
| filters | string |
+---------+--------+
4 rows in set (0.00 sec)

If you're looking for the expected document schema, use DESC <pq index name> table:

mysql> DESC pq TABLE;
+-------+--------+
| Field | Type   |
+-------+--------+
| id    | bigint |
| title | text   |
| gid   | uint   |
+-------+--------+
3 rows in set (0.00 sec)

desc pq table like ... is also supported and works as follows:

mysql> desc pq table like '%title%';
+-------+------+----------------+
| Field | Type | Properties     |
+-------+------+----------------+
| title | text | indexed stored |
+-------+------+----------------+
1 row in set (0.00 sec)

Deleting an index

Deleting an index is performed in 2 steps:

  1. The index is cleared (similar to TRUNCATE)
  2. All index files are removed from the index folder. All the external index files that were used by the index (such as wordforms, extensions or stopwords) are also deleted. Note that these external files are copied to the index folder when CREATE TABLE is used, so the original files specified in CREATE TABLE will not be deleted.

Deleting an index is possible only when the server is running in RT mode. It is possible to delete RT indexes, PQ indexes and distributed indexes.

📋
drop table products;
Response
Query OK, 0 rows affected (0.02 sec)

Here is the syntax of the DROP TABLE statement in SQL:

DROP TABLE [IF EXISTS] index_name

When deleting an index via SQL, adding IF EXISTS deletes the index only if it exists. If you try to delete a non-existing index with the IF EXISTS option, nothing happens.

When deleting an index via PHP, you can add an optional silent parameter which works the same as IF EXISTS.

📋
drop table if exists products;

Emptying an index

The index can be emptied with a TRUNCATE TABLE SQL statement or with a truncate() PHP client function.

Here is the syntax for the SQL statement:

TRUNCATE TABLE index_name [WITH RECONFIGURE]

When this statement is executed, it clears the RT index completely. It disposes of the in-memory data, unlinks all the index data files, and releases the associated binary logs.

An index can also be emptied with DELETE FROM index WHERE id>0, but it's not recommended, as it's much slower than TRUNCATE.

📋
truncate table products;
Response
Query OK, 0 rows affected (0.02 sec)

One of the possible uses of this command is before attaching an index.
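
For example, a typical sequence might be (a sketch; products_plain is a made-up plain index name):

truncate table products;
ATTACH INDEX products_plain TO RTINDEX products;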

When the RECONFIGURE option is used, new tokenization, morphology, and other text processing settings specified in the config take effect after the index gets cleared. With this option, clearing and reconfiguring an index becomes one atomic operation.

📋
truncate table products with reconfigure;
Response
Query OK, 0 rows affected (0.02 sec)

Manticore cluster

Manticore Search is a highly distributed system and includes all the components needed to build a highly available and scalable search database setup:

Manticore Search is extremely flexible in terms of how you set up your cluster: there are no limitations, and it's up to you how you design it. Just learn the tools mentioned above and use them to achieve your goal.

Connecting to the server

By default Manticore waits for your connections on:

  • port 9306 for MySQL clients
  • port 9308 for HTTP/HTTPS connections
  • port 9312 for connections from other Manticore nodes and clients based on Manticore binary API
📋
mysql -h0 -P9306

▪️ Adding documents to an index

▪️ Adding data from external storages

✔️ Updating documents

Deleting documents

Deleting is only supported for real-time and percolate indexes, and for distributed indexes that contain only RT indexes as agents. You can delete existing rows (documents) from an existing index based on ID or conditions.

Deleting works for SQL and HTTP interfaces.

The SQL response for a successful operation will show the number of rows deleted.

json/delete is an HTTP endpoint for deleting. The server will respond with a JSON object stating whether the operation was successful and the number of rows deleted.
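
For example (a sketch, assuming the default HTTP port):

curl -s localhost:9308/json/delete -d '
{
    "index": "test",
    "id": 100
}'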

To delete all documents from an index it's recommended to use index truncation instead, as it's a much faster operation.

📋
DELETE FROM index WHERE where_condition
  • index is the name of the index from which the row should be deleted.
  • where_condition for SQL has the same syntax as in the SELECT statement.

In this example we are deleting all documents that match the full-text query dummy from the index named test:

📋
select * from test;

delete from test where match ('dummy');

select * from test;
Response
+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  100 | 1000 | 100,201     | 100  |
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  103 | 1003 | 103,204     | 103  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
|  106 | 1006 | 106,207     | 106  |
|  107 | 1007 | 107,208     | 107  |
+------+------+-------------+------+
8 rows in set (0.00 sec)

Query OK, 2 rows affected (0.00 sec)

+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  100 | 1000 | 100,201     | 100  |
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  103 | 1003 | 103,204     | 103  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
+------+------+-------------+------+
6 rows in set (0.00 sec)

Here we are deleting a document with id 100 from the index named test:

📋
delete from test where id=100;

select * from test;
Response
Query OK, 1 rows affected (0.00 sec)

+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  103 | 1003 | 103,204     | 103  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
+------+------+-------------+------+
5 rows in set (0.00 sec)

Manticore SQL allows using complex conditions in the DELETE statement.

For example, here we are deleting documents that match the full-text query dummy and have attribute mva1 with a value greater than 206 or with mva1 values 100 or 103, from the index named test:

SQL
📋
delete from test where match ('dummy') and ( mva1>206 or mva1 in (100, 103) );

select * from test;
Response
Query OK, 4 rows affected (0.00 sec)

+------+------+-------------+------+
| id   | gid  | mva1        | mva2 |
+------+------+-------------+------+
|  101 | 1001 | 101,202     | 101  |
|  102 | 1002 | 102,203     | 102  |
|  104 | 1004 | 104,204,205 | 104  |
|  105 | 1005 | 105,206     | 105  |
+------+------+-------------+------+
4 rows in set (0.00 sec)

Here is an example of deleting documents in the index test of cluster nodes4:

📋
delete from nodes4:test where id=100;
Response
Array(
    [_index] => test
    [_id] => 100
    [found] => true
    [result] => deleted
)

Transactions

Manticore supports basic transactions for deleting from and inserting into real-time and percolate indexes. That is, each change to the index is first saved into an internal changeset and then actually committed to the index. By default each command is wrapped into an individual automatic transaction, making this transparent: you just 'insert' something and can see the inserted result once the statement completes, without having to care about transactions. However, this behaviour can be explicitly managed by starting and committing transactions manually.

Automatic and manual mode

SET AUTOCOMMIT = {0 | 1}

SET AUTOCOMMIT controls the autocommit mode in the active session. AUTOCOMMIT is set to 1 by default. With the default you don't have to care about transactions, since every statement that performs any changes on any index is implicitly wrapped into a separate transaction. Setting it to 0 allows you to manage transactions manually, i.e. they will not be visible until you explicitly commit them.

Transactions are limited to a single RT or percolate index and are also limited in size. They are atomic, consistent, overly isolated, and durable. "Overly isolated" means that the changes are invisible not only to concurrent transactions but even to the current session itself.

BEGIN, COMMIT, and ROLLBACK

START TRANSACTION | BEGIN
COMMIT
ROLLBACK

The BEGIN statement (or its START TRANSACTION alias) forcibly commits any pending transaction and begins a new one.

The COMMIT statement commits the current transaction, making all its changes permanent.

The ROLLBACK statement rolls back the current transaction, canceling all its changes.

Examples

Automatic commits (default)

insert into indexrt (id, content, title, channel_id, published) values (1, 'aa', 'blabla', 1, 10);
Query OK, 1 rows affected (0.00 sec)

select * from indexrt where id=1;
+------+------------+-----------+--------+
| id   | channel_id | published | title  |
+------+------------+-----------+--------+
|    1 |          1 |        10 | blabla |
+------+------------+-----------+--------+
1 row in set (0.00 sec)

The inserted value is immediately visible to the following 'select' statement.

Manual commits (autocommit=0)

set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (3, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (4, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

select * from indexrt where id=3;
Empty set (0.01 sec)

select * from indexrt where id=4;
Empty set (0.00 sec)

Here the changes are NOT automatically committed, so the insertions are not visible even in the same session, since they're not committed. Also, despite the absence of a BEGIN statement, a transaction is implicitly started.

So, let's finally commit it:

commit;
Query OK, 0 rows affected (0.00 sec)

select * from indexrt where id=4;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    4 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.00 sec)

select * from indexrt where id=3;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    3 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.00 sec)

Now it is finished and visible.

Manual transaction

Using BEGIN and COMMIT you can define the bounds of a transaction explicitly, so there's no need to care about autocommit in this case.

begin;
Query OK, 0 rows affected (0.00 sec)

insert into indexrt (id, content, title, channel_id, published) values (2, 'aa', 'bb', 1, 1);
Query OK, 1 row affected (0.00 sec)

select * from indexrt where id=2;
Empty set (0.01 sec)

commit;
Query OK, 0 rows affected (0.01 sec)

select * from indexrt where id=2;
+------+------------+-----------+-------+
| id   | channel_id | published | title |
+------+------------+-----------+-------+
|    2 |          1 |         1 | bb    |
+------+------------+-----------+-------+
1 row in set (0.01 sec)

SET TRANSACTION

SET TRANSACTION ISOLATION LEVEL { READ UNCOMMITTED
    | READ COMMITTED
    | REPEATABLE READ
    | SERIALIZABLE }

The SET TRANSACTION statement does nothing. It was implemented to maintain compatibility with third-party MySQL client libraries, connectors, and frameworks that may need to run this statement when connecting. The statement just goes through the syntax parser and then returns 'ok'. It's nothing usable for your own programs, just a stub to make third-party clients happy.

mysql> SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Query OK, 0 rows affected (0.00 sec)

✔️ Searching

Updating index schema

Updating index schema in RT mode

ALTER TABLE index {ADD|DROP} COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING|TIMESTAMP}]

It supports adding one attribute at a time for both plain and RT indexes. The int, bigint, float, bool, multi-valued, multi-valued 64-bit, json and string attribute types are supported. You can add json and string attributes, but you cannot modify their values.

Important notes:

  • Querying an index is impossible (because of a write lock) while adding a column.
  • Newly created attribute's values are set to 0.
  • ALTER will not work for distributed indexes and indexes without any attributes.
  • DROP COLUMN will fail if an index has only one attribute.

Updating index schema in plain mode

ALTER RTINDEX index RECONFIGURE

ALTER can also reconfigure an RT index in plain mode, so that new tokenization, morphology, and other text processing settings from the configuration file take effect on newly INSERTed rows, while retaining the existing rows as they were. Internally, it forcibly saves the current RAM chunk as a new disk chunk and adjusts the index header, so that new rows are tokenized using the new rules.
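
A minimal invocation looks like this (assuming an RT index named rt defined in the configuration file):

ALTER RTINDEX rt RECONFIGURE;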

Example
📋
mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
4 rows in set (0.01 sec)

mysql> alter table plain add column test integer;
Query OK, 0 rows affected (0.04 sec)

mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+
5 rows in set (0.00 sec)

mysql> alter table plain drop column group_id;
Query OK, 0 rows affected (0.01 sec)

mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+
4 rows in set (0.00 sec)

Functions

▪️ Securing and compacting an index

▪️ Security

▪️ Logging

▪️ Profiling and monitoring

▪️ Server settings

▪️ Extensions

Miscellaneous tools

indextool

indextool is a helper tool used to dump miscellaneous information about a physical index. The general usage is:

indextool <command> [options]

Options effective for all commands:

  • --config <file> (-c <file> for short) overrides the built-in config file names.
  • --quiet (-q for short) keeps indextool quiet; it will not output a banner, etc.
  • --help (-h for short) lists all of the parameters that can be called in your particular build of indextool.
  • -v show version information of your particular build of indextool.

The commands are as follows:

  • --checkconfig just loads and verifies the config file to check that it's valid and free of syntax errors.
  • --buildidf DICTFILE1 [DICTFILE2 ...] --out IDFILE builds an IDF file from one or several dictionary dumps. The additional parameter -skip-uniq will skip unique (df=1) words.
  • --build-infixes INDEXNAME builds infixes for an existing dict=keywords index (upgrades .sph, .spi in place). You can use this option for legacy index files that already use dict=keywords but now need to support infix searching too; updating the index files with indextool may prove easier or faster than regenerating them from scratch with indexer.
  • --dumpheader FILENAME.sph quickly dumps the provided index header file without touching any other index files or even the configuration file. The report provides a breakdown of all the index settings, in particular the entire attribute and field list.
  • --dumpconfig FILENAME.sph dumps the index definition from the given index header file in (almost) compliant sphinx.conf file format.
  • --dumpheader INDEXNAME dumps index header by index name with looking up the header path in the configuration file.
  • --dumpdict INDEXNAME dumps the dictionary. The additional -stats switch will add the total number of documents to the dictionary dump. It is required for dictionary files that are used for creation of IDF files.
  • --dumpdocids INDEXNAME dumps document IDs by index name.
  • --dumphitlist INDEXNAME KEYWORD dumps all the hits (occurrences) of a given keyword in a given index, with keyword specified as text.
  • --dumphitlist INDEXNAME --wordid ID dumps all the hits (occurrences) of a given keyword in a given index, with keyword specified as internal numeric ID.
  • --fold INDEXNAME OPTFILE is useful to see how the tokenizer actually processes input. You can feed indextool with text from a file if specified, or from stdin otherwise. The output will contain spaces instead of separators (according to your charset_table settings) and lowercased letters in words.
  • --htmlstrip INDEXNAME filters stdin using HTML stripper settings for a given index, and prints the filtering results to stdout. Note that the settings will be taken from sphinx.conf, and not the index header.
  • --mergeidf NODE1.idf [NODE2.idf ...] --out GLOBAL.idf merges several .idf files into a single one. The additional parameter -skip-uniq will skip unique (df=1) words.
  • --morph INDEXNAME applies morphology to the given stdin and prints the result to stdout.
  • --check INDEXNAME checks the index data files for consistency errors that might be introduced either by bugs in indexer and/or hardware faults. --check also works on RT indexes, RAM and disk chunks. See the example after this list.
  • --strip-path strips the path names from all the file names referenced from the index (stopwords, wordforms, exceptions, etc). This is useful for checking indexes built on another machine with possibly different path layouts.
  • --rotate works only with --check and defines whether to check index waiting for rotation, i.e. with .new extension. This is useful when you want to check your index before actually using it.
  • --apply-killlists loads and applies kill-lists for all indexes listed in the config file. Changes are saved in .SPM files. Kill-list files (.SPK) are deleted. This can be useful if you want to move the application of kill-lists from server startup to the indexing stage.
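
For example, a typical consistency check might look like this (the config path is the default one; the index name is an assumption):

indextool --config /etc/manticoresearch/manticore.conf --check products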

spelldump

spelldump is used to extract the contents of a dictionary file that uses the ispell or MySpell format, which can help build word lists for wordforms: all of the possible forms are pre-built for you.

The general usage is:

spelldump [options] <dictionary> <affix> [result] [locale-name]

The two main parameters are the dictionary's main file and its affix file; usually these are named [language-prefix].dict and [language-prefix].aff and are available with most common Linux distributions, as well as in various places online.

[result] specifies where the dictionary data should be output to, and [locale-name] additionally specifies the locale details you wish to use.

There is an additional option, -c [file], which specifies a file for case conversion details.

Examples of its usage are:

spelldump en.dict en.aff
spelldump ru.dict ru.aff ru.txt ru_RU.CP1251
spelldump ru.dict ru.aff ru.txt .1251

The results file will contain a list of all the words in the dictionary in alphabetical order, output in the format of a wordforms file, which you can use to customize for your specific circumstances. An example of the result file:

zone > zone
zoned > zoned
zoning > zoning

wordbreaker

wordbreaker is used to split compound words, such as those common in URLs, into their component words. For example, this tool can split "lordoftherings" into its four component words, or http://manofsteel.warnerbros.com into "man of steel warner bros". This helps searching, without requiring prefixes or infixes: searching for "sphinx" wouldn't match "sphinxsearch", but if you break the compound word and index the separate components, you'll get a match without the cost of the larger index files required by prefix and infix searching.

Examples of its usage are:

echo manofsteel | bin/wordbreaker -dict dict.txt split
man of steel

The input stream will be separated into words using the -dict dictionary file. If no dictionary is specified, wordbreaker looks in the working folder for a wordbreaker-dict.txt file. (The dictionary should match the language of the compound word.) The split command breaks words from the standard input and outputs the result to the standard output. There are also test and bench commands that let you test the splitting quality and benchmark the splitting functionality.

Wordbreaker needs a dictionary to recognize individual substrings within a string. To differentiate between different guesses, it uses the relative frequency of each word in the dictionary: higher frequency means higher split probability. You can generate such a file using the indexer tool:

indexer --buildstops dict.txt 100000 --buildfreqs myindex -c /path/to/sphinx.conf

which will write the 100,000 most frequent words, along with their counts, from myindex into dict.txt. The output file is a text file, so you can edit it by hand, if need be, to add or remove words.

Changelog

Version 3.5.0, 22 Jul 2020

Major new features:

  • This release took so long because we were working hard on changing the multitasking mode from threads to coroutines. It makes configuration simpler and query parallelization much more straightforward: Manticore just uses the given number of threads (see the new setting threads), and the new mode makes sure it's done in the most optimal way.

  • Changes in highlighting:

    • any highlighting that works with several fields (highlight({},'field1, field2') or highlight in json queries) now applies limits per-field by default.
    • any highlighting that works with plain text (highlight({}, string_attr) or snippet()) now applies limits to the whole document.
    • per-field limits can be switched to global limits by limits_per_field=0 option (1 by default).
    • allow_empty is now 0 by default for highlighting via HTTP JSON.
  • The same port can now be used for http, https and the binary API (to accept connections from a remote Manticore instance). listen = *:mysql is still required for connections via the MySQL protocol. Manticore now automatically detects the type of client trying to connect to it, except for MySQL (due to restrictions of the protocol).

  • In RT mode a field can now be text and string attribute at the same time - GitHub issue #331.

    In plain mode it's called sql_field_string. Now it's available in RT mode for real-time indexes too. You can use it as shown in the example:

    create table t(f string attribute indexed);
    insert into t values(0,'abc','abc');
    select * from t where match('abc');
    +---------------------+------+
    | id                  | f    |
    +---------------------+------+
    | 2810845392541843463 | abc  |
    +---------------------+------+
    1 row in set (0.01 sec)
    
    mysql> select * from t where f='abc';
    +---------------------+------+
    | id                  | f    |
    +---------------------+------+
    | 2810845392541843463 | abc  |
    +---------------------+------+
    1 row in set (0.00 sec)

Minor changes

  • You can now highlight string attributes.
  • SSL and compression support for SQL interface
  • Support of mysql client status command.
  • Replication can now replicate external files (stopwords, exceptions etc.).
  • Filter operator in is now available via HTTP JSON interface.
  • expressions in HTTP JSON.
  • You can now change rt_mem_limit on the fly in RT mode, i.e. can do ALTER ... rt_mem_limit=<new value>.
  • You can now use separate CJK charset tables: chinese, japanese and korean.
  • thread_stack now limits maximum thread stack, not initial.
  • Improved SHOW THREADS output.
  • Display progress of long CALL PQ in SHOW THREADS.
  • cpustat, iostat, coredump can be changed during runtime with SET.
  • SET [GLOBAL] wait_timeout=NUM implemented.

Breaking changes:

  • Index format has been changed. Indexes built in 3.5.0 cannot be loaded by Manticore version < 3.5.0, but Manticore 3.5.0 understands older formats.
  • INSERT INTO PQ VALUES() (i.e. without providing column list) previously expected exactly (query, tags) as the values. It's been changed to (id,query,tags,filters). The id can be set to 0 if you want it to be auto-generated.
  • allow_empty=0 is a new default in highlighting via HTTP JSON interface.
  • Only absolute paths are allowed for external files (stopwords, exceptions etc.) in CREATE TABLE/ALTER TABLE.

Deprecations:

  • ram_chunks_count was renamed to ram_chunk_segments_count in SHOW INDEX STATUS.
  • workers is obsolete. There's only one workers mode now.
  • dist_threads is obsolete. All queries are as much parallel as possible now (limited by threads and jobs_queue_size).
  • max_children is obsolete. Use threads to set the number of threads Manticore will use (set to the # of CPU cores by default).
  • queue_max_length is obsolete. Instead of that in case it's really needed use jobs_queue_size to fine-tune internal jobs queue size (unlimited by default).
  • All /json/* endpoints are now available w/o /json/, e.g. /search, /insert, /delete, /pq etc.
  • field meaning "full-text field" was renamed to "text" in describe.

    3.4.2:

    mysql> describe t;
    +-------+--------+----------------+
    | Field | Type   | Properties     |
    +-------+--------+----------------+
    | id    | bigint |                |
    | f     | field  | indexed stored |
    +-------+--------+----------------+

    3.5.0:

    mysql> describe t;
    +-------+--------+----------------+
    | Field | Type   | Properties     |
    +-------+--------+----------------+
    | id    | bigint |                |
    | f     | text   | indexed stored |
    +-------+--------+----------------+
  • Cyrillic и doesn't map to i in the non_cjk charset_table (which is the default) as it affected Russian stemmers and lemmatizers too much.
  • read_timeout. Use network_timeout instead which controls both reading and writing.

Packages

  • Ubuntu Focal 20.04 official package
  • deb package name changed from manticore-bin to manticore

Bugfixes:

  1. #351 searchd memory leak
  2. ceabe44f Tiny read out of bounds in snippets
  3. 1c3e84a3 Dangerous write into local variable for crash queries
  4. 26e094ab Tiny memory leak of sorter in test 226
  5. d2c7f86a Huge memory leak in test 226
  6. 0dd80122 Cluster shows the nodes are in sync, but count(*) shows different numbers
  7. f1c1ac3f Cosmetic: Duplicate and sometimes lost warning messages in the log
  8. f1c1ac3f Cosmetic: (null) index name in log
  9. 359dbd30 Cannot retrieve more than 70M results
  10. 19f328ee Can't insert PQ rules with no-columns syntax
  11. bf685d5d Misleading error message when inserting a document to an index in a cluster
  12. 2cf18c83 /json/replace and json/update return id in exponent form
  13. #324 Update json scalar properties and mva in the same query
  14. d38409eb hitless_words doesn't work in RT mode
  15. 5813d639 ALTER RECONFIGURE in rt mode should be disallowed
  16. 5813d639 rt_mem_limit gets reset to 128M after searchd restart
  17. highlight() sometimes hangs
  18. 7cd878f4 Failed to use U+code in RT mode
  19. 2b213de4 Failed to use wildcard at wordforms at RT mode
  20. e9d07e68 Fixed SHOW CREATE TABLE vs multiple wordform files
  21. fc90a84f JSON query without "query" crashes searchd
  22. Manticore official docker couldn't index from mysql 8
  23. 23e05d32 HTTP /json/insert requires id
  24. bd679af0 SHOW CREATE TABLE doesn't work for PQ
  25. bd679af0 CREATE TABLE LIKE doesn't work properly for PQ
  26. 5eacf28f End of line in settings in show index status
  27. cb153228 Empty title in "highlight" in HTTP JSON response
  28. #318 CREATE TABLE LIKE infix error
  29. 9040d22c RT crashes under load
  30. cd512c7d Lost crash log on crash at RT disk chunk
  31. #323 Import table fails and closes the connection
  32. 6275316a ALTER reconfigure corrupts a PQ index
  33. 9c1d221e Searchd reload issues after change index type
  34. 71e2b5bb Daemon crashes on import table with missed files
  35. #322 Crash on select using multiple indexes, group by and ranker = none
  36. c3f58490 HIGHLIGHT() doesn't highlight in string attributes
  37. #320 FACET fails to sort on string attribute
  38. 4f1a1f25 Error in case of missing data dir
  39. 04f4ddd4 access_* are not supported in RT mode
  40. 1c0616a2 Bad JSON objects in strings: 1. CALL PQ returns "Bad JSON objects in strings: 1" when the json is greater than some value.
  41. 32f943d6 RT-mode inconsistency. In some cases I can't drop the index since it's unknown and can't create it since the directory is not empty.
  42. #319 Crash on select
  43. 22a28dd7 max_xmlpipe2_field = 2M returned warning on 2M field
  44. #342 Query conditions execution bug
  45. dd8dcab2 Simple 2 terms search finds a document containing only one term
  46. 90919e62 It was impossible in PQ to match a json with capital letters in keys
  47. 56da086a Indexer crashes on csv+docstore
  48. #363 using [null] in json attr in centos 7 causes corrupted inserted data
  49. Major #345 Records not being inserted, count() is random, "replace into" returns OK
  50. max_query_time slows down SELECTs too much
  51. #352 Master-agent communication fails on Mac OS
  52. #328 Error when connecting to Manticore with Connector.Net/Mysql 8.0.19
  53. daa760d2 Fixed escaping of \0 and optimized performance
  54. 9bc5c01a Fixed count distinct vs json
  55. 4f89a965 Fixed drop table at other node failed
  56. 952af5a5 Fix crashes on tightly running call pq

Version 3.4.2, 10 April 2020

Critical bugfixes

  • 2ffe2d26 fix RT index from old version fails to index data

Version 3.4.0, 26 March 2020

Major changes

  • server works in 2 modes: rt-mode and plain-mode
    • rt-mode requires data_dir and no index definition in config
    • in plain-mode indexes are defined in config; no data_dir allowed
  • replication available only in rt-mode

Minor changes

  • charset_table defaults to non_cjk alias
  • in rt-mode full-text fields are indexed and stored by default
  • full-text fields in rt-mode renamed from 'field' to 'text'
  • ALTER RTINDEX is renamed to ALTER TABLE
  • TRUNCATE RTINDEX is renamed to TRUNCATE TABLE

Features

  • stored-only fields
  • SHOW CREATE TABLE, IMPORT TABLE

Improvements

  • much faster lockless PQ
  • /sql can execute any type of SQL statement in mode=raw
  • alias mysql for mysql41 protocol
  • default state.sql in data_dir

Bugfixes

  • a5333644 fix crash on wrong field syntax in highlight()
  • 7fbb9f2e fix crash of server on replicate RT index with docstore
  • 24a04687 fix crash on highlight to index with infix or prefix option and to index wo stored fields enabled
  • 3465c1ce fix false error about empty docstore and dock-id lookup for empty index
  • a707722e fix #314 SQL insert command with trailing semicolon
  • 95628c9b removed warning on query word(s) mismatch
  • b8601b41 fix queries in snippets segmented via ICU
  • 5275516c fix find/add race condition in docstore block cache
  • f06ef97a fix mem leak in docstore
  • a7258ba8 fix #316 LAST_INSERT_ID returns empty on INSERT
  • 1ebd5bf8 fix #317 json/update HTTP endpoint to support array for MVA and object for JSON attribute
  • e426950a fix crash of indexer dumping rt without explicit id

Version 3.3.0, 4 February 2020

Features

  • Parallel Real-Time index searching
  • EXPLAIN QUERY command
  • configuration file without index definitions (alpha version)
  • CREATE/DROP TABLE commands (alpha version)
  • indexer --print-rt - can read from a source and print INSERTs for a Real-Time index

Improvements

  • Updated to Snowball 2.0 stemmers
  • LIKE filter for SHOW INDEX STATUS
  • improved memory usage for high max_matches
  • SHOW INDEX STATUS adds ram_chunks_count for RT indexes
  • lockless PQ
  • changed LimitNOFILE to 65536

Bugfixes

  • 9c33aab8 added check of index schema for duplicate attributes #293
  • a0085f94 fix crash in hitless terms
  • 68953740 fix loose docstore after ATTACH
  • d6f696ed fix docstore issue in distributed setup
  • bce2b7ec replace FixedHash with OpenHash in sorter
  • e0baf739 fix attributes with duplicated names at index definition
  • ca81114b fix html_strip in HIGHLIGHT()
  • 493a5e91 fix passage macro in HIGHLIGHT()
  • a82d41c7 fix double buffer issues when RT index creates small or large disk chunk
  • a404c85d fix event deletion for kqueue
  • 8bea0f6f fix save of disk chunk for large value of rt_mem_limit of RT index
  • 8707f039 fix float overflow on indexing
  • a56434ce fix insert document with negative ID into RT index fails with error now
  • bbebfd75 fix crash of server on ranker fieldmask
  • 3809cc1b fix crash on using query cache
  • dc2a585b fix crash on using RT index RAM segments with parallel inserts

Version 3.2.2, 19 December 2019

Features

  • Autoincrement ID for RT indexes
  • Highlight support for docstore via new HIGHLIGHT() function, available also in HTTP API
  • SNIPPET() can use the special function QUERY() which returns the current MATCH query
  • new field_separator option for highlighting functions.

Improvements and changes

  • lazy fetch of stored fields for remote nodes (can significantly increase performance)
  • strings and expressions no longer break multi-query and FACET optimizations
  • RHEL/CentOS 8 build now uses mysql libclient from mariadb-connector-c-devel
  • ICU data file is now shipped with the packages, icu_data_dir removed
  • systemd service files include 'Restart=on-failure' policy
  • indextool can now check real-time indexes online
  • default conf is now /etc/manticoresearch/manticore.conf
  • service on RHEL/CentOS renamed to 'manticore' from 'searchd'
  • removed query_mode and exact_phrase snippet's options

Bugfixes

  • 6ae474c7 fix crash on SELECT query over HTTP interface
  • 59577513 fix RT index saves disk chunks but does not mark some documents deleted
  • e861f0fc fix crash on search of multi index or multi queries with dist_threads
  • 440991fc fix crash on infix generation for long terms with wide utf8 codepoints
  • 5fd599b4 fix race at adding socket to IOCP
  • cf10d7d3 fix issue of bool queries vs json select list
  • 996de77f fix indextool check to report wrong skiplist offset, check of doc2row lookup
  • 6e3fc9e8 fix indexer produces bad index with negative skiplist offset on large data
  • faed3220 fix JSON converts only numeric to string and JSON string to numeric conversion at expressions
  • 53319720 fix indextool exit with error code in case multiple commands set at command line
  • 795520ac fix #275 binlog invalid state on error no space left on disk
  • 2284da5e fix #279 crash on IN filter to JSON attribute
  • ce2e4b47 fix #281 wrong pipe closing call
  • 535589ba fix server hung at CALL PQ with recursive JSON attribute encoded as string
  • a5fc8a36 fix advancing beyond the end of the doclist in multiand node
  • a3628617 fix retrieving of thread public info
  • f8d2d7bb fix docstore cache locks

Version 3.2.0, 17 October 2019

Features

  • Document storage
  • new directives stored_fields, docstore_cache_size, docstore_block_size, docstore_compression, docstore_compression_level

Improvements and changes

  • improved SSL support
  • non_cjk built-in charset updated
  • disabled UPDATE/DELETE statements logging a SELECT in query log
  • RHEL/CentOS 8 packages

Bugfixes

  • 301a806b1 fix crash on replace document in disk chunk of RT index
  • 46c1cad8f fix #269 LIMIT N OFFSET M
  • 92a46edaa fix DELETE statements with id explicitly set or id list provided to skip search
  • 8ca78c138 fix wrong index after event removed at netloop at windowspoll poller
  • 603631e2b fix float roundup at JSON via HTTP
  • 62f64cb9e fix remote snippets to check empty path first; fixing windows tests
  • aba274c2c fix reload of config to work on windows same way as on linux
  • 6b8c4242e fix #194 PQ to work with morphology and stemmers
  • 174d31290 fix RT retired segments management

Version 3.1.2, 22 August 2019

Features and Improvements

  • Experimental SSL support for HTTP API
  • field filter for CALL KEYWORDS
  • max_matches for /json/search
  • automatic sizing of default Galera gcache.size
  • improved FreeBSD support

Bugfixes

  • 0a1a2c81 fixed replication of RT index into node where same RT index exists and has different path
  • 4adc0752 fix flush rescheduling for indexes without activity
  • d6c00a6f improve rescheduling of flushing RT/PQ indexes
  • d0a7c959 fix #250 index_field_lengths index option for TSV and CSV piped sources
  • 1266d548 fix indextool wrong report for block index check on empty index
  • 553ca73c fix empty select list at Manticore SQL query log
  • 56c85844 fix indexer -h/--help response

Version 3.1.0, 16 July 2019

Features and Improvements

  • replication for RealTime indexes
  • ICU tokenizer for chinese
  • new morphology option icu_chinese
  • new directive icu_data_dir
  • multiple statements transactions for replication
  • LAST_INSERT_ID() and @session.last_insert_id
  • LIKE 'pattern' for SHOW VARIABLES
  • Multiple documents INSERT for percolate indexes
  • Added time parsers for config
  • internal task manager
  • mlock for doc and hit lists components
  • jail snippets path

Removals

  • RLP library support dropped in favor of ICU; all rlp* directives removed
  • updating document ID with UPDATE is disabled

Bugfixes

  • f0472223 fix defects in concat and group_concat
  • b08147ee fix query uid at percolate index to be BIGINT attribute type
  • 4cd85afa do not crash if failed to prealloc a new disk chunk
  • 1a551227 add missing timestamp data type to ALTER
  • f3a8e096 fix crash of wrong mmap read
  • 44757711 fix hash of clusters lock in replication
  • ff476df9 fix leak of providers in replication
  • 58dcbb77 fix #246 undefined sigmask in indexer
  • 3dd8278e fix race in netloop reporting
  • a02aae05 zero gap for HA strategies rebalancer

Version 3.0.2, 31 May 2019

Improvements

  • added mmap readers for docs and hit lists
  • /sql HTTP endpoint response is now the same as /json/search response
  • new directives access_plain_attrs, access_blob_attrs, access_doclists, access_hitlists
  • new directive server_id for replication setups

Removals

  • removed HTTP /search endpoint

Deprecations

  • ondisk_attrs, ondisk_attrs_default, mlock (replaced by access_* directives)

Bugfixes

  • 849c16e1 allow attribute names starting with numbers in select list
  • 48e6c302 fixed MVAs in UDFs, fixed MVA aliasing
  • 055586a9 fixed #187 crash when using query with SENTENCE
  • 93bf52f2 fixed #143 support () around MATCH()
  • 599ee79c fixed save of cluster state on ALTER cluster statement
  • 230c321e fixed crash of server on ALTER index with blob attributes
  • 5802b85a fixed #196 filtering by id
  • 25d2dabd discard searching on template indexes
  • 2a30d5b4 fixed id column to have regular bigint type at SQL reply

Version 3.0.0, 6 May 2019

Features and improvements

  • New index storage. Non-scalar attributes are no longer limited to 4GB per index
  • attr_update_reserve directive
  • String,JSON and MVAs can be updated using UPDATE
  • killlists are applied at index load time
  • killlist_target directive
  • multi AND searches speedup
  • better average performance and RAM usage
  • convert tool for upgrading indexes made with 2.x
  • CONCAT() function
  • JOIN CLUSTER cluster AT 'nodeaddress:port'
  • ALTER CLUSTER posts UPDATE nodes
  • node_address directive
  • list of nodes printed in SHOW STATUS

Behaviour changes

  • in case of indexes with kill-lists, the server doesn't rotate indexes in the order defined in the config, but follows the chain of kill-list targets
  • order of indexes in a search no longer defines the order in which killlists are applied
  • Document IDs are now signed big integers

Removed directives

  • docinfo (always extern now), inplace_docinfo_gap, mva_updates_pool

Version 2.8.2 GA, 2 April 2019

Features and improvements

  • Galera replication for percolate indexes
  • OPTION morphology

Compiling notes

The minimum CMake version is now 3.13. Compiling requires boost and libssl development libraries.

Bugfixes

  • 6967fedb fixed crash on many stars at select list for query into many distributed indexes
  • 36df1a40 fixed #177 large packet via Manticore SQL interface
  • 57932aec fixed #170 crash of server on RT optimize with MVA updated
  • edb24b87 fixed server crash on binlog removed due to RT index remove after config reload on SIGHUP
  • bd3e66e0 fixed mysql handshake auth plugin payloads
  • 6a217f6e fixed #172 phrase_boundary settings at RT index
  • 3562f652 fixed #168 deadlock at ATTACH index to itself
  • 250b3f0e fixed binlog saves empty meta after server crash
  • 4aa6c69a fixed crash of server due to string at sorter from RT index with disk chunks

Version 2.8.1 GA, 6 March 2019

Features and improvements

  • SUBSTRING_INDEX()
  • SENTENCE and PARAGRAPH support for percolate queries
  • systemd generator for Debian/Ubuntu; also added LimitCORE to allow core dumping

Bugfixes

  • 84fe7405 fixed crash of server on match mode all and empty full text query
  • daa88b57 fixed crash on deleting of static string
  • 22078537 fixed exit code when indextool failed with FATAL
  • 0721696d fixed #109 no matches for prefixes due to wrong exact form check
  • 8af81011 fixed #161 reload of config settings for RT indexes
  • e2d59277 fixed crash of server on access of large JSON string
  • 75cd1342 fixed PQ field at JSON document altered by index stripper causes wrong match from sibling field
  • e2f77543 fixed crash of server at parse JSON on RHEL7 builds
  • 3a25a580 fixed crash of json unescaping when slash is on the edge
  • be9f4978 fixed option 'skip_empty' to skip empty docs and not warn they're not valid json
  • 266e0e7b fixed #140 output 8 digits on floats when 6 is not enough to be precise
  • 3f6d2389 fixed empty jsonobj creation
  • f3c7848a fixed #160 empty mva outputs NULL instead of an empty string
  • 0afa2ed0 fixed fail to build without pthread_getname_np
  • 9405fccd fixed crash on server shutdown with thread_pool workers

Version 2.8.0 GA, 28 January 2019

Improvements

  • Distributed indexes for percolate indexes
  • CALL PQ new options and changes:
    • skip_bad_json
    • mode (sparsed/sharded)
    • json documents can be passed as a json array
    • shift
    • Column names 'UID', 'Documents', 'Query', 'Tags', 'Filters' were renamed to 'id', 'documents', 'query', 'tags', 'filters'
  • DESCRIBE pq TABLE
  • SELECT FROM pq WHERE UID is not possible any more, use 'id' instead
  • SELECT over pq indexes is on par with regular indexes (e.g. you can filter rules via REGEX())
  • ANY/ALL can be used on PQ tags
  • expressions have auto-conversion for JSON fields, not requiring explicit casting
  • built-in 'non_cjk' charset_table and 'cjk' ngram_chars
  • built-in stopwords collections for 50 languages
  • multiple files in a stopwords declaration can also be separated by comma
  • CALL PQ can accept JSON array of documents

Bugfixes

  • a4e19af fixed csjon-related leak
  • 28d8627 fixed crash because of missed value in json
  • bf4e9ea fixed save of empty meta for RT index
  • 33b4573 fixed lost form flag (exact) for sequence of lemmatizer
  • 6b95d48 fixed string attrs > 4M use saturate instead of overflow
  • 621418b fixed crash of server on SIGHUP with disabled index
  • 3f7e35d fixed server crash on simultaneous API session status commands
  • cd9e4f1 fixed crash of server at delete query to RT index with field filters
  • 9376470 fixed crash of server at CALL PQ to distributed index with empty document
  • 8868b20 fixed cut Manticore SQL error message larger 512 chars
  • de9deda fixed crash on save percolate index without binlog
  • 2b219e1 fixed http interface is not working in OSX
  • e92c602 fixed indextool false error message on check of MVA
  • 238bdea fixed write lock at FLUSH RTINDEX to not write lock whole index during save and on regular flush from rt_flush_period
  • c26a236 fixed ALTER percolate index stuck waiting search load
  • 9ee5703 fixed max_children to use default amount of thread_pool workers for value of 0
  • 5138fc0 fixed error on indexing of data into index with index_token_filter plugin along with stopwords and stopword_step=0
  • 2add3d3 fixed crash with absent lemmatizer_base when still using aot lemmatizers in index definitions

Version 2.7.5 GA, 4 December 2018

Improvements

  • REGEX function
  • limit/offset for json API search
  • profiler points for qcache

Bugfixes

  • eb3c768 fixed crash of server on FACET with multiple attribute wide types
  • d915cf6 fixed implicit group by at main select list of FACET query
  • 5c25dc2 fixed crash on query with GROUP N BY
  • 85d30a2 fixed deadlock on handling crash at memory operations
  • 85166b5 fixed indextool memory consumption during check
  • 58fb031 fixed gmock include not needed anymore as upstream resolve itself

Version 2.7.4 GA, 1 November 2018

Improvements

  • SHOW THREADS in case of remote distributed indexes prints the original query instead of API call
  • SHOW THREADS new option format=sphinxql prints all queries in SQL format
  • SHOW PROFILE prints additional clone_attrs stage

Bugfixes

  • 4f15571 fixed failed to build with libc without malloc_stats, malloc_trim
  • f974f20 fixed special symbols inside words for CALL KEYWORDS result set
  • 0920832 fixed broken CALL KEYWORDS to distributed index via API or to remote agent
  • fd686bf fixed distributed index agent_query_timeout propagate to agents as max_query_time
  • 4ffa623 fixed total documents counter at disk chunk getting affected by OPTIMIZE command, which broke weight calculation
  • dcaf4e0 fixed multiple tail hits at RT index from blended
  • eee3817 fixed deadlock at rotation

Version 2.7.3 GA, 26 September 2018

Improvements

  • sort_mode option for CALL KEYWORDS
  • DEBUG on VIP connection can perform 'crash ' for intentional SIGSEGV action on server
  • DEBUG can perform 'malloc_stats' for dumping malloc stats into searchd.log and 'malloc_trim' to perform a malloc_trim()
  • improved backtrace if gdb is present on the system

Bugfixes

  • 0f3cc33 fixed crash or failure of rename on Windows
  • 1455ba2 fixed crashes of server on 32-bit systems
  • ad3710d fixed crash or hung of server on empty SNIPPET expression
  • b36d792 fixed broken non progressive optimize and fixed progressive optimize to not create kill-list for oldest disk chunk
  • 34b0324 fixed queue_max_length bad reply for SQL and API at thread pool worker mode
  • ae4b320 fixed crash on adding full-scan query to PQ index with regexp or rlp options set
  • f80f8d5 fixed crash when call one PQ after another
  • 9742f5f refactor AcquireAccum
  • 39e5bc3 fixed leak of memory after call pq
  • 21bcc6d cosmetic refactor (c++11 style c-trs, defaults, nullptrs)
  • 2d69039 fixed memory leak on trying to insert duplicate into PQ index
  • 5ed92c4 fixed crash on JSON field IN with large values
  • 4a5262e fixed crash of server on CALL KEYWORDS statement to RT index with expansion limit set
  • 552646b fixed invalid filter at PQ matches query;
  • 204f521 introduce small obj allocator for ptr attrs
  • 25453e5 refactor ISphFieldFilter to refcounted flavour
  • 1366ee0 fixed ub/sigsegv when using strtod on non-terminated strings
  • 94bc6fc fixed memory leak in json resultset processing
  • e78e9c9 fixed read over the end of mem block applying attribute add
  • fad572f fixed refactor CSphDict for refcount flavour
  • fd841a4 fixed leak of AOT internal type outside
  • 5ee7f20 fixed memory leak tokenizer management
  • 116c5f1 fixed memory leak in grouper
  • 56fdbc9 special free/copy for dynamic ptrs in matches (memory leak grouper)
  • b1fc161 fixed memory leak of dynamic strings for RT
  • 517b9e8 refactor grouper
  • b1fc161 minor refactor (c++11 c-trs, some reformats)
  • 7034e07 refactor ISphMatchComparator to refcounted flavour
  • b1fc161 privatize cloner
  • efbc051 simplify native little-endian for MVA_UPSIZE, DOCINFO2ID_T, DOCINFOSETID
  • 6da0df4 add valgrind support to ubertests
  • 1d17669 fixed crash because race of 'success' flag on connection
  • 5a09c32 switch epoll to edge-triggered flavour
  • 5d52868 fixed IN statement in expression with formatting like at filter
  • bd8b3c9 fixed crash at RT index on commit of document with large docid
  • ce656b8 fixed argless options in indextool
  • 08c9507 fixed memory leak of expanded keyword
  • 30c75a2 fixed memory leak of json grouper
  • 6023f26 fixed leak of global user vars
  • 7c138f1 fixed leakage of dynamic strings on early rejected matches
  • 9154b18 fixed leakage on length()
  • 43fca3a fixed memory leak because strdup() in parser
  • 71ff777 refactored expression parser to accurately follow refcounts

Version 2.7.2 GA, 27 August 2018

Improvements

  • compatibility with MySQL 8 clients
  • TRUNCATE WITH RECONFIGURE
  • retired memory counter on SHOW STATUS for RT indexes
  • global cache of multi agents
  • improved IOCP on Windows
  • VIP connections for HTTP protocol
  • Manticore SQL DEBUG command which can run various subcommands
  • shutdown_token - SHA1 hash of password needed to invoke shutdown using DEBUG command
  • new stats to SHOW AGENT STATUS (_ping, _has_perspool, _need_resolve)
  • --verbose option of indexer now accepts [debugvv] for printing debug messages

Bugfixes

  • 390082 removed wlock at optimize
  • 4c3376 fixed wlock at reload index settings
  • b5ea8d fixed memory leak on query with JSON filter
  • 930e83 fixed empty documents at PQ result set
  • 53deec fixed confusion of tasks due to removed one
  • cad9b9 fixed wrong remote host counting
  • 90008c fixed memory leak of parsed agent descriptors
  • 978d83 fixed leak in search
  • 019394 cosmetic changes on explicit/inline c-trs, override/final usage
  • 943e29 fixed leak of json in local/remote schema
  • 02dbdd fixed leak of json sorting col expr in local/remote schema
  • c74d0b fixed leak of const alias
  • 6e5b57 fixed leak of preread thread
  • 39c740 fixed stuck on exit because of a stuck wait in netloop
  • adaf97 fixed stuck 'ping' behaviour on change of HA agent to usual host
  • 32c40e separate gc for dashboard storage
  • 511a3c fixed ref-counted ptr fix
  • 32c40e fixed indextool crash on non-existent index
  • 156edc fixed output name of exceeding attr/field in xmlpipe indexing
  • cdac6d fixed default indexer's value if no indexer section in config
  • e61ec0 fixed wrong embedded stopwords in disk chunk by RT index after server restart
  • 5fba49 fixed skip phantom (already closed, but not finally deleted from the poller) connections
  • f22ae3 fixed blended (orphaned) network tasks
  • 46890e fixed crash on read action after write
  • 03f9df fixed searchd crashes when running tests on windows
  • e9255e fixed handle EINPROGRESS code on usual connect()
  • 248b72 fixed connection timeouts when working with TFO

Version 2.7.1 GA, 4 July 2018

Improvements

  • improved wildcards performance on matching multiple documents at PQ
  • support for fullscan queries at PQ
  • support for MVA attributes at PQ
  • regexp and RLP support for percolate indexes

Bugfixes

  • 688562 fixed loss of query string
  • 0f1770 fixed empty info at SHOW THREADS statement
  • 53faa3 fixed crash on matching with NOTNEAR operator
  • 26029a fixed error message on bad filter to PQ delete

Version 2.7.0 GA, 11 June 2018

Improvements

  • reduced number of syscalls to avoid Meltdown and Spectre patches impact
  • internal rewrite of local index management
  • remote snippets refactor
  • full configuration reload
  • all node connections are now independent
  • proto improvements
  • Windows communication switched from wsapoll to IO completion ports
  • TFO can be used for communication between master and nodes
  • SHOW STATUS now outputs server version and mysql_version_string
  • added docs_id option for documents called in CALL PQ.
  • percolate queries filter can now contain expressions
  • distributed indexes can work with FEDERATED
  • dummy SHOW NAMES COLLATE and SET wait_timeout (for better ProxySQL compatibility)

Bugfixes

  • 5bcff0 fixed added not equal to tags of PQ
  • 9ebc58 fixed added document id field to JSON document CALL PQ statement
  • 8ae0e5 fixed flush statement handlers to PQ index
  • c24b15 fixed PQ filtering on JSON and string attributes
  • 1b8bdd fixed parsing of empty JSON string
  • 1ad8a0 fixed crash at multi-query with OR filters
  • 69b898 fixed indextool to use config common section (lemmatizer_base option) for commands (dumpheader)
  • 6dbeaf fixed empty string at result set and filter
  • 39c4eb fixed negative document id values
  • 266b70 fixed word clip length for very long words indexed
  • 47823b fixed matching multiple documents of wildcard queries at PQ

Version 2.6.4 GA, 3 May 2018

Features and improvements

  • MySQL FEDERATED engine support
  • MySQL packets now return the SERVER_STATUS_AUTOCOMMIT flag, which adds compatibility with ProxySQL
  • listen_tfo - enable TCP Fast Open connections for all listeners
  • indexer --dumpheader can also dump RT header from a .meta file
  • cmake build script for Ubuntu Bionic

Bugfixes

  • 355b116 fixed invalid query cache entries for RT index;
  • 546e229 fixed index settings getting lost after seamless rotation
  • 0c45098 fixed infix vs prefix length set; added warning on unsupported infix length
  • 80542fa fixed RT indexes auto-flush order
  • 705d8c5 fixed result set schema issues for index with multiple attributes and queries to multiple indexes
  • b0ba932 fixed some hits got lost at batch insert with document duplicates
  • 4510fa4 fixed optimize failed to merge disk chunks of RT index with large documents count

Version 2.6.3 GA, 28 March 2018

Improvements

  • jemalloc at compilation. If jemalloc is present on system, it can be enabled with cmake flag -DUSE_JEMALLOC=1

Bugfixes

  • 85a6d7e fixed logging of expand_keywords option into Manticore SQL query log
  • caaa384 fixed HTTP interface to correctly process query with large size
  • e386d84 fixed crash of server on DELETE to RT index with index_field_lengths enable
  • cd538f3 fixed cpustats searchd cli option to work with unsupported systems
  • 8740fd6 fixed utf8 substring matching with min lengths defined

Version 2.6.2 GA, 23 February 2018

Improvements

  • improved Percolate Queries performance in case of using NOT operator and for batched documents.
  • percolate_query_call can use multiple threads depending on dist_threads
  • new full-text matching operator NOTNEAR/N
  • LIMIT for SELECT on percolate indexes
  • expand_keywords can accept 'star','exact' (where 'star,exact' has the same effect as '1')
  • ranged-main-query for joined fields which uses the ranged query defined by sql_query_range

Bugfixes

  • 72dcf66 fixed crash on searching ram segments; deadlock on save disk chunk with double buffer; deadlock on save disk chunk during optimize
  • 3613714 fixed indexer crash on xml embedded schema with empty attribute name
  • 48d7e80 fixed erroneous unlinking of not-owned pid-file
  • a5563a4 fixed orphaned fifos sometimes left in temp folder
  • 2376e8f fixed empty FACET result set with wrong NULL row
  • 4842b67 fixed broken index lock when running server as windows service
  • be35fee fixed wrong iconv libs on mac os
  • 83744a9 fixed wrong count(*)

Version 2.6.1 GA, 26 January 2018

Improvements

  • agent_retry_count in case of agents with mirrors gives the value of retries per mirror instead of per agent, the total retries per agent being agent_retry_count*mirrors.
  • agent_retry_count can now be specified per index, overriding global value. An alias mirror_retry_count is added.
  • a retry_count can be specified in agent definition and the value represents retries per agent
  • Percolate Queries are now in HTTP JSON API at /json/pq.
  • Added -h and -v options (help and version) to executables
  • morphology_skip_fields support for Real-Time indexes

Bugfixes

  • a40b079 fixed ranged-main-query to correctly work with sql_range_step when used at MVA field
  • f2f5375 fixed issue with blackhole system loop hanging and blackhole agents seeming disconnected
  • 84e1f54 fixed query id to be consistent, fixed duplicated id for stored queries
  • 1948423 fixed server crash on shutdown from various states
  • 9a706b, 3495fd7 fixed timeouts on long queries
  • 3359bcd8 refactored master-agent network polling on kqueue-based systems (Mac OS X, BSD).

Version 2.6.0, 29 December 2017

Features and improvements

Bugfixes

  • 0cfae4c fixed crash on debug build of server (and possibly UB on release) when built with rlp
  • 324291e fixed RT index optimize with progressive option enabled that merges kill-lists with wrong order
  • ac0efee minor crash on mac
  • lots of minor fixes after thorough static code analysis
  • other minor bugfixes

Upgrade

In this release we've changed the internal protocol used by masters and agents to speak with each other. If you run Manticore Search in a distributed environment with multiple instances, make sure you first upgrade the agents, then the masters.

Version 2.5.1, 23 November 2017

Features and improvements

  • JSON queries over the HTTP API protocol. Search, insert, update, delete and replace operations are supported. Data manipulation commands can also be bulked. There are some limitations currently: MVA and JSON attributes can't be used for inserts, replaces or updates.
  • RELOAD INDEXES command
  • FLUSH LOGS command
  • SHOW THREADS can show progress of optimize, rotation or flushes.
  • GROUP N BY works correctly with MVA attributes
  • blackhole agents are run on a separate thread to not affect the master query anymore
  • implemented reference count on indexes, to avoid stalls caused by rotations and high load
  • SHA1 hashing implemented, not exposed yet externally
  • fixes for compiling on FreeBSD, macOS and Alpine

Bugfixes

  • 989752b filter regression with block index
  • b1c3864 rename PAGE_SIZE -> ARENA_PAGE_SIZE for compatibility with musl
  • f2133cc disable googletests for cmake < 3.1.0
  • f30ec53 failed to bind socket on server restart
  • 0807240 fixed crash of server on shutdown
  • 3e3acc3 fixed show threads for system blackhole thread
  • 262c3fe Refactored config check of iconv, fixes building on FreeBSD and Darwin

Version 2.4.1 GA, 16 October 2017

Features and improvements

  • OR operator in WHERE clause between attribute filters
  • Maintenance mode ( SET MAINTENANCE=1)
  • CALL KEYWORDS available on distributed indexes
  • Grouping in UTC
  • query_log_mode for custom log files permissions
  • Field weights can be zero or negative
  • max_query_time can now affect full-scans
  • added net_wait_tm, net_throttle_accept and net_throttle_action for network thread fine tuning (in case of workers=thread_pool)
  • COUNT DISTINCT works with facet searches
  • IN can be used with JSON float arrays
  • multi-query optimization is not broken anymore by integer/float expressions
  • SHOW META shows a multiplier row when multi-query optimization is used

Compiling

Manticore Search is built using cmake and the minimum gcc version required for compiling is 4.7.2.
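For reference, a typical out-of-source cmake build looks like this (a sketch; flags and paths vary by platform):

mkdir build && cd build
cmake ..
make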

Folders and service

  • Manticore Search runs under manticore user.
  • Default data folder is now /var/lib/manticore/.
  • Default log folder is now /var/log/manticore/.
  • Default pid folder is now /var/run/manticore/.

Bugfixes

  • a58c619 fixed SHOW COLLATION statement that breaks java connector
  • 631cf4e fixed crashes on processing distributed indexes; added locks to distributed index hash; removed move and copy operators from agent
  • 942bec0 fixed crashes on processing distributed indexes due to parallel reconnects
  • e5c1ed2 fixed crash at crash handler on store query to server log
  • 4a4bda5 fixed a crash with pooled attributes in multiqueries
  • 3873bfb reduced core size by preventing index pages from being included in the core file
  • 11e6254 fixed searchd crashes on startup when invalid agents are specified
  • 4ca6350 fixed indexer reports error in sql_query_killlist query
  • 123a9f0 fixed fold_lemmas=1 vs hit count
  • cb99164 fixed inconsistent behavior of html_strip
  • e406761 fixed optimize RT index losing new settings; fixed optimize with sync option lock leaks
  • 86aeb82 fixed processing erroneous multiqueries
  • 2645230 fixed result set depends on multi-query order
  • 72395d9 fixed server crash on multi-query with bad query
  • f353326 fixed shared to exclusive lock
  • 3754785 fixed server crash for query without indexes
  • 29f360e fixed dead lock of server

Version 2.3.3, 06 July 2017

  • Manticore branding

Reporting bugs

Unfortunately, Manticore is not yet 100% bug free (even though we're working hard towards that), so you might occasionally run into some issues.

Reporting as much as possible about each bug is very important: to fix it, we need to be able either to reproduce the bug, or to deduce what's causing it from the information you provide. So here are some instructions on how to do that.

Bug-tracker

We track bugs and feature requests on GitHub. Feel free to create a new ticket and describe your bug in detail so both you and the developers can save time.

Crashes

Manticore is written in C++, a low-level programming language that talks to the computer with few intermediate layers, which is what makes it fast. The drawback is that in rare cases a bug cannot be handled elegantly by writing an error to the log and skipping the command that caused the problem; instead, the program may just crash, meaning it stops completely and has to be restarted.

In case of crashes we can sometimes get enough information to fix the issue from the backtrace which Manticore tries to write to the log file. It might look like this:

./indexer(_Z12sphBacktraceib+0x2d6)[0x5d337e]
./indexer(_Z7sigsegvi+0xbc)[0x4ce26a]
/lib64/libpthread.so.0[0x3f75a0dd40]
/lib64/libc.so.6(fwrite+0x34)[0x3f74e5f564]
./indexer(_ZN27CSphCharsetDefinitionParser5ParseEPKcR10CSphVectorI14CSphRemapRange16CSphVectorPolicyIS3_EE+0x5b)[0x51701b]
./indexer(_ZN13ISphTokenizer14SetCaseFoldingEPKcR10CSphString+0x62)[0x517e4c]
./indexer(_ZN17CSphTokenizerBase14SetCaseFoldingEPKcR10CSphString+0xbd)[0x518283]
./indexer(_ZN18CSphTokenizer_SBCSILb0EEC1Ev+0x3f)[0x5b312b]
./indexer(_Z22sphCreateSBCSTokenizerv+0x20)[0x51835c]
./indexer(_ZN13ISphTokenizer6CreateERK21CSphTokenizerSettingsPK17CSphEmbeddedFilesR10CSphString+0x47)[0x5183d7]
./indexer(_Z7DoIndexRK17CSphConfigSectionPKcRK17SmallStringHash_TIS_EbP8_IO_FILE+0x494)[0x4d31c8]
./indexer(main+0x1a17)[0x4d6719]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f74e1d8a4]
./indexer(__gxx_personality_v0+0x231)[0x4cd779]

This was an example of a good backtrace - we can see mangled function names here.

But sometimes a backtrace may look like this:

/opt/piler/bin/indexer[0x4c4919]
/opt/piler/bin/indexer[0x405cf0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fc659cb6cb0]
/opt/piler/bin/indexer[0x4237fd]
/opt/piler/bin/indexer[0x491de6]
/opt/piler/bin/indexer[0x451704]
/opt/piler/bin/indexer[0x40861a]
/opt/piler/bin/indexer[0x40442c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fc6588aa76d]
/opt/piler/bin/indexer[0x405b89]

Developers might not get anything useful from these cryptic numbers since they don't show function names. To help with that, you need to provide symbols (function and variable names). If you've installed Manticore by building from sources, run the following command over your binary:

nm -n indexer > indexer.sym

Attach this file to the bug report along with the backtrace. Note that the binary must not be stripped: our official binary packages should be fine, but if you built Manticore manually from sources, do not run the strip utility on the binary and do not let your build/packaging system do that, otherwise the symbols will be lost completely.
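One quick way to verify the binary still contains symbols is the standard file utility ('not stripped' at the end is what you want to see; output abbreviated):

file ./indexer
./indexer: ELF 64-bit LSB executable, x86-64, dynamically linked, not stripped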

Core dumps

A core dump is a file containing a process's address space (memory) when the process terminates unexpectedly.

Sometimes the backtrace doesn't provide enough information about the cause of a crash or the crash cannot be easily reproduced and core files are required for troubleshooting.

For searchd (Manticore search server) to record a core dump in case of a crash, the following needs to be ensured:

  • core dumping needs to be enabled on the running operating system. Some operating systems do not have core dumping enabled by default
  • searchd needs to be started with --coredump option

Please note that searchd core files can use a lot of space, as they include data from the loaded indexes, and each crash creates a new core file. Free space should be monitored while searchd runs with the --coredump option enabled to avoid 100% disk usage.
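For example, on a typical Linux system both prerequisites might be covered like this (a sketch; the core pattern and config paths are illustrative, not Manticore defaults):

ulimit -c unlimited
echo '/var/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern
searchd --config /etc/manticoresearch/manticore.conf --coredump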

Hanging

In case Manticore is hanging for some reason and

  1. the instance is under watchdog (which is on by default)
  2. gdb is installed

Then:

  • either connect to the instance via mysql (vip or regular port) and issue debug procdump
  • or manually send USR1 signal to watchdog of the hanging instance (not to the instance process itself)
  • or manually run gdb attach <PID_of_hanged> and then these commands one by one:
    1. info threads
    2. thread apply all bt
    3. bt
    4. info locals
    5. detach

In the first two cases the trace will be in the server's log. In the last (manual gdb) case it has to be copied from the console output. Attach these traces to the bug report; they will be very helpful for the investigation.
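If you prefer a non-interactive capture, a gdb one-liner like this (the PID is hypothetical) writes the same trace straight to a file:

gdb --batch -p 1234 -ex "info threads" -ex "thread apply all bt" > manticore.trace.txt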

Uploading your data

To fix your bug, developers often need to reproduce it locally. To do that they need your configuration file, index files, binlog (if present), sometimes data to index (like data from external storages or XML/CSV files) and queries.

Attach your data when you create a ticket on GitHub. In case it's too big or the data is sensitive, feel free to upload it to our write-only FTP server:

  • ftp: dev.manticoresearch.com
  • user: manticorebugs
  • pass: shithappens
  • directory: create a directory github-issue-N so we understand which data relates to which issue on GitHub.
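A session with a standard command-line ftp client might look like this (the issue number and file name are just examples):

ftp dev.manticoresearch.com
Name: manticorebugs
Password: shithappens
ftp> mkdir github-issue-123
ftp> cd github-issue-123
ftp> put data.tar.gz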

DEBUG

DEBUG [ subcommand ]

The DEBUG statement is designed to call different internal or VIP commands for dev/testing purposes. It is not intended for production automation, since the syntax of the subcommand part may be freely changed in any build.

Call DEBUG without params to show the list of useful commands (in general) and subcommands (of the DEBUG statement) available at the moment.

Since the available subcommands depend on the context, invoking DEBUG without params also shows which ones you can use in your particular case:

mysql> debug;
+----------------+---------------------+
| command        | meaning             |
+----------------+---------------------+
| flush logs     | emulate USR1 signal |
| reload indexes | emulate HUP signal  |
+----------------+---------------------+
2 rows in set (0,00 sec)

(these commands are documented elsewhere; such short help is just a reminder about them).

If you connect via a 'VIP' connection (see listen for details), the output might be a bit different:

mysql> debug;
+---------------------------+------------------------------+
| command                   | meaning                      |
+---------------------------+------------------------------+
| debug shutdown <password> | emulate TERM signal          |
| debug token <password>    | calculate token for password |
| flush logs                | emulate USR1 signal          |
| reload indexes            | emulate HUP signal           |
+---------------------------+------------------------------+
4 rows in set (0,00 sec)

Here you can see additional commands available only in the current context (namely, if you connected on a VIP port). The two additional subcommands available right now are token and shutdown. The first one just calculates a SHA1 hash of the provided password (which, in turn, may be empty, or a word, or a number/phrase enclosed in '-quotes), like:

mysql> debug token hello;
+-------------+------------------------------------------+
| command     | result                                   |
+-------------+------------------------------------------+
| debug token | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
+-------------+------------------------------------------+
1 row in set (0,00 sec)

Another debug subcommand, shutdown, will send a TERM signal to the server and so make it shut down. Since this is quite dangerous (nobody wants to accidentally stop a production service), it:

  1. needs a VIP connection, and
  2. needs the password

For the chosen password you need to generate a token with the debug token subcommand and put it into the shutdown_token param of the searchd section of the config file. If no such section exists, or if the hash of the provided password does not match the token stored in the config, the subcommand will do nothing. Otherwise it will cause a 'clean' shutdown of the server.
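Putting it together for the password 'hello' from the token example above (a sketch; your searchd section will normally contain other settings too):

searchd {
    shutdown_token = aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
}

mysql> debug shutdown hello;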

References

SQL commands

Schema management
Data management
  • INSERT - Adds new documents
  • REPLACE - Replaces existing documents with new ones
  • UPDATE - Does in-place update in documents
  • DELETE - Deletes documents
  • TRUNCATE TABLE - Deletes all documents from index
SELECT
Flushing misc things
  • FLUSH ATTRIBUTES - Forces flushing updated attributes to disk
  • FLUSH HOSTNAMES - Renews IPs associated with agent host names
  • FLUSH LOGS - Initiates reopen of searchd log and query log files (similar to USR1)
Real-time index optimization
Importing to a real-time index
  • ATTACH INDEX - Moves data from a plain index to a real-time index
  • IMPORT TABLE - Imports previously created RT index into a server running in RT mode
Replication
Plain index rotate
Transactions
  • BEGIN - Begins a transaction
  • COMMIT - Finishes a transaction
  • ROLLBACK - Rolls back a transaction
CALL
Plugins
Server status

HTTP endpoints

  • /sql - Allows running an SQL statement over HTTP
  • /insert - Inserts a document into a real-time index
  • /pq/idx/doc - Inserts a PQ rule into a percolate index
  • /update - Updates a document in a real-time index
  • /replace - Replaces a document in a real-time index
  • /pq/idx/doc/N?refresh=1 - Replaces a PQ rule in a percolate index
  • /delete - Deletes a document in an index
  • /bulk - Performs several insert, update or delete operations in a single call
  • /search - Performs search
  • /pq/idx/search - Performs reverse search in a percolate index

Common things

Common index settings
Plain index settings
Distributed index settings
RT index settings

Full-text search operators

Functions

Mathematical
  • ABS() - Returns absolute value
  • ATAN2() - Returns arctangent function of two arguments
  • BITDOT() - Returns the sum of products of each bit of a mask multiplied by its weight
  • CEIL() - Returns smallest integer value greater or equal to the argument
  • COS() - Returns cosine of the argument
  • CRC32() - Returns CRC32 value of the argument
  • EXP() - Returns exponent of the argument
  • FIBONACCI() - Returns the N-th Fibonacci number, where N is the integer argument
  • FLOOR() - Returns the largest integer value lesser or equal to the argument
  • GREATEST() - Takes JSON/MVA array as the argument and returns the greatest value in that array
  • IDIV() - Returns result of an integer division of the first argument by the second argument
  • LEAST() - Takes JSON/MVA array as the argument, and returns the least value in that array
  • LN() - Returns natural logarithm of the argument
  • LOG10() - Returns common logarithm of the argument
  • LOG2() - Returns binary logarithm of the argument
  • MAX() - Returns the bigger of two arguments
  • MIN() - Returns the smaller of two arguments
  • POW() - Returns the first argument raised to the power of the second argument
  • RAND() - Returns random float between 0..1
  • SIN() - Returns sine of the argument
  • SQRT() - Returns square root of the argument
Searching and ranking
  • BM25F() - Returns precise BM25F formula value
  • EXIST() - Replaces non-existing columns with default values
  • GROUP_CONCAT() - Produces a comma-separated list of the attribute values of all documents in the group
  • HIGHLIGHT() - Highlights search results
  • MIN_TOP_SORTVAL() - Returns sort key value of the worst found element in the current top-N matches
  • MIN_TOP_WEIGHT() - Returns weight of the worst found element in the current top-N matches
  • PACKEDFACTORS() - Outputs weighting factors
  • REMOVE_REPEATS() - Removes repeated adjacent rows with the same 'column' value
  • WEIGHT() - Returns fulltext match score
  • ZONESPANLIST() - Returns pairs of matched zone spans
  • QUERY() - Returns current full-text query
Type casting
  • BIGINT() - Forcibly promotes the integer argument to 64-bit type
  • DOUBLE() - Forcibly promotes given argument to floating point type
  • INTEGER() - Forcibly promotes given argument to 64-bit signed type
  • TO_STRING() - Forcibly promotes the argument to string type
  • UINT() - Forcibly reinterprets given argument to 64-bit unsigned type
  • SINT() - Interprets 32-bit unsigned integer as signed 64-bit integer
Arrays and conditions
  • ALL() - Returns 1 if condition is true for all elements in the array
  • ANY() - Returns 1 if condition is true for any element in the array
  • CONTAINS() - Checks whether the (x,y) point is within the given polygon
  • IF() - Checks whether the 1st argument is equal to 0.0, returns the 2nd argument if it is not zero or the 3rd one when it is
  • IN() - Returns 1 if the first argument is equal to any of the other arguments, or 0 otherwise
  • INDEXOF() - Iterates through all elements in the array and returns index of the first matching element
  • INTERVAL() - Returns index of the argument that is less than the first argument
  • LENGTH() - Returns number of elements in MVA
  • REMAP() - Allows to make some exceptions of expression values depending on the condition values
Date and time
  • NOW() - Returns current timestamp as an INTEGER
  • SECOND() - Returns integer second from the timestamp argument
  • MINUTE() - Returns integer minute from the timestamp argument
  • HOUR() - Returns integer hour from the timestamp argument
  • DAY() - Returns integer day from the timestamp argument
  • MONTH() - Returns integer month from the timestamp argument
  • YEAR() - Returns integer year from the timestamp argument
  • YEARMONTH() - Returns integer year and month code from the timestamp argument
  • YEARMONTHDAY() - Returns integer year, month and day code from the timestamp argument
Geo-spatial
  • GEODIST() - Computes geosphere distance between two given points
  • GEOPOLY2D() - Creates a polygon that takes in account the Earth's curvature
  • POLY2D() - Creates a simple polygon in plain space
String
  • CONCAT() - Concatenates two or more strings
  • REGEX() - Returns 1 if regular expression matched to string of attribute and 0 otherwise
  • SNIPPET() - Highlights search results
  • SUBSTRING_INDEX() - Returns a substring of the string before the specified number of occurrences of the delimiter
Other
  • LAST_INSERT_ID() - Returns ids of documents inserted or replaced by last statement in the current session

Common settings in configuration file

To be put to section common {} in configuration file:
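For example (lemmatizer_base, mentioned in the changelog above, is one such common setting; the path is illustrative):

common {
    lemmatizer_base = /usr/share/manticore
}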

Indexer

indexer is a tool to create plain indexes

Indexer settings in configuration file

To be put to section indexer {} in configuration file:
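For example (mem_limit is a typical indexer setting; the value is illustrative):

indexer {
    mem_limit = 256M
}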

Indexer start parameters
indexer [OPTIONS] [indexname1 [indexname2 [...]]]
  • --all - Rebuilds all indexes from the config
  • --buildstops - Reviews the index source, as if it were indexing the data, and produces a list of the terms that are being indexed.
  • --buildfreqs - Adds the quantity present in the index for --buildstops
  • --config, -c - Path to configuration file
  • --dump-rows - Dumps rows fetched by SQL source(s) into the specified file
  • --help - Lists all the parameters
  • --keep-attrs - Allows to reuse existing attributes on reindexing
  • --keep-attrs-names - Allows to specify attributes to reuse from the existing index
  • --merge-dst-range - Runs the filter range given upon merging
  • --merge-killlists - Changes the way kill lists are processed when merging indexes
  • --merge - Merges two plain indexes into one
  • --nohup - Indexer won't send SIGHUP if this option is on
  • --noprogress - Prevents displaying progress details
  • --print-queries - Prints out SQL queries that indexer sends to the database
  • --print-rt - Outputs data fetched from sql source(s) as INSERTs to a real-time index
  • --quiet - Prevents displaying anything
  • --rotate - Forces indexes rotation after all the indexes are built
  • --sighup-each - Forces rotation of each index after it's built
  • -v - Shows indexer version
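For instance, a common invocation combining the options above (the config path is illustrative):

indexer --config /etc/manticoresearch/manticore.conf --all --rotate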

Index converter from Manticore v2 / Sphinx v2

index_converter is a tool for converting indexes created with Sphinx/Manticore Search 2.x to Manticore Search 3.x index format.

index_converter {--config /path/to/config|--path}
Index converter start parameters
  • --config, -c - Path to indexes configuration file
  • --index - Specifies which index should be converted
  • --path - Defines path containing index(es) instead of the configuration file
  • --strip-path - Strips path from filenames referenced by index
  • --large-docid - Allows to convert documents with ids larger than 2^63
  • --output-dir - Writes the new files in a chosen folder
  • --all - Converts all indexes from the configuration file / path
  • --killlist-target - Sets the target indexes for which kill-lists will be applied

Searchd

searchd is the Manticore Search server.

Searchd settings in a configuration file

To be put to section searchd {} in configuration file:

Searchd start parameters
searchd [OPTIONS]
  • --config, -c - Path to configuration file
  • --console - Forces running in console mode
  • --coredump - Enables saving core dump on crash
  • --cpustats - Enables CPU time reporting
  • --delete - Removes Manticore service from Microsoft Management Console and other places where the services are registered
  • --force-preread - Forbids the server to serve any incoming connection until pre-reading of the index files completes
  • --help, -h - Lists all the parameters
  • --index - Forces serving only the specified index
  • --install - Installs searchd as a service into Microsoft Management Console
  • --iostats - Enables input/output reporting
  • --listen, -l - Overrides listen from the configuration file
  • --logdebug, --logdebugv, --logdebugvv - Enables additional debug output in the server log
  • --logreplication - Enables additional replication debug output in the server log
  • --new-cluster - Bootstraps a replication cluster and makes the server a reference node with cluster restart protection
  • --new-cluster-force - Bootstraps a replication cluster and makes the server a reference node bypassing cluster restart protection
  • --nodetach - Leaves searchd in foreground
  • --ntservice - Passed by Microsoft Management Console to searchd to invoke it as a service on Windows platforms
  • --pidfile - Overrides pid_file from the configuration file
  • --port, -p - Specifies the port searchd should listen on, disregarding the port specified in the configuration file
  • --replay-flags - Specifies extra binary log replay options
  • --servicename - Applies the given name to searchd when installing or deleting the service, as would appear in Microsoft Management Console
  • --status - Queries the running search service to return its status
  • --stop - Stops Manticore server
  • --stopwait - Stops Manticore server gracefully
  • --strip-path - Strips path names from all the file names referenced from the index
  • -v - shows version information
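For instance, to start the server in the foreground with core dumps enabled (the config path is illustrative):

searchd --config /etc/manticoresearch/manticore.conf --nodetach --coredump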

Indextool

indextool is a helper tool providing miscellaneous index maintenance functionality useful for troubleshooting.

indextool <command> [options]
Indextool start parameters

Used to dump miscellaneous debug information about the physical index

indextool <command> [options]
  • --config, -c - Path to configuration file
  • --quiet, -q - Keeps indextool quiet - it will not output banner, etc
  • --help, -h - Lists all the parameters
  • -v - Shows version information
  • --checkconfig - Verifies the configuration file
  • --buildidf - Builds IDF file from one or several dictionary dumps
  • --build-infixes - Build infixes for an existing dict=keywords index
  • --dumpheader - Quickly dumps the provided index header file
  • --dumpconfig - Dumps index definition from the given index header file in almost compliant manticore.conf file format
  • --dumpheader - Dumps index header by index name with looking up the header path in the configuration file
  • --dumpdict - Dumps index dictionary
  • --dumpdocids - Dumps document IDs by index name
  • --dumphitlist - Dumps all occurrences of the given keyword/id in the given index
  • --fold - Tests tokenization based on index's settings
  • --htmlstrip - Filters STDIN using HTML stripper settings for the given index
  • --mergeidf - Merges several .idf files into a single one
  • --morph - Applies morphology to the given STDIN and prints the result to stdout
  • --check - Checks the index data files for consistency
  • --strip-path - Strips path names from all the file names referenced from the index
  • --rotate - Defines whether to check index waiting for rotation in --check
  • --apply-killlists - Applies kill-lists for all indexes listed in the configuration file
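For instance, to check an index for consistency (the index name and config path are illustrative):

indextool --check myindex --config /etc/manticoresearch/manticore.conf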

Wordbreaker

wordbreaker is a tool that splits compound words into components.

wordbreaker [-dict path/to/dictionary_file] {split|test|bench}
Wordbreaker start parameters
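For instance, splitting a compound word read from stdin might look like this (the input word is illustrative):

echo fullthrottle | wordbreaker -dict path/to/dictionary_file split
full throttle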

Spelldump

spelldump is a tool used to extract the contents of a dictionary file in ispell or MySpell format.

spelldump [options] <dictionary> <affix> [result] [locale-name]
  • dictionary - Dictionary's main file
  • affix - Dictionary's affix file
  • result - Specifies where the dictionary data should be output to
  • locale-name - Specifies the locale details to use
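For instance, dumping a MySpell dictionary might look like this (the file names are illustrative):

spelldump en_US.dic en_US.aff result.txt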