
Ranker plugins let you implement a custom ranker that receives all the occurrences of the keywords matched in the document, and computes a WEIGHT() value. They can be called as follows:

SELECT id, attr1 FROM test WHERE match('hello') OPTION ranker=myranker('option1=1');

The call workflow is as follows:

XXX_init() gets called once per query per index, at the very beginning. A few query-wide options are passed to it through a SPH_RANKER_INIT structure, including the user options string (in the example just above, "option1=1" is that string).
XXX_update() gets called multiple times per matched document, with every matched keyword occurrence passed as its parameter, a SPH_RANKER_HIT structure. The occurrences within each document are guaranteed to be passed in the order of ascending hit->hit_pos values.
XXX_finalize() gets called once per matched document, once there are no more keyword occurrences. It must return the WEIGHT() value. This is the only mandatory function.
XXX_deinit() gets called once per query, at the very end.
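
The call order above can be illustrated with a small simulation. This is only a sketch in Python, not a real plugin (actual ranker plugins are compiled libraries): the CountingRanker class, the dictionary-based hit representation, the "scale" option and the run_ranker driver are all hypothetical. Only the init/update/finalize/deinit order and the ascending hit_pos guarantee are taken from the description above.

```python
class CountingRanker:
    """Hypothetical ranker: weight = number of hits times an optional scale."""

    def init(self, options):
        # Called once per query per index. `options` mimics the user
        # options string, e.g. "option1=1".
        pairs = (item.split("=", 1) for item in options.split(";") if item)
        self.options = {key: value for key, value in pairs}
        self.scale = int(self.options.get("scale", "1"))
        self.hits = 0
        self.last_pos = -1

    def update(self, hit):
        # Called for every matched keyword occurrence in the current document.
        assert hit["hit_pos"] >= self.last_pos, "hits must arrive in ascending order"
        self.last_pos = hit["hit_pos"]
        self.hits += 1

    def finalize(self):
        # Called once per matched document. Returns the WEIGHT() value
        # and resets the per-document state.
        weight = self.hits * self.scale
        self.hits = 0
        self.last_pos = -1
        return weight

    def deinit(self):
        # Called once per query, at the very end.
        self.options = None


def run_ranker(ranker, options, documents):
    """Hypothetical driver that mimics the engine's call order."""
    ranker.init(options)
    weights = {}
    for doc_id, hits in documents.items():
        for hit in sorted(hits, key=lambda h: h["hit_pos"]):
            ranker.update(hit)
        weights[doc_id] = ranker.finalize()
    ranker.deinit()
    return weights


docs = {
    1: [{"hit_pos": 3}, {"hit_pos": 7}],
    2: [{"hit_pos": 1}],
}
print(run_ranker(CountingRanker(), "scale=10", docs))  # {1: 20, 2: 10}
```

The per-document state is reset in finalize() because, in this sketch, one ranker instance serves every matched document of the query.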

Token filter plugins

Last modified: July 22, 2020

Token filter plugins let you implement a custom tokenizer that makes tokens according to custom rules. There are two types:

Index-time tokenizer, declared by index_token_filter in the index settings
Query-time tokenizer, declared by the token_filter OPTION directive

Token filters run after the base tokenizer in the text processing pipeline: the base tokenizer processes the text of a field or query and creates tokens from it, and the token filters then process those tokens.

An index-time tokenizer gets created by indexer when it indexes source data into an index, or by an RT index when it processes INSERT or REPLACE statements.

A plugin is declared as library name:plugin name:optional string of settings. The init functions of the plugin can accept arbitrary settings, passed as a string in the format option1=value1;option2=value2;...

Example:

index_token_filter = my_lib.so:email_process:field=email;split=.io
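
To make the declaration format concrete, here is a minimal Python sketch that splits such a string into its three parts. The function name parse_token_filter is hypothetical, and this is not the engine's own parser: it simply assumes the library name contains no colon and that settings follow the option1=value1;option2=value2 form described above.

```python
def parse_token_filter(declaration):
    """Split 'library:plugin[:settings]' into (library, plugin, settings dict)."""
    # Assumption: the library name itself contains no ':' character.
    parts = declaration.split(":", 2)
    if len(parts) < 2:
        raise ValueError("expected 'library name:plugin name[:settings]'")
    library, plugin = parts[0], parts[1]
    settings = {}
    if len(parts) == 3 and parts[2]:
        for item in parts[2].split(";"):
            if not item:
                continue
            # Everything after the first '=' is the value.
            key, _, value = item.partition("=")
            settings[key] = value
    return library, plugin, settings


print(parse_token_filter("my_lib.so:email_process:field=email;split=.io"))
# ('my_lib.so', 'email_process', {'field': 'email', 'split': '.io'})
```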

The call workflow for index-time token filter is as follows:

XXX_init() gets called right after indexer creates the token filter, with an empty field list, and then again after indexer gets the index schema, with the actual field list. It must return zero on successful initialization or an error description otherwise.
XXX_begin_document gets called only for RT index INSERT/REPLACE, once for every document. It must return zero on a successful call or an error description otherwise. Additional parameters/settings can be passed to the function using OPTION token_filter_options.
```
INSERT INTO rt (id, title) VALUES (1, 'some text corp@space.io') OPTION token_filter_options='.io'
```
XXX_begin_field gets called once for each field, before the field is processed by the base tokenizer, with the field number as its parameter.
XXX_push_token gets called once for each new token produced by the base tokenizer, with the source token as its parameter. It must return the token, the count of extra tokens made by the token filter, and the delta position for the token.
XXX_get_extra_token gets called multiple times in case XXX_push_token reports extra tokens. It must return the token and the delta position for that extra token.
XXX_end_field gets called once, right after the source tokens from the current field are exhausted.
XXX_deinit gets called at the very end of indexing.

The following functions must be defined: XXX_begin_document, XXX_push_token, and XXX_get_extra_token.
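
The interplay between XXX_push_token and XXX_get_extra_token can be sketched as a simulation. This is Python, not a real plugin, and every name in it is hypothetical: SuffixSplitFilter, the filter_field driver, and the choice of a zero position delta for extra tokens are illustrative assumptions. It also assumes the base tokenizer keeps "corp@space.io" as a single token, which depends on the charset_table in use.

```python
class SuffixSplitFilter:
    """Hypothetical filter: emits the part before `suffix` as an extra token."""

    def __init__(self, suffix):
        self.suffix = suffix
        self.extra = []

    def push_token(self, token):
        # Returns (token, number of extra tokens, position delta).
        self.extra = []
        if token.endswith(self.suffix) and len(token) > len(self.suffix):
            # Delta 0 is a choice of this sketch: the extra token shares
            # the position of the source token.
            self.extra.append((token[: -len(self.suffix)], 0))
        return token, len(self.extra), 1

    def get_extra_token(self):
        # Returns (token, position delta) for the next pending extra token.
        return self.extra.pop(0)


def filter_field(token_filter, base_tokens):
    """Hypothetical driver that mimics the push/get_extra call order."""
    out, position = [], 0
    for source in base_tokens:
        token, extra_count, delta = token_filter.push_token(source)
        position += delta
        out.append((position, token))
        for _ in range(extra_count):
            extra, extra_delta = token_filter.get_extra_token()
            position += extra_delta
            out.append((position, extra))
    return out


print(filter_field(SuffixSplitFilter(".io"), ["some", "text", "corp@space.io"]))
# [(1, 'some'), (2, 'text'), (3, 'corp@space.io'), (3, 'corp@space')]
```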

A query-time tokenizer gets created each time a full-text search is invoked, once for every index involved.

The call workflow for query-time token filter is as follows:

XXX_init() gets called once per index prior to parsing the query, with two parameters: the max token length and the string set by the token_filter option.
```
SELECT * FROM index WHERE MATCH ('test') OPTION token_filter='my_lib.so:query_email_process:io'
```
It must return zero on successful initialization or an error description otherwise.
XXX_push_token() gets called once for each new token produced by the base tokenizer, with these parameters: the token produced by the base tokenizer, a pointer to the raw token in the source query string, and the raw token length. It must return the token and the delta position for the token.
XXX_pre_morph() gets called once for each token right before it is passed to the morphology processor, with a reference to the token and a stopword flag. It may set the stopword flag to mark the token as a stopword.
XXX_post_morph() gets called once for each token after it has been processed by the morphology processor, with a reference to the token and a stopword flag. It may set the stopword flag to mark the token as a stopword. It must return a flag; a non-zero value means that the token as it was prior to morphology processing should be used.
XXX_deinit() gets called at the very end of query processing.

Absence of any of the functions is tolerated.

Miscellaneous tools

Last modified: July 22, 2020

indextool is a helper tool used to dump miscellaneous information about a physical index. The general usage is:

indextool <command> [options]

Options effective for all commands:

--config <file> (-c <file> for short) overrides the built-in config file names.
--quiet (-q for short) keeps indextool quiet: it will not output the banner, etc.
--help (-h for short) lists all of the parameters that can be called in your particular build of indextool.
-v show version information of your particular build of indextool.

The commands are as follows:

--checkconfig just loads and verifies the config file to check if it's valid, without syntax errors.
--buildidf DICTFILE1 [DICTFILE2 ...] --out IDFILE build IDF file from one or several dictionary dumps. Additional parameter -skip-uniq will skip unique (df=1) words.
--build-infixes INDEXNAME build infixes for an existing dict=keywords index (upgrades .sph, .spi in place). You can use this option for legacy index files that already use dict=keywords, but now need to support infix searching too; updating the index files with indextool may prove easier or faster than regenerating them from scratch with indexer.
--dumpheader FILENAME.sph quickly dumps the provided index header file without touching any other index files or even the configuration file. The report provides a breakdown of all the index settings, in particular the entire attribute and field list.
--dumpconfig FILENAME.sph dumps the index definition from the given index header file in (almost) compliant sphinx.conf file format.
--dumpheader INDEXNAME dumps index header by index name with looking up the header path in the configuration file.
--dumpdict INDEXNAME dumps the dictionary. The additional -stats switch adds the total number of documents to the dictionary dump. It is required for dictionary files that are used to create IDF files.
--dumpdocids INDEXNAME dumps document IDs by index name.
--dumphitlist INDEXNAME KEYWORD dumps all the hits (occurrences) of a given keyword in a given index, with keyword specified as text.
--dumphitlist INDEXNAME --wordid ID dumps all the hits (occurrences) of a given keyword in a given index, with keyword specified as internal numeric ID.
--fold INDEXNAME OPTFILE This option is useful to see how the tokenizer actually processes input. You can feed indextool with text from a file, if specified, or from stdin otherwise. The output will contain spaces instead of separators (according to your charset_table settings) and lowercased letters in words.
--htmlstrip INDEXNAME filters stdin using HTML stripper settings for a given index, and prints the filtering results to stdout. Note that the settings will be taken from sphinx.conf, and not the index header.
--mergeidf NODE1.idf [NODE2.idf ...] --out GLOBAL.idf merge several .idf files into a single one. Additional parameter -skip-uniq will skip unique (df=1) words.
--morph INDEXNAME applies morphology to the given stdin and prints the result to stdout.
--check INDEXNAME checks the index data files for consistency errors that might be introduced by bugs in indexer or by hardware faults. --check also works on RT indexes, both RAM and disk chunks.
--strip-path strips the path names from all the file names referenced from the index (stopwords, wordforms, exceptions, etc). This is useful for checking indexes built on another machine with possibly different path layouts.
--rotate works only with --check and defines whether to check index waiting for rotation, i.e. with .new extension. This is useful when you want to check your index before actually using it.
--apply-killlists loads and applies kill-lists for all indexes listed in the config file. Changes are saved in .SPM files. Kill-list files (.SPK) are deleted. This can be useful if you want to move kill-list application from server startup to the indexing stage.

spelldump is used to extract contents of a dictionary file that uses ispell or MySpell format, which can help build word lists for wordforms - all of the possible forms are pre-built for you.

The general usage is:

spelldump [options] <dictionary> <affix> [result] [locale-name]

The two main parameters are the dictionary's main file and its affix file; usually these are named as [language-prefix].dict and [language-prefix].aff and will be available with most common Linux distributions, as well as various places online.

[result] specifies where the dictionary data should be output to, and [locale-name] additionally specifies the locale details you wish to use.

There is an additional option, -c [file], which specifies a file for case conversion details.

Examples of its usage are:

spelldump en.dict en.aff
spelldump ru.dict ru.aff ru.txt ru_RU.CP1251
spelldump ru.dict ru.aff ru.txt .1251

The results file will contain a list of all the words in the dictionary in alphabetical order, output in the format of a wordforms file, which you can customize for your specific circumstances. An example of the result file:

zone > zone
zoned > zoned
zoning > zoning
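
As an illustration of customizing such a file, the following Python sketch reads lines in the "source > destination" form shown above, remaps two forms to a common normal form, and prints the result in the same format. The function name parse_wordforms and the chosen remapping are hypothetical; the sketch only relies on the line format from the example.

```python
def parse_wordforms(text):
    """Parse 'source > destination' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        source, sep, destination = line.partition(">")
        if not sep:
            raise ValueError(f"not a wordforms line: {line!r}")
        mapping[source.strip()] = destination.strip()
    return mapping


sample = "zone > zone\nzoned > zoned\nzoning > zoning\n"
forms = parse_wordforms(sample)

# Example customization: map every form to the same normal form.
forms["zoned"] = "zone"
forms["zoning"] = "zone"

print("\n".join(f"{src} > {dst}" for src, dst in forms.items()))
```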

wordbreaker is used to split compound words, as is usual in URLs, into their component words. For example, this tool can split "lordoftherings" into its four component words, or http://manofsteel.warnerbros.com into "man of steel warner bros". This helps searching without requiring prefixes or infixes: searching for "sphinx" wouldn't match "sphinxsearch", but if you break the compound word and index the separate components, you'll get a match without the larger index files that prefixes and infixes cost.

Examples of its usage are:

echo manofsteel | bin/wordbreaker -dict dict.txt split
man of steel

The input stream will be split into words using the -dict dictionary file. If no dictionary is specified, wordbreaker looks in the working folder for a wordbreaker-dict.txt file. (The dictionary should match the language of the compound word.) The split command breaks words from the standard input and outputs the result to the standard output. There are also test and bench commands that let you test the splitting quality and benchmark the splitting functionality.

Wordbreaker needs a dictionary to recognize individual substrings within a string. To differentiate between different guesses, it uses the relative frequency of each word in the dictionary: higher frequency means higher split probability. You can generate such a file using the indexer tool:

indexer --buildstops dict.txt 100000 --buildfreqs myindex -c /path/to/sphinx.conf

which will write the 100,000 most frequent words, along with their counts, from myindex into dict.txt. The output file is a text file, so you can edit it by hand, if need be, to add or remove words.
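
The idea of preferring splits made of more frequent words can be sketched with a small dynamic-programming routine. This is not wordbreaker's actual algorithm, only an illustration in Python under stated assumptions: the dictionary is taken to hold one "word count" pair per line, and the names load_dict and split_compound, as well as the sample counts, are hypothetical.

```python
import math


def load_dict(text):
    """Parse 'word count' lines (assumed format) into {word: count}."""
    freqs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            freqs[fields[0]] = int(fields[1])
    return freqs


def split_compound(word, freqs):
    """Return the most probable split of `word`, or None if none exists."""
    total = sum(freqs.values())
    # best[i] holds (log-probability, words) for the prefix word[:i].
    best = [None] * (len(word) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is None or piece not in freqs:
                continue
            # Higher relative frequency gives a higher (less negative) score.
            score = best[start][0] + math.log(freqs[piece] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[-1][1] if best[-1] else None


freqs = load_dict("of 5000\nman 900\nsteel 300\nma 2\nno 400\nfs 1\nteel 1\n")
print(" ".join(split_compound("manofsteel", freqs)))  # man of steel
```

With these sample counts, the competing split "ma no fs teel" is also possible, but it scores far lower because its pieces are rare.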

Changelog

Last modified: August 05, 2020

New Python, Javascript and Java clients are generally available now and are well documented in this manual.
Automatic drop of a disk chunk of a real-time index. This optimization enables dropping a disk chunk automatically when OPTIMIZE is run on a real-time index and the chunk is obviously not needed any more (all the documents are suppressed). Previously it still required merging; now the chunk can just be dropped instantly. The cutoff option is ignored, i.e. even if nothing is actually merged, an obsoleted disk chunk gets removed. This is useful if you maintain retention in your index and delete older documents. Compacting such indexes will now be faster.
standalone NOT as an option for SELECT

#453 New option indexer --ignore-non-plain is useful in case you run indexer --all and have not only plain indexes in the configuration file. Without --ignore-non-plain you'll get a warning and a respective exit code.
SHOW PLAN ... OPTION format=dot and EXPLAIN QUERY ... OPTION format=dot enable visualization of full-text query plan execution. Useful for understanding complex queries.

indexer --verbose is deprecated as it never added anything to the indexer output
To dump the watchdog's backtrace, signal USR2 must now be used instead of USR1

#449 Transactions Bug while using MariaDB Node.js Connector
#423 cyrillic char period call snippets retain mode don't highlight
#435 RTINDEX - GROUP N BY expression select = fatal crash
2b3b62bd searchd status shows Segmentation fault when in cluster
9dd25c19 'SHOW INDEX index.N SETTINGS' doesn't address chunks >9
#389 Bug that crashes Manticore
fba16617 Converter creates broken indexes
eecd61d8 stopword_step=0 vs CALL SNIPPETS()
ea6850e4 count distinct returns 0 at low max_matches on a local index
362f27db When using aggregation stored texts are not returned in hits

OPTIMIZE reduces disk chunks to a number of chunks (default is 2 * number of cores) instead of a single one. The optimal number of chunks can be controlled by the cutoff option.
The NOT operator can now be used standalone. By default it is disabled, since accidental single NOT queries can be slow. It can be enabled by setting the new searchd directive not_terms_only_allowed to 1.
New setting max_threads_per_query sets how many threads a query can use. If the directive is not set, a query can use threads up to the value of threads. Per SELECT query the number of threads can be limited with OPTION threads=N overriding the global max_threads_per_query.
Percolate indexes can now be imported with IMPORT TABLE.
HTTP API /search receives basic support for faceting/grouping by new query node aggs.

If no replication listen directive is declared, the engine will try to use ports after the defined 'sphinx' port, up to 200.
listen=...:sphinx needs to be explicitly set for SphinxSE connections or SphinxAPI clients.
SHOW INDEX STATUS outputs new metrics: killed_documents, killed_rate, disk_mapped_doclists, disk_mapped_cached_doclists, disk_mapped_hitlists and disk_mapped_cached_hitlists.
SQL command status now outputs Queue\Threads and Tasks\Threads.

dist_threads is completely deprecated now, searchd will log a warning if the directive is still used.

The official Docker image is now based on Ubuntu 20.04 LTS

Besides the usual manticore package, you can also install Manticore Search by components:

manticore-server - provides searchd, config and service files
manticore-tools - provides auxiliary tools ( indexer, indextool etc.)
manticore-icudata - provides ICU data file for icu morphology usage
manticore-dev (DEB) or manticore-devel (RPM) - provides dev headers for UDFs

2a474dc1 Crash of daemon at grouper at RT index with different chunks
57a19e5a Fastpath for empty remote docs
07dd3f31 Expression stack frame detection runtime
08ae357c Matching above 32 fields at percolate indexes
16b9390f Replication listen ports range
5fa671af Show create table on pq
54d133b6 HTTPS port behavior
fdbbe524 Mixing docstore rows when replacing
afb53f64 Switch TFO unavailable message level to 'info'
59d94cef Crash on strcmp invalid use
04af0349 Adding index to cluster with system (stopwords) files
50148b4e Merge indexes with large dictionaries; RT optimize of large disk chunks
a2adf158 Indextool can dump meta from current version
69f6d5f7 Issue in group order in GROUP N
24d5d80f Explicit flush for SphinxSE after handshake
31c4d78a Avoid copy of huge descriptions when not necessary
2959e2ca Negative time in show threads
f0b35710 Token filter plugin vs zero position deltas
a49e5bc1 Change 'FAIL' to 'WARNING' on multiple hits

This release took so long because we were working hard on changing the multitasking mode from threads to coroutines. It makes configuration simpler and query parallelization much more straightforward: Manticore just uses the given number of threads (see the new setting threads), and the new mode makes sure it's done in the most optimal way.
Changes in highlighting:
- any highlighting that works with several fields (highlight({},'field1, field2') or highlight in json queries) now applies limits per-field by default.
- any highlighting that works with plain text (highlight({}, string_attr) or snippet()) now applies limits to the whole document.
- per-field limits can be switched to global limits by limits_per_field=0 option (1 by default).
- allow_empty is now 0 by default for highlighting via HTTP JSON.
The same port can now be used for http, https and binary API (to accept connections from a remote Manticore instance). listen = *:mysql is still required for connections via mysql protocol. Manticore now detects automatically the type of client trying to connect to it except for MySQL (due to restrictions of the protocol).

In RT mode a field can now be text and string attribute at the same time - GitHub issue #331.

In plain mode it's called sql_field_string. Now it's available in RT mode for real-time indexes too. You can use it as shown in the example:

mysql> create table t(f string attribute indexed);
mysql> insert into t values(0,'abc','abc');
mysql> select * from t where match('abc');
+---------------------+------+
| id                  | f    |
+---------------------+------+
| 2810845392541843463 | abc  |
+---------------------+------+
1 row in set (0.01 sec)
mysql> select * from t where f='abc';
+---------------------+------+
| id                  | f    |
+---------------------+------+
| 2810845392541843463 | abc  |
+---------------------+------+
1 row in set (0.00 sec)

You can now highlight string attributes.
SSL and compression support for SQL interface
Support of mysql client status command.
Replication can now replicate external files (stopwords, exceptions etc.).
Filter operator in is now available via HTTP JSON interface.
expressions in HTTP JSON.
You can now change rt_mem_limit on the fly in RT mode, i.e. can do ALTER ... rt_mem_limit=<new value>.
You can now use separate CJK charset tables: chinese, japanese and korean.
thread_stack now limits maximum thread stack, not initial.
Improved SHOW THREADS output.
Display progress of long CALL PQ in SHOW THREADS.
cpustat, iostat, coredump can be changed during runtime with SET.
SET [GLOBAL] wait_timeout=NUM implemented.

Index format has been changed. Indexes built in 3.5.0 cannot be loaded by Manticore version < 3.5.0, but Manticore 3.5.0 understands older formats.
INSERT INTO PQ VALUES() (i.e. without providing column list) previously expected exactly (query, tags) as the values. It's been changed to (id,query,tags,filters). The id can be set to 0 if you want it to be auto-generated.
allow_empty=0 is a new default in highlighting via HTTP JSON interface.
Only absolute paths are allowed for external files (stopwords, exceptions etc.) in CREATE TABLE/ALTER TABLE.

ram_chunks_count was renamed to ram_chunk_segments_count in SHOW INDEX STATUS.
workers is obsolete. There's only one workers mode now.
dist_threads is obsolete. All queries are as much parallel as possible now (limited by threads and jobs_queue_size).
max_children is obsolete. Use threads to set the number of threads Manticore will use (set to the # of CPU cores by default).
queue_max_length is obsolete. Instead of that in case it's really needed use jobs_queue_size to fine-tune internal jobs queue size (unlimited by default).
All /json/* endpoints are now available w/o /json/, e.g. /search, /insert, /delete, /pq etc.

field meaning "full-text field" was renamed to "text" in describe.

3.4.2:

mysql> describe t;
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| f     | field  | indexed stored |
+-------+--------+----------------+

3.5.0:

mysql> describe t;
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| f     | text   | indexed stored |
+-------+--------+----------------+

Cyrillic и doesn't map to i in non_cjk charset_table (which is a default) as it affected Russian stemmers and lemmatizers too much.
read_timeout: use network_timeout instead, which controls both reading and writing.

Ubuntu Focal 20.04 official package
deb package name changed from manticore-bin to manticore

#351 searchd memory leak
ceabe44f Tiny read out of bounds in snippets
1c3e84a3 Dangerous write into local variable for crash queries
26e094ab Tiny memory leak of sorter in test 226
d2c7f86a Huge memory leak in test 226
0dd80122 Cluster shows the nodes are in sync, but count(*) shows different numbers
f1c1ac3f Cosmetic: Duplicate and sometimes lost warning messages in the log
f1c1ac3f Cosmetic: (null) index name in log
359dbd30 Cannot retrieve more than 70M results
19f328ee Can't insert PQ rules with no-columns syntax
bf685d5d Misleading error message when inserting a document to an index in a cluster
2cf18c83 /json/replace and json/update return id in exponent form
#324 Update json scalar properties and mva in the same query
d38409eb hitless_words doesn't work in RT mode
5813d639 ALTER RECONFIGURE in rt mode should be disallowed
5813d639 rt_mem_limit gets reset to 128M after searchd restart
highlight() sometimes hangs
7cd878f4 Failed to use U+code in RT mode
2b213de4 Failed to use wildcard at wordforms at RT mode
e9d07e68 Fixed SHOW CREATE TABLE vs multiple wordform files
fc90a84f JSON query without "query" crashes searchd
Manticore official docker couldn't index from mysql 8
23e05d32 HTTP /json/insert requires id
bd679af0 SHOW CREATE TABLE doesn't work for PQ
bd679af0 CREATE TABLE LIKE doesn't work properly for PQ
5eacf28f End of line in settings in show index status
cb153228 Empty title in "highlight" in HTTP JSON response
#318 CREATE TABLE LIKE infix error
9040d22c RT crashes under load
cd512c7d Lost crash log on crash at RT disk chunk
#323 Import table fails and closes the connection
6275316a ALTER reconfigure corrupts a PQ index
9c1d221e Searchd reload issues after change index type
71e2b5bb Daemon crashes on import table with missed files
#322 Crash on select using multiple indexes, group by and ranker = none
c3f58490 HIGHLIGHT() doesn't highlight in string attributes
#320 FACET fails to sort on string attribute
4f1a1f25 Error in case of missing data dir
04f4ddd4 access_* are not supported in RT mode
1c0616a2 Bad JSON objects in strings: 1. CALL PQ returns "Bad JSON objects in strings: 1" when the json is greater than some value.
32f943d6 RT-mode inconsistency. In some cases I can't drop the index since it's unknown and can't create it since the directory is not empty.
#319 Crash on select
22a28dd7 max_xmlpipe2_field = 2M returned warning on 2M field
#342 Query conditions execution bug
dd8dcab2 Simple 2 terms search finds a document containing only one term
90919e62 It was impossible in PQ to match a json with capital letters in keys
56da086a Indexer crashes on csv+docstore
#363 using [null] in json attr in centos 7 causes corrupted inserted data
Major #345 Records not being inserted, count() is random, "replace into" returns OK
max_query_time slows down SELECTs too much
#352 Master-agent communication fails on Mac OS
#328 Error when connecting to Manticore with Connector.Net/Mysql 8.0.19
daa760d2 Fixed escaping of \0 and optimized performance
9bc5c01a Fixed count distinct vs json
4f89a965 Fixed drop table at other node failed
952af5a5 Fix crashes on tightly running call pq

2ffe2d26 fix RT index from old version fails to index data

server works in 2 modes: rt-mode and plain-mode
- rt-mode requires data_dir and no index definition in config
- in plain-mode indexes are defined in config; no data_dir allowed
replication available only in rt-mode

charset_table defaults to non_cjk alias
in rt-mode full-text fields are indexed and stored by default
full-text fields in rt-mode renamed from 'field' to 'text'
ALTER RTINDEX is renamed to ALTER TABLE
TRUNCATE RTINDEX is renamed to TRUNCATE TABLE

stored-only fields
SHOW CREATE TABLE, IMPORT TABLE

much faster lockless PQ
/sql can execute any type of SQL statement in mode=raw
alias mysql for mysql41 protocol
default state.sql in data_dir

a5333644 fix crash on wrong field syntax in highlight()
7fbb9f2e fix crash of server on replicate RT index with docstore
24a04687 fix crash on highlight to index with infix or prefix option and to index wo stored fields enabled
3465c1ce fix false error about empty docstore and doc-id lookup for empty index
a707722e fix #314 SQL insert command with trailing semicolon
95628c9b removed warning on query word(s) mismatch
b8601b41 fix queries in snippets segmented via ICU
5275516c fix find/add race condition in docstore block cache
f06ef97a fix mem leak in docstore
a7258ba8 fix #316 LAST_INSERT_ID returns empty on INSERT
1ebd5bf8 fix #317 json/update HTTP endpoint to support array for MVA and object for JSON attribute
e426950a fix crash of indexer dumping rt without explicit id

Parallel Real-Time index searching
EXPLAIN QUERY command
configuration file without index definitions (alpha version)
CREATE/DROP TABLE commands (alpha version)
indexer --print-rt - can read from a source and print INSERTs for a Real-Time index

Updated to Snowball 2.0 stemmers
LIKE filter for SHOW INDEX STATUS
improved memory usage for high max_matches
SHOW INDEX STATUS adds ram_chunks_count for RT indexes
lockless PQ
changed LimitNOFILE to 65536

9c33aab8 added check of index schema for duplicate attributes #293
a0085f94 fix crash in hitless terms
68953740 fix loose docstore after ATTACH
d6f696ed fix docstore issue in distributed setup
bce2b7ec replace FixedHash with OpenHash in sorter
e0baf739 fix attributes with duplicated names at index definition
ca81114b fix html_strip in HIGHLIGHT()
493a5e91 fix passage macro in HIGHLIGHT()
a82d41c7 fix double buffer issues when RT index creates small or large disk chunk
a404c85d fix event deletion for kqueue
8bea0f6f fix save of disk chunk for large value of rt_mem_limit of RT index
8707f039 fix float overflow on indexing
a56434ce fix insert document with negative ID into RT index fails with error now
bbebfd75 fix crash of server on ranker fieldmask
3809cc1b fix crash on using query cache
dc2a585b fix crash on using RT index RAM segments with parallel inserts

Autoincrement ID for RT indexes
Highlight support for docstore via new HIGHLIGHT() function, available also in HTTP API
SNIPPET() can use special function QUERY() which returns current MATCH query
new field_separator option for highlighting functions.

lazy fetch of stored fields for remote nodes (can significantly increase performance)
strings and expressions no longer break multi-query and FACET optimizations
RHEL/CentOS 8 build now uses mysql libclient from mariadb-connector-c-devel
ICU data file is now shipped with the packages, icu_data_dir removed
systemd service files include 'Restart=on-failure' policy
indextool can now check real-time indexes online
default conf is now /etc/manticoresearch/manticore.conf
service on RHEL/CentOS renamed to 'manticore' from 'searchd'
removed query_mode and exact_phrase snippet's options

6ae474c7 fix crash on SELECT query over HTTP interface
59577513 fix RT index saves disk chunks but does not mark some documents deleted
e861f0fc fix crash on search of multi index or multi queries with dist_threads
440991fc fix crash on infix generation for long terms with wide utf8 codepoints
5fd599b4 fix race at adding socket to IOCP
cf10d7d3 fix issue of bool queries vs json select list
996de77f fix indextool check to report wrong skiplist offset, check of doc2row lookup
6e3fc9e8 fix indexer produces bad index with negative skiplist offset on large data
faed3220 fix JSON converts only numeric to string and JSON string to numeric conversion at expressions
53319720 fix indextool exit with error code in case multiple commands set at command line
795520ac fix #275 binlog invalid state on error no space left on disk
2284da5e fix #279 crash on IN filter to JSON attribute
ce2e4b47 fix #281 wrong pipe closing call
535589ba fix server hung at CALL PQ with recursive JSON attribute encoded as string
a5fc8a36 fix advancing beyond the end of the doclist in multiand node
a3628617 fix retrieving of thread public info
f8d2d7bb fix docstore cache locks

Document storage
new directives stored_fields, docstore_cache_size, docstore_block_size, docstore_compression, docstore_compression_level

improved SSL support
non_cjk built-in charset updated
disabled UPDATE/DELETE statements logging a SELECT in query log
RHEL/CentOS 8 packages

301a806b1 fix crash on replace document in disk chunk of RT index
46c1cad8f fix #269 LIMIT N OFFSET M
92a46edaa fix DELETE statements with id explicitly set or id list provided to skip search
8ca78c138 fix wrong index after event removed at netloop at windowspoll poller
603631e2b fix float roundup at JSON via HTTP
62f64cb9e fix remote snippets to check empty path first; fixing windows tests
aba274c2c fix reload of config to work on windows same way as on linux
6b8c4242e fix #194 PQ to work with morphology and stemmers
174d31290 fix RT retired segments management

Experimental SSL support for HTTP API
field filter for CALL KEYWORDS
max_matches for /json/search
automatic sizing of default Galera gcache.size
improved FreeBSD support

0a1a2c81 fixed replication of RT index into node where same RT index exists and has different path
4adc0752 fix flush rescheduling for indexes without activity
d6c00a6f improve rescheduling of flushing RT/PQ indexes
d0a7c959 fix #250 index_field_lengths index option for TSV and CSV piped sources
1266d548 fix indextool wrong report for block index check on empty index
553ca73c fix empty select list at Manticore SQL query log
56c85844 fix indexer -h/--help response

replication for RealTime indexes
ICU tokenizer for chinese
new morphology option icu_chinese
new directive icu_data_dir
multiple statements transactions for replication
LAST_INSERT_ID() and @session.last_insert_id
LIKE 'pattern' for SHOW VARIABLES
Multiple documents INSERT for percolate indexes
Added time parsers for config
internal task manager
mlock for doc and hit lists components
jail snippets path

RLP library support dropped in favor of ICU; all rlp* directives removed
updating document ID with UPDATE is disabled

f0472223 fix defects in concat and group_concat
b08147ee fix query uid at percolate index to be BIGINT attribute type
4cd85afa do not crash if failed to prealloc a new disk chunk
1a551227 add missing timestamp data type to ALTER
f3a8e096 fix crash of wrong mmap read
44757711 fix hash of clusters lock in replication
ff476df9 fix leak of providers in replication
58dcbb77 fix #246 undefined sigmask in indexer
3dd8278e fix race in netloop reporting
a02aae05 zero gap for HA strategies rebalancer

added mmap readers for docs and hit lists
/sql HTTP endpoint response is now the same as /json/search response
new directives access_plain_attrs, access_blob_attrs, access_doclists, access_hitlists
new directive server_id for replication setups

removed HTTP /search endpoint

ondisk_attrs, ondisk_attrs_default, mlock (replaced by access_* directives)

849c16e1 allow attribute names starting with numbers in select list
48e6c302 fixed MVAs in UDFs, fixed MVA aliasing
055586a9 fixed #187 crash when using query with SENTENCE
93bf52f2 fixed #143 support () around MATCH()
599ee79c fixed save of cluster state on ALTER cluster statement
230c321e fixed crash of server on ALTER index with blob attributes
5802b85a fixed #196 filtering by id
25d2dabd discard searching on template indexes
2a30d5b4 fixed id column to have regular bigint type at SQL reply

New index storage. Non-scalar attributes are not limited anymore to 4GB size per index
attr_update_reserve directive
String, JSON and MVA attributes can be updated using UPDATE
killlists are applied at index load time
killlist_target directive
multi AND searches speedup
better average performance and RAM usage
convert tool for upgrading indexes made with 2.x
CONCAT() function
JOIN CLUSTER cluster AT 'nodeaddress:port'
ALTER CLUSTER posts UPDATE nodes
node_address directive
list of nodes printed in SHOW STATUS

in case of indexes with killlists, the server doesn't rotate indexes in the order defined in the config, but follows the chain of killlist targets
order of indexes in a search no longer defines the order in which killlists are applied
Document IDs are now signed big integers

docinfo (always extern now), inplace_docinfo_gap, mva_updates_pool

Galera replication for percolate indexes
OPTION morphology

Cmake minimum version is now 3.13. Compiling requires boost and libssl development libraries.

6967fedb fixed crash on many stars at select list for query into many distributed indexes
36df1a40 fixed #177 large packet via Manticore SQL interface
57932aec fixed #170 crash of server on RT optimize with MVA updated
edb24b87 fixed server crash on binlog removed due to RT index remove after config reload on SIGHUP
bd3e66e0 fixed mysql handshake auth plugin payloads
6a217f6e fixed #172 phrase_boundary settings at RT index
3562f652 fixed #168 deadlock at ATTACH index to itself
250b3f0e fixed binlog saves empty meta after server crash
4aa6c69a fixed crash of server due to string at sorter from RT index with disk chunks

SUBSTRING_INDEX()
SENTENCE and PARAGRAPH support for percolate queries
systemd generator for Debian/Ubuntu; also added LimitCORE to allow core dumping

84fe7405 fixed crash of server on match mode all and empty full text query
daa88b57 fixed crash on deleting of static string
22078537 fixed exit code when indextool failed with FATAL
0721696d fixed #109 no matches for prefixes due to wrong exact form check
8af81011 fixed #161 reload of config settings for RT indexes
e2d59277 fixed crash of server on access of large JSON string
75cd1342 fixed PQ field at JSON document altered by index stripper causes wrong match from sibling field
e2f77543 fixed crash of server at parse JSON on RHEL7 builds
3a25a580 fixed crash of json unescaping when slash is on the edge
be9f4978 fixed option 'skip_empty' to skip empty docs and not warn they're not valid json
266e0e7b fixed #140 output 8 digits on floats when 6 is not enough to be precise
3f6d2389 fixed empty jsonobj creation
f3c7848a fixed #160 empty mva outputs NULL instead of an empty string
0afa2ed0 fixed fail to build without pthread_getname_np
9405fccd fixed crash on server shutdown with thread_pool workers

Distributed indexes for percolate indexes
CALL PQ new options and changes:
- skip_bad_json
- mode (sparsed/sharded)
- json documents can be passed as a json array
- shift
- Column names 'UID', 'Documents', 'Query', 'Tags', 'Filters' were renamed to 'id', 'documents', 'query', 'tags', 'filters'
DESCRIBE pq TABLE
SELECT FROM pq WHERE UID is not possible any more, use 'id' instead
SELECT over pq indexes is on par with regular indexes (e.g. you can filter rules via REGEX())
ANY/ALL can be used on PQ tags
expressions have auto-conversion for JSON fields, not requiring explicit casting
built-in 'non_cjk' charset_table and 'cjk' ngram_chars
built-in stopwords collections for 50 languages
multiple files in a stopwords declaration can also be separated by comma
CALL PQ can accept JSON array of documents

a4e19af fixed cjson-related leak
28d8627 fixed crash because of missed value in json
bf4e9ea fixed save of empty meta for RT index
33b4573 fixed lost form flag (exact) for sequence of lemmatizer
6b95d48 fixed string attrs > 4M use saturate instead of overflow
621418b fixed crash of server on SIGHUP with disabled index
3f7e35d fixed server crash on simultaneous API session status commands
cd9e4f1 fixed crash of server at delete query to RT index with field filters
9376470 fixed crash of server at CALL PQ to distributed index with empty document
8868b20 fixed cut Manticore SQL error message larger 512 chars
de9deda fixed crash on save percolate index without binlog
2b219e1 fixed http interface is not working in OSX
e92c602 fixed indextool false error message on check of MVA
238bdea fixed write lock at FLUSH RTINDEX to not write lock whole index during save and on regular flush from rt_flush_period
c26a236 fixed ALTER percolate index stuck waiting search load
9ee5703 fixed max_children to use default amount of thread_pool workers for value of 0
5138fc0 fixed error on indexing of data into index with index_token_filter plugin along with stopwords and stopword_step=0
2add3d3 fixed crash with absent lemmatizer_base when still using aot lemmatizers in index definitions

REGEX function
limit/offset for json API search
profiler points for qcache
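
The limit/offset entry above concerns paging through JSON API search results. A minimal sketch of building such a request body in Python is shown here; the key names ("index", "query", "match", "_all", "limit", "offset"), the index name "test" and the helper build_search_request are assumptions for illustration and should be checked against the JSON API documentation for your version.

```python
import json


def build_search_request(index, query_text, page, per_page):
    """Build a JSON search body for a zero-based `page` of `per_page` results.

    The key names used here are assumptions about the request format.
    """
    if page < 0 or per_page < 1:
        raise ValueError("page must be >= 0 and per_page must be >= 1")
    return {
        "index": index,
        "query": {"match": {"_all": query_text}},
        "limit": per_page,           # how many matches to return
        "offset": page * per_page,   # how many matches to skip
    }


# Third page (zero-based page 2) with 10 results per page.
body = build_search_request("test", "hello", page=2, per_page=10)
print(json.dumps(body))
```

The serialized body would then be sent with an HTTP POST to the search endpoint; the sketch stops short of the network call so that it runs without a server.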

eb3c768 fixed crash of server on FACET with multiple attribute wide types
d915cf6 fixed implicit group by at main select list of FACET query
5c25dc2 fixed crash on query with GROUP N BY
85d30a2 fixed deadlock on handling crash at memory operations
85166b5 fixed indextool memory consumption during check
58fb031 fixed gmock include not needed anymore as upstream resolve itself

SHOW THREADS in case of remote distributed indexes prints the original query instead of API call
SHOW THREADS new option format=sphinxql prints all queries in SQL format
SHOW PROFILE prints additional clone_attrs stage

4f15571 fixed failed to build with libc without malloc_stats, malloc_trim
f974f20 fixed special symbols inside words for CALL KEYWORDS result set
0920832 fixed broken CALL KEYWORDS to distributed index via API or to remote agent
fd686bf fixed distributed index agent_query_timeout propagate to agents as max_query_time
4ffa623 fixed total documents counter at disk chunk got affected by OPTIMIZE command and breaks weight calculation
dcaf4e0 fixed multiple tail hits at RT index from blended
eee3817 fixed deadlock at rotation

sort_mode option for CALL KEYWORDS
DEBUG on VIP connection can perform 'crash ' for intentional SIGSEGV action on server
DEBUG can perform 'malloc_stats' for dumping malloc stats in searchd.log and 'malloc_trim' to perform a malloc_trim()
improved backtrace if gdb is present on the system

0f3cc33 fixed crash or failure of rename on Windows
1455ba2 fixed crashes of server on 32-bit systems
ad3710d fixed crash or hang of server on empty SNIPPET expression
b36d792 fixed broken non progressive optimize and fixed progressive optimize to not create kill-list for oldest disk chunk
34b0324 fixed queue_max_length bad reply for SQL and API at thread pool worker mode
ae4b320 fixed crash on adding full-scan query to PQ index with regexp or rlp options set
f80f8d5 fixed crash when call one PQ after another
9742f5f refactor AcquireAccum
39e5bc3 fixed leak of memory after call pq
21bcc6d cosmetic refactor (c++11 style c-trs, defaults, nullptrs)
2d69039 fixed memory leak on trying to insert duplicate into PQ index
5ed92c4 fixed crash on JSON field IN with large values
4a5262e fixed crash of server on CALL KEYWORDS statement to RT index with expansion limit set
552646b fixed invalid filter at PQ matches query;
204f521 introduce small obj allocator for ptr attrs
25453e5 refactor ISphFieldFilter to refcounted flavour
1366ee0 fixed ub/sigsegv when using strtod on non-terminated strings
94bc6fc fixed memory leak in json resultset processing
e78e9c9 fixed read over the end of mem block applying attribute add
fad572f fixed refactor CSphDict for refcount flavour
fd841a4 fixed leak of AOT internal type outside
5ee7f20 fixed memory leak tokenizer management
116c5f1 fixed memory leak in grouper
56fdbc9 special free/copy for dynamic ptrs in matches (memory leak grouper)
b1fc161 fixed memory leak of dynamic strings for RT
517b9e8 refactor grouper
b1fc161 minor refactor (c++11 c-trs, some reformats)
7034e07 refactor ISphMatchComparator to refcounted flavour
b1fc161 privatize cloner
efbc051 simplify native little-endian for MVA_UPSIZE, DOCINFO2ID_T, DOCINFOSETID
6da0df4 add valgrind support to ubertests
1d17669 fixed crash because race of 'success' flag on connection
5a09c32 switch epoll to edge-triggered flavour
5d52868 fixed IN statement in expression with formatting like at filter
bd8b3c9 fixed crash at RT index on commit of document with large docid
ce656b8 fixed argless options in indextool
08c9507 fixed memory leak of expanded keyword
30c75a2 fixed memory leak of json grouper
6023f26 fixed leak of global user vars
7c138f1 fixed leakage of dynamic strings on early rejected matches
9154b18 fixed leakage on length()
43fca3a fixed memory leak because strdup() in parser
71ff777 fixed refactor expression parser to accurate follow refcounts

compatibility with MySQL 8 clients
TRUNCATE WITH RECONFIGURE
retired memory counter on SHOW STATUS for RT indexes
global cache of multi agents
improved IOCP on Windows
VIP connections for HTTP protocol
Manticore SQL DEBUG command which can run various subcommands
shutdown_token - SHA1 hash of password needed to invoke shutdown using DEBUG command
new stats to SHOW AGENT STATUS (_ping, _has_perspool, _need_resolve)
--verbose option of indexer now accepts [debugvv] for printing debug messages
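
The shutdown_token entry above describes the token as a SHA1 hash of a password. A minimal sketch of computing such a hash in Python follows; the hexadecimal encoding, the helper name sha1_hex and the sample password are assumptions for illustration, so check the shutdown_token documentation for the exact format the server expects.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the SHA1 digest of `password` as 40 lowercase hex characters."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Hypothetical password, for illustration only.
token = sha1_hex("my-secret")
print(len(token))  # a SHA1 hex digest is always 40 characters long
```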

390082 removed wlock at optimize
4c3376 fixed wlock at reload index settings
b5ea8d fixed memory leak on query with JSON filter
930e83 fixed empty documents at PQ result set
53deec fixed confusion of tasks due to removed one
cad9b9 fixed wrong remote host counting
90008c fixed memory leak of parsed agent descriptors
978d83 fixed leak in search
019394 cosmetic changes on explicit/inline c-trs, override/final usage
943e29 fixed leak of json in local/remote schema
02dbdd fixed leak of json sorting col expr in local/remote schema
c74d0b fixed leak of const alias
6e5b57 fixed leak of preread thread
39c740 fixed hang on exit caused by a stuck wait in netloop
adaf97 fixed stuck of 'ping' behaviour on change HA agent to usual host
32c40e separate gc for dashboard storage
511a3c fixed ref-counted ptr fix
32c40e fixed indextool crash on nonexistent index
156edc fixed output name of exceeding attr/field in xmlpipe indexing
cdac6d fixed default indexer's value if no indexer section in config
e61ec0 fixed wrong embedded stopwords in disk chunk by RT index after server restart
5fba49 fixed skip phantom (already closed, but not finally deleted from the poller) connections
f22ae3 fixed blended (orphaned) network tasks
46890e fixed crash on read action after write
03f9df fixed searchd crashes when running tests on windows
e9255e fixed handle EINPROGRESS code on usual connect()
248b72 fixed connection timeouts when working with TFO

improved wildcards performance on matching multiple documents at PQ
support for fullscan queries at PQ
support for MVA attributes at PQ
regexp and RLP support for percolate indexes

688562 fixed loss of query string
0f1770 fixed empty info at SHOW THREADS statement
53faa3 fixed crash on matching with NOTNEAR operator
26029a fixed error message on bad filter to PQ delete

reduced number of syscalls to avoid Meltdown and Spectre patches impact
internal rewrite of local index management
remote snippets refactor
full configuration reload
all node connections are now independent
proto improvements
Windows communication switched from wsapoll to IO completion ports
TFO can be used for communication between master and nodes
SHOW STATUS now outputs server version and mysql_version_string
added docs_id option for documents called in CALL PQ.
percolate queries filter can now contain expressions
distributed indexes can work with FEDERATED
dummy SHOW NAMES COLLATE and SET wait_timeout (for better ProxySQL compatibility)

5bcff0 fixed added not equal to tags of PQ
9ebc58 fixed added document id field to JSON document CALL PQ statement
8ae0e5 fixed flush statement handlers to PQ index
c24b15 fixed PQ filtering on JSON and string attributes
1b8bdd fixed parsing of empty JSON string
1ad8a0 fixed crash at multi-query with OR filters
69b898 fixed indextool to use config common section (lemmatizer_base option) for commands (dumpheader)
6dbeaf fixed empty string at result set and filter
39c4eb fixed negative document id values
266b70 fixed word clip length for very long words indexed
47823b fixed matching multiple documents of wildcard queries at PQ

MySQL FEDERATED engine support
MySQL packets now return the SERVER_STATUS_AUTOCOMMIT flag, which adds compatibility with ProxySQL
listen_tfo - enable TCP Fast Open connections for all listeners
indexer --dumpheader can dump also RT header from .meta file
cmake build script for Ubuntu Bionic

355b116 fixed invalid query cache entries for RT index;
546e229 fixed index settings got lost next after seamless rotation
0c45098 fixed infix vs prefix length set; added warning on unsupported infix length
80542fa fixed RT indexes auto-flush order
705d8c5 fixed result set schema issues for index with multiple attributes and queries to multiple indexes
b0ba932 fixed some hits got lost at batch insert with document duplicates
4510fa4 fixed optimize failed to merge disk chunks of RT index with large documents count

jemalloc at compilation. If jemalloc is present on system, it can be enabled with cmake flag -DUSE_JEMALLOC=1

85a6d7e fixed log expand_keywords option into Manticore SQL query log
caaa384 fixed HTTP interface to correctly process query with large size
e386d84 fixed crash of server on DELETE to RT index with index_field_lengths enable
cd538f3 fixed cpustats searchd cli option to work with unsupported systems
8740fd6 fixed utf8 substring matching with min lengths defined

improved Percolate Queries performance in case of using NOT operator and for batched documents.
percolate_query_call can use multiple threads depending on dist_threads
new full-text matching operator NOTNEAR/N
LIMIT for SELECT on percolate indexes
expand_keywords can accept 'star','exact' (where 'star,exact' has the same effect as '1')
ranged-main-query for joined fields which uses the ranged query defined by sql_query_range

72dcf66 fixed crash on searching ram segments; deadlock on save disk chunk with double buffer; deadlock on save disk chunk during optimize
3613714 fixed indexer crash on xml embedded schema with empty attribute name
48d7e80 fixed erroneous unlinking of not-owned pid-file
a5563a4 fixed orphaned fifos sometimes left in temp folder
2376e8f fixed empty FACET result set with wrong NULL row
4842b67 fixed broken index lock when running server as windows service
be35fee fixed wrong iconv libs on mac os
83744a9 fixed wrong count(*)

For agents with mirrors, agent_retry_count now sets the number of retries per mirror instead of per agent; the total retries per agent is agent_retry_count*mirrors.
agent_retry_count can now be specified per index, overriding global value. An alias mirror_retry_count is added.
a retry_count can be specified in agent definition and the value represents retries per agent
Percolate Queries are now in HTTP JSON API at /json/pq.
Added -h and -v options (help and version) to executables
morphology_skip_fields support for Real-Time indexes
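
The agent_retry_count entries above state that, for agents with mirrors, the total retries per agent equal agent_retry_count*mirrors. That arithmetic can be sketched as below; the function name and the sample values are hypothetical and are not part of any Manticore API.

```python
def total_retries_per_agent(agent_retry_count: int, mirrors: int) -> int:
    """Total retries for one agent when the retry count applies per mirror."""
    if agent_retry_count < 0:
        raise ValueError("agent_retry_count must be >= 0")
    if mirrors < 1:
        raise ValueError("an agent has at least one mirror")
    return agent_retry_count * mirrors


# Hypothetical setup: agent_retry_count = 3 and an agent with 2 mirrors.
print(total_retries_per_agent(3, 2))  # prints 6
```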

a40b079 fixed ranged-main-query to correctly work with sql_range_step when used at MVA field
f2f5375 fixed issue with blackhole system loop hung and blackhole agents seems disconnected
84e1f54 fixed query id to be consistent, fixed duplicated id for stored queries
1948423 fixed server crash on shutdown from various states
9a706b 3495fd7 timeouts on long queries
3359bcd8 refactored master-agent network polling on kqueue-based systems (Mac OS X, BSD).

HTTP JSON: JSON queries can now do equality on attributes, MVA and JSON attributes can be used in inserts and updates, updates and deletes via JSON API can be performed on distributed indexes
Percolate Queries
Removed support for 32-bit docids from the code. Also removed all the code that converts/loads legacy indexes with 32-bit docids.
Morphology only for certain fields. A new index directive morphology_skip_fields allows defining a list of fields for which morphology does not apply.
expand_keywords can now be a query runtime directive set using the OPTION statement

0cfae4c fixed crash on debug build of server (and possibly UB on release) when built with rlp
324291e fixed RT index optimize with progressive option enabled that merges kill-lists with wrong order
ac0efee minor crash on mac
lots of minor fixes after thorough static code analysis
other minor bugfixes

In this release we've changed the internal protocol used by masters and agents to speak with each other. If you run Manticoresearch in a distributed environment with multiple instances, make sure you first upgrade the agents, then the masters.

JSON queries on HTTP API protocol. Supported search, insert, update, delete, replace operations. Data manipulation commands can also be bulked. There are currently some limitations: MVA and JSON attributes can't be used for inserts, replaces or updates.
RELOAD INDEXES command
FLUSH LOGS command
SHOW THREADS can show progress of optimize, rotation or flushes.
GROUP N BY works correctly with MVA attributes
blackhole agents are run on separate thread to not affect master query anymore
implemented reference count on indexes, to avoid stalls caused by rotations and high load
SHA1 hashing implemented, not exposed yet externally
fixes for compiling on FreeBSD, macOS and Alpine

989752b filter regression with block index
b1c3864 rename PAGE_SIZE -> ARENA_PAGE_SIZE for compatibility with musl
f2133cc disable googletests for cmake < 3.1.0
f30ec53 failed to bind socket on server restart
0807240 fixed crash of server on shutdown
3e3acc3 fixed show threads for system blackhole thread
262c3fe Refactored config check of iconv, fixes building on FreeBSD and Darwin

OR operator in WHERE clause between attribute filters
Maintenance mode (SET MAINTENANCE=1)
CALL KEYWORDS available on distributed indexes
Grouping in UTC
query_log_mode for custom log files permissions
Field weights can be zero or negative
max_query_time can now affect full-scans
added net_wait_tm, net_throttle_accept and net_throttle_action for network thread fine tuning (in case of workers=thread_pool)
COUNT DISTINCT works with facet searches
IN can be used with JSON float arrays
multi-query optimization is not broken anymore by integer/float expressions
SHOW META shows a multiplier row when multi-query optimization is used

Manticore Search is built using cmake and the minimum gcc version required for compiling is 4.7.2.

Manticore Search runs under manticore user.
Default data folder is now /var/lib/manticore/.
Default log folder is now /var/log/manticore/.
Default pid folder is now /var/run/manticore/.

a58c619 fixed SHOW COLLATION statement that breaks java connector
631cf4e fixed crashes on processing distributed indexes; added locks to distributed index hash; removed move and copy operators from agent
942bec0 fixed crashes on processing distributed indexes due to parallel reconnects
e5c1ed2 fixed crash at crash handler on store query to server log
4a4bda5 fixed a crash with pooled attributes in multiqueries
3873bfb fixed reduced core size by prevent index pages got included into core file
11e6254 fixed searchd crashes on startup when invalid agents are specified
4ca6350 fixed indexer reports error in sql_query_killlist query
123a9f0 fixed fold_lemmas=1 vs hit count
cb99164 fixed inconsistent behavior of html_strip
e406761 fixed optimize of RT index losing new settings; fixed lock leaks on optimize with sync option
86aeb82 fixed processing erroneous multiqueries
2645230 fixed result set depends on multi-query order
72395d9 fixed server crash on multi-query with bad query
f353326 fixed shared to exclusive lock
3754785 fixed server crash for query without indexes
29f360e fixed deadlock of server

Manticore branding

Last modified: December 10, 2020

Ranker plugins

Token filter plugins

Index-time tokenizer

query-time token filter

indextool

spelldump

wordbreaker

Changelog

Version 3.5.4, Dec 10 2020

New Features

Minor Changes

Deprecations

Bugfixes

Version 3.5.2, Oct 1 2020

New features

Minor changes

Deprecations:

Docker

Packaging

Bugfixes

Version 3.5.0, 22 Jul 2020

Major new features:

Minor changes

Breaking changes:

Deprecations:

Packages

Bugfixes:

Version 3.4.2, 10 April 2020

Critical bugfixes

Version 3.4.0, 26 March 2020

Major changes

Minor changes

Features

Improvements

Bugfixes

Version 3.3.0, 4 February 2020

Features

Improvements

Bugfixes

Version 3.2.2, 19 December 2019

Features

Improvements and changes

Bugfixes

Version 3.2.0, 17 October 2019

Features

Improvements and changes

Bugfixes

Version 3.1.2, 22 August 2019

Features and Improvements

Bugfixes

Version 3.1.0, 16 July 2019

Features and Improvements

Removals

Bugfixes

Version 3.0.2, 31 May 2019

Improvements

Removals

Deprecations

Bugfixes

Version 3.0.0, 6 May 2019

Features and improvements

Behaviour changes

Removed directives

Version 2.8.2 GA, 2 April 2019

Features and improvements

Compiling notes

Bugfixes

Version 2.8.1 GA, 6 March 2019

Features and improvements

Bugfixes

Version 2.8.0 GA, 28 January 2019

Improvements

Bugfixes

Version 2.7.5 GA, 4 December 2018

Improvements

Bugfixes

Version 2.7.4 GA, 1 November 2018

Improvements

Bugfixes