Collations

Collations affect string attribute comparisons. They specify both the character set encoding and the strategy that Manticore uses to compare strings when doing ORDER BY or GROUP BY with a string attribute involved.

String attributes are stored as is when indexing, and no character set or language information is attached to them. That's okay as long as Manticore only needs to store and return the strings to the calling application verbatim. But when you ask Manticore to sort by a string value, that request immediately becomes quite ambiguous.

First, single-byte (ASCII, ISO-8859-1, or Windows-1251) strings need to be processed differently than UTF-8 strings, which may encode each character with a variable number of bytes. So we need to know the character set in order to interpret the raw bytes as meaningful characters.

Second, we also need to know the language-specific string sorting rules. For instance, when sorting according to US rules in the en_US locale, the accented character ï (small letter i with diaeresis) should be placed somewhere after z. However, when sorting with French rules and the fr_FR locale in mind, it should be placed between i and j. And some other set of rules might choose to ignore accents altogether, allowing ï and i to be mixed arbitrarily.

Third, and not least, we might need case-sensitive sorting in some scenarios and case-insensitive sorting in others.

Collations combine all of the above: the character set, the language rules, and the case sensitivity. Manticore currently provides the following four collations.

  1. libc_ci
  2. libc_cs
  3. utf8_general_ci
  4. binary

The first two collations rely on several standard C library (libc) calls and can thus support any locale that is installed on your system. They provide case-insensitive (_ci) and case-sensitive (_cs) comparisons respectively. By default they use the C locale, effectively resorting to bytewise comparisons. To change that, specify a different available locale using the collation_libc_locale directive. The list of locales available on your system can usually be obtained with the locale command:

$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
es_ES
fr_FR
POSIX
ru_RU.utf8
ru_UA.utf8

The specific list of system locales may vary. Consult your OS documentation to install any additional locales you need.

The utf8_general_ci and binary collations are built into Manticore. The first one is a generic collation for UTF-8 data (without any so-called language tailoring); it should behave similarly to the utf8_general_ci collation in MySQL. The second one is a simple bytewise comparison.
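To make the difference between a bytewise and a case-insensitive comparison concrete, here is a minimal Python sketch. It is only a toy model: `binary_key` and `case_insensitive_key` are hypothetical names, and `str.casefold()` is a rough stand-in for a `_ci` collation, not what Manticore does internally.

```python
# Toy model of two comparison strategies. This is NOT Manticore's code;
# both key functions are illustrative assumptions.

def binary_key(s: str) -> bytes:
    # "binary"-style comparison: compare the raw UTF-8 bytes.
    return s.encode("utf-8")

def case_insensitive_key(s: str) -> str:
    # Rough stand-in for a *_ci collation: fold case before comparing.
    return s.casefold()

words = ["cherry", "apple", "Banana"]

# ORDER BY analogue: the same values sort differently.
print(sorted(words, key=binary_key))            # ['Banana', 'apple', 'cherry']
print(sorted(words, key=case_insensitive_key))  # ['apple', 'Banana', 'cherry']

# GROUP BY analogue: the number of distinct groups differs as well.
values = ["Apple", "apple", "APPLE"]
print(len({binary_key(v) for v in values}))            # 3
print(len({case_insensitive_key(v) for v in values}))  # 1
```

In the bytewise case uppercase letters sort before lowercase ones because their byte values are smaller, which is why "Banana" comes first.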

The collation can be overridden via SQL on a per-session basis using the SET collation_connection statement. All subsequent SQL queries in that session will use this collation. Otherwise, queries use the server default collation, which can be set with the collation_server configuration directive. Manticore currently defaults to the libc_ci collation.

Collations affect all string attribute comparisons, including those within ORDER BY and GROUP BY, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use charset_table.

Updating index schema

Updating index schema in RT mode

ALTER TABLE index ADD COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING|TIMESTAMP|TEXT [INDEXED [ATTRIBUTE]]}] [engine='columnar']

ALTER TABLE index DROP COLUMN column_name

ALTER TABLE supports adding one field at a time for RT indexes. Supported data types are:

  • int - integer attribute
  • timestamp - timestamp attribute
  • bigint - big integer attribute
  • float - float attribute
  • bool - boolean attribute
  • multi - multi-valued integer attribute
  • multi64 - multi-valued bigint attribute
  • json - json attribute
  • string / text attribute / string attribute - string attribute
  • text / text indexed stored / string indexed stored - full-text indexed field with original value stored in docstore
  • text indexed / string indexed - full-text indexed field, indexed only (the original value is not stored in docstore)
  • text indexed attribute / string indexed attribute - full text indexed field + string attribute (not storing the original value in docstore)
  • text stored / string stored - the value will be only stored in docstore, not full-text indexed, not a string attribute
  • adding engine='columnar' to any attribute (except for json) will make it stored in the columnar storage

Important notes:

  • Querying an index is impossible while a column is being added.
  • Values of a newly created attribute are set to 0.
  • ALTER will not work for distributed indexes and indexes without any attributes.
  • DROP COLUMN will fail if an index has only one field.
  • When dropping a field which is both a full-text field and a string attribute, the first ALTER DROP drops the attribute and the second one drops the full-text field.
  • Adding or dropping a full-text field is only supported in RT mode.
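
As a rough illustration of the syntax and type list above, the following Python sketch composes ALTER TABLE statements as strings. The helper names (`add_column_sql`, `drop_column_sql`) are hypothetical and not part of any Manticore client; the set of accepted type specs is copied from the list above, and the only rule enforced beyond that is that json cannot use engine='columnar'. Identifiers are not escaped, so treat it as a sketch rather than production code.

```python
# Hypothetical helpers that build ALTER TABLE statements as plain strings.
# The accepted type specs mirror the list in this section; nothing is sent
# to a server and identifiers are NOT escaped.

SUPPORTED_TYPES = {
    "int", "integer", "timestamp", "bigint", "float", "bool",
    "multi", "multi64", "json",
    "string", "text attribute", "string attribute",
    "text", "text indexed stored", "string indexed stored",
    "text indexed", "string indexed",
    "text indexed attribute", "string indexed attribute",
    "text stored", "string stored",
}

def add_column_sql(index: str, column: str, type_spec: str,
                   columnar: bool = False) -> str:
    # Normalize case and whitespace before checking the type spec.
    spec = " ".join(type_spec.lower().split())
    if spec not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type spec: {type_spec!r}")
    if columnar and spec == "json":
        raise ValueError("json cannot use engine='columnar'")
    sql = f"ALTER TABLE {index} ADD COLUMN {column} {spec.upper()}"
    if columnar:
        sql += " engine='columnar'"
    return sql

def drop_column_sql(index: str, column: str) -> str:
    return f"ALTER TABLE {index} DROP COLUMN {column}"

print(add_column_sql("rt", "test", "integer"))
print(add_column_sql("rt", "price", "float", columnar=True))
print(drop_column_sql("rt", "group_id"))
```

Since only one column can be added per statement, adding several columns means issuing one such statement per column.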

Example

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+

mysql> alter table rt add column test integer;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt drop column group_id;

mysql> desc rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+

mysql> alter table rt add column title text indexed;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

mysql> alter table rt add column title text attribute;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
| title      | string    |            |
+------------+-----------+------------+

mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| title      | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

mysql> alter table rt drop column title;

mysql> desc rt;
+------------+-----------+------------+
| Field      | Type      | Properties |
+------------+-----------+------------+
| id         | bigint    |            |
| text       | text      | indexed    |
| date_added | timestamp |            |
| test       | uint      |            |
+------------+-----------+------------+

Updating index FT settings in plain mode

ALTER RTINDEX index RECONFIGURE

ALTER can also reconfigure an RT index in plain mode, so that new tokenization, morphology and other text processing settings from the configuration file take effect on the newly INSERT-ed rows, while retaining the existing rows as they were. Internally, it forcibly saves the current RAM chunk as a new disk chunk and adjusts the index header, so that the new rows are tokenized using the new rules.

Example

mysql> show index rt settings;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| settings      |       |
+---------------+-------+
1 row in set (0.00 sec)

mysql> alter rtindex rt reconfigure;
Query OK, 0 rows affected (0.00 sec)

mysql> show index rt settings;
+---------------+----------------------+
| Variable_name | Value                |
+---------------+----------------------+
| settings      | morphology = stem_en |
+---------------+----------------------+
1 row in set (0.00 sec)

Functions

Mathematical functions

ABS()

Returns the absolute value of the argument.

ATAN2()

Returns the arctangent of two arguments, expressed in radians.

BITDOT()

BITDOT(mask, w0, w1, ...) returns the sum of the products of each bit of the mask and its weight: bit0*w0 + bit1*w1 + ...
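
The formula can be modeled in a few lines of Python. This is a sketch of the arithmetic only; the function name `bitdot` is illustrative, and ignoring mask bits that have no matching weight is an assumption, not documented behavior.

```python
# Model of bit0*w0 + bit1*w1 + ... for an integer mask.
# Assumption: mask bits beyond the supplied weights are ignored.

def bitdot(mask: int, *weights: float) -> float:
    # Bit i of the mask (counting from the least significant bit)
    # selects whether weight i contributes to the sum.
    return sum(w for i, w in enumerate(weights) if (mask >> i) & 1)

print(bitdot(0b101, 10, 20, 30))     # bits 0 and 2 are set: 10 + 30 = 40
print(bitdot(0b110, 1.5, 2.5, 4.0))  # bits 1 and 2 are set: 2.5 + 4.0 = 6.5
```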

CEIL()

Returns the smallest integer value greater than or equal to the argument.

COS()

Returns the cosine of the argument.

CRC32()

Returns the CRC32 value of a string argument.

EXP()

Returns e (e=2.718...) raised to the power of the argument.

FIBONACCI()

Returns the N-th Fibonacci number, where N is the integer argument. That is, arguments of 0 and up will generate the values 0, 1, 1, 2, 3, 5, 8, 13 and so on. Note that the computations are done using 32-bit integer math, so the 48th number and up will be returned modulo 2^32.
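
The wrap-around can be reproduced with a short Python model. It assumes unsigned 32-bit arithmetic, which is how the "modulo 2^32" wording reads; `fibonacci32` is an illustrative name, not a Manticore API.

```python
# Model of Fibonacci numbers computed with 32-bit unsigned integer math.

MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits, i.e. reduces modulo 2^32

def fibonacci32(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) & MASK32
    return a

print([fibonacci32(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
print(fibonacci32(47))  # 2971215073, still fits in 32 bits
print(fibonacci32(48))  # 512559680, i.e. 4807526976 modulo 2^32
```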

FLOOR()

Returns the largest integer value less than or equal to the argument.

GREATEST()

GREATEST(attr_json.some_array) takes a JSON array as the argument and returns the greatest value in that array. It also works for MVA.

IDIV()

Returns the result of an integer division of the first argument by the second argument. Both arguments must be of an integer type.

LEAST()

LEAST(attr_json.some_array) takes a JSON array as the argument and returns the least value in that array. It also works for MVA.

LN()

Returns the natural logarithm of the argument (with the base of e=2.718...).

LOG10()

Returns the common logarithm of the argument (with the base of 10).

LOG2()

Returns the binary logarithm of the argument (with the base of 2).

MAX()

Returns the bigger of two arguments.

MIN()

Returns the smaller of two arguments.

POW()

Returns the first argument raised to the power of the second argument.

RAND()

RAND(seed) returns a random float between 0 and 1. It optionally accepts a seed, which can be:

  • a constant integer
  • the name of an integer attribute

If you use a seed, keep in mind that it resets rand()'s starting point separately for each plain index, RT disk chunk, RAM chunk, or pseudo shard, so queries to a distributed index in any form can return multiple identical random values.
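
The effect of re-seeding per chunk can be illustrated with Python's own generator. This is only an analogy: `random.Random` is not the generator Manticore uses, and `rand_column` is a hypothetical helper that stands in for evaluating RAND(seed) over the rows of one chunk.

```python
import random

# Analogy only: each "chunk" restarts its generator from the same seed,
# so every chunk produces the same sequence of values.

def rand_column(row_count: int, seed: int) -> list[float]:
    rng = random.Random(seed)  # starting point is reset for this chunk
    return [rng.random() for _ in range(row_count)]

chunk_a = rand_column(3, seed=42)
chunk_b = rand_column(3, seed=42)

# Merging results from both chunks yields duplicated "random" values.
print(chunk_a == chunk_b)             # True
print(len(set(chunk_a + chunk_b)))    # 3 distinct values across 6 rows
```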

SIN()

Returns the sine of the argument.

SQRT()

Returns the square root of the argument.