Special suffixes

Manticore Search recognizes and parses special suffixes that make it easier to use numeric values with a special meaning. Their common form is an integer followed by a literal, such as 10k or 100d. 40.3s is not valid (40.3 is not an integer), and neither is 2d 4h (it contains two values, not one). Literals are case-insensitive, so 10W is the same as 10w. Two types of suffixes are currently supported:

  • Size suffixes can be used in parameters that define the size of something in bytes (memory buffer, disk file, RAM limit, etc.). A "naked" number in these places means the size in bytes (octets). Size values take the suffix k for kilobytes (1k=1024), m for megabytes (1m=1024k), g for gigabytes (1g=1024m) and t for terabytes (1t=1024g).
  • Time suffixes can be used in parameters that define time intervals, such as delays and timeouts. "Naked" values for these parameters usually have a documented scale, so you must know whether a number such as 100 means '100 seconds' or '100 milliseconds'. Instead of guessing, you can write a suffixed value, and its meaning is then fully determined by the suffix. Time values take the suffix us for microseconds, ms for milliseconds, s for seconds, m for minutes, h for hours, d for days and w for weeks.
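The rules above (a single integer, an optional case-insensitive suffix, and a per-parameter default scale for naked time values) can be illustrated with a small parser. This is a sketch under stated assumptions, not Manticore's own code: the function names and the `naked_unit` parameter are hypothetical, and time values are normalized to microseconds only for convenience.

```python
import re

# Bytes per size suffix; a naked number is already bytes.
SIZE_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# Microseconds per time suffix.
TIME_UNITS_US = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60_000_000,
    "h": 3_600_000_000,
    "d": 86_400_000_000,
    "w": 604_800_000_000,
}

# Exactly one integer, optionally followed by letters: "10k" matches,
# "40.3s" and "2d 4h" do not.
_SUFFIXED = re.compile(r"(\d+)([a-z]*)", re.IGNORECASE)


def _split(value: str) -> tuple[int, str]:
    match = _SUFFIXED.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected one integer with an optional suffix: {value!r}")
    # Suffixes are case-insensitive, so normalize them to lower case.
    return int(match.group(1)), match.group(2).lower()


def parse_size(value: str) -> int:
    """Return a size in bytes."""
    number, suffix = _split(value)
    if suffix not in SIZE_UNITS:
        raise ValueError(f"unknown size suffix: {suffix!r}")
    return number * SIZE_UNITS[suffix]


def parse_time_us(value: str, naked_unit: str = "s") -> int:
    """Return a time interval in microseconds.

    naked_unit (hypothetical) stands in for the documented scale of a
    particular parameter and is used only when the value has no suffix.
    """
    number, suffix = _split(value)
    unit = suffix or naked_unit
    if unit not in TIME_UNITS_US:
        raise ValueError(f"unknown time suffix: {unit!r}")
    return number * TIME_UNITS_US[unit]
```

Note that m means megabytes in a size but minutes in a time, so a parser like this has to know which kind of parameter it is reading before it can interpret the suffix.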

Warning

Giga- and especially tera- size suffixes are of limited use right now, since most sizes are internally limited to 2Gb (more precisely, 2Gb - 1 byte). For the moment, only rt_mem_limit and attr_update_reserve from the index config, and qcache_max_bytes from the searchd config, accept 64-bit values that may exceed 2Gb.
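The arithmetic behind this warning is worth spelling out: 2g is 2 × 1024³ = 2147483648 bytes, which is one byte over the 2Gb - 1 byte cap, so the largest whole-megabyte value that still fits is 2047m. The sketch below checks a value against that cap; the setting name some_buffer used in the usage line is hypothetical, and only the three settings named in the warning are treated as 64-bit.

```python
# Largest value most size settings can hold: 2Gb - 1 byte.
MAX_32BIT_SIZE = 2**31 - 1  # 2147483647

# Settings that, according to the warning, accept 64-bit values.
WIDE_SETTINGS = {"rt_mem_limit", "attr_update_reserve", "qcache_max_bytes"}

SIZE_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def to_bytes(value: str) -> int:
    """Convert a size such as "2047m" to bytes (minimal parsing, no validation)."""
    suffix = value[-1].lower() if value[-1].isalpha() else ""
    digits = value[:-1] if suffix else value
    return int(digits) * SIZE_UNITS[suffix]


def fits(setting: str, value: str) -> bool:
    """True if the setting is 64-bit or the value is within the 2Gb - 1 byte cap."""
    return setting in WIDE_SETTINGS or to_bytes(value) <= MAX_32BIT_SIZE
```

For example, fits("some_buffer", "2g") is False while fits("some_buffer", "2047m") and fits("rt_mem_limit", "4g") are True.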

Scripted configuration

The Manticore configuration file supports shebang syntax: the configuration can be written in a programming language and interpreted when it is loaded, which allows dynamic settings.

For example, indexes can be generated by querying a database table, settings can be modified depending on external factors, or external files (containing indexes and/or sources) can be included.

The configuration file is parsed by the declared interpreter, and its output is used as the actual configuration. This happens each time the configuration is read (not only at searchd startup).

This facility is not available on Windows.
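To make the mechanism concrete, here is a simplified model of how a loader could treat such a file: if the first line is a shebang, run the named interpreter on the file and use whatever it prints as the configuration. This is an illustrative sketch, not Manticore's actual loader; the function name load_config_text is hypothetical.

```python
import subprocess
from pathlib import Path


def load_config_text(path: str) -> str:
    """Return the effective configuration text for the file at path.

    Simplified model, not Manticore's loader: if the first line is a shebang,
    run the named interpreter on the file and use its standard output.
    """
    text = Path(path).read_text()
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return text  # a plain configuration file is used as-is
    # "#!/usr/bin/php" -> ["/usr/bin/php"]; "#!/usr/bin/env php" -> ["/usr/bin/env", "php"]
    interpreter = first_line[2:].split()
    result = subprocess.run(
        [*interpreter, path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Because the interpreter receives the file itself as an argument, it must ignore the shebang line; the PHP command-line interpreter does this, which is why a PHP script can be used directly as a configuration file.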

In the following example, we use PHP to create multiple indexes with different names, and we also scan a specific folder for files containing extra index declarations.

#!/usr/bin/php
...
<?php for ($i = 1; $i <= 6; $i++) { ?>
index test_<?=$i?> {
  type = rt
  path = /var/lib/manticore/data/test_<?=$i?>
  rt_field = subject
  ...
}
<?php } ?>
...

<?php
// Include every *.conf file from the extra configuration folder.
$confd_folder = '/etc/manticore.conf.d/';
// scandir() returns false if the folder cannot be read; treat that as empty.
$files = scandir($confd_folder) ?: [];
foreach ($files as $file) {
    // Skip the current and parent directory entries.
    if ($file === '.' || $file === '..') {
        continue;
    }
    $fp = new SplFileInfo($confd_folder . $file);
    if ($fp->getExtension() === 'conf') {
        include $confd_folder . $file;
    }
}
?>

Comments

The configuration file supports comments, with the # character marking the start of a comment. The comment character can appear at the start of a line or inline.

Take extra care when using # in character tokenization settings, since everything after it is ignored. To avoid this, use the UTF-8 code of #, which is U+23.

# can also be escaped using \. Escaping is required if # is present in database credentials in source declarations.
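The comment rules can be modeled line by line: everything from the first unescaped # onward is dropped, and \# stands for a literal #. This is a simplified sketch that assumes \# is the only escape sequence; it is not Manticore's actual parser, and the function name strip_comment is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first unescaped '#'; turn '\\#' into '#'.

    Simplified model: only backslash-hash is treated as an escape.
    """
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            out.append("#")  # escaped hash is kept as a literal character
            i += 2
            continue
        if ch == "#":
            break  # start of a comment: ignore the rest of the line
        out.append(ch)
        i += 1
    return "".join(out).rstrip()
```

With this model, "charset_table = 0..9, #" loses the # entirely, which is the pitfall described above, while writing U+23 or escaping it as \# keeps the character.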

Inheritance of index and source declarations

Both index and source declarations support inheritance. This allows better organization of indexes with similar settings or structure and reduces the size of the configuration.

Nothing special needs to be specified for the parent index/source.

For the child index/source, the declaration contains the index/source name followed by : and the parent name.

index parent {
path = /var/lib/manticore/parent
...
}

index child:parent {
path = /var/lib/manticore/child
...
}

The child inherits the entire configuration of the parent. Any setting declared in the child overrides the inherited value.

Note that for multi-value settings, declaring even a single value in the child clears all inherited values. For example, if the parent has several sql_query_pre declarations and the child has a single sql_query_pre declaration, none of the inherited sql_query_pre declarations are kept. If you want to keep some of the parent's values while adding or changing others, declare all of them explicitly in the child.

The same mechanism lets you drop a value you don't need from the parent. For example, if the parent's sql_query_pre is not needed, declare the directive with an empty value in the child, like sql_query_pre=.

This inheritance behavior applies to fields and attributes as well, not just to index options. If, for example, the parent has 2 integer attributes and the child needs a new integer attribute, the parent's integer attribute declarations must be copied into the child configuration.
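The override rule for multi-value settings can be captured in a few lines: a child declaration replaces the whole list of values for that setting, and an empty declaration leaves it empty. The sketch below models each declaration as a mapping from setting name to a list of values; this representation and the inherit helper are hypothetical and meant only to illustrate the semantics described above.

```python
Settings = dict[str, list[str]]


def inherit(parent: Settings, child: Settings) -> Settings:
    """Resolve a child declaration against its parent (illustrative model only)."""
    # Start from a copy of everything the parent declares.
    resolved = {name: list(values) for name, values in parent.items()}
    for name, values in child.items():
        # Declaring a setting in the child replaces ALL inherited values of it;
        # an empty declaration such as `sql_query_pre=` leaves it with no values.
        resolved[name] = [v for v in values if v != ""]
    return resolved
```

For instance, if a parent declares sql_attr_uint twice and the child declares it once, the resolved child has only its own attribute, which is why the parent's attributes must be repeated in the child.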

Setting variables online

SET

SET [GLOBAL] server_variable_name = value
SET [INDEX index_name] GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])
SET NAMES value [COLLATE value]
SET @@dummy_variable = ignored_value

The SET statement modifies a variable value. Variable names are case-insensitive. No variable value changes survive a server restart.

The SET NAMES statement and the SET @@variable_name syntax both do nothing. They are implemented to maintain compatibility with third-party MySQL client libraries, connectors, and frameworks that may run these statements when connecting.

There are the following classes of variables:

  1. per-session server variable
  2. global server variable
  3. global user variable
  4. global distributed variable

Global user variables are shared between concurrent sessions. Currently, the only supported value type is a list of BIGINTs, and these variables can only be used with IN() for filtering. The intended usage scenario is uploading huge lists of values to searchd once and reusing them many times later, saving on network overhead. A global user variable can either be transferred to all agents of a distributed index or set locally when a local index is defined in the distributed index. Example:

// in session 1
mysql> SET GLOBAL @myfilter=(2,3,5,7,11,13);
Query OK, 0 rows affected (0.00 sec)

// later in session 2
mysql> SELECT * FROM test1 WHERE group_id IN @myfilter;
+------+--------+----------+------------+-----------------+------+
| id   | weight | group_id | date_added | title           | tag  |
+------+--------+----------+------------+-----------------+------+
|    3 |      1 |        2 | 1299338153 | another doc     | 15   |
|    4 |      1 |        2 | 1299338153 | doc number four | 7,40 |
+------+--------+----------+------------+-----------------+------+
2 rows in set (0.02 sec)
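Conceptually, a global user variable behaves like a server-wide set of 64-bit integers that any session can reference in IN(). The toy model below mirrors the example: the store, the function names, and the first two rows (ids 1 and 2) are hypothetical, while rows 3 and 4 take their group_id from the output above.

```python
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1

# Hypothetical server-wide store shared by all sessions.
global_user_vars: dict[str, frozenset[int]] = {}


def set_global(name: str, values: list[int]) -> None:
    """Model of SET GLOBAL @name=(...): only lists of BIGINTs are accepted."""
    for v in values:
        if not isinstance(v, int) or not BIGINT_MIN <= v <= BIGINT_MAX:
            raise ValueError(f"not a BIGINT: {v!r}")
    global_user_vars[name] = frozenset(values)


def where_in(rows: list[dict], column: str, name: str) -> list[dict]:
    """Model of WHERE column IN @name, evaluated against the shared store."""
    allowed = global_user_vars[name]
    return [row for row in rows if row[column] in allowed]
```

Once "session 1" calls set_global("@myfilter", [2, 3, 5, 7, 11, 13]), a later where_in call from any other code path sees the same list, which is the reuse that saves network overhead.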

Per-session and global server variables affect certain server settings in the respective scope. Known per-session server variables are:

  • AUTOCOMMIT = {0 | 1} Whether any data modification statement should be implicitly wrapped by BEGIN and COMMIT.
  • COLLATION_CONNECTION = collation_name Selects the collation to be used for ORDER BY or GROUP BY on string values in the subsequent queries. Refer to Collations for a list of known collation names.
  • CHARACTER_SET_RESULTS = charset_name Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to automatically enforce a charset when connecting to a Manticore server.
  • SQL_AUTO_IS_NULL = value Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to set this variable when connecting to a Manticore server.
  • SQL_MODE = value Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to set this variable when connecting to a Manticore server.
  • WAIT_TIMEOUT = value Sets the connection timeout, either per session or globally. The global value can only be set on a VIP connection.
  • PROFILING = {0 | 1} Enables query profiling in the current session. Defaults to 0. See also SHOW PROFILE.
  • MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE> Redefines max_threads_per_query at runtime. The per-session variable affects only queries run in the same session (connection), i.e. until disconnect. Value 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one takes priority.
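The precedence described for MAX_THREADS_PER_QUERY can be summarized as: per-session value if set, otherwise the global value if set, otherwise the value from the configuration file, with 0 meaning no limit. The function below is a hypothetical model of that lookup, not server code; it returns None for "no limit".

```python
from typing import Optional


def effective_max_threads(
    config_value: int,
    global_value: Optional[int] = None,
    session_value: Optional[int] = None,
) -> Optional[int]:
    """Return the thread limit a query would run with; None means 'no limit'.

    Illustrative model: a per-session SET beats SET GLOBAL, which in turn
    replaces the value from the configuration file.
    """
    if session_value is not None:
        value = session_value
    elif global_value is not None:
        value = global_value
    else:
        value = config_value
    return None if value == 0 else value
```

For example, with 4 in the config, SET GLOBAL to 8 and a per-session SET to 2, the query runs with 2 threads; setting the per-session value to 0 removes the limit for that session only.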

Known global server variables are:

  • QUERY_LOG_FORMAT = {plain | sphinxql} Changes the current log format.
  • LOG_LEVEL = {info | debug | replication | debugv | debugvv} Changes the current log verboseness level.
  • QCACHE_MAX_BYTES = <value> Changes the query_cache RAM use limit to a given value.
  • QCACHE_THRESH_MSEC = <value> Changes the query_cache minimum wall time threshold to a given value.
  • QCACHE_TTL_SEC = <value> Changes the query_cache TTL for a cached result to a given value.
  • MAINTENANCE = {0 | 1} When set to 1, puts the server in maintenance mode. Only clients with vip connections can execute queries in this mode. All new non-vip incoming connections are refused.
  • GROUPING_IN_UTC = {0 | 1} When set to 1, causes timed grouping functions (day(), month(), year(), yearmonth(), yearmonthday()) to be calculated in UTC. See the grouping_in_utc config parameter for more details.
  • QUERY_LOG_MIN_MSEC = <value> Changes the query_log_min_msec searchd setting value. Here the value is expected exactly in milliseconds; time suffixes are not parsed as they are in the config.
  • LOG_DEBUG_FILTER = <string value> Filters out redundant log messages. If the value is set, all log messages with a level above INFO (i.e., DEBUG, DEBUGV, etc.) are compared with the string and output only if they start with the given value. Warning: this is a very specific and 'hard' variable; filtered-out messages are simply dropped and never written to the log. It is usually better to filter your log with a tool like grep, so that you still keep the full original log as a backup.
  • CPUSTAT = {0 | 1} Enable or disable CPU time reporting in query log and status reports.
  • IOSTAT = {0 | 1} Enable or disable I/O operations reporting in query log.
  • COREDUMP = {0 | 1} Enable or disable saving the core file or minidump in case of a crash.
  • MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE> Redefines max_threads_per_query at runtime. As a global variable, it changes the behavior for all sessions. Value 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one takes priority.
  • NET_WAIT = {-1 | 0 | POSITIVE_INT_VALUE} Changes the net_wait_tm searchd setting value.
  • CPUSTATS = {1 | 0} Turns CPU time tracking on or off.
  • COREDUMP = {1 | 0} Turns saving a core file or a minidump of the server on crash on or off.

Examples:

mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL query_log_format=sphinxql;
Query OK, 0 rows affected (0.00 sec)
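Because SET GLOBAL query_log_min_msec takes a plain number of milliseconds, a config-style value such as 2s has to be converted before it is sent. The helper below is a hypothetical sketch that builds the statement text; it assumes a naked number is already in milliseconds (as the setting's name suggests) and rounds microsecond values down to whole milliseconds.

```python
import re

# Microseconds per time suffix, as listed for time values in the config.
US_PER_UNIT = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60_000_000,
    "h": 3_600_000_000,
    "d": 86_400_000_000,
    "w": 604_800_000_000,
}


def set_query_log_min_msec(value: str) -> str:
    """Build a SET GLOBAL statement from a config-style time value."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected one integer with an optional suffix: {value!r}")
    number, suffix = int(match.group(1)), match.group(2).lower()
    unit = suffix or "ms"  # assumption: a naked value is already milliseconds
    if unit not in US_PER_UNIT:
        raise ValueError(f"unknown time suffix: {unit!r}")
    msec = number * US_PER_UNIT[unit] // 1_000  # whole milliseconds, rounded down
    return f"SET GLOBAL query_log_min_msec={msec}"
```

For example, set_query_log_min_msec("2s") produces SET GLOBAL query_log_min_msec=2000.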