Section "Common" in configuration

lemmatizer_base

Lemmatizer dictionaries base path. Optional, default is /usr/local/share (the value of the --datadir switch of the ./configure script).

Our lemmatizer implementation (see Morphology for a discussion of what lemmatizers are) is dictionary driven. The lemmatizer_base directive configures the base dictionary path. File names are hardcoded and specific to a given lemmatizer; the Russian lemmatizer uses the ru.pak dictionary file. The dictionaries can be obtained from the Manticore website (https://manticoresearch.com/downloads/).

Example:

lemmatizer_base = /usr/local/share/sphinx/dicts/

progressive_merge

Merges Real-Time index chunks during the OPTIMIZE operation from the smallest to the largest. Progressive merge is faster and reads and writes less data. Enabled by default. If disabled, chunks are merged in the order they were created, from first to last.

json_autoconv_keynames

Whether and how to auto-convert key names within JSON attributes. Known value is 'lowercase'. Optional, default value is unspecified (do not convert anything).

When this directive is set to 'lowercase', key names within JSON attributes will be automatically brought to lower case when indexing. This conversion applies to any data source, that is, JSON attributes originating from either SQL or XMLpipe2 sources will all be affected.

Example:

json_autoconv_keynames = lowercase
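The effect of the 'lowercase' value can be pictured with a short Python sketch. It is only an illustration and not Manticore's implementation: the function name lowercase_keys is hypothetical, and applying the conversion to nested objects is an assumption of the sketch.

```python
import json


def lowercase_keys(value):
    """Return a copy of a decoded JSON value with all object keys lowercased."""
    if isinstance(value, dict):
        # Assumption: keys of nested objects are converted as well.
        return {key.lower(): lowercase_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value


doc = json.loads('{"Price": 10, "Tags": [{"Name": "new"}]}')
print(json.dumps(lowercase_keys(doc)))
```

Only key names change in this sketch; string values such as "new" are left untouched.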

json_autoconv_numbers

Automatically detect JSON strings that represent numbers and convert them into numeric attributes. Optional, default value is 0 (do not convert strings into numbers).

When this option is 1, values such as "1234" will be indexed as numbers instead of strings; if the option is 0, such values will be indexed as strings. This conversion applies to any data source, that is, JSON attributes originating from either SQL or XMLpipe2 sources will all be affected.

Example:

json_autoconv_numbers = 1
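The difference between the two settings can be sketched in Python. This is an illustration under stated assumptions, not the indexer's actual logic: the function name autoconv_numbers is hypothetical, and treating only plain integers and decimals as numbers is an assumption of the sketch.

```python
import json
import re

# Assumption of this sketch: only plain integers and decimals count as numbers.
_NUMERIC_RE = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def autoconv_numbers(value, enabled):
    """Convert numeric-looking strings to numbers when enabled is true."""
    if isinstance(value, dict):
        return {key: autoconv_numbers(item, enabled) for key, item in value.items()}
    if isinstance(value, list):
        return [autoconv_numbers(item, enabled) for item in value]
    if enabled and isinstance(value, str) and _NUMERIC_RE.fullmatch(value):
        return float(value) if "." in value else int(value)
    return value


doc = json.loads('{"price": "1234", "title": "1234 reasons"}')
print(autoconv_numbers(doc, enabled=True))   # "price" is converted to a number
print(autoconv_numbers(doc, enabled=False))  # "price" stays a string
```

A string that merely starts with digits, like "1234 reasons", stays a string in both cases.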

on_json_attr_error

What to do if JSON format errors are found. Optional, default value is ignore_attr (ignore errors). Applies only to sql_attr_json attributes.

By default, JSON format errors are ignored (ignore_attr) and the indexer tool only shows a warning. Setting this option to fail_index instead makes indexing fail at the first JSON format error.

Example:

on_json_attr_error = ignore_attr
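The two modes can be modeled with a small Python sketch. It is not the indexer's code: the function name load_json_attr is hypothetical, and returning None for an ignored attribute is an assumption made for the example.

```python
import json
import warnings


def load_json_attr(raw, on_json_attr_error="ignore_attr"):
    """Decode one JSON attribute value according to the chosen error mode."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        if on_json_attr_error == "fail_index":
            # Stop at the first malformed value.
            raise RuntimeError(f"indexing failed: {exc}") from exc
        # ignore_attr: report the problem and carry on without the value.
        warnings.warn(f"JSON attribute ignored: {exc}")
        return None
```

With the default mode a malformed value such as '{bad' only produces a warning, while the same value with fail_index raises an error and ends the run.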

plugin_dir

Trusted location for the dynamic libraries (UDFs). Optional, default is empty (no location).

Specifies the trusted directory from which the UDF libraries can be loaded.

Example:

plugin_dir = /usr/local/sphinx/lib

Special suffixes

Manticore Search recognizes and parses special suffixes that make it easier to use numeric values with a special meaning. Their common form is an integer number followed by a literal, like 10k or 100d, but not 40.3s (since 40.3 is not an integer) and not 2d 4h (since that is two values, not one). Literals are case-insensitive, so 10W is the same as 10w. Two types of such suffixes are currently supported:

  • Size suffixes - can be used in parameters that define the size of something (memory buffer, disk file, RAM limit, etc.) in bytes. "Naked" numbers in those places literally mean a size in bytes (octets). Size values take the suffix k for kilobytes (1k=1024), m for megabytes (1m=1024k), g for gigabytes (1g=1024m) and t for terabytes (1t=1024g).
  • Time suffixes - can be used in parameters that define time intervals such as delays, timeouts, etc. "Naked" values for those parameters usually have a documented scale, and you must know whether a number such as 100 means '100 seconds' or '100 milliseconds'. Instead of guessing, you can write a suffixed value, which is fully determined by its suffix. Time values take the suffix us for microseconds, ms for milliseconds, s for seconds, m for minutes, h for hours, d for days and w for weeks.
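The suffix rules above can be sketched as a small parser. This is an illustrative Python sketch, not Manticore's implementation: the names parse_size and parse_time_us are hypothetical, and expressing time values in microseconds is an assumption made for the example. Note that m means megabytes for sizes and minutes for times, so the meaning depends on the kind of parameter.

```python
import re

# Multipliers from the list above; a bare number is a size in bytes.
SIZE_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# Time multipliers in microseconds (an assumption of this sketch).
# There is no "" entry: the scale of a bare number depends on the parameter.
TIME_UNITS_US = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60 * 1_000_000,
    "h": 3_600 * 1_000_000,
    "d": 86_400 * 1_000_000,
    "w": 7 * 86_400 * 1_000_000,
}

# Exactly one integer followed by at most one alphabetic suffix.
_VALUE_RE = re.compile(r"([0-9]+)([a-z]*)")


def _split(value):
    """Split '10K' into (10, 'k'); reject '40.3s' and '2d 4h'."""
    match = _VALUE_RE.fullmatch(value.strip().lower())
    if match is None:
        raise ValueError(f"not an integer with an optional suffix: {value!r}")
    return int(match.group(1)), match.group(2)


def parse_size(value):
    """Return the size in bytes."""
    number, suffix = _split(value)
    if suffix not in SIZE_UNITS:
        raise ValueError(f"unknown size suffix in {value!r}")
    return number * SIZE_UNITS[suffix]


def parse_time_us(value):
    """Return the interval in microseconds; a suffix is required here."""
    number, suffix = _split(value)
    if suffix not in TIME_UNITS_US:
        raise ValueError(f"missing or unknown time suffix in {value!r}")
    return number * TIME_UNITS_US[suffix]


print(parse_size("10k"), parse_size("1M"), parse_time_us("3s"))
```

In this sketch parse_time_us rejects bare numbers on purpose, because their scale is defined per parameter rather than by the suffix syntax.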

Warning

Giga- and especially tera- size suffixes are not very useful right now, since most sizes are internally limited to 2 GB (more precisely, 2 GB minus 1 byte). At the moment only rt_mem_limit and attr_update_reserve from the index configuration and qcache_max_bytes from the searchd configuration accept 64-bit values that may exceed 2 GB.

Scripted configuration

Manticore configuration supports shebang syntax, meaning that the configuration can be written in a programming language and interpreted when it is loaded, which allows dynamic settings.

For example, indexes can be generated by querying a database table, various settings can be modified depending on external factors, or external files (which contain indexes and/or sources) can be included.

The configuration file is parsed by the declared interpreter and the output is used as the actual configuration. This happens each time the configuration is read (not only at searchd startup).

This facility is not available on the Windows platform.

In the following example, we use PHP to create multiple indexes with different names, and we also scan a specific folder for files containing extra index declarations.

#!/usr/bin/php
...
<?php for ($i=1; $i<=6; $i++) { ?>
index test_<?=$i?> {
  type = rt
  path = /var/lib/manticore/data/test_<?=$i?>
  rt_field = subject
  ...
}
<?php } ?>
...

<?php
// Include every *.conf file found in the extra configuration folder.
$confd_folder = '/etc/manticore.conf.d/';
$files = scandir($confd_folder);
foreach ($files as $file) {
    if ($file == '.' || $file == '..') {
        continue;
    }
    $fp = new SplFileInfo($confd_folder . $file);
    if ('conf' == $fp->getExtension()) {
        include($confd_folder . $file);
    }
}
?>
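Since the output of the declared interpreter is what gets used as the configuration, the same loop could in principle be written in another language. The following Python version is a sketch under that assumption: the function name render_indexes is hypothetical, and the settings elided with ... in the PHP example are left out here, so the output is not a complete configuration.

```python
#!/usr/bin/env python3
# Sketch: whatever this script prints is what would be read as configuration.


def render_indexes(count):
    """Return declarations for test_1 .. test_<count>, mirroring the PHP loop."""
    blocks = []
    for i in range(1, count + 1):
        blocks.append(
            f"index test_{i} {{\n"
            f"  type = rt\n"
            f"  path = /var/lib/manticore/data/test_{i}\n"
            f"  rt_field = subject\n"
            f"}}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_indexes(6), end="")
```

Running the script by hand and reading its output is a simple way to check the generated declarations before the file is used as a configuration.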

Comments

The configuration file supports comments, with the # character marking the start of a comment. The comment character can appear at the start of a line or inline.

Extra care is needed when using # in character tokenization settings, as everything after it will be ignored. To avoid this, use the UTF-8 code of #, which is U+23.

# can also be escaped using \. Escaping is required if # is present in the database credentials in source declarations.
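The comment and escaping rules above can be sketched in Python. This models the described behavior and is not the actual configuration parser: the function name strip_comment is hypothetical, dropping the backslash from the resulting value is an assumption, and the sql_pass and charset_table lines are used only as sample input.

```python
def strip_comment(line):
    """Remove a # comment from one configuration line, honoring the \\# escape."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            # Escaped: keep a literal # (assumption: the backslash is dropped).
            out.append("#")
            i += 2
            continue
        if ch == "#":
            # Unescaped: the rest of the line is a comment.
            break
        out.append(ch)
        i += 1
    return "".join(out).rstrip()


print(strip_comment(r"sql_pass = pa\#ss # database password"))
print(strip_comment("charset_table = 0..9, a..z, U+23"))
```

The second line shows why the U+23 form is safer in tokenization settings: it contains no literal # that could be taken for the start of a comment.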

Inheritance of index and source declarations

Both index and source declarations support inheritance. It allows a better organization of indexes having similar settings or structure and reduces the size of the configuration.

For a parent index/source nothing needs to be specified.

For the child index/source the declaration will contain the index/source name followed by : and the parent name.

index parent {
  path = /var/lib/manticore/parent
  ...
}

index child:parent {
  path = /var/lib/manticore/child
  ...
}

The child inherits the entire configuration of the parent. Any setting declared in the child overwrites the inherited value.

Note that for multi-value settings, declaring a single value in the child clears all inherited values of that setting. For example, if the parent has several sql_query_pre declarations and the child has a single sql_query_pre declaration, all inherited sql_query_pre declarations are cleared. If you need to override only some of the inherited values, the ones to keep must be explicitly declared in the child as well.

The same mechanism can be used when a value from the parent is not needed at all. For example, if the sql_query_pre value from the parent is not needed, declare the directive with an empty value in the child, like sql_query_pre=.

This inheritance behavior applies to fields and attributes as well, not just to index options. If, for example, the parent has 2 integer attributes and the child needs a new integer attribute, the integer attribute declarations from the parent must be copied into the child configuration.
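The override rules described above can be modeled with a short Python sketch. It is a simplified model, not the actual configuration loader: the function name resolve is hypothetical, declarations are represented as lists of (name, value) pairs, and the rt_field values body and title are made up for the example.

```python
def resolve(parent, child):
    """Return the effective settings of a child as {name: [values]}."""
    merged = {}
    for name, value in parent:
        merged.setdefault(name, []).append(value)

    declared = {}
    for name, value in child:
        declared.setdefault(name, []).append(value)

    for name, values in declared.items():
        # Declaring a setting in the child discards every inherited value.
        kept = [value for value in values if value != ""]
        if kept:
            merged[name] = kept
        else:
            # An empty declaration (like sql_query_pre=) only clears the setting.
            merged.pop(name, None)
    return merged


parent = [
    ("path", "/var/lib/manticore/parent"),
    ("rt_field", "subject"),
    ("rt_field", "body"),
]
child = [
    ("path", "/var/lib/manticore/child"),
    ("rt_field", "title"),
]
print(resolve(parent, child))
```

In this model the child ends up with only the title field, because its single rt_field declaration replaces both inherited ones. To keep subject and body, the child would have to declare them again.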