Attaching a plain index to a real-time index

A plain index can be converted into a real-time index or added to an existing real-time index.

The first case is useful when you need to regenerate a real-time index completely, which may be needed, for example, if the tokenization settings need an update. In that case, preparing a plain index and converting it into a real-time index may be easier than preparing a batch job that performs INSERTs to add all the data to a real-time index.

In the second case, you normally want to add a large bulk of new data to a real-time index, and again, creating a plain index with that data is easier than populating the existing real-time index.

Attaching index - general syntax

The ATTACH INDEX statement attaches a plain index to an existing real-time index.

ATTACH INDEX diskindex TO RTINDEX rtindex [WITH TRUNCATE]

The ATTACH INDEX statement lets you move data from a plain index to an RT index.

After a successful ATTACH, the data originally stored in the source plain index becomes a part of the target RT index, and the source plain index becomes unavailable (until the next rebuild). ATTACH does not change any index data. It just renames the files (making the source index a new disk chunk of the target RT index) and updates the metadata. It is therefore generally a quick operation that frequently completes in under a second.

Note that when an index is attached to an empty RT index, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source index are copied over and take effect. The respective parts of the RT index definition from the configuration file are ignored.

When the TRUNCATE option is used, the RT index is truncated before the source plain index is attached. This lets you perform the operation atomically, or make sure that the attached source plain index is the only data in the target RT index.
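As a minimal sketch, the statement can be assembled by a small shell helper and piped to the server. The function name `attach_stmt` and the index names are hypothetical, and port 9306 simply follows the `mysql` invocations used elsewhere on this page.

```shell
# attach_stmt SRC DST [truncate]
# Prints an ATTACH INDEX statement. Passing "truncate" as the third
# argument appends WITH TRUNCATE.
# (Hypothetical helper for illustration, not part of Manticore.)
attach_stmt() {
  stmt="ATTACH INDEX $1 TO RTINDEX $2"
  if [ "${3:-}" = "truncate" ]; then
    stmt="$stmt WITH TRUNCATE"
  fi
  printf '%s;\n' "$stmt"
}

# Usage, assuming searchd accepts MySQL-protocol connections on port 9306:
#   attach_stmt disk rt truncate | mysql -P 9306 -h0
```

Keeping the statement construction separate from the `mysql` call makes it easy to review the exact SQL before it is sent.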

ATTACH INDEX comes with a number of restrictions. Most notably, the target RT index is currently required to be either empty or have the same settings as the source plain index. If the source plain index is attached to a non-empty RT index, the RT index data collected so far is stored as a regular disk chunk, the index being attached becomes the newest disk chunk, and documents with the same IDs in the older data are killed. The complete list is as follows:

Example

Before ATTACH the RT index is empty and has 3 fields:

mysql> DESC rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM rt;
Empty set (0.00 sec)

The plain index is not empty:

mysql> SELECT * FROM disk WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)

Attaching:

mysql> ATTACH INDEX disk TO RTINDEX rt;
Query OK, 0 rows affected (0.00 sec)

The RT index now has 5 fields:

mysql> DESC rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | integer   |
| title      | field     |
| content    | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)

And it's not empty:

mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)

The plain index was removed:

mysql> SELECT * FROM disk WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search

Importing index

When you migrate from Plain mode to RT mode (and in some other cases), real-time and percolate indexes built in Plain mode can be imported into a Manticore instance running in RT mode using the IMPORT TABLE statement. The general syntax is as follows:

IMPORT TABLE table_name FROM 'path'

Executing this command copies all the index files of the specified index to data_dir. All external index files, such as wordforms, exceptions, and stopwords, are also copied to the same data_dir. IMPORT TABLE has the following limitations:

  • paths to the external files that were originally specified in the config file must be absolute
  • only real-time and percolate indexes are supported
  • plain indexes must first be converted to real-time indexes via ATTACH INDEX (while still in Plain mode)
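The first limitation can be checked before running IMPORT TABLE. The following is only a sketch: the helper name `relative_external_paths` is hypothetical, and it assumes that each setting sits on a single `name = value` line and that every value is a file path, so line continuations and values that are not paths are not handled.

```shell
# relative_external_paths CONFIG
# Prints every wordforms/exceptions/stopwords value in CONFIG that does
# not start with "/". An empty result means no relative paths were found.
# (Hypothetical helper for illustration, not part of Manticore.)
relative_external_paths() {
  awk -F'=' '
    /^[ \t]*(wordforms|exceptions|stopwords)[ \t]*=/ {
      # A setting may list several files separated by whitespace.
      n = split($2, paths, " ")
      for (i = 1; i <= n; i++)
        if (paths[i] !~ /^\//)
          print paths[i]
    }
  ' "$1"
}

# Usage (the config path is the one used elsewhere on this page):
#   relative_external_paths /etc/manticoresearch/manticore.conf
```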

indexer --print-rt

If the above method of migrating a plain index to an RT index is not possible, you can use indexer --print-rt to dump the data from a plain index directly, without converting it to an RT index, and then import the dump into an RT index right from the command line.

This method has a few limitations, though:

  • Only sql-based sources are supported
  • MVAs are not supported

Example:

/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql

mysql -P 9306 -h0 -e "truncate table my_rt_index"

mysql -P 9306 -h0 < /tmp/dump_regular.sql

rm /tmp/dump_regular.sql

Rotating an index

Index rotation is a procedure in which the searchd server looks for new versions of the indexes defined in the configuration. Rotation applies only to the Plain mode of operation.

There can be two cases:

  • for plain indexes that are already loaded
  • indexes added in configuration, but not loaded yet

In the first case, indexer cannot put the new version of the index online because the running copy is locked and loaded by searchd. In this case, indexer needs to be called with the --rotate parameter. With --rotate, indexer creates the new index files with .new. in their names and sends a HUP signal to searchd to inform it about the new version. searchd then performs a lookup, puts the new version of the index in place, and discards the old one. In some cases you may want to create the new version of the index without rotating it right away, for example to check the health of the new index version first. For this, indexer accepts the --nohup parameter, which prevents it from sending the HUP signal to the server.

New indexes can be loaded by rotation; however, on a regular HUP signal the server checks for new indexes only if the configuration has changed since server startup. If the index was already defined in the configuration, first build it by running indexer without --rotate, and then execute the RELOAD INDEXES statement instead.

Two specialized statements can also be used to rotate indexes:

RELOAD INDEX

RELOAD INDEX idx [ FROM '/path/to/index_files' ];

RELOAD INDEX allows you to rotate indexes using SQL.

It has two modes of operation. In the first one (without specifying a path), the Manticore server checks for new index files in the directory given by the index's path setting. The new index files must be named idx.new.sp?.

If you additionally specify a path, the server looks for index files in the specified directory, moves them to the index path, renames them from index_files.sp? to idx.new.sp?, and rotates them.

mysql> RELOAD INDEX plain_index;
mysql> RELOAD INDEX plain_index FROM '/home/mighty/new_index_files';
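Before issuing RELOAD INDEX without a path, it can be useful to confirm that files following the idx.new.sp? naming pattern are actually present. The helper below is a sketch only; the name `pending_new_files` and the directory in the usage line are hypothetical.

```shell
# pending_new_files DIR IDX
# Prints the files in DIR that match IDX.new.sp?, one per line.
# (Hypothetical helper for illustration, not part of Manticore.)
pending_new_files() {
  for f in "$1/$2".new.sp?; do
    # An unmatched glob stays literal, so check that the file exists.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage, assuming the index files live in /var/lib/manticore:
#   if [ -n "$(pending_new_files /var/lib/manticore plain_index)" ]; then
#     mysql -P 9306 -h0 -e "RELOAD INDEX plain_index"
#   fi
```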

RELOAD INDEXES

RELOAD INDEXES;

Works the same as the system HUP signal: it initiates index rotation. Unlike regular HUP signalling (which can come from kill or indexer), the statement forces a lookup of indexes to rotate even if the configuration has not changed since the server started.

Depending on the value of the seamless_rotate setting, new queries might be briefly stalled, in which case clients receive temporary errors. The command is non-blocking (i.e., it returns immediately).

mysql> RELOAD INDEXES;
Query OK, 0 rows affected (0.01 sec)

Seamless rotate

Rotation assumes that the old index version is discarded and the new index version is loaded in its place. During this swap, the server still needs to serve incoming queries against the index being updated. To avoid stalling queries, the server performs a seamless rotate by default, as described below.

Indexes may contain some data that needs to be precached in RAM. At the moment, .spa, .spb, .spi and .spm files are fully precached (they contain attribute data, blob attribute data, keyword index and killed row map, respectively). Without seamless rotate, rotating an index tries to use as little RAM as possible and works as follows:

  1. new queries are temporarily rejected (with "retry" error code);
  2. searchd waits for all currently running queries to finish;
  3. old index is deallocated and its files are renamed;
  4. new index files are renamed and required RAM is allocated;
  5. new index attribute and dictionary data is preloaded to RAM;
  6. searchd resumes serving queries from new index.

However, if there is a lot of attribute or dictionary data, the preloading step can take noticeable time: up to several minutes when preloading 1-5+ GB files.

With seamless rotate enabled, rotation works as follows:

  1. new index RAM storage is allocated;
  2. new index attribute and dictionary data is asynchronously preloaded to RAM;
  3. on success, old index is deallocated and both indexes' files are renamed;
  4. on failure, new index is deallocated;
  5. at any given moment, queries are served either from old or new index copy.

Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm data need to be in RAM while preloading new copy). Average usage stays the same.
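To get a rough idea of the extra memory a seamless rotation may need, one can sum the sizes of the new precached files before rotating. This is only a sketch: the helper name `preload_bytes` is hypothetical, and it assumes that the on-disk size of the .new. files approximates the RAM needed to preload them.

```shell
# preload_bytes DIR IDX
# Prints the combined size in bytes of IDX.new.spa/.spb/.spi/.spm in DIR.
# Files that do not exist are skipped.
# (Hypothetical helper for illustration, not part of Manticore.)
preload_bytes() {
  total=0
  for ext in spa spb spi spm; do
    f="$1/$2.new.$ext"
    if [ -f "$f" ]; then
      size=$(wc -c < "$f")
      total=$((total + size))
    fi
  done
  printf '%s\n' "$total"
}

# Usage, assuming the index files live in /var/lib/manticore:
#   preload_bytes /var/lib/manticore plain_index
```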

Example:

seamless_rotate = 1