Attaching a plain table to a real-time table

A plain table can be converted into a real-time table or added to an existing real-time table.

The first case is useful when you need to regenerate a real-time table completely, which may be needed, for example, if tokenization settings need an update. In this situation, preparing a plain table and converting it into a real-time table may be easier than preparing a batch job to perform INSERTs for adding all the data into a real-time table.

In the second case, you normally want to add a large bulk of new data to a real-time table, and again, creating a plain table with that data is easier than populating the existing real-time table.

Attaching table - general syntax

The ATTACH statement allows you to convert a plain table to be attached to an existing real-time table. It also allows you to attach the content of one real-time table into another real-time table.

ATTACH TABLE plain_or_rt_table TO TABLE rt_table [WITH TRUNCATE]

ATTACH TABLE statement lets you move data from a plain table or a RT table to an RT table.

After a successful ATTACH the data originally stored in the source plain table becomes a part of the target RT table, and the source plain table becomes unavailable (until the next rebuild). If the source table is an RT table, its content is moved into the destination RT table, and the source RT table remains empty. ATTACH does not result in any table data changes. Essentially, it just renames the files (making the source table a new disk chunk of the target RT table) and updates the metadata. So it is generally a quick operation that might (frequently) complete as fast as under a second.

Note that when a table is attached to an empty RT table, the fields, attributes, and text processing settings (tokenizer, wordforms, etc.) from the source table are copied over and take effect. The respective parts of the RT table definition from the configuration file will be ignored.

When the TRUNCATE option is used, the RT table gets truncated prior to attaching the source plain table. This allows the operation to be atomic or ensures that the attached source plain table will be the only data in the target RT table.

ATTACH TABLE comes with a number of restrictions. Most notably, the target RT table is currently required to be either empty or have the same settings as the source plain table. In case the source table gets attached to a non-empty RT table, the RT table data collected so far gets stored as a regular disk chunk, and the table being attached becomes the newest disk chunk, with documents having the same IDs getting killed. The complete list of restrictions is as follows:

  • The target RT table needs to be either empty or have the same settings as the source plain table.
  • The source plain table needs to have phrase_boundary_stepset to 0 and stopword_stepset to 1.
‹›
  • Example
Example
📋

Before the ATTACH, the RT table is empty and has 3 fields:

mysql> DESC rt;
Empty set (0.00 sec)

mysql> SELECT * FROM rt;
+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | integer |
| testfield | field   |
| testattr  | uint    |
+-----------+---------+
3 rows in set (0.00 sec)

The plain table is not empty:

mysql> SELECT * FROM plain WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)

Attaching the plain table to the RT table:

mysql> ATTACH TABLE plain TO TABLE rt;
Query OK, 0 rows affected (0.00 sec)

The RT table now has 5 fields:

mysql> DESC rt;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | integer   |
| title      | field     |
| content    | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
5 rows in set (0.00 sec)

And it's not empty:

mysql> SELECT * FROM rt WHERE MATCH('test');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   1304 |        1 | 1313643256 |
|    2 |   1304 |        1 | 1313643256 |
|    3 |   1304 |        1 | 1313643256 |
|    4 |   1304 |        1 | 1313643256 |
+------+--------+----------+------------+
4 rows in set (0.00 sec)

After the ATTACH, the plain table is removed and no longer available for searching:

mysql> SELECT * FROM plain WHERE MATCH('test');
ERROR 1064 (42000): no enabled local indexes to search

Importing table

If you decide to migrate from Plain mode to RT mode or in some other cases, real-time and percolate tables built in Plain mode can be imported to Manticore running in RT mode using the IMPORT TABLE statement. The general syntax is as follows:

IMPORT TABLE table_name FROM 'path'

where the 'path' parameter must be set as: /your_backup_folder/your_backup_name/data/your_table_name/your_table_name

‹›
  • bash
bash
📋
mysql -P9306 -h0 -e 'create table t(f text)'

mysql -P9306 -h0 -e "backup table t to /tmp/"

mysql -P9306 -h0 -e "drop table t"

BACKUP_NAME=$(ls /tmp | grep 'backup-' | tail -n 1)

mysql -P9306 -h0 -e "import table t from '/tmp/$BACKUP_NAME/data/t/t'

mysql -P9306 -h0 -e "show tables"

Executing this command makes all the table files of the specified table copied to data_dir. All the external table files such as wordforms, exceptions and stopwords are also copied to the same data_dir. IMPORT TABLE has the following limitations:

  • paths to the external files that were originally specified in the config file must be absolute
  • only real-time and percolate tables are supported
  • plain tables need to be preliminarily (in the plain mode) converted to real-time tables via ATTACH TABLE

indexer --print-rt

If the above method for migrating a plain table to an RT table is not possible, you may use indexer --print-rt to dump data from a plain table directly without the need to convert it to an RT type table and then import the dump into an RT table right from the command line.

This method has a few limitations though:

  • Only SQL-based sources are supported
  • MVAs are not supported
‹›
  • bash
bash
📋
/usr/bin/indexer --rotate --config /etc/manticoresearch/manticore.conf --print-rt my_rt_index my_plain_index > /tmp/dump_regular.sql

mysql -P $9306 -h0 -e "truncate table my_rt_index"

mysql -P 9306 -h0 < /tmp/dump_regular.sql

rm /tmp/dump_regular.sql

Rotating a table

Table rotation is a procedure in which the searchd server looks for new versions of defined tables in the configuration. Rotation is supported only in Plain mode of operation.

There can be two cases:

  • for plain tables that are already loaded
  • tables added in configuration, but not loaded yet

In the first case, the indexer cannot put the new version of the table online as the running copy is locked and loaded by searchd. In this case indexer needs to be called with the --rotate parameter. If rotate is used, indexer creates new table files with .new. in their names and sends a HUP signal to searchd informing it about the new version. The searchd will perform a lookup and will put the new version of the table in place and discard the old one. In some cases, it might be desired to create the new version of the table but not perform rotate as soon as possible. For example, it might be desired to first check the health of the new table versions. In this case, indexer can accept the--nohup parameter which will forbid sending the HUP signal to the server.

New tables can be loaded by rotation; however, the regular handling of the HUP signal is to check for new tables only if the configuration has changed since server startup. If the table was already defined in the configuration, the table should be first created by running indexer without rotation and perform the RELOAD TABLES statement instead.

There are also two specialized statements that can be used to perform rotations on tables:

RELOAD TABLE

RELOAD TABLE tbl [ FROM '/path/to/table_files' [ OPTION switchover=1 ] ];

The RELOAD TABLE command enables table rotation via SQL.

This command functions in three modes. In the first mode, without specifying a path, the Manticore server checks for new table files in the directory indicated by the path. New table files must be named as tbl.new.sp?.

If you specify a path, the server searches for table files in that directory, relocates them to the table path, renames them from tbl.sp? to tbl.new.sp?, and rotates them.

The third mode, activated by OPTION switchover=1, switches the index to the new path. Here, the daemon tries to load the table directly from the new path without moving the files. If loading is successful, this new index supersedes the old one.

Also, the daemon writes a unique link file (tbl.link) in the directory specified by path, maintaining persistent redirection.

If you revert a redirected index to the path specified in the configuration, the daemon will detect this and delete the link file.

Once redirected, the daemon retrieves the table from the newly linked path. When rotating, it looks for new table versions at the newly redirected path. Bear in mind, the daemon checks the configuration for common errors, like duplicate paths across different tables. However, it won't identify if multiple tables point to the same path via redirection. Under normal operations, tables are locked with the .spl file, but disabling the lock may cause problems. If there's an error (e.g., the path is inaccessible for any reason), you should manually correct (or simply delete) the link file.

indextool follows the link file, but other tools (indexer, index_converter, etc.) do not recognize the link file and consistently use the path defined in the configuration file, ignoring any redirection. Thus, you can inspect the index with indextool, and it will read from the new location. However, more complex operations like merging will not acknowledge any link file.

mysql> RELOAD TABLE plain_table;
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new_table_files';
mysql> RELOAD TABLE plain_table FROM '/home/mighty/new/place/for/table/table_files' OPTION switchover=1;

RELOAD TABLES

RELOAD TABLES;

This command functions similarly to the HUP system signal, triggering a rotation of tables. Nevertheless, it doesn't exactly mirror the typical HUP signal (which can come from a kill -HUP command or indexer --rotate). This command actively searches for any tables needing rotation and is capable of re-reading the configuration. Suppose you launch Manticore in plain mode with a config file that points to a nonexistent plain table. If you then attempt to indexer --rotate the table, the new table won't be recognized by the server until you execute RELOAD TABLES or restart the server.

Depending on the value of the seamless_rotate setting, new queries might be shortly stalled, and clients will receive temporary errors.

mysql> RELOAD TABLES;
Query OK, 0 rows affected (0.01 sec)

Seamless rotate

The rotate assumes old table version is discarded and new table version is loaded and replaces the existing one. During this swapping, the server needs to also serve incoming queries made on the table that is going to be updated. To avoid stalls of the queries, the server implements a seamless rotate of the table by default, as described below.

Tables may contain data that needs to be precached in RAM. At the moment, .spa, .spb, .spi and .spm files are fully precached (they contain attribute data, blob attribute data, keyword table, and killed row map, respectively). Without seamless rotate, rotating a table tries to use as little RAM as possible and works as follows:

  1. New queries are temporarily rejected (with "retry" error code).
  2. searchd waits for all currently running queries to finish.
  3. Old table is deallocated and its files are renamed.
  4. New table files are renamed and required RAM is allocated.
  5. New table attribute and dictionary data is preloaded to RAM.
  6. searchd resumes serving queries from the new table.

However, if there's a lot of attribute or dictionary data, then the preloading step could take noticeable time - up to several minutes in case of preloading 1-5+ GB files.

With seamless rotate enabled, rotation works as follows:

  1. New table RAM storage is allocated.
  2. New table attribute and dictionary data is asynchronously preloaded to RAM.
  3. On success, old table is deallocated and both tables' files are renamed.
  4. On failure, new table is deallocated.
  5. At any given moment, queries are served either from old or new table copy.

Seamless rotate comes at the cost of higher peak memory usage during the rotation (because both old and new copies of .spa/.spb/.spi/.spm data need to be in RAM while preloading the new copy). However, average usage stays the same.

Example:

seamless_rotate = 1