Manticore Search Manual: Securing and compacting an index

Flushing RAM chunk to disk

FLUSH RTINDEX rtindex

FLUSH RTINDEX forcibly flushes RT index RAM chunk contents to disk.

Backing up an RT index is as simple as copying over its data files, followed by the binary log. However, recovering from such a backup means that all the transactions logged since the last successful RAM chunk write have to be replayed. Those writes normally happen either on a clean shutdown or periodically, at a (big enough!) interval set by the rt_flush_period directive. As a result, a backup made at an arbitrary point in time may come with far too much binary log data to replay.

FLUSH RTINDEX forcibly writes the RAM chunk contents to disk, and also causes the subsequent cleanup of (now redundant) binary log files. Thus, recovering from a backup made just after FLUSH RTINDEX should be almost instant.

SQL

FLUSH RTINDEX rt;

Response

Query OK, 0 rows affected (0.05 sec)
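The backup procedure described above can be sketched as the following sequence. This is a minimal sketch, assuming the index is named rt as in the example; the file copy itself happens outside SQL (with whatever filesystem or backup tools you normally use), so it appears only as a comment.

```sql
-- Assumption: the RT index is called rt, as in the example above.
-- 1. Write the RAM chunk to disk; the now redundant binary log files are cleaned up.
FLUSH RTINDEX rt;

-- 2. Outside of SQL, copy the index data files and then the binary log files
--    to the backup location. Because the RAM chunk was just flushed, only
--    transactions committed after step 1 remain in the binary log to replay.
```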

Compacting an index

Over time, RT indexes can become fragmented into many disk chunks and/or accumulate deleted but not yet purged data, which hurts search performance. When that happens, they can be optimized. The optimization pass merges pairs of disk chunks, purging documents suppressed by the kill-list (K-list) as it goes.

OPTIMIZE INDEX index_name [OPTION opt_name = opt_value [,...]]

The OPTIMIZE statement enqueues an RT index for optimization in a background thread.

Number of optimized disk chunks

By default, the optimization reduces the number of disk chunks to the number of CPU cores multiplied by 2. The target number of disk chunks can be controlled with the cutoff option.

In previous releases, OPTIMIZE merged all disk chunks into a single one. This behavior can still be achieved by setting OPTION cutoff=1.
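A few illustrative invocations of the cutoff option, following the OPTIMIZE INDEX syntax above. The index name rt and the value 4 are arbitrary examples. Under the default rule, a server with 8 CPU cores would target 8 × 2 = 16 disk chunks.

```sql
-- Default target: (number of CPU cores * 2) disk chunks,
-- e.g. 16 disk chunks on an 8-core server.
OPTIMIZE INDEX rt;

-- Reduce the index to 4 disk chunks (4 is an arbitrary example value).
OPTIMIZE INDEX rt OPTION cutoff=4;

-- Behavior of earlier releases: merge everything into a single disk chunk.
OPTIMIZE INDEX rt OPTION cutoff=1;
```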

Running in foreground

If OPTION sync=1 is used (the default is 0), the command waits until the optimization process finishes. If the connection is interrupted, the optimization continues to run on the server.
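For example, to run the optimization in the foreground, optionally combined with cutoff through the comma-separated OPTION list allowed by the OPTIMIZE INDEX syntax (the index name rt is just an example):

```sql
-- Do not return until the optimization has finished.
OPTIMIZE INDEX rt OPTION sync=1;

-- Options can be combined: merge into a single disk chunk and wait for completion.
OPTIMIZE INDEX rt OPTION cutoff=1, sync=1;
```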

Throttling the IO impact

Optimization can be a lengthy and IO-intensive process. To limit its impact, all the actual merge work is executed serially in a dedicated background thread, and the OPTIMIZE statement simply adds a job to that thread's queue. Currently, there is no way to check the optimization status of an index or of the queue (this might be added in the future to the SHOW INDEX STATUS and SHOW STATUS statements, respectively). The optimization thread can be IO-throttled: the maximum number of IO operations per second and the maximum IO size are controlled by the rt_merge_iops and rt_merge_maxiosize directives, respectively.

The RT index being optimized stays online and available for both searching and updates at (almost) all times during the optimization. It is locked for a very short time after a pair of disk chunks has been merged successfully, in order to rename the old and new files and update the index header.

At the moment, OPTIMIZE must be issued manually; indexes are not optimized automatically. This is expected to change in future releases.

SQL

OPTIMIZE INDEX rt;

Response

Query OK, 0 rows affected (0.00 sec)

Optimizing clustered indexes

Currently, indexes cannot be optimized directly while they are part of a cluster.

The following procedure should be used:

On one of the nodes, drop the index from the cluster:

mysql> ALTER CLUSTER mycluster DROP myindex;

Optimize the index:

mysql> OPTIMIZE INDEX myindex;

Add the index back to the cluster:

mysql> ALTER CLUSTER mycluster ADD myindex;

When the index is added back, the new files created by the optimize process will be replicated to the other nodes in the cluster. Any changes made locally to the index on other nodes will be lost.

Writes to the index should either be stopped or directed to the node where the optimization is running. Note that once the index is out of the cluster, writes must be made locally, and the index must be addressed without the cluster name prefix (for SQL statements) or the cluster property (for HTTP requests). As soon as the index is added back to the cluster, writes can resume. From that point on, write operations on the index must again include the cluster prefix (for SQL statements) or the cluster property (for HTTP requests). Searches remain available as usual on all nodes throughout the process.
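Putting the steps together, a session on the node running the optimization might look like the sketch below. The column names id and title and the inserted values are hypothetical, for illustration only. Adding OPTION sync=1 is our own suggestion rather than part of the documented procedure: it simply makes the ADD step run only after the merge has finished. The mycluster:myindex form is how the cluster prefix is written in SQL statements.

```sql
-- 1. Take the index out of the cluster (run on one node only).
ALTER CLUSTER mycluster DROP myindex;

-- 2. Optimize it; sync=1 makes the statement wait until the merge is done.
OPTIMIZE INDEX myindex OPTION sync=1;

-- While the index is out of the cluster, writes go to this node only
-- and use the plain index name (hypothetical columns and values).
INSERT INTO myindex (id, title) VALUES (1, 'written while detached');

-- 3. Put the index back; its files are replicated to the other nodes.
ALTER CLUSTER mycluster ADD myindex;

-- From now on, writes must use the cluster prefix again.
INSERT INTO mycluster:myindex (id, title) VALUES (2, 'written after re-adding');
```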

Future releases are expected to remove the need for this procedure, so that OPTIMIZE can be run without taking the index out of the cluster.

FLUSH ATTRIBUTES

Flushes all in-memory attribute updates in all active disk indexes to disk. Returns a tag that identifies the resulting on-disk state (basically, the number of actual attribute saves to disk performed since server startup).

mysql> UPDATE testindex SET channel_id=1107025 WHERE id=1;
Query OK, 1 row affected (0.04 sec)

mysql> FLUSH ATTRIBUTES;
+------+
| tag  |
+------+
|    1 |
+------+
1 row in set (0.19 sec)
