Flushing RAM chunk to a new disk chunk

FLUSH RAMCHUNK

FLUSH RAMCHUNK rtindex

FLUSH RAMCHUNK forcibly creates a new disk chunk in an RT index.

Normally, RT index would flush and convert the contents of the RAM chunk into a new disk chunk automatically, once the RAM chunk reaches the maximum allowed rt_mem_limit size. However, for debugging and testing it might be useful to forcibly create a new disk chunk, and FLUSH RAMCHUNK statement does exactly that.

Note that using FLUSH RAMCHUNK increases RT index fragmentation. Most likely, you want to use FLUSH RTINDEX instead. We suggest that you abstain from using just this statement unless you're absolutely sure what you're doing. As the right way is to issue FLUSH RAMCHUNK with following OPTIMIZE command. Such combo allows to keep RT index fragmentation on minimum.

‹›
  • SQL
SQL
📋
FLUSH RAMCHUNK rt;
‹›
Response
Query OK, 0 rows affected (0.05 sec)

Flushing RAM chunk to disk

FLUSH RTINDEX

FLUSH RTINDEX rtindex

FLUSH RTINDEX forcibly flushes RT index RAM chunk contents to disk.

Backing up an RT index is as simple as copying over its data files, followed by the binary log. However, recovering from that backup means that all the transactions in the log since the last successful RAM chunk write would need to be replayed. Those writes normally happen either on a clean shutdown, or periodically with a (big enough!) interval between writes specified in rt_flush_period directive. So such a backup made at an arbitrary point in time just might end up with way too much binary log data to replay.

FLUSH RTINDEX forcibly writes the RAM chunk contents to disk, and also causes the subsequent cleanup of (now redundant) binary log files. Thus, recovering from a backup made just after FLUSH RTINDEX should be almost instant.

‹›
  • SQL
SQL
📋
FLUSH RTINDEX rt;
‹›
Response
Query OK, 0 rows affected (0.05 sec)

Compacting an index

Over time, RT indexes can grow fragmented into many disk chunks and/or tainted with deleted, but unpurged data, impacting search performance. When that happens, they can be optimized. Basically, the optimization pass merges together disk chunks pairs, purging off documents suppressed by K-list as it goes.

OPTIMIZE INDEX

OPTIMIZE INDEX index_name

OPTIMIZE statement enqueues an RT index for optimization in a background thread.

If OPTION sync=1 is used (0 by default), the command will wait until the optimization process is done (or if the connection timeout - but the optimization will continue to run).

That is a lengthy and IO intensive process, so to limit the impact, all the actual merge work is executed serially in a special background thread, and the OPTIMIZE statement simply adds a job to its queue. Currently, there is no way to check the index or queue status (that might be added in the future to the SHOW INDEX STATUS and SHOW STATUS statements respectively). The optimization thread can be IO-throttled, you can control the maximum number of IOs per second and the maximum IO size with rt_merge_iops and rt_merge_maxiosize directives respectively. The optimization jobs queue is lost on server crash.

The RT index being optimized stays online and available for both searching and updates at (almost) all times during the optimization. It gets locked (very) briefly every time that a pair of disk chunks is merged successfully, to rename the old and the new files, and update the index header.

At the moment, OPTIMIZE needs to be issued manually, the indexes will not be optimized automatically. That will change in future releases.

‹›
  • SQL
SQL
📋
OPTIMIZE INDEX rt;
‹›
Response
Query OK, 0 rows affected (0.00 sec)