Flushing RAM chunk to a new disk chunk

FLUSH RAMCHUNK

FLUSH RAMCHUNK rt_table

The FLUSH RAMCHUNK command creates a new disk chunk in an RT table.

Normally, an RT table would automatically flush and convert the contents of the RAM chunk into a new disk chunk once the RAM chunk reaches the maximum allowed rt_mem_limit size. However, for debugging and testing purposes, it might be useful to forcibly create a new disk chunk, and the FLUSH RAMCHUNK statement does exactly that.

SQL:
FLUSH RAMCHUNK rt;

Response:
Query OK, 0 rows affected (0.05 sec)
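
The rt_mem_limit threshold that triggers the automatic conversion is a per-table setting. As a minimal sketch, assuming a table named rt and a version that supports changing the limit on an existing table via ALTER TABLE (the value is purely illustrative):

SQL:
ALTER TABLE rt rt_mem_limit='256M';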

Flushing RAM chunk to disk

FLUSH TABLE

FLUSH TABLE rt_table

FLUSH TABLE forcibly flushes the RT table's RAM chunk contents to disk.

The real-time table RAM chunk is automatically flushed to disk during a clean shutdown, or periodically every rt_flush_period seconds.
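
The flush period is controlled by the rt_flush_period directive in the searchd section of the configuration. A minimal sketch, with an illustrative value in seconds (not a recommendation):

searchd {
    rt_flush_period = 3600
}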

Issuing a FLUSH TABLE command not only forces the RAM chunk contents to be written to disk but also triggers the cleanup of binary log files.

SQL:
FLUSH TABLE rt;

Response:
Query OK, 0 rows affected (0.05 sec)

Compacting a Table

Over time, RT tables may become fragmented into numerous disk chunks and/or contaminated with deleted, yet unpurged data, affecting search performance. In these cases, optimization is necessary. Essentially, the optimization process combines pairs of disk chunks, removing documents that were previously deleted using DELETE statements.
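
Before forcing a compaction, it can be useful to check how fragmented a table is. A minimal sketch, assuming a table named rt; the exact set of reported counters (such as disk_chunks) may vary between versions:

SQL:
SHOW TABLE rt STATUS;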

Beginning with Manticore 4, this process occurs automatically by default. However, you can also use the following commands to manually initiate table compaction.

OPTIMIZE TABLE

OPTIMIZE TABLE index_name [OPTION opt_name = opt_value [,...]]

The OPTIMIZE statement adds an RT table to the optimization queue, which is processed in a background thread.

SQL:
OPTIMIZE TABLE rt;

Number of optimized disk chunks

By default, OPTIMIZE merges the RT table's disk chunks down to a number equal to # of CPU cores * 2. You can control the number of optimized disk chunks using the cutoff option.

For example:

SQL:
OPTIMIZE TABLE rt OPTION cutoff=4;

Running in foreground

When using OPTION sync=1 (0 by default), the command will wait for the optimization process to complete before returning. If the connection is interrupted, the optimization will continue running on the server.

SQL:
OPTIMIZE TABLE rt OPTION sync=1;

Throttling the IO impact

Optimization can be a lengthy and I/O-intensive process. To minimize the impact, all actual merge work is executed serially in a special background thread, and the OPTIMIZE statement simply adds a job to its queue. The optimization thread can be I/O-throttled, and you can control the maximum number of I/Os per second and the maximum I/O size with the rt_merge_iops and rt_merge_maxiosize directives, respectively.
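
A minimal configuration sketch with both throttling directives in the searchd section; the values are illustrative, not recommendations:

searchd {
    rt_merge_iops = 40
    rt_merge_maxiosize = 1M
}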

During optimization, the RT table being optimized remains online and available for both searching and updates nearly all the time. It is locked for a very brief period when a pair of disk chunks is successfully merged, allowing for the renaming of old and new files and updating the table header.

Optimizing clustered tables

As long as auto_optimize is not disabled, tables are optimized automatically.
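
If needed, automatic optimization can be switched off in the searchd section of the configuration; a minimal sketch (this is what step 1 below refers to):

searchd {
    auto_optimize = 0
}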

If you are experiencing unexpected SSTs (state snapshot transfers) or want the tables across all nodes of the cluster to be binary identical, you need to:

  1. Disable auto_optimize.
  2. Manually optimize tables:

    On one of the nodes, drop the table from the cluster:

    SQL:
    ALTER CLUSTER mycluster DROP myindex;

    Optimize the table:

    SQL:
    OPTIMIZE TABLE myindex;

    Add the table back to the cluster:

    SQL:
    ALTER CLUSTER mycluster ADD myindex;

    When the table is added back, the new files created by the optimization process will be replicated to the other nodes in the cluster. Any local changes made to the table on other nodes will be lost.

Table data modifications (inserts, replaces, deletes, updates) should either:

  1. Be postponed, or
  2. Be directed to the node where the optimization process is running.

Note that while the table is out of the cluster, insert/replace/delete/update commands must refer to it without the cluster name prefix (for SQL statements) or without the cluster property (for HTTP JSON requests); otherwise they will fail. Once the table has been added back to the cluster, write operations on it must include the cluster name prefix (or cluster property) again, or they will fail.
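
For instance, assuming the cluster and table names used above and a hypothetical id/title schema, a write issued while the table is outside the cluster omits the prefix, and a write issued after it has been added back includes it:

SQL (while myindex is outside the cluster):
INSERT INTO myindex (id, title) VALUES (1, 'local write during optimization');

SQL (after myindex has been added back):
INSERT INTO mycluster:myindex (id, title) VALUES (2, 'replicated write');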

Search operations are available as usual during the process on any of the nodes.