A plain index can be created from an external source by a special tool called indexer, which reads a "recipe" from the configuration, then connects to the data sources, pulls documents and builds index files. That is quite a long process. If your data changes afterwards, the index becomes out of date, and you need to rebuild it from the refreshed sources. If your data changes incrementally - for example, you index a blog or newsfeed, where old documents never change and only new ones are added - such a rebuild will take more and more time, as on each pass you will need to process the archived sources again and again.
One way to deal with this problem is to use several indexes instead of one solid index. For example, you can index the sources produced in previous years and save the index. Then take only the sources from the current year, index them into a separate index, and rebuild it as often as necessary. Then you can place both indexes as parts of a distributed index and use it for querying. The point here is that each time you rebuild only the data for at most the last 12 months, and the index with older data remains untouched, with no need to be rebuilt. You can go further and divide the last 12 months' index into monthly, weekly, daily indexes, and so on.
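The partitioning scheme above can be sketched in the configuration as a distributed index; the index names and paths here are purely illustrative:

```ini
# a minimal sketch; names, paths and sources are examples, not prescriptions
index archive
{
    type   = plain
    source = src_archive
    path   = /var/lib/manticore/archive
}

index current
{
    type   = plain
    source = src_current
    path   = /var/lib/manticore/current
}

# the distributed index combines both parts for querying;
# only 'current' needs periodic rebuilds
index blog
{
    type  = distributed
    local = archive
    local = current
}
```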
This approach works, but you need to maintain your distributed index manually: add new chunks, delete old ones, and keep the overall number of partial indexes reasonably small (with too many indexes searching can become slower, and the OS usually limits the number of simultaneously opened files). To deal with it, you can manually merge several indexes together by running indexer --merge. However, that solves only the problem of having many indexes, while making maintenance harder. And even with 'per-hour' reindexing you will most probably have a noticeable time gap between new data arriving in the sources and the rebuilt index making this data available for searching.
Real-time index is intended to solve the problem. It consists of two parts:
- Special RAM-based index (called RAM chunk), which contains portions of data arriving right now.
- Collection of plain indexes called disk chunks, that were built in past.
That is very similar to a usual distributed index made from several local ones.
You don't need to build such an index the traditional way - by running indexer, which reads a "recipe" from the config and indexes data sources. Instead, a real-time index provides the ability to 'insert' new documents and 'replace' existing ones. When executing the 'insert' command, you push new documents to the server. It then builds a small index from the added documents and immediately brings it online. So, right after the 'insert' command completes you can perform searches in all the index parts, including the just-added documents.
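For illustration, the insert/replace flow might look like this in SQL (the index and column names are hypothetical):

```sql
-- push a new document; it becomes searchable as soon as the statement completes
INSERT INTO rt (id, title, content) VALUES (1, 'First post', 'hello world');

-- replace an existing document by its id
REPLACE INTO rt (id, title, content) VALUES (1, 'First post', 'hello world, updated');

-- the fresh data is immediately visible to searches
SELECT * FROM rt WHERE MATCH('updated');
```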
Maintenance is performed automatically by the search server, so you don't have to care about it. But you may be interested in a few details of how it is maintained.
First, since the indexed data is stored in RAM - what about an emergency power-off? Will I lose my index then? Well, before completion, the server saves new data into a special 'binlog'. That is one or several files living on your persistent storage, which grow incrementally as you add more and more changes. You may tune how often new queries (or transactions) are stored there, and how often a 'sync' command is executed over the binlog file in order to force the OS to actually save the data on safe storage. The most paranoid way is to flush and sync on every transaction. That is the slowest, but also the safest approach. The least expensive way is to switch off the binlog altogether. That is the fastest, but you can lose your indexed data. Intermediate variants, like flushing/syncing every second, are also provided.
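These trade-offs map onto searchd settings; as a sketch (the values are examples - check the documentation of your version for the exact semantics):

```ini
searchd
{
    # 0 = flush and sync once a second,
    # 1 = flush and sync on every transaction (safest, slowest),
    # 2 = flush on every transaction, sync once a second
    binlog_flush = 2

    # an empty binlog_path disables binary logging entirely:
    # fastest, but unsaved RAM chunk data is lost on a crash
    # binlog_path =
}
```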
The binlog is designed especially for sequential saving of newly arriving transactions; it is not an index, and it can't be searched over. It is just insurance that the server will not lose your data. If a sudden disruption happens and everything crashes because of a software or hardware problem, the server will load the freshest available dump of the RAM chunk, and then replay the binlog, repeating the stored transactions. Finally, it will reach the same state as at the moment of the last change.
Second, what about limits? What if I want to index, say, 10TB of data? It just doesn't fit into RAM! RAM for a real-time index is limited and may be configured. When some quantity of data is indexed, the server maintains the RAM part of the index by merging small transactions together, keeping their number and overall size small. That sometimes causes delays on insertion, however. When merging helps no more, and new insertions hit the RAM limit, the server converts the RAM-based index into a plain index stored on disk (called a disk chunk). That index is added to the collection of indexes of the second part of the RT index and comes online. The RAM is then flushed and the space gets deallocated.
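The RAM limit is set per index with the rt_mem_limit directive; a minimal sketch (the name, path, size and schema are examples):

```ini
index rt
{
    type = rt
    path = /var/lib/manticore/rt

    # once the RAM chunk grows to roughly this size, it is converted
    # into a new disk chunk (512M here is just an example value)
    rt_mem_limit = 512M

    rt_field     = title
    rt_attr_uint = gid
}
```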
When the data from RAM is safely saved to disk, which happens:
- when the server saves the collected data as a disk chunk
- or when it dumps the RAM part during a clean shutdown or by manual flushing
the binlog for that index is no longer necessary, so it gets discarded. Once all the indexes are saved, it is deleted.
Third, what about the disk collection? If having many disk parts makes searching slower, what's the difference whether I make them manually in the distributed index manner, or they're produced as disk parts (or 'chunks') by an RT index? Well, in both cases you can merge several indexes into one. Say, you can merge hourly indexes from yesterday and keep one 'daily' yesterday's index instead. With manual maintenance you have to think about the schema and commands yourself. With an RT index the server provides the OPTIMIZE command, which does the same, but keeps you away from unnecessary internal details.
Fourth, if my "document" constitutes a 'mini-index' and I don't need it anymore, I can just throw it away. But if it is 'optimized', i.e. mixed together with tons of other documents, how can I un-index or delete it? Yes, indexed documents are 'mixed' together, and there is no easy way to delete one without rebuilding the whole index. And while for plain indexes rebuilding or merging is just a normal way of maintenance, for a real-time index that would preserve only the simplicity of manipulation, but not the 'real-timeness'. To address the problem, Manticore uses a trick: when you delete a document, identified by document ID, the server just tracks the number. Together with other deleted documents, its ID is saved in a so-called kill-list. When you search over the index, the server first retrieves all matching documents, and then throws out the documents that are found in the kill-list (that is the most basic description; internally it's more complex). The point is that for the sake of 'immediate' deletion the documents are not actually deleted, but just marked as 'deleted'. They still occupy space in different index structures, being actually garbage. Word statistics, which affect ranking, are also unaffected, meaning it works exactly as declared: we search among all documents, and then just hide the ones marked as deleted from the final result. When a document is replaced, it means that it is killed in the old parts of the index and inserted again into the freshest part. All consequences of 'hiding by kill-list' are also in play in this case.
When a rebuild of some part of an index happens - e.g. when some transactions (segments) of a RAM chunk are merged, when a RAM chunk is converted into a disk chunk, or when two disk chunks are merged together - the server performs a comprehensive iteration over the affected parts and physically excludes deleted documents from all of them. I.e., if they were in the document lists of some words, they are wiped away; if it was a unique word, it gets removed completely.
As a summary: the deletion works in two phases:
- First, we mark documents as 'deleted' in realtime and suppress them in search results
- During some operation with an RT index chunk we finally physically wipe the deleted documents for good
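In SQL terms, the first phase is what a plain DELETE triggers, and a merge (forced here via OPTIMIZE) performs the second (the index name is hypothetical):

```sql
-- phase one: the document goes into the kill-list and disappears
-- from search results immediately, but still occupies space
DELETE FROM rt WHERE id = 1;

-- phase two: merging chunks physically purges kill-listed documents
OPTIMIZE INDEX rt;
```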
Fifth, if an RT index contains plain disk indexes in its collection, can I just add my ready-made old disk index to it? No - to avoid unneeded complexity and accidental corruption. But if your RT index has just been created and contains no data, then you can ATTACH INDEX your disk index to it. Your old index will be moved inside the RT index and will become its part.
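For an empty RT index, the attachment looks like this (the index names are hypothetical):

```sql
-- moves the plain disk index inside the empty RT index;
-- afterwards 'plain_index' no longer exists as a standalone index
ATTACH INDEX plain_index TO RTINDEX rt;
```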
To summarize the RT index structure: it is a cleverly organized collection of plain disk indexes with a fast in-memory index, intended for real-time insertions and semi-real-time deletions of documents, which has a common schema and common settings, and which can be easily maintained without deep digging into the details.
FLUSH RAMCHUNK rtindex
FLUSH RAMCHUNK
forcibly creates a new disk chunk in an RT index.
Normally, an RT index would flush and convert the contents of the RAM chunk into a new disk chunk automatically, once the RAM chunk reaches the maximum allowed rt_mem_limit size. However, for debugging and testing it might be useful to forcibly create a new disk chunk, and the FLUSH RAMCHUNK statement does exactly that.
Note that using FLUSH RAMCHUNK increases RT index fragmentation. Most likely, you want to use FLUSH RTINDEX instead. We suggest that you abstain from using this statement unless you're absolutely sure what you're doing. If you do need it, the right way is to issue FLUSH RAMCHUNK followed by an OPTIMIZE command. Such a combo keeps RT index fragmentation to a minimum.
- SQL
FLUSH RAMCHUNK rt;
Query OK, 0 rows affected (0.05 sec)
FLUSH RTINDEX rtindex
FLUSH RTINDEX
forcibly flushes RT index RAM chunk contents to disk.
Backing up an RT index is as simple as copying over its data files, followed by the binary log. However, recovering from that backup means that all the transactions in the log since the last successful RAM chunk write would need to be replayed. Those writes normally happen either on a clean shutdown, or periodically with a (big enough!) interval between writes specified in rt_flush_period directive. So such a backup made at an arbitrary point in time just might end up with way too much binary log data to replay.
FLUSH RTINDEX
forcibly writes the RAM chunk contents to disk, and also causes the subsequent cleanup of (now redundant) binary log files. Thus, recovering from a backup made just after FLUSH RTINDEX
should be almost instant.
- SQL
FLUSH RTINDEX rt;
Query OK, 0 rows affected (0.05 sec)
Over time, RT indexes can become fragmented into many disk chunks and/or tainted with deleted but unpurged data, impacting search performance. When that happens, they can be optimized. Basically, the optimization pass merges disk chunks together in pairs, purging documents previously suppressed by DELETEs.
Starting from Manticore 4 this happens automatically by default, but you can also use the commands below to force index compaction.
OPTIMIZE INDEX index_name [OPTION opt_name = opt_value [,...]]
OPTIMIZE
statement enqueues an RT index for optimization in a background thread.
- SQL
OPTIMIZE INDEX rt;
OPTIMIZE merges the RT index's disk chunks down to a number equal to # of CPU cores * 2 by default. The number of remaining disk chunks can be controlled with the cutoff option.
There's also a server setting optimize_cutoff for overriding the above threshold.
- SQL
OPTIMIZE INDEX rt OPTION cutoff=4;
If OPTION sync=1 is used (0 by default), the command will wait until the optimization process is done (if the connection is interrupted, the optimization will continue to run on the server).
- SQL
OPTIMIZE INDEX rt OPTION sync=1;
Optimize can be a lengthy and IO-intensive process, so to limit the impact, all the actual merge work is executed serially in a special background thread, and the OPTIMIZE
statement simply adds a job to its queue. Currently, there is no way to check the index or queue status (that might be added in the future to the SHOW INDEX STATUS
and SHOW STATUS
statements respectively). The optimization thread can be IO-throttled, you can control the maximum number of IOs per second and the maximum IO size with rt_merge_iops and rt_merge_maxiosize directives respectively.
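As a sketch, the throttling directives live in the searchd section (the values below are examples, not recommendations):

```ini
searchd
{
    # cap the merge thread at 40 I/O operations per second
    rt_merge_iops = 40

    # limit each individual merge I/O call to 1MB
    rt_merge_maxiosize = 1M
}
```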
The RT index being optimized stays online and available for both searching and updates at (almost) all times during the optimization. It gets locked for a very short time when a pair of disk chunks is merged successfully, to rename the old and the new files, and update the index header.
As long as you don't have auto_optimize disabled, indexes are optimized automatically.
In case you are experiencing unexpected SSTs or want indexes across all nodes of the cluster to be binary identical, you need to:
- Disable auto_optimize.
- Optimize indexes manually:
On one of the nodes drop the index from the cluster:
- SQL
ALTER CLUSTER mycluster DROP myindex;
Optimize the index:
- SQL
OPTIMIZE INDEX myindex;
Add back the index to the cluster:
- SQL
ALTER CLUSTER mycluster ADD myindex;
When the index is added back, the new files created by the optimize process will be replicated to the other nodes in the cluster. Any changes made locally to the index on other nodes will be lost.
Index data modifications (inserts, replaces, deletes, updates) should:
- either be postponed
- or directed to the node where the optimize process is running.
Note, while the index is out of the cluster, insert/replace/delete/update commands should refer to it without the cluster name prefix (for SQL statements, or the cluster property in case of an HTTP JSON request), otherwise they will fail. As soon as the index is added back to the cluster, writes can be resumed. At this point write operations on the index must include the cluster name prefix again, or they will fail. Search operations are available as usual during the process on any of the nodes.
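As a sketch, the prefix difference looks like this (the cluster and index names are hypothetical):

```sql
-- while 'myindex' is out of the cluster: no prefix, local write only
INSERT INTO myindex (id, title) VALUES (2, 'local write');

-- after ALTER CLUSTER mycluster ADD myindex: the prefix is required again
INSERT INTO mycluster:myindex (id, title) VALUES (3, 'replicated write');
```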