There are 2 different approaches to deal with tables in Manticore:
Real-time mode requires no table definition in the configuration file, but the data_dir directive in the searchd section is mandatory. Index files are stored inside the data_dir.
Replication is available only in this mode.
In this mode you can use SQL commands like CREATE TABLE, ALTER TABLE and DROP TABLE to create a table, change its schema, and drop it. This mode is especially useful for real-time and percolate tables.
Table names are case insensitive in the RT mode.
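As a minimal sketch of what RT mode needs, the searchd section below sets data_dir and declares no tables. The ports and paths are assumptions chosen for illustration and should be adjusted to your installation.

```ini
searchd {
    # Listeners for the MySQL protocol and HTTP (ports are illustrative)
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
    # Mandatory in RT mode: table files are stored under this directory
    data_dir = /var/lib/manticore
}
```

With a configuration like this, tables are created online with CREATE TABLE rather than declared in the file.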
In plain mode you can specify the table schema in the configuration file. The schema is read when Manticore starts, and the table is created if it doesn't exist yet. This mode is especially useful for plain tables that are built by indexing data from an external storage.
A table can only be dropped by removing it from the configuration file, or by removing its path setting, and then sending a HUP signal to the server or restarting it.
Table names are case sensitive in this mode.
All table types are supported in this mode.
Index type | RT mode | Plain mode |
---|---|---|
Real-time | supported | supported |
Plain | not supported | supported |
Percolate | supported | supported |
Distributed | supported | supported |
Template | not supported | supported |
A Real-time table is a main type of table in Manticore, allowing you to add, update, and delete documents with immediate availability of the changes. The settings for a Real-time Table can be defined in a configuration file or online using the CREATE/UPDATE/DELETE/ALTER commands.
A Real-time Table consists of one or more plain tables called chunks. There are two types of chunks:
- multiple disk chunks - These are stored on disk and have the same structure as a Plain Table.
- single RAM chunk - This is stored in memory and is used as an accumulator of changes.
The size of the RAM chunk is controlled by the rt_mem_limit setting. Once this limit is reached, the RAM chunk is flushed to disk in the form of a disk chunk. If there are too many disk chunks, they can be merged into one for better performance using the OPTIMIZE command or automatically.
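In plain mode, where tables are declared in the configuration file, a real-time table with an explicit RAM chunk limit might look like the sketch below. The table name, path, fields, and the 256M value are hypothetical and only illustrate where rt_mem_limit goes.

```ini
table rt_products {
    type = rt
    path = /var/lib/manticore/rt_products
    # Full-text field and a numeric attribute (illustrative schema)
    rt_field = title
    rt_attr_uint = price
    # Once the RAM chunk reaches this size, it is flushed to disk as a new disk chunk
    rt_mem_limit = 256M
}
```

A larger limit generally means fewer, bigger disk chunks at the cost of more memory, so the value is a trade-off rather than a fixed recommendation.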
With a real-time table, you can:
- Add documents using the Add feature.
- Update attributes and full-text fields through the Update process.
- Delete documents using the Delete feature.
- Empty the table using the Truncate process.
- Change the schema online using the ALTER command as described in Change schema online.
- Define the table in a configuration file as described in Define table.
- Use the UUID feature for automatic ID provisioning.
With a real-time table, you cannot:
- Index data using the indexer feature.
- Connect it to sources for easy indexing from external storage.
- Update the killlist_target, as it is automatically managed by the real-time table.
The following table outlines the different file extensions and their respective descriptions in a real-time table:
Extension | Description |
---|---|
.lock | A lock file that ensures that only one process can access the table at a time. |
.ram | The RAM chunk of the table, stored in memory and used as an accumulator of changes. |
.meta | The headers of the real-time table that define its structure and settings. |
.*.sp* | Disk chunks that are stored on disk with the same format as plain tables. They are created when the RAM chunk size exceeds the rt_mem_limit. |
For more information on the structure of disk chunks, refer to the plain table format.
A plain table is a basic element for non-percolate searching. It can be defined only in a configuration file using the Plain mode, and is not supported in the RT mode. It is typically used in conjunction with a source to process data from external storage and can later be attached to a real-time table.
With a plain table, you can:
- Build it from external storage using a source and indexer
- Perform an in-place update of integer, float, string, and MVA attributes
- Update its killlist_target
With a plain table, you cannot:
- Insert additional data into the table once it has been built
- Delete data from the table
- Create, delete, or alter the table online
- Use UUID for automatic ID generation (data from external storage must include a unique identifier)
Numeric attributes, including MVAs, are the only elements that can be updated in a plain table. All other data in the table is immutable. If updates or new records are required, the table must be rebuilt. During the rebuilding process, the existing table remains available to serve requests, and a process called rotation is performed when the new version is ready, bringing it online and discarding the old version.
To create a plain table, you'll need to define it in a configuration file. It's not supported by the CREATE TABLE command.
Here's an example of a plain table configuration and a source for fetching data from a MySQL database:
source source {
  type = mysql
  sql_host = localhost
  sql_user = myuser
  sql_pass = mypass
  sql_db = mydb
  sql_query = SELECT id, title, description, category_id FROM mytable
  sql_attr_uint = category_id
  sql_field_string = title
}
table tbl {
  type = plain
  source = source
  path = /path/to/table
}
The speed at which a plain table is indexed depends on several factors, including:
- Data source retrieval speed
- Tokenization settings
- The hardware specifications (such as CPU, RAM, and disk performance)
For small data sets, the simplest option is to have a single plain table that is fully rebuilt as needed. This approach is acceptable if you can tolerate the following trade-offs:
- The data in the table is not as fresh as the data in the source
- The time it takes to build the table increases as the data set grows
For larger data sets, a plain table can be used instead of a real-time table. The main+delta scenario involves:
- Creating a smaller table for incremental indexing
- Combining the two tables using a distributed table
This approach allows for infrequent rebuilding of the larger table and more frequent processing of updates from the source. The smaller table can be rebuilt more often (e.g. every minute or even every few seconds).
However, as time goes on, the indexing duration for the smaller table will become too long, requiring a rebuild of the larger table and the emptying of the smaller one.
The main+delta schema is explained in detail in this interactive course.
The kill-list mechanism and the killlist_target directive are used to ensure that documents from the current table take precedence over those from the other table.
For more information on this topic, see here.
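The main+delta scenario described above can be sketched in a configuration file roughly as follows. The database schema is hypothetical: a documents table plus a helper table sph_counter(counter_id, max_doc_id) that records where the main table ended. The table names main, delta, and main_delta and the paths are assumptions for illustration.

```ini
source main {
    type = mysql
    sql_host = localhost
    sql_user = myuser
    sql_pass = mypass
    sql_db = mydb
    sql_query_pre = SET NAMES utf8
    # Remember the highest ID covered by the main table
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, title, body FROM documents \
        WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

# Inherits the connection settings from "main" and fetches only newer rows
source delta : main {
    # Overriding sql_query_pre drops the inherited counter update
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, body FROM documents \
        WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

table main {
    type = plain
    source = main
    path = /path/to/main
}

table delta {
    type = plain
    source = delta
    path = /path/to/delta
    # Documents in delta take precedence over matching documents in main;
    # this matters once delta also carries updated versions of existing rows
    killlist_target = main
}

# Searches go through this table, which combines both parts
table main_delta {
    type = distributed
    local = main
    local = delta
}
```

In this sketch the delta table is rebuilt frequently and the main table only occasionally; rebuilding main advances the counter, which in turn empties the delta on its next rebuild.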
The following table outlines the various file extensions used in a plain table and their respective descriptions:
| Extension | Description |
| - | - |
| .spa | stores document attributes in row-wise mode |
| .spb | stores blob attributes in row-wise mode: strings, MVA, json |
| .spc | stores document attributes in columnar mode |
| .spd | stores matching document ID lists for each word ID |
| .sph | stores table header information |
| .sphi | stores histograms of attribute values |
| .spi | stores word lists (word IDs and pointers to the .spd file) |
| .spidx | stores secondary indexes data |
| .spk | stores kill-lists |
| .spl | lock file |
| .spm | stores a bitmap of killed documents |
| .spp | stores hit (aka posting, aka word occurrence) lists for each word ID |
| .spt | stores additional data structures to speed up lookups by document ids |
| .spe | stores skip-lists to speed up doc-list filtering |
| .spds | stores document texts |
| .tmp* | temporary files during index_settings_and_status |
| .new.sp* | new version of a plain table before rotation |
| .old.sp* | old version of a plain table after rotation |