A plain table is the basic element for non-percolate searching. It can only be defined in a configuration file in Plain mode and is not supported in RT mode. It is normally used together with a source to process data from external storage, and it can afterwards be attached to a real-time table.
What you can do with a plain table:
- build it from external storage with the help of a source and indexer
- do an in-place update of an integer, float, string, or MVA attribute
- update its killlist_target
What you cannot do with a plain table:
- insert more data into the table after it's built
- delete data from it
- create, delete, or alter the table online (it has to be defined in a configuration file)
- use UUID for automatic ID generation; data fetched from external storage must include a unique identifier for each document
Except for numeric attributes (including MVA), the rest of the data in a plain table is immutable. To update or add records, you need to rebuild the table. While the table is being rebuilt, the existing table remains available for serving requests. When the new version is ready, a process called rotation puts it online and discards the old one.
A plain table can only be defined in a configuration file. The CREATE TABLE command does not support it.
Plain table example:
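```ini
source source {
    type = mysql
    sql_host = localhost
    sql_user = myuser
    sql_pass = mypass
    sql_db = mydb
    # the first column must be the unique document ID
    sql_query = SELECT id, title, description, category_id FROM mytable
    # category_id is stored as an unsigned integer attribute
    sql_attr_uint = category_id
    # title is both full-text indexed and stored as a string attribute;
    # description is not declared, so it is indexed as a full-text field
    sql_field_string = title
}

table tbl {
    type = plain
    source = source
    path = /path/to/table
}
```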
The speed of plain indexing depends on several factors:
- how fast the source can provide the data
- tokenization settings
- your hardware (CPU, amount of RAM, disk performance)
In the simplest usage scenario, you use a single plain table and fully rebuild it from time to time. This works fine for smaller data sets, provided you accept that:
- the table will not be as fresh as the data in the source
- indexing duration grows with the data: the more data the source holds, the longer the table takes to build
If you have a bigger data set and still want to use a plain table rather than a real-time one, you can:
- create another, smaller table for incremental indexing
- combine the two using a distributed table
This lets you rebuild the bigger table rarely (say, once a week), save the position of the freshest indexed document, and then use the smaller table to process anything new or updated in your source. Since you only need to fetch the updates from your storage, you can do it much more frequently (say, once a minute or even every few seconds).
After a while, though, indexing the smaller table will take too long. That is the moment to rebuild the bigger table and empty the smaller one.
This is called the main+delta schema, and you can learn more about it in this interactive course.
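As a rough illustration of the main+delta schema, the configuration below is a minimal sketch and not a definitive setup. It assumes document ids in mytable only grow, so "the position of the freshest indexed document" can be saved as the highest id. The helper table sph_counter, the source names main_src and delta_src, the table names main, delta, and dist, and the paths are all hypothetical. Picking up updated rows as well would need something like a timestamp column, which is not shown.

```ini
# Hypothetical helper table in MySQL, created once:
#   CREATE TABLE sph_counter (counter_id INT PRIMARY KEY, max_doc_id BIGINT)

source main_src {
    type = mysql
    sql_host = localhost
    sql_user = myuser
    sql_pass = mypass
    sql_db = mydb
    # save the position of the freshest document before indexing main
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM mytable
    sql_query = SELECT id, title, description, category_id FROM mytable WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
    sql_attr_uint = category_id
    sql_field_string = title
}

# inherits the connection settings and attribute declarations of main_src
source delta_src : main_src {
    # replaces the inherited pre-query, so building delta does not move the saved position
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, description, category_id FROM mytable WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

table main {
    type = plain
    source = main_src
    path = /path/to/main
}

table delta {
    type = plain
    source = delta_src
    path = /path/to/delta
}

# searches go to dist, which queries both plain tables
table dist {
    type = distributed
    local = main
    local = delta
}
```

With a layout like this, you would rebuild delta frequently and main rarely with indexer, while applications query dist.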
When you build a smaller "delta" table, it can get documents that are already in the "main" table. To let Manticore know that documents from the current table should take precedence, there is a mechanism called the kill list and a corresponding directive, killlist_target.
More information on this topic can be found here.
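As a hedged sketch of how the directive is attached, the fragment below declares a delta table whose documents take precedence over those in another plain table. The names delta, main, and delta_src and the path are hypothetical. The `:id` suffix is assumed here to limit suppression to documents whose ids appear in delta, so check the killlist_target reference for the exact modes before relying on it.

```ini
table delta {
    type = plain
    source = delta_src
    path = /path/to/delta
    # suppress documents in main whose ids also exist in delta
    killlist_target = main:id
}
```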
| Extension | Description |
|---|---|
| .spa | stores document attributes in row-wise mode |
| .spb | stores blob attributes in row-wise mode: strings, MVA, json |
| .spc | stores document attributes in columnar mode |
| .spd | stores matching document ID lists for each word ID |
| .sph | stores table header information |
| .sphi | stores histograms of attribute values |
| .spi | stores word lists (word IDs and pointers to .spd file) |
| .spidx | stores secondary indexes data |
| .spk | stores kill-lists |
| .spl | lock file |
| .spm | stores a bitmap of killed documents |
| .spp | stores hit (aka posting, aka word occurrence) lists for each word ID |
| .spt | stores additional data structures to speed up lookups by document ids |
| .spe | stores skip-lists to speed up doc-list filtering |
| .spds | stores document texts |
| .tmp* | temporary files during index_settings_and_status |
| .new.sp* | new version of a plain table before rotation |
| .old.sp* | old version of a plain table after rotation |