Creating a local table

In Manticore Search, there are two ways to manage tables:

Online schema management (RT mode)

Real-time mode requires no table definition in the configuration file. However, the data_dir directive in the searchd section is mandatory. Index files are stored inside the data_dir.

Replication is only available in this mode.

You can use SQL commands such as CREATE TABLE, ALTER TABLE and DROP TABLE to create a table, modify its schema, and drop it. This mode is particularly useful for real-time and percolate tables.

Table names are converted to lowercase when created.

Defining table schema in config (Plain mode)

In this mode, you can specify the table schema in the configuration file. Manticore reads this schema on startup and creates the table if it doesn't exist yet. This mode is particularly useful for plain tables that use data from an external storage.

To drop a table, remove it from the configuration file (or remove its path setting), then send a HUP signal to the server or restart it.

Table names are case-sensitive in this mode.

All table types are supported in this mode.

Table types and modes

Table type    RT mode         Plain mode
Real-time     supported       supported
Plain         not supported   supported
Percolate     supported       supported
Distributed   supported       supported
Template      not supported   supported

Real-time table

A real-time table is the main type of table in Manticore. It lets you add, update, and delete documents, and the changes become visible right away. You can define a real-time table in a configuration file or manage it with commands such as CREATE, UPDATE, DELETE, or ALTER.

Internally, a real-time table consists of one or more plain tables called chunks. There are two kinds of chunks:

  • Multiple disk chunks: these are saved on disk and structured like a plain table.
  • A single RAM chunk: this is kept in memory and collects all changes.

The size of the RAM chunk is controlled by the rt_mem_limit setting. Once this limit is reached, the RAM chunk is transferred to disk as a disk chunk. If there are too many disk chunks, Manticore combines some of them to improve performance.
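
The flush-and-merge behavior described above can be pictured with a toy model. This is only an illustrative sketch, not Manticore's implementation: the class name, the document-count limit (the real rt_mem_limit is a memory size), and the `max_disk_chunks` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRealTimeTable:
    """Toy model of a real-time table: one RAM chunk plus a list of disk chunks."""

    ram_limit: int        # stands in for rt_mem_limit, counted in documents here
    max_disk_chunks: int  # hypothetical threshold that triggers a merge
    ram_chunk: list = field(default_factory=list)
    disk_chunks: list = field(default_factory=list)

    def insert(self, doc: str) -> None:
        # New documents always land in the RAM chunk first.
        self.ram_chunk.append(doc)
        if len(self.ram_chunk) >= self.ram_limit:
            self._flush()

    def _flush(self) -> None:
        # The full RAM chunk becomes a new immutable disk chunk.
        self.disk_chunks.append(tuple(self.ram_chunk))
        self.ram_chunk = []
        if len(self.disk_chunks) > self.max_disk_chunks:
            self._merge_two_oldest()

    def _merge_two_oldest(self) -> None:
        # Combine the two oldest disk chunks into a single chunk.
        first, second = self.disk_chunks[0], self.disk_chunks[1]
        self.disk_chunks[:2] = [first + second]


table = ToyRealTimeTable(ram_limit=2, max_disk_chunks=2)
for i in range(7):
    table.insert(f"doc{i}")

print(len(table.disk_chunks), table.ram_chunk)
```

After seven inserts the model holds two disk chunks (the first one being the result of a merge) and one document still waiting in the RAM chunk.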

Creating a real-time table:

You can create a new real-time table in two ways: by using the CREATE TABLE command or through the _mapping endpoint of the HTTP JSON API.

CREATE TABLE command:

You can use this command via both SQL and HTTP protocols:

CREATE TABLE products(title text, price float) morphology='stem_en';
Response
Query OK, 0 rows affected (0.00 sec)
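
When the statement is generated by an application, it can be assembled from a column mapping. The helper below is a minimal sketch; `build_create_table` is a hypothetical name, and it performs no validation of identifiers beyond rejecting quotes in option values.

```python
from typing import Dict, Optional


def build_create_table(name: str, columns: Dict[str, str],
                       options: Optional[Dict[str, str]] = None) -> str:
    """Render a CREATE TABLE statement from column and option mappings."""
    column_list = ", ".join(f"{col} {col_type}" for col, col_type in columns.items())
    statement = f"CREATE TABLE {name}({column_list})"
    for option, value in (options or {}).items():
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in option value: {value!r}")
        statement += f" {option}='{value}'"
    return statement + ";"


sql = build_create_table(
    "products",
    {"title": "text", "price": "float"},
    {"morphology": "stem_en"},
)
print(sql)  # CREATE TABLE products(title text, price float) morphology='stem_en';
```

The resulting string can then be sent through whichever SQL client you already use with Manticore.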

_mapping API:

Alternatively, you can create a new table via the _mapping endpoint. This endpoint allows you to define an Elasticsearch-like table structure to be converted to a Manticore table.

The body of your request must have the following structure:

{
  "properties": {
    "FIELD_NAME_1": {
      "type": "FIELD_TYPE_1"
    },
    "FIELD_NAME_2": {
      "type": "FIELD_TYPE_2"
    },

    ...

    "FIELD_NAME_N": {
      "type": "FIELD_TYPE_M"
    }
  }
}
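
A request body with this shape can be produced programmatically. The following is a small sketch that assumes the `properties` structure shown above is the whole body; `build_mapping_body` is a hypothetical helper name, not part of any Manticore client.

```python
import json
from typing import Dict


def build_mapping_body(fields: Dict[str, str]) -> str:
    """Serialize a {field_name: elasticsearch_type} mapping as a _mapping body."""
    body = {
        "properties": {name: {"type": es_type} for name, es_type in fields.items()}
    }
    return json.dumps(body, indent=2)


payload = build_mapping_body({"price": "float", "title": "text"})
print(payload)
```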

When creating a table, Elasticsearch data types will be mapped to Manticore types according to the following rules:

  • aggregate_metric => json
  • binary => string
  • boolean => bool
  • byte => int
  • completion => string
  • date => timestamp
  • date_nanos => bigint
  • date_range => json
  • dense_vector => json
  • flattened => json
  • flat_object => json
  • float => float
  • float_range => json
  • geo_point => json
  • geo_shape => json
  • half_float => float
  • histogram => json
  • integer => int
  • integer_range => json
  • ip => string
  • ip_range => json
  • keyword => string
  • knn_vector => float_vector
  • long => bigint
  • long_range => json
  • match_only_text => text
  • object => json
  • point => json
  • scaled_float => float
  • search_as_you_type => text
  • shape => json
  • short => int
  • text => text
  • unsigned_long => int
  • version => string
POST /your_table_name/_mapping -d '
{
  "properties": {
    "price": {
      "type": "float"
    },
    "title": {
      "type": "text"
    }
  }
}
'
Response
{
"total":0,
"error":"",
"warning":""
}
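
To preview which Manticore column types a set of Elasticsearch-style properties would produce, the mapping rules listed above can be expressed as a lookup table. This is a client-side sketch for illustration only; the server performs the actual conversion, and `convert_properties` is a hypothetical helper.

```python
# Elasticsearch type -> Manticore type, copied from the list above.
ES_TO_MANTICORE = {
    "aggregate_metric": "json", "binary": "string", "boolean": "bool",
    "byte": "int", "completion": "string", "date": "timestamp",
    "date_nanos": "bigint", "date_range": "json", "dense_vector": "json",
    "flattened": "json", "flat_object": "json", "float": "float",
    "float_range": "json", "geo_point": "json", "geo_shape": "json",
    "half_float": "float", "histogram": "json", "integer": "int",
    "integer_range": "json", "ip": "string", "ip_range": "json",
    "keyword": "string", "knn_vector": "float_vector", "long": "bigint",
    "long_range": "json", "match_only_text": "text", "object": "json",
    "point": "json", "scaled_float": "float", "search_as_you_type": "text",
    "shape": "json", "short": "int", "text": "text",
    "unsigned_long": "int", "version": "string",
}


def convert_properties(properties: dict) -> dict:
    """Map each field's Elasticsearch type to the Manticore type listed above."""
    converted = {}
    for field_name, spec in properties.items():
        es_type = spec["type"]
        if es_type not in ES_TO_MANTICORE:
            raise ValueError(f"no mapping rule listed for type {es_type!r}")
        converted[field_name] = ES_TO_MANTICORE[es_type]
    return converted


columns = convert_properties({"price": {"type": "float"}, "title": {"type": "text"}})
print(columns)  # {'price': 'float', 'title': 'text'}
```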

👍 What you can do with a real-time table:

⛔ What you cannot do with a real-time table:

  • Ingest data using the indexer tool.
  • Connect it to sources for easy indexing from external storage.
  • Update the killlist_target, as it is automatically managed by the real-time table.

Real-time table files structure

The following table outlines the different file extensions and their respective descriptions in a real-time table:

Extension   Description
.lock       A lock file that ensures that only one process can access the table at a time.
.ram        The RAM chunk of the table, stored in memory and used as an accumulator of changes.
.meta       The headers of the real-time table that define its structure and settings.
.*.sp*      Disk chunks that are stored on disk with the same format as plain tables. They are created when the RAM chunk size exceeds the rt_mem_limit.

For more information on the structure of disk chunks, refer to the plain table files structure.

Plain table

A plain table is the basic element for non-percolate searching. It can be defined only in a configuration file using the Plain mode, and is not supported in the RT mode. It is typically used in conjunction with a source to process data from external storage and can later be attached to a real-time table.

Creating a plain table

To create a plain table, you need to define it in a configuration file. The CREATE TABLE command does not support plain tables.

Here's an example of a plain table configuration and a source for fetching data from a MySQL database:

source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query        = SELECT id, title, description, category_id FROM mytable
  sql_attr_uint    = category_id
  sql_field_string = title
}

table tbl {
  type   = plain
  source = source
  path   = /path/to/table
}
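
If such blocks are generated by a deployment script rather than written by hand, a small renderer is enough. The sketch below is an assumption about how one might do this in Python; `render_block` is a hypothetical helper, and it takes a list of pairs so that a directive can appear more than once.

```python
from typing import List, Tuple


def render_block(kind: str, name: str, settings: List[Tuple[str, str]]) -> str:
    """Render one `kind name { key = value }` configuration block."""
    width = max(len(key) for key, _ in settings)  # align the '=' signs
    lines = [f"{kind} {name} {{"]
    lines += [f"  {key.ljust(width)} = {value}" for key, value in settings]
    lines.append("}")
    return "\n".join(lines)


table_block = render_block(
    "table",
    "tbl",
    [("type", "plain"), ("source", "source"), ("path", "/path/to/table")],
)
print(table_block)
```

This prints the same `table tbl` block as in the example above.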

👍 What you can do with a plain table:

⛔ What you cannot do with a plain table:

  • Insert additional data into the table once it has been built
  • Delete data from the table
  • Create, delete, or alter the table schema online
  • Use UUID for automatic ID generation (data from external storage must include a unique identifier)

Numeric attributes, including MVAs, are the only elements that can be updated in a plain table. All other data in the table is immutable. If updates or new records are required, the table must be rebuilt. During the rebuilding process, the existing table remains available to serve requests, and a process called rotation is performed when the new version is ready, bringing it online and discarding the old version.
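
A rebuild with rotation is usually triggered by running the indexer tool with its `--rotate` option. The sketch below only assembles the command line; the helper name and the configuration path are assumptions, so substitute the values used on your system.

```python
import shlex
from typing import List, Optional


def indexer_rotate_command(table: str, config_path: Optional[str] = None) -> List[str]:
    """Build the argument list for rebuilding one plain table with rotation."""
    command = ["indexer"]
    if config_path is not None:
        command += ["--config", config_path]
    command += ["--rotate", table]
    return command


# The config path below is an assumed location, used for illustration only.
command = indexer_rotate_command("tbl", "/etc/manticoresearch/manticore.conf")
print(shlex.join(command))
```

On a host where the indexer binary is installed, the list can be passed to `subprocess.run(command, check=True)`.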

Plain table building performance

The speed at which a plain table is indexed depends on several factors, including:

  • Data source retrieval speed
  • Tokenization settings
  • The hardware specifications (such as CPU, RAM, and disk performance)

Plain table building scenarios

Rebuild fully when needed

For small data sets, the simplest option is a single plain table that is fully rebuilt as needed. This approach is acceptable as long as you can live with its trade-offs:

  • The data in the table is not as fresh as the data in the source
  • The time it takes to build the table increases as the data set grows

Main+delta scenario

For larger data sets, a plain table can be used instead of a real-time table. The main+delta scenario involves:

  • Creating a smaller table for incremental indexing
  • Combining the two tables using a distributed table

This approach allows for infrequent rebuilding of the larger table and more frequent processing of updates from the source. The smaller table can be rebuilt more often (e.g. every minute or even every few seconds).

However, as time goes on, the indexing duration for the smaller table will become too long, requiring a rebuild of the larger table and the emptying of the smaller one.

The main+delta schema is explained in detail in this interactive course.

The kill list mechanism and the killlist_target directive ensure that documents from the current table take precedence over those from the other table.
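
The precedence rule can be illustrated with a toy model in which dictionaries stand in for the main and delta tables and a set of document IDs stands in for the delta table's kill list. This is a conceptual sketch only; the function and variable names are made up for the example.

```python
from typing import Dict, Set


def search_main_plus_delta(main: Dict[int, str], delta: Dict[int, str],
                           kill_list: Set[int]) -> Dict[int, str]:
    """Combine two toy tables so that delta documents override main ones."""
    # Documents named in the kill list are suppressed in the main table.
    visible_main = {doc_id: doc for doc_id, doc in main.items()
                    if doc_id not in kill_list}
    # The delta table contributes its documents as they are.
    return {**visible_main, **delta}


main = {1: "old title", 2: "unchanged", 3: "deleted at source"}
delta = {1: "new title", 4: "brand new"}
kill_list = {1, 3}  # IDs that were updated or deleted since main was built

results = search_main_plus_delta(main, delta, kill_list)
print(results)  # {2: 'unchanged', 1: 'new title', 4: 'brand new'}
```

Document 1 is served from the delta table, document 3 disappears, and document 2 is still served from the main table.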

For more information on this topic, see here.

Plain table files structure

The following table outlines the various file extensions used in a plain table and their respective descriptions:

Extension   Description
.spa        stores document attributes in row-wise mode
.spb        stores blob attributes in row-wise mode: strings, MVA, JSON
.spc        stores document attributes in columnar mode
.spd        stores matching document ID lists for each word ID
.sph        stores table header information
.sphi       stores histograms of attribute values
.spi        stores word lists (word IDs and pointers to .spd file)
.spidx      stores secondary indexes data
.spk        stores kill-lists
.spl        lock file
.spm        stores a bitmap of killed documents
.spp        stores hit (aka posting, aka word occurrence) lists for each word ID
.spt        stores additional data structures to speed up lookups by document IDs
.spe        stores skip-lists to speed up doc-list filtering
.spds       stores document texts
.tmp*       temporary files created during indexing
.new.sp*    new version of a plain table before rotation
.old.sp*    old version of a plain table after rotation
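
When inspecting a table directory, the extensions above can be turned into a lookup. The following sketch assumes file names of the form `name.ext`, `name.new.ext`, or `name.old.ext`, as the table suggests; `describe_table_file` is a hypothetical helper and the descriptions are shortened from the table.

```python
from pathlib import PurePath

PLAIN_TABLE_FILES = {
    ".spa": "row-wise document attributes",
    ".spb": "row-wise blob attributes",
    ".spc": "columnar document attributes",
    ".spd": "document ID lists for each word ID",
    ".sph": "table header",
    ".sphi": "attribute value histograms",
    ".spi": "word lists",
    ".spidx": "secondary indexes",
    ".spk": "kill-lists",
    ".spl": "lock file",
    ".spm": "bitmap of killed documents",
    ".spp": "hit lists for each word ID",
    ".spt": "document ID lookup structures",
    ".spe": "skip-lists",
    ".spds": "document texts",
}


def describe_table_file(filename: str) -> str:
    """Describe a plain table file based on its extension(s)."""
    suffixes = PurePath(filename).suffixes
    if not suffixes:
        return "unknown"
    last = suffixes[-1]
    if last.startswith(".tmp"):
        return "temporary file"
    description = PLAIN_TABLE_FILES.get(last, "unknown")
    if description != "unknown" and len(suffixes) >= 2:
        # .new.sp* and .old.sp* mark versions around a rotation.
        if suffixes[-2] == ".new":
            return f"{description} (new version before rotation)"
        if suffixes[-2] == ".old":
            return f"{description} (old version after rotation)"
    return description


for name in ("tbl.spa", "tbl.new.spd", "tbl.tmp0", "notes.txt"):
    print(name, "->", describe_table_file(name))
```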