Index settings and status | Manticore Search Manual

In Manticore Search, there are two ways to manage tables:

Real-time mode requires no table definition in the configuration file. However, the data_dir directive in the searchd section is mandatory. Table files are stored inside the data_dir.

Replication is only available in this mode.

You can use SQL commands such as CREATE TABLE, ALTER TABLE and DROP TABLE to create and modify table schema, and to drop it. This mode is particularly useful for real-time and percolate tables.

Table names are converted to lowercase when created.

In this mode, you can specify the table schema in the configuration file. Manticore reads this schema on startup and creates the table if it doesn't exist yet. This mode is particularly useful for plain tables that use data from an external storage.

To drop a table, remove it from the configuration file or remove the path setting and send a HUP signal to the server or restart it.

Table names are case-sensitive in this mode.

All table types are supported in this mode.

Table type	RT mode	Plain mode
Real-time	supported	supported
Plain	not supported	supported
Percolate	supported	supported
Distributed	supported	supported
Template	not supported	supported

Real-time table

A real-time table is a main type of table in Manticore. It lets you add, update, and delete documents, and you can see these changes right away. You can set up a real-time Table in a configuration file or use commands like CREATE, UPDATE, DELETE, or ALTER.

Internally a real-time table consists of one or more plain tables called chunks. There are two kinds of chunks:

multiple disk chunks - these are saved on a disk and are structured like a plain table.
single ram chunk - this is kept in memory and collects all changes.

The size of the RAM chunk is controlled by the rt_mem_limit setting. Once this limit is reached, the RAM chunk is transferred to disk as a disk chunk. If there are too many disk chunks, Manticore combines some of them to improve performance.

You can create a new real-time table in two ways: by using the CREATE TABLE command or through the _mapping endpoint of the HTTP JSON API.

You can use this command via both SQL and HTTP protocols:

‹›

SQL
JSON
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust
Typescript
Go
CONFIG

📋

CREATE TABLE products(title text, price float) morphology='stem_en';

POST /cli -d "CREATE TABLE products(title text, price float)  morphology='stem_en'"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
]);

utilsApi.sql('CREATE TABLE forum(title text, price float)')

await utilsApi.sql('CREATE TABLE forum(title text, price float)')

res = await utilsApi.sql('CREATE TABLE forum(title text, price float)');

utilsApi.sql("CREATE TABLE forum(title text, price float)");

utilsApi.Sql("CREATE TABLE forum(title text, price float)");

utils_api.sql("CREATE TABLE forum(title text, price float)", Some(true)).await;

res = await utilsApi.sql('CREATE TABLE forum(title text, price float)');

utilsAPI.Sql(context.Background()).Body("CREATE TABLE forum(title text, price float)").Execute()

table products {
  type = rt
  path = tbl
  rt_field = title
  rt_attr_uint = price
  stored_fields = title
}

‹›

Response

Query OK, 0 rows affected (0.00 sec)

NOTE: The _mapping API requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Alternatively, you can create a new table via the _mapping endpoint. This endpoint allows you to define an Elasticsearch-like table structure to be converted to a Manticore table.

The body of your request must have the following structure:

"properties"
{
  "FIELD_NAME_1":
  {
    "type": "FIELD_TYPE_1"
  },
  "FIELD_NAME_2":
  {
    "type": "FIELD_TYPE_2"
  },
  ...
  "FIELD_NAME_N":
  {
    "type": "FIELD_TYPE_M"
  }
}

When creating a table, Elasticsearch data types will be mapped to Manticore types according to the following rules:

aggregate_metric => json
binary => string
boolean => bool
byte => int
completion => string
date => timestamp
date_nanos => bigint
date_range => json
dense_vector => json
flattened => json
flat_object => json
float => float
float_range => json
geo_point => json
geo_shape => json
half_float => float
histogram => json
integer => int
integer_range => json
ip => string
ip_range => json
keyword => string
knn_vector => float_vector
long => bigint
long_range => json
match_only_text => text
object => json
point => json
scaled_float => float
search_as_you_type => text
shape => json
short => int
text => text
unsigned_long => int
version => string

‹›

JSON

JSON

📋

⚙

POST /your_table_name/_mapping -d '
{
  "properties": {
    "price": {
        "type": "float"
    },
    "title": {
        "type": "text"
    }
  }
}
'

‹›

Response

{
"total":0,
"error":"",
"warning":""
}

You can create a copy of a real-time table, with or without its data. Please note that if the table is large, copying it with data may take some time. Copying works in synchronous mode, but if the connection is dropped, it will continue in the background.

CREATE TABLE [IF NOT EXISTS] table_name LIKE old_table_name [WITH DATA]

NOTE: Copying a table requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

‹›

SQL
Example (WITH DATA)

📋

create table products LIKE old_products;

Add documents.
Update attributes and full-text fields using the Update process.
Delete documents.
Empty the table.
Change the schema online with the ALTER command, as explained in Change schema online.
Define the table in a configuration file, as detailed in Define table.
Use the UUID feature for automatic ID provisioning.

Ingest data using the indexer feature.
Connect it to sources for easy indexing from external storage.
Update the killlist_target, as it is automatically managed by the real-time table.

The following table outlines the different file extensions and their respective descriptions in a real-time table:

Extension	Description
`.lock`	A lock file that ensures that only one process can access the table at a time.
`.ram`	The RAM chunk of the table, stored in memory and used as an accumulator of changes.
`.meta`	The headers of the real-time table that define its structure and settings.
`..sp`	Disk chunks that are stored on disk with the same format as plain tables. They are created when the RAM chunk size exceeds the rt_mem_limit.

For more information on the structure of disk chunks, refer to the plain table files structure.

Creating a local table Plain table

Plain table is a basic element for non-percolate searching. It can be defined only in a configuration file using the Plain mode, and is not supported in the RT mode. It is typically used in conjunction with a source to process data from the external storage and can later be attached to a real-time table.

To create a plain table, you'll need to define it in a configuration file. It's not supported by the CREATE TABLE command.

Here's an example of a plain table configuration and a source for fetching data from a MySQL database:

‹›

Plain table example

Plain table example

📋

source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query        = SELECT id, title, description, category_id  from mytable
  sql_attr_uint    = category_id
  sql_field_string = title
 }
table tbl {
  type   = plain
  source = source
  path   = /path/to/table
 }

Build it from an external storage using a source and indexer
Perform an in-place update of integer, float, string and MVA attribute
Update it's killlist_target

Insert additional data into the table once it has been built
Delete data from the table
Create, delete, or alter the table schema online
Use UUID for automatic ID generation (data from external storage must include a unique identifier)

Numeric attributes, including MVAs, are the only elements that can be updated in a plain table. All other data in the table is immutable. If updates or new records are required, the table must be rebuilt. During the rebuilding process, the existing table remains available to serve requests, and a process called rotation is performed when the new version is ready, bringing it online and discarding the old version.

The speed at which a plain table is indexed depends on several factors, including:

Data source retrieval speed
Tokenization settings
The hardware specifications (such as CPU, RAM, and disk performance)

For small data sets, the simplest option is to have a single plain table that is fully rebuilt as needed. This approach is acceptable when:

The data in the table is not as fresh as the data in the source
The time it takes to build the table increases as the data set grows

For larger data sets, a plain table can be used instead of a Real-Time. The main+delta scenario involves:

Creating a smaller table for incremental indexing
Combining the two tables using a distributed table

This approach allows for infrequent rebuilding of the larger table and more frequent processing of updates from the source. The smaller table can be rebuilt more often (e.g. every minute or even every few seconds).

However, as time goes on, the indexing duration for the smaller table will become too long, requiring a rebuild of the larger table and the emptying of the smaller one.

The main+delta schema is explained in detail in this interactive course.

The mechanism of kill list and killlist_target directive is used to ensure that documents from the current table take precedence over those from the other table.

For more information on this topic, see here.

The following table outlines the various file extensions used in a plain table and their respective descriptions:

Extension	Description
`.spa`	stores document attributes in row-wise mode
`.spb`	stores blob attributes in row-wise mode: strings, MVA, json
`.spc`	stores document attributes in columnar mode
`.spd`	stores matching document ID lists for each word ID
`.sph`	stores table header information
`.sphi`	stores histograms of attribute values
`.spi`	stores word lists (word IDs and pointers to `.spd` file)
`.spidx`	stores secondary indexes data
`.spjidx`	stores data of secondary indexes generated for JSON attributes
`.spk`	stores kill-lists
`.spl`	lock file
`.spm`	stores a bitmap of killed documents
`.spp`	stores hit (aka posting, aka word occurrence) lists for each word ID
`.spt`	stores additional data structures to speed up lookups by document ids
`.spe`	stores skip-lists to speed up doc-list filtering
`.spds`	stores document texts
`.tmp*`	temporary files during index_settings_and_status
`.new.sp*`	new version of a plain table before rotation
`.old.sp*`	old version of a plain table after rotation

Real-time table Plain and real-time table settings

table <table_name>[:<parent table name>] {
...
}

‹›

Plain
Real-time

📋

table <table name> {
  type = plain
  path = /path/to/table
  source = <source_name>
  source = <another source_name>
  [stored_fields = <comma separated list of full-text fields that should be stored, all are stored by default, can be empty>]
}

type = plain
type = rt

Table type: "plain" or "rt" (real-time)

Value: plain (default), rt

path = path/to/table

The path to where the table will be stored or located, either absolute or relative, without the extension.

Value: The path to the table, mandatory

stored_fields = title, content

By default, the original content of full-text fields is indexed and stored when a table is defined in a configuration file. This setting allows you to specify the fields that should have their original values stored.

Value: A comma-separated list of full-text fields that should be stored. An empty value (i.e. stored_fields = ) disables the storage of original values for all fields.

Note: In the case of a real-time table, the fields listed in stored_fields should also be declared as rt_field. Also, note that you don't need to list attributes in stored_fields, since their original values are stored anyway. stored_fields can only be used for full-text fields.

Note: In a distributed table setup, document IDs must be unique across all agent tables. If multiple agents contain the same document ID, retrieving stored fields may return values from a different agent than the matched row.

See also docstore_block_size, docstore_compression for document storage compression options.

‹›

SQL
JSON
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust
Typescript
Go
CONFIG

📋

CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)

POST /cli -d "
CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)"

$params = [
    'body' => [
        'columns' => [
            'title'=>['type'=>'text'],
            'content'=>['type'=>'text', 'options' => ['indexed', 'stored']],
            'name'=>['type'=>'text', 'options' => ['indexed']],
            'price'=>['type'=>'float']
        ]
    ],
    'table' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);

utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)')

await utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)')

res = await utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)');

utilsApi.sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)");

utilsApi.Sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)");

utils_api.sql("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)", Some(true)).await;

res = await utilsApi.sql('CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)');

utilsAPI.Sql(context.Background()).Body("CREATE TABLE products(title text, content text stored indexed, name text indexed, price float)").Execute()

table products {
  stored_fields = title, content # we want to store only "title" and "content", "name" shouldn't be stored
  type = rt
  path = tbl
  rt_field = title
  rt_field = content
  rt_field = name
  rt_attr_uint = price
}

stored_only_fields = title,content

Use stored_only_fields when you want Manticore to store some fields of a plain or real-time table on disk but not index them. These fields won't be searchable with full-text queries, but you can still retrieve their values in search results.

For example, this is useful if you want to store data like JSON documents that should be returned with each result, but don't need to be searched, filtered, or grouped. In that case, storing them only — and not indexing them — saves memory and improves performance.

You can do this in two ways:

In plain mode in a table config, use the stored_only_fields setting.
In the SQL interface (RT mode), use the stored property when defining a text field (instead of indexed or indexed stored). In SQL, you don't need to include stored_only_fields — it's not supported in CREATE TABLE statements.

The value of stored_only_fields is a comma-separated list of field names. By default, it's empty. If you're using a real-time table, each field listed in stored_only_fields must also be declared as an rt_field.

Note: you don't need to list attributes in stored_only_fields, since their original values are stored anyway.

How stored-only fields compare to string attributes:

Stored-only field:
- Stored on disk only
- Compressed format
- Can only be retrieved (not used for filtering, sorting, etc.)
String attribute:
- Stored on disk and in memory
- Uncompressed format (unless you are using columnar storage)
- Can be used for sorting, filtering, grouping, etc.

If you are looking to have Manticore store text data for you that you only want stored on disk (eg: json data that is returned with every result), and not in memory or searchable/filterable/groupable, use stored_only_fields, or stored as your text field property.

When creating your tables using the SQL interface, label your text field as stored (and not indexed or indexed stored). You will not need the stored_only_fields option in your CREATE TABLE statement; including it may result in a failed query.

json_secondary_indexes = json_attr

By default, secondary indexes are generated for all attributes except JSON attributes. However, secondary indexes for JSON attributes can be explicitly generated using the json_secondary_indexes setting. When a JSON attribute is included in this option, its contents are flattened into multiple secondary indexes. These indexes can be used by the query optimizer to speed up queries.

You can view the available secondary indexes using the SHOW TABLE INDEXES command.

Value: A comma-separated list of JSON attributes for which secondary indexes should be generated.

‹›

SQL
JSON
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust
Typescript
Go
CONFIG

📋

CREATE TABLE products(title text, j json secondary_index='1')

POST /cli -d "
CREATE TABLE products(title text, j json secondary_index='1')"

$params = [
    'body' => [
        'columns' => [
            'title'=>['type'=>'text'],
            'j'=>['type'=>'json', 'options' => ['secondary_index' => 1]]
        ]
    ],
    'table' => 'products'
];
$index = new \Manticoresearch\Index($client);
$index->create($params);

utilsApi.sql('CREATE TABLE products(title text, j json secondary_index='1')')

await utilsApi.sql('CREATE TABLE products(title text, j json secondary_index='1')')

res = await utilsApi.sql('CREATE TABLE products(title text, j json secondary_index=\'1\')');

utilsApi.sql("CREATE TABLE products(title text, j json secondary_index='1')");

utilsApi.Sql("CREATE TABLE products(title text, j json secondary_index='1')");

utils_api.sql("CREATE TABLE products(title text, j json secondary_index='1')", Some(true)).await;

res = await utilsApi.sql('CREATE TABLE products(title text, j json secondary_index=\'1\')');

utilsAPI.Sql(context.Background()).Body("CREATE TABLE products(title text, j json secondary_index='1')").Execute()

table products {
  json_secondary_indexes = j
  type = rt
  path = tbl
  rt_field = title
  rt_attr_json = j
}

diskchunk_flush_search_timeout = 10s

The timeout for preventing auto-flushing a RAM chunk if there are no searches in the table. Learn more here.

diskchunk_flush_write_timeout = 60s

The timeout for auto-flushing a RAM chunk if there are no writes to it. Learn more here.

The maximum number of disk chunks for the RT table. Learn more here.

rt_field = subject

This field declaration determines the full-text fields that will be indexed. The field names must be unique, and the order is preserved. When inserting data, the field values must be in the same order as specified in the configuration.

This is a multi-value, optional field.

rt_attr_uint = gid

This declaration defines an unsigned integer attribute.

Value: the field name or field_name:N (where N is the maximum number of bits to keep).

rt_attr_bigint = gid

This declaration defines a BIGINT attribute.

Value: field name, multiple records allowed.

rt_attr_multi = tags

Declares a multi-valued attribute (MVA) with unsigned 32-bit integer values.

Value: field name. Multiple records allowed.

rt_attr_multi_64 = wide_tags

Declares a multi-valued attribute (MVA) with signed 64-bit BIGINT values.

Value: field name. Multiple records allowed.

rt_attr_float = lat
rt_attr_float = lon

Declares floating point attributes with single precision, 32-bit IEEE 754 format.

Value: field name. Multiple records allowed.

rt_attr_float_vector = image_vector
rt_attr_float_vector = text_vector

Declares a vector of floating-point values for storing embeddings and enabling k-nearest neighbor (KNN) vector searches.

Value: field name. Multiple records allowed.

Each vector attribute stores an array of floating-point numbers that represent data (such as text, images, or other content) as high-dimensional vectors. These vectors are typically generated by machine learning models and can be used for similarity search, recommendations, semantic search, and other AI-powered features.

Important: Float vector attributes require additional KNN configuration to enable vector search capabilities.

To enable KNN searches on float vector attributes, you must add a knn configuration that specifies the indexing parameters. You can configure KNN in two ways:

1. Manual vector insertion (you provide pre-computed vectors):

rt_attr_float_vector = image_vector
rt_attr_float_vector = text_vector
knn = {"attrs":[{"name":"image_vector","type":"hnsw","dims":768,"hnsw_similarity":"COSINE","hnsw_m":16,"hnsw_ef_construction":200},{"name":"text_vector","type":"hnsw","dims":768,"hnsw_similarity":"COSINE","hnsw_m":16,"hnsw_ef_construction":200}]}

2. Auto embeddings (Manticore generates vectors from text automatically):

rt_attr_float_vector = embedding_vector
rt_field = title
rt_field = description
knn = {"attrs":[{"name":"embedding_vector","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"sentence-transformers/all-MiniLM-L6-v2","from":"title"}]}

Required KNN parameters:

name: The name of the vector attribute (must match the rt_attr_float_vector name)
type: Index type, currently only "hnsw" is supported
dims: Number of dimensions in the vectors. Required for manual vector insertion, must be omitted when using model_name (the model determines dimensions automatically)
hnsw_similarity: Distance function - "L2", "IP" (inner product), or "COSINE"

Optional KNN parameters:

hnsw_m: Maximum connections in the graph (default: 16)
hnsw_ef_construction: Construction time/accuracy trade-off (default: 200)

Auto-embeddings parameters (when using model_name):

model_name: The embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2", "openai/text-embedding-ada-002"). When specified, dims must be omitted as the model determines the dimensions automatically.
from: Comma-separated list of field names to use for embedding generation, or empty string "" to use all text/string fields. This parameter is required when model_name is specified.
api_key: API key for API-based models (OpenAI, Voyage, Jina). Only required for API-based embedding services.
cache_path: Optional path for caching downloaded models (for sentence-transformers models).
use_gpu: Optional boolean to enable GPU acceleration if available.

Important: You cannot specify both dims and model_name in the same configuration - they are mutually exclusive. Use dims for manual vector insertion, or model_name for auto-embeddings. Use dims for manual vector insertion, or model_name for auto-embeddings.

For more details on KNN vector search and auto-embeddings, see the KNN documentation.

rt_attr_bool = available

Declares a boolean attribute with 1-bit unsigned integer values.

Value: field name.

rt_attr_string = title

String attribute declaration.

Value: field name.

rt_attr_json = properties

Declares a JSON attribute.

Value: field name.

rt_attr_timestamp = date_added

Declares a timestamp attribute.

Value: field name.

rt_mem_limit = 512M

Memory limit for a RAM chunk of the table. Optional, default is 128M.

RT tables store some data in memory, known as the "RAM chunk", and also maintain a number of on-disk tables, referred to as "disk chunks." This directive allows you to control the size of the RAM chunk. When there is too much data to keep in memory, RT tables will flush it to disk, activate a newly created disk chunk, and reset the RAM chunk.

Please note that the limit is strict, and RT tables will never allocate more memory than what is specified in the rt_mem_limit. Additionally, memory is not preallocated, so specifying a 512MB limit and only inserting 3MB of data will result in allocating only 3MB, not 512MB.

The rt_mem_limit is never exceeded, but the actual RAM chunk size can be significantly lower than the limit. RT tables adapt to the data insertion pace and adjust the actual limit dynamically to minimize memory usage and maximize data write speed. This is how it works:

By default, the RAM chunk size is 50% of the rt_mem_limit, referred to as the "rt_mem_limit limit".
As soon as the RAM chunk accumulates data equivalent to rt_mem_limit * rate data (50% of rt_mem_limit by default), Manticore starts saving the RAM chunk as a new disk chunk.
While a new disk chunk is being saved, Manticore assesses the number of new/updated documents.
After saving a new disk chunk, the rt_mem_limit rate is updated.
The rate is reset to 50% each time you restart the searchd.

For instance, if 90MB of data is saved to a disk chunk and an additional 10MB of data arrives while the save is in progress, the rate would be 90%. Next time, the RT table will collect up to 90% of rt_mem_limit before flushing the data. The faster the insertion pace, the lower the rt_mem_limit rate. The rate varies between 33.3% to 95%. You can view the current rate of a table using the SHOW TABLE STATUS command.

In real-time mode, you can adjust the size limit of RAM chunks and the maximum number of disk chunks using the ALTER TABLE statement. To set rt_mem_limit to 1 gigabyte for the table "t", run the following query: ALTER TABLE t rt_mem_limit='1G'. To change the maximum number of disk chunks, run the query: ALTER TABLE t optimize_cutoff='5'.

In the plain mode, you can change the values of rt_mem_limit and optimize_cutoff by updating the table configuration or running the command ALTER TABLE <table_name> RECONFIGURE

Real-time tables are similar to distributed consisting of multiple local tables, also known as disk chunks.
Each RAM chunk is made up of multiple segments, which are special RAM-only tables.
While disk chunks are stored on disk, RAM chunks are stored in memory.
Each transaction made to a real-time table generates a new segment, and RAM chunk segments are merged after each transaction commit. It is more efficient to perform bulk INSERTs of hundreds or thousands of documents rather than multiple separate INSERTs with one document to reduce the overhead from merging RAM chunk segments.
When the number of segments exceeds 32, they will be merged to keep the count below 32.
Real-time tables always have one RAM chunk (which may be empty) and one or more disk chunks.
Merging larger segments takes longer, so it's best to avoid having a very large RAM chunk (and therefore rt_mem_limit).
The number of disk chunks depends on the data in the table and the rt_mem_limit setting.
Searchd flushes the RAM chunk to disk (as a persisted file, not as a disk chunk) on shutdown and periodically according to the rt_flush_period setting. Flushing several gigabytes to disk may take some time.
A large RAM chunk puts more pressure on storage, both when flushing to disk into the .ram file and when the RAM chunk is full and dumped to disk as a disk chunk.
The RAM chunk is backed up by a binary log until it is flushed to disk, and a larger rt_mem_limit, setting will increase the time it takes to replay the binary log and recover the RAM chunk.
The RAM chunk may be slightly slower than a disk chunk.
Although the RAM chunk itself doesn't take up more memory than rt_mem_limit, Manticore may take up more memory in some cases, such as when you start a transaction to insert data and don't commit it for a while. In this case, the data you have already transmitted within the transaction will remain in memory.

In addition to rt_mem_limit, the flushing behavior of RAM chunks is also influenced by the following options and conditions:

Frozen state. If the table is frozen, flushing is deferred. That is a permanent rule; nothing can override it. If the rt_mem_limit condition is reached while the table is frozen, all further inserts will be delayed until the table is unfrozen.
diskchunk_flush_write_timeout: This option defines the timeout (in seconds) for auto-flushing a RAM chunk if there are no writes to it. If no write occurs within this time, the chunk will be flushed to disk. Setting it to -1 disables auto-flushing based on write activity. The default value is 1 second.
diskchunk_flush_search_timeout: This option sets the timeout (in seconds) for preventing auto-flushing a RAM chunk if there are no searches in the table. Auto-flushing will only occur if there has been at least one search within this time. The default value is 30 seconds.
ongoing optimization: If an optimization process is currently running, and the number of existing disk chunks has reached or exceeded a configured internal cutoff threshold, the flush triggered by the diskchunk_flush_write_timeout or diskchunk_flush_search_timeout timeout will be skipped.
too few documents in RAM segments: If the number of documents across RAM segments is below a minimum threshold (8192), the flush triggered by the diskchunk_flush_write_timeout or diskchunk_flush_search_timeout timeout will be skipped to avoid creating very small disk chunks. This helps minimize unnecessary disk writes and chunk fragmentation.

These timeouts work in conjunction. A RAM chunk will be flushed if either timeout is reached. This ensures that even if there are no writes, the data will eventually be persisted to disk, and conversely, even if there are constant writes but no searches, the data will also be persisted. These settings provide more granular control over how frequently RAM chunks are flushed, balancing the need for data durability with performance considerations. Per-table directives for these settings have higher priority and will override the instance-wide defaults.

source = srcpart1
source = srcpart2
source = srcpart3

The source field specifies the source from which documents will be obtained during indexing of the current table. There must be at least one source. The sources can be of different types (e.g. one could be MySQL, another PostgreSQL). For more information on indexing from external storages, read here

Value: The name of the source is mandatory. Multiple values are allowed.

killlist_target = main:kl

This setting determines the table(s) to which the kill-list will be applied. Matches in the targeted table that are updated or deleted in the current table will be suppressed. In :kl mode, the documents to suppress are taken from the kill-list. In :id mode, all document IDs from the current table are suppressed in the targeted one. If neither is specified, both modes will take effect. Learn more about kill-lists here

Value: not specified (default), target_table_name:kl, target_table_name:id, target_table_name. Multiple values are allowed

columnar_attrs = *
columnar_attrs = id, attr1, attr2, attr3

This configuration setting determines which attributes should be stored in the columnar storage instead of the row-wise storage.

You can set columnar_attrs = * to store all supported data types in the columnar storage.

Additionally, id is a supported attribute to store in the columnar storage.

columnar_strings_no_hash = attr1, attr2, attr3

By default, all string attributes stored in columnar storage store pre-calculated hashes. These hashes are used for grouping and filtering. However, they occupy extra space, and if you don't need to group by that attribute, you can save space by disabling hash generation.

CREATE TABLE [IF NOT EXISTS] name ( <field name> <field data type> [data type options] [, ...]) [table_options]

For more information on data types, see more about data types here.

Type	Equivalent in a configuration file	Notes	Aliases
text	rt_field	Options: indexed, stored. Default: both. To keep text stored, but indexed, specify "stored" only. To keep text indexed only, specify "indexed" only.	string
integer	rt_attr_uint	integer	int, uint
bigint	rt_attr_bigint	big integer
float	rt_attr_float	float
float_vector	rt_attr_float_vector	a vector of float values
multi	rt_attr_multi	multi-integer	mva
multi64	rt_attr_multi_64	multi-bigint	mva64
bool	rt_attr_bool	boolean
json	rt_attr_json	JSON
string	rt_attr_string	string. Option `indexed, attribute` will make the value full-text indexed and filterable, sortable and groupable at the same time
timestamp	rt_attr_timestamp	timestamp
bit(n)	rt_attr_uint field_name:N	N is the max number of bits to keep

‹›

SQL

📋

CREATE TABLE products (title text, price float) morphology='stem_en'

This creates the "products" table with two fields: "title" (full-text) and "price" (float), and sets the "morphology" to "stem_en".

CREATE TABLE products (title text indexed, description text stored, author text, price float)

This creates the "products" table with three fields:

"title" is indexed, but not stored.
"description" is stored, but not indexed.
"author" is both stored and indexed.

create table ... engine='columnar';
create table ... engine='rowwise';

The engine setting changes the default attribute storage for all attributes in the table. You can also specify engine separately for each attribute.

For information on how to enable columnar storage for a plain table, see columnar_attrs .

Values:

columnar - Enables columnar storage for all table attributes, except for json
rowwise (default) - Doesn't change anything and uses the traditional row-wise storage for the table.

The following settings are applicable for both real-time and plain tables, regardless of whether they are specified in a configuration file or set online using the CREATE or ALTER command.

Manticore supports two access modes for reading table data: seek+read and mmap.

In seek+read mode, the server uses the pread system call to read document lists and keyword positions, represented by the *.spd and *.spp files. The server uses internal read buffers to optimize the reading process, and the size of these buffers can be adjusted using the options read_buffer_docs and read_buffer_hits.There is also the option preopen that controls how Manticore opens files at start.

In mmap access mode, the search server maps the table's file into memory using the mmap system call, and the OS caches the file contents. The options read_buffer_docs and read_buffer_hits have no effect for corresponding files in this mode. The mmap reader can also lock the table's data in memory using themlock privileged call, which prevents the OS from swapping the cached data out to disk.

To control which access mode to use, the options access_plain_attrs, access_blob_attrs, access_doclists, access_hitlists and access_dict are available, with the following values:

Value	Description
file	server reads the table files from disk with seek+read using internal buffers on file access
mmap	server maps the table files into memory and OS caches up its contents on file access
mmap_preread	server maps the table files into memory and a background thread reads it once to warm up the cache
mlock	server maps the table files into memory and then executes the mlock() system call to cache up the file contents and lock it into memory to prevent it being swapped out

Setting	Values	Description
access_plain_attrs	mmap, mmap_preread (default), mlock	controls how `.spa` (plain attributes) `.spe` (skip lists) `.spt` (lookups) `.spm` (killed docs) will be read
access_blob_attrs	mmap, mmap_preread (default), mlock	controls how `*.spb` (blob attributes) (string, mva and json attributes) will be read
access_doclists	file (default), mmap, mlock	controls how `*.spd` (doc lists) data will be read
access_hitlists	file (default), mmap, mlock	controls how `*.spp` (hit lists) data will be read
access_dict	mmap, mmap_preread (default), mlock	controls how `*.spi` (dictionary) will be read

Here is a table which can help you select your desired mode:

table part	keep it on disk	keep it in memory	cached in memory on server start	lock it in memory
plain attributes in row-wise (non-columnar) storage, skip lists, word lists, lookups, killed docs	mmap	mmap	mmap_preread (default)	mlock
row-wise string, multi-value attributes (MVA) and json attributes	mmap	mmap	mmap_preread (default)	mlock
columnar numeric, string and multi-value attributes	always	only by means of OS	no	not supported
doc lists	file (default)	mmap	no	mlock
hit lists	file (default)	mmap	no	mlock
dictionary	mmap	mmap	mmap_preread (default)	mlock

For the fastest search response time and ample memory availability, use row-wise attributes and lock them in memory using mlock. Additionally, use mlock for doclists/hitlists.
If you prioritize can't afford lower performance after start and are willing to accept longer startup time, use the --force-preread. option. If you desire faster searchd restart, stick to the default mmap_preread option.
If you are looking to conserve memory, while still having enough memory for all attributes, skip the use of mlock. The operating system will determine what should be kept in memory based on frequent disk reads.
If row-wise attributes do not fit into memory, opt for columnar attributes
If full-text search performance is not a concern, and you wish to save memory, use access_doclists/access_hitlists=file

The default mode offers a balance of:

mmap,
Prereading non-columnar attributes,
Seeking and reading columnar attributes with no preread,
Seeking and reading doclists/hitlists with no preread.

This provides a decent search performance, optimal memory utilization, and faster searchd restart in most scenarios.

attr_update_reserve = 256k

This setting reserves extra space for updates to blob attributes such as multi-value attributes (MVA), strings, and JSON. The default value is 128k. When updating these attributes, their length may change. If the updated string is shorter than the previous one, it will overwrite the old data in the *.spb file. If the updated string is longer, it will be written to the end of the *.spb file. This file is memory-mapped, making resizing it a potentially slow process, depending on the operating system's memory-mapped file implementation. To avoid frequent resizing, you can use this setting to reserve extra space at the end of the .spb file.

Value: size, default 128k.

docstore_block_size = 32k

This setting controls the size of blocks used by the document storage. The default value is 16kb. When original document text is stored using stored_fields or stored_only_fields, it is stored within the table and compressed for efficiency. To optimize disk access and compression ratios for small documents, these documents are concatenated into blocks. The indexing process collects documents until their total size reaches the threshold specified by this option. At that point, the block of documents is compressed. This option can be adjusted to achieve better compression ratios (by increasing the block size) or faster access to document text (by decreasing the block size).

Value: size, default 16k.

docstore_compression = lz4hc

This setting determines the type of compression used for compressing blocks of documents stored in document storage. If stored_fields or stored_only_fields are specified, the document storage stores compressed document blocks. 'lz4' offers fast compression and decompression speeds, while 'lz4hc' (high compression) sacrifices some compression speed for a better compression ratio. 'none' disables compression completely.

Values: lz4 (default), lz4hc, none.

docstore_compression_level = 12

The compression level used when 'lz4hc' compression is applied in document storage. By adjusting the compression level, you can find the right balance between performance and compression ratio when using 'lz4hc' compression. Note that this option is not applicable when using 'lz4' compression.

Value: An integer between 1 and 12, with a default of 9.

preopen = 1

This setting indicates that searchd should open all table files on startup or rotation, and keep them open while running. By default, the files are not pre-opened. Pre-opened tables require a few file descriptors per table, but they eliminate the need for per-query open() calls and are immune to race conditions that might occur during table rotation under high load. However, if you are serving many tables, it may still be more efficient to open them on a per-query basis in order to conserve file descriptors.

Value: 0 (default), or 1.

read_buffer_docs = 1M

Buffer size for storing the list of documents per keyword. Increasing this value will result in higher memory usage during query execution, but may reduce I/O time.

Value: size, default 256k, minimum value is 8k.

read_buffer_hits = 1M

Buffer size for storing the list of hits per keyword. Increasing this value will result in higher memory usage during query execution, but may reduce I/O time.

Value: size, default 256k, minimum value is 8k.

inplace_enable = {0|1}

Enables in-place table inversion. Optional, default is 0 (uses separate temporary files).

The inplace_enable option reduces the disk footprint during indexing of plain tables, while slightly slowing down indexing (it uses approximately 2 times less disk, but yields around 90-95% of the original performance).

Indexing is comprised of two primary phases. During the first phase, documents are collected, processed, and partially sorted by keyword, and the intermediate results are written to temporary files (.tmp*). During the second phase, the documents are fully sorted and the final table files are created. Rebuilding a production table on-the-fly requires approximately 3 times the peak disk footprint: first for the intermediate temporary files, second for the newly constructed copy, and third for the old table that will be serving production queries in the meantime. (Intermediate data is comparable in size to the final table.) This may be too much disk footprint for large data collections, and the inplace_enable option can be used to reduce it. When enabled, it reuses the temporary files, outputs the final data back to them, and renames them upon completion. However, this may require additional temporary data chunk relocation, which is where the performance impact comes from.

This directive has no effect on searchd, it only affects the indexer.

‹›

CONFIG

CONFIG

📋

table products {
  inplace_enable = 1
  path = products
  source = src_base
}

inplace_hit_gap = size

The option In-place inversion fine-tuning option. Controls preallocated hitlist gap size. Optional, default is 0.

This directive only affects the searchd tool, and does not have any impact on the indexer.

‹›

CONFIG

CONFIG

📋

table products {
  inplace_hit_gap = 1M
  inplace_enable = 1
  path = products
  source = src_base
}

inplace_reloc_factor = 0.1

The inplace_reloc_factor setting determines the size of the relocation buffer within the memory arena used during indexing. The default value is 0.1.

This option is optional and only affects the indexer tool, not the searchd server.

‹›

CONFIG

CONFIG

📋

table products {
  inplace_reloc_factor = 0.1
  inplace_enable = 1
  path = products
  source = src_base
}

inplace_write_factor = 0.1

Controls the size of the buffer used for in-place writing during indexing. Optional, with a default value of 0.1.

It's important to note that this directive only impacts the indexer tool and not the searchd server.

‹›

CONFIG

CONFIG

📋

table products {
  inplace_write_factor = 0.1
  inplace_enable = 1
  path = products
  source = src_base
}

The following settings are supported. They are all described in section NLP and tokenization.

Plain table Percolate table

Creating a local table

Online schema management (RT mode)

Defining table schema in config (Plain mode)

Table types and modes

Real-time table

Creating a real-time table:

CREATE TABLE command:

_mapping API:

CREATE TABLE LIKE:

👍 What you can do with a real-time table:

⛔ What you cannot do with a real-time table:

Real-time table files structure

Plain table

Creating a plain table

👍 What you can do with a plain table:

⛔ What you cannot do with a plain table:

Plain table building performance

Plain table building scenarios

Rebuild fully when needed

Main+delta scenario

Plain table files structure

Plain and real-time table settings

Defining table schema in a configuration file

Common plain and real-time tables settings

type

path

stored_fields

stored_only_fields

json_secondary_indexes

Real-time table settings:

diskchunk_flush_search_timeout

diskchunk_flush_write_timeout

optimize_cutoff

rt_field

rt_attr_uint

rt_attr_bigint

rt_attr_multi

rt_attr_multi_64

rt_attr_float

rt_attr_float_vector

Configuring KNN for vector attributes

rt_attr_bool

rt_attr_string

rt_attr_json

rt_attr_timestamp

rt_mem_limit

How to change rt_mem_limit and optimize_cutoff

Important notes about RAM chunks

RAM chunk flushing conditions

Plain table settings:

source

killlist_target

columnar_attrs

columnar_strings_no_hash

Creating a real-time table online via CREATE TABLE

General syntax of CREATE TABLE

Data types:

Engine

Other settings

Performance related

Accessing table files

The recommendations are:

Other performance related settings

attr_update_reserve

docstore_block_size

docstore_compression

docstore_compression_level

preopen

read_buffer_docs

read_buffer_hits

Plain table disk footprint settings

inplace_enable

inplace_hit_gap

inplace_reloc_factor

inplace_write_factor

Natural language processing specific settings