
There are two approaches to working with indexes in Manticore: real-time (RT) mode and plain mode.

Real-time mode requires that the configuration file contain no index definitions and that the data_dir directive be set in the searchd section. Index files are stored inside data_dir.

Replication is available only in this mode.

In this mode you can use SQL commands such as CREATE TABLE, ALTER TABLE and DROP TABLE to create an index, change its schema and drop it. This mode is especially useful for real-time and percolate indexes.

Index names are case insensitive in RT mode.

In plain mode you specify the index schema in the configuration file. Manticore reads it on start and creates the index if it does not exist yet. This mode is especially useful for plain indexes that are built by indexing data from an external storage.

Dropping an index is only possible by removing it from the configuration file, or by removing its path setting, and then sending a HUP signal to the server or restarting it.

Index names are case sensitive in plain mode.

All index types are supported in plain mode.

Index type	RT mode	plain mode
Real-time	yes	yes
Plain	no	yes
Percolate	yes	yes
Distributed	yes	yes
Template	no	yes
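
As a rough illustration of the rule described above, the sketch below guesses which mode a configuration implies: RT mode when data_dir is set and no index blocks are defined, plain mode otherwise. The function name detect_mode, the sample configurations and the regex-based parsing are assumptions made for this example; Manticore's own configuration parser is more involved.

```python
import re

def detect_mode(config_text: str) -> str:
    """Guess the mode implied by a config: 'rt' or 'plain' (illustrative only)."""
    # Drop comments so that commented-out directives are ignored.
    text = re.sub(r"#.*", "", config_text)
    has_data_dir = re.search(r"^\s*data_dir\s*=", text, flags=re.MULTILINE) is not None
    has_index_block = re.search(r"^\s*index\s+\S+", text, flags=re.MULTILINE) is not None
    # Simplification: anything that is not a clean RT-mode config counts as plain.
    if has_data_dir and not has_index_block:
        return "rt"
    return "plain"

rt_config = """
searchd {
  listen = 9306:mysql
  data_dir = /var/lib/manticore
}
"""

plain_config = """
index products {
  type = rt
  path = idx
  rt_field = title
}
searchd {
  listen = 9306:mysql
}
"""

print(detect_mode(rt_config))     # rt
print(detect_mode(plain_config))  # plain
```

Note that the second configuration defines a real-time index and is still a plain-mode configuration: the mode is decided by how indexes are declared, not by their type.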

Real-time index

The real-time index is the main type of Manticore index. It lets you add, update and delete documents, and the changes become available immediately. Real-time index settings can be defined in a configuration file or online via CREATE/UPDATE/DELETE/ALTER commands.

Internally, a real-time index consists of one or more plain indexes called chunks. There can be:

multiple disk chunks, stored on disk with the same structure as any plain index
a single RAM chunk, stored in memory and used as an accumulator of changes

The RAM chunk size is controlled by rt_mem_limit. Once the limit is exceeded, the RAM chunk is flushed to disk as a new disk chunk. When there are too many disk chunks, they can be merged into one with the OPTIMIZE command for better performance.
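
The chunk lifecycle can be pictured with a toy model. The sketch below is not how Manticore stores data; the class ToyRtIndex and its methods are hypothetical, and document size is approximated by the UTF-8 length of a string. It only shows the flow: documents accumulate in a RAM chunk, the chunk is flushed to a new disk chunk once the limit is exceeded, and an optimize step merges disk chunks.

```python
class ToyRtIndex:
    """Toy model of a real-time index: one RAM chunk plus a list of disk chunks."""

    def __init__(self, rt_mem_limit: int):
        self.rt_mem_limit = rt_mem_limit  # limit in bytes, named after rt_mem_limit
        self.ram_chunk = []               # accumulates new documents
        self.ram_bytes = 0
        self.disk_chunks = []             # each disk chunk is an immutable tuple

    def insert(self, doc: str) -> None:
        self.ram_chunk.append(doc)
        self.ram_bytes += len(doc.encode("utf-8"))
        if self.ram_bytes > self.rt_mem_limit:
            self.flush()

    def flush(self) -> None:
        """Turn the RAM chunk into a new disk chunk and start an empty RAM chunk."""
        if self.ram_chunk:
            self.disk_chunks.append(tuple(self.ram_chunk))
            self.ram_chunk, self.ram_bytes = [], 0

    def optimize(self) -> None:
        """Merge all disk chunks into one, similar in spirit to OPTIMIZE."""
        if len(self.disk_chunks) > 1:
            merged = tuple(doc for chunk in self.disk_chunks for doc in chunk)
            self.disk_chunks = [merged]

index = ToyRtIndex(rt_mem_limit=16)
for i in range(10):
    index.insert(f"doc-{i:03d}")  # each document is 7 bytes

print(len(index.disk_chunks), len(index.ram_chunk))       # 3 1
index.optimize()
print(len(index.disk_chunks), len(index.disk_chunks[0]))  # 1 9
```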

SQL:

CREATE TABLE products(title text, price float) morphology='stem_en';

HTTP:

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, price float) morphology='stem_en'"

PHP:

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
]);

Python:

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price float)')

Javascript:

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price float)');

Java:

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, price float)");

CONFIG:

index products {
  type = rt
  path = idx
  rt_field = title
  rt_attr_float = price
  stored_fields = title
}

Response (SQL):

Query OK, 0 rows affected (0.00 sec)

What you can do with a real-time index:

Add documents
Update attributes and full-text fields
Delete documents
Truncate the index
Change the schema online with the ALTER command
Define the index in a configuration file
Use UUID for automatic ID provisioning

What you cannot do with a real-time index:

Index data with the indexer tool
Link it with sources for easy indexing from external storages
Update its killlist_target; this is not needed because the real-time index manages it automatically

Real-time index files:

Extension	Description
`.lock`	lock file
`.ram`	RAM chunk
`.meta`	RT index headers
`.*.sp*`	disk chunks (see plain index format)

Plain index

A plain index is the basic element for non-percolate searching. It can only be defined in a configuration file in plain mode and is not supported in RT mode. It is normally used together with a source to index data from an external storage, and can afterwards be attached to a real-time index.

What you can do with a plain index:

build it from a source with the indexer tool, which is the fastest way to index data
do an in-place update of an integer, float, string or MVA attribute
update its killlist_target

What you cannot do with a plain index:

insert more data into the index after it is built
update its full-text fields
delete from it
create, delete or alter it online (it has to be defined in a configuration file)
use UUID for automatic ID generation; data fetched from an external storage must include a unique identifier for each document

Except for numeric attributes (including MVA), the data in a plain index is immutable. To update or add records you need to rebuild the index. While the index is being rebuilt, the existing index remains available for serving requests. When the new version is ready, a process called rotation puts it online and discards the old one.
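
The rebuild-then-rotate idea can be sketched with ordinary files. This is only an analogy under stated assumptions: a single text file stands in for the whole index, the helper names rebuild_and_rotate and read_index are hypothetical, and the .new and .old suffixes merely mimic the naming used for plain index files. The point is that the live version stays readable until the new one is complete.

```python
import os
import tempfile

def rebuild_and_rotate(path: str, documents: list[str]) -> None:
    """Build a new version next to the live file, then swap it in."""
    new_path = path + ".new"
    old_path = path + ".old"
    # 1. Build the new version; the live file stays untouched and readable.
    with open(new_path, "w", encoding="utf-8") as f:
        f.write("\n".join(documents))
    # 2. Rotate: keep the previous version as .old, put the new one online.
    if os.path.exists(path):
        os.replace(path, old_path)
    os.replace(new_path, path)

def read_index(path: str) -> list[str]:
    """Read the toy index back as a list of documents."""
    with open(path, encoding="utf-8") as f:
        return f.read().split("\n")

workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "idx")
rebuild_and_rotate(live, ["a", "b"])        # first build
rebuild_and_rotate(live, ["a", "b", "c"])   # rebuild with one more record

print(read_index(live))           # ['a', 'b', 'c']
print(read_index(live + ".old"))  # ['a', 'b']
```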

Plain index example

A plain index can only be defined in a configuration file. It is not supported by the CREATE TABLE command.

# Source: where the data comes from
source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  # the first column must be the unique document id
  sql_query        = SELECT id, title, description, category_id FROM mytable
  sql_attr_uint    = category_id
  sql_field_string = title
}

# Plain index built from the source above
index idx {
  type   = plain
  source = source
  path   = /path/to/index
}

Speed of plain indexing depends on several factors:

how fast the source can provide the data
tokenization settings
your hardware (CPU, amount of RAM, disk performance)

In the simplest scenario you use a single plain index and fully rebuild it from time to time. This works fine for smaller data sets, provided you accept that:

the index will not be as fresh as the data in the source
indexing duration grows with the data: the more data you have in the source, the longer it takes to build the index

If you have a bigger data set and still want to use a plain index rather than a real-time one, you can:

make another, smaller index for incremental indexing
combine the two using a distributed index

This lets you rebuild the bigger index rarely (say, once per week), save the position of the freshest indexed document, and then use the smaller index to index anything new or updated in your source. Since you only need to fetch the updates from your storage, you can do it much more frequently (say, once per minute or even every few seconds).

After a while, however, building the smaller index will take too long. That is the moment to rebuild the bigger index and empty the smaller one.

This is called the main+delta schema, and you can learn more about it in this interactive course.

When you build the smaller "delta" index, it can contain documents that are already in the "main" index. To let Manticore know that documents from the current index should take precedence, there is a mechanism called the kill list and a corresponding directive, killlist_target.

More information on this topic can be found here.
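
To make the precedence rule concrete, here is a minimal in-memory sketch of main+delta searching. Plain dictionaries stand in for the two indexes, and the functions search and search_main_delta are hypothetical names; in Manticore the suppression is configured with killlist_target rather than coded by hand.

```python
def search(index: dict[int, str], word: str) -> set[int]:
    """Return ids of documents whose text contains the word."""
    return {doc_id for doc_id, text in index.items() if word in text.split()}

def search_main_delta(main: dict[int, str], delta: dict[int, str], word: str) -> set[int]:
    """Search both indexes; documents present in delta suppress their main copies."""
    kill_list = set(delta)                      # ids that delta overrides
    main_hits = search(main, word) - kill_list  # drop stale main versions
    return main_hits | search(delta, word)

main = {1: "red apple", 2: "green pear", 3: "red cherry"}
# Document 1 was updated after main was built, document 4 is new.
delta = {1: "yellow apple", 4: "red plum"}

print(sorted(search_main_delta(main, delta, "red")))     # [3, 4]
print(sorted(search_main_delta(main, delta, "yellow")))  # [1]
```

Without the kill list, the query for "red" would also return document 1 from the main index, even though its current version no longer contains that word.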

Plain index files:

Extension	Description
`.spa`	stores document attributes
`.spb`	stores blob attributes: strings, MVA, json
`.spd`	stores matching document ID lists for each word ID
`.sph`	stores index header information
`.sphi`	stores histograms of attribute values
`.spi`	stores word lists (word IDs and pointers to `.spd` file)
`.spk`	stores kill-lists
`.spl`	lock file
`.spm`	stores a bitmap of killed documents
`.spp`	stores hit (aka posting, aka word occurrence) lists for each word ID
`.spt`	stores additional data structures to speed up lookups by document ids
`.spe`	stores skip-lists to speed up doc-list filtering
`.spds`	stores document texts
`.tmp*`	temporary files created during indexing
`.new.sp*`	new version of the index; during indexing it is written to the same folder by default
`.old.sp*`	previous version of the index; after rotation its files are saved with the .old extension

Plain and real-time index settings

index <index_name>[:<parent index name>] {
...
}

Plain index:

index <index_name> {
  type = plain
  path = /path/to/index
  source = <source_name>
  source = <another source_name>
  [stored_fields = <comma separated list of full-text fields that should be stored>]
}

type = plain
type = rt

Index type: "plain" or "rt" (real-time).

Value: plain (default), rt

path = path/to/index

Absolute or relative path, without extension, where the index is stored or looked for.

Value: path to the index, mandatory

stored_fields = title, content

By default, when an index is defined in a configuration file (plain mode), the original content of full-text fields is indexed but not stored. Fields listed in stored_fields are both indexed and stored, so their text can be returned with search results. In RT mode every full-text field is stored by default.

Value: comma-separated list of full-text fields that should be stored. Optional, default is empty in plain mode.

See also docstore_block_size, docstore_compression for document storage compression options.

SQL:

CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)

HTTP:

POST /sql -d "mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)"

PHP:

$params = [
    'body' => [
        'columns' => [
            'title'=>['type'=>'text', 'options' => ['indexed', 'stored']],
            'content'=>['type'=>'text', 'options' => ['indexed', 'stored']],
            'name'=>['type'=>'text', 'options' => ['indexed']],
            'price'=>['type'=>'float']
        ]
    ],
    'index' => 'products'
];
// the index name and body are passed together, so the call goes through indices()
$client->indices()->create($params);

Python:

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)')

Javascript:

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)');

Java:

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text stored indexed, content text stored indexed, name text indexed, price float)");

CONFIG:

index products {
  stored_fields = title,content
  type = rt
  path = idx
  rt_field = title
  rt_field = content
  rt_field = name
  rt_attr_float = price
}

stored_only_fields = title,content

A list of fields that will be stored in the index but not indexed. It is similar to stored_fields, except that a field listed in stored_only_fields is only stored: it is not indexed and cannot be searched with full-text queries. It can only be returned with search results.

Value: comma separated list of fields that should be stored only, not indexed. Default is empty.
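
The difference between indexed, stored and stored-only fields can be shown with a toy index. The sketch below is an illustration only, not Manticore's storage format: the class ToyIndex is hypothetical, tokenization is a plain lowercase split, and the field names are made up. Indexed fields feed the inverted index used for matching, while stored fields are the only ones returned with results.

```python
class ToyIndex:
    """Toy full-text index separating what is searchable from what is returned."""

    def __init__(self, indexed: set[str], stored: set[str]):
        self.indexed = indexed   # fields that go into the inverted index
        self.stored = stored     # fields whose original text is kept
        self.postings = {}       # word -> set of document ids
        self.docstore = {}       # document id -> {field: original text}

    def add(self, doc_id: int, doc: dict[str, str]) -> None:
        for field, text in doc.items():
            if field in self.indexed:
                for word in text.lower().split():
                    self.postings.setdefault(word, set()).add(doc_id)
        self.docstore[doc_id] = {f: t for f, t in doc.items() if f in self.stored}

    def search(self, word: str) -> list[dict[str, str]]:
        ids = sorted(self.postings.get(word.lower(), set()))
        return [self.docstore[i] for i in ids]

# title: indexed and stored; name: indexed only; note: stored only.
idx = ToyIndex(indexed={"title", "name"}, stored={"title", "note"})
idx.add(1, {"title": "Blue kettle", "name": "kettle-01", "note": "fragile"})

print(idx.search("kettle"))     # [{'title': 'Blue kettle', 'note': 'fragile'}]
print(idx.search("kettle-01"))  # matched via name, which is not returned
print(idx.search("fragile"))    # [] because note is stored only
```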

rt_field = subject

Full-text fields to be indexed. The names must be unique. The order is preserved, so field values in INSERT statements without an explicit list of inserted columns have to be in the same order as configured.

Value: at least one full-text field should be specified in an index; multiple records allowed.

rt_attr_uint = gid

Unsigned integer attribute declaration

Value: field_name or field_name:N, where N is the maximum number of bits to keep; multiple records allowed.
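
To see what the optional bit count means for the value range, the sketch below parses a declaration value and computes the largest value that fits. The helper parse_uint_attr is hypothetical, and it assumes a width of 32 bits when N is omitted and a valid range of 1 to 32 for N.

```python
def parse_uint_attr(declaration: str) -> tuple[str, int, int]:
    """Split 'name' or 'name:N' into (name, bits, max_value)."""
    name, _, bits_text = declaration.partition(":")
    bits = int(bits_text) if bits_text else 32  # assumed default width
    if not 1 <= bits <= 32:
        raise ValueError(f"bit count out of range: {bits}")
    return name.strip(), bits, (1 << bits) - 1

print(parse_uint_attr("gid"))     # ('gid', 32, 4294967295)
print(parse_uint_attr("gid:11"))  # ('gid', 11, 2047)
```

With rt_attr_uint = gid:11, for example, values above 2047 would not fit into the 11 bits that are kept.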

rt_attr_bigint = gid

BIGINT attribute declaration

Value: field name, multiple records allowed

rt_attr_multi = tags

Multi-valued attribute (MVA) declaration. Declares an unsigned 32-bit integer MVA attribute. Optional; multiple records allowed (i.e. more than one such attribute may be declared).