Creating an index > Local indexes > Real-time index

Manticore's data types can be split into full-text fields and attributes.

Full-text fields are:

indexed
can be searched for keywords
cannot be used for sorting or grouping
original document's content can be retrieved
original document's content can be used for highlighting

Full-text fields are represented data type text. All the other data types are called "attributes".

Attributes are non-full-text values associated with each document that can be used to perform non-full-text filtering, sorting and grouping during search.

It is often desired to process full-text search results based not only on matching document ID and its rank, but on a number of other per-document values as well. For instance, one might need to sort news search results by date and then relevance, or search through products within specified price range, or limit blog search to posts made by selected users, or group results by month. To do that efficiently, Manticore enables not only full-text fields, but additional attributes to each document. It's then possible to use them to filter, sort, or group full-text matches or search only by attributes.

The attributes, unlike full-text fields, are not full-text indexed. They are stored in the index, but it is not possible to search them as full-text.

A good example for attributes would be a forum posts index. Assume that only title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (ie. search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.

‹›

SQL
HTTP
PHP
Python
Javascript
Java
config

📋

CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp);

POST /sql -d "mode=raw&query=CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)"

$index = new \Manticoresearch\Index($client);
$index->setName('forum');
$index->create([
    'title'=>['type'=>'text'],
    'content'=>['type'=>'text'],
    'author_id'=>['type'=>'int'],
    'forum_id'=>['type'=>'int'],
    'post_date'=>['type'=>'timestamp']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)');

utilsApi.sql("mode=raw&query=CREATE TABLE forum(title text, content text, author_id int, forum_id int, post_date timestamp)");

index forum
{
    type = rt
    path = forum

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
    rt_field = content

    # this option needs to be specified for the field to be stored
    stored_fields = title, content

    rt_attr_uint = author_id
    rt_attr_uint = forum_id
    rt_attr_timestamp = post_date
}

This example shows running a full-text query filtered by author_id, forum_id and sorted by post_date.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc

POST /search
{
  "index": "forum",
  "query":
  {
    "match_all": {},
    "bool":
    {
      "must":
      [
        { "equals": { "author_id": 123 } },
        { "in": { "forum_id": [1,3,7] } }
      ]
    }
  },
  "sort": [ { "post_date": "desc" } ]
}

$client->search([
        'index' => 'forum',
        'query' =>
        [
            'match_all' => [],
            'bool' => [
                'must' => [
                    'equals' => ['author_id' => 123],
                    'in' => [
                        'forum_id' => [
                            1,3,7
                        ]
                    ]
                ]
            ]
        ],
        'sort' => [
        ['post_date' => 'desc']
    ]
]);

searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]})

res = await searchApi.search({"index":"forum","query":{"match_all":{},"bool":{"must":[{"equals":{"author_id":123}},{"in":{"forum_id":[1,3,7]}}]}},"sort":[{"post_date":"desc"}]});

HashMap<String,Object> filters = new HashMap<String,Object>(){{
    put("must", new HashMap<String,Object>(){{
        put("equals",new HashMap<String,Integer>(){{
            put("author_id",123);
        }});
        put("in",
            new HashMap<String,Object>(){{
                put("forum_id",new int[] {1,3,7});
        }});
    }});
}};
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("bool",filters);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("post_date","desc");}});
}});
SearchResponse searchResponse = searchApi.search(searchRequest);

Manticore supports two types of attribute storages:

row-wise - traditional storage available in Manticore Search out of the box
columnar - provided by Manticore Columnar Library

As can be understood from their names, they store data differently. The traditional row-wise storage:

stores attributes uncompressed
all attributes of the same document are stored in one row close to each other
rows are stored one by one
accessing attributes is basically done by just multiplying rowid by stride (length of a single vector) and getting the requested attribute from the calculated memory location. It gives very low random access latency
attributes have to be in memory to get acceptable performance, otherwise due to the row-wise nature of the storage Manticore may have to read from disk too much unneded data which is in many cases suboptimal.

With the columnar storage:

each attribute is stored independently from all other attributes in its separate "column"
storage is split into blocks of 65536 entries
the blocks are stored compressed. This often allows to store just a few distinct values instead of storing all of them like in the row-wise storage. High compression ratio allows to read from disk faster and makes the memory requirement much lower
when data is indexed, storage scheme is selected for each block independently. For example, if all values in a block are the same, it gets "const" storage and only one value is stored for the whole block. If there are less than 256 unique values per block, it gets "table" storage and stores indexes to a table of values instead of the values themselves
search in a block can be early rejected if it's clear the requested value is not present in the block.

The columnar storage was designed to handle large data volume that does not fit into RAM, so the recommendations are:

if you have enough memory for all your attributes you will benefit from the row-wise storage
otherwise the columnar storage can still give you decent performance with much lower memory footprint which will allow you to store much more documents in your index

The traditional row-wise storage is default, so if you want everything to be stored in a row-wise fashion you don't need to do anything when you create an index.

To enable the columnar storage you need to:

specify engine='columnar' in CREATE TABLE to make all attributes of the index columnar. Then if you want to keep a specific attribute row-wise you need to add engine='rowwise' when you declare it. For example:
```
create table idx(title text, type int, price float engine='rowwise') engine='columnar'
```

specify engine='columnar' for a specific attribute in CREATE TABLE to make it columnar. For example:

create table idx(title text, type int, price float engine='columnar');

create table idx(title text, type int, price float engine='columnar') engine='rowwise';

in plain mode you need to list attributes you want to be columnar in columnar_attrs.

Below is the list of data types supported by Manticore Search:

The identifier of a document in the index. Document IDs must be unique signed positive non-zero 64-bit integers. Note that no negative or zero values are allowed. Document IDs are implicit and have no declaration. Update operation is forbidden on document ids.

General syntax:

string|text [stored|attribute] [indexed]

Properties:

indexed - full-text indexed (can be used in full-text queries)
stored - stored in a docstore (stored on disk, not in RAM, lazy read)
attribute - makes it string attribute (can sort/group by it)

Specifying at least one property overrides all the default ones (see below), i.e. if you decide to use a custom combination of properties you need to list all the properties you want.

No properties specified:

string and text are aliases, but if you don’t specify any properties they by default means different things:

just string by default means attribute (see details below).
just text by default means stored + indexed (see details below).

Text (just text or text/string indexed) data type forms the full-text part of the index. Text fields are indexed and can be searched for keywords.

Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations etc. Eventually a full-text index (a special data structure that enables quick searches for a keyword) gets built from that text.

Full-text fields can only be used in MATCH() clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong and positions in the field. This allows to search a word inside each field and to use advanced operators like proximity. By default the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and it can be used in search result highlighting.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text)");

index products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title

    # this option needs to be specified for the field to be stored
    stored_fields = title
}

This behavior can be overridden by explicitly specifying that the text is only indexed.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text indexed);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text indexed)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text','options'=>['indexed']]
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text indexed)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text indexed)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text indexed)");

index products
{
    type = rt
    path = products

    # when configuring fields via config, they are indexed (and not stored) by default
    rt_field = title
}

Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (eg. to "title" and "abstract" only). Manticore index format generally supports up to 256 fields.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select * from products where match('@title first');

POST /search
{
    "index": "products",
    "query":
    {
        "match": { "title": "first" }
    }
}

$index->setName('products')->search('@title')->get();

searchApi.search({"index":"products","query":{"match":{"title":"first"}}})

res = await searchApi.search({"index":"products","query":{"match":{"title":"first"}}});

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text indexed)");

Unlike full-text fields, string attributes (just string or string/text attribute) are stored as they are received and cannot be used in full-text searches. Instead they are returned in results, they can be used in WHERE clause for comparison filtering or REGEX and they can be used for sorting and aggregation. In general it's not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, keys string);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, keys string)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'keys'=>['type'=>'string']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, keys string)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, keys string)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, keys string)");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_string = keys
}

Integer type allows storing 32 bit unsigned integer values.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, price int);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, price int)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'int']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price int)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price int)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, price int)");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = type
}

Integers can be stored in shorter sizes than 32 bit by specifying a bit count. For example if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3). Bitcount integers perform slower than the full size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space they should be grouped at the end of attributes definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, flags bit(3), tags bit(2) );

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, flags bit(3), tags bit(2))"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'flags'=>['type'=>'bit(3)'],
    'tags'=>['type'=>'bit(2)']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, flags bit(3), tags bit(2) ')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, flags bit(3), tags bit(2) ');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, flags bit(3), tags bit(2)");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_uint = flags:3
    rt_attr_uint = tags:2
}

Big integers are 64-bit wide signed integers.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, price bigint );

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, price bigint)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'bigint']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price bigint )')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, price bigint )');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, price bigint )");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bigint = type
}

Declares a boolean attribute. It's equivalent to an integer attribute with bit count of 1.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, sold bool );

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, sold bool)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'sold'=>['type'=>'bool']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, sold bool )')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, sold bool )');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, sold bool )");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_bool = sold
}

Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, date timestamp);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, date timestamp)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'date'=>['type'=>'timestamp']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, date timestamp)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, date timestamp)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, date timestamp)");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_timestamp = date
}

Real numbers are stored as 32-bit IEEE 754 single precision floats.

‹›

SQL
HTTP
PHP
Python
java
javascript
config

📋

CREATE TABLE products(title text, coeff float);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, coeff float)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'coeff'=>['type'=>'float']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, coeff float)')

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, coeff float)");

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, coeff float)');

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_float = coeff
}

Unlike integer types, equal comparison of floats is forbidden due to rounding errors. A near equal can be used instead, by checking the absolute error margin.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select abs(a-b)<=0.00001 from products

POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "eps": "abs(a-b)" }
}

$index->setName('products')->search('')->expression('eps','abs(a-b)')->get();

searchApi.search({"index":"products","query":{"match_all":{}},"expressions":{"eps":"abs(a-b)"}})

res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"eps":"abs(a-b)"}});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("ebs","abs(a-b)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);

Another alternative, which can also be used to perform IN(attr,val1,val2,val3) is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. Example illustrates modifying IN(attr,2.0,2.5,3.5) to work with integer values.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select in(ceil(attr*100),200,250,350) from products

POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "inc": "in(ceil(attr*100),200,250,350)" }
}

$index->setName('products')->search('')->expression('inc','in(ceil(attr*100),200,250,350)')->get();

searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}})

res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"inc":"in(ceil(attr*100),200,250,350)"}});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("inc","in(ceil(attr*100),200,250,350)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);

This data type allows storing JSON objects for schema-less data. It is not supported by the columnar storage, but since you can combine the both storages in the same index you can have it stored in the traditional storage instead.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, data json);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, data json)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'data'=>['type'=>'json']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, data json)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, data json)');

utilsApi.sql'mode=raw&query=CREATE TABLE products(title text, data json)');

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_json = data
}

JSON properties can be used in most operations. There are also special functions such as ALL(), ANY(), GREATEST(), LEAST() and INDEXOF() that allow traversal of property arrays.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select indexof(x>2 for x in data.intarray) from products

POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "expressions": { "idx": "indexof(x>2 for x in data.intarray)" }
}

$index->setName('products')->search('')->expression('idx','indexof(x>2 for x in data.intarray)')->get();

searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}})

res = await searchApi.search({"index":"products","query":{"match_all":{}}},"expressions":{"idx":"indexof(x>2 for x in data.intarray)"}});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);

Text properties are treated same as strings so it's not possible to use them in full-text matches expressions, but string functions like REGEX() can be used.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select regex(data.name, 'est') as c from products where c>0

POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "range": { "c": { "gt": 0 } } }
  },
  "expressions": { "c": "regex(data.name, 'est')" }
}

$index->setName('products')->search('')->expression('idx',"regex(data.name, 'est')")->filter('c','gt',0)->get();

searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}})

res = await searchApi.search({"index":"products","query":{"match_all":{},"range":{"c":{"gt":0}}}},"expressions":{"c":"regex(data.name, 'est')"}});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("range", new HashMap<String,Object>(){{
    put("c", new HashMap<String,Object>(){{
        put("gt",0);
    }});
}});
searchRequest.setQuery(query);
Object expressions = new HashMap<String,Object>(){{
    put("idx","indexof(x>2 for x in data.intarray)");
}};
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);

In case of JSON properties, enforcing data type is required to be casted in some situations for proper functionality. For example in case of float values DOUBLE() must be used for proper sorting.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select * from products order by double(data.myfloat) desc

POST /search
{
  "index": "products",
  "query": { "match_all": {} } },
  "sort": [ { "double(data.myfloat)": { "order": "desc"} } ]
}

$index->setName('products')->search('')->sort('double(data.myfloat)','desc')->get();

searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]})

res = await searchApi.search({"index":"products","query":{"match_all":{}}},"sort":[{"double(data.myfloat)":{"order":"desc"}}]});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("double(data.myfloat)",new HashMap<String,String>(){{ put("order","desc");}});}});
}});
searchResponse = searchApi.search(searchRequest);

Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. It can be used to store one-to-many numeric values like tags, product categories, properties.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, product_codes multi);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, product_codes multi)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'product_codes'=>['type'=>'multi']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, product_codes multi)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, product_codes multi)');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, product_codes multi)");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi = product_codes
}

It supports filtering and aggregation, but not sorting. Filtering can be made of a condition that requires at least one element to pass (using ANY()) or all (ALL()).

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select * from products where any(product_codes)=3

POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "equals" : { "any(product_codes)": 3 }
  }
}

$index->setName('products')->search('')->filter('any(product_codes)','equals',3)->get();

searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})

res = await searchApi.search({"index":"products","query":{"match_all":{},"equals":{"any(product_codes)":3}}}})'

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
query.put("equals",new HashMap<String,Integer>(){{
     put("any(product_codes)",3);
}});
searchRequest.setQuery(query);
searchRequest.setExpressions(expressions);
searchResponse = searchApi.search(searchRequest);

Information like least or greatest element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

select least(product_codes) l from products order by l asc

POST /search
{
  "index": "products",
  "query":
  {
    "match_all": {},
    "sort": [ { "product_codes":{ "order":"asc", "mode":"min" } } ]
  }
}

$index->setName('products')->search('')->sort('product_codes','asc','min')->get();

searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}})

res = await searchApi.search({"index":"products","query":{"match_all":{},"sort":[{"product_codes":{"order":"asc","mode":"min"}}]}});

searchRequest = new SearchRequest();
searchRequest.setIndex("forum");
query = new HashMap<String,Object>();
query.put("match_all",null);
searchRequest.setQuery(query);
searchRequest.setSort(new ArrayList<Object>(){{
    add(new HashMap<String,String>(){{ put("product_codes",new HashMap<String,String>(){{ put("order","asc");put("mode","min");}});}});
}});
searchResponse = searchApi.search(searchRequest);

When grouping by multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if the collection contains exactly 1 document having a 'product_codes' multi-value attribute with values 5, 7, and 11, grouping on 'product_codes' will produce 3 groups with COUNT(*)equal to 1 and GROUPBY() key values of 5, 7, and 11 respectively. Also note that grouping by multi-value attributes might lead to duplicate documents in the result set because each document can participate in many groups.

‹›

SQL

📋

insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;

‹›

Response

Query OK, 1 row affected (0.00 sec)

+------+----------+-----------+
| id   | count(*) | groupby() |
+------+----------+-----------+
|    1 |        1 |        11 |
|    1 |        1 |         7 |
|    1 |        1 |         5 |
+------+----------+-----------+
3 rows in set (0.00 sec)

The order of the numbers inserted as values of multi-valued attributes is not preserved. Values are stored internally as a sorted set.

‹›

SQL
HTTP
PHP
Python
javascript
java

📋

insert into product values (1,'first',(4,2,1,3));
select * from products;

POST /insert
{
    "index":"products",
    "id":1,
    "doc":
    {
        "title":"first",
        "product_codes":[4,2,1,3]
    }
}

POST /search
{
  "index": "products",
  "query": { "match_all": {} }
}

$index->addDocument([
    "title"=>"first",
    "product_codes"=>[4,2,1,3]
]);
$index->search('')-get();

indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}})
searchApi.search({"index":"products","query":{"match_all":{}}})

await indexApi.insert({"index":"products","id":1,"doc":{"title":"first","product_codes":[4,2,1,3]}});
res = await searchApi.search({"index":"products","query":{"match_all":{}}});

InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","first");
    put("product_codes",new int[] {4,2,1,3});
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
Map<String,Object> query = new HashMap<String,Object>();
query.put("match_all",null);
SearchRequest searchRequest = new SearchRequest();
searchRequest.setIndex("products");
searchRequest.setQuery(query);
SearchResponse searchResponse = searchApi.search(searchRequest);
System.out.println(searchResponse.toString() );

‹›

Response

Query OK, 1 row affected (0.00 sec)

+------+---------------+-------+
| id   | product_codes | title |
+------+---------------+-------+
|    1 | 1,2,3,4       | first |
+------+---------------+-------+
1 row in set (0.01 sec)

{
   "_index":"products",
   "_id":1,
   "created":true,
   "result":"created",
   "status":201
}

{
   "took":0,
   "timed_out":false,
   "hits":{
      "total":1,
      "hits":[
         {
            "_id":"1",
            "_score":1,
            "_source":{
               "product_codes":[
                  1,
                  2,
                  3,
                  4
               ],
               "title":"first"
            }
         }
      ]
   }
}

Array
(
    [_index] => products
    [_id] => 1
    [created] => 1
    [result] => created
    [status] => 201
)
Array
(
    [took] => 0
    [timed_out] =>
    [hits] => Array
        (
            [total] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_id] => 1
                            [_score] => 1
                            [_source] => Array
                                (
                                    [product_codes] => Array
                                        (
                                            [0] => 1
                                            [1] => 2
                                            [2] => 3
                                            [3] => 4
                                        )

                                    [title] => first
                                )
                        )
                )
        )
)

{'created': True,
 'found': None,
 'id': 1,
 'index': 'products',
 'result': 'created'}
{'hits': {'hits': [{u'_id': u'1',
                    u'_score': 1,
                    u'_source': {u'product_codes': [1, 2, 3, 4],
                                 u'title': u'first'}}],
          'total': 1},
 'profile': None,
 'timed_out': False,
 'took': 29}

{"took":0,"timed_out":false,"hits":{"total":1,"hits":[{"_id":"1","_score":1,"_source":{"product_codes":[1,2,3,4],"title":"first"}}]}}

class SearchResponse {
    took: 0
    timedOut: false
    hits: class SearchResponseHits {
        total: 1
        hits: [{_id=1, _score=1, _source={product_codes=[1, 2, 3, 4], title=first}}]
        aggregations: null
    }
    profile: null
}

A data type type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.

‹›

SQL
HTTP
PHP
Python
javascript
java
config

📋

CREATE TABLE products(title text, values multi64);

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, values multi64)"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'values'=>['type'=>'multi64']
]);

utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, values multi64))')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE products(title text, values multi64))');

utilsApi.sql("mode=raw&query=CREATE TABLE products(title text, values multi64))");

index products
{
    type = rt
    path = products

    rt_field = title
    stored_fields = title

    rt_attr_multi_64 = values
}

Creating a local index

There are 2 different approaches to deal with indexes in Manticore:

Real-time mode requires no index definition in the configuration file, but presence of data_dir directive in searchd section is mandatory. Index files are stored inside the data_dir.

Replication is available only in this mode.

In this mode you can use SQL commands like CREATE TABLE, ALTER TABLE and DROP TABLE to create and change index schema and drop it. This mode is especially useful for real-time and percolate indexes.

Index names are case insensitive in RT mode.

In this mode you can specify index schema in config which will be read on Manticore start and if the index doesn't exist yet it will be created. This mode is especially useful for plain indexes that are built upon indexing data from an external storage.

Dropping indexes is only possible by removing them from the configuration file or by removing the path setting and sending a HUP signal to the server or restarting it.

Index names are case sensitive in this mode.

All index types are supported in this mode.

Index type	RT mode	Plain mode
Real-time	supported	supported
Plain	not supported	supported
Percolate	supported	supported
Distributed	supported	supported
Template	not supported	supported

Row-wise and columnar attribute storages ️Real-time index

Real-time index is a main type of Manticore indexes. It allows adding, updating and deleting documents with immediate availability of the changes. Real-time index settings can be defined in a configuration file or online via CREATE/UPDATE/DELETE/ALTER commands.

Real-time index internally consists of one or multiple plain indexes called chunks. There can be:

multiple disk chunks. They are stored on disk with the same structure as any plain index
single ram chunk. Stored in memory and used as an accumulator of changes

RAM chunk size is controlled by rt_mem_limit. Once the limit is exceeded the RAM chunk is flushed to disk in a form of a disk chunk. When there are too many disk chunks they can be merged into one for better performance using command OPTIMIZE.

‹›

SQL
HTTP
PHP
Python
Javascript
Java
CONFIG

📋

CREATE TABLE products(title text, price float) morphology='stem_en';

POST /sql -d "mode=raw&query=CREATE TABLE products(title text, price float)  morphology='stem_en'"

$index = new \Manticoresearch\Index($client);
$index->setName('products');
$index->create([
    'title'=>['type'=>'text'],
    'price'=>['type'=>'float'],
]);

utilsApi.sql('mode=raw&query=CREATE TABLE forum(title text, price float)')

res = await utilsApi.sql('mode=raw&query=CREATE TABLE forum(title text, price float)');

utilsApi.sql("mode=raw&query=CREATE TABLE forum(title text, price float)");

index products {
  type = rt
  path = idx
  rt_field = title
  rt_attr_uint = price
  stored_fields = title
}

‹›

Response

Query OK, 0 rows affected (0.00 sec)

‹›

Add documents
Update attributes and full-text fields
Delete documents
Truncate index
Change schema online with help of command ALTER
Define index in a configuration file
Use UUID for automatic ID provisioning

Index data with help of indexer
Link it with sources for easy indexing from external storages
Update it's killlist_target, it's just not needed as the real-time index takes controls of it automatically

Extension	Description
`.lock`	lock file
`.ram`	RAM chunk
`.meta`	RT index headers
`..sp`	disk chunks (see plain index format)

Creating a local index Plain index

Plain index is a basic element for non-percolate searching. It can be specified only in a configuration file in Plain mode. It is not supported in RT mode. It's normally used together with a source to index data from an external storage and afterwards can be attached to a real-time index.

build it with help of source and indexer tool which is the fastest possible way to index data
do an in-place update of an integer, float, string and MVA attribute
update it's killlist_target

insert more data into an index after it's built
update it
delete from it
create/delete/alter a plain index online (you need to define it in a configuration file)
use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document

Except numeric attributes (including MVA), the rest of the data in a plain index is immutable. If you need to update/add new records you need to rebuild the index. While index is being rebuilt, existing index is still available for serving requests. When a new version of the index is ready, a process called rotation is performed which puts the new version online and discards the old one.

‹›

Plain index example

Plain index example

📋

A plain index can be only defined in a configuration file. It's not supported by command CREATE TABLE

source source {
  type             = mysql
  sql_host         = localhost
  sql_user         = myuser
  sql_pass         = mypass
  sql_db           = mydb
  sql_query        = SELECT id, title, description, category_id  from mytable
  sql_attr_uint    = category_id
  sql_field_string = title
 }

index idx {
  type   = plain
  source = source
  path   = /path/to/index
 }

Speed of plain indexing depends on several factors:

how fast the source can be providing the data
tokenization settings
your hardware (CPU, amount of RAM, disk performance)

In the simplest usage scenario, we would use a single plain index which we just fully rebuild from time to time. It works fine for smaller data sets and if you are ready that:

the index will be not as fresh as data in the source
indexing duration grows with the data, the more data you have in the source the longer it will take to build the index

If you have a bigger data set and still want to use a plain index rather than Real-Time what you can do is:

make another smaller index for incremental indexing
combine the both using a distributed index

What it can give is you can rebuild the bigger index seldom (say once per week), save the position of the freshest indexed document and after that use the smaller index to index anything new or updated from your source. Since you will only need to fetch the updates from your storage you can do it much more frequently (say once per minute or even each few seconds).

But after a while the smaller indexing duration will become too high and that will be the moment when you need to rebuild the bigger index and empty the smaller one.

This is called main+delta schema and you can learn more about it in this interactive course.

When you build a smaller "delta" index it can get documents that are already in the "main" index. To let Manticore know that documents from the current index should take precedence there's a mechanism called kill list and corresponding directive killlist_target.

More information on this topic can be found here.

Extension	Description
`.spa`	stores document attributes in row-wise mode
`.spb`	stores blob attributes in row-wise mode: strings, MVA, json
`.spc`	stores document attributes in columnar mode
`.spd`	stores matching document ID lists for each word ID
`.sph`	stores index header information
`.sphi`	stores histograms of attribute values
`.spi`	stores word lists (word IDs and pointers to `.spd` file)
`.spk`	stores kill-lists
`.spl`	lock file
`.spm`	stores a bitmap of killed documents
`.spp`	stores hit (aka posting, aka word occurrence) lists for each word ID
`.spt`	stores additional data structures to speed up lookups by document ids
`.spe`	stores skip-lists to speed up doc-list filtering
`.spds`	stores document texts
`.tmp*`	temporary files during index_settings_and_status
`.new.sp*`	new version of a plain index before rotation
`.old.sp*`	old version of a plain index after rotation

️Real-time index Plain and real-time index settings

Data types

Full-text fields and attributes

Full-text fields

Attributes

Row-wise and columnar attribute storages

How to switch between the storages

Document ID

Character data types

Text

String

Integer

Big Integer

Boolean

Timestamps

Float

JSON

Multi-value integer (MVA)

Multi-value big integer

Creating a local index

Online schema management (RT mode)

Defining index schema in config (Plain mode)

Index types and modes

Real-time index

Creating a real-time index via JSON over HTTP:

👍 What you can do with a real-time index:

⛔ What you cannot do with a real-time index:

Real-time index files structure

Plain index

👍 What you can do with a plain index:

⛔ What you cannot do with a plain index:

Plain indexing performance

Plain indexing scenarios

Rebuild fully when needed

Main+delta

Plain index files structure