Adding documents to a real-time table

If you're looking for information on adding documents to a plain table, please refer to the section on adding data from external storages.

Adding documents in real-time is supported only for Real-Time and percolate tables. The corresponding SQL command, HTTP endpoint, or client functions insert new rows (documents) into a table with the provided field values. It's not necessary for a table to exist before adding documents to it. If the table doesn't exist, Manticore will attempt to create it automatically. For more information, see Auto schema.

You can insert one or several documents, specifying values for all fields of the table or only some of them. Fields you leave out are filled with their default values (0 for scalar types, an empty string for text types).

Expressions are not currently supported in INSERT, so values must be explicitly specified.

The ID field/value can be omitted, as RT and PQ tables support auto-id functionality. You can also use 0 as the id value to force automatic ID generation. Rows with duplicate IDs will not be overwritten by INSERT. Instead, you can use REPLACE for that purpose.

When using the HTTP JSON protocol, you have two different request formats to choose from: a common Manticore format and an Elasticsearch-like format. Both formats are demonstrated in the examples below.

Additionally, when using the Manticore JSON request format, keep in mind that the doc node is required, and all the values should be provided within it.
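
As a minimal sketch of the Manticore JSON request format described above, the following Python helper builds the body of an /insert request with the required doc node. The helper name build_insert_body is hypothetical and not part of any Manticore client; sending the body to a server is deliberately left out.

```python
import json


def build_insert_body(table, fields, doc_id=None):
    """Build the JSON body for POST /insert (hypothetical helper)."""
    # All field values go inside the required "doc" node.
    body = {"table": table, "doc": dict(fields)}
    # The id is optional: leaving it out asks the server for an auto-generated ID.
    if doc_id is not None:
        body["id"] = doc_id
    return json.dumps(body)


print(build_insert_body("products", {"title": "Yellow bag", "price": 4.95}))
```

Keeping the id outside the doc node mirrors the request examples in this section.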


SQL
JSON
Elasticsearch
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


General syntax:

INSERT INTO <table name> [(column, ...)]
VALUES (value, ...)
[, (...)]

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel",
    "price" : 19.85
  }
}
POST /insert
{
  "table":"products",
  "id":2,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel"
  }
}
POST /insert
{
  "table":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}

NOTE: _create requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

POST /products/_create/3
{
  "title": "Yellow Bag with Tassel",
  "price": 19.85
}
POST /products/_create/
{
  "title": "Red Bag with Tassel",
  "price": 19.85
}

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
        ['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}})

await indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
await indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
await indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}})

res = await indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}});

InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
// Reuse the existing doc variable; redeclaring it would not compile
doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
}};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
newdoc = new InsertDocumentRequest(index: "products", id: 2, doc: doc);
sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Crossbody Bag with Tassel"));
doc.insert("price".to_string(), serde_json::json!(19.85));
let mut insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(1),
    ..Default::default()
};
let mut insert_res = index_api.insert(insert_req).await;
doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Crossbody Bag with Tassel"));
insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(2),
    ..Default::default()
};
insert_res = index_api.insert(insert_req).await;
doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(0),
    ..Default::default()
};
insert_res = index_api.insert(insert_req).await;


Response

Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)

{
  "table": "products",
  "_id": 1,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "table": "products",
  "_id": 2,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "table": "products",
  "_id": 1657860156022587406,
  "created": true,
  "result": "created",
  "status": 201
}

{
"_id":3,
"table":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":2235747273424240642,
"table":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}

Auto schema

NOTE: Auto schema requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Manticore features an automatic table creation mechanism, which activates when a specified table in the insert query doesn't yet exist. This mechanism is enabled by default. To disable it, set auto_schema = 0 in the Searchd section of your Manticore config file.

By default, all text values in the VALUES clause are considered to be of the text type, except for values representing valid email addresses, which are treated as the string type.

If you attempt to INSERT multiple rows with different, incompatible value types for the same field, auto table creation will be canceled, and an error message will be returned. However, if the different value types are compatible, the resulting field type will be the one that accommodates all the values. Some automatic data type conversions that may occur include:

mva -> mva64
uint -> bigint -> float (this may cause some precision loss)
string -> text
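
The widening rules listed above can be sketched in Python as follows. This is only an illustration of the rule "pick the type that accommodates all values"; the function name widest_type and the table WIDENING_CHAINS are hypothetical, and the code is not Manticore's actual implementation.

```python
# Widening chains taken from the list above: a type can widen to any type to its right.
WIDENING_CHAINS = [
    ["mva", "mva64"],
    ["uint", "bigint", "float"],
    ["string", "text"],
]


def widest_type(a, b):
    """Return the type that accommodates both a and b, or raise ValueError."""
    if a == b:
        return a
    for chain in WIDENING_CHAINS:
        if a in chain and b in chain:
            # The type further to the right in the chain wins.
            return chain[max(chain.index(a), chain.index(b))]
    raise ValueError(f"incompatible types: {a} and {b}")


print(widest_type("uint", "float"))  # float
print(widest_type("mva64", "mva"))   # mva64
```

Types from different chains, for example uint and text, raise an error here, which corresponds to the case where auto table creation is canceled.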

The auto schema mechanism does not support creating tables with vector fields (fields of type float_vector) used for KNN (K-Nearest Neighbors) similarity search. To use vector fields in your table, you must explicitly create the table with a schema that defines these fields. If you need to store vector data in a regular table without KNN search capability, you can store it as a JSON array using the standard JSON syntax, for example: INSERT INTO table_name (vector_field) VALUES ('[1.0, 2.0, 3.0]').

Also, the following formats of dates will be recognized and converted to timestamps while all other date formats will be treated as strings:

%Y-%m-%dT%H:%M:%E*S%Z
%Y-%m-%d'T'%H:%M:%S%Z
%Y-%m-%dT%H:%M:%E*S
%Y-%m-%dT%H:%M:%s
%Y-%m-%dT%H:%M
%Y-%m-%dT%H

Keep in mind that the /bulk HTTP endpoint does not support automatic table creation (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature.


SQL
JSON


MySQL [(none)]> drop table if exists t; insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777)); desc t; select * from t;


Response

--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.42 sec)
--------------
insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777))
--------------
Query OK, 1 row affected (0.00 sec)
--------------
desc t
--------------
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| t     | text   | indexed stored |
| s     | string |                |
| j     | json   |                |
| i     | uint   |                |
| b     | bigint |                |
| f     | float  |                |
| m     | mva    |                |
| mb    | mva64  |                |
+-------+--------+----------------+
9 rows in set (0.00 sec)
--------------
select * from t
--------------
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| id                  | i    | b             | f        | m    | mb                          | t         | s             | j          |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| 5045949922868723723 |  123 | 1099511627776 | 1.200000 | 1,2  | 1099511627776,1099511627777 | text here | test@mail.com | {"a": 123} |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
1 row in set (0.00 sec)

Auto ID

Manticore provides an auto ID generation functionality for the column ID of documents inserted or replaced into a real-time or Percolate table. The generator produces a unique ID for a document with some guarantees, but it should not be considered an auto-incremented ID.

The generated ID value is guaranteed to be unique under the following conditions:

The server_id value of the current server is in the range of 0 to 127 and is unique among nodes in the cluster, or it uses the default value generated from the MAC address as a seed
The system time does not change for the Manticore node between server restarts
The auto ID is generated fewer than 16 million times per second between search server restarts

The auto ID generator creates a 64-bit integer for a document ID and uses the following schema:

Bits 0 to 23 form a counter that gets incremented on every call to the auto ID generator
Bits 24 to 55 represent the Unix timestamp of the server start
Bits 56 to 63 correspond to the server_id

This schema ensures that the generated ID is unique among all nodes in the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.

As a result, the first ID from the generator used for auto ID is NOT 1 but a larger number. Additionally, the document stream inserted into a table might have non-sequential ID values if inserts into other tables occur between calls, as the ID generator is singular in the server and shared between all its tables.
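
To make the bit layout above concrete, here is a small Python sketch that packs and unpacks a 64-bit ID following the described schema. It assumes that bit 0 is the least significant bit; the function names are hypothetical, and the code illustrates the layout only, not the server's generator.

```python
COUNTER_BITS = 24    # bits 0..23: per-call counter
TIMESTAMP_BITS = 32  # bits 24..55: Unix timestamp of the server start
SERVER_ID_BITS = 8   # bits 56..63: server_id


def compose_auto_id(server_id, start_timestamp, counter):
    """Pack the three parts into one 64-bit integer (illustrative only)."""
    if not 0 <= server_id <= 127:
        raise ValueError("server_id must be in the range 0..127")
    counter &= (1 << COUNTER_BITS) - 1
    start_timestamp &= (1 << TIMESTAMP_BITS) - 1
    return (server_id << 56) | (start_timestamp << COUNTER_BITS) | counter


def split_auto_id(doc_id):
    """Unpack an ID into (server_id, start_timestamp, counter)."""
    counter = doc_id & ((1 << COUNTER_BITS) - 1)
    start_timestamp = (doc_id >> COUNTER_BITS) & ((1 << TIMESTAMP_BITS) - 1)
    server_id = (doc_id >> 56) & ((1 << SERVER_ID_BITS) - 1)
    return server_id, start_timestamp, counter


doc_id = compose_auto_id(server_id=5, start_timestamp=1_700_000_000, counter=42)
print(split_auto_id(doc_id))  # (5, 1700000000, 42)
print(1 << COUNTER_BITS)      # 16777216
```

A 24-bit counter has 16,777,216 distinct values, which is presumably where the "fewer than 16 million times per second" condition comes from.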


SQL
JSON
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yello bag', 4.95);
select * from products;

POST /insert
{
  "table":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}
GET /search
{
  "table":"products",
  "query":{
    "query_string":""
  }
}

$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})

await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
let insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(0),
    ..Default::default()
};
let insert_res = index_api.insert(insert_req).await;


Response

+---------------------+-----------+---------------------------+
| id                  | price     | title                     |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 |  4.950000 | Yello bag                 |
+---------------------+-----------+---------------------------+

CALL UUID_SHORT(N)

The CALL UUID_SHORT(N) statement allows for generating N unique 64-bit IDs in a single call without inserting any documents. It is particularly useful when you need to pre-generate IDs in Manticore for use in other systems or storage solutions. For example, you can generate auto-IDs in Manticore and then use them in another database, application, or workflow, ensuring consistent and unique identifiers across different environments.

Example

CALL UUID_SHORT(3)


Response

+---------------------+
| uuid_short()        |
+---------------------+
| 1227930988733973183 |
| 1227930988733973184 |
| 1227930988733973185 |
+---------------------+

You can insert not just a single document into a real-time table, but as many as you'd like. It's perfectly fine to insert batches of tens of thousands of documents into a real-time table. However, it's important to keep the following points in mind:

The larger the batch, the higher the latency of each insert operation
The larger the batch, the higher the indexation speed you can expect
You might want to increase the max_packet_size value to allow for larger batches
Normally, each batch insert operation is considered a single transaction with atomicity guarantee, so you will either have all the new documents in the table at once or, in case of failure, none of them will be added. See more details about an empty line or switching to another table in the "JSON" example.
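
Because batch size trades latency against indexing speed, it can help to make it a tunable parameter. The following Python sketch splits a list of documents into batches of a chosen size; the helper name split_into_batches is hypothetical, and the right batch size depends on your data and on max_packet_size.

```python
def split_into_batches(docs, batch_size):
    """Yield consecutive slices of docs with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


docs = [{"title": f"doc {i}"} for i in range(25)]
print([len(batch) for batch in split_into_batches(docs, 10)])  # [10, 10, 5]
```

Each yielded batch can then be sent as one insert operation, so that it is committed as a single transaction.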

Note that the /bulk HTTP endpoint does not support automatic creation of tables (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature. The /_bulk (Elasticsearch-like) HTTP endpoint allows the table name to include the cluster name in the format cluster_name:table_name.

The /_bulk endpoint accepts document IDs in the same format as Elasticsearch, and you can also include the id within the document itself:

{ "index": { "table" : "products", "_id" : "1" } }
{ "title" : "Crossbody Bag with Tassel", "price": 19.85 }

{ "index": { "table" : "products" } }
{ "title" : "Crossbody Bag with Tassel", "price": 19.85, "id": "1" }
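
The two ways of passing an ID shown above can be generated programmatically. This Python sketch builds a /_bulk body from (id, fields) pairs; the helper name es_bulk_body and its id_in_action flag are hypothetical, and only the "index" action is covered.

```python
import json


def es_bulk_body(table, docs, id_in_action=True):
    """Build an NDJSON body for /_bulk from (doc_id, fields) pairs."""
    lines = []
    for doc_id, fields in docs:
        if id_in_action:
            # The ID goes into the action line as "_id", Elasticsearch style.
            action = {"index": {"table": table, "_id": str(doc_id)}}
            doc = dict(fields)
        else:
            # The ID goes into the document itself as "id".
            action = {"index": {"table": table}}
            doc = {**fields, "id": str(doc_id)}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


print(es_bulk_body("products", [(1, {"title": "Crossbody Bag with Tassel", "price": 19.85})]))
```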

The /bulk (Manticore mode) endpoint supports Chunked transfer encoding. You can use it to transmit large batches. It:

reduces peak RAM usage, lowering the risk of OOM
decreases response time
allows you to bypass max_packet_size and transfer batches much larger than the maximum allowed value of max_packet_size (128MB), for example, 1GB at a time.
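
A chunked upload to /bulk could look like the following Python sketch, which uses only the standard library. The host and port values are assumptions about a local setup, the function names are hypothetical, and error handling is omitted.

```python
import http.client
import json


def ndjson_chunks(docs, table):
    """Yield one NDJSON "insert" line per (doc_id, fields) pair, encoded as bytes."""
    for doc_id, fields in docs:
        line = {"insert": {"table": table, "id": doc_id, "doc": fields}}
        yield (json.dumps(line) + "\n").encode("utf-8")


def send_chunked(docs, table, host="127.0.0.1", port=9308):
    """POST documents to /bulk using chunked transfer encoding.

    host and port are assumptions; adjust them to your setup.
    """
    conn = http.client.HTTPConnection(host, port)
    headers = {
        "Content-Type": "application/x-ndjson",
        "Transfer-Encoding": "chunked",
    }
    # encode_chunked=True makes http.client wrap each yielded bytes object in a chunk.
    conn.request("POST", "/bulk", body=ndjson_chunks(docs, table),
                 headers=headers, encode_chunked=True)
    response = conn.getresponse()
    result = response.read().decode("utf-8")
    conn.close()
    return result
```

Because the body is produced by a generator, the whole batch never has to be held in memory on the client side.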


SQL
JSON
Elasticsearch
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


For bulk insert, simply provide more documents in brackets after VALUES(). The syntax is:

INSERT INTO <table name>[(column1, column2, ...)] VALUES(value1[, value2 , ...]), (...)

The optional column name list allows you to explicitly specify values for some of the columns present in the table. All other columns will be filled with their default values (0 for scalar types, empty string for string types).

For example:

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);

The syntax is generally the same as for inserting a single document. Just provide more lines, one for each document, and use the /bulk endpoint instead of /insert. Enclose each document in the "insert" node. Note that it also requires:

Content-Type: application/x-ndjson
The data should be formatted as newline-delimited JSON (NDJSON): each line contains exactly one JSON statement and ends with a newline (\n), optionally preceded by a carriage return (\r).

The /bulk endpoint supports 'insert', 'replace', 'delete', and 'update' queries. Keep in mind that you can direct operations to multiple tables, but transactions are only possible for a single table. If you specify more, Manticore will gather operations directed to one table into a single transaction. When the table changes, it will commit the collected operations and initiate a new transaction on the new table. An empty line separating batches also leads to committing the previous batch and starting a new transaction.
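
Since an empty line commits the previous batch and starts a new transaction, a /bulk body with several batches can be assembled as in this Python sketch. The helper name bulk_body is hypothetical; each operation is a dict such as {"insert": {...}}.

```python
import json


def bulk_body(batches):
    """Join batches of /bulk operations into NDJSON, with one empty line between batches."""
    parts = []
    for batch in batches:
        # One JSON statement per line, each terminated by a newline.
        parts.append("".join(json.dumps(op) + "\n" for op in batch))
    # Joining with "\n" leaves exactly one empty line between consecutive batches.
    return "\n".join(parts)


first = [{"insert": {"table": "test1", "id": 21, "doc": {"title": "bulk doc one"}}}]
second = [{"insert": {"table": "test1", "id": 22, "doc": {"title": "bulk doc two"}}}]
print(bulk_body([first, second]))
```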

In the response for a /bulk request, you can find the following fields:

"errors": shows whether any errors occurred (true/false)
"error": describes the error that took place
"current_line": the line number where execution stopped (or failed); empty lines, including the first empty line, are also counted
"skipped_lines": the count of non-committed lines, beginning from the current_line and moving backward

POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"table":"products", "id":1, "doc":  {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"table":"products", "id":2, "doc":  {"title":"microfiber sheet set","price" : 19.99}}}
'
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert":{"table":"test1","id":21,"doc":{"int_col":1,"price":1.1,"title":"bulk doc one"}}}
{"insert":{"table":"test1","id":22,"doc":{"int_col":2,"price":2.2,"title":"bulk doc two"}}}
{"insert":{"table":"test1","id":23,"doc":{"int_col":3,"price":3.3,"title":"bulk doc three"}}}
{"insert":{"table":"test2","id":24,"doc":{"int_col":4,"price":4.4,"title":"bulk doc four"}}}
{"insert":{"table":"test2","id":25,"doc":{"int_col":5,"price":5.5,"title":"bulk doc five"}}}
'
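
The response fields of a /bulk request (errors, error, current_line, skipped_lines) can be inspected on the client side. The following Python sketch turns an already parsed response into a short summary; the function name summarize_bulk_response is hypothetical.

```python
def summarize_bulk_response(response):
    """Return a short human-readable summary of a parsed /bulk response dict."""
    if not response.get("errors"):
        return f"ok: {len(response.get('items', []))} item(s) reported"
    return (
        f"failed at line {response.get('current_line')}: {response.get('error')} "
        f"({response.get('skipped_lines')} line(s) not committed)"
    )


print(summarize_bulk_response({"items": [{}], "errors": False, "error": "",
                               "current_line": 4, "skipped_lines": 0}))
```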

NOTE: _bulk requires Manticore Buddy if the table doesn't exist yet. If it doesn't work, make sure Buddy is installed.

POST /_bulk
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "table" : "products" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "table" : "products" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'

Use the addDocuments() method:

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
        ['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
        ['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);

import json
docs = [
    {"insert": {"table" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
]
# The bulk body is NDJSON: one JSON statement per line
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))

import json
docs = [
    {"insert": {"table" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
]
# The bulk body is NDJSON: one JSON statement per line
res = await indexApi.bulk('\n'.join(map(json.dumps,docs)))

let docs = [
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 5, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));

String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"Pet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.bulk(body);

string body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"Pet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.Bulk(body);

let bulk_body = r#"{"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}
    {"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}
    {"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
"#;
index_api.bulk(bulk_body).await;


Response

Query OK, 3 rows affected (0.01 sec)

Expressions are currently not supported in INSERT, and values should be explicitly specified.

{
  "items": [
    {
      "bulk": {
        "table": "products",
        "_id": 2,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 4,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}
{
  "items": [
    {
      "bulk": {
        "table": "test1",
        "_id": 22,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "table": "test1",
        "_id": 23,
        "created": 1,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "table": "test2",
        "_id": 25,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 8,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}

{
  "items": [
    {
      "index": {
        "table": "products",
        "_type": "doc",
        "_id": 1657860156022587406,
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "table": "products",
        "_type": "doc",
        "_id": 3,
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ],
  "errors": false,
  "took": 1
}

Multi-value attributes (MVA) are inserted as arrays of numbers.


SQL
JSON
Elasticsearch
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "sizes" : [40, 41, 42, 43]
  }
}

POST /products/_create/1
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}

Or, alternatively

POST /products/_doc/
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}

$index->addDocument(
  ['title' => 'shoes', 'sizes' => [40,41,42,43]],
  1
);

indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})

await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("sizes",new int[]{40,41,42,43});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("sizes", new List<Object> {40,41,42,43});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
doc.insert("sizes".to_string(), serde_json::json!([40,41,42,43]));
let insert_req = InsertDocumentRequest::new("products".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

A JSON value can be inserted as an escaped string (via SQL or JSON) or as a JSON object (via the JSON interface).


SQL
JSON
Elasticsearch
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');

A JSON value can be inserted as a JSON object:

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : {
      "size": 41,
      "color": "red"
    }
  }
}

A JSON value can also be inserted as a string containing escaped JSON:

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : "{\"size\": 41, \"color\": \"red\"}"
  }
}

POST /products/_create/1
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}

Or, alternatively

POST /products/_doc/
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}

$index->addDocument(
  ['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
  1
);

indexApi = api = manticoresearch.IndexApi(client)
indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})

indexApi = api = manticoresearch.IndexApi(client)
await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("meta",
        new HashMap<String,Object>(){{
            put("size",41);
            put("color","red");
        }});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("size", 41);
meta.Add("color", "red");
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut meta = HashMap::new();
meta.insert("size".to_string(), serde_json::json!(41));
meta.insert("color".to_string(), serde_json::json!("red"));
let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
doc.insert("meta".to_string(), serde_json::json!(meta));
let insert_req = InsertDocumentRequest::new("products".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

Adding rules to a percolate table

A percolate table stores documents that are percolate query rules. Each rule must follow an exact schema of four fields:

field	type	description
id	bigint	PQ rule identifier (if omitted, it will be assigned automatically)
query	string	Full-text query (can be empty) compatible with the percolate table
filters	string	Additional filters by non-full-text fields (can be empty) compatible with the percolate table
tags	string	A string with one or many comma-separated tags, which may be used to selectively show/delete saved queries

Any other field names are not supported and will trigger an error.
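
Since unknown field names trigger an error, a client may want to check a rule before sending it. The following Python sketch validates the field names of a rule against the four-field schema above; the names ALLOWED_PQ_FIELDS and check_pq_rule are hypothetical, and the server remains the authority on what it accepts.

```python
ALLOWED_PQ_FIELDS = {"id", "query", "filters", "tags"}


def check_pq_rule(rule):
    """Raise ValueError if the rule dict uses a field outside the four-field schema."""
    unknown = sorted(set(rule) - ALLOWED_PQ_FIELDS)
    if unknown:
        raise ValueError(f"unsupported percolate fields: {', '.join(unknown)}")
    return rule


print(check_pq_rule({"query": "@title shoes", "filters": "price > 5"}))
```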

Warning: Inserting/replacing JSON-formatted PQ rules via SQL will not work. In other words, the JSON-specific operators (match etc) will be considered just parts of the rule's text that should match with documents. If you prefer JSON syntax, use the HTTP endpoint instead of INSERT/REPLACE.


SQL
JSON
PHP
Python
Python-asyncio
Javascript
Java
C#
Rust


INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;

There are two ways you can add a percolate query into a percolate table:

Query in JSON /search compatible format, described at json/search

PUT /pq/pq_table/doc/1
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Louis Vuitton"]
}

Query in SQL format, described at search query syntax

PUT /pq/pq_table/doc/2
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Louis Vuitton"]
}

$newstoredquery = [
    'table' => 'test_pq',
    'body' => [
        'query' => [
            'match' => [
                'title' => 'shoes'
            ],
            'range' => [
                'price' => [
                    'gt' => 5
                ]
            ]
        ],
        'tags' => ['Louis Vuitton']
    ]
];
$client->pq()->doc($newstoredquery);

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
await indexApi.insert(newstoredquery)

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}};
indexApi.insert(newstoredquery);

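HashMap<String,Object> newstoredquery = new HashMap<String,Object>(){{
    // "ql" holds the query in SQL (query language) format
    put("query",new HashMap<String,Object>(){{
        put("ql","@title shoes");
    }});
    put("filters","price > 5");
    put("tags",new String[] {"Louis Vuitton"});
}};
InsertDocumentRequest newdoc = new InsertDocumentRequest();
newdoc.index("test_pq").id(2L).setDoc(newstoredquery);
indexApi.insert(newdoc);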

Dictionary<string, Object> query = new Dictionary<string, Object>();
// "ql" holds the query in SQL (query language) format
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price > 5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", id: 2, doc: newstoredquery);
indexApi.Insert(newdoc);

// The rule document: the query goes under "query", filters and tags sit next to it
let mut doc = HashMap::new();
doc.insert("query".to_string(), serde_json::json!({"ql": "@title shoes"}));
doc.insert("filters".to_string(), serde_json::json!("price>5"));
doc.insert("tags".to_string(), serde_json::json!(["Louis Vuitton"]));
let insert_req = InsertDocumentRequest::new("test_pq".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

Response

+------+--------------+---------------+---------+
| id   | query        | tags          | filters |
+------+--------------+---------------+---------+
|    1 | @title shoes |               | price>5 |
|    2 | @title bag   | Louis Vuitton |         |
+------+--------------+---------------+---------+

If you don't specify an ID, it will be assigned automatically. For details, see the Auto ID section.

INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;

PUT /pq/pq_table/doc
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Louis Vuitton"]
}

PUT /pq/pq_table/doc
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Louis Vuitton"]
}

$newstoredquery = [
    'table' => 'pq_table',
    'body' => [
        'query' => [
            'match' => [
                'title' => 'shoes'
            ]
        ],
        'range' => [
            'price' => [
                'gt' => 5
            ]
        ]
    ],
    'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);

indexApi = manticoresearch.IndexApi(client)
newstoredquery = {"table": "test_pq", "doc": {"query": {"ql": "@title shoes"}, "filters": "price > 5", "tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)

indexApi = manticoresearch.IndexApi(client)
newstoredquery = {"table": "test_pq", "doc": {"query": {"ql": "@title shoes"}, "filters": "price > 5", "tags": ["Louis Vuitton"]}}
await indexApi.insert(newstoredquery)

newstoredquery = {"table": "test_pq", "doc": {"query": {"ql": "@title shoes"}, "filters": "price > 5", "tags": ["Louis Vuitton"]}};
res = await indexApi.insert(newstoredquery);

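// The rule document: the query goes under "query", filters and tags sit next to it
HashMap<String,Object> newstoredquery = new HashMap<String,Object>(){{
    put("query", new HashMap<String,Object>(){{
        put("ql", "@title shoes");
    }});
    put("filters", "price>5");
    put("tags", new String[] {"Louis Vuitton"});
}};
InsertDocumentRequest newdoc = new InsertDocumentRequest();
newdoc.index("test_pq").setDoc(newstoredquery);
indexApi.insert(newdoc);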

Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
// The rule document: the query goes under "query", filters and tags sit next to it
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", doc: newstoredquery);
indexApi.Insert(newdoc);

// The rule document: the query goes under "query", filters and tags sit next to it
let mut doc = HashMap::new();
doc.insert("query".to_string(), serde_json::json!({"ql": "@title shoes"}));
doc.insert("filters".to_string(), serde_json::json!("price>5"));
doc.insert("tags".to_string(), serde_json::json!(["Louis Vuitton"]));
let insert_req = InsertDocumentRequest::new("test_pq".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

Response

+---------------------+-----------+------+---------+
| id                  | query     | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband |      | price>5 |
+---------------------+-----------+------+---------+

{
  "table": "pq_table",
  "type": "doc",
  "_id": 1657843905795719196,
  "result": "created"
}

{
  "table": "pq_table",
  "type": "doc",
  "_id": 1657843905795719198,
  "result": "created"
}

Array(
       [index] => pq_table
       [type] => doc
       [_id] => 1657843905795719198
       [result] => created
)

{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'table': 'test_pq',
 'result': 'created'}

{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'table': 'test_pq',
 'result': 'created'}

{"table":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}

If the column list is omitted in the SQL INSERT command, the following values are expected, in this order:

ID - Rule ID. You can use 0 to trigger auto-ID generation.
Query - Full-text query.
Tags - PQ rule tags string.
Filters - Additional filters by attributes.

INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;

Response

+---------------------+--------------+---------------+---------+
| id                  | query        | tags          | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes |               |         |
| 2810855531667783689 | @title shoes | Louis Vuitton |         |
+---------------------+--------------+---------------+---------+

To replace an existing PQ rule with a new one in SQL, use a regular REPLACE command. To replace a PQ rule defined in JSON mode via the HTTP JSON interface, use the special ?refresh=1 syntax.

mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
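
For the HTTP JSON side, the following is a minimal sketch of how such a replace request could be built from Python with only the standard library. It assumes that ?refresh=1 is appended to the same PUT /pq/<table>/doc/<id> endpoint shown earlier; the helper name build_pq_replace_request, the address http://127.0.0.1:9308, and the table name pq_table are illustrative assumptions, not part of the manual. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_pq_replace_request(base_url, table, rule_id, rule):
    """Build (but do not send) a PUT request that replaces a stored PQ rule.

    Hypothetical helper: assumes ?refresh=1 is added to the rule's URL.
    """
    url = f"{base_url}/pq/{table}/doc/{rule_id}?refresh=1"
    body = json.dumps(rule).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed address; adjust to wherever searchd listens for HTTP.
req = build_pq_replace_request(
    "http://127.0.0.1:9308",
    "pq_table",
    2810823411335430148,
    {"query": {"ql": "@title boots"}},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```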

Adding data from external storages

You can ingest data into Manticore from external storages using various methods:

Indexer tool to fetch data from various databases into plain tables.
Logstash, Filebeat, and Vector.dev integrations to put data to Manticore real-time tables from these tools.
Kafka integration to synchronize data from Kafka topics into a real-time table.

Plain tables creation

Plain tables are created once, by fetching data from one or several sources at creation time. A plain table is immutable: documents cannot be added or deleted during its lifespan. Only the values of numeric attributes (including MVA) can be updated. Refreshing the data is only possible by recreating the whole table.

Plain tables are available only in the Plain mode, and their definition is made up of a table declaration and one or several source declarations. Data gathering and table creation are performed not by the searchd server but by the auxiliary tool indexer.

Indexer is a command-line tool that can be called directly from the command line or from shell scripts.

It can accept a number of arguments when called, but there are also several settings of its own in the Manticore configuration file.

In the typical scenario, indexer does the following:

Fetches the data from the source
Builds the plain table
Writes the table files
(Optional) Informs the search server about the new table which triggers table rotation

The indexer tool is used to create plain tables in Manticore Search. It has a general syntax of:

indexer [OPTIONS] [table_name1 [table_name2 [...]]]

When creating tables with indexer, the generated table files must be made with permissions that allow searchd to read, write, and delete them. In case of the official Linux packages, searchd runs under the manticore user. Therefore, indexer must also run under the manticore user:

sudo -u manticore indexer ...

If you are running searchd differently, you might need to omit sudo -u manticore. Just make sure that the user under which your searchd instance is running has read/write permissions to the tables generated using indexer.

To create a plain table, you need to list the table(s) you want to process. For example, if your manticore.conf file contains details on two tables, mybigindex and mysmallindex, you could run:

sudo -u manticore indexer mysmallindex mybigindex

You can also use wildcard tokens to match table names:

? matches any single character
* matches any count of any characters
% matches none or any single character

sudo -u manticore indexer indexpart*main --rotate
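
To get a feel for which table names a pattern would select, the wildcard rules above can be modelled with a regular expression. This is only an illustrative sketch, not indexer's own matching code: the function names are made up, and it assumes that "any count" for * includes zero characters and that a pattern must match the whole table name.

```python
import re


def table_pattern_to_regex(pattern):
    """Translate an indexer-style table name pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # any number of characters (assumed to include none)
        elif ch == "%":
            parts.append(".?")   # zero or one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_tables(pattern, table_names):
    """Return the table names that the pattern matches in full."""
    regex = table_pattern_to_regex(pattern)
    return [name for name in table_names if regex.fullmatch(name)]


tables = ["indexpart1main", "indexpart22main", "indexpartmain", "indexpart1delta"]
print(match_tables("indexpart*main", tables))
print(match_tables("indexpart?main", tables))
print(match_tables("indexpart%main", tables))
```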

The exit codes for indexer are as follows:

0: everything went OK
1: there was a problem while indexing (and if --rotate was specified, it was skipped) or an operation emitted a warning
2: indexing went OK, but the --rotate attempt failed
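
In a script, these exit codes can be turned into readable messages. The sketch below is one possible wrapper, assuming indexer is on the PATH; the names INDEXER_EXIT_CODES, describe_exit_code, and run_indexer are hypothetical.

```python
import subprocess

# Meanings taken from the list of exit codes above.
INDEXER_EXIT_CODES = {
    0: "everything went OK",
    1: "indexing problem or warning (rotation skipped if --rotate was given)",
    2: "indexing went OK, but the --rotate attempt failed",
}


def describe_exit_code(code):
    """Map an indexer exit code to a short description."""
    return INDEXER_EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_indexer(args):
    """Run indexer with the given arguments and return (code, description)."""
    completed = subprocess.run(["indexer", *args])
    return completed.returncode, describe_exit_code(completed.returncode)


print(describe_exit_code(2))
```

For example, run_indexer(["--rotate", "--all"]) would return the code together with its description, so a cron job can log something more useful than a bare number.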

You can also start indexer using the following systemctl unit file:

systemctl start --no-block manticore-indexer

Or, in case you want to build a specific table:

systemctl start --no-block manticore-indexer@specific-table-name

Use the systemctl set-environment INDEXER_CONFIG command to run the Indexer with a custom configuration, which replaces the default settings.

The systemctl set-environment INDEXER_ARGS command lets you add custom startup options for the Indexer. For a complete list of command-line options, see the command line arguments described below.

For instance, to start the Indexer in quiet mode, run:

systemctl set-environment INDEXER_ARGS='--quiet'
systemctl restart manticore-indexer

To revert the changes, run:

systemctl set-environment INDEXER_ARGS=''
systemctl restart manticore-indexer

--config <file> (-c <file> for short) tells indexer to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. /etc/manticoresearch/manticore.conf), followed by the current directory you are in when calling indexer from the shell. This is most useful in shared environments where the binary files are installed in a global folder, e.g. /usr/bin/, but you want to provide users with the ability to make their own custom Manticore set-ups, or if you want to run multiple instances on a single server. In cases like those you could allow them to create their own manticore.conf files and pass them to indexer with this option. For example:
```
sudo -u manticore indexer --config /home/myuser/manticore.conf mytable
```
--all tells indexer to update every table listed in manticore.conf instead of listing individual tables. This is useful in small configurations, or in cron-style or maintenance jobs where the entire table set is rebuilt every day, every week, or at whatever interval suits you. Please note that since --all tries to update all tables found in the configuration, it will issue a warning if it encounters real-time tables, and the exit code of the command will be 1, not 0, even if the plain tables finished without issue. Example usage:
```
sudo -u manticore indexer --config /home/myuser/manticore.conf --all
```
--rotate is used for rotating tables. Unless you can take the search function offline without troubling users, you will almost certainly need to keep search running while indexing new documents. --rotate creates a second table, parallel to the first (in the same place, simply including .new in the filenames). Once complete, indexer notifies searchd by sending the SIGHUP signal, and searchd will attempt to rename the tables (renaming the existing ones to include .old and renaming the .new ones to replace them), and then will start serving from the newer files. Depending on the setting of seamless_rotate, there may be a slight delay in being able to search the newer tables. If multiple tables chained by killlist_target relations are rotated at once, rotation will start with the tables that are not targets and finish with the ones at the end of the target chain. Example usage:
```
sudo -u manticore indexer --rotate --all
```
--quiet tells indexer not to output anything, unless there is an error. This is mostly used for cron-type or other scripted jobs where the output is irrelevant or unnecessary, except in the event of some kind of error. Example usage:
```
sudo -u manticore indexer --rotate --all --quiet
```
--noprogress does not display progress details as they occur. Instead, the final status details (such as documents indexed, speed of indexing, and so on) are only reported at completion of indexing. In instances where the script is not being run on a console (or 'tty'), this will be on by default. Example usage:
```
sudo -u manticore indexer --rotate --all --noprogress
```
--buildstops <outputfile.txt> <N> reviews the table source, as if it were indexing the data, and produces a list of the terms that are being indexed. In other words, it produces a list of all the searchable terms that are becoming part of the table. Note that it does not update the table in question; it simply processes the data as if it were indexing, including running queries defined with sql_query_pre or sql_query_post. outputfile.txt will contain the list of words, one per line, sorted by frequency with the most frequent first, and N specifies the maximum number of words that will be listed. If N is large enough to encompass every word in the table, only as many words as the table contains will be returned. Such a dictionary list could be used for client application features around "Did you mean…" functionality, usually in conjunction with --buildfreqs, below. Example:
```
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000
```
This would produce a document in the current directory, word_freq.txt, with the 1,000 most common words in 'mytable', ordered by most common first. Note that the file will pertain to the last table indexed when specified with multiple tables or --all (i.e. the last one listed in the configuration file).
--buildfreqs works with --buildstops (and is ignored if --buildstops is not specified). As --buildstops provides the list of words used within the table, --buildfreqs adds the quantity present in the table, which would be useful in establishing whether certain words should be considered stopwords if they are too prevalent. It will also help with developing "Did you mean…" features where you need to know how much more common a given word is compared to another, similar one. For example:
```
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000 --buildfreqs
```
This would produce the word_freq.txt as above, however after each word would be the number of times it occurred in the table in question.
--merge <dst-table> <src-table> is used for physically merging tables together, for example if you have a main+delta scheme, where the main table rarely changes, but the delta table is rebuilt frequently, and --merge would be used to combine the two. The operation moves from right to left: the contents of src-table get examined and physically combined with the contents of dst-table, and the result is left in dst-table. In pseudo-code, it might be expressed as: dst-table += src-table. An example:
```
sudo -u manticore indexer --merge main delta --rotate
```
In the above example, where main is the master, rarely modified table and delta is the more frequently modified one, you might use the above to call indexer to combine the contents of the delta into the main table and rotate the tables.
--merge-dst-range <attr> <min> <max> applies the given range filter upon merging. Specifically, as the merge is applied to the destination table (as part of --merge; the option is ignored if --merge is not specified), indexer will also filter the documents ending up in the destination table, and only documents that pass the given filter will end up in the final table. This could be used, for example, in a table where there is a 'deleted' attribute, where 0 means 'not deleted'. Such a table could be merged with:
```
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
```
Any documents marked as deleted (value 1) will be removed from the newly-merged destination table. It can be added several times to the command line, to add successive filters to the merge, all of which must be met in order for a document to become part of the final table.
--merge-killlists (and its shorter alias --merge-klists) changes the way kill lists are processed when merging tables. By default, both kill lists get discarded after a merge. That supports the most typical main+delta merge scenario. With this option enabled, however, kill lists from both tables get concatenated and stored into the destination table. Note that a source (delta) table kill list will be used to suppress rows from a destination (main) table at all times.
--keep-attrs allows reusing existing attributes on reindexing. Whenever the table is rebuilt, each new document id is checked for presence in the "old" table, and if it already exists, its attributes are transferred to the "new" table; if not found, attributes from the new table are used. If the user has updated attributes in the table, but not in the actual source used for the table, all updates will be lost when reindexing; using --keep-attrs enables saving the updated attribute values from the previous table. It is possible to specify a path for table files to be used instead of the reference path from the config:
```
sudo -u manticore indexer mytable --keep-attrs=/path/to/index/files
```
--keep-attrs-names=<attributes list> allows you to specify attributes to reuse from an existing table on reindexing. By default, all attributes from the existing table are reused in the new table:
```
sudo -u manticore indexer mytable --keep-attrs=/path/to/table/files --keep-attrs-names=update,state
```
--dump-rows <FILE> dumps rows fetched by SQL source(s) into the specified file, in a MySQL compatible syntax. The resulting dumps are the exact representation of data as received by indexer and can help repeat indexing-time issues. The command performs fetching from the source and creates both table files and the dump file.
--print-rt <rt_index> <table> outputs fetched data from the source as INSERTs for a real-time table. The first lines of the dump will contain the real-time fields and attributes (as a reflection of the plain table fields and attributes). The command performs fetching from the source and creates both table files and the dump output. The command can be used as sudo -u manticore indexer -c manticore.conf --print-rt indexrt indexplain > dump.sql. Only SQL-based sources are supported. MVAs are not supported.
--sighup-each is useful when you are rebuilding many big tables and want each one rotated into searchd as soon as possible. With --sighup-each, indexer will send the SIGHUP signal to searchd after successfully completing work on each table. (The default behavior is to send a single SIGHUP after all the tables are built).
--nohup is useful when you want to check your table with indextool before actually rotating it. indexer won't send the SIGHUP if this option is on. Table files are renamed to .tmp. Use indextool to rename table files to .new and rotate it. Example usage:
```
sudo -u manticore indexer --rotate --nohup mytable
sudo -u manticore indextool --rotate --check mytable
```
--print-queries prints out SQL queries that indexer sends to the database, along with SQL connection and disconnection events. That is useful to diagnose and fix problems with SQL sources.
--help (-h for short) lists all the parameters that can be called in indexer.
-v shows indexer version.
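
The --merge and --merge-dst-range semantics described above can be modelled in a few lines. This is a deliberately simplified sketch: documents are plain dictionaries, kill lists and duplicate document IDs are ignored, and the function names are hypothetical. It only illustrates that the range bounds are inclusive and that every filter must hold for a document to stay in the merged result.

```python
def passes_filters(doc, range_filters):
    """True if every (attr, lo, hi) filter holds for the document (bounds inclusive)."""
    return all(lo <= doc[attr] <= hi for attr, lo, hi in range_filters)


def merge_with_ranges(dst_docs, src_docs, range_filters):
    """Model of dst += src where only documents passing all filters are kept."""
    combined = list(dst_docs) + list(src_docs)
    return [doc for doc in combined if passes_filters(doc, range_filters)]


main = [{"id": 1, "deleted": 0}, {"id": 2, "deleted": 1}]
delta = [{"id": 3, "deleted": 0}]

# Equivalent in spirit to: --merge main delta --merge-dst-range deleted 0 0
merged = merge_with_ranges(main, delta, [("deleted", 0, 0)])
print([doc["id"] for doc in merged])
```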

You can also configure indexer behavior in the Manticore configuration file in the indexer section:

indexer {
...
}

lemmatizer_cache = 256M

Lemmatizer cache size. Optional, default is 256K.

Our lemmatizer implementation uses a compressed dictionary format that enables a space/speed tradeoff. It can either perform lemmatization off the compressed data, using more CPU but less RAM, or it can decompress and precache the dictionary either partially or fully, thus using less CPU but more RAM. The lemmatizer_cache directive lets you control how much RAM exactly can be spent for that uncompressed dictionary cache.

Currently, the only available dictionaries are ru.pak, en.pak, and de.pak. These are the Russian, English, and German dictionaries. The compressed dictionary is approximately 2 to 10 MB in size. Note that the dictionary stays in memory at all times too. The default cache size is 256 KB. The accepted cache sizes are 0 to 2047 MB. It's safe to raise the cache size too high; the lemmatizer will only use the needed memory. For example, the entire Russian dictionary decompresses to approximately 110 MB; thus setting lemmatizer_cache higher than that will not affect the memory use. Even when 1024 MB is allowed for the cache, if only 110 MB is needed, it will only use those 110 MB.

max_file_field_buffer = 128M

Maximum file field adaptive buffer size in bytes. Optional, default is 8MB, minimum is 1MB.

The file field buffer is used to load files referred to from sql_file_field columns. This buffer is adaptive, starting at 1 MB at first allocation, and growing in 2x steps until either the file contents can be loaded or the maximum buffer size, specified by the max_file_field_buffer directive, is reached.

Thus, if no file fields are specified, no buffer is allocated at all. If all files loaded during indexing are under (for example) 2 MB in size, but the max_file_field_buffer value is 128 MB, the peak buffer usage would still be only 2 MB. However, files over 128 MB would be entirely skipped.
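
The growth rule can be illustrated with a small calculation. This is a sketch of the behavior as described above, not indexer's actual code; the function name and the choice to cap the final doubling step at the configured maximum are assumptions.

```python
MB = 1024 * 1024


def file_field_buffer_size(file_size, max_buffer, initial=1 * MB):
    """Return the buffer size needed for a file, or None if the file is skipped."""
    if file_size > max_buffer:
        return None  # the file does not fit even in the largest allowed buffer
    size = min(initial, max_buffer)
    while size < file_size:
        size = min(size * 2, max_buffer)  # grow in 2x steps, capped at the maximum
    return size


print(file_field_buffer_size(int(1.5 * MB), 128 * MB) // MB)  # buffer size in MB
print(file_field_buffer_size(200 * MB, 128 * MB))             # skipped file
```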

max_iops = 40

Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).

I/O throttling related option. It limits the maximum count of I/O operations (reads or writes) per any given second. A value of 0 means that no limit is imposed.

indexer can cause bursts of intensive disk I/O during building a table, and it might be desirable to limit its disk activity (and reserve something for other programs running on the same machine, such as searchd). I/O throttling helps to do that. It works by enforcing a minimum guaranteed delay between subsequent disk I/O operations performed by indexer. Throttling I/O can help reduce search performance degradation caused by building. This setting is not effective for other kinds of data ingestion, e.g. inserting data into a real-time table.

max_iosize = 1048576

Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).

I/O throttling related option. It limits the maximum file I/O operation (read or write) size for all operations performed by indexer. A value of 0 means that no limit is imposed. Reads or writes that are bigger than the limit will be split into several smaller operations, and counted as several operations by the max_iops setting. At the time of this writing, all I/O calls should be under 256 KB (default internal buffer size) anyway, so max_iosize values higher than 256 KB should not have any effect.
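
As a rough illustration of how the two throttling settings interact, the sketch below computes the spacing between operations implied by max_iops and the number of operations a single large read or write counts as under max_iosize. It assumes the minimum delay is simply one second divided by max_iops; the function names are hypothetical and this is not indexer's internal logic.

```python
import math


def min_delay_seconds(max_iops):
    """Minimum spacing between I/O operations; 0 means throttling is disabled."""
    return 0.0 if max_iops == 0 else 1.0 / max_iops


def operations_for(size_bytes, max_iosize):
    """How many operations a read or write of size_bytes is counted as."""
    if max_iosize == 0 or size_bytes <= max_iosize:
        return 1
    return math.ceil(size_bytes / max_iosize)


print(min_delay_seconds(40))                  # seconds between operations at max_iops = 40
print(operations_for(3 * 1048576, 1048576))   # a 3 MB write with max_iosize = 1048576
```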

max_xmlpipe2_field = 8M

Maximum allowed field size for XMLpipe2 source type, in bytes. Optional, default is 2 MB.

mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes

Plain table building RAM usage limit. Optional, default is 128 MB. Enforced memory usage limit that the indexer will not go above. Can be specified in bytes, or kilobytes (using K postfix), or megabytes (using M postfix); see the example. This limit will be automatically raised if set to an extremely low value causing I/O buffers to be less than 8 KB; the exact lower bound for that depends on the built data size. If the buffers are less than 256 KB, a warning will be produced.

The maximum possible limit is 2047M. Values that are too low can hurt plain table building speed, but 256M to 1024M should be enough for most, if not all, datasets. Setting this value too high can cause SQL server timeouts. During the document collection phase, there will be periods when the memory buffer is partially sorted and no communication with the database is performed, and the database server can time out. You can resolve that either by raising timeouts on the SQL server side or by lowering mem_limit.
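
The three equivalent spellings in the mem_limit example can be checked with a short helper. This is an illustrative sketch (the function name parse_size is made up) that assumes K and M mean multiples of 1024 and 1024 * 1024 bytes, which is what makes the three example values agree.

```python
def parse_size(value):
    """Convert '256M', '262144K' or '268435456' into a number of bytes."""
    text = value.strip().upper()
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


for spelling in ("256M", "262144K", "268435456"):
    print(spelling, "=", parse_size(spelling), "bytes")
```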

on_file_field_error = skip_document

How to handle IO errors in file fields. Optional, default is ignore_field. When there is a problem indexing a file referenced by a file field (sql_file_field), indexer can either process the document, assuming empty content in this particular field, or skip the document, or fail indexing entirely. on_file_field_error directive controls that behavior. The values it takes are:

ignore_field, process the current document without the field;
skip_document, skip the current document but continue indexing;
fail_index, fail indexing with an error message.

The problems that can arise are: open error, size error (file too big), and data read error. Warning messages on any problem will be given at all times, regardless of the phase and the on_file_field_error setting.

Note that with on_file_field_error = skip_document documents will only be ignored if problems are detected during an early check phase, and not during the actual file parsing phase. indexer will open every referenced file and check its size before doing any work, and then open it again when doing actual parsing work. So in case a file goes away between these two open attempts, the document will still be indexed.

write_buffer = 4M

Write buffer size, bytes. Optional, default is 1MB. Write buffers are used to write both temporary and final table files when indexing. Larger buffers reduce the number of required disk writes. Memory for the buffers is allocated in addition to mem_limit. Note that several (currently up to 4) buffers for different files will be allocated, proportionally increasing the RAM usage.

ignore_non_plain = 1

ignore_non_plain allows you to completely ignore warnings about skipping non-plain tables. The default is 0 (not ignoring).

There are two approaches to scheduling indexer runs. The first way is the classical method of using crontab. The second way is using a systemd timer with a user-defined schedule. To create the timer unit files, you should place them in the appropriate directory where systemd looks for such unit files. On most Linux distributions, this directory is typically /etc/systemd/system. Here's how to do it:

1. Create a timer unit file for your custom schedule:

cat << EOF > /etc/systemd/system/manticore-indexer@.timer
[Unit]
Description=Run Manticore Search's indexer on schedule
[Timer]
OnCalendar=minutely
RandomizedDelaySec=5m
Unit=manticore-indexer@%i.service
[Install]
WantedBy=timers.target
EOF

More on the OnCalendar syntax and examples can be found in the systemd.time documentation.

2. Edit the timer unit for your specific needs.

3. Enable the timer:

systemctl enable manticore-indexer@idx1.timer

4. Start the timer:

systemctl start manticore-indexer@idx1.timer

5. Repeat steps 2-4 for any additional timers.
