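If you are looking for information on adding documents to a plain table, see the section on adding data from external storages.

Adding documents in real time is supported only for Real-Time and percolate tables. The corresponding SQL command, HTTP endpoint, or client function inserts new rows (documents) into a table with the provided field values. The table does not need to exist before you add documents: if it does not exist, Manticore will attempt to create it automatically. For more information, see Auto schema.

You can insert a single document or multiple documents, with values for all fields of the table or only some of them. In the latter case, the remaining fields are filled with their default values (0 for scalar types, an empty string for text types).

Expressions are currently not supported in INSERT, so values must be specified explicitly.

The ID field/value can be omitted, because RT and PQ tables support auto-ID. You can also use 0 as the ID value to force automatic ID generation. Rows with duplicate IDs are not overwritten by INSERT; use REPLACE for that instead.

When using the HTTP JSON protocol, you can choose between two request formats: the common Manticore format and an Elasticsearch-like format. Both are shown in the examples below.

Additionally, when using the Manticore JSON request format, keep in mind that the doc node is required and all values must be provided inside it.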
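The request shape described above can be sketched in Python. This is only an illustration: `build_insert_request` is a hypothetical helper, not part of any Manticore client. It keeps every field value under `doc` and leaves `id` out when auto-ID is wanted.

```python
import json

def build_insert_request(table, fields, doc_id=None):
    """Return a JSON body for POST /insert in the Manticore request format."""
    # Every field value must live inside the required "doc" node.
    body = {"table": table, "doc": dict(fields)}
    if doc_id is not None:
        # Omitting "id" (or passing 0) lets the server assign an auto-ID.
        body["id"] = doc_id
    return json.dumps(body)

payload = build_insert_request("products", {"title": "Yellow bag", "price": 4.95}, doc_id=1)
print(payload)
```

The resulting string can be sent as the body of `POST /insert` with any HTTP client.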

Examples, in order: SQL, JSON, Elasticsearch, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust

General syntax:

INSERT INTO <table name> [(column, ...)]
VALUES (value, ...)
[, (...)]

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel",
    "price" : 19.85
  }
}
POST /insert
{
  "table":"products",
  "id":2,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel"
  }
}
POST /insert
{
  "table":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}

NOTE: _create requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

POST /products/_create/3
{
  "title": "Yellow Bag with Tassel",
  "price": 19.85
}
POST /products/_create/
{
  "title": "Red Bag with Tassel",
  "price": 19.85
}

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
        ['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}})

await indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
await indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
await indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}})

res = await indexApi.insert({"table" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"table" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"table" : "test", "id" : 0, "doc" : {"title" : "Yellow bag"}});

InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
doc.Add("price", 19.85);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 1, doc: doc);
var sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Crossbody Bag with Tassel");
newdoc = new InsertDocumentRequest(index: "products", id: 2, doc: doc);
sqlresult = indexApi.Insert(newdoc);
doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Crossbody Bag with Tassel"));
doc.insert("price".to_string(), serde_json::json!(19.85));
let mut insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(1),
    ..Default::default()
};
let mut insert_res = index_api.insert(insert_req).await;
doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Crossbody Bag with Tassel"));
insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(2),
    ..Default::default()
};
insert_res = index_api.insert(insert_req).await;
doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(0),
    ..Default::default()
};
insert_res = index_api.insert(insert_req).await;


Response

Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)

{
  "table": "products",
  "_id": 1,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "table": "products",
  "_id": 2,
  "created": true,
  "result": "created",
  "status": 201
}
{
  "table": "products",
  "_id": 1657860156022587406,
  "created": true,
  "result": "created",
  "status": 201
}

{
"_id":3,
"table":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}
{
"_id":2235747273424240642,
"table":"products",
"_primary_term":1,
"_seq_no":0,
"_shards":{
    "failed":0,
    "successful":1,
    "total":1
},
"_type":"_doc",
"_version":1,
"result":"updated"
}

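NOTE: Auto schema requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Manticore has an automatic table creation mechanism that activates when the table specified in an insert query does not yet exist. The mechanism is enabled by default. To disable it, set auto_schema = 0 in the Searchd section of your Manticore configuration file.

By default, all text values in the VALUES clause are treated as the text type, except for values that represent valid email addresses, which are treated as the string type.

If you try to insert multiple rows with different, incompatible value types for the same field, automatic table creation is canceled and an error message is returned. However, if the different value types are compatible, the resulting field type is the one that can hold all the values. The automatic data type conversions that may occur are: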

mva -> mva64
uint -> bigint -> float (this may cause some precision loss)
string -> text
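To make the widening rule concrete, here is a minimal Python sketch. It only models the three conversion chains listed above and is not Manticore's actual implementation; `CHAINS` and `widest_type` are names made up for this illustration.

```python
# Conversion chains copied from the list above; a type may widen to any type after it.
CHAINS = [
    ["mva", "mva64"],
    ["uint", "bigint", "float"],
    ["string", "text"],
]

def widest_type(observed):
    """Return the type that can hold all observed types, or raise if they are incompatible."""
    kinds = set(observed)
    if not kinds:
        raise ValueError("no value types given")
    for chain in CHAINS:
        if kinds <= set(chain):
            # The type furthest along the chain can hold all the others.
            return max(kinds, key=chain.index)
    raise ValueError(f"incompatible value types: {sorted(kinds)}")

print(widest_type(["uint", "bigint"]))
```

Mixing types from different chains, for example `uint` and `text`, raises an error, which mirrors the canceled table creation described above.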

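The auto schema mechanism does not support creating tables with vector fields (type float_vector) used for KNN (K-nearest neighbor) similarity search. To use vector fields in a table, you must create the table explicitly with a schema that defines them. If you need to store vector data in a regular table without KNN search capability, you can store it as a JSON array using standard JSON syntax, for example: INSERT INTO table_name (vector_field) VALUES ('[1.0, 2.0, 3.0]').

Also, the following date formats are recognized and converted to timestamps, while all other date formats are treated as strings: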

%Y-%m-%dT%H:%M:%E*S%Z
%Y-%m-%d'T'%H:%M:%S%Z
%Y-%m-%dT%H:%M:%E*S
%Y-%m-%dT%H:%M:%s
%Y-%m-%dT%H:%M
%Y-%m-%dT%H

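Keep in mind that the /bulk HTTP endpoint does not support automatic table creation (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support it.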

Examples, in order: SQL, JSON

MySQL [(none)]> drop table if exists t; insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777)); desc t; select * from t;


Response

--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.42 sec)
--------------
insert into t(i,f,t,s,j,b,m,mb) values(123,1.2,'text here','test@mail.com','{"a": 123}',1099511627776,(1,2),(1099511627776,1099511627777))
--------------
Query OK, 1 row affected (0.00 sec)
--------------
desc t
--------------
+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| t     | text   | indexed stored |
| s     | string |                |
| j     | json   |                |
| i     | uint   |                |
| b     | bigint |                |
| f     | float  |                |
| m     | mva    |                |
| mb    | mva64  |                |
+-------+--------+----------------+
9 rows in set (0.00 sec)
--------------
select * from t
--------------
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| id                  | i    | b             | f        | m    | mb                          | t         | s             | j          |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
| 5045949922868723723 |  123 | 1099511627776 | 1.200000 | 1,2  | 1099511627776,1099511627777 | text here | test@mail.com | {"a": 123} |
+---------------------+------+---------------+----------+------+-----------------------------+-----------+---------------+------------+
1 row in set (0.00 sec)

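Manticore provides automatic ID generation for the ID column of documents inserted or replaced into a real-time or percolate table. The generator produces a unique ID for a document, but it should not be considered an auto-incremented ID.

The generated ID value is guaranteed to be unique under the following conditions:

The server_id value of the current server is in the range 0 to 127 and is unique among the nodes of the cluster, or it uses the default value generated from the MAC address as a seed
The system time does not change between restarts of the Manticore node
Auto-IDs are generated fewer than 16 million times per second between search server restarts

The auto-ID generator creates a 64-bit integer for a document ID using the following layout:

Bits 0 to 23 form a counter that is incremented on every call to the auto-ID generator
Bits 24 to 55 represent the Unix timestamp of the server start
Bits 56 to 63 correspond to the server_id

This layout ensures that the generated ID is unique among all nodes in the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.

As a result, the first ID from the auto-ID generator is not 1 but a larger number. Additionally, the stream of documents inserted into a table may have non-sequential ID values if inserts into other tables happen between the calls, because there is a single ID generator per server and it is shared among all its tables.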
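The bit layout above can be illustrated with a short Python sketch. It assumes that bit 0 is the least significant bit and follows the documented layout literally; the function names are hypothetical, and IDs produced by a real server are not guaranteed to decode into a meaningful timestamp this way.

```python
COUNTER_BITS = 24     # bits 0..23: per-call counter
TIMESTAMP_BITS = 32   # bits 24..55: Unix timestamp of the server start
SERVER_ID_BITS = 8    # bits 56..63: server_id

def compose_id(server_id, started_at, counter):
    """Pack the three fields into one 64-bit integer (illustrative only)."""
    if not 0 <= server_id < (1 << SERVER_ID_BITS):
        raise ValueError("server_id does not fit in 8 bits")
    if not 0 <= started_at < (1 << TIMESTAMP_BITS):
        raise ValueError("started_at does not fit in 32 bits")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter does not fit in 24 bits")
    return (server_id << 56) | (started_at << 24) | counter

def decompose_id(doc_id):
    """Split a 64-bit ID back into the fields of the documented layout."""
    return {
        "server_id": (doc_id >> 56) & 0xFF,
        "started_at": (doc_id >> 24) & 0xFFFFFFFF,
        "counter": doc_id & 0xFFFFFF,
    }

doc_id = compose_id(server_id=23, started_at=1_700_000_000, counter=5)
print(doc_id, decompose_id(doc_id))
```

The 24-bit counter holds 16,777,216 values, which is where the limit of roughly 16 million generated IDs comes from.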

Examples, in order: SQL, JSON, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yello bag', 4.95);
select * from products;

POST /insert
{
  "table":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}
GET /search
{
  "table":"products",
  "query":{
    "query_string":""
  }
}

$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})

await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
let insert_req = InsertDocumentRequest {
    table: serde_json::json!("products"),
    doc: serde_json::json!(doc),
    id: serde_json::json!(0),
    ..Default::default()
};
let insert_res = index_api.insert(insert_req).await;


Response

+---------------------+-----------+---------------------------+
| id                  | price     | title                     |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 |  4.950000 | Yello bag                 |
+---------------------+-----------+---------------------------+

CALL UUID_SHORT(N)

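The CALL UUID_SHORT(N) statement generates N unique 64-bit IDs in a single call without inserting any documents. It is especially useful when you need to pre-generate IDs in Manticore for use in other systems or storage solutions. For example, you can generate auto-IDs in Manticore and then use them in another database, application, or workflow, ensuring consistent and unique identifiers across different environments.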

Example

CALL UUID_SHORT(3)


Response

+---------------------+
| uuid_short()        |
+---------------------+
| 1227930988733973183 |
| 1227930988733973184 |
| 1227930988733973185 |
+---------------------+

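You can insert not just a single document into a real-time table, but as many as you like. Inserting batches of tens of thousands of documents into a real-time table is perfectly fine. However, keep the following points in mind:

The larger the batch, the higher the latency of each insert operation
The larger the batch, the higher the indexing speed you can expect
You may need to increase the max_packet_size value to allow bigger batches
Normally, each batch insert operation is treated as a single transaction with an atomicity guarantee, so either all the new documents land in the table at once or, in case of a failure, none of them are added. See the "JSON" example for more details about an empty line or switching to another table.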
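Because batch size trades latency against indexing speed, client code often splits a large document stream into fixed-size batches. The sketch below shows one way to do that in Python; `batches` is a hypothetical helper, and the right batch size depends on your data and on max_packet_size.

```python
from itertools import islice

def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

docs = [{"title": f"doc {i}"} for i in range(5)]
for batch in batches(docs, 2):
    # Each batch would be sent as one bulk insert, i.e. one transaction.
    print(len(batch))
```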

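Note that the /bulk HTTP endpoint does not support automatic creation of tables (auto schema). Only the /_bulk (Elasticsearch-like) HTTP endpoint and the SQL interface support this feature. The /_bulk (Elasticsearch-like) HTTP endpoint allows the table name to include the cluster name in the format cluster_name:table_name.

The /_bulk endpoint accepts document IDs in the same format as Elasticsearch, and you can also include the id inside the document itself: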

{ "index": { "table" : "products", "_id" : "1" } }
{ "title" : "Crossbody Bag with Tassel", "price": 19.85 }

or

{ "index": { "table" : "products" } }
{ "title" : "Crossbody Bag with Tassel", "price": 19.85, "id": "1" }

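The /bulk (Manticore mode) endpoint supports chunked transfer encoding. You can use it to transmit large batches. It:

Reduces peak RAM usage, lowering the risk of running out of memory
Decreases response time
Lets you bypass the max_packet_size limit and transfer batches much larger than its maximum allowed value (128MB), for example 1GB at a time.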
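A chunked upload could look roughly like the following Python sketch, which uses only the standard library. The host, port, and function names are assumptions made for illustration; `http.client` switches to chunked transfer encoding on its own when the body is an iterable and no Content-Length header is given.

```python
import http.client
import json

def ndjson_chunks(docs, table, lines_per_chunk=1000):
    """Yield UTF-8 encoded NDJSON chunks, one "insert" action per line."""
    buf = []
    for doc in docs:
        buf.append(json.dumps({"insert": {"table": table, "doc": doc}}))
        if len(buf) == lines_per_chunk:
            yield ("\n".join(buf) + "\n").encode("utf-8")
            buf = []
    if buf:
        yield ("\n".join(buf) + "\n").encode("utf-8")

def send_bulk_chunked(docs, table, host="localhost", port=9308):
    """POST the documents to /bulk using chunked transfer encoding (host and port are assumed)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        # An iterable body without Content-Length is sent chunk-encoded.
        conn.request(
            "POST",
            "/bulk",
            body=ndjson_chunks(docs, table),
            headers={"Content-Type": "application/x-ndjson"},
        )
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

Only one chunk is held in memory at a time, so the documents can come from a generator that reads a large file lazily.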

Examples, in order: SQL, JSON, Elasticsearch, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust

For bulk insert, simply provide more documents in parentheses after VALUES(). The syntax is:

INSERT INTO <table name>[(column1, column2, ...)] VALUES(value1[, value2 , ...]), (...)

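The optional column name list lets you explicitly specify values for some of the columns in the table. All other columns are filled with their default values (0 for scalar types, an empty string for string types).

For example: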

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);

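The syntax is in general the same as for inserting a single document. Just provide more lines, one per document, and use the /bulk endpoint instead of /insert. Enclose each document in the "insert" node. Note that it also requires:

Content-Type: application/x-ndjson
The data must be formatted as newline-delimited JSON (NDJSON). Essentially, this means that each line must contain exactly one JSON statement and end with a newline \n and possibly a \r.

The /bulk endpoint supports 'insert', 'replace', 'delete', and 'update' queries. Keep in mind that you can direct operations to multiple tables, but transactions are only possible for a single table. If you specify more than one, Manticore collects the operations directed to one table into a single transaction. When the table changes, it commits the collected operations and starts a new transaction on the new table. An empty line separating batches also causes the previous batch to be committed and a new transaction to start.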
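The grouping rule can be modeled on the client side to predict where transaction boundaries will fall. The following Python sketch is a simplified model of the behavior described above, not server code, and `split_transactions` is a hypothetical name.

```python
import json

def split_transactions(ndjson_text):
    """Group NDJSON bulk lines into transactions: a table change or an empty line starts a new one."""
    groups, current, current_table = [], [], None
    for line in ndjson_text.split("\n"):
        if not line.strip():
            # An empty line commits whatever has been collected so far.
            if current:
                groups.append(current)
                current, current_table = [], None
            continue
        (params,) = json.loads(line).values()  # one action per line, e.g. {"insert": {...}}
        table = params["table"]
        if current and table != current_table:
            # Switching to another table commits the previous batch.
            groups.append(current)
            current = []
        current.append(line)
        current_table = table
    if current:
        groups.append(current)
    return groups

body = "\n".join([
    '{"insert":{"table":"test1","id":21,"doc":{"title":"bulk doc one"}}}',
    '{"insert":{"table":"test1","id":22,"doc":{"title":"bulk doc two"}}}',
    '',
    '{"insert":{"table":"test1","id":23,"doc":{"title":"bulk doc three"}}}',
    '{"insert":{"table":"test2","id":24,"doc":{"title":"bulk doc four"}}}',
])
print([len(group) for group in split_transactions(body)])
```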

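In the response to a /bulk request, you can find the following fields:

"errors": shows whether any errors occurred (true/false)
"error": describes the error that took place
"current_line": the line number where execution stopped (or failed); empty lines, including the first empty line, are also counted
"skipped_lines": the number of non-committed lines, counting backward from current_line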

POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"table":"products", "id":1, "doc":  {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"table":"products", "id":2, "doc":  {"title":"microfiber sheet set","price" : 19.99}}}
'
POST /bulk
-H "Content-Type: application/x-ndjson" -d '
{"insert":{"table":"test1","id":21,"doc":{"int_col":1,"price":1.1,"title":"bulk doc one"}}}
{"insert":{"table":"test1","id":22,"doc":{"int_col":2,"price":2.2,"title":"bulk doc two"}}}
{"insert":{"table":"test1","id":23,"doc":{"int_col":3,"price":3.3,"title":"bulk doc three"}}}
{"insert":{"table":"test2","id":24,"doc":{"int_col":4,"price":4.4,"title":"bulk doc four"}}}
{"insert":{"table":"test2","id":25,"doc":{"int_col":5,"price":5.5,"title":"bulk doc five"}}}
'

NOTE: _bulk requires Manticore Buddy if the table does not exist yet. If it doesn't work, make sure Buddy is installed.

POST /_bulk
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "table" : "products" } }
{ "title" : "Yellow Bag", "price": 12 }
{ "create" : { "table" : "products" } }
{ "title" : "Red Bag", "price": 12.5, "id": 3 }
'

Use the addDocuments() method:

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
        ['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
        ['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);

import json

docs = [
    {"insert": {"table" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
]
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))

import json

docs = [
    {"insert": {"table" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
]
res = await indexApi.bulk('\n'.join(map(json.dumps,docs)))

let docs = [
    {"insert": {"table" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}},
    {"insert": {"table" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}},
    {"insert": {"table" : "products", "id" : 5, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));

String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"Pet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.bulk(body);

string body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"Pet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";
BulkResponse bulkresult = indexApi.Bulk(body);

let bulk_body = r#"{"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}
    {"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}
    {"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "Pet Hair Remover Glove", "price" : 7.99}}}
"#;
index_api.bulk(bulk_body).await;


Response

Query OK, 3 rows affected (0.01 sec)

Expressions are currently not supported in INSERT, and values must be specified explicitly.

{
  "items": [
    {
      "bulk": {
        "table": "products",
        "_id": 2,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 4,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}
{
  "items": [
    {
      "bulk": {
        "table": "test1",
        "_id": 22,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "table": "test1",
        "_id": 23,
        "created": 1,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    },
    {
      "bulk": {
        "table": "test2",
        "_id": 25,
        "created": 2,
        "deleted": 0,
        "updated": 0,
        "result": "created",
        "status": 201
      }
    }
  ],
  "current_line": 8,
  "skipped_lines": 0,
  "errors": false,
  "error": ""
}
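A client can use the "errors", "error", "current_line", and "skipped_lines" fields of a /bulk response to decide whether to retry. The Python sketch below is a hypothetical helper that works on an already parsed response; it assumes only the field names shown in the examples above.

```python
def describe_bulk_result(response):
    """Summarize a parsed /bulk response using its documented fields."""
    if not response.get("errors"):
        return "ok"
    return "failed at line {}: {} ({} lines not committed)".format(
        response.get("current_line"),
        response.get("error"),
        response.get("skipped_lines"),
    )

ok = {"items": [], "current_line": 4, "skipped_lines": 0, "errors": False, "error": ""}
print(describe_bulk_result(ok))
```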

{
  "items": [
    {
      "index": {
        "table": "products",
        "_type": "doc",
        "_id": 1657860156022587406,
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "table": "products",
        "_type": "doc",
        "_id": 3,
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ],
  "errors": false,
  "took": 1
}

Multi-value attributes (MVA) are inserted as arrays of numbers.

Examples, in order: SQL, JSON, Elasticsearch, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust

INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "sizes" : [40, 41, 42, 43]
  }
}

POST /products/_create/1
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}

Or, alternatively

POST /products/_doc/
{
  "title": "shoes",
  "sizes" : [40, 41, 42, 43]
}

$index->addDocument(
  ['title' => 'shoes', 'sizes' => [40,41,42,43]],
  1
);

indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})

await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("sizes",new int[]{40,41,42,43});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("sizes", new List<Object> {40,41,42,43});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
doc.insert("sizes".to_string(), serde_json::json!([40,41,42,43]));
let insert_req = InsertDocumentRequest::new("products".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

A JSON value can be inserted as an escaped string (via SQL or JSON) or as a JSON object (via the JSON interface).

Examples, in order: SQL, JSON, Elasticsearch, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust

INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');

A JSON value can be inserted as a JSON object

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : {
      "size": 41,
      "color": "red"
    }
  }
}

A JSON value can also be inserted as a string containing escaped JSON:

POST /insert
{
  "table":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : "{\"size\": 41, \"color\": \"red\"}"
  }
}

POST /products/_create/1
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}

Or, alternatively

POST /products/_doc/
{
  "title": "shoes",
  "meta" : {
    "size": 41,
    "color": "red"
  }
}

$index->addDocument(
  ['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
  1
);

indexApi = api = manticoresearch.IndexApi(client)
indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})

indexApi = api = manticoresearch.IndexApi(client)
await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})

res = await indexApi.insert({"table" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("meta",
        new HashMap<String,Object>(){{
            put("size",41);
            put("color","red");
        }});
 }};
newdoc.index("products").id(0L).setDoc(doc);
sqlresult = indexApi.insert(newdoc);

Dictionary<string, Object> meta = new Dictionary<string, Object>();
meta.Add("size", 41);
meta.Add("color", "red");
Dictionary<string, Object> doc = new Dictionary<string, Object>();
doc.Add("title", "Yellow bag");
doc.Add("meta", meta);
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "products", id: 0, doc: doc);
var sqlresult = indexApi.Insert(newdoc);

let mut meta = HashMap::new();
meta.insert("size".to_string(), serde_json::json!(41));
meta.insert("color".to_string(), serde_json::json!("red"));
let mut doc = HashMap::new();
doc.insert("title".to_string(), serde_json::json!("Yellow bag"));
doc.insert("meta".to_string(), serde_json::json!(meta));
let insert_req = InsertDocumentRequest::new("products".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;

Adding rules to a percolate table

Last modified: August 28, 2025

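In a percolate table, the stored documents are percolate query rules, and they must follow an exact schema of four fields:

Field	Type	Description
id	bigint	PQ rule identifier (if omitted, it is assigned automatically)
query	string	Full-text query (can be empty) compatible with the percolate table
filters	string	Additional filters by non-full-text fields (can be empty) compatible with the percolate table
tags	string	A string with one or more comma-separated tags, which can be used to selectively show/delete saved queries

Any other field names are not supported and will trigger an error.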
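A client can check a rule against this four-field schema before sending it. The Python sketch below is a hypothetical pre-flight check, not something Manticore provides; it only rejects field names outside the table above.

```python
ALLOWED_PQ_FIELDS = {"id", "query", "filters", "tags"}

def check_pq_rule(rule):
    """Raise ValueError if the rule uses a field outside the percolate schema."""
    unknown = set(rule) - ALLOWED_PQ_FIELDS
    if unknown:
        raise ValueError(f"unsupported percolate fields: {sorted(unknown)}")
    return rule

rule = check_pq_rule({"query": "@title shoes", "filters": "price > 5", "tags": "Louis Vuitton"})
print(rule)
```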

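Warning: Inserting/replacing JSON-formatted PQ rules via SQL will not work. In other words, JSON-specific operators (match, etc.) will be treated as just parts of the rule's text that should match documents. If you prefer JSON syntax, use the HTTP endpoint instead of INSERT/REPLACE.

Examples, in order: SQL, JSON, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust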

INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;

There are two ways to add a percolate query to a percolate table:

A query in the JSON /search compatible format, described at json/search

PUT /pq/pq_table/doc/1
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Louis Vuitton"]
}

A query in SQL format, described at search query syntax

PUT /pq/pq_table/doc/2
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Louis Vuitton"]
}

$newstoredquery = [
    'table' => 'test_pq',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
await indexApi.insert(newstoredquery)

newstoredquery ={"table" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}};
indexApi.insert(newstoredquery);

HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("ql","@title shoes");
    }});
    put("filters","price>5");
    put("tags",new String[] {"Louis Vuitton"});
}};
InsertDocumentRequest newdoc = new InsertDocumentRequest();
newdoc.index("test_pq").id(2L).setDoc(doc);
indexApi.insert(newdoc);

Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", id: 2, doc: newstoredquery);
indexApi.Insert(newdoc);

let mut pq_query = HashMap::new();
pq_query.insert("ql".to_string(), serde_json::json!("@title shoes"));
let mut doc = HashMap::new();
doc.insert("query".to_string(), serde_json::json!(pq_query));
doc.insert("filters".to_string(), serde_json::json!("price>5"));
doc.insert("tags".to_string(), serde_json::json!(["Louis Vuitton"]));
let insert_req = InsertDocumentRequest::new("test_pq".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;


Response

+------+--------------+---------------+---------+
| id   | query        | tags          | filters |
+------+--------------+---------------+---------+
|    1 | @title shoes |               | price>5 |
|    2 | @title bag   | Louis Vuitton |         |
+------+--------------+---------------+---------+

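If no ID is specified, one is assigned automatically. You can read more about auto-ID in the description of automatic ID generation earlier in this document.

Examples, in order: SQL, JSON, PHP, Python, Python-asyncio, Javascript, Java, C#, Rust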

INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;

PUT /pq/pq_table/doc
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Louis Vuitton"]
}
PUT /pq/pq_table/doc
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Louis Vuitton"]
}

$newstoredquery = [
    'table' => 'pq_table',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Louis Vuitton']
];
$client->pq()->doc($newstoredquery);

indexApi = api = manticoresearch.IndexApi(client)
newstoredquery ={"table" : "test_pq",   "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
indexApi.insert(newstoredquery)

indexApi = api = manticoresearch.IndexApi(client)
newstoredquery ={"table" : "test_pq",   "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}}
await indexApi.insert(newstoredquery)

newstoredquery ={"table" : "test_pq",  "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Louis Vuitton"]}};
res =  await indexApi.insert(newstoredquery);

HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("ql","@title shoes");
    }});
    put("filters","price>5");
    put("tags",new String[] {"Louis Vuitton"});
}};
InsertDocumentRequest newdoc = new InsertDocumentRequest();
newdoc.index("test_pq").setDoc(doc);
indexApi.insert(newdoc);

Dictionary<string, Object> query = new Dictionary<string, Object>();
query.Add("ql", "@title shoes");
Dictionary<string, Object> newstoredquery = new Dictionary<string, Object>();
newstoredquery.Add("query", query);
newstoredquery.Add("filters", "price>5");
newstoredquery.Add("tags", new List<string> {"Louis Vuitton"});
InsertDocumentRequest newdoc = new InsertDocumentRequest(index: "test_pq", doc: newstoredquery);
indexApi.Insert(newdoc);

let mut pq_query = HashMap::new();
pq_query.insert("ql".to_string(), serde_json::json!("@title shoes"));
let mut doc = HashMap::new();
doc.insert("query".to_string(), serde_json::json!(pq_query));
doc.insert("filters".to_string(), serde_json::json!("price>5"));
doc.insert("tags".to_string(), serde_json::json!(["Louis Vuitton"]));
let insert_req = InsertDocumentRequest::new("test_pq".to_string(), serde_json::json!(doc));
let insert_res = index_api.insert(insert_req).await;


Response

+---------------------+-----------+------+---------+
| id                  | query     | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband |      | price>5 |
+---------------------+-----------+------+---------+

{
  "table": "pq_table",
  "type": "doc",
  "_id": 1657843905795719196,
  "result": "created"
}
{
  "table": "pq_table",
  "type": "doc",
  "_id": 1657843905795719198,
  "result": "created"
}

Array(
       [index] => pq_table
       [type] => doc
       [_id] => 1657843905795719198
       [result] => created
)

{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'table': 'test_pq',
 'result': 'created'}

{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'table': 'test_pq',
 'result': 'created'}

{"table":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}

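If the schema is omitted in the SQL INSERT command, the following parameters are expected:

ID. You can use 0 as the ID to trigger auto-ID generation.
Query - the full-text query.
Tags - the PQ rule tags string.
Filters - additional filters by attributes.

Example: SQL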

INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;


Response

+---------------------+--------------+---------------+---------+
| id                  | query        | tags          | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes |               |         |
| 2810855531667783689 | @title shoes | Louis Vuitton |         |
+---------------------+--------------+---------------+---------+

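To replace an existing PQ rule with a new one in SQL, just use the regular REPLACE command. There is a special syntax, ?refresh=1, for replacing a PQ rule defined in JSON mode via the HTTP JSON interface.

Examples, in order: SQL, JSON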

mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)

Adding data from external storages

Last modified: August 28, 2025

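You can ingest data from external storages into Manticore using several methods:

The Indexer tool, to fetch data from various databases into plain tables.
The Logstash, Filebeat, and Vector.dev integrations, to put data from these tools into Manticore real-time tables.
The Kafka integration, to sync data from a Kafka topic into a real-time table.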

Plain tables creation

Last modified: August 28, 2025

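A plain table is a table that is built in one pass by fetching data from one or more sources at creation time. A plain table is immutable: documents cannot be added or deleted during its lifetime. Only the values of numeric attributes (including multi-value attributes, MVA) can be updated. Refreshing the data is possible only by recreating the whole table.

Plain tables are available only in Plain mode, and their definition consists of a table declaration and one or more source declarations. Data gathering and table creation are performed not by the searchd server but by the auxiliary tool indexer.

Indexer is a command-line tool that can be called directly from the command line or from shell scripts.

It accepts a number of arguments when called, and it also has several settings of its own in the Manticore configuration file.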

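In a typical scenario, indexer does the following:

Fetches the data from the source
Builds the plain table
Writes the table files
(Optional) Informs the search server about the new table, which triggers table rotation

The indexer tool is used to create plain tables in Manticore Search. Its general syntax is: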

indexer [OPTIONS] [table_name1 [table_name2 [...]]]

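When you create a table with indexer, the generated table files must have permissions that allow searchd to read, write, and delete them. In the official Linux packages, searchd runs under the manticore user. Therefore, indexer must also run under the manticore user: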

sudo -u manticore indexer ...

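If you run searchd differently, you may need to omit sudo -u manticore. Just make sure that the user running your searchd instance has read and write permissions for the tables generated by indexer.

To create a plain table, list the table(s) you want to process. For example, if your manticore.conf file contains details on two tables, mybigindex and mysmallindex, you could run: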

sudo -u manticore indexer mysmallindex mybigindex

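You can also use wildcard tokens to match table names:

? matches any single character
* matches any number of any characters
% matches no character or any single character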

sudo -u manticore indexer indexpart*main --rotate
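
The three wildcard tokens can be illustrated with a short Python sketch. This is not how indexer matches names internally; `wildcard_to_regex` and `match_tables` are hypothetical helpers, and the table names are made up for the example. The sketch only translates the semantics listed above into a regular expression.

```python
import re

def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an indexer-style wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # any number of characters, including none
        elif ch == "%":
            parts.append(".?")      # zero or one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts))

def match_tables(pattern: str, table_names: list[str]) -> list[str]:
    """Return the table names that the whole pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [name for name in table_names if regex.fullmatch(name)]

# Hypothetical table names, for illustration only.
tables = ["indexpart1main", "indexpart22main", "indexpartmain", "delta"]
print(match_tables("indexpart*main", tables))
print(match_tables("indexpart?main", tables))
print(match_tables("indexpart%main", tables))
```

With these names, `*` selects all three indexpart tables, `?` selects only indexpart1main, and `%` selects indexpart1main and indexpartmain.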

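The exit codes of indexer are as follows:

0: everything went OK
1: there was a problem while indexing (and if --rotate was specified, it was skipped), or an operation emitted a warning
2: indexing went OK, but the --rotate attempt failed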
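
A script that wraps indexer can branch on these exit codes. The following Python sketch shows one way to do that; `describe_exit_code` and `run_indexer` are hypothetical helper names, and the sketch assumes that the indexer binary is on PATH and that the caller already has suitable permissions.

```python
import subprocess

# Meanings taken from the exit code list above.
EXIT_CODE_MEANINGS = {
    0: "everything went OK",
    1: "indexing problem (rotation skipped) or a warning was emitted",
    2: "indexing went OK, but the --rotate attempt failed",
}

def describe_exit_code(code: int) -> str:
    """Map an indexer exit code to a human-readable message."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")

def run_indexer(args: list[str]) -> str:
    """Run indexer with the given arguments and describe its exit code."""
    # Assumption: `indexer` is on PATH; check=False lets us inspect the code ourselves.
    completed = subprocess.run(["indexer", *args], check=False)
    return describe_exit_code(completed.returncode)

print(describe_exit_code(2))
```

A cron job could call `run_indexer(["--rotate", "--all"])` and alert only when the result is not the message for code 0.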

You can also start indexer using the following systemctl unit file:

systemctl start --no-block manticore-indexer

Or, if you want to build a specific table:

systemctl start --no-block manticore-indexer@specific-table-name

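Use the systemctl set-environment INDEXER_CONFIG command to run Indexer with a custom configuration that replaces the default one.

The systemctl set-environment INDEXER_ARGS command lets you add custom startup options for Indexer. For the complete list of command line options, see the indexer command line arguments described below.

For example, to start Indexer in quiet mode, run: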

systemctl set-environment INDEXER_ARGS='--quiet'
systemctl restart manticore-indexer

To revert the change, run:

systemctl set-environment INDEXER_ARGS=''
systemctl restart manticore-indexer

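--config <file> (-c <file> for short) tells indexer to use the given file as its configuration. Normally, it looks for manticore.conf in the installation directory (for example, /etc/manticoresearch/manticore.conf), and then in the current directory from which indexer is called. This option is most useful in shared environments where the binaries are installed in a global folder such as /usr/bin/, but you want users to be able to make their own custom Manticore setups, or where you want to run multiple instances on a single server. In such cases, you can let users create their own manticore.conf files and pass them to indexer with this option. For example: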
```
sudo -u manticore indexer --config /home/myuser/manticore.conf mytable
```
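--all tells indexer to update every table listed in manticore.conf instead of listing individual tables. This is useful in small configurations or for periodic maintenance jobs, such as rebuilding the whole table set daily or weekly. Note that because --all tries to update every table in the configuration, it issues a warning if it encounters real-time tables, and the exit code of the command is then 1, even if the plain tables were built without issue. Example usage: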
```
sudo -u manticore indexer --config /home/myuser/manticore.conf --all
```
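--rotate is used for rotating tables. Unless you can take search offline without troubling users, you will almost certainly need to keep search running while indexing new documents. --rotate creates a second table parallel to the first (in the same location, with .new included in the file names). Once complete, indexer notifies searchd by sending the SIGHUP signal. searchd then attempts to rename the tables (renaming the existing ones to include .old and renaming the .new ones to replace them) and starts serving from the newer files. Depending on the seamless_rotate setting, there may be a slight delay before the newer tables can be searched. If several tables chained by killlist_target relations are rotated at once, rotation starts with the tables that are not targets and finishes with the ones at the end of the target chain. Example usage: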
```
sudo -u manticore indexer --rotate --all
```
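--quiet tells indexer not to output anything unless there is an error. This is mostly used for cron-type or other scripted jobs where the output is irrelevant or unnecessary, except in the event of an error. Example usage: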
```
sudo -u manticore indexer --rotate --all --quiet
```
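--noprogress does not display progress details as they occur. Instead, the final status details (such as the number of documents indexed and the indexing speed) are reported only when indexing completes. When the script is not run on a console (or "tty"), this option is on by default. Example usage: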
```
sudo -u manticore indexer --rotate --all --noprogress
```
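--buildstops <outputfile.txt> <N> reviews the table source as if it were indexing the data and produces a list of the terms that are being indexed. In other words, it produces a list of all the searchable terms that become part of the table. Note that it does not update the table in question; it only processes the data as if it were indexing, including running the queries defined with sql_query_pre or sql_query_post. outputfile.txt will contain the list of words, sorted by frequency with the most frequent first, and N specifies the maximum number of words to list. If N is large enough to cover every word in the table, only as many words as the table contains are returned. Such a dictionary list can be used for "Did you mean…" features in client applications, usually in conjunction with --buildfreqs. Example: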
```
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000
```
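This produces a file named word_freq.txt in the current directory with the 1,000 most common words in mytable, most common first. Note that when multiple tables or --all are specified, the file pertains to the last table indexed (that is, the last one listed in the configuration file).
--buildfreqs works together with --buildstops (and is ignored if --buildstops is not specified). While --buildstops provides the list of words used in the table, --buildfreqs adds how many times each word occurs in the table, which is useful for deciding whether certain words are so prevalent that they should be treated as stopwords. It also helps with developing "Did you mean…" features, where you need to know how much more common a given word is than another, similar one. For example: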
```
sudo -u manticore indexer mytable --buildstops word_freq.txt 1000 --buildfreqs
```
This produces the word_freq.txt file as above, but each word is followed by the number of times it occurs in the table in question.
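
To make the relationship between the two options concrete, here is a rough Python sketch of a frequency-sorted word list with an optional count column. It is only an illustration: `build_stops` is a hypothetical function, the whitespace tokenization is an assumption (indexer applies the table's own tokenization settings), and the "word count" line layout is assumed rather than taken from the actual output file.

```python
from collections import Counter

def build_stops(documents, limit, with_freqs=False):
    """Return up to `limit` words, most frequent first, optionally with counts."""
    counts = Counter()
    for text in documents:
        # Assumption: naive lowercase + whitespace split stands in for real tokenization.
        counts.update(text.lower().split())
    lines = []
    for word, count in counts.most_common(limit):
        lines.append(f"{word} {count}" if with_freqs else word)
    return lines

# Made-up documents, for illustration only.
docs = ["yellow bag", "red bag with tassel", "bag with"]
print(build_stops(docs, 2))                    # like --buildstops ... 2
print(build_stops(docs, 2, with_freqs=True))   # like adding --buildfreqs
```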
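--merge <dst-table> <src-table> is used for physically merging tables. For example, if you have a main+delta scheme where the main table rarely changes but the delta table is rebuilt frequently, --merge is used to combine the two. The operation moves from right to left: the contents of src-table are examined and physically combined with the contents of dst-table, and the result is left in dst-table. In pseudo-code: dst-table += src-table. Example: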
```
sudo -u manticore indexer --merge main delta --rotate
```
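In the example above, where main is the master, rarely modified table and delta is the more frequently modified one, the command calls indexer to merge the contents of delta into main and rotate the tables.
--merge-dst-range <attr> <min> <max> applies the given filter range upon merging. Specifically, as the merge is applied to the destination table (as part of --merge; the option is ignored if --merge is not specified), indexer also filters the documents ending up in the destination table, and only documents that pass the given filter end up in the final table. This can be used, for example, in a table with a 'deleted' attribute where 0 means "not deleted". Such a table can be merged with: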
```
sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
```
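Any documents marked as deleted (value 1) are removed from the newly merged destination table. The option can be added to the command line several times to add successive filters, all of which must be met for a document to become part of the final table.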
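--merge-killlists (and its shorter alias --merge-klists) changes the way kill lists are processed when merging tables. By default, both kill lists are discarded after a merge. With this option enabled, however, the kill lists from both tables are concatenated and stored in the destination table. Note that the kill list of the source (delta) table is always used to suppress rows from the destination (main) table.
--keep-attrs allows reusing existing attributes on reindexing. Whenever the table is rebuilt, each new document ID is checked for presence in the old table. If it already exists there, its attributes are transferred to the new table; if it is not found, the attributes from the new table are used. If a user has updated attributes in the table but not in the actual source used for the table, all updates are lost on reindexing; using --keep-attrs preserves the updated attribute values from the previous table. It is also possible to specify a path to the table files to use instead of the path referenced in the configuration file: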
```
sudo -u manticore indexer mytable --keep-attrs=/path/to/index/files
```
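--keep-attrs-names=<attributes list> lets you specify which attributes to reuse from the existing table on reindexing. By default, all attributes from the existing table are reused in the new table: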
```
sudo -u manticore indexer mytable --keep-attrs=/path/to/table/files --keep-attrs-names=update,state
```
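The attribute-transfer rule behind --keep-attrs and --keep-attrs-names can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not indexer's implementation: tables are represented as plain dictionaries, and `rebuild_with_kept_attrs` is a hypothetical name.
```python
def rebuild_with_kept_attrs(old_table, new_table, keep_names=None):
    """Model --keep-attrs: tables map document ID -> attribute dict.

    keep_names=None keeps every attribute found in the old table;
    a list of names models --keep-attrs-names.
    """
    result = {}
    for doc_id, new_attrs in new_table.items():
        attrs = dict(new_attrs)            # start from freshly indexed values
        old_attrs = old_table.get(doc_id)
        if old_attrs is not None:          # ID existed before: reuse its attributes
            names = list(old_attrs) if keep_names is None else keep_names
            for name in names:
                if name in old_attrs:
                    attrs[name] = old_attrs[name]
        result[doc_id] = attrs
    return result

# Made-up data: document 1 had `state` updated to 5 after the last build.
old = {1: {"state": 5, "price": 10}}
new = {1: {"state": 0, "price": 12}, 2: {"state": 0, "price": 7}}
print(rebuild_with_kept_attrs(old, new))
print(rebuild_with_kept_attrs(old, new, keep_names=["state"]))
```
In the second call only `state` is carried over for document 1, so its `price` comes from the new build, while document 2 is new and keeps its freshly indexed values in both cases.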
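--dump-rows <FILE> dumps the rows fetched by SQL source(s) into the specified file, in a MySQL-compatible syntax. The resulting dump is an exact representation of the data as received by indexer and can help reproduce indexing-time issues. The command fetches from the source and creates both the table files and the dump file.
--print-rt <rt_index> <table> outputs the data fetched from the source as INSERT statements for a real-time table. The first lines of the dump contain the real-time fields and attributes (as a reflection of the plain table fields and attributes). The command fetches from the source and creates both the table files and the dump output. It can be used as sudo -u manticore indexer -c manticore.conf --print-rt indexrt indexplain > dump.sql. Only SQL-based sources are supported. MVAs are not supported.
--sighup-each is useful when you are rebuilding many big tables and want each one rotated into searchd as soon as possible. With --sighup-each, indexer sends the SIGHUP signal to searchd after successfully completing work on each table. (The default behavior is to send a single SIGHUP after all the tables are built.)
--nohup is useful when you want to check your table with indextool before actually rotating it. With this option on, indexer does not send the SIGHUP, and the table files are renamed to .tmp. Use indextool to rename the table files to .new and rotate the table. Example usage: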
```
sudo -u manticore indexer --rotate --nohup mytable
sudo -u manticore indextool --rotate --check mytable
```
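--print-queries prints the SQL queries that indexer sends to the database, along with SQL connection and disconnection events. This is useful for diagnosing and fixing problems with SQL sources.
--help (-h for short) lists all the parameters that indexer accepts.
-v shows the indexer version.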

You can also configure indexer behavior in the indexer section of the Manticore configuration file:

indexer {
...
}

lemmatizer_cache = 256M

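Lemmatizer cache size. Optional, default is 256K.

Our lemmatizer implementation uses a compressed dictionary format that enables a space/speed tradeoff. It can either perform lemmatization directly on the compressed data, using more CPU but less RAM, or it can decompress and precache the dictionary partially or fully, using less CPU but more RAM. The lemmatizer_cache directive lets you control how much RAM can be spent on that uncompressed dictionary cache.

Currently, the only available dictionaries are ru.pak, en.pak, and de.pak: the Russian, English, and German dictionaries. A compressed dictionary is approximately 2 to 10 MB in size. Note that the dictionary stays in memory at all times. The default cache size is 256 KB. The accepted cache sizes are 0 to 2047 MB. It is safe to set the cache size very high; the lemmatizer only uses the memory it needs. For example, the entire Russian dictionary decompresses to approximately 110 MB, so setting lemmatizer_cache higher than that does not affect memory use. Even if 1024 MB is allowed for the cache, only 110 MB is used if only 110 MB is needed.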

max_file_field_buffer = 128M

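Maximum file field adaptive buffer size, in bytes. Optional, default is 8 MB, minimum is 1 MB.

The file field buffer is used to load files referenced from sql_file_field columns. The buffer is adaptive: it starts at 1 MB on first allocation and grows in 2x steps until either the file contents can be loaded or the maximum buffer size specified by max_file_field_buffer is reached.

Thus, if no file fields are specified, no buffer is allocated at all. If all files loaded during indexing are under (for example) 2 MB in size but max_file_field_buffer is 128 MB, peak buffer usage is still only 2 MB. However, files over 128 MB are skipped entirely.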
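
The growth rule can be checked with a small calculation. The Python sketch below models only the sizing logic described above (start at 1 MB, double until the file fits, skip files above the maximum); `file_field_buffer_size` is a hypothetical name, and the real allocator may differ in details.

```python
MB = 1024 * 1024

def file_field_buffer_size(file_size, max_buffer=8 * MB):
    """Return the buffer size needed to load a file, or None if it is skipped."""
    if file_size > max_buffer:
        return None                        # larger than max_file_field_buffer: skipped
    size = MB                              # first allocation is 1 MB
    while size < file_size:
        size = min(size * 2, max_buffer)   # grow in 2x steps, capped at the maximum
    return size

print(file_field_buffer_size(int(1.5 * MB), max_buffer=128 * MB) // MB)  # 2
print(file_field_buffer_size(200 * MB, max_buffer=128 * MB))             # None
```

A 1.5 MB file needs one doubling, so the buffer ends at 2 MB even though 128 MB is allowed, which matches the peak-usage example in the text.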

max_iops = 40

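Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).

This is an I/O throttling option. It limits the maximum number of I/O operations (reads or writes) in any given second. A value of 0 means that no limit is imposed.

indexer can cause bursts of intensive disk I/O while building a table, and it may be desirable to limit its disk activity (and leave resources for other programs running on the same machine, such as searchd). I/O throttling works by enforcing a minimum guaranteed delay between subsequent disk I/O operations performed by indexer. Throttling I/O can help reduce the search performance degradation caused by building. This setting has no effect on other kinds of data ingestion, for example inserting data into a real-time table.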

max_iosize = 1048576

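Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).

This is an I/O throttling option. It limits the maximum size of a file I/O operation (read or write) for all operations performed by indexer. A value of 0 means that no limit is imposed. Reads or writes that are bigger than the limit are split into several smaller operations, and each of them is counted separately by the max_iops setting. Currently, all I/O calls should be under 256 KB (the default internal buffer size) anyway, so max_iosize values higher than 256 KB have no effect.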
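
The interaction between max_iosize and max_iops can be sketched as follows. This is an illustrative model only: `split_io` and `min_delay_seconds` are hypothetical names, and treating the minimum delay as exactly 1 / max_iops is an assumption made for the example, not a statement about indexer's internals.

```python
def split_io(op_size, max_iosize):
    """Split one I/O operation into chunks no larger than max_iosize (0 = no limit)."""
    if max_iosize == 0:
        return [op_size]
    chunks = []
    remaining = op_size
    while remaining > 0:
        chunk = min(remaining, max_iosize)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

def min_delay_seconds(max_iops):
    """Assumed minimum delay between operations (0 = no throttling)."""
    return 0.0 if max_iops == 0 else 1.0 / max_iops

chunks = split_io(300000, 131072)
print(chunks)                  # each chunk counts as one operation for max_iops
print(min_delay_seconds(40))   # delay between operations with max_iops = 40
```

Under this model, a 300,000-byte write with max_iosize = 131072 becomes three operations, each of which is subject to the max_iops delay.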

max_xmlpipe2_field = 8M

Maximum allowed field size for the XMLpipe2 source type, in bytes. Optional, default is 2 MB.

mem_limit = 256M
# mem_limit = 262144K # same, but in KB
# mem_limit = 268435456 # same, but in bytes

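Plain table building RAM usage limit. Optional, default is 128 MB. This is an enforced memory usage limit that indexer will not exceed. It can be specified in bytes, kilobytes (using the K suffix), or megabytes (using the M suffix); see the example. The limit is raised automatically if it is set so low that the I/O buffers would be smaller than 8 KB; the exact lower bound depends on the size of the data being built. If the buffers are smaller than 256 KB, a warning is produced.

The maximum possible limit is 2047M. Values that are too low can hurt plain table building speed, but 256M to 1024M should be enough for most, if not all, datasets. Setting this value too high can cause SQL server timeouts. During the document collection phase, there are periods when the memory buffer is being partially sorted and there is no communication with the database, so the database server can time out. You can resolve this either by raising the timeouts on the SQL server side or by lowering mem_limit.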
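
The three equivalent spellings in the mem_limit example can be verified with a small parser. The Python sketch below is a hypothetical helper, `parse_size`, that handles only the K and M suffixes mentioned above; it is not the parser Manticore uses.

```python
def parse_size(value: str) -> int:
    """Convert a size such as '256M', '262144K', or '268435456' to bytes."""
    value = value.strip()
    suffixes = {"K": 1024, "M": 1024 * 1024}
    suffix = value[-1:].upper()
    if suffix in suffixes:
        return int(value[:-1]) * suffixes[suffix]
    return int(value)                      # no suffix: already in bytes

for spelling in ("256M", "262144K", "268435456"):
    print(spelling, "->", parse_size(spelling))
```

All three spellings resolve to the same number of bytes, which is why the commented-out lines in the example are marked as equivalent.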

on_file_field_error = skip_document

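How to handle I/O errors in file fields. Optional, default is ignore_field. When there is a problem indexing a file referenced by a file field (sql_file_field), indexer can either ignore that field and keep processing the current document, skip the current document, or fail indexing entirely. The on_file_field_error directive controls this behavior. It accepts the following values:

ignore_field, ignore the field and keep processing the current document;
skip_document, skip the current document but continue indexing;
fail_index, fail indexing with an error message.

The problems that can arise are: open error, size error (file too big), and data read error. A warning is given for any problem, regardless of the phase and of the on_file_field_error setting.

Note that with on_file_field_error = skip_document, documents are skipped only if the problem is detected during the early check phase, not during the actual file parsing phase. indexer opens every referenced file and checks its size before doing any work, and then opens it again when doing the actual parsing. So if a file goes away between these two open attempts, the document is still indexed.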
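
The three policies can be summarized in a short Python sketch. It is a simplified model under stated assumptions: `load_file_field` is a hypothetical function, the two phases (early check and parsing) are collapsed into a single step, and the 8 MB default for `max_size` simply mirrors the default max_file_field_buffer.

```python
import os

def load_file_field(path, policy="ignore_field", max_size=8 * 1024 * 1024):
    """Return (action, content), where action is 'index' or 'skip'."""
    try:
        if os.path.getsize(path) > max_size:
            raise OSError(f"{path}: file too big")       # size error
        with open(path, "rb") as handle:
            return "index", handle.read()
    except OSError as error:                             # open, size, or read error
        print(f"WARNING: {error}")                       # a warning is always given
        if policy == "ignore_field":
            return "index", b""                          # index the document without the field
        if policy == "skip_document":
            return "skip", None                          # drop this document, keep indexing
        if policy == "fail_index":
            raise RuntimeError(f"indexing failed: {error}") from error
        raise ValueError(f"unknown policy: {policy}")

# Hypothetical missing file, for illustration only.
print(load_file_field("/nonexistent/manticore-demo.txt", "skip_document"))
```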

write_buffer = 4M

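Write buffer size, in bytes. Optional, default is 1 MB. Write buffers are used to write both temporary and final table files when indexing. Larger buffers reduce the number of required disk writes. Memory for the buffers is allocated in addition to mem_limit. Note that several buffers (currently up to 4) are allocated for different files, which increases RAM usage proportionally.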

ignore_non_plain = 1

ignore_non_plain lets you completely ignore warnings about skipped non-plain tables. The default is 0 (do not ignore).

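There are two approaches to scheduling indexer runs. The first is the classical method of using crontab. The second is to use a systemd timer with a user-defined schedule. To create the timer unit files, place them in the directory where systemd looks for such unit files. On most Linux distributions, this directory is typically /etc/systemd/system. Here is how to do it: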

1. Create a timer unit file for your custom schedule:

cat << EOF > /etc/systemd/system/manticore-indexer@.timer
[Unit]
Description=Run Manticore Search's indexer on schedule
[Timer]
OnCalendar=minutely
RandomizedDelaySec=5m
Unit=manticore-indexer@%i.service
[Install]
WantedBy=timers.target
EOF

For more on the OnCalendar syntax and examples, see the systemd.time documentation.

2. Edit the timer unit for your specific needs.

3. Enable the timer:

systemctl enable manticore-indexer@idx1.timer

4. Start the timer:

systemctl start manticore-indexer@idx1.timer

5. Repeat steps 2-4 for any additional timers.
