SET [GLOBAL] server_variable_name = value
SET [INDEX index_name] GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])
SET NAMES value [COLLATE value]
SET @@dummy_variable = ignored_value
The SET statement in Manticore Search allows you to modify variable values. Variable names are case-insensitive, and no variable value changes will persist after a server restart.
Manticore Search supports the SET NAMES statement and SET @@variable_name syntax for compatibility with third-party MySQL client libraries, connectors, and frameworks that may require running these statements when connecting. However, these statements do not have any effect on Manticore Search itself.
There are four classes of variables in Manticore Search:
- Per-session server variable:
set var_name = value
- Global server variable:
set global var_name = value
- Global user variable:
set global @var_name = (value)
- Global distributed variable:
set index dist_index_name global @var_name = (value)
Global user variables are shared between concurrent sessions. The only supported value type is a list of BIGINTs, and these variables can be used with the IN() operator for filtering purposes. The primary use case for this feature is to upload large lists of values to searchd once and reuse them multiple times later, reducing network overhead. Global user variables can be transferred to all agents of a distributed table or set locally in the case of a local table defined in a distributed table. Example:
// in session 1
mysql> SET GLOBAL @myfilter=(2,3,5,7,11,13);
Query OK, 0 rows affected (0.00 sec)
// later in session 2
mysql> SELECT * FROM test1 WHERE group_id IN @myfilter;
+------+--------+----------+------------+-----------------+------+
| id | weight | group_id | date_added | title | tag |
+------+--------+----------+------------+-----------------+------+
| 3 | 1 | 2 | 1299338153 | another doc | 15 |
| 4 | 1 | 2 | 1299338153 | doc number four | 7,40 |
+------+--------+----------+------------+-----------------+------+
2 rows in set (0.02 sec)
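Since the value list is typically generated by an application, the statement text can be assembled programmatically. The following is a minimal Python sketch, not part of Manticore itself: the helper name build_set_global is hypothetical, the signed 64-bit range check is an assumption about what BIGINT accepts, and rejecting an empty list is a choice of this sketch rather than documented behavior. It only builds the statement string; sending it through a MySQL-protocol client is left out.

```python
import re

# Assumption: BIGINT values fit in a signed 64-bit integer.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
# Conservative identifier check, because names are interpolated into SQL text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_set_global(name, values, index=None):
    """Return a SET [INDEX idx] GLOBAL @name=(v1,v2,...) statement (hypothetical helper)."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported variable name: {name!r}")
    if index is not None and not _IDENT.fullmatch(index):
        raise ValueError(f"unsupported table name: {index!r}")

    checked = []
    for value in values:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"only integers are allowed, got {value!r}")
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"value out of 64-bit range: {value}")
        checked.append(str(value))
    if not checked:
        raise ValueError("value list is empty")

    prefix = f"SET INDEX {index} GLOBAL" if index else "SET GLOBAL"
    return f"{prefix} @{name}=({','.join(checked)})"


if __name__ == "__main__":
    print(build_set_global("myfilter", [2, 3, 5, 7, 11, 13]))
    print(build_set_global("banned", [1, 2, 3], index="users"))
```

The resulting string would be executed once, after which any session can reference the variable in an IN clause as in the example above.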
Manticore Search supports per-session and global server variables that affect specific server settings in their respective scopes. Below is a list of known per-session and global server variables:
Known per-session server variables:
- AUTOCOMMIT = {0 | 1}: Determines whether data modification statements are implicitly wrapped in BEGIN and COMMIT.
- COLLATION_CONNECTION = collation_name: Selects the collation for ORDER BY or GROUP BY on string values in subsequent queries. Refer to Collations for a list of known collation names.
- WAIT_TIMEOUT/net_read_timeout = <value>: Sets the connection timeout, either per session or globally. The global value can only be set on a VIP connection.
- net_write_timeout = <value>: Tunes the network timeout for write operations, i.e., sending data. The global value can be changed only with VIP privileges.
- throttling_period = <INT_VALUE>: Interval (in milliseconds) at which the currently running query is rescheduled. A value of 0 disables throttling, meaning the query occupies CPU cores until it finishes; concurrent queries arriving from other connections at that time are allocated to free cores or suspended until a core is released. A negative value (-1) resets throttling to the compiled-in default (100ms), so the query is rescheduled every 100ms, giving concurrent queries a chance to execute. The global value (set via set global) can only be set on a VIP connection.
- thread_stack = <value>: Changes on the fly the default limit on the stack size provided to one task. Note that 'thread' here refers not to an OS thread, but to a userspace thread, also known as a coroutine. This can be useful if, for example, you load a percolate table with unexpectedly high requirements; in such cases, 'call pq' would fail with a message about insufficient stack size. Generally, you should stop the daemon, increase the value in the config, and then restart. However, you can also try a new value without restarting by setting it with this variable. The global value can also be changed online with set global thread_stack, but only from a VIP connection.
- optimize_by_id = {0 | 1}: Internal flag used in some debug commands.
- threads_ex (diagnostic): Forces Manticore to behave as if it were running on a CPU with the provided profile. As a short example, set threads_ex='4/2+6/3' means 'you have 4 free CPU cores, and when scheduling multiple queries they should be batched by 2; you also have 6 free CPU cores for pseudo-sharding, and parts should be batched by 3'. The option is diagnostic: it lets you see how a query would run on a configuration you don't have locally (for instance, a 128-core CPU) or, conversely, quickly limit the daemon to single-threaded behavior to locate a bottleneck or investigate a crash.
- PROFILING = {0 | 1}: Enables query profiling in the current session. Defaults to 0. See also show profile.
- MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE>: Redefines max_threads_per_query at runtime. As a per-session variable, it influences only the queries run in the same session (connection), i.e., until disconnect. Value 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one has a higher priority.
- ro = {1 | 0}: Switches the session to read-only mode or back. In show variables output, the variable is displayed under the name session_read_only.
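To make the threads_ex example easier to read, here is a small Python sketch that splits a profile string of the exact shape shown above ('4/2+6/3') into its four numbers. The function name and the dictionary keys are hypothetical, and the sketch assumes only this full two-part form; any other forms the daemon may accept are not handled.

```python
def parse_threads_ex(profile):
    """Split a 'cores/batch+cores/batch' string into named integers (illustrative only)."""
    parts = profile.split("+")
    if len(parts) != 2:
        raise ValueError(f"expected two '+'-separated parts, got {profile!r}")

    def parse_pair(text):
        # Each part has the form '<free cores>/<batch size>'.
        cores, batch = text.split("/")
        return int(cores), int(batch)

    query_cores, query_batch = parse_pair(parts[0])
    shard_cores, shard_batch = parse_pair(parts[1])
    return {
        "query_cores": query_cores,            # free cores for scheduling multiple queries
        "query_batch": query_batch,            # queries are batched by this number
        "pseudo_sharding_cores": shard_cores,  # free cores for pseudo-sharding
        "pseudo_sharding_batch": shard_batch,  # pseudo-sharding parts are batched by this number
    }


if __name__ == "__main__":
    print(parse_threads_ex("4/2+6/3"))
```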
Known global server variables are:
- QUERY_LOG_FORMAT = {plain | sphinxql}: Changes the current log format.
- LOG_LEVEL = {info | debug | replication | debugv | debugvv}: Changes the current log verbosity level.
- QCACHE_MAX_BYTES = <value>: Changes the query_cache RAM use limit to a given value.
- QCACHE_THRESH_MSEC = <value>: Changes the query_cache minimum wall time threshold to a given value.
- QCACHE_TTL_SEC = <value>: Changes the query_cache TTL for a cached result to a given value.
- MAINTENANCE = {0 | 1}: When set to 1, puts the server in maintenance mode. Only clients with VIP connections can execute queries in this mode. All new non-VIP incoming connections are refused. Existing connections are left intact.
- GROUPING_IN_UTC = {0 | 1}: When set to 1, causes timed grouping functions (day(), month(), year(), yearmonth(), yearmonthday()) to be calculated in UTC. See the documentation of the grouping_in_utc config parameter for more details.
- TIMEZONE = <value>: Specifies the timezone used by date/time-related functions. See the documentation of the timezone config parameter for more details.
- QUERY_LOG_MIN_MSEC = <value>: Changes the query_log_min_msec searchd setting. Here the value must be given in milliseconds exactly; time suffixes are not parsed as they are in the config. Warning: this is a very specific and 'hard' variable; filtered-out messages are simply dropped and never written to the log. It is usually better to filter your log with something like grep, so that you at least keep the full original log as a backup.
- LOG_DEBUG_FILTER = <string value>: Filters out redundant log messages. If the value is set, all log messages with level > INFO (i.e., DEBUG, DEBUGV, etc.) are compared with the string and output only if they start with the given value.
- MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE>: Redefines max_threads_per_query at runtime. As a global variable, it changes behavior for all sessions. Value 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one has a higher priority.
- NET_WAIT = {-1 | 0 | POSITIVE_INT_VALUE}: Changes the net_wait_tm searchd setting.
- IOSTATS = {0 | 1}: Enables or disables reporting of I/O operations (except for attributes) in the query log.
- CPUSTATS = {1 | 0}: Turns CPU time tracking on or off.
- COREDUMP = {1 | 0}: Turns on or off saving a core file or a minidump of the server on crash. More details here.
- AUTO_OPTIMIZE = {1 | 0}: Turns auto_optimize on or off.
- PSEUDO_SHARDING = {1 | 0}: Turns search pseudo-sharding on or off.
- SECONDARY_INDEXES = {1 | 0}: Turns secondary indexes for search queries on or off.
- ES_COMPAT = {on | off | dashboards}: When set to on (default), Elasticsearch-like write requests are supported; off disables the support; dashboards enables the support and also allows requests from Kibana (this functionality is experimental).
- RESET_NETWORK_TIMEOUT_ON_PACKET = {1 | 0}: Changes the reset_network_timeout_on_packet parameter. Only clients with VIP connections can change this variable.
- optimize_cutoff = <value>: Changes the value of the config's optimize_cutoff setting on the fly.
- accurate_aggregation: Sets the default value for the accurate_aggregation option of future queries.
- distinct_precision_threshold: Sets the default value for the distinct_precision_threshold option of future queries.
- expansion_merge_threshold_docs: Changes the value of the config's expansion_merge_threshold_docs setting on the fly.
- expansion_merge_threshold_hits: Changes the value of the config's expansion_merge_threshold_hits setting on the fly.
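The LOG_DEBUG_FILTER rule described above can be pictured with a short Python sketch. This is an illustration of the stated rule, not Manticore's implementation: the function name is hypothetical, the set of levels treated as 'above INFO' is an assumption taken from the LOG_LEVEL values listed earlier, and the prefix comparison is assumed to be case-sensitive.

```python
# Assumption: these LOG_LEVEL values count as "level > INFO".
VERBOSE_LEVELS = {"debug", "replication", "debugv", "debugvv"}


def passes_debug_filter(level, message, debug_filter):
    """Return True if a log message would be written under the described rule."""
    if level.lower() not in VERBOSE_LEVELS:
        # Messages at INFO level are not subject to the filter.
        return True
    if not debug_filter:
        # No filter value set: nothing is filtered out.
        return True
    # Verbose messages are kept only when they start with the filter string.
    return message.startswith(debug_filter)


if __name__ == "__main__":
    for level, text in [("info", "accepting connections"),
                        ("debug", "repl: node joined"),
                        ("debugv", "net: packet received")]:
        print(level, text, passes_debug_filter(level, text, "repl"))
```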
Examples:
mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL query_log_format=sphinxql;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)
mysql> SET INDEX users GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)
To make user variables persistent, make sure sphinxql_state is enabled.
⪢ Integration
Logstash is a log management tool that collects data from a variety of sources, transforms it on the fly, and sends it to your desired destination. It is often used as a data pipeline for Elasticsearch, an open-source analytics and search engine.
Manticore now supports the use of Logstash as a processing pipeline. This allows the collected and transformed data to be sent to Manticore just as it would be sent to Elasticsearch. Currently, versions 7.6-7.15 are supported.
Let’s examine a simple example of a Logstash config file used for indexing dpkg.log, a standard log file of the Debian package manager. The log itself has a simple structure, as shown below:
2023-05-31 10:42:55 status triggers-awaited ca-certificates-java:all 20190405ubuntu1.1
2023-05-31 10:42:55 trigproc libc-bin:amd64 2.31-0ubuntu9.9 <none>
2023-05-31 10:42:55 status half-configured libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 trigproc systemd:amd64 245.4-4ubuntu3.21 <none>
Here is an example Logstash configuration:
input {
  file {
    path => ["/var/log/dpkg.log"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
    mode => "read"
    exit_after_read => "true"
    file_completed_action => "log"
    file_completed_log_path => "/dev/null"
  }
}
output {
  elasticsearch {
    index => "dpkg_log"
    hosts => ["http://localhost:9308"]
    ilm_enabled => false
    manage_template => false
  }
}
One caveat before proceeding: Manticore does not support the Log Template Management and Index Lifecycle Management features of Elasticsearch. Because Logstash enables these features by default, they must be explicitly disabled in the config (manage_template => false and ilm_enabled => false in the example above). Additionally, the hosts option in the output section must point to Manticore’s HTTP listen port (localhost:9308 by default).
After adjusting the config as described, you can run Logstash, and the data from the dpkg log will be passed to Manticore and properly indexed.
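To give a rough idea of what 'sent to Manticore just like to Elasticsearch' means on the wire, the Python sketch below builds a request body in the Elasticsearch bulk format (one action line followed by one document line per event, newline-delimited JSON), which is what Logstash's elasticsearch output produces. The helper name is hypothetical, the documents carry only the message and path fields for brevity, and the assumption that such a body would be posted to a /_bulk endpoint on port 9308 is not confirmed by this page. Nothing is sent over the network here.

```python
import json


def build_bulk_payload(index, messages, path="/var/log/dpkg.log"):
    """Build an Elasticsearch-style bulk body for the given log lines (illustrative only)."""
    lines = []
    for message in messages:
        # Action line: index the next document into the given table.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Document line: a reduced version of the event Logstash would emit.
        lines.append(json.dumps({"message": message, "path": path}))
    # The bulk format is newline-delimited and each line ends with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = ["2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9"]
    print(build_bulk_payload("dpkg_log", sample), end="")
```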
Here is the resulting schema of the created table and an example of the inserted document:
mysql> DESCRIBE dpkg_log;
+------------------+--------+---------------------+
| Field | Type | Properties |
+------------------+--------+---------------------+
| id | bigint | |
| message | text | indexed stored |
| @version | text | indexed stored |
| @timestamp | text | indexed stored |
| path | text | indexed stored |
| host | text | indexed stored |
+------------------+--------+---------------------+
mysql> SELECT * FROM dpkg_log LIMIT 1\G
*************************** 1. row ***************************
id: 7280000849080746110
host: logstash-db848f65f-lnlf9
message: 2023-04-12 02:03:21 status unpacked libc-bin:amd64 2.31-0ubuntu9
path: /var/log/dpkg.log
@timestamp: 2023-06-16T09:23:57.405Z
@version: 1