Query logging

Query logging can be enabled by setting the query_log directive in the searchd section of the configuration file.

Queries can also be sent to syslog by setting syslog instead of a file path. In this case, all search queries will be sent to the syslog daemon with LOG_INFO priority, prefixed with [query] instead of a timestamp. Only the plain log format is supported for syslog.
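For example, a minimal sketch of sending queries to syslog (the surrounding settings are illustrative):

```ini
searchd {
...
    query_log = syslog
    query_log_format = plain # plain is the only format supported with syslog
...
}
```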

Config:
searchd {
...
    query_log = /var/log/query.log
    query_log_format = sphinxql # default
...
}

Logging format

Two query log formats are supported:

  • sphinxql (default): Logs in SQL format. It also provides an easy way to replay logged queries.
  • plain: Logs full-text queries in a simple text format. Recommended if your queries are primarily full-text, or if you don't care about the non-full-text components of your queries, such as filtering by attributes, sorting, grouping, etc. Queries logged in the plain format cannot be replayed.

To switch between the formats, you can use the searchd setting query_log_format.

SQL log format

The SQL log format is the default. In this mode, Manticore logs all successful and unsuccessful select queries. Requests sent as SQL or via the binary API are logged in the SQL format, while JSON queries are logged as-is. This type of logging works only with plain log files and does not support logging to syslog.

Config:
query_log_format = sphinxql # default

The features of the Manticore SQL log format compared to the plain format include:

  • Full statement data is logged where possible.
  • Errors and warnings are logged.
  • The query log can be replayed.
  • Additional performance counters (currently, per-agent distributed query times) are logged.
  • Each log entry is a valid Manticore SQL/JSON statement that reconstructs the full request, except if the logged request is too large and needs to be shortened for performance reasons.
  • JSON requests and additional messages, counters, etc., are logged as comments.

Example:
/* Sun Apr 28 12:38:02.808 2024 conn 2 (127.0.0.1:53228) real 0.000 wall 0.000 found 0 */ SELECT * FROM test WHERE MATCH('test') OPTION ranker=proximity;
/* Sun Apr 28 12:38:05.585 2024 conn 2 (127.0.0.1:53228) real 0.001 wall 0.001 found 0 */ SELECT * FROM test WHERE MATCH('test') GROUP BY channel_id OPTION ranker=proximity;
/* Sun Apr 28 12:40:57.366 2024 conn 4 (127.0.0.1:53256) real 0.000 wall 0.000 found 0 */  /*{
    "table" : "test",
    "query":
    {
        "match":
        {
            "*" : "test"
        }
    },
    "_source": ["f"],
    "limit": 30
} */

Plain log format

With the plain log format, Manticore logs all successfully executed search queries in a simple text format. Non-full-text parts of queries are not logged. JSON queries are logged flattened to a single line.

Config:
query_log_format = plain

The log format is as follows:

[query-date] real-time wall-time [match-mode/filters-count/sort-mode total-matches (offset,limit) @groupby-attr] [table-name] {perf-stats} query

where:

  • real-time is the time from the start to the finish of the query.
  • wall-time is similar to real-time, but excludes time spent waiting for agents and merging result sets from them.
  • perf-stats includes CPU/IO stats when Manticore is started with --cpustats (or it was enabled via SET GLOBAL cpustats=1) and/or --iostats (or it was enabled via SET GLOBAL iostats=1):
    • ios is the number of file I/O operations performed;
    • kb is the amount of data in kilobytes read from the table files;
    • ms is the time spent on I/O operations;
    • cpums is the time in milliseconds spent on CPU processing the query.
  • match-mode can have one of the following values:
    • "all" for SPH_MATCH_ALL mode;
    • "any" for SPH_MATCH_ANY mode;
    • "phr" for SPH_MATCH_PHRASE mode;
    • "bool" for SPH_MATCH_BOOLEAN mode;
    • "ext" for SPH_MATCH_EXTENDED mode;
    • "ext2" for SPH_MATCH_EXTENDED2 mode;
    • "scan" if the full scan mode was used, either by being specified with SPH_MATCH_FULLSCAN or if the query was empty.
  • sort-mode can have one of the following values:
    • "rel" for SPH_SORT_RELEVANCE mode;
    • "attr-" for SPH_SORT_ATTR_DESC mode;
    • "attr+" for SPH_SORT_ATTR_ASC mode;
    • "tsegs" for SPH_SORT_TIME_SEGMENTS mode;
    • "ext" for SPH_SORT_EXTENDED mode.

Note: the SPH_* modes are specific to the legacy Sphinx interface. The SQL and JSON interfaces will, in most cases, log ext2 as the match-mode and ext or rel as the sort-mode.
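As mentioned above, the perf-stats counters can also be enabled at runtime without restarting the daemon:

```sql
SET GLOBAL cpustats = 1;
SET GLOBAL iostats = 1;
```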

Example:
[Fri Jun 29 21:17:58 2021] 0.004 sec [all/0/rel 35254 (0,20)] [lj] [ios=6 kb=111.1 ms=0.5] test
[Fri Jun 29 21:17:58 2021] 0.004 sec [all/0/rel 35254 (0,20)] [lj] [ios=6 kb=111.1 ms=0.5 cpums=0.3] test
[Sun Apr 28 15:09:38.712 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,20)] [test] test
[Sun Apr 28 15:09:44.974 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,20) @channel_id] [test] test
[Sun Apr 28 15:24:32.975 2024] 0.000 sec 0.000 sec [ext2/0/ext 0 (0,30)] [test] {     "table" : "test",     "query":     {         "match":         {             "*" : "test"         }     },     "_source": ["f"],     "limit": 30 }
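For reference, a minimal Python sketch (my own, not part of Manticore) for extracting the fields from a plain-format log line; the field names are illustrative:

```python
import re

# One plain-format log line:
# [query-date] real-time wall-time [match-mode/filters/sort total (offset,limit) @groupby] [table] {perf-stats} query
# Note: wall-time, @groupby, and the perf-stats block are all optional.
PLAIN_LOG = re.compile(
    r"^\[(?P<date>[^\]]+)\]\s+"
    r"(?P<real>[\d.]+) sec\s+"
    r"(?:(?P<wall>[\d.]+) sec\s+)?"      # wall-time is absent in older log lines
    r"\[(?P<mode>\w+)/(?P<filters>\d+)/(?P<sort>[\w+-]+)\s+"
    r"(?P<total>\d+)\s+\((?P<offset>\d+),(?P<limit>\d+)\)"
    r"(?:\s+@(?P<groupby>\w+))?\]\s+"
    r"\[(?P<table>[^\]]+)\]\s+"
    r"(?:\[(?P<perf>[^\]]*)\]\s+)?"      # optional ios/kb/ms/cpums counters
    r"(?P<query>.*)$"
)

def parse_plain_log_line(line: str) -> dict:
    """Return the named fields of one plain-format query log line."""
    m = PLAIN_LOG.match(line)
    if not m:
        raise ValueError("unrecognized log line")
    return m.groupdict()
```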

Logging only slow queries

By default, all queries are logged. If you want to log only queries with execution times exceeding a specified limit, the query_log_min_msec directive can be used.

The expected unit of measurement is milliseconds, but time suffix expressions can also be used.

Config:
searchd {
...
    query_log = /var/log/query.log
    query_log_min_msec  = 1000
    # query_log_min_msec  = 1s
...
}

Log file permission mode

By default, the searchd and query log files are created with permission 600, so only the user under which Manticore is running and root can read the log files. The query_log_mode option allows setting a different permission. This can be helpful for allowing other users to read the log files (for example, monitoring solutions running under non-root users).

Config:
searchd {
...
    query_log = /var/log/query.log
    query_log_mode = 666
...
}

Server logging

By default, the Manticore search daemon logs all runtime events to a searchd.log file in the directory from which searchd was started. On Linux, the default location is /var/log/manticore/searchd.log.

The log file path/name can be overridden by setting log in the searchd section of the configuration file.

searchd {
...
    log = /custom/path/to/searchd.log
...
}
  • You can also use syslog as the file name. In this case, events will be sent to your server's syslog daemon.
  • In some cases, you might want to use /dev/stdout as the file name. In this case, on Linux, Manticore will simply output the events. This can be useful in Docker/Kubernetes environments.
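For instance, a sketch of the Docker-friendly variant:

```ini
searchd {
...
    log = /dev/stdout
    # or: log = syslog
...
}
```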

Binary logging

Binary logging serves as a recovery mechanism for real-time table data. When binary logs are enabled, searchd records each transaction to the binlog file and utilizes it for recovery following an unclean shutdown. During a clean shutdown, RAM chunks are saved to disk, and all binlog files are subsequently deleted.

Enabling and disabling binary logging

By default, binary logging is enabled to safeguard data integrity. On Linux systems, the default location for binlog.* files in Plain mode is /var/lib/manticore/data/. In RT mode, binary logs are stored in the <data_dir>/binlog/ folder, unless specified otherwise.

Global binary logging configuration

To disable binary logging globally, set binlog_path to an empty value in the searchd configuration. Disabling binary logging requires a restart of the daemon and puts data at risk if the system shuts down unexpectedly.

Example:
searchd {
...
    binlog_path = # disable logging
...

You can use the following directive to set a custom path:

Example:
searchd {
...
    binlog_path = /var/data
...
}

Per-table binary logging configuration

For more granular control, binary logging can be disabled at the table level for real-time tables by setting the binlog table parameter to 0. This option is not available for percolate tables.

Example:
create table a (id bigint, s string attribute) binlog='0';

For existing RT tables, binary logging can also be disabled by modifying the binlog parameter.

Example:
alter table FOO binlog='0';

If binary logging was previously disabled, it can be re-enabled by setting the binlog parameter back to 1:

Example:
alter table FOO binlog='1';

Important considerations:

  • Dependency on global settings: per-table binary logging settings only take effect if binary logging is globally enabled in the searchd configuration (binlog_path must not be empty).
  • Binary logging status and transaction ID insights: Modifying the binary logging status of a table forces an immediate flush of the table. When binary logging is turned off for a table, its transaction ID (TID) becomes -1, indicating that binary logging is not active and no changes are being tracked. When binary logging is turned on, the TID becomes a non-negative number (zero or higher), indicating that the table's changes are being recorded. You can check the TID with the command SHOW TABLE <name> STATUS.
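A sketch of toggling per-table binary logging and checking the TID (the table name products is illustrative):

```sql
ALTER TABLE products binlog='0';
SHOW TABLE products STATUS; -- TID is -1: binary logging inactive
ALTER TABLE products binlog='1';
SHOW TABLE products STATUS; -- TID is >= 0: changes are being recorded
```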

Operations

When binary logging is turned on, every change made to an RT table is saved to a log file. If the system shuts down unexpectedly, these logs are used automatically when the system starts again to bring back all the changes that were logged.

Log size

During normal operations, when the amount of data logged reaches a certain limit (set by binlog_max_log_size), a new log file starts. Old log files are kept until all changes in them are completely processed and saved to disk as a disk chunk. If this limit is set to 0, the log files are kept until the system is properly shut down. By default, there's no limit to how large these files can grow.

Example:
searchd {
...
    binlog_max_log_size = 16M
...
}

Log files

Each binlog file is named with a zero-padded number, like binlog.0000, binlog.0001, etc., typically showing four digits. You can change how many digits the number has with the setting binlog_filename_digits. If you have more binlog files than the number of digits can accommodate, the number of digits will be automatically increased to fit all files.

Important: To change the number of digits, you must first save all table data and properly shut down the system. Then, delete the old log files and restart the system.

Example:
searchd {
...
    binlog_filename_digits = 6
...
}

Binary logging strategies

You can choose between two ways to manage binary log files, which can be set with the binlog_common directive:

  • Separate file for each table (default, 0): Each table saves its changes in its own log file. This setup is good if you have many tables that get updated at different times. It allows tables to be updated without waiting for others. Also, if there is a problem with one table's log file, it does not affect the others.
  • Single file for all tables (1): All tables use the same binary log file. This method makes it easier to handle files because there are fewer of them. However, this could keep files longer than needed if one table still needs to save its updates. This setting might also slow things down if many tables need to update at the same time because all changes have to wait to be written to one file.

Config:
searchd {
...
    binlog_common = 1
...
}

Binary flushing strategies

There are four different binlog flushing strategies, controlled by the binlog_flush directive:

  • 0 - Data is flushed to the binlog every second, and a sync to disk is initiated right after each flush. This is the fastest mode, but if the server or the machine crashes, some recently written data that has not yet been synced may be lost.
  • 1 - Data is written to the binlog and synced immediately after each transaction. This is the safest mode, as every change is persisted immediately, but it slows down writes.
  • 2 - Data is written after each transaction, and a sync is initiated every second. This offers a balance: data is written regularly and quickly, but if the machine fails, data that has not yet been synced may be lost. Also, a sync may take longer than one second, depending on the disk.
  • 3 - Similar to 2, but it additionally ensures that a binlog file is synced before it is closed for exceeding binlog_max_log_size.

The default mode is 2, which writes data after each transaction and starts syncing it every second, balancing speed and safety.

Example:
searchd {
...
    binlog_flush = 1 # ultimate safety, low write speed
...
}

Recovery

During recovery after an unclean shutdown, binlogs are replayed, and all logged transactions since the last good on-disk state are restored. Transactions are checksummed, so in case of binlog file corruption, garbage data will not be replayed; such a broken transaction will be detected and will stop the replay.

Flushing RT RAM chunks

Intensive updates to a small RT table that fully fits into a RAM chunk can result in an ever-growing binlog that can never be unlinked until a clean shutdown. Binlogs essentially serve as append-only deltas against the last known good saved state on disk, and they cannot be unlinked unless the RAM chunk is saved. An ever-growing binlog is not ideal for disk usage and crash recovery time. To address this issue, you can configure searchd to perform periodic RAM chunk flushes using the rt_flush_period directive. With periodic flushes enabled, searchd will maintain a separate thread that checks whether RT table RAM chunks need to be written back to disk. Once this occurs, the respective binlogs can be (and are) safely unlinked.

The default RT flush period is set to 10 hours.

Example:
searchd {
...
    rt_flush_period = 3600 # 1 hour
...
}

It's important to note that rt_flush_period only controls the frequency at which checks occur. There are no guarantees that a specific RAM chunk will be saved. For example, it doesn't make sense to regularly re-save a large RAM chunk that only receives a few rows worth of updates. Manticore automatically determines whether to perform the flush using a few heuristics.