SET [GLOBAL] server_variable_name = value
SET [INDEX index_name] GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])
SET NAMES value [COLLATE value]
SET @@dummy_variable = ignored_value
SET statement modifies a variable value. The variable names are case-insensitive. No variable value changes survive server restart.
SET NAMES statement and SET @@variable_name syntax both do nothing. They were implemented to maintain compatibility with 3rd party MySQL client libraries, connectors, and frameworks that may need to run these statements when connecting.
There are the following classes of variables:
- per-session server variable:
set var_name = value
- global server variable:
set global var_name = value
- global user variable:
set global @var_name = (value)
- global distributed variable:
set index dist_index_name global @var_name = (value)
Global user variables are shared between concurrent sessions. Currently, the only supported value type is a list of BIGINTs, and these variables can only be used along with IN() for filtering purposes. The intended usage scenario is uploading huge lists of values to searchd (once) and reusing them (many times) later, saving on network overheads. A global user variable is either transferred to all agents of a distributed table or set locally in the case of a local table defined in a distributed table. Example:
// in session 1
mysql> SET GLOBAL @myfilter=(2,3,5,7,11,13);
Query OK, 0 rows affected (0.00 sec)
// later in session 2
mysql> SELECT * FROM test1 WHERE group_id IN @myfilter;
+------+--------+----------+------------+-----------------+------+
| id | weight | group_id | date_added | title | tag |
+------+--------+----------+------------+-----------------+------+
| 3 | 1 | 2 | 1299338153 | another doc | 15 |
| 4 | 1 | 2 | 1299338153 | doc number four | 7,40 |
+------+--------+----------+------------+-----------------+------+
2 rows in set (0.02 sec)
Per-session and global server variables affect certain server settings in the respective scope. Known per-session server variables are:
AUTOCOMMIT = {0 | 1}
Whether any data modification statement should be implicitly wrapped by BEGIN and COMMIT.
COLLATION_CONNECTION = collation_name
Selects the collation to be used for ORDER BY or GROUP BY on string values in the subsequent queries. Refer to Collations for a list of known collation names.
CHARACTER_SET_RESULTS = charset_name
Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to automatically enforce a charset when connecting to a Manticore server.
SQL_AUTO_IS_NULL = value
Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to automatically enforce a charset when connecting to a Manticore server.
SQL_MODE = <value>
Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to automatically enforce a charset when connecting to a Manticore server.
WAIT_TIMEOUT = <value>
Sets the connection timeout, either per session or globally. The global value can only be set on a VIP connection.
PROFILING = {0 | 1}
Enables query profiling in the current session. Defaults to 0. See also SHOW PROFILE.
MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE>
Redefines max_threads_per_query at runtime. The per-session variable influences only the queries run in the same session (connection), i.e. until disconnect. A value of 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one has a higher priority.
ro = {1 | 0}
Switches the session to read-only mode or back. In SHOW VARIABLES output the variable is displayed with the name session_read_only.
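For instance, a minimal sketch of per-session profiling (the table and query text are just examples reused from above):
mysql> SET profiling=1;
mysql> SELECT * FROM test1 WHERE MATCH('test');
mysql> SHOW PROFILE;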
Known global server variables are:
QUERY_LOG_FORMAT = {plain | sphinxql}
Changes the current log format.
LOG_LEVEL = {info | debug | replication | debugv | debugvv}
Changes the current log verboseness level.
QCACHE_MAX_BYTES = <value>
Changes the query_cache RAM use limit to a given value.
QCACHE_THRESH_MSEC = <value>
Changes the query_cache minimum wall time threshold to a given value.
QCACHE_TTL_SEC = <value>
Changes the query_cache TTL for a cached result to a given value.
MAINTENANCE = {0 | 1}
When set to 1, puts the server in maintenance mode. Only clients with VIP connections can execute queries in this mode. All new non-VIP incoming connections are refused. Existing connections are left intact.
GROUPING_IN_UTC = {0 | 1}
When set to 1, causes timed grouping functions (day(), month(), year(), yearmonth(), yearmonthday()) to be calculated in UTC. Read the doc for the grouping_in_utc config param for more details.
QUERY_LOG_MIN_MSEC = <value>
Changes the query_log_min_msec searchd setting. In this case it expects the value strictly in milliseconds and, unlike the config option, doesn't parse time suffixes. Warning: this is a very specific and 'hard' variable; filtered-out messages are simply dropped and not written to the log at all. It is often better to filter your log with something like 'grep' instead, so you at least keep the full original log as a backup.
LOG_DEBUG_FILTER = <string value>
Filters out redundant log messages. If the value is set, then all logs with level > INFO (i.e., DEBUG, DEBUGV, etc.) will be compared with the string and output only if they start with the given value.
MAX_THREADS_PER_QUERY = <POSITIVE_INT_VALUE>
Redefines max_threads_per_query at runtime. As a global variable it changes the behaviour for all sessions. A value of 0 means 'no limit'. If both the per-session and the global variables are set, the per-session one has a higher priority.
NET_WAIT = {-1 | 0 | POSITIVE_INT_VALUE}
Changes the net_wait_tm searchd setting.
IOSTATS = {0 | 1}
Enables or disables reporting of I/O operations (except for attributes) in the query log.
CPUSTATS = {1 | 0}
Turns on/off CPU time tracking.
COREDUMP = {1 | 0}
Turns on/off saving a core file or a minidump of the server on crash. More details here.
PSEUDO_SHARDING = {1 | 0}
Turns on/off search pseudo-sharding.
SECONDARY_INDEXES = {1 | 0}
Turns on/off secondary indexes for search queries.
ES_COMPAT = {on | off | dashboards}
When set to on (default), Elasticsearch-like write requests are supported; off disables the support; dashboards enables the support and also enables support for requests from Kibana (this functionality is experimental).
Examples:
mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL query_log_format=sphinxql;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)
mysql> SET INDEX users GLOBAL @banned=(1,2,3);
Query OK, 0 rows affected (0.01 sec)
To make user variables persistent, make sure sphinxql_state is enabled.
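For reference, a minimal sketch of the relevant searchd config section (the state file path here is only an assumption; adjust it to your setup):
searchd {
    ...
    sphinxql_state = /var/lib/manticore/sphinxql_state.sql
}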
▪️ Extensions
SphinxSE is a MySQL storage engine which can be compiled into MySQL/MariaDB server using its pluggable architecture.
Despite the name, SphinxSE does not actually store any data itself. It is actually a built-in client which allows the MySQL server to talk to searchd, run search queries, and obtain search results. All indexing and searching happen outside MySQL.
Obvious SphinxSE applications include:
- easier porting of MySQL FTS applications to Manticore;
- allowing Manticore use with programming languages for which native APIs are not available yet;
- optimizations when additional Manticore result set processing on the MySQL side is required (e.g. JOINs with original document tables, additional MySQL-side filtering, etc.).
You will need to obtain a copy of the MySQL sources, prepare them, and then recompile the MySQL binary. MySQL sources (mysql-5.x.yy.tar.gz) can be obtained from the http://dev.mysql.com website.
- copy the sphinx.5.0.yy.diff patch file into the MySQL sources directory and run
$ patch -p1 < sphinx.5.0.yy.diff
If there's no .diff file for the exact version you need to build, try applying a .diff with the closest version numbers. It is important that the patch applies with no rejects.
- in MySQL sources directory, run
$ sh BUILD/autorun.sh
- in the MySQL sources directory, create a sql/sphinx directory and copy all files from the mysqlse directory of the Manticore sources into it. Example:
$ cp -R /root/builds/sphinx-0.9.7/mysqlse /root/builds/mysql-5.0.24/sql/sphinx
- configure MySQL and enable the new engine:
$ ./configure --with-sphinx-storage-engine
- build and install MySQL:
$ make
$ make install
- for MySQL 5.1.x, in the MySQL sources directory create a storage/sphinx directory instead and copy all files from the mysqlse directory of the Manticore sources into it. Example:
$ cp -R /root/builds/sphinx-0.9.7/mysqlse /root/builds/mysql-5.1.14/storage/sphinx
- in MySQL sources directory, run
$ sh BUILD/autorun.sh
- configure MySQL and enable Manticore engine:
$ ./configure --with-plugins=sphinx
- build and install MySQL:
$ make
$ make install
To check whether SphinxSE has been successfully compiled into MySQL, launch the newly built server, run the mysql client, and issue the SHOW ENGINES query. You should see a list of all available engines. Manticore should be present and the "Support" column should contain "YES":
mysql> show engines;
+------------+----------+-------------------------------------------------------------+
| Engine | Support | Comment |
+------------+----------+-------------------------------------------------------------+
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance |
...
| SPHINX | YES | Manticore storage engine |
...
+------------+----------+-------------------------------------------------------------+
13 rows in set (0.00 sec)
To search via SphinxSE, you would need to create a special ENGINE=SPHINX "search table", and then SELECT from it with the full-text query put into the WHERE clause for the query column.
Let's begin with an example create statement and search query:
CREATE TABLE t1
(
id INTEGER UNSIGNED NOT NULL,
weight INTEGER NOT NULL,
query VARCHAR(3072) NOT NULL,
group_id INTEGER,
INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";
SELECT * FROM t1 WHERE query='test it;mode=any';
The first 3 columns of the search table must be INTEGER UNSIGNED or BIGINT for the 1st column (document id), INTEGER or BIGINT for the 2nd column (match weight), and VARCHAR or TEXT for the 3rd column (your query), respectively. This mapping is fixed; you cannot omit any of these three required columns, move them around, or change their types. Also, the query column must be indexed; all the others must be kept unindexed. Column names are ignored, so you can use arbitrary ones.
Additional columns must be either INTEGER, TIMESTAMP, BIGINT, VARCHAR, or FLOAT. They will be bound to the attributes provided in the Manticore result set by name, so their names must match the attribute names specified in sphinx.conf. If there's no such attribute name in the Manticore search results, the column will have NULL values.
Special "virtual" attributes names can also be bound to SphinxSE columns. _sph_
needs to be used instead of @
for that. For instance, to obtain the values of @groupby
, @count
, or @distinct
virtualattributes, use _sph_groupby
, _sph_count
or _sph_distinct
column names, respectively.
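As a rough sketch (the table and column set here are illustrative, not part of the examples above), a search table capturing the group-by virtual attributes could look like this:
CREATE TABLE t_groups
(
    id INTEGER UNSIGNED NOT NULL,
    weight INTEGER NOT NULL,
    query VARCHAR(3072) NOT NULL,
    _sph_groupby INTEGER,   -- receives the @groupby value
    _sph_count INTEGER,     -- receives the @count value
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";
SELECT * FROM t_groups WHERE query='test;groupby=attr:group_id;';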
The CONNECTION string parameter is used to specify the Manticore host, port, and table. If no connection string is specified in CREATE TABLE, table name * (i.e. search all tables) and localhost:9312 are assumed. The connection string syntax is as follows:
CONNECTION="sphinx://HOST:PORT/TABLENAME"
You can change the default connection string later:
mysql> ALTER TABLE t1 CONNECTION="sphinx://NEWHOST:NEWPORT/NEWTABLENAME";
You can also override all these parameters per-query.
As seen in the example, both the query text and the search options should be put into the WHERE clause on the search query column (i.e. the 3rd column); the options are separated by semicolons, and option names are separated from their values by an equals sign. Any number of options can be specified. Available options are:
- query - query text;
- mode - matching mode. Must be one of "all", "any", "phrase", "boolean", or "extended". Default is "all";
- sort - match sorting mode. Must be one of "relevance", "attr_desc", "attr_asc", "time_segments", or "extended". In all modes besides "relevance" attribute name (or sorting clause for "extended") is also required after a colon:
... WHERE query='test;sort=attr_asc:group_id';
... WHERE query='test;sort=extended:@weight desc, group_id asc';
- offset - offset into result set, default is 0;
- limit - amount of matches to retrieve from result set, default is 20;
- index - names of the tables to search:
... WHERE query='test;index=test1;';
... WHERE query='test;index=test1,test2,test3;';
- minid, maxid - min and max document ID to match;
- weights - comma-separated list of weights to be assigned to Manticore full-text fields:
... WHERE query='test;weights=1,2,3;';
- filter, !filter - comma-separated attribute name and a set of values to match:
# only include groups 1, 5 and 19
... WHERE query='test;filter=group_id,1,5,19;';
# exclude groups 3 and 11
... WHERE query='test;!filter=group_id,3,11;';
- range, !range - comma-separated (integer or bigint) Manticore attribute name, and min and max values to match:
# include groups from 3 to 7, inclusive
... WHERE query='test;range=group_id,3,7;';
# exclude groups from 5 to 25
... WHERE query='test;!range=group_id,5,25;';
- floatrange, !floatrange - comma-separated (floating point) Manticore attribute name, and min and max values to match:
# filter by a float size
... WHERE query='test;floatrange=size,2,3;';
# pick all results within 1000 meter from geoanchor
... WHERE query='test;floatrange=@geodist,0,1000;';
- maxmatches - per-query max matches value, as in max_matches search option:
... WHERE query='test;maxmatches=2000;';
- cutoff - maximum allowed matches, as in cutoff search option:
... WHERE query='test;cutoff=10000;';
- maxquerytime - maximum allowed query time (in milliseconds), as in max_query_time search option:
... WHERE query='test;maxquerytime=1000;';
- groupby - group-by function and attribute. Read this about grouping search results:
... WHERE query='test;groupby=day:published_ts;';
... WHERE query='test;groupby=attr:group_id;';
- groupsort - group-by sorting clause:
... WHERE query='test;groupsort=@count desc;';
- distinct - an attribute to compute COUNT(DISTINCT) for when doing group-by:
... WHERE query='test;groupby=attr:country_id;distinct=site_id';
- indexweights - comma-separated list of table names and weights to use when searching through several tables:
... WHERE query='test;indexweights=tbl_exact,2,tbl_stemmed,1;';
- fieldweights - comma-separated list of per-field weights that can be used by the ranker:
... WHERE query='test;fieldweights=title,10,abstract,3,content,1;';
- comment - a string to mark this query in query log, as in comment search option:
... WHERE query='test;comment=marker001;';
- select - a string with expressions to compute:
... WHERE query='test;select=2*a+3*b as myexpr;';
- host, port - remote searchd host name and TCP port, respectively:
... WHERE query='test;host=sphinx-test.loc;port=7312;';
- ranker - a ranking function to use with "extended" matching mode, as in ranker. Known values are "proximity_bm25", "bm25", "none", "wordcount", "proximity", "matchany", "fieldmask", "sph04", "expr:EXPRESSION" syntax to support expression-based ranker (where EXPRESSION should be replaced with your specific ranking formula), and "export:EXPRESSION":
... WHERE query='test;mode=extended;ranker=bm25;';
... WHERE query='test;mode=extended;ranker=expr:sum(lcs);';
The "export" ranker works exactly like ranker=expr, but it stores the per-document factor values, while ranker=expr discards them after computing the final
WEIGHT()
value. Note that ranker=export is meant to be used but rarely, only to train a ML (machine learning) function or to define your own ranking function by hand, and never in actual production. When using this ranker, you'll probably want to examine the output of theRANKFACTORS()
function that produces a string with all the field level factors for each document.
SELECT *, WEIGHT(), RANKFACTORS()
FROM myindex
WHERE MATCH('dog')
OPTION ranker=export('100*bm25');
*************************** 1. row ***************************
id: 555617
published: 1110067331
channel_id: 1059819
title: 7
content: 428
weight(): 69900
rankfactors(): bm25=699, bm25a=0.666478, field_mask=2,
doc_word_count=1, field1=(lcs=1, hit_count=4, word_count=1,
tf_idf=1.038127, min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532,
min_hit_pos=120, min_best_span_pos=120, exact_hit=0,
max_window_hits=1), word1=(tf=4, idf=0.259532)
*************************** 2. row ***************************
id: 555313
published: 1108438365
channel_id: 1058561
title: 8
content: 249
weight(): 68500
rankfactors(): bm25=685, bm25a=0.675213, field_mask=3,
doc_word_count=1, field0=(lcs=1, hit_count=1, word_count=1,
tf_idf=0.259532, min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532,
min_hit_pos=8, min_best_span_pos=8, exact_hit=0, max_window_hits=1),
field1=(lcs=1, hit_count=2, word_count=1, tf_idf=0.519063,
min_idf=0.259532, max_idf=0.259532, sum_idf=0.259532, min_hit_pos=36,
min_best_span_pos=36, exact_hit=0, max_window_hits=1), word1=(tf=3,
idf=0.259532)
- geoanchor - geodistance anchor. Read more about Geo-search in this section. Takes 4 parameters which are latitude and longitude attribute names, and anchor point coordinates respectively:
... WHERE query='test;geoanchor=latattr,lonattr,0.123,0.456';
One very important note: it is much more efficient to let Manticore perform sorting, filtering, and slicing of the result set than to raise the max matches count and use WHERE, ORDER BY, and LIMIT clauses on the MySQL side. This is for two reasons. First, Manticore does a number of optimizations and performs better than MySQL on these tasks. Second, less data would need to be packed by searchd, transferred, and unpacked by SphinxSE.
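For example, rather than inflating maxmatches and post-processing in MySQL, the same result can usually be obtained by pushing the work into the query options (the attribute name below is illustrative):
# less efficient: fetch a large result set and sort/slice it on the MySQL side
SELECT * FROM t1 WHERE query='test;maxmatches=10000;' ORDER BY group_id LIMIT 20;
# more efficient: let Manticore sort and slice
SELECT * FROM t1 WHERE query='test;sort=attr_asc:group_id;limit=20;';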
Additional query info besides the result set can be retrieved with the SHOW ENGINE SPHINX STATUS statement:
mysql> SHOW ENGINE SPHINX STATUS;
+--------+-------+-------------------------------------------------+
| Type | Name | Status |
+--------+-------+-------------------------------------------------+
| SPHINX | stats | total: 25, total found: 25, time: 126, words: 2 |
| SPHINX | words | sphinx:591:1256 soft:11076:15945 |
+--------+-------+-------------------------------------------------+
2 rows in set (0.00 sec)
This information can also be accessed through status variables. Note that this method does not require super-user privileges.
mysql> SHOW STATUS LIKE 'sphinx_%';
+--------------------+----------------------------------+
| Variable_name | Value |
+--------------------+----------------------------------+
| sphinx_total | 25 |
| sphinx_total_found | 25 |
| sphinx_time | 126 |
| sphinx_word_count | 2 |
| sphinx_words | sphinx:591:1256 soft:11076:15945 |
+--------------------+----------------------------------+
5 rows in set (0.00 sec)
You can perform JOINs between a SphinxSE search table and tables using other engines. Here's an example with "documents" from example.sql:
mysql> SELECT content, date_added FROM test.documents docs
-> JOIN t1 ON (docs.id=t1.id)
-> WHERE query="one document;mode=any";
+-------------------------------------+---------------------+
| content                             | docdate             |
+-------------------------------------+---------------------+
| this is my test document number two | 2006-06-17 14:04:28 |
| this is my test document number one | 2006-06-17 14:04:28 |
+-------------------------------------+---------------------+
2 rows in set (0.00 sec)
mysql> SHOW ENGINE SPHINX STATUS;
+--------+-------+---------------------------------------------+
| Type | Name | Status |
+--------+-------+---------------------------------------------+
| SPHINX | stats | total: 2, total found: 2, time: 0, words: 2 |
| SPHINX | words | one:1:2 document:2:2 |
+--------+-------+---------------------------------------------+
2 rows in set (0.00 sec)
SphinxSE also includes a UDF function that lets you create snippets through MySQL. The functionality is similar to HIGHLIGHT(), but accessible through MySQL+SphinxSE.
The binary that provides the UDF is named sphinx.so and should be automatically built and installed to the proper location along with SphinxSE itself. If it does not get installed automatically for some reason, look for sphinx.so in the build directory and copy it to the plugins directory of your MySQL instance. After that, register the UDF using the following statement:
CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'sphinx.so';
The function name must be sphinx_snippets; you cannot use an arbitrary name. Function arguments are as follows:
Prototype: function sphinx_snippets ( document, table, words [, options] );
The document and words arguments can be either strings or table columns. Options must be specified like this: 'value' AS option_name. For a list of supported options, refer to the Highlighting section. The only UDF-specific additional option is named sphinx and lets you specify the searchd location (host and port).
Usage examples:
SELECT sphinx_snippets('hello world doc', 'main', 'world',
'sphinx://192.168.1.1/' AS sphinx, true AS exact_phrase,
'[**]' AS before_match, '[/**]' AS after_match)
FROM documents;
SELECT title, sphinx_snippets(text, 'index', 'mysql php') AS text
FROM sphinx, documents
WHERE query='mysql php' AND sphinx.id=documents.id;