The query string can include specific operators that define the conditions for how the words from the query string should be matched.
An implicit logical AND operator is always present, so "hello world" implies that both "hello" and "world" must be found in the matching document.
hello world
Note: There is no explicit AND
operator.
The logical OR operator |
has a higher precedence than AND, so looking for cat | dog | mouse
means looking for (cat | dog | mouse)
rather than (looking for cat) | dog | mouse
.
hello | world
Note: There is no operator OR
. Please use |
instead.
hello MAYBE world
The MAYBE
operator functions similarly to the |
operator, but it does not return documents that match only the right subtree expression.
hello -world
hello !world
The negation operator enforces a rule for a word to not exist.
Queries containing only negations are not supported by default. To enable, use the server option not_terms_only_allowed.
@title hello @body world
The field limit operator restricts subsequent searches to a specified field. By default, the query will fail with an error message if the given field name does not exist in the searched table. However, this behavior can be suppressed by specifying the @@relaxed
option at the beginning of the query:
@@relaxed @nosuchfield my query
This can be useful when searching through heterogeneous tables with different schemas.
Field position limits additionally constrain the search to the first N positions within a given field (or fields). For example, @body [50] hello
will not match documents where the keyword hello
appears at position 51 or later in the body.
@body[50] hello
Multiple-field search operator:
@(title,body) hello world
Ignore field search operator (ignores any matches of 'hello world' from the 'title' field):
@!title hello world
Ignore multiple-field search operator (if there are fields 'title', 'subject', and 'body', then @!(title)
is equivalent to @(subject,body)
):
@!(title,body) hello world
All-field search operator:
@* hello
"hello world"
The phrase operator mandates that the words be adjacent to each other.
The phrase search operator can include a match any term
modifier. Within the phrase operator, terms are positionally significant. When the 'match any term' modifier is employed, the positions of the subsequent terms in that phrase query will be shifted. As a result, the 'match any' modifier does not affect search performance.
"exact * phrase * * for terms"
"hello world"~10
Proximity distance is measured in words, accounting for word count, and applies to all words within quotes. For example, the query "cat dog mouse"~5
indicates that there must be a span of fewer than 8 words containing all 3 words. Therefore, a document with CAT aaa bbb ccc DOG eee fff MOUSE
will not match this query, as the span is exactly 8 words long.
"the world is a wonderful place"/3
The quorum matching operator introduces a type of fuzzy matching. It will match only those documents that meet a given threshold of specified words. In the example above ("the world is a wonderful place"/3
), it will match all documents containing at least 3 of the 6 specified words. The operator is limited to 255 keywords. Instead of an absolute number, you can also provide a value between 0.0 and 1.0 (representing 0% and 100%), and Manticore will match only documents containing at least the specified percentage of given words. The same example above could also be expressed as "the world is a wonderful place"/0.5
, and it would match documents with at least 50% of the 6 words.
aaa << bbb << ccc
The strict order operator (also known as the "before" operator) matches a document only if its argument keywords appear in the document precisely in the order specified in the query. For example, the query black << cat
will match the document "black and white cat" but not the document "that cat was black". The order operator has the lowest priority. It can be applied to both individual keywords and more complex expressions. For instance, this is a valid query:
(bag of words) << "exact phrase" << red|green|blue
raining =cats and =dogs
="exact phrase"
The exact form keyword modifier matches a document only if the keyword appears in the exact form specified. By default, a document is considered a match if the stemmed/lemmatized keyword matches. For instance, the query "runs" will match both a document containing "runs" and one containing "running", because both forms stem to just "run". However, the =runs
query will only match the first document. The exact form modifier requires the index_exact_words option to be enabled.
Another use case is to prevent expanding a keyword to its *keyword*
form. For example, with index_exact_words=1
+ expand_keywords=1/star
, bcd
will find a document containing abcde
, but =bcd
will not.
As a modifier affecting the keyword, it can be used within operators such as phrase, proximity, and quorum operators. Applying an exact form modifier to the phrase operator is possible, and in this case, it internally adds the exact form modifier to all terms in the phrase.
nation* *nation* *national
Requires min_infix_len for prefix (expansion in trail) and/or suffix (expansion in head). If only prefixing is desired, min_prefix_len can be used instead.
The search will attempt to find all expansions of the wildcarded tokens, and each expansion is recorded as a matched hit. The number of expansions for a token can be controlled with the expansion_limit table setting. Wildcarded tokens can have a significant impact on query search time, especially when tokens have short lengths. In such cases, it is desirable to use the expansion limit.
The wildcard operator can be automatically applied if the expand_keywords table setting is used.
In addition, the following inline wildcard operators are supported:
?
can match any single character:t?st
will matchtest
, but notteast
%
can match zero or one character:tes%
will matchtes
ortest
, but nottesting
The inline operators require dict=keywords
(enabled by default) and prefixing/infixing enabled.
REGEX(/t.?e/)
Requires the min_infix_len or min_prefix_len and dict=keywords options to be set (which is a default).
Similarly to the wildcard operators, the REGEX operator attempts to find all tokens matching the provided pattern, and each expansion is recorded as a matched hit. Note, this can have a significant impact on query search time, as the entire dictionary is scanned, and every term in the dictionary undergoes matching with the REGEX pattern.
The patterns should adhere to the RE2 syntax. The REGEX expression delimiter is the first symbol after the open bracket. In other words, all text between the open bracket followed by the delimiter and the delimiter and the closed bracket is considered as a RE2 expression.
Please note that the terms stored in the dictionary undergo charset_table
transformation, meaning that for example, REGEX may not be able to match uppercase characters if all characters are lowercased according to the charset_table
(which happens by default). To successfully match a term using a REGEX expression, the pattern must correspond to the entire token. To achieve partial matching, place .*
at the beginning and/or end of your pattern.
REGEX(/.{3}t/)
REGEX(/t.*\d*/)
^hello world$
Field-start and field-end keyword modifiers ensure that a keyword only matches if it appears at the very beginning or the very end of a full-text field, respectively. For example, the query "^hello world$"
(enclosed in quotes to combine the phrase operator with the start/end modifiers) will exclusively match documents containing at least one field with these two specific keywords.
boosted^1.234 boostedfieldend$^1.234
The boost modifier raises the word IDF_score by the indicated factor in ranking scores that incorporate IDF into their calculations. It does not impact the matching process in any manner.
hello NEAR/3 world NEAR/4 "my test"
The NEAR
operator is a more generalized version of the proximity operator. Its syntax is NEAR/N
, which is case-sensitive and does not allow spaces between the NEAR
keywords, slash sign, and distance value.
While the original proximity operator works only on sets of keywords, NEAR
is more versatile and can accept arbitrary subexpressions as its two arguments. It matches a document when both subexpressions are found within N words of each other, regardless of their order. NEAR
is left-associative and shares the same (lowest) precedence as BEFORE.
It is important to note that one NEAR/7 two NEAR/7 three
is not exactly equivalent to "one two three"~7
. The key difference is that the proximity operator allows up to 6 non-matching words between all three matching words, while the version with NEAR
is less restrictive: it permits up to 6 words between one
and two
, and then up to 6 more between that two-word match and three
.
Church NOTNEAR/3 street
The NOTNEAR
operator serves as a negative assertion. It matches a document when the left argument is present and either the right argument is absent from the document or the right argument is a specified distance away from the end of the left matched argument. The distance is denoted in words. The syntax is NOTNEAR/N
, which is case-sensitive and does not permit spaces between the NOTNEAR
keyword, slash sign, and distance value. Both arguments of this operator can be terms or any operators or group of operators.
all SENTENCE words SENTENCE "in one sentence"
"Bill Gates" PARAGRAPH "Steve Jobs"
The SENTENCE
and PARAGRAPH
operators match a document when both of their arguments are within the same sentence or the same paragraph of text, respectively. These arguments can be keywords, phrases, or instances of the same operator.
The order of the arguments within the sentence or paragraph is irrelevant. These operators function only with tables built with index_sp (sentence and paragraph indexing feature) enabled and revert to a simple AND operation otherwise. For information on what constitutes a sentence and a paragraph, refer to the index_sp directive documentation.
ZONE:(h3,h4)
only in these titles
The ZONE limit
operator closely resembles the field limit operator but limits matching to a specified in-field zone or a list of zones. It is important to note that subsequent subexpressions do not need to match within a single continuous span of a given zone and may match across multiple spans. For example, the query (ZONE:th hello world)
will match the following sample document:
<th>Table 1. Local awareness of Hello Kitty brand.</th>
.. some table data goes here ..
<th>Table 2. World-wide brand awareness.</th>
The ZONE
operator affects the query until the next field or ZONE
limit operator, or until the closing parenthesis. It functions exclusively with tables built with zone support (refer to index_zones) and will be disregarded otherwise.
ZONESPAN:(h2)
only in a (single) title
The ZONESPAN
limit operator resembles the ZONE
operator but mandates that the match occurs within a single continuous span. In the example provided earlier, ZONESPAN:th hello world
would not match the document, as "hello" and "world" do not appear within the same span.
Since certain characters function as operators in the query string, they must be escaped to prevent query errors or unintended matching conditions.
The following characters should be escaped using a backslash (\
):
! " $ ' ( ) - / < @ \ ^ | ~
To escape a single quote ('), use one backslash:
SELECT * FROM your_index WHERE MATCH('l\'italiano');
For the other characters in the list mentioned earlier, which are operators or query constructs, they must be treated as simple characters by the engine, with a preceding escape character. The backslash must also be escaped, resulting in two backslashes:
SELECT * FROM your_index WHERE MATCH('r\\&b | \\(official video\\)');
To use a backslash as a character, you must escape both the backslash as a character and the backslash as the escape operator, which requires four backslashes:
SELECT * FROM your_index WHERE MATCH('\\\\ABC');
When you are working with JSON data in Manticore Search and need to include a double quote ("
) within a JSON string, it's important to handle it with proper escaping. In JSON, a double quote within a string is escaped using a backslash (\
). However, when inserting the JSON data through an SQL query, Manticore Search interprets the backslash (\
) as an escape character within strings.
To ensure the double quote is correctly inserted into the JSON data, you need to escape the backslash itself. This results in using two backslashes (\\
) before the double quote. For example:
insert into tbl(j) values('{"a": "\\"abc\\""}');
MySQL drivers provide escaping functions (e.g., mysqli_real_escape_string
in PHP or conn.escape_string
in Python), but they only escape specific characters.
You will still need to add escaping for the characters from the previously mentioned list that are not escaped by their respective functions.
Because these functions will escape the backslash for you, you only need to add one backslash.
This also applies to drivers that support (client-side) prepared statements. For example, with PHP PDO prepared statements, you need to add a backslash for the $
character:
$statement = $ln_sph->prepare( "SELECT * FROM index WHERE MATCH(:match)");
$match = '\$manticore';
$statement->bindParam(':match',$match,PDO::PARAM_STR);
$results = $statement->execute();
This results in the final query SELECT * FROM index WHERE MATCH('\\$manticore');
The same rules for the SQL protocol apply, with the exception that for JSON, the double quote must be escaped with a single backslash, while the rest of the characters require double escaping.
When using JSON libraries or functions that convert data structures to JSON strings, the double quote and single backslash are automatically escaped by these functions and do not need to be explicitly escaped.
The official clients utilize common JSON libraries/functions available in their respective programming languages under the hood. The same rules for escaping mentioned earlier apply.
The asterisk (*
) is a unique character that serves two purposes:
- as a wildcard prefix/suffix expander
- as an any-term modifier within a phrase search.
Unlike other special characters that function as operators, the asterisk cannot be escaped when it's in a position to provide one of its functionalities.
In non-wildcard queries, the asterisk does not require escaping, whether it's in the charset_table
or not.
In wildcard queries, an asterisk in the middle of a word does not require escaping. As a wildcard operator (either at the beginning or end of the word), the asterisk will always be interpreted as the wildcard operator, even if escaping is applied.
To escape special characters in JSON nodes, use a backtick. For example:
MySQL [(none)]> select * from t where json.`a=b`=234;
+---------------------+-------------+------+
| id | json | text |
+---------------------+-------------+------+
| 8215557549554925578 | {"a=b":234} | |
+---------------------+-------------+------+
MySQL [(none)]> select * from t where json.`a:b`=123;
+---------------------+-------------+------+
| id | json | text |
+---------------------+-------------+------+
| 8215557549554925577 | {"a:b":123} | |
+---------------------+-------------+------+
Consider this complex query example:
"hello world" @title "example program"~5 @body python -(php|perl) @* code
The full meaning of this search is:
- Locate the words 'hello' and 'world' adjacently in any field within a document;
- Additionally, the same document must also contain the words 'example' and 'program' in the title field, with up to, but not including, 5 words between them; (For instance, "example PHP program" would match, but "example script to introduce outside data into the correct context for your program" would not, as there are 5 or more words between the two terms)
- Furthermore, the same document must have the word 'python' in the body field, while excluding 'php' or 'perl';
- Finally, the same document must include the word 'code' in any field.
The OR operator takes precedence over AND, so "looking for cat | dog | mouse" means "looking for (cat | dog | mouse)" rather than "(looking for cat) | dog | mouse".
To comprehend how a query will be executed, Manticore Search provides query profiling tools to examine the query tree generated by a query expression.
To enable full-text query profiling with an SQL statement, you must activate it before executing the desired query:
SET profiling =1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
To view the query tree, execute the SHOW PLAN
command immediately after running the query:
SHOW PLAN;
This command will return the structure of the executed query. Keep in mind that the 3 statements - SET profiling, the query, and SHOW - must be executed within the same session.
When using the HTTP JSON protocol we can just enable "profile":true
to get in response the full-text query tree structure.
{
"table":"test",
"profile":true,
"query":
{
"match_phrase": { "_all" : "had grown quite" }
}
}
The response will include a profile
object containing a query
member.
The query
property holds the transformed full-text query tree. Each node consists of:
type
: node type, which can be AND, OR, PHRASE, KEYWORD, etc.description
: query subtree for this node represented as a string (inSHOW PLAN
format)children
: any child nodes, if presentmax_field_pos
: maximum position within a field
A keyword node will additionally include:
word
: the transformed keyword.querypos
: position of this keyword in the query.excluded
: keyword excluded from the query.expanded
: keyword added by prefix expansion.field_start
: keyword must appear at the beginning of the field.field_end
: keyword must appear at the end of the field.boost
: the keyword's IDF will be multiplied by this value.
- SQL
- JSON
- PHP
- Python
- javascript
- Java
- C#
- TypeScript
- Go
SET profiling=1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
SHOW PLAN \G
*************************** 1\. row ***************************
Variable: transformed_tree
Value: AND(
OR(fields=(title), KEYWORD(abcx, querypos=1, expanded), KEYWORD(abcm, querypos=1, expanded)),
AND(fields=(body), KEYWORD(hey, querypos=2)))
1 row in set (0.00 sec)
In some instances, the evaluated query tree may significantly differ from the original one due to expansions and other transformations.
- SQL
- JSON
- PHP
- Python
- javascript
- Java
- C#
- TypeScript
- Go
SET profiling=1;
SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;
SHOW PLAN;
Query OK, 0 rows affected (0.00 sec)
+--------+
| id |
+--------+
| 711651 |
+--------+
1 row in set (0.04 sec)
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable | Value |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transformed_tree | AND(
OR(
OR(
AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
OR(
AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
AND(fields=(content), KEYWORD(hey, querypos=2))) |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
The SQL statement EXPLAIN QUERY
enables the display of the execution tree for a given full-text query without performing an actual search query on the table.
- SQL
EXPLAIN QUERY index_base '@title running @body dog'\G
EXPLAIN QUERY index_base '@title running @body dog'\G
*************************** 1\. row ***************************
Variable: transformed_tree
Value: AND(
OR(
AND(fields=(title), KEYWORD(run, querypos=1, morphed)),
AND(fields=(title), KEYWORD(running, querypos=1, morphed))))
AND(fields=(body), KEYWORD(dog, querypos=2, morphed)))
EXPLAIN QUERY ... option format=dot
allows displaying the execution tree of a provided full-text query in a hierarchical format suitable for visualization by existing tools, such as https://dreampuf.github.io/GraphvizOnline:
- SQL
EXPLAIN QUERY tbl 'i me' option format=dot\G
EXPLAIN QUERY tbl 'i me' option format=dot\G
*************************** 1. row ***************************
Variable: transformed_tree
Value: digraph "transformed_tree"
{
0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
0 -> 1
1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
1 -> 2
2 [shape=record label="i | { querypos=1 }"]
0 -> 3
3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
3 -> 4
4 [shape=record label="me | { querypos=2 }"]
}
When using an expression ranker, it's possible to reveal the values of the calculated factors with the PACKEDFACTORS() function.
The function returns:
- The values of document-level factors (such as bm25, field_mask, doc_word_count)
- A list of each field that generated a hit (including lcs, hit_count, word_count, sum_idf, min_hit_pos, etc.)
- A list of each keyword from the query along with their tf and idf values
These values can be utilized to understand why certain documents receive lower or higher scores in a search or to refine the existing ranking expression.
- SQL
SELECT id, PACKEDFACTORS() FROM test1 WHERE MATCH('test one') OPTION ranker=expr('1')\G
id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
word0=(tf=1, idf=-0.062982),
word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)