Since certain characters function as operators in the query string, they must be escaped to prevent query errors or unintended matching conditions.
The following characters should be escaped using a backslash (\):
! " $ ' ( ) - / < @ \ ^ | ~
To escape a single quote ('), use one backslash:
SELECT * FROM your_index WHERE MATCH('l\'italiano');
The other characters in the list above are operators or query constructs. For the engine to treat them as plain characters, each must be preceded by an escape character. Because the backslash itself must also be escaped inside the SQL string, this results in two backslashes:
SELECT * FROM your_index WHERE MATCH('r\\&b | \\(official video\\)');
To use a backslash as a character, you must escape both the backslash as a character and the backslash as the escape operator, which requires four backslashes:
SELECT * FROM your_index WHERE MATCH('\\\\ABC');
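The three rules above can be combined into a small helper. The following is a minimal Python sketch, not part of Manticore or of any driver: the name escape_match_term is hypothetical, and it assumes the result is pasted verbatim into a single-quoted SQL string literal with no further driver-side escaping. It is meant for individual user-supplied terms rather than whole queries, since it would also neutralize operators you intend to use.

```python
# Operator characters from the list above, excluding the single quote and
# the backslash, which follow their own rules.
OPERATOR_CHARS = set('!"$()-/<@^|~')


def escape_match_term(term: str) -> str:
    """Escape one user-supplied term for use inside MATCH('...').

    Hypothetical helper: assumes the result is placed verbatim inside a
    single-quoted SQL string literal, with no additional driver escaping.
    """
    escaped = []
    for ch in term:
        if ch == "\\":
            escaped.append("\\" * 4)      # literal backslash: four backslashes
        elif ch == "'":
            escaped.append("\\'")         # single quote: one backslash
        elif ch in OPERATOR_CHARS:
            escaped.append("\\\\" + ch)   # operator: two backslashes
        else:
            escaped.append(ch)
    return "".join(escaped)


print("SELECT * FROM your_index WHERE MATCH('%s');" % escape_match_term("l'italiano"))
```

Applied to the inputs l'italiano, (official video) and \ABC, the helper reproduces the escaped forms used in the three statements above.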
When you are working with JSON data in Manticore Search and need to include a double quote (") within a JSON string, it's important to handle it with proper escaping. In JSON, a double quote within a string is escaped using a backslash (\). However, when inserting the JSON data through an SQL query, Manticore Search interprets the backslash (\) as an escape character within strings.
To ensure the double quote is correctly inserted into the JSON data, you need to escape the backslash itself. This results in using two backslashes (\\) before the double quote. For example:
insert into tbl(j) values('{"a": "\\"abc\\""}');
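When the JSON document is produced by code, the same two layers of escaping can be applied in sequence. The sketch below is illustrative only: json_to_sql_literal is a hypothetical helper name, and it assumes the statement is assembled as plain text, with no driver escaping applied afterwards.

```python
import json


def json_to_sql_literal(value) -> str:
    """Serialize value to JSON and escape it for a single-quoted SQL string.

    Hypothetical helper for illustration only.
    """
    text = json.dumps(value)             # JSON layer: " becomes \"
    text = text.replace("\\", "\\\\")    # SQL layer: double every backslash
    text = text.replace("'", "\\'")      # protect single quotes in the literal
    return text


literal = json_to_sql_literal({"a": '"abc"'})
print("insert into tbl(j) values('%s');" % literal)
# prints: insert into tbl(j) values('{"a": "\\"abc\\""}');
```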
MySQL drivers provide escaping functions (e.g., mysqli_real_escape_string in PHP or conn.escape_string in Python), but they only escape specific characters.
You will still need to add escaping for the characters from the previously mentioned list that are not escaped by their respective functions.
Because these functions will escape the backslash for you, you only need to add one backslash.
This also applies to drivers that support (client-side) prepared statements. For example, with PHP PDO prepared statements, you need to add a backslash for the $ character:
$statement = $ln_sph->prepare( "SELECT * FROM index WHERE MATCH(:match)");
$match = '\$manticore';
$statement->bindParam(':match',$match,PDO::PARAM_STR);
$results = $statement->execute();
This results in the final query SELECT * FROM index WHERE MATCH('\\$manticore');
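The same division of labor can be sketched in Python. Both function names below are hypothetical, and simulate_driver_escape is only a simplified stand-in for a real driver function: it is assumed to handle just backslashes and single quotes, whereas real drivers escape a few more characters.

```python
# Full-text operator characters from the list above. The single quote is
# left out because the driver's own escaping already takes care of it.
FULLTEXT_CHARS = set('!"$()-/<@\\^|~')


def add_fulltext_escapes(term: str) -> str:
    """Add one backslash before each full-text operator character."""
    return "".join("\\" + ch if ch in FULLTEXT_CHARS else ch for ch in term)


def simulate_driver_escape(value: str) -> str:
    """Simplified stand-in for a driver's escaping function.

    Assumption: only backslashes and single quotes are handled here.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


bound = add_fulltext_escapes("$manticore")   # value you would bind: \$manticore
final = simulate_driver_escape(bound)        # after the driver: \\$manticore
print("SELECT * FROM index WHERE MATCH('%s');" % final)
```

The application adds a single backslash, and the driver doubles it, which gives the two backslashes the engine expects.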
Over the HTTP JSON protocol, the same rules as for the SQL protocol apply, with the exception that for JSON, the double quote must be escaped with a single backslash, while the rest of the characters require double escaping.
When using JSON libraries or functions that convert data structures to JSON strings, the double quote and single backslash are automatically escaped by these functions and do not need to be explicitly escaped.
The new official clients (which use the HTTP protocol) utilize common JSON libraries/functions available in their respective programming languages under the hood. The same rules for escaping mentioned earlier apply.
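As a quick illustration with Python's standard json module, the snippet below adds one backslash by hand before the $ operator and leaves the rest to the encoder. The search term is a made-up example.

```python
import json

# One backslash is added by hand before the operator character; the JSON
# encoder then doubles it and escapes the double quotes on its own.
term = '\\$manticore "quoted"'
encoded = json.dumps(term)
print(encoded)
```

In the encoded string, the $ is preceded by two backslashes and each double quote by one, matching the rules described above.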
The asterisk (*) is a unique character that serves two purposes:
- as a wildcard prefix/suffix expander
- as an any-term modifier within a phrase search.
Unlike other special characters that function as operators, the asterisk cannot be escaped when it's in a position to provide one of its functionalities.
In non-wildcard queries, the asterisk does not require escaping, whether it's in the charset_table or not.
In wildcard queries, an asterisk in the middle of a word does not require escaping. As a wildcard operator (either at the beginning or end of the word), the asterisk will always be interpreted as the wildcard operator, even if escaping is applied.
To escape special characters in JSON nodes, use a backtick. For example:
MySQL [(none)]> select * from t where json.`a=b`=234;
+---------------------+-------------+------+
| id | json | text |
+---------------------+-------------+------+
| 8215557549554925578 | {"a=b":234} | |
+---------------------+-------------+------+
MySQL [(none)]> select * from t where json.`a:b`=123;
+---------------------+-------------+------+
| id | json | text |
+---------------------+-------------+------+
| 8215557549554925577 | {"a:b":123} | |
+---------------------+-------------+------+
Consider this complex query example:
"hello world" @title "example program"~5 @body python -(php|perl) @* code
The full meaning of this search is:
- Locate the words 'hello' and 'world' adjacently in any field within a document;
- Additionally, the same document must also contain the words 'example' and 'program' in the title field, with up to, but not including, 5 words between them; (For instance, "example PHP program" would match, but "example script to introduce outside data into the correct context for your program" would not, as there are 5 or more words between the two terms)
- Furthermore, the same document must have the word 'python' in the body field, while excluding 'php' or 'perl';
- Finally, the same document must include the word 'code' in any field.
The OR operator takes precedence over AND, so "looking for cat | dog | mouse" means "looking for (cat | dog | mouse)" rather than "(looking for cat) | dog | mouse".
To comprehend how a query will be executed, Manticore Search provides query profiling tools to examine the query tree generated by a query expression.
To enable full-text query profiling with an SQL statement, you must activate it before executing the desired query:
SET profiling=1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
To view the query tree, execute the SHOW PLAN command immediately after running the query:
SHOW PLAN;
This command will return the structure of the executed query. Keep in mind that all three statements (SET profiling, the query, and SHOW PLAN) must be executed within the same session.
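From application code, the same-session requirement means sending all three statements over one connection. The following is a minimal sketch, assuming conn is an open DB-API 2.0 connection to Manticore's MySQL-protocol port, created with whichever MySQL client library you use. The function name show_plan is hypothetical.

```python
def show_plan(conn, match_query: str):
    """Run a query with profiling on and return the SHOW PLAN rows.

    conn is assumed to be an open DB-API 2.0 connection. All three
    statements go through the same connection, so they share one session.
    """
    cur = conn.cursor()
    try:
        cur.execute("SET profiling=1")
        cur.execute(match_query)
        cur.fetchall()                 # consume the search results first
        cur.execute("SHOW PLAN")
        return cur.fetchall()
    finally:
        cur.close()
```

With a connection pool, make sure the three statements are not spread across different pooled connections, or SHOW PLAN will have nothing to report.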
When using the HTTP JSON protocol, you can simply set "profile":true in the request to get the full-text query tree structure in the response.
{
  "index":"test",
  "profile":true,
  "query":
  {
    "match_phrase": { "_all" : "had grown quite" }
  }
}
The response will include a profile object containing a query member.
The query property holds the transformed full-text query tree. Each node consists of:
- type: node type, which can be AND, OR, PHRASE, KEYWORD, etc.
- description: query subtree for this node represented as a string (in SHOW PLAN format)
- children: any child nodes, if present
- max_field_pos: maximum position within a field
A keyword node will additionally include:
- word: the transformed keyword.
- querypos: position of this keyword in the query.
- excluded: keyword excluded from the query.
- expanded: keyword added by prefix expansion.
- field_start: keyword must appear at the beginning of the field.
- field_end: keyword must appear at the end of the field.
- boost: the keyword's IDF will be multiplied by this value.
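Because every node may carry a children list, the tree is easiest to inspect recursively. The sketch below collects the keyword nodes from such a tree. The function name is hypothetical, and sample_tree is a hand-written structure shaped after the fields listed above, not real server output.

```python
def collect_keywords(node: dict) -> list:
    """Return (word, querypos) pairs for every KEYWORD node in the tree."""
    found = []
    if node.get("type") == "KEYWORD":
        found.append((node.get("word"), node.get("querypos")))
    for child in node.get("children", []):
        found.extend(collect_keywords(child))
    return found


# Hand-written sample for illustration; not actual server output.
sample_tree = {
    "type": "AND",
    "children": [
        {"type": "KEYWORD", "word": "had", "querypos": 1},
        {
            "type": "AND",
            "children": [{"type": "KEYWORD", "word": "grown", "querypos": 2}],
        },
    ],
}

print(collect_keywords(sample_tree))  # [('had', 1), ('grown', 2)]
```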
SET profiling=1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
SHOW PLAN \G
*************************** 1. row ***************************
Variable: transformed_tree
Value: AND(
OR(fields=(title), KEYWORD(abcx, querypos=1, expanded), KEYWORD(abcm, querypos=1, expanded)),
AND(fields=(body), KEYWORD(hey, querypos=2)))
1 row in set (0.00 sec)
In some instances, the evaluated query tree may significantly differ from the original one due to expansions and other transformations.
SET profiling=1;
SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;
SHOW PLAN;
Query OK, 0 rows affected (0.00 sec)
+--------+
| id |
+--------+
| 711651 |
+--------+
1 row in set (0.04 sec)
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable | Value |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transformed_tree | AND(
OR(
OR(
AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
OR(
AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
AND(fields=(content), KEYWORD(hey, querypos=2))) |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
The SQL statement EXPLAIN QUERY displays the execution tree for a given full-text query without performing an actual search query on the table.
EXPLAIN QUERY index_base '@title running @body dog'\G
*************************** 1. row ***************************
Variable: transformed_tree
Value: AND(
OR(
AND(fields=(title), KEYWORD(run, querypos=1, morphed)),
AND(fields=(title), KEYWORD(running, querypos=1, morphed))))
AND(fields=(body), KEYWORD(dog, querypos=2, morphed)))
EXPLAIN QUERY ... option format=dot allows displaying the execution tree of a provided full-text query in a hierarchical format suitable for visualization by existing tools, such as https://dreampuf.github.io/GraphvizOnline:
EXPLAIN QUERY tbl 'i me' option format=dot\G
*************************** 1. row ***************************
Variable: transformed_tree
Value: digraph "transformed_tree"
{
0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
0 -> 1
1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
1 -> 2
2 [shape=record label="i | { querypos=1 }"]
0 -> 3
3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
3 -> 4
4 [shape=record label="me | { querypos=2 }"]
}
When using an expression ranker, it's possible to reveal the values of the calculated factors with the PACKEDFACTORS() function.
The function returns:
- The values of document-level factors (such as bm25, field_mask, doc_word_count)
- A list of each field that generated a hit (including lcs, hit_count, word_count, sum_idf, min_hit_pos, etc.)
- A list of each keyword from the query along with their tf and idf values
These values can be utilized to understand why certain documents receive lower or higher scores in a search or to refine the existing ranking expression.
SELECT id, PACKEDFACTORS() FROM test1 WHERE MATCH('test one') OPTION ranker=expr('1')\G
id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
word0=(tf=1, idf=-0.062982),
word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)
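For ad hoc analysis, output like the above can be split into document-level factors and named groups (fields and words). The following is a rough sketch that assumes the text layout shown in the sample: comma-separated name=value pairs, with groups written as name=(...) and no nested parentheses. The function name is hypothetical, and values are kept as strings.

```python
import re

GROUP_RE = re.compile(r"(\w+)=\(([^)]*)\)")   # e.g. word0=(tf=1, idf=-0.06)
PAIR_RE = re.compile(r"(\w+)=([-\w.]+)")      # e.g. bm25=569


def parse_packedfactors(text: str):
    """Split PACKEDFACTORS() text into (document-level, per-group) dicts."""
    groups = {
        name: dict(PAIR_RE.findall(body))
        for name, body in GROUP_RE.findall(text)
    }
    # Whatever remains once the groups are removed is document-level.
    doc_level = dict(PAIR_RE.findall(GROUP_RE.sub("", text)))
    return doc_level, groups


sample = (
    "bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2, "
    "field1=(lcs=1, hit_count=2, word_count=2, min_idf=-0.062982), "
    "word0=(tf=1, idf=-0.062982), word1=(tf=1, idf=0.215338)"
)
doc_level, groups = parse_packedfactors(sample)
print(doc_level["bm25"], groups["word1"]["idf"])
```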
Queries can be automatically optimized if OPTION boolean_simplify=1 is specified. Some transformations performed by this optimization include:
- Excess brackets: ((A | B) | C) becomes (A | B | C); ((A B) C) becomes (A B C)
- Excess AND NOT: ((A !N1) !N2) becomes (A !(N1 | N2))
- Common NOT: ((A !N) | (B !N)) becomes ((A | B) !N)
- Common Compound NOT: ((A !(N AA)) | (B !(N BB))) becomes (((A | B) !N) | (A !AA) | (B !BB)) if the cost of evaluating N is greater than the sum of evaluating A and B
- Common subterm: ((A (N | AA)) | (B (N | BB))) becomes (((A | B) N) | (A AA) | (B BB)) if the cost of evaluating N is greater than the sum of evaluating A and B
- Common keywords: (A | "A B"~N) becomes A; ("A B" | "A B C") becomes "A B"; ("A B"~N | "A B C"~N) becomes ("A B"~N)
- Common phrase: ("X A B" | "Y A B") becomes ("("X"|"Y") A B")
- Common AND NOT: ((A !X) | (A !Y) | (A !Z)) becomes (A !(X Y Z))
- Common OR NOT: ((A !(N | N1)) | (B !(N | N2))) becomes (( (A !N1) | (B !N2) ) !N)
Note that optimizing queries consumes CPU time, so for simple queries or hand-optimized queries, you'll achieve better results with the default boolean_simplify=0 value. Simplifications often benefit complex queries or algorithmically generated queries.
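The rewrites listed above preserve the set of matching documents. A quick way to convince yourself is to model each subquery as the set of document IDs it matches, with OR as union and AND NOT as set difference. The sets below are made-up values, and this only illustrates the logic, not how the engine performs the optimization.

```python
# Sets of matching document IDs for each subquery (made-up values).
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
N, N1, N2 = {2, 5}, {1, 2}, {2, 3}

# Excess AND NOT: ((A !N1) !N2) becomes (A !(N1 | N2))
assert (A - N1) - N2 == A - (N1 | N2)

# Common NOT: ((A !N) | (B !N)) becomes ((A | B) !N)
assert (A - N) | (B - N) == (A | B) - N

# Common OR NOT: ((A !(N | N1)) | (B !(N | N2))) becomes (((A !N1) | (B !N2)) !N)
assert (A - (N | N1)) | (B - (N | N2)) == ((A - N1) | (B - N2)) - N

print("all rewrites match the same documents")
```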
Queries like -dog, which could potentially include all documents from the collection, are not allowed by default. To allow them, you must specify not_terms_only_allowed=1 either as a global setting or as a search option.