Consider this complex query example:
"hello world" @title "example program"~5 @body python -(php|perl) @* code
The full meaning of this search is:
- Locate the words 'hello' and 'world' adjacently in any field within a document;
- Additionally, the same document must also contain the words 'example' and 'program' in the title field, with up to, but not including, 5 words between them; (For instance, "example PHP program" would match, but "example script to introduce outside data into the correct context for your program" would not, as there are 5 or more words between the two terms)
- Furthermore, the same document must have the word 'python' in the body field, while excluding 'php' or 'perl';
- Finally, the same document must include the word 'code' in any field.
The OR operator takes precedence over AND, so "looking for cat | dog | mouse" means "looking for (cat | dog | mouse)" rather than "(looking for cat) | dog | mouse".
To comprehend how a query will be executed, Manticore Search provides query profiling tools to examine the query tree generated by a query expression.
To enable full-text query profiling with an SQL statement, you must activate it before executing the desired query:
SET profiling =1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
To view the query tree, execute the SHOW PLAN
command immediately after running the query:
SHOW PLAN;
This command will return the structure of the executed query. Keep in mind that the 3 statements - SET profiling, the query, and SHOW - must be executed within the same session.
When using the HTTP JSON protocol we can just enable "profile":true
to get in response the full-text query tree structure.
{
"table":"test",
"profile":true,
"query":
{
"match_phrase": { "_all" : "had grown quite" }
}
}
The response will include a profile
object containing a query
member.
The query
property holds the transformed full-text query tree. Each node consists of:
type
: node type, which can be AND, OR, PHRASE, KEYWORD, etc.description
: query subtree for this node represented as a string (inSHOW PLAN
format)children
: any child nodes, if presentmax_field_pos
: maximum position within a field
A keyword node will additionally include:
word
: the transformed keyword.querypos
: position of this keyword in the query.excluded
: keyword excluded from the query.expanded
: keyword added by prefix expansion.field_start
: keyword must appear at the beginning of the field.field_end
: keyword must appear at the end of the field.boost
: the keyword's IDF will be multiplied by this value.
- SQL
- JSON
- PHP
- Python
- javascript
- Java
- C#
- TypeScript
- Go
SET profiling=1;
SELECT * FROM test WHERE MATCH('@title abc* @body hey');
SHOW PLAN \G
*************************** 1\. row ***************************
Variable: transformed_tree
Value: AND(
OR(fields=(title), KEYWORD(abcx, querypos=1, expanded), KEYWORD(abcm, querypos=1, expanded)),
AND(fields=(body), KEYWORD(hey, querypos=2)))
1 row in set (0.00 sec)
In some instances, the evaluated query tree may significantly differ from the original one due to expansions and other transformations.
- SQL
- JSON
- PHP
- Python
- javascript
- Java
- C#
- TypeScript
- Go
SET profiling=1;
SELECT id FROM forum WHERE MATCH('@title way* @content hey') LIMIT 1;
SHOW PLAN;
Query OK, 0 rows affected (0.00 sec)
+--------+
| id |
+--------+
| 711651 |
+--------+
1 row in set (0.04 sec)
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable | Value |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transformed_tree | AND(
OR(
OR(
AND(fields=(title), KEYWORD(wayne, querypos=1, expanded)),
OR(
AND(fields=(title), KEYWORD(ways, querypos=1, expanded)),
AND(fields=(title), KEYWORD(wayyy, querypos=1, expanded)))),
AND(fields=(title), KEYWORD(way, querypos=1, expanded)),
OR(fields=(title), KEYWORD(way*, querypos=1, expanded))),
AND(fields=(content), KEYWORD(hey, querypos=2))) |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
The SQL statement EXPLAIN QUERY
enables the display of the execution tree for a given full-text query without performing an actual search query on the table.
- SQL
EXPLAIN QUERY index_base '@title running @body dog'\G
EXPLAIN QUERY index_base '@title running @body dog'\G
*************************** 1\. row ***************************
Variable: transformed_tree
Value: AND(
OR(
AND(fields=(title), KEYWORD(run, querypos=1, morphed)),
AND(fields=(title), KEYWORD(running, querypos=1, morphed))))
AND(fields=(body), KEYWORD(dog, querypos=2, morphed)))
EXPLAIN QUERY ... option format=dot
allows displaying the execution tree of a provided full-text query in a hierarchical format suitable for visualization by existing tools, such as https://dreampuf.github.io/GraphvizOnline:
- SQL
EXPLAIN QUERY tbl 'i me' option format=dot\G
EXPLAIN QUERY tbl 'i me' option format=dot\G
*************************** 1. row ***************************
Variable: transformed_tree
Value: digraph "transformed_tree"
{
0 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
0 -> 1
1 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
1 -> 2
2 [shape=record label="i | { querypos=1 }"]
0 -> 3
3 [shape=record,style=filled,bgcolor="lightgrey" label="AND"]
3 -> 4
4 [shape=record label="me | { querypos=2 }"]
}
When using an expression ranker, it's possible to reveal the values of the calculated factors with the PACKEDFACTORS() function.
The function returns:
- The values of document-level factors (such as bm25, field_mask, doc_word_count)
- A list of each field that generated a hit (including lcs, hit_count, word_count, sum_idf, min_hit_pos, etc.)
- A list of each keyword from the query along with their tf and idf values
These values can be utilized to understand why certain documents receive lower or higher scores in a search or to refine the existing ranking expression.
- SQL
SELECT id, PACKEDFACTORS() FROM test1 WHERE MATCH('test one') OPTION ranker=expr('1')\G
id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
word0=(tf=1, idf=-0.062982),
word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)