⪢ Full-text matching
The MATCH
clause allows for full-text searches in text fields. The input query string is tokenized using the same settings applied to the text during indexing. In addition to the tokenization of input text, the query string supports a number of full-text operators that enforce various rules on how keywords should provide a valid match.
Full-text match clauses can be combined with attribute filters as an AND boolean. OR relations between full-text matches and attribute filters are not supported.
The match query is always executed first in the filtering process, followed by the attribute filters. The attribute filters are applied to the result set of the match query. A query without a match clause is called a fullscan.
There must be at most one MATCH()
in the SELECT
clause.
Using the full-text query syntax, matching is performed across all indexed text fields of a document, unless the expression requires a match within a field (like phrase search) or is limited by field operators.
SELECT * FROM myindex WHERE MATCH('cats|birds');
The SELECT statement uses a MATCH clause, which must come after WHERE, for performing full-text searches. MATCH()
accepts an input string in which all full-text operators are available.
- SQL
- MATCH with filters
SELECT * FROM myindex WHERE MATCH('"find me fast"/2');
+------+------+----------------+
| id | gid | title |
+------+------+----------------+
| 1 | 11 | first find me |
| 2 | 12 | second find me |
+------+------+----------------+
2 rows in set (0.00 sec)
Full-text matching is available in the /search
endpoint and in HTTP-based clients. The following clauses can be used for performing full-text matches:
"match" is a simple query that matches the specified keywords in the specified fields.
"query":
{
"match": { "field": "keyword" }
}
You can specify a list of fields:
"match":
{
"field1,field2": "keyword"
}
Or you can use _all
or *
to search all fields.
You can search all fields except one using "!field":
"match":
{
"!field1": "keyword"
}
By default, keywords are combined using the OR operator. However, you can change that behavior using the "operator" clause:
"query":
{
"match":
{
"content,title":
{
"query":"keyword",
"operator":"or"
}
}
}
"operator" can be set to "or" or "and".
The boost
modifier can also be applied. It raises the word IDF_score by the indicated factor in ranking scores that incorporate IDF into their calculations. It does not impact the matching process in any manner.
"query":
{
"match":
{
"field1":
{
"query": "keyword",
"boost": 2.0
}
}
}
"match_phrase" is a query that matches the entire phrase. It is similar to a phrase operator in SQL. Here's an example:
"query":
{
"match_phrase": { "_all" : "had grown quite" }
}
"query_string" accepts an input string as a full-text query in MATCH()
syntax.
"query":
{
"query_string": "Church NOTNEAR/3 street"
}
"match_all" accepts an empty object and returns documents from the table without performing any attribute filtering or full-text matching. Alternatively, you can just omit the query
clause in the request which will have the same effect.
"query":
{
"match_all": {}
}
All full-text match clauses can be combined with must, must_not, and should operators of a JSON bool
query.
- match
- match_phrase
- query_string
- PHP
- Python
- javascript
- Java
- C#
- TypeScript
- Go
POST /search
-d
'{
"table" : "hn_small",
"query":
{
"match":
{
"*" : "find joe"
}
},
"_source": ["story_author","comment_author"],
"limit": 1
}'
{
"took" : 3,
"timed_out" : false,
"hits" : {
"hits" : [
{
"_id": 668018,
"_score" : 3579,
"_source" : {
"story_author" : "IgorPartola",
"comment_author" : "joe_the_user"
}
}
],
"total" : 88063,
"total_relation" : "eq"
}
}
The query string can include specific operators that define the conditions for how the words from the query string should be matched.
An implicit logical AND operator is always present, so "hello world" implies that both "hello" and "world" must be found in the matching document.
hello world
Note: There is no explicit AND
operator.
The logical OR operator |
has a higher precedence than AND, so looking for cat | dog | mouse
means looking for (cat | dog | mouse)
rather than (looking for cat) | dog | mouse
.
hello | world
Note: There is no operator OR
. Please use |
instead.
hello MAYBE world
The MAYBE
operator functions similarly to the |
operator, but it does not return documents that match only the right subtree expression.
hello -world
hello !world
The negation operator enforces a rule for a word to not exist.
Queries containing only negations are not supported by default. To enable, use the server option not_terms_only_allowed.
@title hello @body world
The field limit operator restricts subsequent searches to a specified field. By default, the query will fail with an error message if the given field name does not exist in the searched table. However, this behavior can be suppressed by specifying the @@relaxed
option at the beginning of the query:
@@relaxed @nosuchfield my query
This can be useful when searching through heterogeneous tables with different schemas.
Field position limits additionally constrain the search to the first N positions within a given field (or fields). For example, @body [50] hello
will not match documents where the keyword hello
appears at position 51 or later in the body.
@body[50] hello
Multiple-field search operator:
@(title,body) hello world
Ignore field search operator (ignores any matches of 'hello world' from the 'title' field):
@!title hello world
Ignore multiple-field search operator (if there are fields 'title', 'subject', and 'body', then @!(title)
is equivalent to @(subject,body)
):
@!(title,body) hello world
All-field search operator:
@* hello
"hello world"
The phrase operator mandates that the words be adjacent to each other.
The phrase search operator can incorporate a match any term
modifier. Within the phrase operator, terms are positionally significant. When the 'match any term' modifier is employed, the positions of the subsequent terms in that phrase query will be shifted. As a result, the 'match any' modifier does not affect search performance.
"exact * phrase * * for terms"
"hello world"~10
Proximity distance is measured in words, accounting for word count, and applies to all words within quotes. For example, the query "cat dog mouse"~5
indicates that there must be a span of fewer than 8 words containing all 3 words. Therefore, a document with CAT aaa bbb ccc DOG eee fff MOUSE
will not match this query, as the span is exactly 8 words long.
"the world is a wonderful place"/3
The quorum matching operator introduces a type of fuzzy matching. It will match only those documents that meet a given threshold of specified words. In the example above ("the world is a wonderful place"/3
), it will match all documents containing at least 3 of the 6 specified words. The operator is limited to 255 keywords. Instead of an absolute number, you can also provide a value between 0.0 and 1.0 (representing 0% and 100%), and Manticore will match only documents containing at least the specified percentage of given words. The same example above could also be expressed as "the world is a wonderful place"/0.5
, and it would match documents with at least 50% of the 6 words.
aaa << bbb << ccc
The strict order operator (also known as the "before" operator) matches a document only if its argument keywords appear in the document precisely in the order specified in the query. For example, the query black << cat
will match the document "black and white cat" but not the document "that cat was black". The order operator has the lowest priority. It can be applied to both individual keywords and more complex expressions. For instance, this is a valid query:
(bag of words) << "exact phrase" << red|green|blue
raining =cats and =dogs
="exact phrase"
The exact form keyword modifier matches a document only if the keyword appears in the exact form specified. By default, a document is considered a match if the stemmed/lemmatized keyword matches. For instance, the query "runs" will match both a document containing "runs" and one containing "running", because both forms stem to just "run". However, the =runs
query will only match the first document. The exact form operator requires the index_exact_words option to be enabled.
Another use case is to prevent expanding a keyword to its *keyword*
form. For example, with index_exact_words=1
+ expand_keywords=1/star
, bcd
will find a document containing abcde
, but =bcd
will not.
As a modifier affecting the keyword, it can be used within operators such as phrase, proximity, and quorum operators. Applying an exact form modifier to the phrase operator is possible, and in this case, it internally adds the exact form modifier to all terms in the phrase.
nation* *nation* *national
Requires min_infix_len for prefix (expansion in trail) and/or suffix (expansion in head). If only prefixing is desired, min_prefix_len can be used instead.
The search will attempt to find all expansions of the wildcarded tokens, and each expansion is recorded as a matched hit. The number of expansions for a token can be controlled with the expansion_limit table setting. Wildcarded tokens can have a significant impact on query search time, especially when tokens have short lengths. In such cases, it is desirable to use the expansion limit.
The wildcard operator can be automatically applied if the expand_keywords table setting is used.
In addition, the following inline wildcard operators are supported:
?
can match any single character:t?st
will matchtest
, but notteast
%
can match zero or one character:tes%
will matchtes
ortest
, but nottesting
The inline operators require dict=keywords
and infixing enabled.
REGEX(/t.?e/)
Requires the min_infix_len or min_prefix_len and dict=keywords options to be set (which is a default).
Similarly to the wildcard operators, the REGEX operator attempts to find all tokens matching the provided pattern, and each expansion is recorded as a matched hit. Note, this can have a significant impact on query search time, as the entire dictionary is scanned, and every term in the dictionary undergoes matching with the REGEX pattern.
The patterns should adhere to the RE2 syntax. The REGEX expression delimiter is the first symbol after the open bracket. In other words, all text between the open bracket followed by the delimiter and the delimiter and the closed bracket is considered as a RE2 expression.
Please note that the terms stored in the dictionary undergo charset_table
transformation, meaning that for example, REGEX may not be able to match uppercase characters if all characters are lowercased according to the charset_table
(which happens by default). To successfully match a term using a REGEX expression, the pattern must correspond to the entire token. To achieve partial matching, place .*
at the beginning and/or end of your pattern.
REGEX(/.{3}t/)
REGEX(/t.*\d*/)
^hello world$
Field-start and field-end keyword modifiers ensure that a keyword only matches if it appears at the very beginning or the very end of a full-text field, respectively. For example, the query "^hello world$"
(enclosed in quotes to combine the phrase operator with the start/end modifiers) will exclusively match documents containing at least one field with these two specific keywords.
boosted^1.234 boostedfieldend$^1.234
The boost modifier raises the word IDF_score by the indicated factor in ranking scores that incorporate IDF into their calculations. It does not impact the matching process in any manner.
hello NEAR/3 world NEAR/4 "my test"
The NEAR
operator is a more generalized version of the proximity operator. Its syntax is NEAR/N
, which is case-sensitive and does not allow spaces between the NEAR
keywords, slash sign, and distance value.
While the original proximity operator works only on sets of keywords, NEAR
is more versatile and can accept arbitrary subexpressions as its two arguments. It matches a document when both subexpressions are found within N words of each other, regardless of their order. NEAR
is left-associative and shares the same (lowest) precedence as BEFORE.
It is important to note that one NEAR/7 two NEAR/7 three
is not exactly equivalent to "one two three"~7
. The key difference is that the proximity operator allows up to 6 non-matching words between all three matching words, while the version with NEAR
is less restrictive: it permits up to 6 words between one
and two
, and then up to 6 more between that two-word match and three
.
Church NOTNEAR/3 street
The NOTNEAR
operator serves as a negative assertion. It matches a document when the left argument is present and either the right argument is absent from the document or the right argument is a specified distance away from the end of the left matched argument. The distance is denoted in words. The syntax is NOTNEAR/N
, which is case-sensitive and does not permit spaces between the NOTNEAR
keyword, slash sign, and distance value. Both arguments of this operator can be terms or any operators or group of operators.
all SENTENCE words SENTENCE "in one sentence"
"Bill Gates" PARAGRAPH "Steve Jobs"
The SENTENCE
and PARAGRAPH
operators match a document when both of their arguments are within the same sentence or the same paragraph of text, respectively. These arguments can be keywords, phrases, or instances of the same operator.
The order of the arguments within the sentence or paragraph is irrelevant. These operators function only with tables built with index_sp (sentence and paragraph indexing feature) enabled and revert to a simple AND operation otherwise. For information on what constitutes a sentence and a paragraph, refer to the index_sp directive documentation.
ZONE:(h3,h4)
only in these titles
The ZONE limit
operator closely resembles the field limit operator but limits matching to a specified in-field zone or a list of zones. It is important to note that subsequent subexpressions do not need to match within a single continuous span of a given zone and may match across multiple spans. For example, the query (ZONE:th hello world)
will match the following sample document:
<th>Table 1. Local awareness of Hello Kitty brand.</th>
.. some table data goes here ..
<th>Table 2. World-wide brand awareness.</th>
The ZONE
operator influences the query until the next field or ZONE
limit operator, or until the closing parenthesis. It functions exclusively with tables built with zone support (refer to index_zones) and will be disregarded otherwise.
ZONESPAN:(h2)
only in a (single) title
The ZONESPAN
limit operator resembles the ZONE
operator but mandates that the match occurs within a single continuous span. In the example provided earlier, ZONESPAN:th hello world
would not match the document, as "hello" and "world" do not appear within the same span.