Searching > Options | Manticore Search Manual

WHERE is an SQL clause which works for both fulltext matching and additional filtering. The following operators are available:

Comparison operators: <, >, <=, >=, =, <>, BETWEEN, IN, IS NULL
Boolean operators: AND, OR, NOT

MATCH('query') is supported and maps to a fulltext query.

{col_name | expr_alias} [NOT] IN @uservar condition syntax is supported. Refer to SET syntax for a description of global user variables.

JSON queries have two distinct entities: fulltext queries and filters. Both can be organised in a tree (using a bool query), but for now filters work only for the root element of the query. For example:

{
  "index":"test",
  "query": { "range": { "price": { "lte": 11 } } }
}

Here's an example of several filters in a bool query:

{
  "index": "test1",
  "query":
  {
    "bool":
    {
      "must":
      [
        { "match" : { "_all" : "product" } },
        { "range": { "price": { "gte": 500, "lte": 1000 } } }
      ],
      "must_not":
      {
        "range": { "revision": { "lt": 15 } }
      }
    }
  }
}

This is a fulltext query that matches all the documents containing product in any field. These documents must have a price greater than or equal to 500 (gte) and less than or equal to 1000 (lte). They also must not have a revision less than 15 (lt).

A bool query matches documents matching boolean combinations of other queries and/or filters. Queries and filters must be specified in "must", "should" or "must_not" sections. Example:

{
  "index":"test",
  "query":
  {
    "bool":
    {
      "must":
      [
        { "match": {"_all":"keyword"} },
        { "range": { "int_col": { "gte": 14 } } }
      ]
    }
  }
}

Queries and filters specified in the "must" section must match the documents. If several fulltext queries or filters are specified, all of them must match. This is the equivalent of AND queries in SQL.

Queries and filters specified in the "should" section should match the documents. If any queries are specified in "must" or "must_not", the "should" queries are ignored. On the other hand, if there are no queries other than "should", at least one of them must match a document for the document to match the bool query. This is the equivalent of OR queries.

Queries and filters specified in the must_not section must not match the documents. If several queries are specified under must_not, the document matches if none of them match.
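The matching rules for the three sections can be summed up in a few lines of code. The following is a simplified Python sketch, not Manticore's implementation: each query or filter is modeled as a plain predicate, relevance scoring is ignored, and the document fields (text, price) and helper names are hypothetical.

```python
from typing import Callable, Dict, Sequence

# Each fulltext query or filter is modeled as a predicate over a document.
Clause = Callable[[Dict], bool]


def bool_matches(doc: Dict,
                 must: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = ()) -> bool:
    """Return True if doc matches the bool query built from the three sections."""
    # "must": every clause has to match (AND).
    if not all(clause(doc) for clause in must):
        return False
    # "must_not": no clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    # "should" decides only when it is the sole section: at least one clause must match (OR).
    if should and not must and not must_not:
        return any(clause(doc) for clause in should)
    return True


def has_word(word: str) -> Clause:
    """Hypothetical fulltext clause: the word occurs in the document text."""
    return lambda doc: word in doc["text"].split()


def price_at_least(bound: float) -> Clause:
    """Hypothetical range clause: price >= bound."""
    return lambda doc: doc["price"] >= bound


docs = [
    {"id": 1, "text": "product case", "price": 20},
    {"id": 2, "text": "product phone", "price": 300},
    {"id": 3, "text": "product laptop", "price": 900},
]
# Mirrors the next example: must match "product", must not match "phone" or price >= 500.
hits = [d["id"] for d in docs
        if bool_matches(d, must=[has_word("product")],
                        must_not=[has_word("phone"), price_at_least(500)])]
print(hits)  # [1]
```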

Example:

{
  "index": "test1",
  "query":
  {
    "bool":
    {
      "must":
      {
        "match" : { "_all" : "product" }
      },
      "must_not":
      [
        { "match": {"_all":"phone"} },
        { "range": { "price": { "gte": 500 } } }
      ]
    }
  }
}

Queries in SQL format (query_string) can also be used in bool queries. Example:

{
  "index": "test1",
  "query":
  {
    "bool":
    {
      "must":
      [
        { "query_string" : "product" },
        { "query_string" : "good" }
      ]
    }
  }
}

Equality filters are the simplest filters that work with integer, float and string attributes. Example:

{
  "index":"test1",
  "query":
  {
    "equals": { "price": 500 }
  }
}

Set filters check whether an attribute value is equal to any of the values in the specified set. Example:

{
  "index":"test1",
  "query":
  {
    "in":
    {
      "price": [1,10,100]
    }
  }
}

Set filters support integer, string and multi-value attributes.

Range filters match documents that have attribute values within a specified range. Example:

{
  "index":"test1",
  "query":
  {
    "range":
    {
      "price":
      {
        "gte": 500,
        "lte": 1000
      }
    }
  }
}

Range filters support the following properties:

gte: value must be greater than or equal to
gt: value must be greater than
lte: value must be less than or equal to
lt: value must be less than
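When several properties are given, as with gte and lte in the example above, a value must satisfy all of them. A small Python sketch of that semantics follows; RANGE_OPS and range_matches are hypothetical names, and type coercion in the engine is not modeled.

```python
import operator

# Comparison performed by each range property: attribute_value OP bound.
RANGE_OPS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def range_matches(value: float, conditions: dict) -> bool:
    """True if value satisfies every property given, e.g. {"gte": 500, "lte": 1000}."""
    return all(RANGE_OPS[prop](value, bound) for prop, bound in conditions.items())


price_filter = {"gte": 500, "lte": 1000}  # from the range example above
print([p for p in (499, 500, 750, 1000, 1001) if range_matches(p, price_filter)])
# [500, 750, 1000]
```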

geo_distance filters are used to filter the documents that are within a specific distance from a geo location.

Example:

{
  "index":"test",
  "query":
  {
    "geo_distance":
    {
      "location_anchor": {"lat":49, "lon":15},
      "location_source": "attr_lat,attr_lon",
      "distance_type": "adaptive",
      "distance":"100 km"
    }
  }
}

location_anchor: Specifies the pin location, in degrees. Distances are calculated from this point.

location_source: Specifies the attributes that contain latitude and longitude.

distance_type: Specifies the distance calculation function. Can be either adaptive or haversine. adaptive is faster and more precise; for more details see GEODIST(). Optional, defaults to adaptive.

distance: Specifies the maximum distance from the pin location. All documents within this distance match. The distance can be specified in various units. If no unit is specified, the distance is assumed to be in meters. Here is a list of supported distance units:

Meter: m or meters
Kilometer: km or kilometers
Centimeter: cm or centimeters
Millimeter: mm or millimeters
Mile: mi or miles
Yard: yd or yards
Feet: ft or feet
Inch: in or inch
Nautical mile: NM, nmi or nauticalmiles
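To make the unit list concrete, here is a hypothetical Python helper that turns a distance string into meters using standard conversion factors (1 mi = 1609.344 m, 1 NM = 1852 m, and so on). It is only an illustration: Manticore's own parser may accept other spellings or ignore letter case, which this case-sensitive sketch does not attempt.

```python
import re

# Meters per unit, keyed by the unit spellings listed above.
METERS_PER_UNIT = {
    "m": 1.0, "meters": 1.0,
    "km": 1000.0, "kilometers": 1000.0,
    "cm": 0.01, "centimeters": 0.01,
    "mm": 0.001, "millimeters": 0.001,
    "mi": 1609.344, "miles": 1609.344,
    "yd": 0.9144, "yards": 0.9144,
    "ft": 0.3048, "feet": 0.3048,
    "in": 0.0254, "inch": 0.0254,
    "NM": 1852.0, "nmi": 1852.0, "nauticalmiles": 1852.0,
}

# A number, optional whitespace, then an optional alphabetic unit.
_DISTANCE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*$")


def distance_to_meters(text: str) -> float:
    """Convert a distance string such as "100 km" to meters (hypothetical helper)."""
    match = _DISTANCE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse distance: {text!r}")
    number, unit = float(match.group(1)), match.group(2)
    if not unit:
        return number  # no unit given: the value is taken as meters
    if unit not in METERS_PER_UNIT:
        raise ValueError(f"unknown distance unit: {unit!r}")
    return number * METERS_PER_UNIT[unit]


print(distance_to_meters("100 km"))   # 100000.0
print(distance_to_meters("10000 m"))  # 10000.0
```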

location_anchor and location_source properties accept the following latitude/longitude formats:

an object with lat and lon keys: { "lat":"attr_lat", "lon":"attr_lon" }
a string of the following structure: "attr_lat,attr_lon"
an array with the longitude and latitude, in that order: [attr_lon, attr_lat]

Latitude and longitude are specified in degrees.
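For distance_type haversine, the distance is the great-circle distance between the anchor and the document's coordinates. The Python sketch below uses the textbook haversine formula with an assumed mean Earth radius of 6,371 km; Manticore's exact constant and its adaptive algorithm are not reproduced, and the function names are hypothetical. Note that the functions take (lat, lon) pairs, unlike the array form described above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius; the engine's constant may differ


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny rounding errors pushing sqrt(a) above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_distance(anchor: tuple, point: tuple, max_m: float) -> bool:
    """Model of a geo_distance filter: keep point (lat, lon) if it is within max_m of anchor."""
    return haversine_m(anchor[0], anchor[1], point[0], point[1]) <= max_m


anchor = (49.0, 15.0)  # the location_anchor from the first geo_distance example
print(within_distance(anchor, (49.5, 15.0), 100_000))  # True: about 56 km away
print(within_distance(anchor, (50.0, 15.0), 100_000))  # False: about 111 km away
```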

geo_distance can be used as a filter in bool queries along with matches or other attribute filters:

{
  "index": "geodemo",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "*": "station"
          }
        },
        {
          "equals": {
            "state_code": "ENG"
          }
        },
        {
          "geo_distance": {
            "distance_type": "adaptive",
            "location_anchor": {
              "lat": 52.396,
              "lon": -1.774
            },
            "location_source": "latitude_deg,longitude_deg",
            "distance": "10000 m"
          }
        }
      ]
    }
  }
}

Expressions

Last modified: July 22, 2020

Manticore lets you use arbitrary arithmetic expressions both via SQL and HTTP, involving attribute values, internal attributes (document ID and relevance weight), arithmetic operations, a number of built-in functions, and user-defined functions. Here’s the complete reference list for quick access.

+, -, *, /, %, DIV, MOD

The standard arithmetic operators. Arithmetic calculations involving those can be performed in three different modes:

using single-precision, 32-bit IEEE 754 floating point values (the default)
using signed 32-bit integers
using 64-bit signed integers

The expression parser will automatically switch to integer mode if there are no operations that result in a floating point value. Otherwise, it will use the default floating point mode. For instance, a+b will be computed using 32-bit integers if both arguments are 32-bit integers; using 64-bit integers if both arguments are integers but one of them is 64-bit; or in floats otherwise. However, a/b or sqrt(a) will always be computed in floats, because these operations return a result of non-integer type. To get integer division instead, use IDIV(a,b) or the a DIV b form. Also, a*b will not be automatically promoted to 64-bit when the arguments are 32-bit. To enforce 64-bit results, you can use BIGINT(), but note that if there are non-integer operations, BIGINT() will simply be ignored.
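The difference between the modes is easy to see by emulating them. The Python sketch below models 32-bit integer arithmetic with two's-complement wraparound (an assumption about overflow behavior that this manual does not specify) and single-precision floats via the struct module; it does not call Manticore, and the helper names are hypothetical.

```python
import struct


def wrap_int32(x: int) -> int:
    """Reduce x to the signed 32-bit range, assuming two's-complement wraparound."""
    return (x + 2**31) % 2**32 - 2**31


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a, b = 100_000, 100_000
print(a * b)              # 10000000000: the exact product a 64-bit computation keeps
print(wrap_int32(a * b))  # 1410065408: the product when confined to 32 bits
print(7 / 2, 7 // 2)      # 3.5 3: float division vs integer division (non-negative operands)
print(to_float32(16_777_217.0))  # 16777216.0: float32 keeps only 24 significant bits
```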

<, >, <=, >=, =, <>

Comparison operators return 1.0 when the condition is true and 0.0 otherwise. For instance, (a=b)+3 will evaluate to 4 when attribute a is equal to attribute b, and to 3 when a is not. Unlike MySQL, the equality comparisons (i.e. the = and <> operators) introduce a small equality threshold (1e-6 by default). If the difference between the compared values is within the threshold, they will be considered equal. For multi-value attributes, the BETWEEN and IN operators return true if at least one value matches the condition (same as ANY()). IN doesn't support JSON attributes. IS (NOT) NULL is supported only for JSON attributes.
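A minimal Python model of these rules, assuming the threshold is applied to the absolute difference of the two values (the manual gives only the default, 1e-6). Comparison results are returned as 1.0/0.0 so they can be used in arithmetic like (a=b)+3, and mva_between illustrates the ANY() behavior of BETWEEN on a multi-value attribute, assuming inclusive bounds. All names are hypothetical.

```python
EQ_THRESHOLD = 1e-6  # default equality threshold quoted above


def expr_eq(a: float, b: float, eps: float = EQ_THRESHOLD) -> float:
    """Model of '=': 1.0 if a and b differ by at most eps (assumed absolute), else 0.0."""
    return 1.0 if abs(a - b) <= eps else 0.0


def expr_ne(a: float, b: float, eps: float = EQ_THRESHOLD) -> float:
    """Model of '<>': the complement of expr_eq."""
    return 1.0 - expr_eq(a, b, eps)


def mva_between(values, low, high) -> bool:
    """BETWEEN on a multi-value attribute: true if at least one value is in [low, high]."""
    return any(low <= v <= high for v in values)


print(expr_eq(0.1 + 0.2, 0.3) + 3)    # 4.0, although 0.1 + 0.2 == 0.3 is False in Python
print(expr_eq(1.0, 2.0) + 3)          # 3.0
print(mva_between([1, 5, 20], 4, 6))  # True: 5 is in range
```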

AND, OR, NOT

Boolean operators (AND, OR, NOT) behave as usual. They are left-associative and have the lowest priority compared to other operators. NOT has higher priority than AND and OR, but still lower than any other operator. AND and OR have the same priority, so using brackets is recommended to avoid confusion in complex expressions.
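Because AND and OR share one priority level and associate to the left, an unbracketed mix can evaluate differently from standard SQL, where AND binds tighter. The Python sketch below applies the rule as stated above literally to 1 OR 1 AND 0; it illustrates the stated precedence and is not Manticore's parser (eval_equal_priority is a hypothetical name).

```python
def eval_equal_priority(first: bool, rest) -> bool:
    """Fold (operator, operand) pairs strictly left to right, AND and OR having equal priority."""
    result = first
    for op, operand in rest:
        if op == "AND":
            result = result and operand
        elif op == "OR":
            result = result or operand
        else:
            raise ValueError(f"unsupported operator: {op}")
    return result


# 1 OR 1 AND 0
left_assoc = eval_equal_priority(True, [("OR", True), ("AND", False)])  # (1 OR 1) AND 0
and_first = True or (True and False)  # standard SQL / Python precedence: 1 OR (1 AND 0)
print(left_assoc, and_first)  # False True
```

Writing the brackets explicitly, e.g. 1 OR (1 AND 0), removes the ambiguity.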

&, |

These operators perform bitwise AND and OR respectively. The operands must be of integer types.

In the HTTP JSON interface, expressions are supported via script_fields and expressions.

{
    "index": "test",
    "query": { 
        "match_all": {} 
    }, "script_fields": {
        "add_all": { 
            "script": { 
                "inline": "( gid * 10 ) | crc32(title)" 
            } 
        },
        "title_len": { 
            "script": { 
                "inline": "crc32(title)" 
            } 
        }
    }
}

In this example, two expressions are created: add_all and title_len. The first expression calculates ( gid * 10 ) | crc32(title) and stores the result in the add_all attribute. The second expression calculates crc32(title) and stores the result in the title_len attribute.

Only inline expressions are supported for now. The value of the inline property (the expression to compute) has the same syntax as SQL expressions.

The expression name can be used in filtering or sorting.

{
    "index":"movies_rt",
    "script_fields":{
        "cond1":{
            "script":{
                "inline":"actor_2_facebook_likes =296 OR movie_facebook_likes =37000"
            }
        },
        "cond2":{
            "script":{
                "inline":"IF (IN (content_rating,'TV-PG','PG'),2, IF(IN(content_rating,'TV-14','PG-13'),1,0))"
            }
        }
    },
    "limit":10,
    "sort":[
        {
            "cond2":"desc"
        },
        {
            "actor_1_name":"asc"
        },
        {
            "actor_2_name":"desc"
        }
    ],
    "profile":true,
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "*":"star"
                    }
                },
                {
                    "equals":{
                        "cond1":1
                    }
                }
            ],
            "must_not":[
                {
                    "equals":{
                        "content_rating":"R"
                    }
                }
            ]
        }
    }
}

The expression values are included in the _source array of the result set by default. If the source is selective (see Source selection), the expression names can be added to the _source parameter in the request.

expressions is an alternative to script_fields with a simpler syntax. The example request below adds two expressions and stores the results in the add_all and title_len attributes.

{
  "index": "test",
  "query": { "match_all": {} },
  "expressions":
  {
      "add_all": "( gid * 10 ) | crc32(title)",
      "title_len": "crc32(title)"
  }
}
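Since both forms describe named inline expressions, the compact expressions object maps one-to-one onto script_fields. The hypothetical Python helper below performs that rewrite and prints the resulting request body, which is equivalent to the first script_fields example; sending it to the server is left out.

```python
import json


def expressions_to_script_fields(expressions: dict) -> dict:
    """Rewrite the compact "expressions" form into the equivalent "script_fields" form."""
    return {name: {"script": {"inline": expr}} for name, expr in expressions.items()}


expressions = {
    "add_all": "( gid * 10 ) | crc32(title)",
    "title_len": "crc32(title)",
}
request = {
    "index": "test",
    "query": {"match_all": {}},
    "script_fields": expressions_to_script_fields(expressions),
}
print(json.dumps(request, indent=2))
```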

Search options

The SQL SELECT clause supports a number of options that can be used to fine-tune search behaviour.

SELECT ... OPTION <optionname>=<value> [ , ... ]

Example:

SELECT * FROM test WHERE MATCH('@title hello @body world')
OPTION ranker=bm25, max_matches=3000,
    field_weights=(title=10, body=3), agent_query_timeout=10000

Supported options and their allowed values are:

agent_query_timeout: Integer. Max time in milliseconds to wait for remote queries to complete, see this section.

boolean_simplify: 0 or 1, enables simplifying the query to speed it up.

comment: String, user comment that gets copied to a query log file.

cutoff: Integer. Max found matches threshold.

field_weights: Named integer list (per-field user weights for ranking).

global_idf: Use global statistics (frequencies) from the global_idf file for IDF computations.

idf: Quoted, comma-separated list of IDF computation flags. Known flags are:

normalized: BM25 variant, idf = log((N-n+1)/n), as per Robertson et al
plain: plain variant, idf = log(N/n), as per Sparck-Jones
tfidf_normalized: additionally divide IDF by query word count, so that TF*IDF fits into [0, 1] range
tfidf_unnormalized: do not additionally divide IDF by query word count

Here N is the collection size and n is the number of matched documents.

The historical default IDF (Inverse Document Frequency) in Manticore is equivalent to OPTION idf='normalized,tfidf_normalized', and those normalizations may cause several undesired effects.

First, idf=normalized causes keyword penalization. For instance, if you search for the | something and the occurs in more than 50% of the documents, then documents with both keywords the and something will get less weight than documents with just one keyword something. Using OPTION idf=plain avoids this. Plain IDF varies in [0, log(N)] range, and keywords are never penalized; while the normalized IDF varies in [-log(N), log(N)] range, and too frequent keywords are penalized.

Second, idf=tfidf_normalized causes IDF drift over queries. Historically, we additionally divided IDF by query keyword count, so that the entire sum(tf*idf) over all keywords would still fit into [0,1] range. However, that means that queries word1 and word1 | nonmatchingword2 would assign different weights to the exact same result set, because the IDFs for both word1 and nonmatchingword2 would be divided by 2. OPTION idf='tfidf_unnormalized' fixes that. Note that BM25, BM25A, BM25F() ranking factors will be scaled accordingly once you disable this normalization.

IDF flags can be mixed; plain and normalized are mutually exclusive; tfidf_unnormalized and tfidf_normalized are mutually exclusive; and unspecified flags in such a mutually exclusive group take their defaults. That means that OPTION idf=plain is equivalent to a complete OPTION idf='plain,tfidf_normalized' specification.
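The effect of the flags can be checked numerically. The Python sketch below follows only the formulas quoted in the flag list (the engine may apply additional internal scaling that is not shown here); idf() and its parameters are hypothetical. It reproduces both issues: with normalized IDF a keyword present in more than half of the documents gets a negative IDF, and with tfidf_normalized the IDF of the same keyword changes when a non-matching keyword is added to the query.

```python
import math


def idf(N: int, n: int, plain: bool = False, tfidf_normalized: bool = True,
        query_words: int = 1) -> float:
    """IDF per the quoted formulas (simplified sketch).

    N: collection size, n: number of matched documents,
    query_words: number of keywords in the query.
    """
    value = math.log(N / n) if plain else math.log((N - n + 1) / n)
    if tfidf_normalized:
        value /= query_words  # historical default: divide by query word count
    return value


N = 1000
# A keyword in 60% of documents: normalized IDF goes negative (penalization), plain does not.
print(idf(N, 600) < 0, idf(N, 600, plain=True) > 0)  # True True
# Same keyword, same matches, but adding a second query word halves its IDF (drift).
print(idf(N, 10, query_words=1), idf(N, 10, query_words=2))
```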

local_df: 0 or 1, automatically sum DFs over all the local parts of a distributed index, so that the IDF is consistent (and precise) over a locally sharded index.

index_weights: Named integer list. Per-index user weights for ranking.

max_matches: Integer. Per-query max matches value.

Maximum amount of matches that the server keeps in RAM for each index and can return to the client. Default is 1000.

Introduced in order to control and limit RAM usage, the max_matches setting defines how many matches will be kept in RAM while searching each index. Every match found is still processed, but only the best N of them are kept in memory and returned to the client in the end. Assume that the index contains 2,000,000 matches for the query. You rarely (if ever) need to retrieve all of them. Rather, you need to scan all of them, but only choose the “best” 500 at most by some criteria (i.e. sorted by relevance, or price, or anything else), and display those 500 matches to the end user in pages of 20 to 100 matches. Tracking only the best 500 matches is much more RAM- and CPU-efficient than keeping all 2,000,000 matches, sorting them, and then discarding everything but the first 20 needed to display the search results page. max_matches controls N in that "best N" amount.

This parameter noticeably affects per-query RAM and CPU usage. Values of 1,000 to 10,000 are generally fine, but higher limits must be used with care. Recklessly raising max_matches to 1,000,000 means that searchd will have to allocate and initialize a 1-million-entry match buffer for every query. That will obviously increase per-query RAM usage, and in some cases can also noticeably impact performance.
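The "best N" idea is a classic bounded priority queue: every match is scanned, but memory holds at most N entries. Below is a generic Python sketch of that technique using heapq; it is not searchd's code, and the matches are hypothetical (weight, doc_id) pairs.

```python
import heapq
import random
from typing import Iterable, List, Tuple

Match = Tuple[float, int]  # (weight, doc_id)


def top_n(matches: Iterable[Match], n: int) -> List[Match]:
    """Scan every match but keep only the n best, using O(n) memory."""
    heap: List[Match] = []  # min-heap: heap[0] is the worst match kept so far
    for match in matches:
        if len(heap) < n:
            heapq.heappush(heap, match)
        elif match > heap[0]:
            heapq.heapreplace(heap, match)  # evict the current worst, keep the new match
    return sorted(heap, reverse=True)  # best first


random.seed(42)
# 200,000 random matches stand in for the 2,000,000 in the text.
scanned = ((random.random(), doc_id) for doc_id in range(200_000))
best = top_n(scanned, 500)  # like max_matches=500
print(len(best), best[0] >= best[-1])  # 500 True
```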

max_query_time: Sets the maximum search query time, in milliseconds. Must be a non-negative integer. The default value is 0, which means "do not limit". Local search queries will be stopped once that much time has elapsed. Note that if you're performing a search that queries several local indexes, this limit applies to each index separately. Note also that this may slightly increase the query's response time; the overhead comes from constantly checking whether it's time to stop the query.

max_predicted_time: Integer. Max predicted search time, see predicted_time_costs.

ranker: Any of:

proximity_bm25
bm25
none
wordcount
proximity
matchany
fieldmask
sph04
expr
export

Refer to Search results ranking for more details on each ranker.

retry_count: Integer. Distributed retries count.

retry_delay: Integer. Distributed retry delay, msec.

reverse_scan: 0 or 1, lets you control the order in which a full-scan query processes the rows.

sort_method: One of:

pq - priority queue, set by default
kbuffer - gives faster sorting for already pre-sorted data, e.g. index data sorted by id

The result set is the same in both cases; picking one option or the other may just improve (or worsen!) performance.

rand_seed: Lets you specify a specific integer seed value for an ORDER BY RAND() query, for example: ... OPTION rand_seed=1234. By default, a new and different seed value is autogenerated for every query.

low_priority: Runs the query with idle priority.

expand_keywords: 0 or 1, expand keywords with exact forms and/or stars when possible. Refer to expand_keywords for more details.

token_filter: Quoted, colon-separated string of library name:plugin name:optional string of settings. A query-time token filter is created on each search, whenever full-text matching is invoked, by every index involved; it lets you implement a custom tokenizer that makes tokens according to custom rules.

SELECT * FROM index WHERE MATCH ('yes@no') OPTION token_filter='mylib.so:blend:@'

morphology: none allows replacing all query terms with their exact forms if the index was built with index_exact_words enabled. Useful to prevent stemming or lemmatizing query terms.

Highlighting

Highlighting allows you to get highlighted text fragments (called snippets) from documents that contain matching keywords.

SQL's HIGHLIGHT() function, the "highlight" property in JSON queries via HTTP, and the highlight() function in the PHP client all use built-in document storage (enabled by default) to retrieve original field contents.

SELECT HIGHLIGHT() FROM books WHERE MATCH('try');

POST /search
{
  "index": "books",
  "query":  {  "match": { "*" : "try" }  },
  "highlight": {}
}

$results = $index->search('try')->highlight()->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.': '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet) 
        {
            echo "- ".$snippet."\n";
        }
    }
}

Response

+----------------------------------------------------------+
| highlight()                                              |
+----------------------------------------------------------+
| Don`t <b>try</b> to compete in childishness, said Bliss. |
+----------------------------------------------------------+
1 row in set (0.00 sec)

{
  "took":1,
  "timed_out":false,
  "hits":
  {
    "total":1,
    "hits":
    [
      {
        "_id":"4",
        "_score":1704,
        "_source":
        {
          "title":"Book four",
          "content":"Don`t try to compete in childishness, said Bliss."
        },
        "highlight":
        {
          "title": ["Book four"],
          "content": ["Don`t <b>try</b> to compete in childishness, said Bliss."]
        }
      }
    ]
  }
}

Document: 14
title: Book four
content: Don`t try to compete in childishness, said Bliss.
Highlight for title:
- Book four
Highlight for content:
- Don`t <b>try</b> to compete in childishness, said Bliss.

When using SQL to highlight search results, you get snippets from different fields concatenated into a single string. This is a limitation of the MySQL protocol. You can fine-tune the concatenation separators with the field_separator and snippet_separator options, see below.

When running JSON queries via HTTP or using the PHP client, there is no such limitation: the result set contains an array of fields, each holding an array of snippets (without the separators).

Note that snippet generation options such as limit, limit_words and limit_snippets are applied to each field separately (by default). You can change this behavior using the limits_per_field option, but it may lead to undesirable results. For example, one field may have matching keywords, yet no snippets from it end up in the result set because they didn't rank as high as the snippets from other fields in the highlighting engine.

The highlighting algorithm currently favors better snippets (with closer phrase matches), and then snippets with keywords not yet included in the result. Generally, it tries to highlight the best match with the query, and also to highlight all the query keywords, as far as the limits allow. If there are no matches in the current field, the beginning of the document, trimmed down according to the limits, is returned by default. You can return an empty string instead by setting the allow_empty option to 1.

Highlighting is performed at the so-called post limit stage, meaning that snippet generation is postponed not just until the entire final result set is ready, but even until after the LIMIT clause is applied. For example, with a LIMIT 20,10 clause, the HIGHLIGHT() function will be called at most 10 times.

Several additional optional settings can be used to fine-tune snippet generation. Most of them are common to SQL, HTTP and the PHP client.

before_match: A string to insert before a keyword match. A %SNIPPET_ID% macro can be used in this string. The first occurrence of the macro is replaced with an incrementing snippet number within the current snippet. Numbering starts at 1 by default but can be overridden with the start_snippet_id option. %SNIPPET_ID% restarts at the start of every new document. Default is <b>.

after_match: A string to insert after a keyword match. Default is </b>.

limit: Maximum snippet size, in symbols (codepoints). Default is 256. Per-field by default, see limits_per_field.

limit_words: Limits the maximum number of words that can be included in the result. Note the limit applies to any words, and not just the matched keywords to highlight. For example, if we are highlighting Mary and a snippet Mary had a little lamb is selected, then it contributes 5 words to this limit, not just 1. Default is 0 (no limit). Per-field by default, see limits_per_field.

limit_snippets: Limits the maximum number of snippets that can be included in the result. Default is 0 (no limit). Per-field by default, see limits_per_field.

limits_per_field: Selects whether limit, limit_words and limit_snippets work as individual limits in every field of the document being highlighted or as global limits for the whole document. Setting this option to 0 means that all combined highlighting results for one document must be within the specified limits. The downside is that you may get several snippets highlighted in one field and none in another if the highlighting engine decides that they are more relevant. Default is 1 (use per-field limits).

around: How many words to pick around each matching keyword block. Default is 5.

use_boundaries: Whether to additionally break snippets by phrase boundary characters, as configured in index settings with the phrase_boundary directive. Default is 0 (don't use boundaries).

weight_order: Whether to sort the extracted snippets in order of relevance (decreasing weight), or in order of appearance in the document (increasing position). Default is 0 (don't use weight order).

force_all_words: Ignores the length limit until the result includes all the keywords. Default is 0 (don't force all keywords).

start_snippet_id: Specifies the starting value of the %SNIPPET_ID% macro (which gets detected and expanded in the before_match and after_match strings). Default is 1.

html_strip_mode: HTML stripping mode setting. Defaults to index, which means that index settings will be used. The other values are none and strip, which forcibly skip or apply stripping regardless of index settings; and retain, which retains HTML markup and protects it from highlighting. The retain mode can only be used when highlighting full documents and thus requires that no snippet size limits are set. String, allowed values are none, strip, index, and retain.

allow_empty: Allows an empty string to be returned as the highlighting result when no snippets could be generated in the current field (no keywords match, or no snippets fit the limit). By default, the beginning of the original text is returned instead of an empty string. Default is 0 (don't allow empty result).

snippet_boundary: Ensures that snippets do not cross a sentence, paragraph, or zone boundary (when used with an index that has the respective indexing settings enabled). String, allowed values are sentence, paragraph, and zone.

emit_zones: Emits an HTML tag with an enclosing zone name before each snippet. Default is 0 (don't emit zone names).

force_snippets: Whether to force snippet generation even if limits allow highlighting the whole text. Default is 0 (don't force snippet generation).
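To show how the around setting (how many words to pick around each keyword block), before_match, after_match and %SNIPPET_ID% interact, here is a deliberately simplified Python snippet builder. It is a toy under stated assumptions, not Manticore's highlighting algorithm: it keeps around words on each side of every keyword hit, merges overlapping or touching windows, numbers the resulting snippets starting at start_snippet_id, and ignores ranking, limit, limit_words and boundaries. make_snippets is a hypothetical name.

```python
import re
from typing import List, Set


def make_snippets(text: str, keywords: Set[str], around: int = 5,
                  before_match: str = "<b>", after_match: str = "</b>",
                  start_snippet_id: int = 1) -> List[str]:
    """Toy snippet builder; keywords are expected in lowercase."""
    words = text.split()
    # Strip punctuation for matching only (in this toy, punctuation stays inside the markers).
    norm = [re.sub(r"\W", "", w).lower() for w in words]
    hits = [i for i, w in enumerate(norm) if w in keywords]
    # Build [start, end) word windows around each hit, merging overlapping or touching ones.
    windows: List[List[int]] = []
    for i in hits:
        start, end = max(0, i - around), min(len(words), i + around + 1)
        if windows and start <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    snippets = []
    for snippet_id, (start, end) in enumerate(windows, start=start_snippet_id):
        # Only the first %SNIPPET_ID% occurrence is replaced, as described above.
        before = before_match.replace("%SNIPPET_ID%", str(snippet_id), 1)
        out = [f"{before}{words[i]}{after_match}" if norm[i] in keywords else words[i]
               for i in range(start, end)]
        snippets.append(" ".join(out))
    return snippets


text = "Don`t try to compete in childishness, said Bliss."
print(make_snippets(text, {"try", "said"}, around=2))
# ['Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.']
```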

SELECT HIGHLIGHT({limit=50}) FROM books WHERE MATCH('try|gets|down|said');

POST /search
{
  "index": "books",
  "query": {"query_string": "try|gets|down|said"},
  "highlight": { "limit":50 }
}

$results = $index->search('try|gets|down|said')->highlight([],['limit'=>50])->get();
foreach($results as $doc)
{
    echo 'Document: '.$doc->getId()."\n";
    foreach($doc->getData() as $field=>$value)
    {
        echo $field.': '.$value."\n";
    }
    foreach($doc->getHighlight() as $field=>$snippets)
    {
        echo "Highlight for ".$field.":\n";
        foreach($snippets as $snippet)
        {
            echo $snippet."\n";
        }
    }
}

Response

+---------------------------------------------------------------------------+
| highlight({limit=50})                                                     |
+---------------------------------------------------------------------------+
|  ... , "It <b>gets</b> infantile pleasure  ...  to knock it <b>down</b>." |
| Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.           |
|  ...  a small room. Bander <b>said</b>, "Come, half-humans, I ...         |
+---------------------------------------------------------------------------+
3 rows in set (0.00 sec)

{
  "took":2,
  "timed_out":false,
  "hits":
  {
    "total":3,
    "hits":
    [
      {
        "_id":"3",
        "_score":1602,
        "_source":
        {
          "title":"Book three",
          "content":"Trevize whispered, \"It gets infantile pleasure out of display. I`d love to knock it down.\""
        },
        "highlight":
        {
          "title":
          [
            "Book three"
          ],
          "content":
          [
            ", \"It <b>gets</b> infantile pleasure ",
            " to knock it <b>down</b>.\""
          ]
        }
      },
      {
        "_id":"4",
        "_score":1573,
        "_source":
        {
          "title":"Book four",
          "content":"Don`t try to compete in childishness, said Bliss."
        },
        "highlight":
        {
          "title":
          [
            "Book four"
          ],
          "content":
          [
            "Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss."
          ]
        }
      },
      {
        "_id":"2",
        "_score":1521,
        "_source":
        {
          "title":"Book two",
          "content":"A door opened before them, revealing a small room. Bander said, \"Come, half-humans, I want to show you how we live.\""
        },
        "highlight":
        {
          "title":
          [
            "Book two"
          ],
          "content":
          [
            " a small room. Bander <b>said</b>, \"Come, half-humans, I"
          ]
        }
      }
    ]
  }
}

Document: 3
title: Book three
content: Trevize whispered, "It gets infantile pleasure out of display. I`d love to knock it down."
Highlight for title:
- Book three
Highlight for content:
, "It <b>gets</b> infantile pleasure 
to knock it <b>down</b>."

Document: 4
title: Book four
content: Don`t try to compete in childishness, said Bliss.
Highlight for title:
- Book four
Highlight for content:
Don`t <b>try</b> to compete in childishness, <b>said</b> Bliss.

Document: 2
title: Book two
content: A door opened before them, revealing a small room. Bander said, "Come, half-humans, I want to show you how we live."
Highlight for title:
- Book two
Highlight for content:
 a small room. Bander <b>said</b>, "Come, half-humans, I

The HIGHLIGHT() function can be used to highlight search results. Here's the syntax:

HIGHLIGHT([options], [field_list], [query] )

By default, it works with no arguments.

SELECT HIGHLIGHT() FROM books WHERE MATCH('before');

Response

+-----------------------------------------------------------+
| highlight()                                               |
+-----------------------------------------------------------+
| A door opened <b>before</b> them, revealing a small room. |
+-----------------------------------------------------------+
1 row in set (0.00 sec)

HIGHLIGHT() fetches all available full-text fields from document storage and highlights them against the given query. It supports field syntax in queries. Field text is separated by field_separator, which can be changed in the options.

SELECT HIGHLIGHT() FROM books WHERE MATCH('@title one');

Response

+-----------------+
| highlight()     |
+-----------------+
| Book <b>one</b> |
+-----------------+
1 row in set (0.00 sec)

The optional first argument of HIGHLIGHT() is a list of options.

SELECT HIGHLIGHT({before_match='[match]',after_match='[/match]'}) FROM books WHERE MATCH('@title one');

Response

+------------------------------------------------------------+
| highlight({before_match='[match]',after_match='[/match]'}) |
+------------------------------------------------------------+
| Book [match]one[/match]                                    |
+------------------------------------------------------------+
1 row in set (0.00 sec)

The optional second argument is a string containing a field or a comma-separated list of fields. If this argument is present, only the specified fields will be fetched from document storage and highlighted. An empty string as the second argument means "fetch all available fields".

SELECT HIGHLIGHT({},'title,content') FROM books WHERE MATCH('one|robots');

Response

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| highlight({},'title,content')                                                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Book <b>one</b> | They followed Bander. The <b>robots</b> remained at a polite distance, but their presence was a constantly felt threat.                                             |
| Bander ushered all three into the room. <b>One</b> of the <b>robots</b> followed as well. Bander gestured the other <b>robots</b> away and entered itself. The door closed behind it. |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Another way to use the second argument is to specify a string attribute or field name without quotes. In this case, the supplied string is highlighted against the provided query, but field syntax is ignored.

SELECT HIGHLIGHT({}, title) FROM books WHERE MATCH('one');

Response

+---------------------+
| highlight({},title) |
+---------------------+
| Book <b>one</b>     |
| Book five           |
+---------------------+
2 rows in set (0.00 sec)

The optional third argument is the query. It is used to highlight search results against a query different from the one used for searching.

SQL

SELECT HIGHLIGHT({},'title', 'five') FROM books WHERE MATCH('one');

Response

+-------------------------------+
| highlight({},'title', 'five') |
+-------------------------------+
| Book one                      |
| Book <b>five</b>              |
+-------------------------------+
2 rows in set (0.00 sec)

While HIGHLIGHT() is designed to work with stored full-text fields and string attributes, it can also be used to highlight arbitrary text. Note that in this case, if the query contains any field search operators (e.g. @title hello @body world), their field part is ignored.

SQL

SELECT HIGHLIGHT({},TO_STRING('some text to highlight'), 'highlight') FROM books WHERE MATCH('@title one');

Response

+----------------------------------------------------------------+
| highlight({},TO_STRING('some text to highlight'), 'highlight') |
+----------------------------------------------------------------+
| some text to <b>highlight</b>                                  |
+----------------------------------------------------------------+
1 row in set (0.00 sec)

The following options make sense only when a single string is generated as the result (not an array of snippets), so they apply only to the SQL HIGHLIGHT() function:

snippet_separator is a string to insert between snippets. Default is "...".

field_separator is a string to insert between fields. Default is "|".
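As a rough mental model only, not Manticore's actual implementation, the single-string result can be pictured as each field's snippets joined with snippet_separator, and the per-field strings then joined with field_separator. The Python function join_highlight below is hypothetical; its defaults follow the values stated above, and the spaced separators in the usage are chosen only for readability.

```python
def join_highlight(fields, snippet_separator="...", field_separator="|"):
    """Conceptual model: combine per-field snippet lists into one string.

    `fields` is a list of snippet lists, one list per field, in field order.
    """
    # Join the snippets inside each field first...
    per_field = [snippet_separator.join(snippets) for snippets in fields]
    # ...then join the per-field strings.
    return field_separator.join(per_field)


result = join_highlight(
    [["Book <b>one</b>"], ["The <b>robots</b> remained", "<b>One</b> of the <b>robots</b>"]],
    snippet_separator=" ... ",
    field_separator=" | ",
)
print(result)
```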

Another way to highlight text is to use the CALL SNIPPETS statement. It mostly duplicates HIGHLIGHT() functionality, but it can't use built-in document storage. It can, however, load source text from files.

To highlight full-text search results in JSON queries via HTTP, field contents have to be stored in document storage (enabled by default). In the following example, the full-text field content is fetched from document storage and highlighted against the query specified in the query clause.

Highlighted snippets are returned in the highlight property of the hits array.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": ["content"]
  }
}

Response

{
  "took":1,
  "timed_out":false,
  "hits":
  {
    "total":1,
    "hits":
    [
      {
        "_id":"5",
        "_score":1602,
        "_source":
        {
          "title":"Book five",
          "content":"Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it."
        },
        "highlight":
        {
          "content":
          [
            "Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away <b>and</b> entered itself. The door closed behind it."
          ]
        }
      }
    ]
  }
}
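A minimal Python sketch of sending such a request and reading the highlight property of each hit, using only the standard library. It assumes searchd's HTTP listener is reachable at http://localhost:9308 (adjust to your configuration); the helper names search and highlights_by_id are hypothetical. The offline part runs against a trimmed copy of the response shown above.

```python
import json
import urllib.request

# Assumption: searchd's HTTP listener is reachable here; change host/port to
# match the listen setting in your configuration.
SEARCH_URL = "http://localhost:9308/search"


def search(body, url=SEARCH_URL):
    """POST a JSON query to /search and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def highlights_by_id(result):
    """Map each hit's _id to its "highlight" object (field -> list of snippets)."""
    return {hit["_id"]: hit.get("highlight", {}) for hit in result["hits"]["hits"]}


# Offline check against a trimmed copy of the response shown above.
sample = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "5",
                "_score": 1602,
                "highlight": {
                    "content": [
                        "Bander gestured the other robots away <b>and</b> entered itself."
                    ]
                },
            }
        ],
    }
}
print(highlights_by_id(sample)["5"]["content"][0])

# Against a running server (not executed here):
# result = search({"index": "books",
#                  "query": {"match": {"content": "and first"}},
#                  "highlight": {"fields": ["content"]}})
# print(highlights_by_id(result))
```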

To highlight all available fields, pass an empty object as the highlight property.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight": {}
}

Response

{
  "took":1,
  "timed_out":false,
  "hits":
  {
    "total":1,
    "hits":
    [
      {
        "_id":"5",
        "_score":1602,
        "_source":
        {
          "title":"Book five",
          "content":"Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away and entered itself. The door closed behind it."
        },
        "highlight":
        {
          "title":
          [
            "Book five"
          ],
          "content":
          [
            "Bander ushered all three into the room. One of the robots followed as well. Bander gestured the other robots away <b>and</b> entered itself. The door closed behind it."
          ]
        }
      }
    ]
  }
}

In addition to common highlighting options, several synonyms are available for JSON queries via HTTP:

fields is an object that maps field names to their highlighting options. It can also be an array of field names (without any options).

encoder can be set to default or html. When set to html, HTML markup is retained when highlighting. It works similarly to the html_strip_mode=retain option.

highlight_query makes it possible to highlight against a query other than the search query. Its syntax is the same as in the main query.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "highlight_query": { "match": { "*":"into three five" } }
  }
}
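Because fields accepts either an array of names or an object of per-field options, client code that builds requests may want a single form to work with. A minimal Python sketch, assuming that an array entry means the same as that name mapped to an empty options object (like "title": {} in the per-field example later in this section); normalize_fields and highlight_block are hypothetical helpers, and the request they build mirrors the highlight_query example above.

```python
import json


def normalize_fields(fields):
    """Return the highlight "fields" value in object form: {name: options}.

    Assumption: ["a", "b"] means the same as {"a": {}, "b": {}}.
    """
    if isinstance(fields, dict):
        # Copy each options object so callers can modify the result safely.
        return {name: dict(options) for name, options in fields.items()}
    return {name: {} for name in fields}


def highlight_block(fields, highlight_query=None):
    """Build a "highlight" object, optionally with a separate highlight_query."""
    block = {"fields": normalize_fields(fields)}
    if highlight_query is not None:
        block["highlight_query"] = highlight_query
    return block


body = {
    "index": "books",
    "query": {"match": {"content": "and first"}},
    "highlight": highlight_block(
        ["content", "title"],
        highlight_query={"match": {"*": "into three five"}},
    ),
}
print(json.dumps(body, indent=2))
```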

pre_tags and post_tags set the opening and closing tags for highlighted text snippets. They work similarly to the before_match and after_match options. Optional; defaults are <b> and </b>.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "pre_tags": "before_",
    "post_tags": "_after"
  }
}
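Distinctive tags such as these make highlighted terms easy to pick out in client code. A small Python sketch, assuming the tags never occur naturally in the field text; marked_terms is a hypothetical helper, shown here on made-up snippets rather than real server output.

```python
import re


def marked_terms(snippet, pre_tag="before_", post_tag="_after"):
    """Return the pieces of text wrapped in pre_tag ... post_tag, in order.

    The tags are escaped so they are matched literally, and the non-greedy
    group stops at the first closing tag after each opening tag.
    """
    pattern = re.escape(pre_tag) + "(.*?)" + re.escape(post_tag)
    return re.findall(pattern, snippet)


print(marked_terms("Bander ushered all before_three_after before_into_after the room."))
```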

no_match_size works similarly to the allow_empty option. If set to 0, it acts as allow_empty=1, i.e. an empty string may be returned as the highlighting result when a snippet could not be generated. Otherwise, the beginning of the field is returned. Optional, default is 1.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "no_match_size": 0
  }
}
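With no_match_size set to 0, a client has to be prepared for an empty highlighting result. One possible fallback, sketched in Python, is to show the stored value from _source instead. The helper display_text is hypothetical, and because the exact shape of an empty result is not specified here, the sketch treats a missing key, an empty list, and empty strings the same way; the " ... " used to join snippets is an arbitrary choice.

```python
def display_text(hit, field, joiner=" ... "):
    """Return highlighted snippets for `field`, or the stored value if none.

    Assumption: an empty highlighting result may arrive as a missing key, an
    empty list, or a list of empty strings; all three trigger the fallback.
    """
    snippets = [s for s in hit.get("highlight", {}).get(field, []) if s]
    if snippets:
        return joiner.join(snippets)
    # Fall back to the stored field value, or "" if it was not returned.
    return hit.get("_source", {}).get(field, "")


hit = {"_source": {"title": "Book five"}, "highlight": {"title": [""]}}
print(display_text(hit, "title"))
```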

order sets the sorting order of extracted snippets. If set to "score", the extracted snippets are sorted by relevance. Optional. Works similarly to the weight_order option.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "order": "score"
  }
}

fragment_size sets the maximum snippet size in symbols. It can be global or per-field; per-field options override global ones. Optional, default is 256. Works similarly to the limit option.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "fragment_size": 100
  }
}

number_of_fragments limits the maximum number of snippets in the result. Like fragment_size, it can be global or per-field. Optional, default is 0 (no limit). Works similarly to the limit_snippets option.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields": [ "content", "title" ],
    "number_of_fragments": 10
  }
}

Options such as limit, limit_words, and limit_snippets can be set as global or per-field options. Global options are used as per-field limits unless per-field options override them. In the following example, the title field is highlighted with the default limit settings, while the content field uses a different limit.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "fields":
    {
      "title": {},
      "content": { "limit": 50 }
    }
  }
}
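The override rule can be modeled in a few lines of Python: start from the global options and let each field's own options win. This is a conceptual model of the documented behavior, not the engine's code; effective_options is hypothetical and treats every key other than fields as a global option.

```python
def effective_options(highlight):
    """Resolve per-field settings: per-field values override global ones.

    Conceptual model only. Keys other than "fields" are treated as global
    options; fields given as an array get no per-field overrides.
    """
    fields = highlight.get("fields", {})
    if isinstance(fields, list):
        fields = {name: {} for name in fields}
    global_opts = {k: v for k, v in highlight.items() if k != "fields"}
    # Later keys win in a dict merge, so per-field options override globals.
    return {name: {**global_opts, **opts} for name, opts in fields.items()}


request_highlight = {
    "fragment_size": 100,
    "number_of_fragments": 10,
    "fields": {"title": {}, "content": {"fragment_size": 50}},
}
print(effective_options(request_highlight))
```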

Global limits can also be forced by specifying limits_per_field=0. Setting this option means that all combined highlighting results must be within the specified limits. The downside is that you may get several snippets highlighted in one field and none in another if the highlighting engine decides that they are more relevant.

HTTP

POST /search
{
  "index": "books",
  "query": { "match": { "content": "and first" } },
  "highlight":
  {
    "limits_per_field": 0,
    "fields":
    {
      "content": { "limit": 50 }
    }
  }
}

The CALL SNIPPETS statement builds a snippet from the provided data and query using the text processing settings of the specified index. It can't access built-in document storage, which is why the HIGHLIGHT() function is recommended instead.

The syntax is:

CALL SNIPPETS(data, index, query[, opt_value AS opt_name[, ...]])

data is the source data to extract a snippet from. It can be a single string or a list of strings enclosed in parentheses.

index is the name of the index from which to take the text processing settings.

query is the full-text query to build snippets for.

opt_value and opt_name are the values and names of snippet generation options.

SQL

CALL SNIPPETS(('this is my document text','this is my another text'), 'forum', 'is text', 5 AS around, 200 AS limit);

Response

+----------------------------------------+
| snippet                                |
+----------------------------------------+
| this <b>is</b> my document <b>text</b> |
| this <b>is</b> my another <b>text</b>  |
+----------------------------------------+
2 rows in set (0.02 sec)
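When statements are generated from application code, quoting is the error-prone part. A hypothetical Python helper that reproduces the statement above; it assumes SphinxQL string literals are escaped with backslashes, so verify that against your server before relying on it. Booleans are written as 0/1, matching flags such as 1 AS load_files.

```python
def quote(value):
    """Quote a SphinxQL string literal (assumption: backslash escaping)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def call_snippets(data, index, query, **options):
    """Build a CALL SNIPPETS statement string.

    data may be a single string or a list of strings; a list is rendered as a
    parenthesized list. Options are emitted as `value AS name`.
    """
    if isinstance(data, str):
        data_sql = quote(data)
    else:
        data_sql = "(" + ",".join(quote(d) for d in data) + ")"
    parts = [data_sql, quote(index), quote(query)]
    for name, value in options.items():
        if isinstance(value, str):
            rendered = quote(value)
        elif isinstance(value, bool):  # check before int: bool is an int subclass
            rendered = str(int(value))
        else:
            rendered = str(value)
        parts.append(f"{rendered} AS {name}")
    return "CALL SNIPPETS(" + ", ".join(parts) + ");"


print(call_snippets(["this is my document text", "this is my another text"],
                    "forum", "is text", around=5, limit=200))
```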

Most options are the same as in the HIGHLIGHT() function. There are, however, several options that can only be used with CALL SNIPPETS. The following options can be used to highlight text stored in separate files:

load_files controls whether the first argument is treated as data to extract snippets from (the default behavior) or as file names whose contents are loaded on the server side. When this flag is enabled, up to dist_threads worker threads per request are created to parallelize the work. Default is 0. To parallelize snippet generation between remote agents, set the dist_threads parameter in the config to a value greater than 1, and then invoke snippet generation on a distributed index that contains only one(!) local agent and several remote ones. The snippets_file_prefix option is used to build the final file name. For example, when searchd is configured with snippets_file_prefix = /var/data_ and text.txt is provided as a file name, snippets are generated from the content of /var/data_text.txt.

load_files_scattered works only with distributed snippet generation using remote agents. Source files for snippet generation can be distributed among different agents, and the main server merges all non-erroneous results. For example, if one agent of the distributed index has file1.txt, another agent has file2.txt, and you use CALL SNIPPETS with both of these files, searchd merges the agent results, so you get results from both file1.txt and file2.txt. Default is 0.

If the load_files option is also enabled, the request returns an error if any of the files is not available anywhere. Otherwise (if load_files is not enabled), empty strings are returned for all absent files. The searchd daemon does not pass this flag to agents, so agents do not generate a critical error if a file does not exist. If you want to be sure that all source files are loaded, set both load_files_scattered and load_files to 1. If the absence of some source files on some agent is not critical, set only load_files_scattered to 1.

SQL

CALL SNIPPETS(('data/doc1.txt','data/doc2.txt'), 'forum', 'is text', 1 AS load_files);

Response

+----------------------------------------+
| snippet                                |
+----------------------------------------+
| this <b>is</b> my document <b>text</b> |
| this <b>is</b> my another <b>text</b>  |
+----------------------------------------+
2 rows in set (0.02 sec)
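The file-related rules described above can be summarized as a small Python model. Both functions are hypothetical illustrations of the documented behavior, not part of Manticore. resolve_snippet_file shows that snippets_file_prefix is prepended by plain string concatenation, so a prefix like /var/data_ is not treated as a directory. scattered_outcome shows what load_files changes when load_files_scattered is enabled; the "<snippet>" value is a placeholder for real snippet text.

```python
def resolve_snippet_file(prefix, name):
    """Final path: snippets_file_prefix + file name, as plain concatenation."""
    return prefix + name


def scattered_outcome(requested, available, load_files):
    """Model of CALL SNIPPETS with load_files_scattered=1.

    requested: file names passed as the first argument.
    available: names of files that at least one agent can read.
    Returns "error" when load_files is also enabled and any file is missing
    everywhere; otherwise maps each file to "<snippet>" (placeholder for real
    snippet text) or "" for files that could not be found.
    """
    missing = {name for name in requested if name not in available}
    if load_files and missing:
        return "error"
    return {name: "" if name in missing else "<snippet>" for name in requested}


print(resolve_snippet_file("/var/data_", "text.txt"))
print(scattered_outcome(["file1.txt", "file2.txt"], {"file1.txt"}, load_files=False))
```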

Last modified: July 22, 2020
