Query cache stores compressed result sets in memory and reuses them for subsequent queries when possible. You can configure it using the following directives:
- `qcache_max_bytes`, a limit on the RAM usage for cached query storage. Defaults to 16 MB. Setting `qcache_max_bytes` to 0 completely disables the query cache.
- `qcache_thresh_msec`, the minimum wall query time to cache. Queries that complete faster than this will not be cached. Defaults to 3000 msec, or 3 seconds.
- `qcache_ttl_sec`, cached entry TTL, or time to live. Queries will stay cached for this duration. Defaults to 60 seconds, or 1 minute.
These settings can be changed on the fly using the `SET GLOBAL` statement:
mysql> SET GLOBAL qcache_max_bytes=128000000;
These changes are applied immediately, and cached result sets that no longer satisfy the constraints are discarded at once. When reducing the cache size on the fly, MRU (most recently used) result sets win: the most recently used entries are kept, and the least recently used ones are evicted first.
Query cache operates as follows. When enabled, every full-text search result is completely stored in memory. This occurs after full-text matching, filtering, and ranking, so essentially we store `total_found` {docid,weight} pairs. Compressed matches can consume anywhere from 2 to 12 bytes per match on average, mostly depending on the deltas between subsequent docids. Once the query completes, we check the wall time and size thresholds, and either save the compressed result set for reuse or discard it.
Note that the query cache's impact on RAM is not limited by `qcache_max_bytes`! If you run, for example, 10 concurrent queries, each matching up to 1M matches (after filters), the peak temporary RAM usage will be in the range of 40 MB to 240 MB (roughly 4 to 24 bytes per match across the 10 million in-flight matches), even if the queries are fast enough and don't get cached.
Queries can use the cache when the table, the full-text query (i.e., `MATCH()` contents), and the ranker all match, and the filters are compatible. This means:
- The full-text part within `MATCH()` must be a bytewise match. Add a single extra space, and it's now a different query as far as the query cache is concerned.
- The ranker (and its parameters, if any, for user-defined rankers) must be a bytewise match.
- The filters must be a superset of the original filters. You can add extra filters and still hit the cache; in this case, the extra filters will be applied to the cached result. But if you remove one, that will be a new query again, as the sketch after this list shows.
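For example, here is a minimal sketch of these rules (the table name, query text, and `price` filter are made up for illustration):

-- assume this query ran longer than qcache_thresh_msec and got cached
SELECT id FROM mytable WHERE MATCH('blue bag');

-- bytewise-identical MATCH() and ranker, one extra filter: this can still hit
-- the cache; the extra price filter is applied to the cached result set
SELECT id FROM mytable WHERE MATCH('blue bag') AND price > 100;

-- a single extra space inside MATCH() makes it a different query (cache miss)
SELECT id FROM mytable WHERE MATCH('blue  bag');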
Cache entries expire with TTL and also get invalidated on table rotation, `TRUNCATE`, or `ATTACH`. Note that currently, entries are not invalidated on arbitrary RT table writes! So a cached query might return older results for the duration of its TTL.
You can inspect the current cache status with SHOW STATUS through the `qcache_XXX` variables:
mysql> SHOW STATUS LIKE 'qcache%';
+-----------------------+----------+
| Counter               | Value    |
+-----------------------+----------+
| qcache_max_bytes      | 16777216 |
| qcache_thresh_msec    | 3000     |
| qcache_ttl_sec        | 60       |
| qcache_cached_queries | 0        |
| qcache_used_bytes     | 0        |
| qcache_hits           | 0        |
+-----------------------+----------+
6 rows in set (0.00 sec)
Collations primarily impact string attribute comparisons. They define both the character set encoding and the strategy Manticore employs for comparing strings when performing `ORDER BY` or `GROUP BY` with a string attribute involved.
String attributes are stored as-is during indexing, and no character set or language information is attached to them. This is fine as long as Manticore only needs to store and return the strings to the calling application verbatim. However, when you ask Manticore to sort by a string value, the request immediately becomes ambiguous.
First, single-byte (ASCII, ISO-8859-1, or Windows-1251) strings need to be processed differently than UTF-8 strings, which may encode each character with a variable number of bytes. Thus, we need to know the character set type to properly interpret the raw bytes as meaningful characters.
Second, we also need to know the language-specific string sorting rules. For example, when sorting according to US rules in the en_US locale, the accented character `ï` (small letter `i` with diaeresis) should be placed somewhere after `z`. However, when sorting with French rules and the fr_FR locale in mind, it should be placed between `i` and `j`. Some other set of rules might choose to ignore accents altogether, allowing `ï` and `i` to be mixed arbitrarily.
Third, in some cases, we may require case-sensitive sorting, while in others, case-insensitive sorting is needed.
Collations encapsulate all of the following: the character set, the language rules, and the case sensitivity. Manticore currently provides four collations:
- `libc_ci`
- `libc_cs`
- `utf8_general_ci`
- `binary`
The first two collations rely on several standard C library (libc) calls and can thus support any locale installed on your system. They provide case-insensitive (`_ci`) and case-sensitive (`_cs`) comparisons, respectively. By default, they use the C locale, effectively resorting to bytewise comparisons. To change that, you need to specify a different available locale using the collation_libc_locale directive (a configuration sketch follows the locale list below). The list of locales available on your system can usually be obtained with the `locale` command:
$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
es_ES
fr_FR
POSIX
ru_RU.utf8
ru_UA.utf8
The specific list of system locales may vary. Consult your OS documentation to install any additional locales you need.
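For example, to have the libc collations compare strings using French rules, you could point them at one of the installed locales in your configuration (a sketch; the locale name must match one reported by `locale -a` on your system):

searchd {
    ...
    collation_libc_locale = fr_FR
    collation_server = libc_ci
}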
The `utf8_general_ci` and `binary` collations are built into Manticore. The former is a generic collation for UTF-8 data (without any so-called language tailoring); it should behave similarly to the `utf8_general_ci` collation in MySQL. The latter is a simple bytewise comparison.
Collation can be overridden via SQL on a per-session basis using the `SET collation_connection` statement. All subsequent SQL queries will then use this collation. Otherwise, all queries will use the server default collation, as specified by the collation_server configuration directive. Manticore currently defaults to the `libc_ci` collation.
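For instance, to switch the current session to the built-in UTF-8 case-insensitive collation (a minimal sketch; `mytable` and `title_attr` stand for any table and string attribute of yours):

mysql> SET collation_connection = 'utf8_general_ci';
mysql> SELECT * FROM mytable ORDER BY title_attr ASC;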
Collations affect all string attribute comparisons, including those within `ORDER BY` and `GROUP BY`, so differently ordered or grouped results can be returned depending on the collation chosen. Note that collations don't affect full-text searching; for that, use the charset_table.
When Manticore executes a fullscan query, it can either use a plain scan to check every document against the filters or employ additional data and/or algorithms to speed up query execution. Manticore uses a cost-based optimizer (CBO), also known as a "query optimizer," to determine which approach to take.
The CBO can also enhance the performance of full-text queries. See below for more details.
The CBO may decide to replace one or more query filters with one of the following entities if it determines that doing so will improve performance:
- A docid index utilizes a special docid-only secondary index stored in files with the `.spt` extension. Besides improving filters on document IDs, the docid index is also used to accelerate document ID to row ID lookups and to speed up the application of large killlists during daemon startup.
- A columnar scan relies on columnar storage and can only be used on a columnar attribute. It scans every value and tests it against the filter, but it is heavily optimized and is typically faster than the default approach.
- Secondary indexes are generated for all attributes (except JSON) by default. They use the PGM index along with Manticore's built-in inverted index to retrieve the list of row IDs corresponding to a value or range of values. Secondary indexes are stored in files with the `.spidx` and `.spjidx` extensions. For information on how to generate secondary indexes over JSON attributes, see json_secondary_indexes.
The optimizer estimates the cost of each execution path using various attribute statistics, including:
- Information on the data distribution within an attribute (histograms, stored in `.sphi` files). Histograms are generated automatically when data is indexed and serve as the primary source of information for the CBO.
- Information from PGM (secondary indexes), which helps estimate the number of document lists to read. This assists in gauging doclist merge performance and in selecting the appropriate merge algorithm (priority queue merge or bitmap merge).
- Columnar encoding statistics, employed to estimate columnar data decompression performance.
- A columnar min-max tree. While the CBO uses histograms to estimate the number of documents left after applying the filter, it also needs to determine how many documents the filter had to process. For columnar attributes, partial evaluation of the min-max tree serves this purpose.
- Full-text dictionary. The CBO utilizes term stats to estimate the cost of evaluating the full-text tree.
The optimizer computes the execution cost for every filter used in a query. Since certain filters can be replaced with several different entities (e.g., for a document id, Manticore can use a plain scan, a docid index lookup, a columnar scan if the document id is columnar, or a secondary index), the optimizer evaluates all available combinations, up to a maximum of 1024.
To estimate query execution costs, the optimizer calculates the estimated costs of the most significant operations performed when executing the query. It uses preset constants to represent the cost of each operation.
The optimizer compares the costs of each execution path and chooses the path with the lowest cost to execute the query.
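As a hypothetical illustration (the table and attribute names are made up), consider a fullscan query with two filters. The CBO costs the available combinations of entities and picks the cheapest overall plan:

-- suppose 'price' is a columnar attribute and 'category_id' a regular one;
-- the CBO may serve price < 500 with a columnar scan and category_id = 3 with
-- a secondary index, or fall back to a plain scan for either filter,
-- whichever combination has the lowest estimated cost
SELECT id FROM products WHERE price < 500 AND category_id = 3;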
When working with full-text queries that have attribute filters, the query optimizer decides between two possible execution paths. One is to execute the full-text query, retrieve the matches, and apply the filters to them. The other is to replace the filters with one or more of the entities described above, fetch rowids from them, and inject those rowids into the full-text matching tree; this way, full-text search results are intersected with fullscan results. The query optimizer estimates the cost of full-text tree evaluation and of the best possible path for computing the filter results, and uses this information to choose the execution path.
Another factor to consider is multithreaded query execution (when pseudo_sharding is enabled). The CBO is aware that some queries can be executed in multiple threads and takes this into account. The CBO prioritizes shorter query execution times (i.e., latency) over throughput. For instance, if a query using a columnar scan can be executed in multiple threads (and occupy multiple CPU cores) and is faster than a query executed in a single thread using secondary indexes, multithreaded execution will be preferred.
Queries using secondary indexes and docid indexes always run in a single thread, as benchmarks indicate that there is little to no benefit in making them multithreaded.
At present, the optimizer only uses CPU costs and does not take memory or disk usage into account.
Manticore Search supports adding embeddings generated by machine learning models to each document and then performing nearest-neighbor searches on them. This lets you build features like similarity search, recommendations, semantic search, and relevance ranking based on NLP algorithms, among others, including image, video, and sound searches.
An embedding is a method of representing data - such as text, images, or sound - as vectors in a high-dimensional space. These vectors are crafted to ensure that the distance between them reflects the similarity of the data they represent. This process typically employs algorithms like word embeddings (e.g., Word2Vec, BERT) for text or neural networks for images. The high-dimensional nature of the vector space, with many components per vector, allows for the representation of complex and nuanced relationships between items. Their similarity is gauged by the distance between these vectors, often measured using methods like Euclidean distance or cosine similarity.
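For reference, the two distance measures named above (which also appear below as `hnsw_similarity` options) are commonly defined as follows for vectors $a$ and $b$; the exact sign and normalization conventions vary by library:

$$
\mathrm{L2}(a,b)=\sum_i (a_i-b_i)^2, \qquad \cos(a,b)=\frac{\sum_i a_i b_i}{\lVert a\rVert\,\lVert b\rVert}
$$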
Manticore Search enables k-nearest neighbor (KNN) vector searches using the HNSW library. This functionality is part of the Manticore Columnar Library.
To run KNN searches, you must first configure your table. Float vectors and KNN search are only supported in real-time tables (not in plain tables). The table needs to have at least one float_vector attribute, which serves as a data vector. You need to specify the following properties:
- `knn_type`: A mandatory setting; currently, only `hnsw` is supported.
- `knn_dims`: A mandatory setting that specifies the dimensions of the vectors being indexed.
- `hnsw_similarity`: A mandatory setting that specifies the distance function used by the HNSW index. Acceptable values are:
  - `L2` - Squared L2
  - `IP` - Inner product
  - `COSINE` - Cosine similarity

  Note: When using `COSINE` similarity, vectors are automatically normalized upon insertion. This means the stored vector values may differ from the original input values, as they will be converted to unit vectors (vectors with a mathematical length/magnitude of 1.0) to enable efficient cosine similarity calculations. This normalization preserves the direction of the vector while standardizing its length.
- `hnsw_m`: An optional setting that defines the maximum number of outgoing connections in the graph. The default is 16.
- `hnsw_ef_construction`: An optional setting that defines a construction time/accuracy trade-off.
- SQL
- Config
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' );
table test_vec {
type = rt
...
rt_attr_float_vector = image_vector
knn = {"attrs":[{"name":"image_vector","type":"hnsw","dims":4,"hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200}]}
}
Query OK, 0 rows affected (0.01 sec)
The easiest way to work with vector data is using auto embeddings. With this feature, you create a table with the `MODEL_NAME` and `FROM` parameters, then simply insert your text data - Manticore automatically generates embeddings for you.
When creating a table for auto embeddings, specify:
- `MODEL_NAME`: The embedding model to use
- `FROM`: Which fields to use for embedding generation (empty means all text/string fields)
Supported embedding models:
- Sentence Transformers: Any suitable BERT-based Hugging Face model (e.g., `sentence-transformers/all-MiniLM-L6-v2`); no API key needed. Manticore downloads the model when you create the table.
- OpenAI: OpenAI embedding models like `openai/text-embedding-ada-002`; requires the `API_KEY='<OPENAI_API_KEY>'` parameter
- Voyage: Voyage AI embedding models; requires the `API_KEY='<VOYAGE_API_KEY>'` parameter
- Jina: Jina AI embedding models; requires the `API_KEY='<JINA_API_KEY>'` parameter
More information about setting up a `float_vector` attribute can be found here.
- SQL
Using sentence-transformers (no API key needed)
CREATE TABLE products (
title TEXT,
description TEXT,
embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title'
);
Using OpenAI (requires API_KEY parameter)
CREATE TABLE products_openai (
title TEXT,
description TEXT,
embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
MODEL_NAME='openai/text-embedding-ada-002' FROM='title,description' API_KEY='...'
);
Using all text fields for embeddings (FROM is empty)
CREATE TABLE products_all (
title TEXT,
description TEXT,
embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM=''
);
When using auto embeddings, do not specify the vector field in your INSERT statement. The embeddings are generated automatically from the text fields specified in the `FROM` parameter.
- SQL
Insert text data only - embeddings generated automatically
INSERT INTO products (title) VALUES
('machine learning artificial intelligence'),
('banana fruit sweet yellow');
Insert multiple fields - both used for embedding if FROM='title,description'
INSERT INTO products_openai (title, description) VALUES
('smartphone', 'latest mobile device with advanced features'),
('laptop', 'portable computer for work and gaming');
Insert empty vector (document excluded from vector search)
INSERT INTO products (title, embedding_vector) VALUES
('no embedding item', ());
Search works the same way - provide your query text and Manticore will generate embeddings and find similar documents:
- SQL
- JSON
SELECT id, knn_dist() FROM products WHERE knn(embedding_vector, 3, 'machine learning');
Using text query with auto-embeddings
POST /search
{
"table": "products",
"knn": {
"field": "embedding_vector",
"query": "machine learning",
"k": 3
}
}
Using vector query directly
POST /search
{
"table": "products",
"knn": {
"field": "embedding_vector",
"query": [0.1, 0.2, 0.3, 0.4],
"k": 3
}
}
+------+------------+
| id   | knn_dist() |
+------+------------+
| 1    | 0.12345678 |
| 2    | 0.87654321 |
+------+------------+
2 rows in set (0.00 sec)
{
"took": 0,
"timed_out": false,
"hits": {
"total": 2,
"total_relation": "eq",
"hits": [
{
"_id": 1,
"_score": 1,
"_knn_dist": 0.12345678,
"_source": {
"title": "machine learning artificial intelligence"
}
},
{
"_id": 2,
"_score": 1,
"_knn_dist": 0.87654321,
"_source": {
"title": "banana fruit sweet yellow"
}
}
]
}
}
Alternatively, you can manually insert pre-computed vector data, ensuring it matches the dimensions you specified when creating the table. You can also insert an empty vector; this means that the document will be excluded from vector search results.
Important: When using `hnsw_similarity='cosine'`, vectors are automatically normalized upon insertion to unit vectors (vectors with a mathematical length/magnitude of 1.0). This normalization preserves the direction of the vector while standardizing its length, which is required for efficient cosine similarity calculations. This means the stored values will differ from your original input values. For example, a 2-dimensional vector (3.0, 4.0) has length 5.0 and would be stored as (0.6, 0.8).
- SQL
- JSON
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) );
POST /insert
{
"table":"test_vec",
"id":1,
"doc": { "title" : "yellow bag", "image_vector" : [0.653448,0.192478,0.017971,0.339821] }
}
POST /insert
{
"table":"test_vec",
"id":2,
"doc": { "title" : "white bag", "image_vector" : [-0.148894,0.748278,0.091892,-0.095406] }
}
Query OK, 2 rows affected (0.00 sec)
{
"table":"test",
"_id":1,
"created":true,
"result":"created",
"status":201
}
{
"table":"test",
"_id":2,
"created":true,
"result":"created",
"status":201
}
Now, you can perform a KNN search using the `knn` clause in either SQL or JSON format. Both interfaces support the same essential parameters, ensuring a consistent experience regardless of the format you choose:
- SQL:
select ... from <table name> where knn ( <field>, <k>, <query vector> [,<options>] )
- JSON:
POST /search { "table": "<table name>", "knn": { "field": "<field>", "query": "<text or vector>", "k": <k>, "ef": <ef>, "rescore": <rescore>, "oversampling": <oversampling> } }
The parameters are:
- `field`: This is the name of the float vector attribute containing vector data.
- `k`: This represents the number of documents to return and is a key parameter for Hierarchical Navigable Small World (HNSW) indexes. It specifies the quantity of documents that a single HNSW index should return. However, the actual number of documents included in the final results may vary. For instance, if the system is dealing with real-time tables divided into disk chunks, each chunk could return `k` documents, leading to a total that exceeds the specified `k` (as the cumulative count would be `num_chunks * k`). On the other hand, the final document count might be less than `k` if, after requesting `k` documents, some are filtered out based on specific attributes. Note that the parameter `k` does not apply to ramchunks; in the context of ramchunks, the retrieval process operates differently, so `k` has no effect on the number of documents returned.
- `query`: (Recommended parameter) The search query, which can be either:
  - Text string: Automatically converted to embeddings if the field has auto-embeddings configured. Will return an error if the field doesn't have auto-embeddings.
  - Vector array: Works the same as `query_vector`.
- `query_vector`: (Legacy parameter) The search vector as an array of numbers. Still supported for backward compatibility. Note: Use either `query` or `query_vector`, not both in the same request.
- `ef`: Optional size of the dynamic list used during the search. A higher `ef` leads to a more accurate but slower search.
- `rescore`: Enables KNN rescoring (disabled by default). Set to `1` in SQL or `true` in JSON to enable rescoring. After the KNN search is completed using quantized vectors (with possible oversampling), distances are recalculated with the original (full-precision) vectors and results are re-sorted to improve ranking accuracy.
- `oversampling`: Sets a factor (float value) by which `k` is multiplied when executing the KNN search, causing more candidates to be retrieved than needed using quantized vectors. No oversampling is applied by default. These candidates can be re-evaluated later if rescoring is enabled. Oversampling also works with non-quantized vectors. Since it increases `k`, which affects how the HNSW index works, it may cause a small change in result accuracy.
Documents are always sorted by their distance to the search vector. Any additional sorting criteria you specify will be applied after this primary sort condition. For retrieving the distance, there is a built-in function called knn_dist().
- SQL
- JSON
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } );
POST /search
{
"table": "test",
"knn":
{
"field": "image_vector",
"query": [0.286569,-0.031816,0.066684,0.032926],
"k": 5,
"ef": 2000,
"rescore": true,
"oversampling": 3.0
}
}
+------+------------+
| id   | knn_dist() |
+------+------------+
| 1    | 0.28146550 |
| 2    | 0.81527930 |
+------+------------+
2 rows in set (0.00 sec)
{
"took":0,
"timed_out":false,
"hits":
{
"total":2,
"total_relation":"eq",
"hits":
[
{
"_id": 1,
"_score":1,
"_knn_dist":0.28146550,
"_source":
{
"title":"yellow bag",
"image_vector":[0.653448,0.192478,0.017971,0.339821]
}
},
{
"_id": 2,
"_score":1,
"_knn_dist":0.81527930,
"_source":
{
"title":"white bag",
"image_vector":[-0.148894,0.748278,0.091892,-0.095406]
}
}
]
}
}
HNSW indexes need to be fully loaded into memory to perform KNN search, which can lead to significant memory consumption. To reduce memory usage, scalar quantization can be applied - a technique that compresses high-dimensional vectors by representing each component (dimension) with a limited number of discrete values. Manticore supports 8-bit and 1-bit quantization, meaning each vector component is compressed from a 32-bit float to 8 bits or even 1 bit, reducing memory usage by 4x or 32x, respectively. These compressed representations also allow for faster distance calculations, as more vector components can be processed in a single SIMD instruction. Although scalar quantization introduces some approximation error, it is often a worthwhile trade-off between search accuracy and resource efficiency. For even better accuracy, quantization can be combined with rescoring and oversampling: more candidates are retrieved than requested, and distances for these candidates are recalculated using the original 32-bit float vectors.
Supported quantization types include:
- `8bit`: Each vector component is quantized to 8 bits.
- `1bit`: Each vector component is quantized to 1 bit. Asymmetric quantization is used, with query vectors quantized to 4 bits and stored vectors to 1 bit. This approach offers greater precision than simpler methods, though with some performance trade-off.
- `1bitsimple`: Each vector component is quantized to 1 bit. This method is faster than `1bit`, but typically less accurate.
- SQL
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit');
Query OK, 0 rows affected (0.01 sec)
NOTE: Finding similar documents by id requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.
Finding documents similar to a specific one based on its unique ID is a common task. For instance, when a user views a particular item, Manticore Search can efficiently identify and display a list of items that are most similar to it in the vector space. Here's how you can do it:
- SQL:
select ... from <table name> where knn ( <field>, <k>, <document id> )
- JSON:
POST /search { "table": "<table name>", "knn": { "field": "<field>", "doc_id": <document id>, "k": <k> } }
The parameters are:
- `field`: This is the name of the float vector attribute containing vector data.
- `k`: This represents the number of documents to return and is a key parameter for Hierarchical Navigable Small World (HNSW) indexes. It specifies the quantity of documents that a single HNSW index should return. However, the actual number of documents included in the final results may vary. For instance, if the system is dealing with real-time tables divided into disk chunks, each chunk could return `k` documents, leading to a total that exceeds the specified `k` (as the cumulative count would be `num_chunks * k`). On the other hand, the final document count might be less than `k` if, after requesting `k` documents, some are filtered out based on specific attributes. Note that the parameter `k` does not apply to ramchunks; in the context of ramchunks, the retrieval process operates differently, so `k` has no effect on the number of documents returned.
- `document id`: Document ID for KNN similarity search.
- SQL
- JSON
select id, knn_dist() from test where knn ( image_vector, 5, 1 );
POST /search
{
"table": "test",
"knn":
{
"field": "image_vector",
"doc_id": 1,
"k": 5
}
}
+------+------------+
| id   | knn_dist() |
+------+------------+
| 2    | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
{
"took":0,
"timed_out":false,
"hits":
{
"total":1,
"total_relation":"eq",
"hits":
[
{
"_id": 2,
"_score":1643,
"_knn_dist":0.81527930,
"_source":
{
"title":"white bag",
"image_vector":[-0.148894,0.748278,0.091892,-0.095406]
}
}
]
}
}
Manticore also supports additional filtering of documents returned by the KNN search, either by full-text matching, attribute filters, or both.
- SQL
- JSON
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926) ) and match('white') and id < 10;
POST /search
{
"table": "test",
"knn":
{
"field": "image_vector",
"query": [0.286569,-0.031816,0.066684,0.032926],
"k": 5,
"filter":
{
"bool":
{
"must":
[
{ "match": {"_all":"white"} },
{ "range": { "id": { "lt": 10 } } }
]
}
}
}
}
+------+------------+
| id   | knn_dist() |
+------+------------+
| 2    | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
{
"took":0,
"timed_out":false,
"hits":
{
"total":1,
"total_relation":"eq",
"hits":
[
{
"_id": 2,
"_score":1643,
"_knn_dist":0.81527930,
"_source":
{
"title":"white bag",
"image_vector":[-0.148894,0.748278,0.091892,-0.095406]
}
}
]
}
}