K-nearest neighbor vector search

Manticore Search supports adding embeddings generated by machine learning models to each document and performing nearest-neighbor searches on them. This lets you build features such as similarity search, recommendations, semantic search, and NLP-based relevance ranking, as well as image, video, and sound search.

What is an embedding?

An embedding is a way of representing data - such as text, images, or sound - as vectors in a high-dimensional space. These vectors are crafted so that the distance between them reflects the similarity of the data they represent. They are typically produced by algorithms such as word embedding models (e.g., Word2Vec, BERT) for text or neural networks for images. The high-dimensional nature of the vector space, with many components per vector, allows complex and nuanced relationships between items to be represented. Similarity is gauged by the distance between the vectors, commonly measured with Euclidean distance or cosine similarity.
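
For reference, given two vectors a and b with components a_i and b_i, the two most common measures are:

  • Euclidean (L2) distance: sqrt( Σ (a_i - b_i)² ), where a smaller value means the items are more similar.
  • Cosine similarity: (a · b) / (|a| * |b|), where a value closer to 1 means the items are more similar.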

Manticore Search enables k-nearest neighbor (KNN) vector searches using the HNSW library. This functionality is part of the Manticore Columnar Library.

Configuring a table for KNN search

To run KNN searches, you must first configure your table. Float vectors and KNN search are supported only in real-time tables (not in plain tables). The table must have at least one float_vector attribute, which stores the vector data. You need to specify the following properties:

  • knn_type: A mandatory setting; currently, only hnsw is supported.
  • knn_dims: A mandatory setting that specifies the dimensions of the vectors being indexed.
  • hnsw_similarity: A mandatory setting that specifies the distance function used by the HNSW index. Acceptable values are:
    • L2 - Squared L2
    • IP - Inner product
    • COSINE - Cosine similarity
  • hnsw_m: An optional setting that defines the maximum number of outgoing connections in the graph. The default is 16.
  • hnsw_ef_construction: An optional setting that defines a construction time/accuracy trade-off. A higher value results in a more accurate index at the cost of longer construction time.

SQL:
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' );

Response:
Query OK, 0 rows affected (0.01 sec)
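
The optional hnsw_m and hnsw_ef_construction settings can be specified alongside the mandatory ones. A minimal sketch (the table name and values are illustrative, not tuning recommendations):

create table test_tuned ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' hnsw_m='32' hnsw_ef_construction='200' );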

Inserting vector data

Auto Embeddings (Recommended)

The easiest way to work with vector data is using auto embeddings. With this feature, you create a table with MODEL_NAME and FROM parameters, then simply insert your text data - Manticore automatically generates embeddings for you.

Creating a table with auto embeddings

When creating a table for auto embeddings, specify:

  • MODEL_NAME: The embedding model to use
  • FROM: Which fields to use for embedding generation (empty means all text/string fields)

Supported embedding models:

  • Sentence Transformers: Any suitable BERT-based Hugging Face model (e.g., sentence-transformers/all-MiniLM-L6-v2) — no API key needed. Manticore downloads the model when you create the table.
  • OpenAI: OpenAI embedding models like openai/text-embedding-ada-002 - requires API_KEY='<OPENAI_API_KEY>' parameter
  • Voyage: Voyage AI embedding models - requires API_KEY='<VOYAGE_API_KEY>' parameter
  • Jina: Jina AI embedding models - requires API_KEY='<JINA_API_KEY>' parameter

More information about setting up a float_vector attribute can be found in the float_vector attribute documentation.

SQL:

-- Using sentence-transformers (no API key needed)

CREATE TABLE products (
    title TEXT, 
    description TEXT,
    embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' 
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM='title'
);

-- Using OpenAI (requires the API_KEY parameter)

CREATE TABLE products_openai (
    title TEXT,
    description TEXT, 
    embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='openai/text-embedding-ada-002' FROM='title,description' API_KEY='...'
);

-- Using all text fields for embeddings (FROM is empty)

CREATE TABLE products_all (
    title TEXT,
    description TEXT,
    embedding_vector FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2' FROM=''
);

Inserting data with auto embeddings

When using auto embeddings, do not specify the vector field in your INSERT statement. The embeddings are generated automatically from the text fields specified in the FROM parameter.

SQL:

-- Insert text data only; embeddings are generated automatically

INSERT INTO products (title) VALUES 
('machine learning artificial intelligence'),
('banana fruit sweet yellow');

-- Insert multiple fields; both are used for embeddings when FROM='title,description'

INSERT INTO products_openai (title, description) VALUES
('smartphone', 'latest mobile device with advanced features'),
('laptop', 'portable computer for work and gaming');

-- Insert an empty vector (the document is excluded from vector search)

INSERT INTO products (title, embedding_vector) VALUES 
('no embedding item', ());

Searching with auto embeddings

Search works the same way - provide your query text and Manticore will generate embeddings and find similar documents:

SQL:
SELECT id, knn_dist() FROM products WHERE knn(embedding_vector, 3, 'machine learning');

Response:
+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.12345678 |
|    2 | 0.87654321 |
+------+------------+
2 rows in set (0.00 sec)
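
The same search via JSON uses the query parameter with plain text. A sketch, following the /search request shape described in the KNN vector search section below:

POST /search
{
    "table": "products",
    "knn":
    {
        "field": "embedding_vector",
        "query": "machine learning",
        "k": 3
    }
}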

Manual Vector Insertion

Alternatively, you can manually insert pre-computed vector data, ensuring it matches the dimensions you specified when creating the table. You can also insert an empty vector; this means that the document will be excluded from vector search results.

SQL:
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) );

Response:
Query OK, 2 rows affected (0.00 sec)

KNN vector search

Now, you can perform a KNN search using the knn clause in either SQL or JSON format. Both interfaces support the same essential parameters, ensuring a consistent experience regardless of the format you choose:

  • SQL: select ... from <table name> where knn ( <field>, <k>, <query vector> [,<options>] )
  • JSON:
    POST /search
    {
        "table": "<table name>",
        "knn":
        {
            "field": "<field>",
            "query": "<text or vector>",
            "k": <k>,
            "ef": <ef>,
            "rescore": <rescore>,
            "oversampling": <oversampling>
        }
    }

The parameters are:

  • field: This is the name of the float vector attribute containing vector data.
  • k: The number of documents to return. This is a key parameter for Hierarchical Navigable Small World (HNSW) indexes: it specifies how many documents a single HNSW index should return. The actual number of documents in the final results may differ. If a real-time table is split into disk chunks, each chunk can return k documents, so the total may exceed k (up to num_chunks * k). Conversely, the final count may be less than k if, after k documents are retrieved, some are filtered out by attribute filters. Note that k does not apply to RAM chunks; their retrieval process works differently, so k does not limit the number of documents they return.
  • query: (Recommended parameter) The search query, which can be either:
    • Text string: Automatically converted to embeddings if the field has auto-embeddings configured. Will return an error if the field doesn't have auto-embeddings.
    • Vector array: Works the same as query_vector.
  • query_vector: (Legacy parameter) The search vector as an array of numbers. Still supported for backward compatibility. Note: Use either query or query_vector, not both in the same request.
  • ef: An optional setting that specifies the size of the dynamic candidate list used during the search. A higher ef leads to a more accurate but slower search.
  • rescore: Enables KNN rescoring (disabled by default). Set to 1 in SQL or true in JSON to enable rescoring. After the KNN search is completed using quantized vectors (with possible oversampling), distances are recalculated with the original (full-precision) vectors and results are re-sorted to improve ranking accuracy.
  • oversampling: Sets a factor (float value) by which k is multiplied when executing the KNN search, causing more candidates to be retrieved than needed using quantized vectors. No oversampling is applied by default. These candidates can be re-evaluated later if rescoring is enabled. Oversampling also works with non-quantized vectors. Since it increases k, which affects how the HNSW index works, it may cause a small change in result accuracy.

Documents are always sorted by their distance to the search vector. Any additional sorting criteria you specify will be applied after this primary sort condition. For retrieving the distance, there is a built-in function called knn_dist().

SQL:
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } );

Response:
+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.28146550 |
|    2 | 0.81527930 |
+------+------------+
2 rows in set (0.00 sec)
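
For comparison, the same search expressed in JSON, following the request shape shown above (a sketch):

POST /search
{
    "table": "test",
    "knn":
    {
        "field": "image_vector",
        "query": [0.286569, -0.031816, 0.066684, 0.032926],
        "k": 5,
        "ef": 2000,
        "rescore": true,
        "oversampling": 3.0
    }
}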

Vector quantization

HNSW indexes need to be fully loaded into memory to perform KNN search, which can lead to significant memory consumption. To reduce memory usage, scalar quantization can be applied: a technique that compresses high-dimensional vectors by representing each component (dimension) with a limited number of discrete values. Manticore supports 8-bit and 1-bit quantization, meaning each vector component is compressed from a 32-bit float to 8 bits or even 1 bit, reducing memory usage by 4x or 32x, respectively. These compressed representations also allow for faster distance calculations, as more vector components can be processed in a single SIMD instruction.

Although scalar quantization introduces some approximation error, it is often a worthwhile trade-off between search accuracy and resource efficiency. For even better accuracy, quantization can be combined with rescoring and oversampling: more candidates are retrieved than requested, and distances for these candidates are recalculated using the original 32-bit float vectors.

Supported quantization types include:

  • 8bit: Each vector component is quantized to 8 bits.
  • 1bit: Each vector component is quantized to 1 bit. Asymmetric quantization is used, with query vectors quantized to 4 bits and stored vectors to 1 bit. This approach offers greater precision than simpler methods, though with some performance trade-off.
  • 1bitsimple: Each vector component is quantized to 1 bit. This method is faster than 1bit, but typically less accurate.

SQL:
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit');

Response:
Query OK, 0 rows affected (0.01 sec)
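
Quantization pairs naturally with oversampling and rescoring: the KNN search runs on the compressed vectors, retrieves extra candidates, and re-ranks them using the original full-precision vectors. A minimal sketch using 8-bit quantization (the table name and option values are illustrative):

create table test8 ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='8bit' );

select id, knn_dist() from test8 where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), { oversampling=2.0, rescore=1 } );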

Find similar docs by id

NOTE: Finding similar documents by id requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Finding documents similar to a specific one based on its unique ID is a common task. For instance, when a user views a particular item, Manticore Search can efficiently identify and display a list of items that are most similar to it in the vector space. Here's how you can do it:

  • SQL: select ... from <table name> where knn ( <field>, <k>, <document id> )
  • JSON:
    POST /search
    {
        "table": "<table name>",
        "knn":
        {
            "field": "<field>",
            "doc_id": <document id>,
            "k": <k>
        }
    }

The parameters are:

  • field: This is the name of the float vector attribute containing vector data.
  • k: The number of documents to return. It behaves the same way as described in the KNN vector search section above.
  • document id: The ID of the document whose nearest neighbors you want to find.

SQL:
select id, knn_dist() from test where knn ( image_vector, 5, 1 );

Response:
+------+------------+
| id   | knn_dist() |
+------+------------+
|    2 | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
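
The JSON equivalent, following the request shape shown above (a sketch):

POST /search
{
    "table": "test",
    "knn":
    {
        "field": "image_vector",
        "doc_id": 1,
        "k": 5
    }
}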

Filtering KNN vector search results

Manticore also supports additional filtering of documents returned by the KNN search, either by full-text matching, attribute filters, or both.

SQL:
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926) ) and match('white') and id < 10;

Response:
+------+------------+
| id   | knn_dist() |
+------+------------+
|    2 | 0.81527930 |
+------+------------+
1 row in set (0.00 sec)
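
In JSON, filtering is expressed alongside the knn object. The sketch below assumes the standard bool/must filter syntax of the /search endpoint applies here; the exact filter placement is an assumption, so consult the JSON filtering documentation:

POST /search
{
    "table": "test",
    "knn":
    {
        "field": "image_vector",
        "query": [0.286569, -0.031816, 0.066684, 0.032926],
        "k": 5,
        "filter":
        {
            "bool":
            {
                "must": [
                    { "match": { "*": "white" } },
                    { "range": { "id": { "lt": 10 } } }
                ]
            }
        }
    }
}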