Conversational search

Conversational search lets Manticore Buddy answer questions over an existing vectorized table. Buddy retrieves the most relevant rows with KNN search, turns those rows into context, and sends the context plus the conversation history to an LLM.

You manage it from SQL with the following statements:

  • CREATE CHAT MODEL
  • SHOW CHAT MODELS
  • DESCRIBE CHAT MODEL
  • DROP CHAT MODEL
  • CALL CHAT

Before you start

You need a vectorized table and an LLM provider. The table requirements are covered below. Provider credentials can be set in CREATE CHAT MODEL with api_key, or supplied through the matching environment variable, such as OPENAI_API_KEY.

How it works

When CALL CHAT runs, Buddy builds a retrieval-augmented answer in this order:

  1. Buddy loads the chat model.
  2. Buddy loads the conversation history for the supplied conversation UUID. If no UUID is supplied, it creates one.
  3. Buddy inspects the target table and chooses a FLOAT_VECTOR field.
  4. The LLM decides how to handle the message: search again, answer from the previous search context, or answer without retrieval.
  5. Buddy runs KNN search with the selected vector field when retrieval is needed.
  6. Buddy builds the LLM context from the vector field's from='...' source fields.
  7. The configured LLM generates the answer.
  8. Buddy saves the user message and the assistant reply in the conversation history.
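
The steps above can be sketched as one function. This is only an illustration of the control flow, not Buddy's actual code: chat_turn, CONVERSATIONS, and the decide, knn_search, and generate callables are hypothetical names, the history lives in a Python dict instead of a table, and model loading and vector field detection (steps 1 and 3) are left out.

```python
import uuid

# Hypothetical in-memory conversation store; Buddy keeps history in a table.
CONVERSATIONS = {}


def chat_turn(query, decide, knn_search, generate, source_fields,
              conversation_uuid=""):
    """Illustrative flow of one chat turn (assumed structure, not Buddy's code)."""
    # Step 2: load the conversation, or create an id when none is supplied.
    conv_id = conversation_uuid or str(uuid.uuid4())
    state = CONVERSATIONS.setdefault(conv_id, {"turns": [], "rows": []})

    # Step 4: the LLM picks "search", "reuse", or "no_retrieval".
    action = decide(query, state["turns"])

    # Step 5: run KNN search only when new retrieval is needed.
    if action == "search":
        state["rows"] = knn_search(query)
    rows = [] if action == "no_retrieval" else state["rows"]

    # Step 6: build context from the from='...' source fields only.
    context = "\n\n".join(
        "\n".join(str(row[field]) for field in source_fields) for row in rows
    )

    # Step 7: generate the answer from the query, context, and history.
    answer = generate(query, context, state["turns"])

    # Step 8: save both sides of the exchange.
    state["turns"].append(("user", query))
    state["turns"].append(("assistant", answer))
    return {"conversation_uuid": conv_id, "response": answer, "sources": rows}
```

Passing the returned conversation_uuid into the next call lets a "reuse" decision answer from the rows retrieved earlier without running KNN search again.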

The fifth argument of CALL CHAT is called fields internally, but for conversational search it names the vector field passed to knn(...), not a list of fields to return. Buddy selects rows with SELECT * and then removes vector columns from the sources payload, so the response does not include large embedding values.

Table requirements

The table must have at least one FLOAT_VECTOR field configured for auto embeddings. The vector field must include from='...', because Buddy uses those source fields as LLM context.

The examples below use onnx-models/all-MiniLM-L12-v2-onnx, which runs through the recommended ONNX path and does not require an embedding API key.

CREATE TABLE docs (
    id BIGINT,
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR
        knn_type='hnsw'
        hnsw_similarity='cosine'
        model_name='onnx-models/all-MiniLM-L12-v2-onnx'
        from='title,content'
) TYPE='rt';
INSERT INTO docs(id, title, content) VALUES
    (1, 'Vector search', 'Vector search compares embeddings to find semantically similar documents.'),
    (2, 'Full-text search', 'Full-text search matches terms and phrases in indexed text.');

If CALL CHAT does not specify a vector field, Buddy uses the first FLOAT_VECTOR field found in the table definition.

Creating a chat model

Use CREATE CHAT MODEL to store the LLM provider, model id, and retrieval settings.

CREATE CHAT MODEL assistant (
    model='openai:gpt-4o-mini'
);

You can also set provider options and retrieval limits:

CREATE CHAT MODEL support_assistant (
    model='openai:gpt-4o-mini',
    api_key='your-provider-api-key',
    base_url='http://host.docker.internal:8787/v1',
    timeout=60,
    retrieval_limit=5,
    max_document_length=3000
);

Common options:

| Option | Required | Description |
|---|---|---|
| model | Yes | LLM model id in provider:model format. |
| description | No | Stored description. |
| api_key | No | Provider API key passed to the llm extension. |
| base_url | No | Provider or proxy base URL. |
| timeout | No | LLM request timeout, 1..65536. |
| retrieval_limit | No | Number of documents requested from KNN, 1..50; default is 5. |
| max_document_length | No | Per-document context limit. 0 disables truncation; 100..65536 truncates; default is 2000. |
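
If you generate CREATE CHAT MODEL statements from application code, it can help to check the option ranges before sending them. The sketch below mirrors the limits listed above; check_chat_model_options is a hypothetical client-side helper, not part of Manticore, and it assumes the options arrive as a plain dict of integers and strings.

```python
def check_chat_model_options(options):
    """Return a list of problems found in a dict of chat model options."""
    problems = []
    if not options.get("model"):
        problems.append("model is required")

    timeout = options.get("timeout")
    if timeout is not None and not 1 <= timeout <= 65536:
        problems.append("timeout must be in 1..65536")

    # The documented default applies when the option is omitted.
    retrieval_limit = options.get("retrieval_limit", 5)
    if not 1 <= retrieval_limit <= 50:
        problems.append("retrieval_limit must be in 1..50")

    # 0 disables truncation; any other value must be in 100..65536.
    max_len = options.get("max_document_length", 2000)
    if max_len != 0 and not 100 <= max_len <= 65536:
        problems.append("max_document_length must be 0 or in 100..65536")

    return problems


print(check_chat_model_options(
    {"model": "openai:gpt-4o-mini", "retrieval_limit": 80}
))
# prints ['retrieval_limit must be in 1..50']
```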

Chat model names may contain only letters, numbers, and underscores.

The model option must use provider:model format:

model='openai:gpt-4o-mini'
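
Both naming rules can be checked on the client before a statement is sent. This is a minimal sketch with hypothetical helper names. It assumes "letters and numbers" means ASCII characters, and that the provider is everything before the first colon; the server may be stricter or looser on either point.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are accepted.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_chat_model_name(name):
    """True when the name uses only letters, numbers, and underscores."""
    return NAME_RE.fullmatch(name) is not None


def split_model_id(model):
    """Split 'provider:model' at the first colon (an assumption)."""
    provider, sep, model_name = model.partition(":")
    if not sep or not provider or not model_name:
        raise ValueError("model must use provider:model format")
    return provider, model_name


print(is_valid_chat_model_name("support_assistant"))  # prints True
print(is_valid_chat_model_name("support-assistant"))  # prints False
print(split_model_id("openai:gpt-4o-mini"))  # prints ('openai', 'gpt-4o-mini')
```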

Provider api_key is optional if the provider key is already available in Buddy's environment. For example, a Docker Compose service can pass provider keys like this:

environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}
  - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}

If api_key is not set in CREATE CHAT MODEL, the llm extension can use the matching provider environment variable. Set api_key in the chat model only when you need this model to use a different key.

CALL CHAT syntax

CALL CHAT(
    'query',
    'table',
    'model_name',
    'conversation_uuid',
    'vector_field'
);

Arguments are positional only:

| Position | Argument | Required | Description |
|---|---|---|---|
| 1 | query | Yes | User question. |
| 2 | table | Yes | Table to search. |
| 3 | model_name | Yes | Chat model name. |
| 4 | conversation_uuid | No | Existing conversation id, or an empty string. |
| 5 | fields / vector field | No | FLOAT_VECTOR field used in knn(...). |

The table argument must be a plain table identifier, optionally qualified as database.table. The vector field argument must be a plain field identifier.
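
Because the arguments are positional, a client that builds the statement has to fill position 4 whenever it wants to set position 5. The following sketch shows one way to do that. build_call_chat and sql_quote are hypothetical helpers, and the backslash escaping of single quotes is an assumption about the string syntax; verify it against your client or driver before relying on it with untrusted input.

```python
def sql_quote(value):
    """Quote a string literal (assumes backslash escaping is accepted)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_call_chat(query, table, model_name,
                    conversation_uuid=None, vector_field=None):
    """Build a CALL CHAT statement with arguments in positional order."""
    args = [query, table, model_name]
    if vector_field is not None:
        # Position 4 must be present to reach position 5; an empty
        # string stands in when there is no existing conversation.
        args += [conversation_uuid or "", vector_field]
    elif conversation_uuid:
        args.append(conversation_uuid)
    return "CALL CHAT(" + ", ".join(sql_quote(a) for a in args) + ");"


print(build_call_chat("What is vector search?", "docs", "assistant"))
# prints CALL CHAT('What is vector search?', 'docs', 'assistant');
```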

Asking questions

Use CALL CHAT with a query, a table, and a chat model.

CALL CHAT(
    'What is vector search?',
    'docs',
    'assistant'
);

To continue a conversation, pass the same conversation UUID:

CALL CHAT(
    'Can you explain it with an example?',
    'docs',
    'assistant',
    'docs-chat-001'
);

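To search a specific vector field, pass it as the fifth argument. Because arguments are positional, the fourth argument must still be supplied; pass an empty string when there is no existing conversation. This example assumes the table has a FLOAT_VECTOR field named title_embedding, which the docs table created earlier does not define: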

CALL CHAT(
    'Find documents where the title is about vector search',
    'docs',
    'assistant',
    '',
    'title_embedding'
);

When the fifth argument is present, Buddy checks that the field exists and is a FLOAT_VECTOR. If the argument is omitted, Buddy detects the first FLOAT_VECTOR field from SHOW CREATE TABLE.

Search and context details

When Buddy needs retrieval, it runs KNN search on the selected vector field and returns up to retrieval_limit rows. The default distance threshold is 0.8.

Buddy uses the retrieved rows as LLM context. The same rows are returned in sources, with knn_dist included and FLOAT_VECTOR columns removed.

max_document_length limits how much text from each source row can be sent to the LLM. Use 0 to disable truncation; otherwise use a value from 100 to 65536.
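
The effect of max_document_length can be pictured with a few lines of Python. This is a sketch of the documented behavior only: truncate_document is a hypothetical name, and counting in characters rather than bytes or tokens is an assumption, since the unit is not stated here.

```python
def truncate_document(text, max_document_length=2000):
    """Cut one source document to the configured context limit."""
    # 0 disables truncation entirely.
    if max_document_length == 0:
        return text
    # Assumption: the limit is counted in characters.
    return text[:max_document_length]


long_text = "x" * 3000
print(len(truncate_document(long_text)))     # prints 2000
print(len(truncate_document(long_text, 0)))  # prints 3000
```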

Response

CALL CHAT returns one row:

| Column | Description |
|---|---|
| conversation_uuid | Existing or generated conversation id. |
| user_query | Original user query. |
| search_query | Standalone search query used for retrieval. |
| response | LLM answer. |
| sources | JSON string containing retrieved source rows. |

Example response shape:

{
  "conversation_uuid": "docs-chat-001",
  "user_query": "What is vector search?",
  "search_query": "vector search, embeddings, similarity search",
  "response": "Vector search finds similar items by comparing embeddings...",
  "sources": "[{\"id\":1,\"title\":\"Vector search\",\"content\":\"...\",\"knn_dist\":0.12}]"
}

Vector fields are not included in sources.
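
Because sources arrives as a JSON string rather than a nested structure, a client has to decode it before reading individual rows. A minimal sketch, assuming your client library hands back the result row as a dict keyed by column name (parse_sources is a hypothetical helper):

```python
import json


def parse_sources(row):
    """Decode the sources column (a JSON string) into a list of dicts."""
    return json.loads(row["sources"])


# Sample row shaped like the response example above.
row = {
    "conversation_uuid": "docs-chat-001",
    "user_query": "What is vector search?",
    "sources": '[{"id":1,"title":"Vector search","content":"...","knn_dist":0.12}]',
}

for source in parse_sources(row):
    print(source["id"], source["title"], source["knn_dist"])
# prints 1 Vector search 0.12
```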

Managing chat models

List models:

SHOW CHAT MODELS;

Describe a model:

DESCRIBE CHAT MODEL assistant;

Drop a model:

DROP CHAT MODEL assistant;

Drop a model only if it exists:

DROP CHAT MODEL IF EXISTS assistant;

SHOW CHAT MODELS returns name, model, and created_at. DESCRIBE CHAT MODEL returns property and value; stored API keys are shown as HIDDEN.

Dropping a chat model also drops that model's conversation history table. Conversation history is stored per model and written with a 30-day TTL.

Last modified: June 19, 2026