Syncing from Kafka

NOTE: this functionality requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Manticore Search can seamlessly consume messages from a Kafka broker, allowing for real-time data indexing and search.

To get started, you need to:

  1. Define the source: Specify the Kafka topic from which Manticore Search will read messages. This setup includes details like the broker’s host, port, and topic name.
  2. Set up the destination table: Choose a Manticore real-time table to store the incoming Kafka data.
  3. Create a materialized view: Set up a materialized view (MV) to handle data transformation and mapping from Kafka to the destination table in Manticore Search. Here, you’ll define field mappings, data transformations, and any filters or conditions for the incoming data stream.
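Putting these three steps together, a minimal end-to-end setup might look like this (a sketch with hypothetical source, table, and view names; each statement is covered in detail in the sections below):

CREATE SOURCE kafka_src (id bigint, title text)
type='kafka'
broker_list='kafka:9092'
topic_list='my-topic'
consumer_group='manticore'
num_consumers='1'
batch=50

CREATE TABLE kafka_dest (id bigint, title text);

CREATE MATERIALIZED VIEW kafka_mv
TO kafka_dest AS
SELECT id, title FROM kafka_src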

Source

The source configuration allows you to define the broker, topic list, consumer group, and the message structure.

Schema

Define the schema using Manticore field types like int, float, text, json, etc.

CREATE SOURCE <source name> [(column type, ...)] [source_options]

All schema keys are case-insensitive: Products, products, and PrOdUcTs are treated the same and are all converted to lowercase.
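For example, the following definition (hypothetical source and topic names) creates the fields products and id, regardless of how the keys are capitalized:

CREATE SOURCE case_demo (PrOdUcTs text, ID bigint)
type='kafka'
broker_list='kafka:9092'
topic_list='my-data'
consumer_group='manticore'
num_consumers='1'
batch=50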

If your field names don't match the field name syntax allowed in Manticore Search (for example, if they contain special characters or start with numbers), you must define a schema mapping. For instance, $keyName or 123field are valid keys in JSON but not valid field names in Manticore Search. If you try to use invalid field names without proper mapping, Manticore will return an error and the source creation will fail.

To handle such cases, use the following schema syntax to map invalid field names to valid ones:

allowed_field_name 'original JSON key name with special symbols' type

For example:

price_field '$price' float    -- maps JSON key '$price' to field 'price_field'
field_123 '123field' text     -- maps JSON key '123field' to field 'field_123'

A complete example:

CREATE SOURCE kafka
(id bigint, term text, abbrev '$abbrev' text, GlossDef json)
type='kafka'
broker_list='kafka:9092'
topic_list='my-data'
consumer_group='manticore'
num_consumers='2'
batch=50
Response
Query OK, 2 rows affected (0.02 sec)

Options

Option           Accepted Values      Description
type             kafka                Sets the source type. Currently, only kafka is supported
broker_list      host:port [, ...]    Specifies Kafka broker URLs
topic_list       string [, ...]       Lists Kafka topics to consume from
consumer_group   string               Defines the Kafka consumer group; defaults to manticore
num_consumers    int                  Number of consumers to handle messages
batch            int                  Number of messages to accumulate before processing a batch. Defaults to 100; if the batch isn't filled before the timeout, the messages received so far are processed
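Since broker_list and topic_list accept comma-separated values, a single source can consume several topics from a multi-broker cluster. A sketch with hypothetical broker and topic names:

CREATE SOURCE multi_kafka (id bigint, title text)
type='kafka'
broker_list='kafka1:9092,kafka2:9092'
topic_list='events,logs'
consumer_group='manticore'
num_consumers='2'
batch=100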

Destination table

The destination table is a regular real-time table where the results of Kafka message processing are stored. This table should be defined to match the schema requirements of the incoming data and optimized for the query performance needs of your application. Read more about creating real-time tables here.

CREATE TABLE destination_kafka
(id bigint, name text, short_name text, received_at text, size multi);
Response
Query OK, 0 rows affected (0.02 sec)

Materialized view

A materialized view enables data transformation from Kafka messages. You can rename fields, apply Manticore Search functions, and perform sorting, grouping, and other data operations.

A materialized view acts as a query that moves data from the Kafka source to the destination table, letting you use Manticore Search syntax to customize these queries. Make sure the fields in the SELECT statement match those defined in the source.

CREATE MATERIALIZED VIEW <materialized view name>
TO <destination table name> AS
SELECT [column|function [as <new name>], ...] FROM <source name>

For example:

CREATE MATERIALIZED VIEW view_table
TO destination_kafka AS
SELECT id, term as name, abbrev as short_name,
       UTC_TIMESTAMP() as received_at, GlossDef.size as size FROM kafka
Response
Query OK, 2 rows affected (0.02 sec)

Data is transferred from Kafka to Manticore Search in batches, which are cleared after each run. Be careful with calculations that span batches, such as AVG: because data is processed batch by batch, such aggregates are computed within each batch rather than across the entire stream, so they may not produce the results you expect.
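For instance, a view like the following (all names hypothetical, assuming a source with category and price fields) would store per-batch averages rather than an average over the whole stream:

CREATE MATERIALIZED VIEW avg_mv
TO price_stats AS
SELECT category, AVG(price) as avg_price FROM products_src GROUP BY category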

Field Mapping

Here's a mapping table based on the examples above:

Kafka                                       Source     Buffer     MV                               Destination
id                                          id         id         id                               id
term                                        term       term       term as name                     name
unnecessary key we're not interested in     -          -          -                                -
$abbrev                                     abbrev     abbrev     abbrev as short_name             short_name
-                                           -          -          UTC_TIMESTAMP() as received_at   received_at
GlossDef                                    glossdef   glossdef   glossdef.size as size            size
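Once a few batches have been processed, you can check the result of this mapping by querying the destination table:

SELECT id, name, short_name, received_at, size FROM destination_kafka LIMIT 3;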

Listing

To view sources and materialized views in Manticore Search, use these commands:

  • SHOW SOURCES: Lists all configured sources.
  • SHOW SOURCE kafka: Shows detailed information about a specific source.
  • SHOW MVS: Lists all materialized views.
  • SHOW MV view_table: Shows detailed information about a specific materialized view.

For example:

SHOW SOURCES
Response
+-------+
| name  |
+-------+
| kafka |
+-------+
SHOW SOURCE kafka;
‹›
Response
+--------+-------------------------------------------------------------------+
| Source | Create Table                                                      |
+--------+-------------------------------------------------------------------+
| kafka  | CREATE SOURCE kafka                                               |
|        | (id bigint, term text, abbrev '$abbrev' text, GlossDef json)      |
|        | type='kafka'                                                      |
|        | broker_list='kafka:9092'                                          |
|        | topic_list='my-data'                                              |
|        | consumer_group='manticore'                                        |
|        | num_consumers='2'                                                 |
|        | batch=50                                                          |
+--------+-------------------------------------------------------------------+
SHOW MVS
Response
+------------+
| name       |
+------------+
| view_table |
+------------+
SHOW MV view_table
Response
+------------+--------------------------------------------------------------------------------------------------------+-----------+
| View       | Create Table                                                                                           | suspended |
+------------+--------------------------------------------------------------------------------------------------------+-----------+
| view_table | CREATE MATERIALIZED VIEW view_table TO destination_kafka AS                                            | 0         |
|            | SELECT id, term as name, abbrev as short_name, UTC_TIMESTAMP() as received_at, GlossDef.size as size   |           |
|            | FROM kafka                                                                                             |           |
+------------+--------------------------------------------------------------------------------------------------------+-----------+

Altering materialized views

You can suspend data consumption by altering materialized views.

If you remove the source without deleting the MV, the MV is automatically suspended. After recreating the source, unsuspend the MV manually using the ALTER command.

Currently, only materialized views can be altered. To change source parameters, drop and recreate the source.

ALTER MATERIALIZED VIEW view_table suspended=1
Response
Query OK (0.02 sec)
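
To resume consumption, set the suspended flag back to 0:

ALTER MATERIALIZED VIEW view_table suspended=0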

Troubleshooting

Duplicate entries

Kafka offsets are committed after each batch or when processing times out. If the process stops unexpectedly during a materialized view query, you may see duplicate entries. To avoid this, include an id field in your schema so Manticore Search can deduplicate rows in the destination table.

How it works internally

  • Worker initialization: After configuring a source and materialized view, Manticore Search sets up a dedicated worker to handle data ingestion from Kafka.
  • Message mapping: Messages are mapped according to the source configuration schema, transforming them into a structured format.
  • Batching: Messages are grouped into batches for efficient processing. Batch size can be adjusted to suit your performance and latency needs.
  • Buffering: Mapped data batches are stored in a buffer table for efficient bulk operations.
  • Materialized view processing: The view logic is applied to data in the buffer table, performing any transformations or filtering.
  • Data transfer: Processed data is then transferred to the destination real-time table.
  • Cleanup: The buffer table is cleared after each batch, ensuring it’s ready for the next set of data.