Syncing with Kafka

NOTE: this functionality requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Manticore Search can seamlessly consume messages from a Kafka broker, allowing for real-time data indexing and search.

To get started, you need to:

  1. Define the source: Specify the Kafka topic from which Manticore Search will read messages. This setup includes details like the broker’s host, port, and topic name.
  2. Set up the destination table: Choose a Manticore real-time table to store the incoming Kafka data.
  3. Create a materialized view: Set up a materialized view (mv) to handle data transformation and mapping from Kafka to the destination table in Manticore Search. Here, you’ll define field mappings, data transformations, and any filters or conditions for the incoming data stream.

Source

The source configuration allows you to define the broker, topic list, consumer group, and the message structure.

Schema

Define the schema using Manticore field types like int, float, text, json, etc.

CREATE SOURCE <source name> [(column type, ...)] [source_options]

All schema keys are case-insensitive, so Products, products, and PrOdUcTs are treated the same. They are all converted to lowercase.

SQL
CREATE SOURCE kafka
(id bigint, term text, abbrev text, GlossDef json)
type='kafka'
broker_list='kafka:9092'
topic_list='my-data'
consumer_group='manticore'
num_consumers='2'
batch=50
Response
Query OK, 2 rows affected (0.02 sec)
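
Messages in the topic are expected to be JSON objects whose keys correspond to the fields declared in the source schema. A hypothetical message matching the source above might look like this:

{
  "id": 1,
  "term": "HTML",
  "abbrev": "HyperText Markup Language",
  "GlossDef": {"para": "A markup language for web pages", "size": 4}
}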

Options

Option         | Accepted Values   | Description
type           | kafka             | Sets the source type. Currently, only kafka is supported
broker_list    | host:port [, ...] | Specifies Kafka broker URLs
topic_list     | string [, ...]    | Lists Kafka topics to consume from
consumer_group | string            | Defines the Kafka consumer group; defaults to manticore
num_consumers  | int               | Number of consumers handling the messages
batch          | int               | Number of messages to accumulate before processing. Defaults to 100; any remaining messages are processed when the timeout expires

Destination table

The destination table is a regular real-time table where the results of Kafka message processing are stored. This table should be defined to match the schema requirements of the incoming data and optimized for the query performance needs of your application. Read more about creating real-time tables here.

SQL
CREATE TABLE destination_kafka
(id bigint, name text, short_name text, received_at text, size multi);
Response
Query OK, 0 rows affected (0.02 sec)

Materialized view

A materialized view enables data transformation from Kafka messages. You can rename fields, apply Manticore Search functions, and perform sorting, grouping, and other data operations.

A materialized view acts as a query that moves data from the Kafka source to the destination table, letting you use Manticore Search syntax to customize these queries. Make sure the fields in the SELECT match those defined in the source.

CREATE MATERIALIZED VIEW <materialized view name>
TO <destination table name> AS
SELECT [column|function [as <new name>], ...] FROM <source name>

SQL
CREATE MATERIALIZED VIEW view_table
TO destination_kafka AS
SELECT id, term as name, abbrev as short_name,
       UTC_TIMESTAMP() as received_at, GlossDef.size as size FROM kafka
Response
Query OK, 2 rows affected (0.02 sec)

Data is transferred from Kafka to Manticore Search in batches, which are cleared after each run. Be careful with calculations that span batches, such as AVG: since processing happens batch by batch, they may not produce the results you expect.
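
For instance, a hypothetical view like the one below (assuming a destination table avg_kafka with a matching avg_size column) would compute AVG(GlossDef.size) only within each batch, so every batch produces its own averaged row rather than a single average over the whole topic:

CREATE MATERIALIZED VIEW avg_view
TO avg_kafka AS
SELECT AVG(GlossDef.size) as avg_size FROM kafka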

Field Mapping

Here's a mapping table based on the examples above:

Kafka           | Source   | Buffer   | MV                             | Destination
id              | id       | id       | id                             | id
term            | term     | term     | term as name                   | name
unnecessary key | -        | -        | -                              | -
abbrev          | abbrev   | abbrev   | abbrev as short_name           | short_name
-               | -        | -        | UTC_TIMESTAMP() as received_at | received_at
GlossDef        | GlossDef | GlossDef | GlossDef.size as size          | size

Listing

To view sources and materialized views in Manticore Search, use these commands:

  • SHOW SOURCES: Lists all configured sources.
  • SHOW MVS: Lists all materialized views.
  • SHOW MV view_table: Shows detailed information on a specific materialized view.

SQL
SHOW SOURCES
Response
+-------+
| name  |
+-------+
| kafka |
+-------+

SQL
SHOW SOURCE kafka;
Response
+--------+---------------------------------------------------------+
| Source | Create Table                                            |
+--------+---------------------------------------------------------+
| kafka  | CREATE SOURCE kafka                                     |
|        | (id bigint, term text, abbrev text, GlossDef json)      |
|        | type='kafka'                                            |
|        | broker_list='kafka:9092'                                |
|        | topic_list='my-data'                                    |
|        | consumer_group='manticore'                              |
|        | num_consumers='2'                                       |
|        | batch=50                                                |
+--------+---------------------------------------------------------+

SQL
SHOW MVS
Response
+------------+
| name       |
+------------+
| view_table |
+------------+

SQL
SHOW MV view_table
Response
+------------+--------------------------------------------------------------------------------------------------------+-----------+
| View       | Create Table                                                                                           | suspended |
+------------+--------------------------------------------------------------------------------------------------------+-----------+
| view_table | CREATE MATERIALIZED VIEW view_table TO destination_kafka AS                                            | 0         |
|            | SELECT id, term as name, abbrev as short_name, UTC_TIMESTAMP() as received_at, GlossDef.size as size   |           |
|            | FROM kafka                                                                                             |           |
+------------+--------------------------------------------------------------------------------------------------------+-----------+

Altering materialized views

You can suspend data consumption by altering materialized views.

If you remove the source without deleting the MV, it automatically suspends. After recreating the source, unsuspend the MV manually using the ALTER command.

Currently, only materialized views can be altered. To change source parameters, drop and recreate the source.

SQL
ALTER MATERIALIZED VIEW view_table suspended=1
Response
Query OK (0.02 sec)
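
To resume data consumption later (for example, after recreating the source), set suspended back to 0:

ALTER MATERIALIZED VIEW view_table suspended=0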

Troubleshooting

Duplicate entries

Kafka offsets are committed after each batch or when processing times out. If the process stops unexpectedly during a materialized view query, you may see duplicate entries. To avoid this, include an id field in your schema, which allows Manticore Search to prevent duplicates in the table.

How it works internally

  • Worker initialization: After configuring a source and materialized view, Manticore Search sets up a dedicated worker to handle data ingestion from Kafka.
  • Message mapping: Messages are mapped according to the source configuration schema, transforming them into a structured format.
  • Batching: Messages are grouped into batches for efficient processing. Batch size can be adjusted to suit your performance and latency needs.
  • Buffering: Mapped data batches are stored in a buffer table for efficient bulk operations.
  • Materialized view processing: The view logic is applied to data in the buffer table, performing any transformations or filtering.
  • Data transfer: Processed data is then transferred to the destination real-time table.
  • Cleanup: The buffer table is cleared after each batch, ensuring it’s ready for the next set of data.

Integration with DBeaver

NOTE: The integration with DBeaver requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

DBeaver is a SQL client and database administration tool. For MySQL databases, it uses the JDBC API to interact with them via a JDBC driver.

Manticore allows you to use DBeaver to work with data stored in Manticore tables the same way as if it were stored in a MySQL database.

Settings to use

To start working with Manticore in DBeaver, follow these steps:

  • Choose the New database connection option in DBeaver's UI
  • Choose SQL -> MySQL as DBeaver's database driver
  • Set the Server host and Port options corresponding to the host and port of your Manticore instance (keep the database field empty)
  • Set root/<empty password> as authentication credentials
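
For a typical local setup, the connection settings might look like this (assuming Manticore is listening on its default MySQL-protocol port 9306):

Server host: 127.0.0.1
Port:        9306
Database:    (leave empty)
Username:    root
Password:    (leave empty)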

Functions available

Since Manticore does not support the full MySQL feature set, only part of DBeaver's functionality is available when working with Manticore.

You will be able to:

  • View, create, delete, and rename tables
  • Add and drop table columns
  • Insert, delete, and update column data

You will not be able to:

  • Use database integrity check mechanisms (MyISAM will be set as the only storage engine available)
  • Use MySQL procedures, triggers, events, etc.
  • Manage database users
  • Set other database administration options

Data type handling

Some MySQL data types are not currently supported by Manticore and, therefore, cannot be used when creating a new table with DBeaver. Also, some of the supported data types are converted to the closest Manticore type, with type precision ignored in the conversion. Below is the list of supported MySQL data types and the Manticore types they are mapped to:

  • BIGINT UNSIGNED => bigint
  • BOOL => boolean
  • DATE, DATETIME, TIMESTAMP => timestamp
  • FLOAT => float
  • INT => int
  • INT UNSIGNED, SMALLINT UNSIGNED, TINYINT UNSIGNED, BIT => uint
  • JSON => json
  • TEXT, LONGTEXT, MEDIUMTEXT, TINYTEXT, BLOB, LONGBLOB, MEDIUMBLOB, TINYBLOB => text
  • VARCHAR, LONG VARCHAR, BINARY, CHAR, VARBINARY, LONG VARBINARY => string

You can find more details about Manticore data types here.
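
As an illustration, here is a hypothetical table definition issued from DBeaver using MySQL types (the table and column names are made up for the example); the comments show the Manticore types the columns are converted to:

CREATE TABLE products (
  title TEXT,           -- becomes Manticore text
  sku VARCHAR(32),      -- becomes Manticore string (the length is ignored)
  price FLOAT,          -- becomes Manticore float
  attrs JSON,           -- becomes Manticore json
  created_at DATETIME   -- becomes Manticore timestamp (see the note on date types below)
);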

About date types

Manticore can handle the DATE, DATETIME, and TIMESTAMP data types; however, this requires Manticore Buddy to be enabled. Otherwise, an attempt to operate on one of these types will result in an error.

Note that the TIME type is not supported.

Possible caveats

  • DBeaver's Preferences -> Connections -> Client identification option must not be turned off or overridden. To work correctly with DBeaver, Manticore needs to distinguish its requests from others. For this, it uses the client identification info sent by DBeaver in request headers. Disabling client identification will break that detection and, therefore, Manticore's correct functionality.

  • When trying to update data in your table for the first time, you'll see the No unique key popup message and will be asked to define a custom unique key. When you get this message, perform the following steps:

    • Choose the Custom Unique Key option
    • Choose only the id column in the columns list
    • Press Ok

    After that, you'll be able to update your data safely.

Integration with Apache Superset

NOTE: The integration with Apache Superset requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.

Apache Superset is a modern data exploration and visualization platform. Manticore Search can be integrated with Apache Superset, allowing users to harness the full potential of Manticore Search for data analysis and visualization.

Note: This integration works only with Apache Superset 3.x branch and is not yet available for version 4 or newer.

Prerequisites

Before integrating Manticore Search with Apache Superset, ensure that:

  1. Manticore Search (version 6.2.0 or later) is properly installed and configured on your server. Refer to the official Manticore Search installation guide for assistance.
  2. Apache Superset is set up on your system. Follow the official Apache Superset installation guide for installation instructions.

Connecting Manticore Search to Apache Superset

To connect Manticore Search to Apache Superset:

  1. Log in to your Apache Superset dashboard.
  2. Click on "+" in the top navigation bar and choose "Database Connections".
  3. Click "+ DATABASE" to add a new database.
  4. Select "MySQL" as the database type.
  5. In the configuration page, provide the following details:
    • Database Name: A name for the database (e.g., "Manticore")
    • Host: Your Manticore Search server host
    • Port: The port on which Manticore Search is running
    • Username: Your Manticore Search username
    • Display Name: A display name for the connection
  6. Click "Connect" to verify the connection and save.

Creating Charts and Dashboards

After connecting Manticore Search to Apache Superset, you can create charts and dashboards using your Manticore data:

  1. In the Apache Superset dashboard, click on "Dashboards" in the top navigation bar and select "New dashboard".
  2. Open the dashboard and click on the "+" icon to add a new chart.
  3. Choose the dataset connected to Manticore Search.
  4. Select the type of chart you want to create (e.g., bar chart, line chart, pie chart).
  5. Use Apache Superset's query builder or write an SQL query to fetch data from your Manticore Search database.
  6. Customize the chart's appearance, labels, and other settings as needed.
  7. Click "UPDATE CHART" to update the chart.
  8. Add a name for the chart and click "Save" to permanently save it in the Dashboard.
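
For step 5, any valid Manticore SQL query can be used. For example, assuming the destination_kafka table from the Kafka integration above has been added as a dataset, a simple query might look like this:

SELECT id, name, short_name, received_at
FROM destination_kafka
ORDER BY id DESC
LIMIT 20;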

Supported Functionality

When working with Manticore Search through Apache Superset, you can:

  • View and query data from Manticore Search tables
  • Create various types of visualizations based on your Manticore data
  • Build and customize dashboards using these visualizations

Data Type Handling

Manticore Search data types are mapped to MySQL data types when working with Apache Superset. Refer to the Manticore Search documentation for more details on supported data types and their mappings.

Limitations

  • Some advanced MySQL features may not be available when working with Manticore Search through Apache Superset.
  • Certain data manipulation operations might be limited compared to working directly with MySQL databases.

By following these steps and guidelines, you can effectively integrate Manticore Search with Apache Superset for powerful data exploration and visualization capabilities.

References

Please refer to the following resources for more information on Apache Superset and Manticore Search: