FEDERATED

With the MySQL FEDERATED engine, you can connect to a local or remote Manticore instance from MySQL/MariaDB and perform search queries.

Using FEDERATED

An actual Manticore query can't be used directly with the FEDERATED engine and must be "proxied" (sent as a string in a column) due to the FEDERATED engine's limitations and the fact that Manticore implements custom syntax like the MATCH clause.

To search via FEDERATED, you first need to create a FEDERATED engine table. The Manticore query will be included in a query column in the SELECT performed over the FEDERATED table.

Creating a FEDERATED-compatible MySQL table:

‹›
  • SQL
SQL
📋
CREATE TABLE t1
(
    id          INTEGER UNSIGNED NOT NULL,
    year        INTEGER NOT NULL,
    rating      FLOAT,
    query       VARCHAR(1024) NOT NULL,
    INDEX(query)
) ENGINE=FEDERATED
DEFAULT CHARSET=utf8
CONNECTION='mysql://[email protected]:9306/DB/movies';
‹›
Response
Query OK, 0 rows affected (0.00 sec)

Query FEDERATED compatible table:

‹›
  • SQL
SQL
📋
SELECT * FROM t1 WHERE query='SELECT * FROM movies WHERE MATCH (\'pie\')';
‹›
Response
+----+------+--------+------------------------------------------+
| id | year | rating | query                                    |
+----+------+--------+------------------------------------------+
|  1 | 2019 |      5 | SELECT * FROM movies WHERE MATCH ('pie') |
+----+------+--------+------------------------------------------+
1 row in set (0.04 sec)

The only fixed mapping is the query column. It is mandatory and must be the only column with a table attached.

The Manticore table linked via FEDERATED must be a physical table (plain or real-time).

The FEDERATED table should have columns with the same names as the remote Manticore table attributes since they will be bound to the attributes provided in the Manticore result set by name. However, it might map only some attributes, not all of them.

Manticore server identifies a query from a FEDERATED client by the user name "FEDERATED". The CONNECTION string parameter is used to specify the Manticore host, SQL port, and tables for queries coming through the connection. The connection string syntax is as follows:

CONNECTION="mysql://FEDERATED@HOST:PORT/DB/TABLENAME"

Since Manticore doesn't have the concept of a database, the DB string can be random as it will be ignored by Manticore, but MySQL requires a value in the CONNECTION string definition. As seen in the example, the full SELECT SQL query should be placed in a WHERE clause against the query column.

Only the SELECT statement is supported, not INSERT, REPLACE, UPDATE, or DELETE.

FEDERATED tips

One very important note is that it is much more efficient to allow Manticore to perform sorting, filtering, and slicing the result set than to increase the max matches count and use WHERE, ORDER BY, and LIMIT clauses on the MySQL side. This is for two reasons. First, Manticore implements a number of optimizations and performs better than MySQL for these tasks. Second, less data needs to be packed by searchd, transferred, and unpacked between Manticore and MySQL.

JOINs can be performed between a FEDERATED table and other MySQL tables. This can be used to retrieve information that is not stored in a Manticore table.

‹›
  • SQL
SQL
📋
Query to JOIN MySQL based table with FEDERATED table served by Manticore:
SELECT t1.id, t1.year, comments.comment FROM t1 JOIN comments ON t1.id=comments.post_id WHERE query='SELECT * FROM movies WHERE MATCH (\'pie\')';
‹›
Response
+----+------+--------------+
| id | year | comment      |
+----+------+--------------+
|  1 | 2019 | was not good |
+----+------+--------------+
1 row in set (0.00 sec)

UDFs and Plugins

Manticore can be extended with user-defined functions, or UDFs for short, like this:

SELECT id, attr1, myudf (attr2, attr3+attr4) ...

You can dynamically load and unload UDFs into searchd without having to restart the server, and use them in expressions when searching, ranking, etc. A quick summary of the UDF features is as follows:

  • UDFs can take integer (both 32-bit and 64-bit), float, string, MVA, or PACKEDFACTORS() arguments.
  • UDFs can return integer, float, or string values.
  • UDFs can check the argument number, types, and names during the query setup phase, and raise errors.

We do not yet support aggregation functions. In other words, your UDFs will be called for just a single document at a time and are expected to return some value for that document. Writing a function that can compute an aggregate value like AVG() over the entire group of documents that share the same GROUP BY key is not yet possible. However, you can use UDFs within the built-in aggregate functions: that is, even though MYCUSTOMAVG() is not supported yet, AVG(MYCUSTOMFUNC()) should work just fine!

UDFs offer a wide range of applications, such as:

  • incorporating custom mathematical or string functions;
  • accessing databases or files from within Manticore;
  • creating complex ranking functions.

Plugins

Plugins offer additional opportunities to expand search functionality. They can currently be used to compute custom rankings and tokenize documents and queries.

Here's the complete list of plugin types:

  • UDF plugins (essentially UDFs, but since they're plugged in, they're also referred to as 'UDF plugins')
  • ranker plugins
  • indexing-time token filter plugins
  • query-time token filter plugins

This section covers the general process of writing and managing plugins; specifics related to creating different types of plugins are discussed in their respective subsections.

So, how do you write and use a plugin? Here's a quick four-step guide:

  • create a dynamic library (either .so or .dll), most likely using C or C++;
  • load the plugin into searchd with CREATE PLUGIN;
  • use the plugin with plugin-specific calls (usually through specific OPTIONS).
  • unload or reload a plugin with DROP PLUGIN and RELOAD PLUGINS, respectively.

Note that while UDFs are first-class plugins, they are installed using a separate CREATE FUNCTION statement. This allows for a neat specification of the return type, without sacrificing backward compatibility or changing the syntax.

Dynamic plugins are supported in threads and thread_pool workers. Multiple plugins (and/or UDFs) can be contained in a single library file. You may choose to either group all project-specific plugins in one large library or create a separate library for each UDF and plugin; it's up to you.

As with UDFs, you should include the src/sphinxudf.h header file. At the very least, you'll need the SPH_UDF_VERSION constant to implement an appropriate version function. Depending on the specific plugin type, you may or may not need to link your plugin with src/sphinxudf.c. However, all functions implemented in sphinxudf.c are related to unpacking the PACKEDFACTORS() blob, and no plugin types have access to that data. So currently, linking with just the header should suffice. (In fact, if you copy over the UDF version number, you won't even need the header file for some plugin types.)

Formally, plugins are simply sets of C functions that adhere to a specific naming pattern. You're typically required to define one key function for the primary task, but you can also define additional functions. For instance, to implement a ranker called "myrank", you must define a myrank_finalize() function that returns the rank value. However, you can also define myrank_init(), myrank_update(), and myrank_deinit() functions. Specific sets of well-known suffixes and call arguments differ based on the plugin type, but _init() and _deinit() are generic, and every plugin has them. Hint: for a quick reference on known suffixes and their argument types, refer to sphinxplugin.h, where the call prototypes are defined at the beginning of the file.

Even though the public interface is defined in pure C, our plugins essentially follow an object-oriented model. Indeed, every _init() function receives a void ** userdata out-parameter, and the pointer value stored at (*userdata) is then passed as the first argument to all other plugin functions. So you can think of a plugin as a class that gets instantiated every time an object of that class is needed to handle a request: the userdata pointer serves as the this pointer; the functions act as methods, and the _init() and _deinit() functions work as constructor and destructor, respectively.

This minor OOP-in-C complication arises because plugins run in a multi-threaded environment, and some need to maintain state. You can't store that state in a global variable in your plugin, so we pass around a userdata parameter, which naturally leads to the OOP model. If your plugin is simple and stateless, the interface allows you to omit _init(), _deinit(), and any other functions.

To summarize, here's the simplest complete ranker plugin in just three lines of C code:

// gcc -fPIC -shared -o myrank.so myrank.c
#include "sphinxudf.h"
int myrank_ver() { return SPH_UDF_VERSION; }
int myrank_finalize(void *u, int w) { return 123; }

Here's how to use the simple ranker plugin:

mysql> CREATE PLUGIN myrank TYPE 'ranker' SONAME 'myrank.dll';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT id, weight() FROM test1 WHERE MATCH('test') OPTION ranker=myrank('');
+------+----------+
| id   | weight() |
+------+----------+
|    1 |      123 |
|    2 |      123 |
+------+----------+
2 rows in set (0.01 sec)

Listing plugins

SHOW PLUGINS

SHOW PLUGINS

Displays all the loaded plugins (except for Buddy plugins, see below) and UDFs. The "Type" column should be one of the udf, ranker, index_token_filter, or query_token_filter. The "Users" column is the number of thread that are currently using that plugin in a query. The "Extra" column is intended for various additional plugin-type specific information; currently, it shows the return type for the UDFs and is empty for all the other plugin types.

‹›
  • Example
Example
📋
SHOW PLUGINS;
‹›
Response
+------+----------+----------------+-------+-------+
| Type | Name     | Library        | Users | Extra |
+------+----------+----------------+-------+-------+
| udf  | sequence | udfexample.dll | 0     | INT   |
+------+----------+----------------+-------+-------+
1 row in set (0.00 sec)

SHOW BUDDY PLUGINS

SHOW BUDDY PLUGINS

This will display all available plugins, including core and local ones. To remove a plugin, make sure to use the name listed in the Package column.

‹›
  • Example
Example
📋
SHOW BUDDY PLUGINS;
‹›
Response
+-------------------------------------------------+------------------+----------+----------+
| Package                                         | Plugin           | Version  | Type     |
+-------------------------------------------------+------------------+----------+----------+
| manticoresoftware/buddy-plugin-empty-string     | empty-string     | dev-main | core     |
| manticoresoftware/buddy-plugin-backup           | backup           | dev-main | core     |
| manticoresoftware/buddy-plugin-emulate-elastic  | emulate-elastic  | dev-main | core     |
| manticoresoftware/buddy-plugin-insert           | insert           | dev-main | core     |
| manticoresoftware/buddy-plugin-lock-tables      | lock-tables      | dev-main | core     |
| manticoresoftware/buddy-plugin-unlock-tables    | unlock-tables    | dev-main | core     |
| manticoresoftware/buddy-plugin-select           | select           | dev-main | core     |
| manticoresoftware/buddy-plugin-show-full-tables | show-full-tables | dev-main | core     |
| manticoresoftware/buddy-plugin-show-fields      | show-fields      | dev-main | core     |
| manticoresoftware/buddy-plugin-show-queries     | show-queries     | dev-main | core     |
| manticoresoftware/buddy-plugin-cli-table        | cli-table        | dev-main | core     |
| manticoresoftware/buddy-plugin-plugin           | plugin           | dev-main | core     |
| manticoresoftware/buddy-plugin-test             | test             | dev-main | core     |
| manticoresoftware/buddy-plugin-show-hostname    | show-hostname    | dev-main | external |
+-------------------------------------------------+------------------+----------+----------+
14 rows in set (0.006 sec)