RELOAD PLUGINS FROM SONAME 'plugin_library'
Reloads all plugins (UDFs, rankers, etc.) from a given library. The reload is, in a sense, transactional. A successful reload guarantees that:
- all the plugins were successfully updated with their new versions;
- the update was atomic: all the plugins were replaced at once. Atomicity means that queries using multiple functions from a reloaded library will never mix the old and new versions.
The set of plugins is guaranteed to be consistent at all times during the RELOAD: it will be either all old or all new.
Reload is also seamless, meaning that some version of a reloaded plugin will be available to concurrent queries at all times, and there will be no temporary disruptions. Note how this improves on using a pair of DROP and CREATE statements for reloading: with those, there is a tiny window between the DROP and the subsequent CREATE, during which the queries technically refer to an unknown plugin and will thus fail.
In case of any failure, RELOAD PLUGINS does absolutely nothing: it keeps the old plugins and reports an error.
On Windows, either overwriting or deleting a DLL library currently in use seems to be an issue. However, you can still rename it, then put a new version under the old name, and RELOAD will then work. After a successful reload you will also be able to delete the renamed old library.
mysql> RELOAD PLUGINS FROM SONAME 'udfexample.dll';
Query OK, 0 rows affected (0.00 sec)
Ranker plugins let you implement a custom ranker that receives all the occurrences of the keywords matched in the document, and computes a WEIGHT() value. They can be called as follows:
SELECT id, attr1 FROM test WHERE match('hello') OPTION ranker=myranker('option1=1');
The call workflow is as follows:
- XXX_init() gets called once per query per table, at the very beginning. A few query-wide options are passed to it through a SPH_RANKER_INIT structure, including the user options string (in the example just above, "option1=1" is that string).
- XXX_update() gets called multiple times per matched document, with every matched keyword occurrence passed as its parameter, a SPH_RANKER_HIT structure. The occurrences within each document are guaranteed to be passed in ascending order.
- XXX_finalize() gets called once per matched document, once there are no more keyword occurrences. It must return the WEIGHT() value. This is the only mandatory function.
- XXX_deinit() gets called once per query, at the very end.
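The workflow above can be sketched as a minimal ranker that weights a document by its number of keyword occurrences. This is an illustration only: `DemoRankerInit` and `DemoRankerHit` are hypothetical stand-ins for the real `SPH_RANKER_INIT` and `SPH_RANKER_HIT` structures, the `demo_` function names and their parameter lists are assumptions, and the actual signatures should be taken from the plugin header shipped with the server.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for SPH_RANKER_INIT and SPH_RANKER_HIT. */
typedef struct {
    const char *options;   /* user options string, e.g. "option1=1" */
} DemoRankerInit;

typedef struct {
    int hit_pos;           /* assumed field: position of the occurrence */
} DemoRankerHit;

/* Per-query state, allocated in init and released in deinit. */
typedef struct {
    int hits_in_doc;       /* occurrences seen for the current document */
} DemoState;

/* Called once per query per table. Returns 0 on success. */
int demo_init(void **userdata, const DemoRankerInit *init)
{
    DemoState *st = (DemoState *)calloc(1, sizeof(*st));
    if (!st)
        return 1;
    (void)init;            /* the options string is ignored in this sketch */
    *userdata = st;
    return 0;
}

/* Called for every matched keyword occurrence in the current document. */
void demo_update(void *userdata, const DemoRankerHit *hit)
{
    DemoState *st = (DemoState *)userdata;
    (void)hit;
    st->hits_in_doc++;
}

/* Called once per matched document: returns the WEIGHT() value and
   resets the counter so the next document starts from zero. */
unsigned int demo_finalize(void *userdata)
{
    DemoState *st = (DemoState *)userdata;
    unsigned int weight = (unsigned int)st->hits_in_doc;
    st->hits_in_doc = 0;
    return weight;
}

/* Called once per query, at the very end. */
void demo_deinit(void *userdata)
{
    free(userdata);
}
```

Resetting the counter in the finalize step matters because the same state object is reused for every matched document within the query.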
Token filter plugins let you implement a custom tokenizer that makes tokens according to custom rules. There are two types:
- an index-time tokenizer, declared by index_token_filter in the table settings
- a query-time tokenizer, declared by the token_filter OPTION directive
In the text processing pipeline, the token filters run after the base tokenizer, which processes the text from a field or query and creates tokens out of it.
Index-time tokenizer gets created by
indexer on indexing source data into a table or by an RT table on processing
A plugin is declared as library name:plugin name:optional string of settings. The init function of the plugin can accept arbitrary settings, passed as a string, as in this example:
index_token_filter = my_lib.so:email_process:field=email;split=.io
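A plugin has to parse that settings string itself. The sketch below shows one way to look up a value, assuming the string is a list of `key=value` pairs separated by semicolons, which is inferred from the example line above and not confirmed by the text. The helper name `demo_get_option` is hypothetical and is not part of any plugin API.

```c
#include <stddef.h>
#include <string.h>

/* Looks up `key` in a "k1=v1;k2=v2" settings string and copies its value
   into `out`. Returns 1 if the key was found, 0 otherwise. */
int demo_get_option(const char *settings, const char *key,
                    char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *p = settings;

    if (out_size == 0)
        return 0;

    while (*p) {
        /* The current pair ends at the next ';' or at the end of the string. */
        const char *end = strchr(p, ';');
        if (!end)
            end = p + strlen(p);

        /* Match "key=" at the start of the pair. */
        if ((size_t)(end - p) > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            size_t val_len = (size_t)(end - p) - key_len - 1;
            if (val_len >= out_size)
                val_len = out_size - 1;   /* truncate to fit the buffer */
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }

        /* Skip the separator, or stop at the terminating NUL. */
        p = (*end == ';') ? end + 1 : end;
    }
    return 0;
}
```

With the settings from the example, looking up `split` would yield `.io` and looking up `field` would yield `email`.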
The call workflow for index-time token filter is as follows:
- XXX_init() gets called right after indexer creates the token filter with an empty fields list, and then again after indexer gets the table schema with the actual fields list. It must return zero for successful initialization or an error description otherwise.
- XXX_begin_document gets called only for RT table REPLACE, once for every document. It must return zero for a successful call or an error description otherwise. Additional parameters/settings can be passed to the function using OPTION token_filter_options.
INSERT INTO rt (id, title) VALUES (1, 'some text [email protected]') OPTION token_filter_options='.io'
- XXX_begin_field gets called once for each field, before the field is processed by the base tokenizer, with the field number as its parameter.
- XXX_push_token gets called once for each new token produced by the base tokenizer, with the source token as its parameter. It must return the token, the count of extra tokens made by the token filter, and the delta position for the token.
- XXX_get_extra_token gets called multiple times if XXX_push_token reports extra tokens. It must return the token and the delta position for that extra token.
- XXX_end_field gets called once, right after the source tokens from the current field are exhausted.
- XXX_deinit gets called at the very end of indexing.
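The interplay between the push-token and get-extra-token steps can be sketched as follows. For a token containing `@`, this hypothetical filter keeps the source token and queues the part after `@` as one extra token. The `demo_tf_` names, the parameter lists, the `DemoFilter` state, and the choice of delta values (1 for the source token, 0 for the extra token) are all assumptions made for illustration; the real signatures are defined by the plugin header.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-filter state. */
typedef struct {
    char extra[64];   /* buffer holding the queued extra token */
    int  has_extra;   /* non-zero while the extra token is pending */
} DemoFilter;

/* Allocates the filter state. Returns 0 on success. */
int demo_tf_init(void **userdata)
{
    DemoFilter *f = (DemoFilter *)calloc(1, sizeof(*f));
    if (!f)
        return 1;
    *userdata = f;
    return 0;
}

/* Returns the token to index. *extra receives the number of queued extra
   tokens, *delta the position increment for the returned token. */
char *demo_tf_push_token(void *userdata, char *token, int *extra, int *delta)
{
    DemoFilter *f = (DemoFilter *)userdata;
    const char *at = strchr(token, '@');

    *delta = 1;
    *extra = 0;
    f->has_extra = 0;

    if (at && at[1] != '\0') {
        /* Queue the part after '@' as one extra token. */
        strncpy(f->extra, at + 1, sizeof(f->extra) - 1);
        f->extra[sizeof(f->extra) - 1] = '\0';
        f->has_extra = 1;
        *extra = 1;
    }
    return token;
}

/* Returns the queued extra token, or NULL when none is left. */
char *demo_tf_get_extra_token(void *userdata, int *delta)
{
    DemoFilter *f = (DemoFilter *)userdata;
    if (!f->has_extra)
        return NULL;
    f->has_extra = 0;
    *delta = 0;       /* assumed: same position as the source token */
    return f->extra;
}

/* Releases the filter state. */
void demo_tf_deinit(void *userdata)
{
    free(userdata);
}
```

Keeping the extra token in the state object, instead of a local buffer, ensures the returned pointer stays valid after the function returns.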
The following functions are mandatory to be defined:
A query-time tokenizer gets created on search, each time full-text search is invoked, by every table involved.
The call workflow for query-time token filter is as follows:
- XXX_init() gets called once per table prior to parsing the query, with two parameters: the max token length and the string set in the query OPTION, as in this example:
SELECT * FROM index WHERE MATCH ('test') OPTION token_filter='my_lib.so:query_email_process:io'
It must return zero for successful initialization or an error description otherwise.
- XXX_push_token() gets called once for each new token produced by the base tokenizer, with these parameters: the token produced by the base tokenizer, a pointer to the raw token in the source query string, and the raw token length. It must return the token and the delta position for the token.
- XXX_pre_morph() gets called once for each token right before it is passed to the morphology processor, with a reference to the token and the stopword flag. It may set the stopword flag to mark the token as a stopword.
- XXX_post_morph() gets called once for each token after it has been processed by the morphology processor, with a reference to the token and the stopword flag. It may set the stopword flag to mark the token as a stopword. It must return a flag; a non-zero value means that the token as it was prior to morphology processing should be used.
- XXX_deinit() gets called at the very end of query processing.
Absence of any of the functions is tolerated.
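The two morphology hooks can be sketched like this. The `demo_qf_` names and parameter lists are assumptions, and both rules (one-character tokens become stopwords; tokens containing a digit keep their pre-morphology form) are invented purely to show how the stopword flag and the return value are used.

```c
#include <string.h>

/* Called before morphology processing; may mark the token as a stopword. */
void demo_qf_pre_morph(void *userdata, char *token, int *stopword)
{
    (void)userdata;
    /* Illustrative rule: one-character tokens are treated as stopwords. */
    if (strlen(token) < 2)
        *stopword = 1;
}

/* Called after morphology processing. A non-zero return value means
   "use the token as it was before morphology processing". */
int demo_qf_post_morph(void *userdata, char *token, int *stopword)
{
    (void)userdata;
    (void)stopword;
    /* Illustrative rule: tokens with a digit keep their original form. */
    return strpbrk(token, "0123456789") != NULL;
}
```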