ALTER TABLE table ADD COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING|TIMESTAMP|TEXT [INDEXED [ATTRIBUTE]]}] [engine='columnar']
ALTER TABLE table DROP COLUMN column_name
ALTER TABLE table MODIFY COLUMN column_name bigint
This feature only supports adding one field at a time for RT tables or the expansion of an int
column to bigint
. The supported data types are:
int
- integer attributetimestamp
- timestamp attributebigint
- big integer attributefloat
- float attributebool
- boolean attributemulti
- multi-valued integer attributemulti64
- multi-valued bigint attributejson
- json attributestring
/text attribute
/string attribute
- string attributetext
/text indexed stored
/string indexed stored
- full-text indexed field with original value stored in docstoretext indexed
/string indexed
- full-text indexed field, indexed only (the original value is not stored in docstore)text indexed attribute
/string indexed attribute
- full text indexed field + string attribute (not storing the original value in docstore)text stored
/string stored
- the value will be only stored in docstore, not full-text indexed, not a string attribute- adding
engine='columnar'
to any attribute (except for json) will make it stored in the columnar storage
- ❗It's recommended to backup table files before
ALTER
ing it to avoid data corruption in case of a sudden power interruption or other similar issues. - Querying a table is impossible while a column is being added.
- Newly created attribute's values are set to 0.
ALTER
will not work for distributed tables and tables without any attributes.- You can't delete the
id
column. - When dropping a field which is both a full-text field and a string attribute the first
ALTER DROP
drops the attribute, the second one drops the full-text field. - Adding/dropping full-text field is only supported in the RT mode.
- Example
mysql> desc rt;
+------------+-----------+
| Field | Type |
+------------+-----------+
| id | bigint |
| text | field |
| group_id | uint |
| date_added | timestamp |
+------------+-----------+
mysql> alter table rt add column test integer;
mysql> desc rt;
+------------+-----------+
| Field | Type |
+------------+-----------+
| id | bigint |
| text | field |
| group_id | uint |
| date_added | timestamp |
| test | uint |
+------------+-----------+
mysql> alter table rt drop column group_id;
mysql> desc rt;
+------------+-----------+
| Field | Type |
+------------+-----------+
| id | bigint |
| text | field |
| date_added | timestamp |
| test | uint |
+------------+-----------+
mysql> alter table rt add column title text indexed;
mysql> desc rt;
+------------+-----------+------------+
| Field | Type | Properties |
+------------+-----------+------------+
| id | bigint | |
| text | text | indexed |
| title | text | indexed |
| date_added | timestamp | |
| test | uint | |
+------------+-----------+------------+
mysql> alter table rt add column title text attribute;
mysql> desc rt;
+------------+-----------+------------+
| Field | Type | Properties |
+------------+-----------+------------+
| id | bigint | |
| text | text | indexed |
| title | text | indexed |
| date_added | timestamp | |
| test | uint | |
| title | string | |
+------------+-----------+------------+
mysql> alter table rt drop column title;
mysql> desc rt;
+------------+-----------+------------+
| Field | Type | Properties |
+------------+-----------+------------+
| id | bigint | |
| text | text | indexed |
| title | text | indexed |
| date_added | timestamp | |
| test | uint | |
+------------+-----------+------------+
mysql> alter table rt drop column title;
mysql> desc rt;
+------------+-----------+------------+
| Field | Type | Properties |
+------------+-----------+------------+
| id | bigint | |
| text | text | indexed |
| date_added | timestamp | |
| test | uint | |
+------------+-----------+------------+
ALTER TABLE table ft_setting='value'[, ft_setting2='value']
You can use ALTER
to modify the full-text settings of your table in RT mode. However, it only affects new documents and not existing ones.
Example:
- create a table with a full-text field and
charset_table
that allows only 3 searchable characters:a
,b
andc
. - then we insert document 'abcd' and find it by query
abcd
, thed
just gets ignored since it's not in thecharset_table
array - then we understand, that we want
d
to be searchable too, so we add it with help ofALTER
- but the same query
where match('abcd')
still says it searched byabc
, because the existing document remembers previous contents ofcharset_table
- then we add another document
abcd
and search byabcd
again - now it finds the both documents and
show meta
says it used two keywords:abc
(to find the old document) andabcd
(for the new one).
- Example
mysql> create table rt(title text) charset_table='a,b,c';
mysql> insert into rt(title) values('abcd');
mysql> select * from rt where match('abcd');
+---------------------+-------+
| id | title |
+---------------------+-------+
| 1514630637682688054 | abcd |
+---------------------+-------+
mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 1 |
| total_found | 1 |
| time | 0.000 |
| keyword[0] | abc |
| docs[0] | 1 |
| hits[0] | 1 |
+---------------+-------+
mysql> alter table rt charset_table='a,b,c,d';
mysql> select * from rt where match('abcd');
+---------------------+-------+
| id | title |
+---------------------+-------+
| 1514630637682688054 | abcd |
+---------------------+-------+
mysql> show meta
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 1 |
| total_found | 1 |
| time | 0.000 |
| keyword[0] | abc |
| docs[0] | 1 |
| hits[0] | 1 |
+---------------+-------+
mysql> insert into rt(title) values('abcd');
mysql> select * from rt where match('abcd');
+---------------------+-------+
| id | title |
+---------------------+-------+
| 1514630637682688055 | abcd |
| 1514630637682688054 | abcd |
+---------------------+-------+
mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 2 |
| total_found | 2 |
| time | 0.000 |
| keyword[0] | abc |
| docs[0] | 1 |
| hits[0] | 1 |
| keyword[1] | abcd |
| docs[1] | 1 |
| hits[1] | 1 |
+---------------+-------+
You can change the name of a real-time table in RT mode.
ALTER TABLE table_name RENAME new_table_name;
NOTE: Renaming a real-time table requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.
- Example
ALTER TABLE table_name RENAME new_table_name;
Query OK, 0 rows affected (0.00 sec)
ALTER TABLE table RECONFIGURE
ALTER
can also reconfigure an RT table in the plain mode, so that new tokenization, morphology and other text processing settings from the configuration file take effect for new documents. Note, that the existing document will be left intact. Internally, it forcibly saves the current RAM chunk as a new disk chunk and adjusts the table header, so that new documents are tokenized using the updated full-text settings.
- Example
mysql> show table rt settings;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| settings | |
+---------------+-------+
1 row in set (0.00 sec)
mysql> alter table rt reconfigure;
Query OK, 0 rows affected (0.00 sec)
mysql> show table rt settings;
+---------------+----------------------+
| Variable_name | Value |
+---------------+----------------------+
| settings | morphology = stem_en |
+---------------+----------------------+
1 row in set (0.00 sec)
ALTER TABLE table REBUILD SECONDARY
You can also use ALTER
to rebuild secondary indexes in a given table. Sometimes, a secondary index can be disabled for the entire table or for one or multiple attributes within the table:
- When an attribute is updated, its secondary index gets disabled.
- If Manticore loads a table with an old version of secondary indexes that is no longer supported, the secondary indexes will be disabled for the entire table.
ALTER TABLE table REBUILD SECONDARY
rebuilds secondary indexes from attribute data and enables them again.
Additionally, an old version of secondary indexes may be supported but will lack certain features. REBUILD SECONDARY
can be used to update secondary indexes.
- Example
ALTER TABLE rt REBUILD SECONDARY;
Query OK, 0 rows affected (0.00 sec)
To change the list of local or remote nodes in a distributed table, follow the same syntax you used to create the table. Just replace CREATE
with ALTER
in the command and remove type='distributed'
:
ALTER TABLE `distr_table_name` [[local='local_table_name'], [agent='host:port:remote_table'] ... ]
NOTE: Renaming a real-time table requires Manticore Buddy. If it doesn't work, make sure Buddy is installed.
- Example
ALTER TABLE local_dist local='index1' local='index2' agent='127.0.0.1:9312:remote_table';
Returns the absolute value of the argument.
Returns the arctangent function of two arguments, expressed in radians.
BITDOT(mask, w0, w1, ...)
returns the sum of products of each bit of a mask multiplied by its weight. bit0*w0 + bit1*w1 + ...
Returns the smallest integer value greater than or equal to the argument.
Returns the cosine of the argument.
Returns the CRC32 value of a string argument.
Returns the exponent of the argument (e=2.718... to the power of the argument).
Returns the N-th Fibonacci number, where N is the integer argument. That is, arguments of 0 and up will generate the values 0, 1, 1, 2, 3, 5, 8, 13 and so on. Note that the computations are done using 32-bit integer math and thus numbers 48th and up will be returned modulo 2^32.
Returns the largest integer value lesser than or equal to the argument.
GREATEST(attr_json.some_array)
function takes a JSON array as the argument, and returns the greatest value in that array. Also works for MVA.
Returns the result of an integer division of the first argument by the second argument. Both arguments must be of an integer type.
LEAST(attr_json.some_array)
function takes a JSON array as the argument, and returns the least value in that array. Also works for MVA.
Returns the natural logarithm of the argument (with the base of e=2.718...).
Returns the common logarithm of the argument (with the base of 10).
Returns the binary logarithm of the argument (with the base of 2).
Returns the larger of two arguments.
Returns the smaller of two arguments.
Returns the first argument raised to the power of the second argument.
Returns a random float between 0 and 1. It can optionally accept a seed
, which can be a constant integer or an integer attribute's name.
If you use a seed
, keep in mind that it resets rand()
's starting point separately for each plain table, RT disk, RAM chunk, or pseudo shard. Therefore, queries to a distributed table in any form can return multiple identical random values.
Returns the sine of the argument.
Returns the square root of the argument.
BM25A(k1,b)
returns the exact BM25A()
value. Requires the expr
ranker and enabled index_field_lengths
. Parameters k1
and b
must be floats.
BM25F(k1, b, {field=weight, ...})
returns the exact BM25F()
value and requires index_field_lengths
to be enabled. The expr
ranker is also necessary. Parameters k1
and b
must be floats.
Substitutes non-existent columns with default values. It returns either the value of an attribute specified by 'attr-name', or the 'default-value' if that attribute does not exist. STRING or MVA attributes are not supported. This function is useful when searching through multiple tables with different schemas.
SELECT *, EXIST('gid', 6) as cnd FROM i1, i2 WHERE cnd>5
Returns the sort key value of the worst-ranked element in the current top-N matches if the sort key is a float, and 0 otherwise.
Returns the weight of the worst-ranked element in the current top-N matches.
PACKEDFACTORS()
can be used in queries to display all calculated weighting factors during matching or to provide a binary attribute for creating a custom ranking UDF. This function only works if the expression ranker is specified and the query is not a full scan; otherwise, it returns an error. PACKEDFACTORS()
can take an optional argument that disables ATC ranking factor calculation: PACKEDFACTORS({no_atc=1})
. Calculating ATC significantly slows down query processing, so this option can be useful if you need to see the ranking factors but don't require ATC. PACKEDFACTORS()
can also output in JSON format: PACKEDFACTORS({json=1})
. The respective outputs in either key-value pair or JSON format are shown below. (Note that the examples below are wrapped for readability; actual returned values would be single-line.)
mysql> SELECT id, PACKEDFACTORS() FROM test1
-> WHERE MATCH('test one') OPTION ranker=expr('1') \G
*************************** 1\. row ***************************
id: 1
packedfactors(): bm25=569, bm25a=0.617197, field_mask=2, doc_word_count=2,
field1=(lcs=1, hit_count=2, word_count=2, tf_idf=0.152356,
min_idf=-0.062982, max_idf=0.215338, sum_idf=0.152356, min_hit_pos=4,
min_best_span_pos=4, exact_hit=0, max_window_hits=1, min_gaps=2,
exact_order=1, lccs=1, wlccs=0.215338, atc=-0.003974),
word0=(tf=1, idf=-0.062982),
word1=(tf=1, idf=0.215338)
1 row in set (0.00 sec)
mysql> SELECT id, PACKEDFACTORS({json=1}) FROM test1
-> WHERE MATCH('test one') OPTION ranker=expr('1') \G
*************************** 1\. row ***************************
id: 1
packedfactors({json=1}):
{
"bm25": 569,
"bm25a": 0.617197,
"field_mask": 2,
"doc_word_count": 2,
"fields": [
{
"lcs": 1,
"hit_count": 2,
"word_count": 2,
"tf_idf": 0.152356,
"min_idf": -0.062982,
"max_idf": 0.215338,
"sum_idf": 0.152356,
"min_hit_pos": 4,
"min_best_span_pos": 4,
"exact_hit": 0,
"max_window_hits": 1,
"min_gaps": 2,
"exact_order": 1,
"lccs": 1,
"wlccs": 0.215338,
"atc": -0.003974
}
],
"words": [
{
"tf": 1,
"idf": -0.062982
},
{
"tf": 1,
"idf": 0.215338
}
]
}
1 row in set (0.01 sec)
This function can be used to implement custom ranking functions in UDFs, as in:
SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
FROM my_index
WHERE match('hello')
ORDER BY r DESC
OPTION ranker=expr('1');
Where CUSTOM_RANK()
is a function implemented in a UDF. It should declare a SPH_UDF_FACTORS
structure (defined in sphinxudf.h), initialize this structure, unpack the factors into it before usage, and deinitialize it afterwards, as follows:
SPH_UDF_FACTORS factors;
sphinx_factors_init(&factors);
sphinx_factors_unpack((DWORD*)args->arg_values[0], &factors);
// ... can use the contents of factors variable here ...
sphinx_factors_deinit(&factors);
PACKEDFACTORS()
data is available at all query stages, not just during the initial matching and ranking pass. This enables another particularly interesting application of PACKEDFACTORS()
: re-ranking.
In the example above, we used an expression-based ranker with a dummy expression and sorted the result set by the value computed by our UDF. In other words, we used the UDF to rank all our results. Now, let's assume for the sake of an example that our UDF is extremely expensive to compute, with a throughput of only 10,000 calls per second. If our query matches 1,000,000 documents, we would want to use a much simpler expression to do most of our ranking in order to maintain reasonable performance. Then, we would apply the expensive UDF to only a few top results, say, the top 100 results. In other words, we would build the top 100 results using a simpler ranking function and then re-rank those with a more complex one. This can be done with subselects:
SELECT * FROM (
SELECT *, CUSTOM_RANK(PACKEDFACTORS()) AS r
FROM my_index WHERE match('hello')
OPTION ranker=expr('sum(lcs)*1000+bm25')
ORDER BY WEIGHT() DESC
LIMIT 100
) ORDER BY r DESC LIMIT 10
In this example, the expression-based ranker is called for every matched document to compute WEIGHT()
, so it gets called 1,000,000 times. However, the UDF computation can be postponed until the outer sort, and it will only be performed for the top 100 matches by WEIGHT()
, according to the inner limit. This means the UDF will only be called 100 times. Finally, the top 10 matches by UDF value are selected and returned to the application.
For reference, in a distributed setup, the PACKEDFACTORS()
data is sent from the agents to the master node in binary format. This makes it technically feasible to implement additional re-ranking passes on the master node if needed.
When used in SQL but not called from any UDFs, the result of PACKEDFACTORS()
is formatted as plain text, which can be used to manually assess the ranking factors. Note that this feature is not currently supported by the Manticore API.
REMOVE_REPEATS ( result_set, column, offset, limit )
- removes repeated adjusted rows with the same 'column' value.
SELECT REMOVE_REPEATS((SELECT * FROM dist1), gid, 0, 10)
Note that REMOVE_REPEATS
does not affect total_found
in the search query meta info.
The WEIGHT()
function returns the calculated matching score. If no ordering is specified, the result is sorted in descending order by the score provided by WEIGHT()
. In this example, we order first by weight and then by an integer attribute.
The search above performs a simple matching, where all words need to be present. However, we can do more (and this is just a simple example):
mysql> SELECT *,WEIGHT() FROM testrt WHERE MATCH('"list of business laptops"/3');
+------+------+-------------------------------------+---------------------------+----------+
| id | gid | title | content | weight() |
+------+------+-------------------------------------+---------------------------+----------+
| 1 | 10 | List of HP business laptops | Elitebook Probook | 2397 |
| 2 | 10 | List of Dell business laptops | Latitude Precision Vostro | 2397 |
| 3 | 20 | List of Dell gaming laptops | Inspirion Alienware | 2375 |
| 5 | 30 | List of ASUS ultrabooks and laptops | Zenbook Vivobook | 2375 |
+------+------+-------------------------------------+---------------------------+----------+
4 rows in set (0.00 sec)
mysql> SHOW META;
+----------------+----------+
| Variable_name | Value |
+----------------+----------+
| total | 4 |
| total_found | 4 |
| total_relation | eq |
| time | 0.000 |
| keyword[0] | list |
| docs[0] | 5 |
| hits[0] | 5 |
| keyword[1] | of |
| docs[1] | 4 |
| hits[1] | 4 |
| keyword[2] | business |
| docs[2] | 2 |
| hits[2] | 2 |
| keyword[3] | laptops |
| docs[3] | 5 |
| hits[3] | 5 |
+----------------+----------+
16 rows in set (0.00 sec)
Here, we search for four words, but a match can occur even if only three of the four words are found. The search will rank documents containing all words higher.
The ZONESPANLIST()
function returns pairs of matched zone spans. Each pair contains the matched zone span identifier, a colon, and the order number of the matched zone span. For example, if a document reads <emphasis role="bold"><i>text</i> the <i>text</i></emphasis>
, and you query for 'ZONESPAN:(i,b) text'
, then ZONESPANLIST()
will return the string "1:1 1:2 2:1"
, meaning that the first zone span matched "text" in spans 1 and 2, and the second zone span in span 1 only.
QUERY()
returns the current search query. QUERY()
is a postlimit expression and is intended to be used with SNIPPET().
Table functions are a mechanism for post-query result set processing. Table functions take an arbitrary result set as input and return a new, processed set as output. The first argument should be the input result set, but a table function can optionally take and handle more arguments. Table functions can completely change the result set, including the schema. Currently, only built-in table functions are supported. Table functions work for both outer SELECT
and nested SELECT.