Manticore's data types can be split into full-text fields and attributes.
Full-text fields are indexed and can be searched for keywords. They cannot be used in filtering, sorting or grouping. However, original document content can be retrieved and used in result set highlighting.
Full-text fields are represented by the Text
data type. All the other data types are attributes.
Attributes are additional values associated with each document that can be used to perform additional filtering and sorting during search.
It is often desired to additionally process full-text search results based not only on matching document ID and its rank, but on a number of other per-document values as well. For instance, one might need to sort news search results by date and then relevance, or search through products within specified price range, or limit blog search to posts made by selected users, or group results by month. To do that efficiently, Manticore allows to attach a number of additional attributes to each document. It's then possible to use stored values to filter, sort, or group full-text matches.
Attributes, unlike the fields, are not full-text indexed. They are stored in the index, but it is not possible to search them as full-text.
A good example for attributes would be a forum posts index. Assume that only title and content fields need to be full-text searchable - but that sometimes it is also required to limit search to a certain author or a sub-forum (ie. search only those rows that have some specific values of author_id or forum_id); or to sort matches by post_date column; or to group matching posts by month of the post_date and calculate per-group match counts.
- SQL
- HTTP
- PHP
- config
create table forum(title text, content text, author_id int, forum_id int, post_date timestamp);
This example shows running a full-text query filtered by author_id
, forum_id
and sorted by post_date
.
- SQL
- HTTP
- PHP
select * from forum where author_id=123 and forum_id in (1,3,7) order by post_date desc
Below is the list of data types supported by Manticore Search:
The identifier of a document in the index. Document IDs must be unique signed positive non-zero 64-bit integers. Note that no negative or zero values are allowed. Document IDs are implicit and have no declaration. Update operation is forbidden on document ids.
Text data type forms the full-text part of the index. Text fields are indexed and can be searched for keywords.
Text is passed through an analyzer pipeline that converts the text to words, applies morphology transformations etc. Eventually a full-text index (a special data structure that enables quick searches for a keyword) gets built from that text.
Full-text fields can only be used in MATCH()
clause and cannot be used for sorting or aggregation. Words are stored in an inverted index along with references to the fields they belong and positions in the field. This allows to search a word inside each field and to use advanced operators like proximity. By default the original text of the fields is both indexed and stored in document storage. It means that the original text can be returned with the query results and it can be used in search result highlighting.
- SQL
- HTTP
- PHP
- config
create table products(title text);
This behavior can overridden by explicitly specifying that the text is only indexed.
- SQL
- HTTP
- PHP
- config
create table products(title text indexed);
Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (eg. to "title" and "abstract" only). Manticore index format generally supports up to 256 fields.
- SQL
- HTTP
- PHP
select * from products where match('@title first');
Unlike full-text fields, string attributes are stored as they are received and cannot be used in full-text searches. Instead they are returned in results, they can be used in WHERE
clause for comparison filtering or REGEX
and they can be used for sorting and aggregation. In general it's not recommended to store large texts in string attributes, but use string attributes for metadata like names, titles, tags, keys.
- SQL
- HTTP
- PHP
- config
create table products(title text, keys string);
You can create a fulltext field that is also stored as a string attribute. This approach creates a fulltext field and a string attribute that have the same name. Note that you can't add a stored
property to store the data as a a string attribute and in the document storage at the same time.
- SQL
- HTTP
- PHP
- config
string attribute indexed
means that we're working with a string data type that is stored as an attribute and indexed as a full-text field.
create table products ( title string attribute indexed );
Integer type allows storing 32 bit unsigned integer values.
- SQL
- HTTP
- PHP
- config
create table products(title text, price int);
Integers can be stored in shorter sizes than 32 bit by specifying a bit count. For example if we want to store a numeric value which we know is not going to be bigger than 8, the type can be defined as bit(3)
. Bitcount integers perform slower than the full size ones, but they require less RAM. They are saved in 32-bit chunks, so in order to save space they should be grouped at the end of attributes definitions (otherwise a bitcount integer between 2 full-size integers will occupy 32 bits as well).
- SQL
- HTTP
- PHP
- config
create table products(title text, flags bit(3), tags bit(2) );
Big integers are 64-bit wide signed integers.
- SQL
- HTTP
- PHP
- config
create table products(title text, price bigint );
Timestamp type represents unix timestamps which is stored as a 32-bit integer. The difference is that time and date functions are available for the timestamp type.
- SQL
- HTTP
- PHP
- config
create table products(title text, date timestamp);
Real numbers are stored as 32-bit IEEE 754 single precision floats.
- SQL
- HTTP
- PHP
- config
create table products(title text, coeff float);
Unlike integer types, equal comparison of floats is forbidden due to rounding errors. A near equal can be used instead, by checking the absolute error margin.
- SQL
- HTTP
- PHP
select abs(a-b)<=0.00001 from products
Another alternative, which can also be used to perform IN(attr,val1,val2,val3)
is to compare floats as integers by choosing a multiplier factor and convert the floats to integers in operations. Example illustrates modifying IN(attr,2.0,2.5,3.5)
to work with integer values.
- SQL
- HTTP
- PHP
select in(ceil(attr*100),200,250,350) from products
This data type allows storing JSON objects for schema-less data.
- SQL
- HTTP
- PHP
- config
create table products(title text, data json);
JSON properties can be used in most operations. There are also special functions such as ALL(), ANY(), GREATEST(), LEAST() and INDEXOF() that allow traversal of property arrays.
- SQL
- HTTP
- PHP
select indexof(x>2 for x in data.intarray) from products
Text properties are treated same as strings so it's not possible to use them in full-text matches expressions, but string functions like REGEX() can be used.
- SQL
- HTTP
- PHP
select regex(data.name, 'est') as c from products where c>0
In case of JSON properties, enforcing data type is required to be casted in some situations for proper functionality. For example in case of float values DOUBLE() must be used for proper sorting.
- SQL
- HTTP
- PHP
select * from products order by double(data.myfloat) desc
Multi-value attributes allow storing variable-length lists of 32-bit unsigned integers. It can be used to store one-to-many numeric values like tags, product categories, properties.
- SQL
- HTTP
- PHP
- config
create table products(title text, product_codes multi);
It supports filtering and aggregation, but not sorting. Filtering can made of condition that requires at least one element to pass (using ANY()) or all (ALL()).
- SQL
- HTTP
- PHP
select * from products where any(product_codes)=3
Information like least or greatest element and length of the list can be extracted. An example shows ordering by the least element of a multi-value attribute.
- SQL
- HTTP
- PHP
select least(product_codes) l from products order by l asc
When grouping by multi-value attribute, a document will contribute to as many groups as there are different values associated with that document. For instance, if the collection contains exactly 1 document having a 'product_codes' multi-value attribute with values 5, 7, and 11, grouping on 'product_codes' will produce 3 groups with COUNT(*)
equal to 1 and GROUPBY()
key values of 5, 7, and 11 respectively. Also note that grouping by multi-value attributes might lead to duplicate documents in the result set because each document can participate in many groups.
- SQL
insert into products values ( 1, 'doc one', (5,7,11) );
select id, count(*), groupby() from products group by product_codes;
Query OK, 1 row affected (0.00 sec)
+------+----------+-----------+
| id | count(*) | groupby() |
+------+----------+-----------+
| 1 | 1 | 11 |
| 1 | 1 | 7 |
| 1 | 1 | 5 |
+------+----------+-----------+
3 rows in set (0.00 sec)
The order of the numbers inserted as values of multi-valued attributes is not preserved. Values are stored internally as a sorted set.
- SQL
- HTTP
- PHP
insert into product values (1,'first',(4,2,1,3));
select * from products;
Query OK, 1 row affected (0.00 sec)
+------+---------------+-------+
| id | product_codes | title |
+------+---------------+-------+
| 1 | 1,2,3,4 | first |
+------+---------------+-------+
1 row in set (0.01 sec)
A data type type that allows storing variable-length lists of 64-bit signed integers. It has the same functionality as multi-value integer.
- SQL
- HTTP
- PHP
- config
create table products(title text, values multi64);