Compiling from sources enables custom build configurations, such as disabling some features or adding new patches for testing, and is also the way to go if you want to contribute. For example, you can compile from sources with the embedded ICU disabled if you want to replace it with another one installed in your system, which can then be upgraded independently of Manticore.
In our CI/CD pipeline Manticore Search is compiled using these docker images, so instead of reading everything below you may prefer to study them and make the modifications that matter to you.
- C++ compiler
- in Linux - GNU gcc (4.7.2 and above) or clang can be used
- in Windows - Microsoft Visual Studio 2015 and above (community edition is enough)
- on Mac OS - XCode
- Cmake - used on all the platforms (version 3.13 or above required)
Manticore consists of different tools. The main one is the Manticore search server, searchd. Features available in the server depend on different third-party libraries:
- SSL (for HTTPS implementation)
- Galera (for replication)
- Stemmer (for language stemming)
- ICU (for support of segmentation for CJK languages)
- RE2 (for regular expressions)
Some features, such as AOT lemmatization and basic stemming, are embedded and don't need any libraries.
Another big tool is indexer, which creates plain indexes from different sources. The sources may reside in different external storages. The list of libraries which indexer can use (and depend on) includes:
- odbc and others.
None of them are mandatory by default, but they are obviously necessary if you want to index a source provided by a particular storage.
Internally Manticore consists of a big library, 'libsphinx', with which the different tools are linked. Because some dependencies are mandatory for that library, they also apply to all the tools, whether or not a given tool actually uses them. The common dependencies are:
- By default, only indexing of MySQL sources is expected, so the configuration script searches for the MySQL dev client library and fails if nothing is found. To be able to index MySQL, you need at least the dev version of the MySQL library. Usually it is provided in a package named mysql-devel, depending on the Linux flavour you use. Derivatives, such as the dev package from MariaDB (mariadb-devel), might also be OK. If you have MySQL or a derivative installed in a custom path, set the env variable MYSQL_DIR to that path for configuration. Configuration will look for the available program mysql_config and use the data provided by it, or will look for header mysqlclient if no mysql_config program is found. If you're not going to use MySQL sources, you can explicitly set -DWITH_MYSQL=NO as a config parameter.
- git, flex, bison - needed if the sources are cloned from the git repository. If you use the official tarball with sources, they are not necessary (git is used to pick the commit hash). Flex and bison are used to build parsers. In tarball sources the version is hardcoded and the parsers are already pre-built into C sources, so these tools are not necessary.
- RE2 - used for the regexp_filter feature. To use the feature, Manticore must be configured with -DWITH_RE2=1, otherwise it will not be available. If configured, a system-wide RE2 is searched for, and if nothing is found, configuration downloads the RE2 sources and builds the feature as embedded.
- stemmer - used for additional language stemmers. It can be enabled with -DWITH_STEMMER=1. If enabled, it is searched for in the system and linked. If it is not found, configuration downloads the snowball sources and builds the feature as embedded.
- ICU - for CJK languages. It replaces the previous RLP platform, which was also used for that purpose. By default ICU is configured as embedded and is built from sources. ICU, RE2 and stemmer may be either searched for and used as shared libraries provided by your system, or explicitly built from sources and statically linked. The first option makes the binaries smaller and easier to upgrade (simply upgrade the library in the system to get all the benefits and fixes of the upgrade). By default the RE2 and stemmer libraries are expected to be used from the system, and ICU is configured to be built as static from sources. You can manually tune that behaviour by providing boolean options
- Indexing of PostgreSQL is supported by libpq-dev packages. Absence of these packages is not fatal; your tool just will not be able to index a PostgreSQL source then. The feature can be switched on using
- Indexing of xmlpipe is supported by the expat library, provided by the libexpat-dev package. The feature is automatically switched on or off depending on the availability of the expat library. It can also be manually tuned with help of option
- Indexing of a generic ODBC source is supported by either unixodbc libraries, which are provided by unixodbc-dev, or alternatively iodbc. The feature is automatically switched on or off depending on the availability of a client library. Use -DWITH_ODBC= to tune it manually.
When used with a source, indexer will try to dynamically load the necessary runtime library, and if none is available it will report an error. So it is reasonable to provide all dev packages for building, and then at runtime provide only the client libraries that are actually necessary.
For the server these dependencies may be in play:
- If you're going to work over HTTPS, you need the dev version of the SSL library. Usually it comes in a package named like openssl-devel (redhat-based). That is mandatory for HTTPS support, but not strictly mandatory for the server (i.e. no SSL means no possibility to connect by HTTPS, but other protocols will work). SSL library versions from 1.0.2 to 1.1.1 may be used by Manticore; however, for the sake of security it's highly recommended to use the freshest possible SSL library. For now only v1.1.1 is supported, the rest are outdated (see openssl release strategy for details).
- If you want replication functionality, the Galera library has to be built. It will be downloaded and included in the build; however, it has a requirement of its own: libboost-all-dev on debian-based systems, or boost-devel on redhat, should be enough. It also requires SSL (regardless of the requirements of the server).
Manticore sources are hosted on GitHub. Clone the repo, then check out the desired branch or tag. Our public git workflow contains only the main master branch, which represents the bleeding edge of development. On release we create a versioned tag, like 3.6.0, and start a new branch for the current release, in this case manticore-3.6.0. The head of the versioned branch after all changes is used as the source to build all binary releases. For example, to take sources of version 3.6.0 you can run:
git clone https://github.com/manticoresoftware/manticoresearch.git
cd manticoresearch
git checkout manticore-3.6.0
When using sources from GitHub you'll need bison tools, since all internal parsers are provided as lex/yacc sources.
Tarballs are available here. Look for "Source tar.gz". The 'Source code' archives provided by GitHub are not what you want, so avoid using them (mainly because they lack the git version which we use to make the version string). The tarball sources have pre-built lexers and parsers, so the flex and bison tools are not required for a build.
wget -c https://repo.manticoresearch.com/repository/manticoresearch_source/release/manticore-3.6.0-210504-96d61d8-release-source.tar.gz
tar -zxf manticore-3.6.0-*.tar.gz
cd manticore-3.6.0-*
Manticore uses cmake for pre-compile configuration. To use it, make a build directory somewhere, go to it, then invoke cmake, pointing it to the source dir. The simplest way is to create the build directory inside the unpacked sources.
cd manticore-3.6.0
mkdir build && cd build
cmake -DWITH_MYSQL=1 -DWITH_RE2=1 ..
The cmake script will investigate available features and configure the build according to them.
- If you want to configure without MySQL, explicitly provide -DWITH_MYSQL=0, otherwise configuring will fail if MySQL is absent.
- If you eventually want to build a full-featured RPM package, the path to the build directory must be long enough for debug symbols to be built correctly, for example /manticore012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789. That is because RPM tools modify the path inside compiled binaries when building debug info; they can only write over the existing room and won't allocate more. The long path above has 100 chars, which is quite enough for such a case.
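A path of that length is tedious to type by hand. The following is a minimal sketch of generating one in the shell; the helper name `long_build_dir` and the base directory `/tmp/manticore` are assumptions made for illustration, and 100 is the length mentioned above.

```shell
# Hypothetical helper, not part of the Manticore sources: pad a base path
# with digits until it is at least min_len characters long.
long_build_dir() {
    path=$1
    min_len=${2:-100}
    digit=0
    while [ "${#path}" -lt "$min_len" ]; do
        path="${path}${digit}"
        digit=$(( (digit + 1) % 10 ))
    done
    printf '%s\n' "$path"
}

# Assumed base directory; adjust it to your environment.
BUILD_DIR=$(long_build_dir /tmp/manticore)
echo "$BUILD_DIR"
# Next steps would be: mkdir -p "$BUILD_DIR", cd into it and run cmake,
# pointing it to the source dir.
```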
Options may be either provided in the form -DNAME=VALUE or set interactively via any available GUI or TUI interface. On console systems cmake usually comes together with the ccmake tool, which provides friendly access to the configuration.
CMAKE_BUILD_TYPE (string) - can be Debug, Release, MinSizeRel and RelWithDebInfo (default). Usually we use just 'Debug' and 'RelWithDebInfo'. The first produces slower binaries, well suited for testing and debugging features. The last is for production.
CMAKE_INSTALL_PREFIX (path) - where to install the project. Building itself installs nothing, but prepares installation rules which are executed once you run the 'make install' command or create a package. It has a default depending on the cmake (on Linux usually /usr/local).
DISABLE_TESTING (bool) - whether to disable testing support. If testing is left enabled, after the build you can run 'ctest' and test the build. Note that testing implies additional dependencies, such as at least the PHP cli, python and an available mysql server with a test database. For 'just build', set DISABLE_TESTING=1 and you don't need to care about them.
USE_FLEX (bool) - enabled by default, specifies whether to enable the bison and flex tools. You can disable them when building from a tarball, as the necessary files are already pre-generated and packed inside.
LIBS_BUNDLE - path to a folder with different libraries. This is mostly relevant for building on Windows, but may also be helpful if you build quite often, in order to avoid downloading third-party sources each time.
DISTR_BUILD - in case the target is packaging, it specifies the target operating system. Supported values are:
DISTR_BUILD will set CMAKE_BUILD_TYPE to RelWithDebInfo, and WITH_MYSQL, WITH_EXPAT, WITH_PGSQL, WITH_RE2, WITH_STEMMER, and DISABLE_TESTING to 1. This option is intended for building packages. For example, running cmake -DDISTR_BUILD=rhel8, then make package will build an RPM package for Red Hat Enterprise 8.
One line to get all dependencies on Debian/Ubuntu:
apt-get install build-essential cmake unixodbc-dev libpq-dev libexpat-dev libmysqlclient-dev libicu-dev libssl-dev libboost-system-dev libboost-program-options-dev git flex bison
Note: on Debian 9 (Stretch) the package libmysqlclient-dev is absent. Use default-libmysqlclient-dev there instead.
One line to get all dependencies on Redhat/Centos:
yum install gcc gcc-c++ make cmake mysql-devel expat-devel postgresql-devel unixODBC-devel libicu-devel openssl-devel boost-devel rpm-build systemd-units git flex bison
RHEL/CentOS 6 ships with an old version of the GCC compiler, which doesn't support the -std=c++11 flag. For compiling, use the devtoolset-2 packages:
wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
yum upgrade -y
yum install -y devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
export PATH=/opt/rh/devtoolset-2/root/usr/bin:$PATH
To compile run:
To install run:
make -j4 install
For building a package use the target package. It will build a package according to the selection provided by the -DDISTR_BUILD option. By default it will be a simple zip archive with binaries and supplementary files.
make -j4 package
For building a source tarball for future use, use the target tarball. All lexers and stemmers will be created running flex/bison. Then the current version will be embedded into a header file, stemmer/re2 will be embedded into the sources (if configured), and the final folder will be packed into the tarball for distribution.
For preparing official packages we use docker containers. They include all the necessary environment components and are proven to work by our own builds. You can recreate any of them using the Dockerfiles and the README.md instruction provided in the dist/build_dockers/ folder of the sources. That is the easiest way to make the binaries for any supported Linux distribution, and also to make packages there. Each docker provides the DISTR environment variable, which can be passed directly to the DISTR_BUILD config value (as the -DDISTR_BUILD=$DISTR clause to cmake).
For example, to create a RedHat 7 package 'as official', but without embedded ICU with its big datafile, you may execute the following (assuming the sources are placed in the /manticore/sources folder of the host):
docker run -it --rm -v /manticore/sources:/manticore registry.gitlab.com/manticoresearch/dev/bionic_cmake314 bash
# following is inside docker shell. By default, workdir will be in the source folder, mounted as volume from the host.
RELEASE_TAG="noicu"
mkdir build && cd build
cmake -DBUILD_TAG=$RELEASE_TAG -DDISTR_BUILD=$DISTR -DWITH_ICU_FORCE_STATIC=0 ..
make -j4 package
For building on Windows you need:
- Visual Studio
- Cmake for Windows
- prebuilt Expat, MySQL and PostgreSQL in bundle directory.
If you build from a git clone, you also need to provide bison tools. They may be found in the cygwin framework. When building from a tarball these tools are not necessary. Building can be performed from cmd or from the cygwin console.
For a simple building on x64:
mkdir build
cd build
cmake -G "Visual Studio 16 2019" -DLIBS_BUNDLE="C:\bundle" "C:\manticore"
cmake -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 .
cmake --build . --target package --config RelWithDebInfo
Support for FreeBSD is limited and successful compiling is not guaranteed. We recommend checking the issue tracker for unresolved issues on this platform before trying to compile latest versions.
FreeBSD uses clang instead of gcc as system compiler and it is installed by default.
First install required packages:
pkg install cmake bison flex
To compile a version without optional dependencies:
cmake -DUSE_GALERA=0 -DWITH_MYSQL=0 -DDISABLE_TESTING=1 ../manticoresearch/
make
Except for Galera, the rest of the optional dependencies can be installed with:
pkg install mariadb103-client postgresql-libpqxx unixODBC icu expat
(you can replace mariadb103-client with the MySQL client package of your choice)
Building with all optional features and installation system-wide:
cmake -DUSE_GALERA=0 -DWITH_PGSQL=1 -DDISABLE_TESTING=1 -DCMAKE_INSTALL_PREFIX=/ -DCMAKE_INSTALL_LOCALSTATEDIR=/var ../manticoresearch/
make
make install
If you didn't change the path for sources and build, just move to your build folder and run:
cmake .
make clean
make
If for any reason it doesn't work, you can delete the file CMakeCache.txt located in the build folder. After this step you have to run cmake again, pointing to the source folder and configuring the options.
If it also doesn't help, just wipe out your build folder and begin from scratch.
Manticore Search 2.x maintains compatibility with Sphinxsearch 2.x and can load existing indexes created by Sphinxsearch. In most cases, upgrading is just a matter of replacing the binaries.
Instead of sphinx.conf (in Linux normally located at /etc/sphinxsearch/sphinx.conf) Manticore by default uses /etc/manticoresearch/manticore.conf. It also runs under a different user and uses different folders.
Systemd service name has changed from
manticore and the service runs under user
manticore (Sphinx was using
sphinxsearch). It also uses a different folder for the PID file.
The folders used by default are
/var/run/manticore. You can still use the existing Sphinx config, but you need to manually change permissions for
/var/log/sphinxsearch folders. Or, just rename globally 'sphinx' to 'manticore' in system files. If you use other folders (for data, wordforms files etc.) the ownership must be also switched to user
pid_file location should be changed to match the manticore.service to
If you want to use the Manticore folder instead, the index files need to be moved to the new data folder (
/var/lib/manticore) and the permissions to be changed to user
Upgrading from Sphinx / Manticore 2.x to 3.x is not straightforward, because the index storage engine received a massive upgrade and the new searchd can't load older indexes and upgrade them to new format on-the-fly.
Manticore Search 3 got a redesigned index storage. Indexes created with Manticore/Sphinx 2.x cannot be loaded by Manticore Search 3 without a conversion. Because of the 4GB limitation, a real-time index in 2.x could still have several disk chunks after an optimize operation. After upgrading to 3.x, these indexes can now be optimized to 1 disk chunk with the usual OPTIMIZE command.
Index files also changed. The only component that didn't get any structural changes is the .spp file (hitlists). .sps (strings/json) and .spm (MVA) are now held by .spb (var-length attributes). The new format has an .spm file present, but it's used for the row map (previously it was dedicated to MVA attributes). The new extensions added are .spt (docid lookup), .sphi (secondary index histograms) and .spds (document storage). In case you are using scripts that manipulate index files, they should be adapted for the new file extensions.
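For scripts that walk an index folder, the mapping above can be captured in one place. The sketch below is an illustration only: `describe_index_file` is a hypothetical helper, and it covers just the extensions named in this section, so any other index component falls through to the last branch.

```shell
# Hypothetical helper: describe an index file by its extension, using only
# the extensions named in the text above.
describe_index_file() {
    case $1 in
        *.spp)  echo "hitlists (unchanged since 2.x)" ;;
        *.spb)  echo "var-length attributes: strings, JSON, MVA (3.x)" ;;
        *.spm)  echo "row map in 3.x, MVA storage in 2.x" ;;
        *.sps)  echo "strings/JSON in 2.x, held by .spb in 3.x" ;;
        *.spt)  echo "docid lookup (new in 3.x)" ;;
        *.sphi) echo "secondary index histograms (new in 3.x)" ;;
        *.spds) echo "document storage (new in 3.x)" ;;
        *)      echo "not covered by this sketch" ;;
    esac
}

describe_index_file products.spt
describe_index_file products.sps
```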
The upgrade procedure may differ depending on your setup (number of servers in the cluster, whether you have HA or not etc.), but in general it's about creating new 3.x index versions and replacing your existing ones with them along with replacing older 2.x binaries with the new ones.
There are two special requirements to take care of:
- Real-time indexes need to be flushed using FLUSH RAMCHUNK
- Plain indexes with kill-lists require adding a new directive in index configuration (see killlist_target)
Manticore Search 3 includes a new tool - index_converter - that can convert Sphinx 2.x / Manticore 2.x indexes to 3.x format.
index_converter comes in a separate package which should be installed first. Using the convert tool create 3.x versions of your indexes.
index_converter can write the new files in the existing data folder and backup the old files or it can write the new files to a chosen folder.
If you have a single server:
- install manticore-converter package
- use index_converter to create new versions of the indexes in a different folder than the existing data folder (using the --output-dir option)
- stop existing Manticore/Sphinx, upgrade to 3.0, move the new indexes to data folder, start Manticore
To get a minimal downtime, you can copy 2.x indexes, config (you'll need to edit paths here for indexes, logs and different ports) and binaries to a separate location and start this on a separate port and point your application to it. After upgrade is made to 3.0 and the new server is started, you can point back the application to the normal ports. If all is good, stop the 2.x copy and delete the files to free the space.
If you have a spare box (like a testing or staging server), you can do the index upgrade there first and even install Manticore 3 to perform several tests; if everything is OK, copy the new index files to the production server. If you have multiple servers which can be pulled out of production, do it one by one and perform the upgrade on each. For distributed setups, a 2.x searchd can work as a master with 3.x nodes, so you can upgrade the data nodes first and the master node at the end.
There have been no changes made on how clients should connect to the engine or any change in querying mode or queries behavior.
Kill-lists have been redesigned in Manticore Search 3. In previous versions kill-lists were applied at query time to the result set provided by each previously searched index.
Thus in 2.x the index order at query time mattered. For example, if a delta index had a kill-list, then in order to apply it against the main index the order had to be main, delta (either in a distributed index or in the FROM clause).
In Manticore 3 kill-lists are applied to an index when it's loaded during searchd startup or gets rotated. The new directive killlist_target in the index configuration specifies target indexes and defines which doc ids from the source index should be used for suppression. These can be ids from the defined kill-list, actual doc ids of the index, or both.
Documents from the kill-lists are deleted from the target indexes, they are not returned in results even if the search doesn't include the index that provided the kill-lists. Because of that the order of indexes for searching does not matter any more. Now delta,main and main,delta will provide the same results.
In previous versions indexes were rotated following the order from the configuration file. In Manticore 3 index rotation order is much smarter and works in accordance with killlist targets. Before starting to rotate indexes the server looks for chains of indexes by killlist_target definitions. It will then first rotate indexes not referenced anywhere as kill-list targets. Next it will rotate indexes targeted by already rotated indexes, and so on. For example, if we run indexer --all and have 3 indexes: main, delta_big (which targets main) and delta_small (which targets delta_big), delta_small is rotated first, then delta_big and finally main. This ensures that when a dependent index is rotated it gets the most recent kill-list from the other indexes.
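The order described here is a topological sort of the killlist_target chain. As an illustration only (it models the ordering, not how searchd implements it), the standard `tsort` utility reproduces it for the three indexes of the example. The function name `rotation_order` is an assumption; each input pair reads "index providing the kill-list, index it targets".

```shell
# Each pair reads "<source index> <target index>": the source is rotated
# before the index its kill-list is applied to.
rotation_order() {
    printf '%s\n' \
        'delta_small delta_big' \
        'delta_big main' | tsort
}

# Prints delta_small, delta_big and main, one per line.
rotation_order
```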
docinfo - everything is now extern
inplace_docinfo_gap - not needed anymore
mva_updates_pool - MVAs no longer have a dedicated pool for updates, as now they can be updated directly in the blob (see below).
String, JSON and MVA attributes can be updated in Manticore 3.x using
In 2.x string attributes required REPLACE, for JSON it was only possible to update scalar properties (as they were fixed-width), and MVAs could be updated using the MVA pool. Now updates are performed directly on the blob component. One setting that may require tuning is attr_update_reserve, which allows changing the extra space allocated at the end of the blob to avoid frequent resizes in case the new values are bigger than the existing values in the blob.
Doc ids used to be UNSIGNED 64-bit integers. Now they are POSITIVE SIGNED 64-bit integers.
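Before converting, it can be useful to check whether existing ids still fit the new range. The sketch below is a hypothetical helper, not a Manticore tool: `docid_fits_signed64` succeeds when a decimal id is at most 2^63 - 1 (9223372036854775807). It checks only the upper bound, and it compares strings because shell arithmetic is itself limited to signed 64-bit values.

```shell
# Hypothetical helper: succeed if a decimal document id is at most
# 2^63 - 1, the largest signed 64-bit integer.
docid_fits_signed64() {
    id=$1
    max=9223372036854775807
    case $id in
        ''|*[!0-9]*) return 2 ;;   # not a plain decimal number
    esac
    # Drop leading zeros so that comparing lengths is meaningful.
    while [ "${#id}" -gt 1 ] && [ "${id#0}" != "$id" ]; do
        id=${id#0}
    done
    [ "${#id}" -lt "${#max}" ] && return 0
    [ "${#id}" -gt "${#max}" ] && return 1
    # Same number of digits: byte-wise ordering equals numeric ordering.
    largest=$(printf '%s\n%s\n' "$id" "$max" | LC_ALL=C sort | tail -n 1)
    [ "$largest" = "$max" ]
}

docid_fits_signed64 1513686608316989452 && echo "fits"
docid_fits_signed64 18446744073709551615 || echo "too large for a signed 64-bit id"
```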
Read here about the RT mode
Manticore 3.x recognizes and parses special suffixes which make it easier to use numeric values with special meaning. The common form for them is integer number + literal, like 10k or 100d, but not 40.3s (since 40.3 is not an integer) and not 2d 4h (since there are two values, not one). Literals are case-insensitive, so 10W is the same as 10w. There are 2 types of such suffixes currently supported:
- Size suffixes - can be used in parameters that define the size of something (memory buffer, disk file, limit of RAM, etc.) in bytes. "Naked" numbers in those places literally mean size in bytes (octets). Size values take suffix k for kilobytes (1k=1024), m for megabytes (1m=1024k), g for gigabytes (1g=1024m) and t for terabytes (1t=1024g).
- Time suffixes - can be used in parameters defining time interval values like delays, timeouts, etc. "Naked" values for those parameters usually have a documented scale, and you must know whether a number, say 100, means '100 seconds' or '100 milliseconds'. However, instead of guessing you can just write a suffixed value, and it will be fully determined by its suffix. Time values take suffix
us for useconds (microseconds),
d for days and
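To make the size-suffix rules concrete, here is a small sketch that expands such a value into bytes. `size_to_bytes` is a hypothetical helper written for this illustration, not something shipped with Manticore; it covers only the size suffixes k, m, g and t with the multipliers listed above, and it rejects anything that is not a single integer with at most one suffix letter.

```shell
# Hypothetical helper, not part of Manticore: expand a size value such as
# "10k" or "2G" into bytes, following the rules described above.
size_to_bytes() {
    value=$1
    number=${value%[kKmMgGtT]}   # the value without one trailing suffix letter
    suffix=${value#"$number"}    # the suffix letter, or empty
    case $number in
        ''|*[!0-9]*)
            # Rejects inputs such as "40.3k", "2k 4m" or a bare "k".
            echo "invalid size value: $value" >&2
            return 1 ;;
    esac
    # Drop leading zeros so the shell does not read the number as octal.
    while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
        number=${number#0}
    done
    case $suffix in
        '')   multiplier=1 ;;
        [kK]) multiplier=1024 ;;
        [mM]) multiplier=$((1024 * 1024)) ;;
        [gG]) multiplier=$((1024 * 1024 * 1024)) ;;
        [tT]) multiplier=$((1024 * 1024 * 1024 * 1024)) ;;
    esac
    echo $((number * multiplier))
}

size_to_bytes 10k   # 10240
size_to_bytes 1G    # 1073741824
```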
index_converter is a tool for converting indexes created with Sphinx/Manticore Search 2.x to Manticore Search 3.x index format. The tool can be used in several different ways:
$ index_converter --config /home/myuser/manticore.conf --index indexname
$ index_converter --config /home/myuser/manticore.conf --all
$ index_converter --path /var/lib/manticoresearch/data --all
New version of the index is written by default in the same folder. Previous version files are saved with .old extension in their name. An exception is .spp (hitlists) file which is the only index component that didn’t have any change in the new format.
You can save the new index version to a different folder using the --output-dir option:
$ index_converter --config /home/myuser/manticore.conf --all --output-dir /new/path
A special case is indexes containing kill-lists. As the way kill-lists work has changed (see killlist_target), the delta index should know which are the target indexes for applying the kill-lists. There are 3 ways to have a converted index ready for setting targeted indexes for applying kill-lists:
- Use --killlist-target when converting an index
$ index_converter --config /home/myuser/manticore.conf --index deltaindex --killlist-target mainindex:kl
- Add killlist_target in the configuration before doing the conversion
- Use the ALTER ... KILLLIST_TARGET command after conversion
Here's the complete list of index_converter options:
--config <file> (-c <file> for short) tells index_converter to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. /usr/local/manticore/etc/manticore.conf if installed into /usr/local/sphinx), followed by the current directory you are in when calling index_converter from the shell.
--index specifies which index should be converted
--path - instead of using a config file, a path containing index(es) can be used
--strip-path - strips the path from filenames referenced by the index: stopwords, exceptions and wordforms
--large-docid - allows converting documents with ids larger than 2^63 and displays a warning; otherwise the tool just exits with an error on the large id. This option was added because in Manticore 3.x doc ids are signed bigint, while previously they were unsigned
--output-dir <dir> - writes the new files to a chosen folder rather than the location of the existing index files. When this option is set, existing index files remain untouched at their location.
--all - converts all indexes from the config
--killlist-target <targets> - sets the target indexes to which kill-lists will be applied. This option should be used only in conjunction with
You can install and start Manticore easily in Ubuntu, Centos, Debian, Windows and MacOS or use Manticore as a docker container.
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt-key adv --fetch-keys 'http://repo.manticoresearch.com/GPG-KEY-manticore'
sudo apt update
sudo apt install manticore manticore-columnar-lib
sudo systemctl start manticore
By default Manticore is waiting for your connections on:
- port 9306 for MySQL clients
- port 9308 for HTTP/HTTPS connections
- port 9312 for connections from other Manticore nodes and clients based on Manticore binary API
mysql -h0 -P9306
Let's now create an index called "products" with 2 fields:
- title - full-text field which will contain our product's title
- price - of type "float"
create table products(title text, price float) morphology='stem_en';
Query OK, 0 rows affected (0.02 sec)
insert into products(title,price) values ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
Let's find one of the documents. The query we will use is 'remove hair'. As you can see, it finds the document with the title 'Pet Hair Remover Glove' and highlights 'Hair Remover' in it even though the query has "remove", not "remover". This is because when we created the index we turned on English stemming (morphology='stem_en').
select id, highlight(), price from products where match('remove hair');
+---------------------+-------------------------------+----------+
| id                  | highlight()                   | price    |
+---------------------+-------------------------------+----------+
| 1513686608316989452 | Pet <strong>Hair Remover</strong> Glove | 7.990000 |
+---------------------+-------------------------------+----------+
1 row in set (0.00 sec)
Let's assume we now want to update the document - change the price to 18.5. This can be done by filtering by any field, but normally you know the document id and update something based on that.
update products set price=18.5 where id = 1513686608316989452;
Query OK, 1 row affected (0.00 sec)