Compiling from sources lets you create custom build configurations: disable some features, add new ones, or test patches if you want to contribute. For example, you can build with the embedded ICU disabled if you prefer to use the ICU installed on your system, which can then be upgraded independently of Manticore.
In our CI/CD pipeline Manticore Search is compiled using these docker images, so instead of reading everything below you may prefer to study them and make the modifications that matter to you.
- C++ compiler
  - on Linux: GNU (4.7.2 and above) or Clang can be used
  - on Windows: Microsoft Visual Studio 2019 and above (the Community edition is enough)
  - on Mac OS: Clang (from the Xcode command line tools; run xcode-select --install to install them)
- Bison, Flex: available as packages on most systems; on Windows they are available in the cygwin framework.
- Cmake: used on all platforms (version 3.19 or above is required)
Manticore sources are hosted on GitHub. Clone the repo, then check out the desired branch or tag. Our public git workflow contains only the main master branch, which represents the bleeding edge of development. On release we create a versioned tag, like 3.6.0, and start a new branch for the current release, in this case manticore-3.6.0. The head of the versioned branch, after all changes, is the source from which all binary releases are built. For example, to get the sources of version 3.6.0 you can run:
git clone https://github.com/manticoresoftware/manticoresearch.git
cd manticoresearch
git checkout manticore-3.6.0
You can also download the desired code from GitHub using the 'download zip' button. Both .zip and .tar.gz are suitable.
wget -c https://github.com/manticoresoftware/manticoresearch/archive/refs/tags/3.6.0.tar.gz
tar -zxf 3.6.0.tar.gz
cd manticoresearch-3.6.0
Manticore uses cmake. The commands below assume you are inside the source directory.
mkdir build && cd build
cmake ..
The cmake script probes for available features and configures the build accordingly. By default all features are considered enabled if they are available. The script also downloads and builds some external libraries, assuming you want to use them. Implicitly you get support for the maximal number of features.
You can also control the configuration explicitly with flags and options. To demand a feature FOO, add -DFOO=1 to the cmake call; to disable it, add -DFOO=0 in the same way. Unless explicitly noted otherwise, enabling a feature that is not available (say, one that is not supported on an MS Windows build) causes configuration to fail with an error. Disabling a feature, apart from excluding it from the build, also disables probing for it on the system and disables downloading/building of the related external libraries, which would otherwise happen with implicit configuration.
USE_SYSLOG - allows the use of syslog in query logging.
WITH_GALERA - support for replication in the search daemon. Support will be configured for the build; in addition, the sources of the Galera library will be downloaded and built, and the resulting module will be included in the distribution/installation. It is usually safe to build with Galera but not distribute the library itself (no Galera module means no replication). Sometimes, however, you may need to disable it explicitly, for example if you want to build a static binary which by design can't load any libraries, so that even the presence of a call to the 'dlopen' function inside the daemon causes a link error.
WITH_RE2_FORCE_STATIC - download the RE2 sources, compile them and link with them statically, so that the final binaries do not depend on the presence of a shared RE2 library on your system.
WITH_STEMMER - build with the Snowball stemming library.
WITH_STEMMER_FORCE_STATIC - download the Snowball sources, compile them and link with them statically, so that the final binaries do not depend on the presence of a shared libstemmer library on your system.
WITH_ICU - build with ICU, the International Components for Unicode library. It is used for text segmentation when tokenizing Chinese, and comes into play when an ICU-based morphology is in use.
WITH_ICU_FORCE_STATIC - download the ICU sources, compile them and link with them statically, so that the final binaries do not depend on the presence of a shared icu library on your system. The ICU data file is also included in the installation/distribution. The purpose of a statically linked ICU is to have a library of known version, so that behaviour is deterministic and does not depend on any system libraries. You would most probably prefer to use the system ICU instead, because it can be updated over time without recompiling the manticore daemon. In that case you need to explicitly disable this option. That also saves the space occupied by the ICU data file (about 30M), as it will NOT be included in the distribution.
WITH_SSL - used to support https and encrypted mysql connections to the daemon. The system OpenSSL library will be linked to the daemon, which implies that OpenSSL is required to start the daemon. It is mandatory for https support, but not strictly mandatory for the server (i.e. no SSL means no possibility to connect by https, but other protocols will work). SSL library versions from 1.0.2 to 1.1.1 may be used by Manticore; however, for the sake of security it is highly recommended to use the freshest possible SSL library. For now only v1.1.1 is supported, the rest are outdated (see the openssl release strategy).
WITH_ZLIB - used by indexer to work with compressed columns from mysql. Used by the daemon to support the compressed mysql protocol.
WITH_ODBC - used by indexer to support indexing sources from ODBC providers (typically UnixODBC and iODBC). On MS Windows ODBC is the proper way to work with MS SQL sources, so indexing of MSSQL also implies this flag.
DL_ODBC - don't link with the ODBC library. If ODBC is linked but not available, you can't start the indexer tool even if you want to index something not related to ODBC. This option asks indexer to load the library at runtime, only when you want to deal with an ODBC source.
ODBC_LIB - name of the ODBC library file. Indexer will try to load that file when you want to index an ODBC source. This option is set automatically from probing for an available ODBC shared library. You can also override that name at runtime by providing the environment variable ODBC_LIB with the proper path to an alternative library before running indexer.
WITH_EXPAT - used by indexer to support indexing xmlpipe sources.
DL_EXPAT - don't link with the EXPAT library. If EXPAT is linked but not available, you can't start the indexer tool even if you want to index something not related to xmlpipe. This option asks indexer to load the library at runtime, only when you want to deal with an xmlpipe source.
EXPAT_LIB - name of the EXPAT library file. Indexer will try to load that file when you want to index an xmlpipe source. This option is set automatically from probing for an available EXPAT shared library. You can also override that name at runtime by providing the environment variable EXPAT_LIB with the proper path to an alternative library before running indexer.
WITH_ICONV - supports different encodings when indexing xmlpipe sources with indexer.
DL_ICONV - don't link with the iconv library. If iconv is linked but not available, you can't start the indexer tool even if you want to index something not related to xmlpipe. This option asks indexer to load the library at runtime, only when you want to deal with an xmlpipe source.
ICONV_LIB - name of the iconv library file. Indexer will try to load that file when you want to index an xmlpipe source. This option is set automatically from probing for an available iconv shared library. You can also override that name at runtime by providing the environment variable ICONV_LIB with the proper path to an alternative library before running indexer.
WITH_MYSQL - used by indexer to support indexing mysql sources.
DL_MYSQL - don't link with the mysql library. If mysql is linked but not available, you can't start the indexer tool even if you want to index something not related to mysql. This option asks indexer to load the library at runtime, only when you want to deal with a mysql source.
MYSQL_LIB - name of the mysql library file. Indexer will try to load that file when you want to index a mysql source. This option is set automatically from probing for an available mysql shared library. You can also override that name at runtime by providing the environment variable MYSQL_LIB with the proper path to an alternative library before running indexer.
WITH_POSTGRESQL - used by indexer to support indexing postgresql sources.
DL_POSTGRESQL - don't link with the postgresql library. If postgresql is linked but not available, you can't start the indexer tool even if you want to index something not related to postgresql. This option asks indexer to load the library at runtime, only when you want to deal with a postgresql source.
POSTGRESQL_LIB - name of the postgresql library file. Indexer will try to load that file when you want to index a postgresql source. This option is set automatically from probing for an available postgresql shared library. You can also override that name at runtime by providing the environment variable POSTGRESQL_LIB with the proper path to an alternative library before running indexer.
LOCALDATADIR - default path where the daemon stores binlogs. If that path is not provided or is explicitly disabled in the daemon's runtime config (the manticore.conf file, which is in no way related to this build configuration), binlogs are placed in this path. It is assumed to be absolute; however, that is not strictly necessary, and you may experiment with relative values as well. Most probably you will not need to change the default value defined by configuration, which, depending on the target system, might be something like /var/lib/manticore/data.
FULL_SHARE_DIR - default path where all assets are stored. It may be overridden by the environment variable FULL_SHARE_DIR before starting any tool which uses files from that folder. This is quite an important path, as many things are expected there by default: predefined charset tables, stopwords, manticore modules and ICU data files are all placed in that folder. The configuration script usually determines that path to be something like /usr/share/manticore.
DISTR_BUILD - a shortcut for the options used when releasing packages. It is a string value with the name of the target platform, and may be used instead of configuring everything manually. On Debian and RedHat Linuxes the default value might be determined by light introspection and set to the generic 'debian' or 'rhel'. Otherwise the value is not defined.
PACK - an even shorter shortcut. It reads the DISTR environment variable, assigns it to the DISTR_BUILD param and then works as usual. That is very useful when building in prepared build systems, like docker containers, where the DISTR variable is set at system level and reflects the target system for which the container is intended.
CMAKE_INSTALL_PREFIX (path) - where manticore expects to be installed. Building installs nothing, but prepares installation rules which are executed once you run the cmake --install command, or create a package and then install it. The prefix may be freely changed at any time, even during install, by invoking cmake --install . --prefix /path/to/installation. However, at configure time this variable is used once to initialize the default value of FULL_SHARE_DIR. So, for example, setting it to /my/custom at configure time will hardcode that prefix into the default.
BUILD_TESTING (bool) - whether to support testing. If enabled, after the build you can run 'ctest' and test the build. Note that testing implies additional dependencies, such as at least the presence of the PHP cli, python and an available mysql server with a test database. By default this param is on, so for 'just build' you might want to disable the option by explicitly specifying the 'off' value.
LIBS_BUNDLE - path to a folder with different libraries. This is mostly relevant for Windows builds, but may also be helpful if you build quite often, in order to avoid downloading third-party sources each time. By default the configuring script never modifies that path; you should put everything there yourself. When, say, we want stemmer support, the sources are downloaded from the snowball homepage, then extracted, configured, built, etc. You may store the original source tarball (which is libstemmer_c.tgz) in that folder. The next time you build from scratch, the configure script looks in the bundle first, and if it finds stemmer there it will not download it again from the internet.
CACHEB - path to a folder with stored builds of 3rd-party libraries. Usually features like galera, re2, icu, etc. are first downloaded or taken from the bundle, then unpacked, built and installed into a temporary internal folder. When building manticore, that folder is then used as the place where the things required to support the requested feature live. Finally they either get linked with manticore, if they are libraries, or go directly to the distribution/installation (like galera or the icu data). When CACHEB is defined, either as a cmake config param or as a system environment variable, it is used as the target folder for those builds. This folder may be kept across builds, so that libraries stored there are not rebuilt, making the whole build process much shorter.
Note that some options are organized in triples: WITH_XXX, DL_XXX and XXX_LIB - like support of mysql, odbc, etc. WITH_XXX determines whether the next two have any effect. I.e., if you set WITH_ODBC to 0, there is no sense in providing DL_ODBC and ODBC_LIB, and these two will have no effect if the whole feature is disabled. Also, XXX_LIB makes no sense without DL_XXX, because if you don't want the DL_XXX option, dynamic loading will not be used, and the name provided by XXX_LIB is useless. That is used by default introspection.
Also, using the iconv library assumes expat and is useless if the latter is disabled.
Also, some libraries may be always available, so there is no sense in avoiding linkage with them. For example, on Windows that is ODBC. On Mac OS that is Expat, iconv and maybe others. Default introspection detects such libraries and effectively emits only WITH_XXX for them, without DL_XXX and XXX_LIB, which makes things simpler.
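The dependency between the three options can be sketched as a small shell helper. This is only an illustration: the function name triple_flags is hypothetical and not part of the Manticore build scripts. It prints just those cmake flags that have an effect for a given combination. The library file names are borrowed from the sample indexer -h output shown elsewhere on this page.

```shell
# Hypothetical helper (not part of Manticore): print only the flags that
# have an effect for one WITH_XXX / DL_XXX / XXX_LIB triple.
# Usage: triple_flags FEATURE WITH(0|1) DL(0|1) [LIBNAME]
triple_flags() {
    feature=$1 with=$2 dl=$3 lib=${4:-}
    if [ "$with" != 1 ]; then
        # Feature disabled: DL_XXX and XXX_LIB would be ignored anyway.
        echo "-DWITH_${feature}=0"
    elif [ "$dl" != 1 ]; then
        # Linked directly: XXX_LIB is useless without dynamic loading.
        echo "-DWITH_${feature}=1 -DDL_${feature}=0"
    elif [ -n "$lib" ]; then
        echo "-DWITH_${feature}=1 -DDL_${feature}=1 -D${feature}_LIB=${lib}"
    else
        echo "-DWITH_${feature}=1 -DDL_${feature}=1"
    fi
}

triple_flags MYSQL 1 1 libmariadb.so.3
triple_flags ODBC 0 1 libodbc.so.2
triple_flags EXPAT 1 0 libexpat.so.1
```

The second call prints only -DWITH_ODBC=0, because the other two values are meaningless once the feature is off; the third drops the library name, because nothing is loaded dynamically.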
With some options in play, configuring might look like:
mkdir build && cd build
cmake -DWITH_MYSQL=1 -DWITH_RE2=1 ..
Apart from the general configuration values, you may also inspect the CMakeCache.txt file, which is left in the build folder right after you run configuration. Any value defined there may be redefined explicitly when running cmake. For example, you may run cmake -DHAVE_GETADDRINFO_A=FALSE ..., and that config run will not use the detected value of that variable, but the one you've provided.
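To see what configuration recorded for a variable before overriding it, the cache file can be read with standard tools. The sketch below assumes the usual CMakeCache.txt entry layout, NAME:TYPE=VALUE; the helper name cache_value is made up for this example.

```shell
# Hypothetical helper: print the cached value of a variable.
# Usage: cache_value NAME [path/to/CMakeCache.txt]
cache_value() {
    # Entries look like NAME:TYPE=VALUE; strip everything up to the first '='.
    sed -n "s/^$1:[^=]*=//p" "${2:-CMakeCache.txt}"
}

# Run inside the build folder after configuring, for example:
#   cache_value HAVE_GETADDRINFO_A
#   cache_value CMAKE_BUILD_TYPE
```

If the variable is absent from the cache, the helper prints nothing.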
Environment variables are useful for providing global settings which are stored apart from the build configuration and are just 'always' present.
For persistence they may be set globally on the system in different ways: add them to the .bashrc file, embed them into a Dockerfile if you produce a docker-based build system, or write them in the system preferences environment variables on Windows. You may also set them for a short time using export VAR=value in the shell. Or even shorter, by prepending values to the cmake call, like CACHEB=/my/cache cmake ... - this way it will only work for this call and will not be visible on the next one.
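The difference in lifetime between the prefixed form and export can be checked directly in a shell. In this sketch sh -c stands in for the cmake call, and /my/cache is just the placeholder path used above.

```shell
# Start from a clean state so the demonstration is reproducible.
unset CACHEB

# Prefix form: the variable exists only for this single command.
CACHEB=/my/cache sh -c 'echo "prefixed call sees: ${CACHEB:-<unset>}"'
echo "shell afterwards sees: ${CACHEB:-<unset>}"

# export: every later command started from this shell inherits it.
export CACHEB=/my/cache
sh -c 'echo "after export, child sees: ${CACHEB:-<unset>}"'
```

The first and third lines report /my/cache, while the second reports the variable as unset.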
Some of these variables are known to be used in general by cmake and some other tools, such as CXX, which determines the current C++ compiler, or CXXFLAGS, which provides compiler flags, etc.
However, we also have some variables specific to manticore configuration, which were invented solely for our builds.
- CACHEB - same as the config CACHEB option
- LIBS_BUNDLE - same as the config LIBS_BUNDLE option
- DISTR - used to initialize the DISTR_BUILD option when -DPACK=1 is used
- DIAGNOSTIC - makes the output of cmake configuration much more verbose, explaining everything that happens
- WRITEB - assumes LIBS_BUNDLE, and if set, downloads source archive files for different tools to the LIBS_BUNDLE folder. That is, if a fresh version of stemmer comes out, you can manually remove libstemmer_c.tgz from the bundle and then run a one-shot WRITEB=1 cmake ... - it will not find the stemmer sources in the bundle and will download them from the vendor's site to the bundle (without WRITEB it downloads them into a temporary folder inside the build, and they disappear when you wipe the build folder).
At the end of configuration you may see what is available and will be used in the list like this one:
-- Enabled features compiled in:
 * Galera, replication of indexes
 * re2, a regular expression library
 * stemmer, stemming library (Snowball)
 * icu, International Components for Unicode
 * OpenSSL, for encrypted networking
 * ZLIB, for compressed data and networking
 * ODBC, for indexing MSSQL (windows) and generic ODBC sources with indexer
 * EXPAT, for indexing xmlpipe sources with indexer
 * Iconv, for support different encodings when indexing xmlpipe sources with indexer
 * Mysql, for indexing mysql sources with indexer
 * PostgreSQL, for indexing postgresql sources with indexer
cmake --build . --config RelWithDebInfo
To install run:
cmake --install . --config RelWithDebInfo
To install into a custom (non-default) folder, run:
cmake --install . --prefix path/to/build --config RelWithDebInfo
For building a package use the target package. It will build a package according to the selection provided by the DISTR_BUILD option. By default it will be a simple .zip or .tgz archive with all binaries and supplementary files.
cmake --build . --target package --config RelWithDebInfo
For preparing official packages we use docker containers. They include all necessary environment components and are proven as working solutions by our own builds. You can recreate any of them using the Dockerfiles and instructions provided in the dist/build_dockers/ folder of the sources. That is the easiest way to make binaries for any supported Linux distribution, and also to make packages there. Each docker image provides the DISTR environment variable, which is consumed by the PACK config option, so the whole configuration can be done with cmake -DPACK=1 /path/to/sources.
For example, to create a RedHat 7 package 'as official', but without the embedded ICU with its big data file, you may execute the following (it implies that the sources are placed in the /manticore/sources folder of the host):
docker run -it --rm -v /manticore/sources:/manticore registry.gitlab.com/manticoresearch/dev/bionic_cmake:320 bash
# following is inside docker shell. By default, workdir will be in the source folder, mounted as volume from the host.
RELEASE_TAG="noicu"
mkdir build && cd build
cmake -DPACK=1 -DBUILD_TAG=$RELEASE_TAG -DWITH_ICU_FORCE_STATIC=0 ..
cmake --build . --target package
If you didn't change the path for sources and build, just move to your build folder and run:
cmake .
cmake --build . --clean-first --config RelWithDebInfo
If for any reason it doesn't work, you can delete the CMakeCache.txt file located in the build folder. After this step you have to run cmake again, pointing to the source folder and configuring the options.
If that doesn't help either, just wipe out your build folder and begin from scratch.
- In short: just provide --config RelWithDebInfo as written above. You can't go wrong with it.
We use two build types. For development it is Debug: it sets compiler flags for optimization and other things in a way that is very friendly for development, meaning debug runs with step-by-step execution. However, the produced binaries are quite large and slow for production. For releases we use another type, RelWithDebInfo, which means 'release build with debug info'. It produces production binaries with embedded debug info. The latter is then split off into separate debuginfo packages, which are stored alongside the release packages and can be used for investigation and bug fixing if something abnormal, like a crash, happens. Cmake also provides Release and MinSizeRel, but we're not using them. If the build type is not available, cmake will make a noconfig build.
There are two types of generators: single-config and multi-config.
- single-config needs the build type provided at configuration, via the CMAKE_BUILD_TYPE parameter. If it is not defined, the build falls back to the RelWithDebInfo type, which is fine if you just want to build manticore from sources and are not going to participate in development. For an explicit build you should provide the build type, like -DCMAKE_BUILD_TYPE=Debug.
- multi-config selects the build type during the build. It should be provided with the --config option, otherwise it will build a kind of noconfig, which is quite strange and not desirable. So you should always specify the build type, like --config Debug.
If you want to specify the build type but don't want to care whether the generator is 'single' or 'multi' config, just provide the necessary keys in both places: configure with -DCMAKE_BUILD_TYPE=Debug, and then build with --config Debug. Just be sure that both values are the same. If the target builder is single-config, it will consume the configuration param.
If it is multi-config, the configuration param will be ignored, but the correct build configuration will then be selected by the --config key.
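A minimal sketch of the 'both places' approach. The helper name print_build_cmds is invented for this example; it only prints the two cmake invocations, so that the same build type ends up in both. The printed commands are meant to be run from the build folder.

```shell
# Hypothetical helper: print configure and build commands that carry the
# same build type, so they work with single- and multi-config generators.
print_build_cmds() {
    build_type=${1:-RelWithDebInfo}
    echo "cmake -DCMAKE_BUILD_TYPE=${build_type} .."
    echo "cmake --build . --config ${build_type}"
}

print_build_cmds Debug
```

Keeping the type in one variable removes the risk of the two values drifting apart.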
If you want RelWithDebInfo (i.e. just a build for production) and know you're on a single-config platform (that is all of them, except Windows), you can omit the --config flag on cmake invocation. The default CMAKE_BUILD_TYPE=RelWithDebInfo will then be configured and used.
All the commands for 'building', 'installation' and 'building package' become shorter then.
Cmake does not perform the build itself; it generates rules for the local build system.
Usually it detects the available build system well, but sometimes you might need to provide a generator explicitly. You can run cmake -G and review the list of available generators.
- on Windows, if you have more than one version of Visual Studio installed, you might need to specify which one to use, as in cmake -G "Visual Studio 16 2019" ....
- on all other platforms, usually Unix Makefiles are used, but you can specify another one, such as Ninja or Ninja Multi-Config:
cmake -GNinja ...
cmake -G"Ninja Multi-Config" ...
Ninja Multi-Config is quite useful, as it is really 'multi-config' and available on Linux/macOS/BSD. With this generator you may defer choosing the configuration type to build time, and you may also build several configurations in one and the same build folder, changing only the --config parameter.
- If you want to eventually build a full-featured RPM package, the path to the build directory must be long enough for debug symbols to be built correctly, like /manticore012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789, for example. That is because RPM tools modify the path inside compiled binaries when building debug info, and they can only write over the existing room and won't allocate more. The long path mentioned above has 100 chars, and that is quite enough for such a case.
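A quick pre-flight check of the build path length can save a failed packaging run. The sketch below is an assumption-laden illustration: the helper name check_build_path is hypothetical, and the threshold of 100 is simply the length quoted above, not a documented limit of the RPM tools.

```shell
# Hypothetical pre-flight check: is the build path long enough for RPM
# debug-info rewriting? 100 is the length quoted above, not a hard limit.
min_len=100

check_build_path() {
    path=${1:-$PWD}
    if [ "${#path}" -ge "$min_len" ]; then
        echo "ok: ${#path} characters"
    else
        echo "too short: ${#path} characters, want at least ${min_len}"
        return 1
    fi
}

check_build_path /manticore/build || echo "pick a longer build directory"
```

Called without an argument, the helper checks the current directory, so it can be run from inside the build folder before configuring.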
Some libraries should be available if you want to use them.
- for indexing (the indexer tool): expat, iconv, mysql, odbc, postgresql. Without them, you can only index sources that need none of these libraries.
- for serving queries (the daemon): openssl might be necessary.
- for all (required, mandatory!) we need the Boost library. The minimal version is 1.61.0; however, we build the binaries with the fresher 1.75.0. Even fresher versions (like 1.76) should also be ok. On Windows you can download pre-built Boost from their site (boost.org) and install it into the default suggested path (that is C:\boost...). On Mac OS the one provided in brew is ok. On Linuxes you can check the available version in the official repositories, and if it doesn't match the requirements you can build from sources. We need the component 'context'; you can also build the components 'system' and 'program_options', which will be necessary if you also want to build the Galera library from sources. Look into dist/build_dockers/xxx/boost_175/Dockerfile for a short self-documented script/instruction on how to do it.
On the build system you need the 'dev' or 'devel' versions of these packages installed (i.e. libmysqlclient-devel, unixodbc-devel, etc.; look at our dockerfiles for the names of the concrete packages).
On run systems these packages should be present at least in their final (non-dev) variants. (The devel variants are usually larger, as they include not only the target binaries but also development files such as include headers.)
Apart from the necessary prerequisites, you might need prebuilt postgresql client libraries. You have to either build them yourself or contact us to get our build bundle (a simple zip archive containing a folder with these targets).
- ODBC is not necessary, as it is a system library.
- OpenSSL might be built from sources or downloaded prebuilt from https://slproweb.com/products/Win32OpenSSL.html (which is mentioned in the cmake internal script FindOpenSSL).
- Boost might be downloaded pre-built from https://www.boost.org/ releases.
Run indexer -h. It will show which features were configured and built (whether they were explicit or detected doesn't matter):
Built on Linux x86_64 by GNU 8.3.1 compiler.
Configured with these definitions: -DDISTR_BUILD=rhel8 -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Manticore Search 2.x maintains compatibility with Sphinxsearch 2.x and can load existing indexes created by Sphinxsearch. In most cases, upgrading is just a matter of replacing the binaries.
Instead of sphinx.conf (on Linux normally located at /etc/sphinxsearch/sphinx.conf) Manticore by default uses /etc/manticoresearch/manticore.conf. It also runs under a different user and uses different folders.
The systemd service name has changed to manticore, and the service runs under user manticore (Sphinx was using sphinxsearch). It also uses a different folder for the PID file.
The folders used by default include /var/run/manticore. You can still use the existing Sphinx config, but you need to manually change permissions for the /var/log/sphinxsearch folders. Or just globally rename 'sphinx' to 'manticore' in system files. If you use other folders (for data, wordforms files etc.), the ownership must also be switched to user manticore. The pid_file location should be changed to match manticore.service.
If you want to use the Manticore folder instead, the index files need to be moved to the new data folder (/var/lib/manticore) and the permissions changed to user manticore.
Upgrading from Sphinx / Manticore 2.x to 3.x is not straightforward, because the index storage engine received a massive upgrade and the new searchd can't load older indexes and upgrade them to new format on-the-fly.
Manticore Search 3 got a redesigned index storage. Indexes created with Manticore/Sphinx 2.x cannot be loaded by Manticore Search 3 without a conversion. Because of the 4GB limitation, a real-time index in 2.x could still have several disk chunks after an optimize operation. After upgrading to 3.x, these indexes can be optimized to a single disk chunk with the usual OPTIMIZE command. Index files also changed. The only component that didn't get any structural changes is the .spp file (hitlists). .sps (strings/json) and .spm (MVA) are now held by .spb (var-length attributes). The new format has an .spm file present, but it's used for the row map (previously it was dedicated to MVA attributes). The new extensions added are .spt (docid lookup), .sphi (secondary index histograms), .spds (document storage). If you are using scripts that manipulate index files, they should be adapted for the new file extensions.
The upgrade procedure may differ depending on your setup (number of servers in the cluster, whether you have HA or not etc.), but in general it's about creating new 3.x index versions and replacing your existing ones with them along with replacing older 2.x binaries with the new ones.
There are two special requirements to take care of:
- Real-time indexes need to be flushed using FLUSH RAMCHUNK
- Plain indexes with kill-lists require adding a new directive in index configuration (see killlist_target)
Manticore Search 3 includes a new tool - index_converter - that can convert Sphinx 2.x / Manticore 2.x indexes to 3.x format.
index_converter comes in a separate package which should be installed first. Using the convert tool create 3.x versions of your indexes.
index_converter can write the new files in the existing data folder and backup the old files or it can write the new files to a chosen folder.
If you have a single server:
- install manticore-converter package
- use index_converter to create new versions of the indexes in a folder different from the existing data folder (using the --output-dir option)
- stop existing Manticore/Sphinx, upgrade to 3.0, move the new indexes to data folder, start Manticore
To get a minimal downtime, you can copy 2.x indexes, config (you'll need to edit paths here for indexes, logs and different ports) and binaries to a separate location and start this on a separate port and point your application to it. After upgrade is made to 3.0 and the new server is started, you can point back the application to the normal ports. If all is good, stop the 2.x copy and delete the files to free the space.
If you have a spare box (like a testing or staging server), you can do the index upgrade there first, and even install Manticore 3 to perform several tests; if everything is ok, copy the new index files to the production server. If you have multiple servers which can be pulled out of production, do it one by one and perform the upgrade on each. For distributed setups, a 2.x searchd can work as a master with 3.x nodes, so you can upgrade the data nodes first and the master node at the end.
There have been no changes made on how clients should connect to the engine or any change in querying mode or queries behavior.
Kill-lists have been redesigned in Manticore Search 3. In previous versions kill-lists were applied on the result set provided by each previous searched index on query time.
Thus in 2.x the index order at query time mattered. For example, if a delta index had a kill-list, then in order to apply it against the main index the order had to be main, delta (either in a distributed index or in the FROM clause).
In Manticore 3 kill-lists are applied to an index when it's loaded during searchd startup or gets rotated. The new directive killlist_target in the index configuration specifies target indexes and defines which doc ids from the source index should be used for suppression. These can be ids from the defined kill-list, actual doc ids of the index, or both.
Documents from the kill-lists are deleted from the target indexes, they are not returned in results even if the search doesn't include the index that provided the kill-lists. Because of that the order of indexes for searching does not matter any more. Now delta,main and main,delta will provide the same results.
In previous versions indexes were rotated following the order from the configuration file. In Manticore 3 the index rotation order is much smarter and works in accordance with killlist targets. Before starting to rotate indexes, the server looks for chains of indexes by killlist_target definitions. It will then first rotate indexes not referenced anywhere as kill-list targets. Next it will rotate indexes targeted by already rotated indexes, and so on. For example, if we run indexer --all and we have 3 indexes: main, delta_big (which targets main) and delta_small (which targets delta_big), first delta_small is rotated, then delta_big and finally main. This ensures that when a dependent index is rotated it gets the most recent kill-list from the other indexes.
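The ordering rule can be illustrated with the standard tsort utility, which performs a topological sort. This is only a model of the rule described above, not how searchd implements it. Each input line is a 'source target' pair written by hand from the killlist_target relations of the example; the function name rotation_order is made up.

```shell
# Each line: an index, followed by the index its kill-list targets.
# tsort prints every name before the names that depend on it.
rotation_order() {
    tsort <<'EOF'
delta_small delta_big
delta_big main
EOF
}

rotation_order
```

For this chain the output is delta_small, delta_big, main, one per line, which matches the rotation order in the example.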
docinfo - everything is now extern
inplace_docinfo_gap - not needed anymore
mva_updates_pool - MVAs no longer have a dedicated pool for updates, as they can now be updated directly in the blob (see below).
String, JSON and MVA attributes can be updated in Manticore 3.x. In 2.x string attributes required REPLACE, for JSON it was only possible to update scalar properties (as they were fixed-width), and MVAs could be updated using the MVA pool. Now updates are performed directly on the blob component. One setting that may require tuning is attr_update_reserve, which changes the extra space allocated at the end of the blob to avoid frequent resizes when new values are bigger than the existing values in the blob.
Doc ids used to be UNSIGNED 64-bit integers. Now they are POSITIVE SIGNED 64-bit integers.
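As a quick way to see which ids from an older data set are affected, the new range can be checked as below. This is a sketch; the function name `fits_manticore3_docid` is hypothetical and not part of any Manticore tool.

```python
MAX_DOC_ID = 2**63 - 1  # largest positive signed 64-bit integer


def fits_manticore3_docid(doc_id: int) -> bool:
    """True if the id is a positive value that fits into a signed 64-bit integer."""
    return 0 < doc_id <= MAX_DOC_ID


print(fits_manticore3_docid(1513686608316989452))  # True
print(fits_manticore3_docid(2**63))                # False: valid only as an unsigned 64-bit id
```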
Read here about the RT mode
Manticore 3.x recognizes and parses special suffixes, which make it easier to use numeric values with a special meaning. Their common form is an integer number plus a literal, like 10k or 100d, but not 40.3s (since 40.3 is not an integer) and not 2d 4h (since these are two values, not one). Literals are case-insensitive, so 10W is the same as 10w. There are 2 types of such suffixes currently supported:
- Size suffixes - can be used in parameters that define the size of something (memory buffer, disk file, RAM limit, etc.) in bytes. "Naked" numbers in those places literally mean a size in bytes (octets). Size values take suffix k for kilobytes (1k=1024), m for megabytes (1m=1024k), g for gigabytes (1g=1024m) and t for terabytes (1t=1024g).
- Time suffixes - can be used in parameters that define time intervals such as delays and timeouts. "Naked" values for those parameters usually have a documented scale, and you must know whether a number such as 100 means '100 seconds' or '100 milliseconds'. Instead of guessing, you can write a suffixed value, and it will be fully determined by its suffix. Time values take suffix
us for useconds (microseconds),
d for days and
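The size suffix rules above (one integer, one optional case-insensitive literal, powers of 1024) can be sketched as a small parser. This is an illustration of the rules as described here, not Manticore's own parser; `parse_size` and `SIZE_MULTIPLIERS` are names invented for the example.

```python
import re

# Multipliers follow the text: 1k=1024, 1m=1024k, 1g=1024m, 1t=1024g.
SIZE_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
_SIZE_RE = re.compile(r"([0-9]+)([kmgt]?)", re.IGNORECASE)


def parse_size(value: str) -> int:
    """Convert a value such as '10k' into bytes; reject '40.3k' or '2k 4m'."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected one integer with an optional size suffix: {value!r}")
    number, suffix = match.groups()
    return int(number) * SIZE_MULTIPLIERS[suffix.lower()]


print(parse_size("10k"))   # 10240
print(parse_size("10K"))   # 10240, literals are case-insensitive
print(parse_size("4096"))  # 4096, a "naked" number is taken as bytes
```

Time suffixes could be handled the same way, with the caveat that the scale of a "naked" value depends on the documented scale of each parameter.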
index_converter is a tool for converting indexes created with Sphinx/Manticore Search 2.x to Manticore Search 3.x index format. The tool can be used in several different ways:
$ index_converter --config /home/myuser/manticore.conf --index indexname
$ index_converter --config /home/myuser/manticore.conf --all
$ index_converter --path /var/lib/manticoresearch/data --all
By default the new version of the index is written to the same folder. Files of the previous version are saved with the .old extension in their names. The exception is the .spp (hitlists) file, the only index component that did not change in the new format.
You can save the new index version to a different folder using the --output-dir option
$ index_converter --config /home/myuser/manticore.conf --all --output-dir /new/path
Indexes containing kill-lists are a special case. As the behaviour of kill-lists has changed (see killlist_target), the delta index should know which target indexes its kill-lists apply to. There are 3 ways to get a converted index ready for setting the target indexes for its kill-lists:
- Use --killlist-target when converting an index
$ index_converter --config /home/myuser/manticore.conf --index deltaindex --killlist-target mainindex:kl
- Add killlist_target in the configuration before doing the conversion
- Use the ALTER ... KILLLIST_TARGET command after conversion
Here's the complete list of
--config <file> (-c <file> for short) tells index_converter to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. /usr/local/manticore/etc/manticore.conf if installed into /usr/local/manticore), followed by the current directory you are in when calling index_converter from the shell.
--index - specifies which index should be converted
--path - instead of using a config file, a path containing index(es) can be used
--strip-path - strips the path from filenames referenced by the index: stopwords, exceptions and wordforms
--large-docid - allows converting documents with ids larger than 2^63 and displays a warning; otherwise the tool exits with an error on the first large id. This option was added because in Manticore 3.x doc ids are signed bigint, while previously they were unsigned
--output-dir <dir> - writes the new files to the chosen folder rather than to the location of the existing index files. When this option is set, the existing index files remain untouched at their location.
--all - converts all indexes from the config
--killlist-target <targets> - sets the target indexes to which kill-lists will be applied. This option should be used only in conjunction with
You can easily install and start Manticore on Ubuntu, CentOS, Debian, Windows and macOS, or use Manticore as a Docker container.
wget http://repo.manticoresearch.com/manticore-repo.noarch.deb
sudo dpkg -i manticore-repo.noarch.deb
sudo apt update
sudo apt install manticore manticore-columnar-lib
sudo systemctl start manticore
By default Manticore is waiting for your connections on:
- port 9306 for MySQL clients
- port 9308 for HTTP/HTTPS connections
- port 9312 for connections from other Manticore nodes and clients based on Manticore binary API
mysql -h0 -P9306
Let's now create an index called "products" with 2 fields:
- title - full-text field which will contain our product's title
- price - of type "float"
create table products(title text, price float) morphology='stem_en';
Query OK, 0 rows affected (0.02 sec)
insert into products(title,price) values ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);
Query OK, 3 rows affected (0.01 sec)
Let's find one of the documents. The query we will use is 'remove hair'. As you can see, it finds the document with the title 'Pet Hair Remover Glove' and highlights 'Hair Remover' in it, even though the query has "remove", not "remover". This is because when we created the index we turned on English stemming (
select id, highlight(), price from products where match('remove hair');
+---------------------+-------------------------------+----------+
| id                  | highlight()                   | price    |
+---------------------+-------------------------------+----------+
| 1513686608316989452 | Pet <b>Hair Remover</b> Glove | 7.990000 |
+---------------------+-------------------------------+----------+
1 row in set (0.00 sec)
Let's assume we now want to update the document and change the price to 18.5. This can be done by filtering by any field, but normally you know the document id and update based on that.
update products set price=18.5 where id = 1513686608316989452;
Query OK, 1 row affected (0.00 sec)