Geo spatial functions

GEODIST()

GEODIST(lat1, lon1, lat2, lon2, [...]) function calculates the geosphere distance between two points specified by their coordinates. Note that by default, both latitudes and longitudes must be in radians, and the result will be in meters. You can use arbitrary expressions for any of the four coordinates. An optimized path will be chosen when one pair of arguments directly refers to a pair of attributes, and the other one is constant.

GEODIST() also accepts an optional 5th argument, allowing you to easily convert between input and output units and select the specific geodistance formula to use. The complete syntax and a few examples are as follows:

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

GEODIST(40.7643929, -73.9997683, 40.7642578, -73.9994565, {in=degrees, out=feet})

GEODIST(51.50, -0.12, 29.98, 31.13, {in=deg, out=mi})

The known options and their values are:

  • in = {deg | degrees | rad | radians}, specifies the input units;
  • out = {m | meters | km | kilometers | ft | feet | mi | miles}, specifies the output units;
  • method = {adaptive | haversine}, specifies the geodistance calculation method.

The default method is "adaptive". It is a well-optimized implementation that is both more precise and much faster at all times than "haversine".

GEOPOLY2D()

GEOPOLY2D(lat1,lon1,lat2,lon2,lat3,lon3...) creates a polygon to be used with the CONTAINS() function. This function takes into account the Earth's curvature by tessellating the polygon into smaller ones, and should be used for larger areas. For small areas, the POLY2D() function can be used instead. The function expects coordinates to be pairs of latitude/longitude coordinates in degrees; if radians are used, it will give the same result as POLY2D().

POLY2D()

POLY2D(x1,y1,x2,y2,x3,y3...) creates a polygon to be used with the CONTAINS() function. This polygon assumes a flat Earth, so it should not be too large; for large areas, the GEOPOLY2D() function, which takes Earth's curvature into consideration, should be used.

String functions

CONCAT()

Concatenates two or more strings into one. Non-string arguments must be explicitly converted to string using the TO_STRING() function.

CONCAT(TO_STRING(float_attr), ',', TO_STRING(int_attr), ',', title)

LEVENSHTEIN()

LEVENSHTEIN ( pattern, source, {normalize=0, length_delta=0}) returns number (Levenshtein distance) of single-character edits (insertions, deletions or substitutions) between pattern and source strings required to make in pattern to make it source.

  • pattern, source - constant string, string field name, JSON field name, or any expression that produces a string (like e.g., SUBSTRING_INDEX())
  • normalize - option to return the distance as a float number in the range [0.0 - 1.0], where 0.0 is an exact match, and 1.0 is the maximum difference. The default value is 0, meaning not to normalize and provide the result as an integer.
  • length_delta - skips Levenshtein distance calculation and returns max(strlen(pattern), strlen(source)) if the option is set and the lengths of the strings differ by more than the length_delta value. The default value is 0, meaning to calculate Levenshtein distance for any input strings. This option can be useful when checking mostly similar strings.
SELECT LEVENSHTEIN('gily', attr1) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC, dist ASC;
SELECT LEVENSHTEIN('gily', j.name, {length_delta=6}) AS dist, WEIGHT() AS w FROM test WHERE MATCH('test') ORDER BY w DESC;
SELECT LEVENSHTEIN(title, j.name, {normalize=1}) AS dist, WEIGHT() AS w FROM test WHERE MATCH ('test') ORDER BY w DESC, dist ASC;

REGEX()

The REGEX(attr,expr) function returns 1 if a regular expression matches the attribute's string, and 0 otherwise. It works with both string and JSON attributes.

SELECT REGEX(content, 'box?') FROM test;
SELECT REGEX(j.color, 'red | pink') FROM test;

Expressions should adhere to the RE2 syntax. To perform a case-insensitive search, for instance, you can use:

SELECT REGEX(content, '(?i)box') FROM test;

SNIPPET()

The SNIPPET() function can be used to highlight search results within a given text. The first two arguments are: the text to be highlighted, and a query. Options can be passed to the function as the third, fourth, and so on arguments. SNIPPET() can obtain the text for highlighting directly from the table. In this case, the first argument should be the field name:

SELECT SNIPPET(body, QUERY()) FROM myIndex WHERE MATCH('my.query')

In this example, the QUERY() expression returns the current full-text query. SNIPPET() can also highlight non-indexed text:

mysql  SELECT id, SNIPPET('text to highlight', 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

Additionally, it can be used to highlight text fetched from other sources using a User-Defined Function (UDF):

SELECT id, SNIPPET(myUdf(id), 'my.query', 'limit=100') FROM myIndex WHERE MATCH('my.query')

In this context, myUdf() is a User-Defined Function (UDF) that retrieves a document by its ID from an external storage source. The SNIPPET() function is considered a "post limit" function, which means that the computation of snippets is delayed until the entire final result set is prepared, and even after the LIMIT clause has been applied. For instance, if a LIMIT 20,10 clause is used, SNIPPET() will be called no more than 10 times.

It is important to note that SNIPPET() does not support field-based limitations. For this functionality, use HIGHLIGHT() instead.

SUBSTRING_INDEX()

SUBSTRING_INDEX(string, delimiter, number) returns a substring of the original string, based on a specified number of delimiter occurrences:

  • string - The original string, which can be a constant string or a string from a string/JSON attribute.
  • delimiter - The delimiter to search for.
  • number - The number of times to search for the delimiter. This can be either a positive or negative number. If it is a positive number, the function will return everything to the left of the delimiter. If it is a negative number, the function will return everything to the right of the delimiter.

SUBSTRING_INDEX() by default returns a string, but it can also be coerced into other types (such as integer or float) if necessary. Numeric values can be converted using specific functions (such as BIGINT(), DOUBLE(), etc.).

‹›
  • SQL
SQL
📋
SELECT SUBSTRING_INDEX('www.w3schools.com', '.', 2) FROM test;
SELECT SUBSTRING_INDEX(j.coord, ' ', 1) FROM test;
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ',  1);  /* '1.2' */
SELECT          SUBSTRING_INDEX('1.2 3.4', ' ', -1);  /* '3.4' */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1 */
SELECT sint (   SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ',  1)); /* 1.200000 */
SELECT double ( SUBSTRING_INDEX('1.2 3.4', ' ', -1)); /* 3.400000 */

UPPER() and LOWER()

UPPER(string) convert argument to upper case, LOWER(string) convert argument to lower case.

Result also can be promoted to numeric, but only if string argument is convertible to a number. Numeric values could be promoted with arbitrary functions (BITINT, DOUBLE, etc.).

SELECT upper('www.w3schools.com', '.', 2); /* WWW.W3SCHOOLS.COM  */
SELECT double (upper ('1.2e3')); /* 1200.000000 */
SELECT integer (lower ('12345')); /* 12345 */

Other functions

LAST_INSERT_ID()

Returns the IDs of documents that were inserted or replaced by the last statement in the current session.

The same value can also be obtained via the @@session.last_insert_id variable.

mysql> select @@session.last_insert_id;
+--------------------------+
| @@session.last_insert_id |
+--------------------------+
| 11,32                    |
+--------------------------+
1 rows in set

mysql> select LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 25,26,29         |
+------------------+
1 rows in set   

CONNECTION_ID()

Returns the current connection ID.

mysql> select CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
| 6               |
+-----------------+
1 row in set (0.00 sec)