Arrays and conditions functions

ALL()

ALL(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for all elements in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

‹›
  • ALL() with json
  • ALL() with json ex. 2
📋
select *, ALL(x>0 AND x<4 FOR x IN j.ar) from tbl
‹›
Response
+------+--------------+--------------------------------+
| id   | j            | all(x>0 and x<4 for x in j.ar) |
+------+--------------+--------------------------------+
|    1 | {"ar":[1,3]} |                              1 |
|    2 | {"ar":[3,7]} |                              0 |
+------+--------------+--------------------------------+
2 rows in set (0.00 sec)

ALL(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if all values from the MVA attribute are found among the compared values.

‹›
  • ALL() with MVA
  • ALL() with MVA and IN()
📋
select * from tbl where all(m) >= 1
‹›
Response
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ALL(); use ALL(<mva>) NOT IN() instead.

‹›
  • ALL() with MVA and NOT IN()
ALL() with MVA and NOT IN()
📋
select * from tbl where all(m) not in (2, 4)
‹›
Response
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

ALL(string list) is a special operation for filtering string tags.

If all of the words enumerated as arguments of ALL() are present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

‹›
  • ALL() with strings
  • ALL() with strings and NOT
📋
select * from tbl where tags all('bug', 'release')
‹›
Response
+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

ANY()

ANY(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for any element in the array and 0 otherwise. cond is a general expression that can also use var as the current value of an array element within itself.

‹›
  • ANY() with json
  • ANY() with json ex. 2
📋
select *, ANY(x>5 AND x<10 FOR x IN j.ar) from tbl
‹›
Response
+------+--------------+---------------------------------+
| id   | j            | any(x>5 and x<10 for x in j.ar) |
+------+--------------+---------------------------------+
|    1 | {"ar":[1,3]} |                               0 |
|    2 | {"ar":[3,7]} |                               1 |
+------+--------------+---------------------------------+
2 rows in set (0.00 sec)

ANY(mva) is a special constructor for multi-value attributes. When used with comparison operators (including comparison with IN()), it returns 1 if any of the MVA values is found among the compared values.

When comparing an array using IN(), ANY() is assumed by default if not otherwise specified, but a warning will be issued regarding the missing constructor.

‹›
  • ANY() with MVA
  • ANY() with MVA and IN()
📋
mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.01 sec)

mysql> select * from tbl where any(m) > 5

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)
‹›
Response
+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

To compare an MVA attribute with an array, avoid using <mva> NOT ANY(); use <mva> NOT IN() instead or ANY(<mva>) NOT IN().

‹›
  • ANY() with MVA and NOT IN()
ANY() with MVA and NOT IN()
📋
mysql> select * from tbl

+------+------+
| id   | m    |
+------+------+
|    1 | 1,3  |
|    2 | 3,7  |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from tbl where any(m) not in (1, 3, 5)

+------+------+
| id   | m    |
+------+------+
|    2 | 3,7  |
+------+------+
1 row in set (0.00 sec)

ANY(string list) is a special operation for filtering string tags.

If any of the words enumerated as arguments of ANY() is present in the attribute, the filter matches. The optional NOT inverts the logic.

This filter internally uses doc-by-doc matching, so in the case of a full scan query, it might be slower than expected. It is intended for attributes that are not indexed, like calculated expressions or tags in PQ tables. If you need such filtering, consider the solution of putting the string attribute as a full-text field, and then use the full-text operator match(), which will invoke a full-text search.

‹›
  • ANY() with strings
  • ANY() with strings and NOT
📋
select * from tbl where tags any('bug', 'feature')
‹›
Response
+------+---------------------------+
| id   | tags                      |
+------+---------------------------+
|    1 | bug priority_high release |
|    2 | bug priority_low release  |
+------+---------------------------+
2 rows in set (0.00 sec)

CONTAINS()

CONTAINS(polygon, x, y) checks whether the (x,y) point is within the given polygon, and returns 1 if true, or 0 if false. The polygon has to be specified using either the POLY2D() function. The former function is intended for "small" polygons, meaning less than 500 km (300 miles) a side, and it doesn't take into account the Earth's curvature for speed. For larger distances, you should use GEOPOLY2D, which tessellates the given polygon in smaller parts, accounting for the Earth's curvature.

IF()

The behavior of IF() is slightly different from its MySQL counterpart. It takes 3 arguments, checks whether the 1st argument is equal to 0.0, returns the 2nd argument if it is not zero, or the 3rd one when it is. Note that unlike comparison operators, IF() does not use a threshold! Therefore, it's safe to use comparison results as its 1st argument, but arithmetic operators might produce unexpected results. For instance, the following two calls will produce different results even though they are logically equivalent:

IF ( sqrt(3)*sqrt(3)-3<>0, a, b )
IF ( sqrt(3)*sqrt(3)-3, a, b )

In the first case, the comparison operator <> will return 0.0 (false) due to a threshold, and IF() will always return ** as a result. In the second case, the same sqrt(3)*sqrt(3)-3 expression will be compared with zero without a threshold by the IF() function itself. However, its value will be slightly different from zero due to limited floating-point calculation precision. Because of this, the comparison with 0.0 done by IF() will not pass, and the second variant will return 'a' as a result.

IN()

IN(expr,val1,val2,...) takes 2 or more arguments and returns 1 if the 1st argument (expr) is equal to any of the other arguments (val1..valN), or 0 otherwise. Currently, all the checked values (but not the expression itself) are required to be constant. The constants are pre-sorted, and binary search is used, so IN() even against a large arbitrary list of constants will be very quick. The first argument can also be an MVA attribute. In that case, IN() will return 1 if any of the MVA values are equal to any of the other arguments. IN() also supports IN(expr,@uservar) syntax to check whether the value belongs to the list in the given global user variable. The first argument can be a JSON attribute.

INDEXOF()

INDEXOF(cond FOR var IN json.array) function iterates through all elements in the array and returns the index of the first element for which 'cond' is true, and -1 if 'cond' is false for every element in the array.

INTERVAL()

INTERVAL(expr,point1,point2,point3,...) takes 2 or more arguments and returns the index of the argument that is less than the first argument: it returns 0 if expr<point1, 1 if point1<=expr<point2, and so on. It is required that point1<point2<...<pointN for this function to work correctly.

LENGTH()

LENGTH(attr_mva) function returns the number of elements in an MVA set. It works with both 32-bit and 64-bit MVA attributes. LENGTH(attr_json) returns the length of a field in JSON. The return value depends on the type of field. For example, LENGTH(json_attr.some_int) always returns 1, and LENGTH(json_attr.some_array) returns the number of elements in the array. LENGTH(string_expr) function returns the length of the string resulting from an expression. TO_STRING() must enclose the expression, regardless of whether the expression returns a non-string or it's simply a string attribute.

REMAP()

REMAP(condition, expression, (cond1, cond2, ...), (expr1, expr2, ...)) function allows you to make some exceptions to expression values depending on condition values. The condition expression should always result in an integer, while the expression can result in an integer or float.

Example:

SELECT id, size, REMAP(size, 15, (5,6,7,8), (1,1,2,2)) s
FROM products
ORDER BY s ASC;

This will put documents with sizes 5 and 6 first, followed by sizes 7 and 8. In case there's an original value not listed in the array (e.g. size 10), it will default to 15, and in this case, will be placed at the end.

More examples:

SELECT REMAP(userid, karmapoints, (1, 67), (999, 0)) FROM users;
SELECT REMAP(id%10, salary, (0), (0.0)) FROM employes;

Date and time functions

Note, that CURTIME(), UTC_TIME(), UTC_TIMESTAMP(), and TIMEDIFF() can be promoted to numeric types using arbitrary conversion functions such as BIGINT(), DOUBLE(), etc.

NOW()

Returns the current timestamp as an INTEGER.

‹›
  • SQL
SQL
📋
select NOW();
‹›
Response
+------------+
| NOW()      |
+------------+
| 1615788407 |
+------------+

CURTIME()

Returns the current time in the local timezone in hh:ii:ss format.

‹›
  • SQL
SQL
📋
select CURTIME();
‹›
Response
+-----------+
| CURTIME() |
+-----------+
| 07:06:30  |
+-----------+

UTC_TIME()

Returns the current time in UTC timezone in hh:ii:ss format.

‹›
  • SQL
SQL
📋
select UTC_TIME();
‹›
Response
+------------+
| UTC_TIME() |
+------------+
| 06:06:18   |
+------------+

UTC_TIMESTAMP()

Returns the current time in UTC timezone in YYYY-MM-DD hh:ii:ss format.

‹›
  • SQL
SQL
📋
select UTC_TIMESTAMP();
‹›
Response
+---------------------+
| UTC_TIMESTAMP()     |
+---------------------+
| 2021-03-15 06:06:03 |
+---------------------+

SECOND()

Returns the integer second (in 0..59 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select second(now());
‹›
Response
+---------------+
| second(now()) |
+---------------+
| 52            |
+---------------+

MINUTE()

Returns the integer minute (in 0..59 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select minute(now());
‹›
Response
+---------------+
| minute(now()) |
+---------------+
| 5             |
+---------------+

HOUR()

Returns the integer hour (in 0..23 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select hour(now());
‹›
Response
+-------------+
| hour(now()) |
+-------------+
| 7           |
+-------------+

DAY()

Returns the integer day of the month (in 1..31 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select day(now());
‹›
Response
+------------+
| day(now()) |
+------------+
| 15         |
+------------+

MONTH()

Returns the integer month (in 1..12 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select month(now());
‹›
Response
+--------------+
| month(now()) |
+--------------+
| 3            |
+--------------+

YEAR()

Returns the integer year (in 1969..2038 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select year(now());
‹›
Response
+-------------+
| year(now()) |
+-------------+
| 2021        |
+-------------+

YEARMONTH()

Returns the integer year and month code (in 196912..203801 range) from a timestamp argument, according to the current timezone.

‹›
  • SQL
SQL
📋
select yearmonth(now());
‹›
Response
+------------------+
| yearmonth(now()) |
+------------------+
| 202103           |
+------------------+

YEARMONTHDAY()

Returns the integer year, month, and date code (ranging from 19691231 to 20380119) based on the current timezone.

‹›
  • SQL
SQL
📋
select yearmonthday(now());
‹›
Response
+---------------------+
| yearmonthday(now()) |
+---------------------+
| 20210315            |
+---------------------+

TIMEDIFF()

Calculates the difference between two timestamps in the format hh:ii:ss.

‹›
  • SQL
SQL
📋
select timediff(1615787586, 1613787583);
‹›
Response
+----------------------------------+
| timediff(1615787586, 1613787583) |
+----------------------------------+
| 555:33:23                        |
+----------------------------------+

DATE_FORMAT()

Returns a formatted string based on the provided date and format arguments. The format argument uses the same specifiers as the strftime function. For convenience, here are some common format specifiers:

  • %Y - Four-digit year
  • %m - Two-digit month (01-12)
  • %d - Two-digit day of the month (01-31)
  • %H - Two-digit hour (00-23)
  • %M - Two-digit minute (00-59)
  • %S - Two-digit second (00-59)
  • %T - Time in 24-hour format (%H:%M:%S)

Note that this is not a complete list of the specifiers. Please consult the documentation for strftime() for your operating system to get the full list.

‹›
  • SQL
SQL
📋
SELECT DATE_FORMAT(NOW(), 'year %Y and time %T');
‹›
Response
+------------------------------------------+
| DATE_FORMAT(NOW(), 'year %Y and time %T') |
+------------------------------------------+
| year 2023 and time 11:54:52              |
+------------------------------------------+

This example formats the current date and time, displaying the four-digit year and the time in 24-hour format.

Geo spatial functions

GEODIST()

GEODIST(lat1, lon1, lat2, lon2, [...]) function calculates the geosphere distance between two points specified by their coordinates. Note that by default, both latitudes and longitudes must be in radians, and the result will be in meters. You can use arbitrary expressions for any of the four coordinates. An optimized path will be chosen when one pair of arguments directly refers to a pair of attributes, and the other one is constant.

GEODIST() also accepts an optional 5th argument, allowing you to easily convert between input and output units and select the specific geodistance formula to use. The complete syntax and a few examples are as follows:

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

GEODIST(40.7643929, -73.9997683, 40.7642578, -73.9994565, {in=degrees, out=feet})

GEODIST(51.50, -0.12, 29.98, 31.13, {in=deg, out=mi})

The known options and their values are:

  • in = {deg | degrees | rad | radians}, specifies the input units;
  • out = {m | meters | km | kilometers | ft | feet | mi | miles}, specifies the output units;
  • method = {adaptive | haversine}, specifies the geodistance calculation method.

The default method is "adaptive". It is a well-optimized implementation that is both more precise and much faster at all times than "haversine".

GEOPOLY2D()

GEOPOLY2D(lat1,lon1,lat2,lon2,lat3,lon3...) creates a polygon to be used with the CONTAINS() function. This function takes into account the Earth's curvature by tessellating the polygon into smaller ones, and should be used for larger areas. For small areas, the POLY2D() function can be used instead. The function expects coordinates to be pairs of latitude/longitude coordinates in degrees; if radians are used, it will give the same result as POLY2D().

POLY2D()

POLY2D(x1,y1,x2,y2,x3,y3...) creates a polygon to be used with the CONTAINS() function. This polygon assumes a flat Earth, so it should not be too large; for large areas, the GEOPOLY2D() function, which takes Earth's curvature into consideration, should be used.