分面搜索

分面搜索对于现代搜索应用来说,与自动补全拼写纠正和搜索关键词高亮同样重要,尤其是在电子商务产品中。

分面搜索

当处理大量数据和各种相互关联的属性时,例如尺寸、颜色、制造商或其他因素,分面搜索就派上用场了。在查询海量数据时,搜索结果常常包含许多不符合用户预期的条目。分面搜索使最终用户能够明确定义他们希望搜索结果满足的条件。

在 Manticore Search 中,有一项优化功能,它会保留原始查询的结果集,并在每个分面计算中重复使用。由于聚合操作应用于已计算好的文档子集,因此速度很快,总执行时间通常仅比初始查询稍长一些。分面可以添加到任何查询中,分面可以是任何属性或表达式。分面结果包括分面值和分面计数。可以通过在查询末尾声明分面,使用 SQL SELECT 语句访问分面。

聚合

SQL

分面值可以来自属性、JSON 属性内的 JSON 属性或表达式。分面值也可以使用别名,但别名在所有结果集(主查询结果集和其他分面结果集)中必须是唯一的。分面值源自聚合的属性/表达式,但也可以来自另一个属性/表达式。

FACET {expr_list} [BY {expr_list} ] [DISTINCT {field_name}] [ORDER BY {expr | FACET()} {ASC | DESC}] [LIMIT [offset,] count]

多个分面声明必须用空格分隔。

HTTP JSON

分面可以在 aggs 节点中定义:

     "aggs" :
     {
        "group name" :
         {
            "terms" :
             {
              "field":"attribute name",
              "size": 1000
             }
             "sort": [ {"attribute name": { "order":"asc" }} ]
         }
     }

其中:

  • group name 是分配给聚合的别名
  • field 值必须包含要进行分面的属性或表达式的名称
  • 可选的 size 指定结果中包含的最大桶数。未指定时,继承主查询的限制。更多详细信息可以在分面结果大小部分找到。
  • 可选的 sort 指定一个属性数组和/或附加属性,使用与主查询中的"sort"参数相同的语法。

结果集将包含一个 aggregations 节点,其中包含返回的分面,key 是聚合值,doc_count 是聚合计数。

    "aggregations": {
        "group name": {
        "buckets": [
            {
                "key": 10,
                "doc_count": 1019
            },
            {
                "key": 9,
                "doc_count": 954
            },
            {
                "key": 8,
                "doc_count": 1021
            },
            {
                "key": 7,
                "doc_count": 1011
            },
            {
                "key": 6,
                "doc_count": 997
            }
            ]
        }
    }
‹›
  • SQL
  • JSON
  • PHP
  • Python
  • Python-asyncio
  • Javascript
  • Java
  • C#
  • Rust
  • TypeScript
  • Go
📋
SELECT *, price AS aprice FROM facetdemo LIMIT 10 FACET price LIMIT 10 FACET brand_id LIMIT 5;
‹›
Response
+------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
| id   | price | brand_id | title               | brand_name | property    | j                                     | categories | aprice |
+------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
|    1 |   306 |        1 | Product Ten Three   | Brand One  | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |    306 |
|    2 |   400 |       10 | Product Three One   | Brand Ten  | Four_Three  | {"prop1":69,"prop2":19,"prop3":"One"} | 13,14      |    400 |
...
|    9 |   560 |        6 | Product Two Five    | Brand Six  | Eight_Two   | {"prop1":90,"prop2":84,"prop3":"One"} | 13,14      |    560 |
|   10 |   229 |        9 | Product Three Eight | Brand Nine | Seven_Three | {"prop1":84,"prop2":39,"prop3":"One"} | 12,13      |    229 |
+------+-------+----------+---------------------+------------+-------------+---------------------------------------+------------+--------+
10 rows in set (0.00 sec)
+-------+----------+
| price | count(*) |
+-------+----------+
|   306 |        7 |
|   400 |       13 |
...
|   229 |        9 |
|   595 |       10 |
+-------+----------+
10 rows in set (0.00 sec)
+----------+----------+
| brand_id | count(*) |
+----------+----------+
|        1 |     1013 |
|       10 |      998 |
|        5 |     1007 |
|        8 |     1033 |
|        7 |      965 |
+----------+----------+
5 rows in set (0.00 sec)

通过聚合另一个属性进行分面

可以通过聚合另一个属性或表达式对数据进行分面。例如,如果文档同时包含品牌ID和名称,我们可以在分面中返回品牌名称,但聚合品牌ID。这可以通过使用 FACET {expr1} BY {expr2} 来实现。

‹›
  • SQL
SQL
📋
SELECT * FROM facetdemo FACET brand_name by brand_id;
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |
|    2 |   400 |       10 | Product Three One   | Brand Ten   | Four_Three  | {"prop1":69,"prop2":19,"prop3":"One"} | 13,14      |
....
|   19 |   855 |        1 | Product Seven Two   | Brand One   | Eight_Seven | {"prop1":63,"prop2":78,"prop3":"One"} | 10,11,12   |
|   20 |    31 |        9 | Product Four One    | Brand Nine  | Ten_Four    | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14   |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
20 rows in set (0.00 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand One   |     1013 |
| Brand Ten   |      998 |
| Brand Five  |     1007 |
| Brand Nine  |      944 |
| Brand Two   |      990 |
| Brand Six   |     1039 |
| Brand Three |     1016 |
| Brand Four  |      994 |
| Brand Eight |     1033 |
| Brand Seven |      965 |
+-------------+----------+
10 rows in set (0.00 sec)

去重分面

如果需要从 FACET 返回的桶中移除重复项,可以使用 DISTINCT field_name,其中 field_name 是您希望用于去重的字段。如果您对分布式表进行 FACET 查询,并且不确定表中是否有唯一的ID(表应该是本地的且具有相同的模式),也可以是 id(这是默认值)。

如果查询中有多个 FACET 声明,field_name 在所有声明中应该相同。

DISTINCT 会在 count(*) 列之前返回一个额外的列 count(distinct ...),使您无需进行另一个查询即可获得两个结果。

‹›
  • SQL
  • JSON
📋
SELECT brand_name, property FROM facetdemo FACET brand_name distinct property;
‹›
Response
+-------------+----------+
| brand_name  | property |
+-------------+----------+
| Brand Nine  | Four     |
| Brand Ten   | Four     |
| Brand One   | Five     |
| Brand Seven | Nine     |
| Brand Seven | Seven    |
| Brand Three | Seven    |
| Brand Nine  | Five     |
| Brand Three | Eight    |
| Brand Two   | Eight    |
| Brand Six   | Eight    |
| Brand Ten   | Four     |
| Brand Ten   | Two      |
| Brand Four  | Ten      |
| Brand One   | Nine     |
| Brand Four  | Eight    |
| Brand Nine  | Seven    |
| Brand Four  | Five     |
| Brand Three | Four     |
| Brand Four  | Two      |
| Brand Four  | Eight    |
+-------------+----------+
20 rows in set (0.00 sec)
+-------------+--------------------------+----------+
| brand_name  | count(distinct property) | count(*) |
+-------------+--------------------------+----------+
| Brand Nine  |                        3 |        3 |
| Brand Ten   |                        2 |        3 |
| Brand One   |                        2 |        2 |
| Brand Seven |                        2 |        2 |
| Brand Three |                        3 |        3 |
| Brand Two   |                        1 |        1 |
| Brand Six   |                        1 |        1 |
| Brand Four  |                        4 |        5 |
+-------------+--------------------------+----------+
8 rows in set (0.00 sec)

基于表达式的分面

分面可以基于表达式进行聚合。一个经典的例子是按特定范围对价格进行分段:

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • Python-asyncio
  • Javascript
  • Java
  • C#
  • Rust
  • TypeScript
  • Go
📋
SELECT * FROM facetdemo FACET INTERVAL(price,200,400,600,800) AS price_range ;
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories | price_range |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |           1 |
...
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
20 rows in set (0.00 sec)
+-------------+----------+
| price_range | count(*) |
+-------------+----------+
|           0 |     1885 |
|           3 |     1973 |
|           4 |     2100 |
|           2 |     1999 |
|           1 |     2043 |
+-------------+----------+
5 rows in set (0.01 sec)

多级分组的Facet

Facets可以对多级分组进行聚合,结果集与查询执行多级分组时的结果相同:

‹›
  • SQL
SQL
📋
SELECT *,INTERVAL(price,200,400,600,800) AS price_range FROM facetdemo
FACET price_range AS price_range,brand_name ORDER BY brand_name asc;
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories | price_range |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |           1 |
...
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+-------------+
20 rows in set (0.00 sec)
+--------------+-------------+----------+
| fprice_range | brand_name  | count(*) |
+--------------+-------------+----------+
|            1 | Brand Eight |      197 |
|            4 | Brand Eight |      235 |
|            3 | Brand Eight |      203 |
|            2 | Brand Eight |      201 |
|            0 | Brand Eight |      197 |
|            4 | Brand Five  |      230 |
|            2 | Brand Five  |      197 |
|            1 | Brand Five  |      204 |
|            3 | Brand Five  |      193 |
|            0 | Brand Five  |      183 |
|            1 | Brand Four  |      195 |
...

基于直方图值的Facet

Facets可以通过构造固定大小的桶来对值进行直方图聚合。 键函数是:

key_of_the_bucket = interval + offset * floor ( ( value - offset ) / interval )

直方图参数interval必须为正数,直方图参数offset必须为正数且小于interval。默认情况下,桶以数组形式返回。直方图参数keyed使得响应以字典形式返回桶键。

‹›
  • SQL
  • JSON
  • JSON 2
📋
SELECT COUNT(*), HISTOGRAM(price, {hist_interval=100}) as price_range FROM facets GROUP BY price_range ORDER BY price_range ASC;
‹›
Response
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        5 |           0 |
|        5 |         100 |
|        1 |         300 |
|        4 |         400 |
|        1 |         500 |
|        3 |         700 |
|        1 |         900 |
+----------+-------------+

基于日期直方图值的Facet

Facets可以对日期直方图值进行聚合,这与普通直方图类似。不同之处在于,区间由日期或时间表达式指定。此类表达式需要特殊支持,因为区间长度不总是固定的。值会根据以下键函数四舍五入到最近的桶:

key_of_the_bucket = interval * floor ( value / interval )

直方图参数calendar_interval理解月份具有不同的天数。 与calendar_interval不同,fixed_interval参数使用固定数量的单位,无论其在日历中的位置如何,都不会偏离。但是fixed_interval无法处理如周或月这样的单位,因为月不是一个固定数量。尝试为fixed_interval指定如周或月这样的单位将导致错误。 接受的区间在日期直方图表达式中描述。默认情况下,桶以数组形式返回。直方图参数keyed使得响应以字典形式返回桶键。

‹›
  • SQL
  • JSON
📋
SELECT count(*), DATE_HISTOGRAM(tm, {calendar_interval='month'}) AS months FROM idx_dates GROUP BY months ORDER BY months ASC
‹›
Response
+----------+------------+
| count(*) | months     |
+----------+------------+
|      442 | 1485907200 |
|      744 | 1488326400 |
|      720 | 1491004800 |
|      230 | 1493596800 |
+----------+------------+

基于范围的Facet

Facets可以对一组范围进行聚合。值会与桶范围进行检查,其中每个桶包括from值并排除to值。 将keyed属性设置为true使得响应以字典形式返回桶键,而不是数组。

‹›
  • SQL
  • JSON
  • JSON 2
📋
SELECT COUNT(*), RANGE(price, {range_to=150},{range_from=150,range_to=300},{range_from=300}) price_range FROM facets GROUP BY price_range ORDER BY price_range ASC;
‹›
Response
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        8 |           0 |
|        2 |           1 |
|       10 |           2 |
+----------+-------------+

基于日期范围的Facet

Facets可以对一组日期范围进行聚合,这与普通范围类似。不同之处在于,fromto值可以使用日期数学表达式表示。此聚合包括from值并排除每个范围的to值。将keyed属性设置为true使得响应以字典形式返回桶键,而不是数组。

‹›
  • SQL
  • JSON
📋
SELECT COUNT(*), DATE_RANGE(tm, {range_to='2017||+2M/M'},{range_from='2017||+2M/M',range_to='2017||+5M/M'},{range_from='2017||+5M/M'}) AS points FROM idx_dates GROUP BY points ORDER BY points ASC;
‹›
Response
+----------+--------+
| count(*) | points |
+----------+--------+
|      442 |      0 |
|     1464 |      1 |
|      230 |      2 |
+----------+--------+

Facet结果中的排序

Facets支持ORDER BY子句,就像标准查询一样。每个Facet可以有自己的排序方式,Facet的排序不会影响主结果集的排序,这由主查询的ORDER BY决定。排序可以基于属性名、计数(使用COUNT(*)COUNT(DISTINCT attribute_name))或特殊的FACET()函数,该函数提供聚合数据值。默认情况下,带有ORDER BY COUNT(*)的查询将按降序排序。

‹›
  • SQL
  • JSON
📋
SELECT * FROM facetdemo
FACET brand_name BY brand_id ORDER BY FACET() ASC
FACET brand_name BY brand_id ORDER BY brand_name ASC
FACET brand_name BY brand_id order BY COUNT(*) DESC;
FACET brand_name BY brand_id order BY COUNT(*);
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |
...
|   20 |    31 |        9 | Product Four One    | Brand Nine  | Ten_Four    | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14   |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
20 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand One   |     1013 |
| Brand Two   |      990 |
| Brand Three |     1016 |
| Brand Four  |      994 |
| Brand Five  |     1007 |
| Brand Six   |     1039 |
| Brand Seven |      965 |
| Brand Eight |     1033 |
| Brand Nine  |      944 |
| Brand Ten   |      998 |
+-------------+----------+
10 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand Eight |     1033 |
| Brand Five  |     1007 |
| Brand Four  |      994 |
| Brand Nine  |      944 |
| Brand One   |     1013 |
| Brand Seven |      965 |
| Brand Six   |     1039 |
| Brand Ten   |      998 |
| Brand Three |     1016 |
| Brand Two   |      990 |
+-------------+----------+
10 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand Six   |     1039 |
| Brand Eight |     1033 |
| Brand Three |     1016 |
| Brand One   |     1013 |
| Brand Five  |     1007 |
| Brand Ten   |      998 |
| Brand Four  |      994 |
| Brand Two   |      990 |
| Brand Seven |      965 |
| Brand Nine  |      944 |
+-------------+----------+
10 rows in set (0.01 sec)

Facet结果的大小

默认情况下,每个Facet结果集仅限于20个值。可以通过LIMIT子句单独为每个Facet控制Facet值的数量,提供返回值的数量格式LIMIT count或使用偏移量LIMIT offset, count

返回的最大Facet值数量受查询的max_matches设置限制。如果您想实现动态max_matches(限制max_matches为偏移量+每页以提高性能),必须考虑到过低的max_matches值可能会影响Facet值的数量。在这种情况下,应使用足以覆盖Facet值数量的最小max_matches值。

‹›
  • SQL
  • JSON
  • PHP
  • Python
  • Python-asyncio
  • Javascript
  • Java
  • C#
  • Rust
  • TypeScript
  • Go
📋
SELECT * FROM facetdemo
FACET brand_name BY brand_id ORDER BY FACET() ASC  LIMIT 0,1
FACET brand_name BY brand_id ORDER BY brand_name ASC LIMIT 2,4
FACET brand_name BY brand_id order BY COUNT(*) DESC LIMIT 4;
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |
...
|   20 |    31 |        9 | Product Four One    | Brand Nine  | Ten_Four    | {"prop1":79,"prop2":42,"prop3":"One"} | 12,13,14   |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
20 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand One   |     1013 |
+-------------+----------+
1 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand Four  |      994 |
| Brand Nine  |      944 |
| Brand One   |     1013 |
| Brand Seven |      965 |
+-------------+----------+
4 rows in set (0.01 sec)
+-------------+----------+
| brand_name  | count(*) |
+-------------+----------+
| Brand Six   |     1039 |
| Brand Eight |     1033 |
| Brand Three |     1016 |
+-------------+----------+
3 rows in set (0.01 sec)

返回的结果集

当使用SQL时,带有 facets 的搜索会返回多个结果集。MySQL客户端/库/连接器所使用的 必须 支持多个结果集,以便访问 facets 结果集。

性能

内部,FACET 是执行多查询的一个快捷方式,其中第一个查询包含主要搜索查询,批次中的其余查询各自包含聚类。正如多查询的情况一样,对于带有 facets 的搜索,常见的查询优化可以启动,这意味着搜索查询只执行一次,而 facets 则在搜索查询结果上操作,每个 facet 只增加总查询时间的一小部分。

要检查 facets 搜索是否以优化模式运行,可以在 查询日志 中查找,其中所有记录的查询将包含一个 xN 字符串,N 是优化组中运行的查询数量。或者,可以检查 SHOW META 语句的输出,该语句将显示一个 multiplier 指标:

‹›
  • SQL
SQL
📋
SELECT * FROM facetdemo FACET brand_id FACET price FACET categories;
SHOW META LIKE 'multiplier';
‹›
Response
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
| id   | price | brand_id | title               | brand_name  | property    | j                                     | categories |
+------+-------+----------+---------------------+-------------+-------------+---------------------------------------+------------+
|    1 |   306 |        1 | Product Ten Three   | Brand One   | Six_Ten     | {"prop1":66,"prop2":91,"prop3":"One"} | 10,11      |
...
+----------+----------+
| brand_id | count(*) |
+----------+----------+
|        1 |     1013 |
...
+-------+----------+
| price | count(*) |
+-------+----------+
|   306 |        7 |
...
+------------+----------+
| categories | count(*) |
+------------+----------+
|         10 |     2436 |
...
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| multiplier    | 4     |
+---------------+-------+
1 row in set (0.00 sec)
Last modified: August 28, 2025