Adding data from tables

Killlist in plain tables

With plain tables, a problem arises from the need to keep the data in the table as fresh as possible.

In this case, one or more secondary (also known as delta) tables are used to capture the data modified between the time the main table was created and the current time. Modified data can mean new, updated, or deleted documents. A search then becomes a search over the main table and the delta table. This works without problems when you only add new documents to the delta table, but updated or deleted documents raise the following issue.

If a document is present in both the main and delta tables, it can cause issues at search time, as the engine will see two versions of the document and won't know which one to pick. So the delta table needs some way to tell the search that there are documents in the main table that should be disregarded. This is where kill-lists come in.

Table kill-list

A table can maintain a list of document IDs that is used to suppress records in other tables. This feature is available for plain tables using database sources or XML sources. For database sources, the source needs to provide an additional query defined by sql_query_killlist. The resulting list of document IDs is stored in the table and can be used by the server to remove documents from other plain tables.

This query is expected to return a number of 1-column rows, each containing just the document ID.

In many cases the query is a union between a query that gets a list of updated documents and a list of deleted documents, e.g.:

sql_query_killlist = \
    SELECT id FROM documents WHERE updated_ts>=@last_reindex UNION \
    SELECT id FROM documents_deleted WHERE deleted_ts>=@last_reindex
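
To see what such a query returns, here is a minimal sketch that runs the same UNION against an in-memory SQLite database from Python. The schema (documents and documents_deleted with integer timestamps), the sample rows, and the helper names are assumptions made for illustration; SQLite's :last_reindex parameter stands in for the @last_reindex variable used above.

```python
import sqlite3

# Same shape as the sql_query_killlist example: updated ids UNION deleted ids.
KILLLIST_QUERY = """
    SELECT id FROM documents WHERE updated_ts >= :last_reindex
    UNION
    SELECT id FROM documents_deleted WHERE deleted_ts >= :last_reindex
"""


def build_sample_db():
    """Create a hypothetical source database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, updated_ts INTEGER);
        CREATE TABLE documents_deleted (id INTEGER PRIMARY KEY, deleted_ts INTEGER);
        INSERT INTO documents VALUES (1, 'untouched', 100), (2, 'edited', 250), (4, 'new', 300);
        INSERT INTO documents_deleted VALUES (3, 260), (5, 50);
    """)
    return conn


def kill_list_ids(conn, last_reindex):
    """Return the one-column result of the kill-list query as a sorted list."""
    rows = conn.execute(KILLLIST_QUERY, {"last_reindex": last_reindex}).fetchall()
    return sorted(row[0] for row in rows)


if __name__ == "__main__":
    # Documents 2 and 4 changed after ts=200 and document 3 was deleted after it.
    print(kill_list_ids(build_sample_db(), last_reindex=200))
```

Each returned row holds a single document ID, which is the shape the kill-list query is expected to produce.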

Removing documents in a plain table

A plain table can contain a directive called killlist_target that tells the server it can provide a list of document IDs to be removed from certain existing tables. The table can use either its own document IDs as the source for this list or provide a separate list.

killlist_target

Sets the table(s) that the kill-list will be applied to. Optional, default value is empty.

When you use plain tables, you often need to maintain not a single table but a set of them, so that new, updated, and deleted documents are reflected sooner (read about delta table updates). To suppress matches in the previous (main) table that were updated or deleted in the next (delta) table, you need to:

  1. Create a kill-list in the delta table using sql_query_killlist
  2. Specify main table as killlist_target in delta table settings:
table products {
  killlist_target = main:kl

  path = products
  source = src_base
}

When killlist_target is specified, the kill-list is applied to all the tables listed in it on searchd startup. If any of the tables from killlist_target are rotated, the kill-list is reapplied to these tables. When a kill-list is applied, the affected tables save these changes to disk.

killlist_target has 3 modes of operation:

  1. killlist_target = main:kl. Document ids from the kill-list of the delta table are suppressed in the main table (see sql_query_killlist).
  2. killlist_target = main:id. All document ids from delta table are suppressed in the main table. Kill-list is ignored.
  3. killlist_target = main. Both document ids from delta table and its kill-list are suppressed in the main table.
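
The three modes can be pictured as set operations on document IDs. The following is a toy model in Python, not Manticore's implementation; the function names and the mode label "both" (for a target given without a suffix) are hypothetical and exist only for this sketch.

```python
def suppressed_ids(mode, delta_doc_ids, delta_kill_list):
    """Return the ids hidden in the target table for a given mode."""
    if mode == "kl":    # killlist_target = main:kl
        return set(delta_kill_list)
    if mode == "id":    # killlist_target = main:id
        return set(delta_doc_ids)
    if mode == "both":  # killlist_target = main
        return set(delta_doc_ids) | set(delta_kill_list)
    raise ValueError(f"unknown mode: {mode!r}")


def surviving_main_ids(main_doc_ids, mode, delta_doc_ids, delta_kill_list):
    """Ids of the main table that remain searchable after suppression."""
    hidden = suppressed_ids(mode, delta_doc_ids, delta_kill_list)
    return sorted(set(main_doc_ids) - hidden)


if __name__ == "__main__":
    main_ids = [1, 2, 3, 5]
    delta_ids = [2, 4]   # document 2 was updated, document 4 is new
    kill_list = [3]      # document 3 was deleted at the source
    for mode in ("kl", "id", "both"):
        print(mode, surviving_main_ids(main_ids, mode, delta_ids, kill_list))
```

In this sample the kill-list holds only the deleted document, so the :kl mode leaves the stale copy of document 2 in the main table, while the :id mode leaves the deleted document 3. Only the combined mode hides both.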

Multiple targets can be specified, separated by commas, for example:

killlist_target = table_one:kl,table_two:kl

You can change killlist_target settings for a table without rebuilding it by using ALTER.

However, since the 'old' main table has already written the changes to disk, the documents that were deleted in it will remain deleted even if it is no longer in the killlist_target of the delta table.

ALTER TABLE delta KILLLIST_TARGET='new_main_table:kl'
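
The caveat about the 'old' main table can be illustrated with a small toy model in Python. It is only a sketch: ToyTable and retarget_demo are hypothetical names, the suppressed set merely stands in for changes saved to disk, and the model makes no claim about when the server applies the kill-list to the new target.

```python
class ToyTable:
    """Toy stand-in for a plain table; not how Manticore stores data."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        self.suppressed = set()  # models suppressions already saved to disk

    def apply_kill_list(self, kill_list):
        # Once applied, the suppression is recorded in the table itself.
        self.suppressed |= set(kill_list) & self.doc_ids

    def visible_ids(self):
        return sorted(self.doc_ids - self.suppressed)


def retarget_demo():
    old_main = ToyTable([1, 2, 3])
    new_main = ToyTable([1, 2, 3, 4])
    delta_kill_list = [2]

    # The delta table initially targets old_main.
    old_main.apply_kill_list(delta_kill_list)

    # After the delta table is retargeted, its kill-list is applied to
    # new_main; nothing undoes what old_main has already recorded.
    new_main.apply_kill_list(delta_kill_list)

    return old_main.visible_ids(), new_main.visible_ids()


if __name__ == "__main__":
    print(retarget_demo())
```

Document 2 stays hidden in the old table after retargeting, because the suppression lives in that table rather than in the delta table's settings.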