Basics of Storing Signals in Solr with Fusion for Data Engineers - Lucidworks.com
In general, signals are useful any time information about outside activity, such as user behavior, can be used to improve the quality of search results. Signals are particularly useful in e-commerce applications, where they can be used to make recommendations as well as to improve search. Signal data comes from server logs and transaction databases, which record the items that users search for, view, click on, like, or purchase. For example, clickstream data that records a user's search query together with the item that was ultimately clicked on is treated as one "click" signal and can be used to make recommendations of the form:
- "other customers who entered this query clicked on that"
- "customers who bought this also bought that"
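As a rough illustration of the first kind of recommendation, the sketch below counts how often each item was clicked for a given query and returns the most frequent ones. It is plain Python rather than anything Fusion-specific; the `clicks` data and the `top_clicked` helper are hypothetical names made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical click signals: (search query, clicked item id) pairs
# as they might be extracted from a clickstream log.
clicks = [
    ("laptop", "sku-1"),
    ("laptop", "sku-2"),
    ("laptop", "sku-1"),
    ("phone", "sku-9"),
]

def top_clicked(click_pairs, query, n=3):
    """Return up to n item ids most often clicked for the given query."""
    by_query = defaultdict(Counter)
    for q, item in click_pairs:
        by_query[q][item] += 1
    # most_common orders items by descending click count
    return [item for item, _ in by_query[query].most_common(n)]

print(top_clicked(clicks, "laptop"))  # ['sku-1', 'sku-2']
```

The same counting approach would apply to "customers who bought this also bought that" if the pairs were (purchased item, co-purchased item) instead of (query, clicked item).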
Signals Key Concepts
- A signal is a piece of information, event, or action, e.g., user queries, clicks, and other recorded actions that can be related back to a document or documents which are stored in a Fusion collection, referred to as the "primary collection".
- A signal has a type, an id, and a timestamp. For example, signals from clickstream information are of type "click" and signals derived from query logs are of type "query".
- Signals are stored in an auxiliary collection, which is linked to the primary collection by a naming convention: the name of the signals collection is the name of the primary collection plus the suffix "_signals".
- An aggregation is the result of processing a stream of signals into a set of summaries that can be used to improve the search experience. Aggregation is necessary because in the usual case there is a high volume of signals flowing into the system but each signal contains only a small amount of information in and of itself.
- Aggregations are stored in another auxiliary collection, again linked by a naming convention: the name of the aggregations collection is the name of the primary collection plus the suffix "_signals_aggr".
- Query pipelines use aggregated signals to boost search results.
- Fusion provides an extensive library of aggregation functions allowing for complex models of user behavior. In particular, date-time functions provide a temporal decay function so that over time, older signals are automatically downweighted.
- Fusion's job scheduler provides the mechanism for processing signals and aggregations collections in near real-time.
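To make these concepts concrete, here is a minimal Python sketch of the naming conventions and of an aggregation with temporal decay. It does not use Fusion's API or its aggregation functions. The `Signal` class, its `doc_id` field, the helper names, and the exponential half-life decay are all assumptions chosen for illustration; only the "_signals" and "_signals_aggr" suffixes and the type, id, and timestamp attributes come from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass

def signals_collection(primary):
    """Name of the signals collection for a primary collection."""
    return primary + "_signals"

def aggregations_collection(primary):
    """Name of the aggregations collection for a primary collection."""
    return primary + "_signals_aggr"

@dataclass
class Signal:
    id: str
    type: str          # e.g. "click" or "query"
    timestamp: float   # seconds since the epoch
    doc_id: str        # hypothetical field: the related primary document

def aggregate(signals, now, half_life_days=30.0):
    """Sum a decayed weight per (type, doc_id).

    Assumed decay model: a signal's weight halves every half_life_days,
    so older signals contribute less to the total.
    """
    half_life_s = half_life_days * 86400
    totals = defaultdict(float)
    for s in signals:
        age = max(0.0, now - s.timestamp)
        totals[(s.type, s.doc_id)] += 0.5 ** (age / half_life_s)
    return dict(totals)
```

For example, `signals_collection("products")` gives `"products_signals"`, and two click signals on the same document, one brand new and one exactly one half-life old, aggregate to a weight of 1.5 rather than 2.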