Elasticsearch & Function Score Queries for Custom Recommendations

Elasticsearch

In most cases, you can trust Elasticsearch's default scoring algorithm to return the most relevant results first. You are discouraged from tightly controlling scores and encouraged instead to trust that smart people chose reasonable defaults and got the complex math right. For typical full-text search scenarios, this works out well: add the right filters for your application and the results will be good enough.

In some cases, you need more control. You may need to implement a very specific scoring formula. In this situation, you can take control of your query’s scoring using a function score query.

Function score queries have two parts. The first is a base query that finds the overall pool of results you want. The second is a list of functions that adjust the scores of those results.

Each function is composed of an optional filter that tells ES which records should have their scores adjusted (defaults to “all records”), and a description of how to adjust the score.
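As a rough illustration of that two-part structure, the Python sketch below builds a function score request body as a plain dict. The keys (query, functions, filter, weight) follow the function_score query format, but the specific filter (a range on srm) and the weight of 2 are hypothetical choices made only for illustration.

```python
import json


def build_function_score_body() -> dict:
    """Return a function_score request body with one filtered function.

    Hypothetical example: boost pale recipes (srm <= 10) by a factor of 2.
    """
    return {
        "query": {
            "function_score": {
                # Part 1: the base query that selects the pool of results.
                "query": {"match_all": {}},
                # Part 2: the list of functions that adjust scores.
                "functions": [
                    {
                        # Optional filter: only matching records are adjusted.
                        # Leave "filter" out to adjust every record.
                        "filter": {"range": {"srm": {"lte": 10}}},
                        # How to adjust: a weight of 2, which is multiplied
                        # into the score under the default boost_mode.
                        "weight": 2,
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_function_score_body(), indent=2))
```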

Example Time

Let’s say you are building a recommendation engine for homebrew beer recipes. For each recipe in the system, you want to recommend similar recipes. In beer brewing, there are several numbers you can use to evaluate the similarity of two beers, for example: original gravity (amount of sugar before fermentation), international bitterness units (bitterness), and SRM (color).

In the index, every recipe would have a document like this:

{
  "name": "Three Weisse Guys",
  "og": 1.043,
  "ibus": 26,
  "srm": 5.1
}

A query to find the most similar beers might look something like this:

 1 {
 2   "query": {
 3     "function_score": {
 4       "query": {
 5         "match_all": {}
 6       },
 7       "functions": [
 8         {
 9           "gauss": {
10             "og": {
11               "origin": 1.0,
12               "scale": 0.05,
13               "decay": 0.5
14             }
15           }
16         }, // snip
17       ]
18     }
19   }
20 }

Explanation: Lines 4-6 start with all records. Lines 8-16 change the scoring of all records (because no filter was given). Specifically, they use the Gaussian decay function to score each record by how far its numeric og field is from a given origin. The origin is the value each record is compared against, here 1.0. A scale of 0.05 with a decay of 0.5 means “a record whose og is 0.05 away from the origin gets half the score of a record exactly at the origin.”
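To build intuition for what scale and decay do numerically, here is a small local Python sketch of the gauss decay formula as given in the Elasticsearch function score reference: the score is exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)), with sigma^2 chosen so the curve equals decay at a distance of scale. It does not call Elasticsearch; it only reproduces the curve so you can try values by hand.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                decay: float = 0.5, offset: float = 0.0) -> float:
    """Gaussian decay score for a single numeric field value.

    Returns 1.0 at the origin and exactly `decay` when the distance from
    the origin (beyond `offset`) equals `scale`.
    """
    # Distances inside the offset are treated as zero (no penalty).
    distance = max(0.0, abs(value - origin) - offset)
    # sigma^2 is chosen so that the curve passes through `decay` at `scale`.
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))


if __name__ == "__main__":
    for og in (1.0, 1.043, 1.05, 1.1):
        print(og, round(gauss_decay(og, origin=1.0, scale=0.05), 4))
```

With origin 1.0, scale 0.05 and decay 0.5, the Three Weisse Guys recipe (og 1.043) scores roughly 0.6, a recipe with og 1.05 scores 0.5, and one with og 1.1 (twice the scale away) drops to 0.0625, because the score falls off with the square of the distance.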

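Because you are now choosing the scoring formula yourself, expect to do more experimentation with a function score query (for example, with the origin, scale, and decay of each function) to get your results just how you want them.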
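Since that experimentation multiplies with every field you score on, it can help to generate the query from a recipe instead of writing it by hand. The Python sketch below builds one gauss function per field (og, ibus, srm), centered on the given recipe's own values, and combines them with score_mode multiply (which is also the default). Only the og scale of 0.05 comes from the example above; the ibus and srm scales are hypothetical starting points to tune, not recommendations.

```python
import json
from typing import Dict, Optional

# Hypothetical per-field scales: how far a value may drift from the source
# recipe before that field's contribution drops to `decay`.
DEFAULT_SCALES: Dict[str, float] = {"og": 0.05, "ibus": 10, "srm": 5}


def similar_recipes_query(recipe: dict,
                          scales: Optional[Dict[str, float]] = None,
                          decay: float = 0.5) -> dict:
    """Build a function_score body ranking recipes by closeness to `recipe`."""
    scales = DEFAULT_SCALES if scales is None else scales
    # One gauss function per field, with the source recipe's value as origin.
    functions = [
        {"gauss": {field: {"origin": recipe[field],
                           "scale": scale,
                           "decay": decay}}}
        for field, scale in scales.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                # Multiply the per-field decay scores together.
                "score_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    recipe = {"name": "Three Weisse Guys", "og": 1.043, "ibus": 26, "srm": 5.1}
    print(json.dumps(similar_recipes_query(recipe), indent=2))
```

Note that the source recipe itself matches every origin exactly, so it will rank first; you may want to exclude it from the results.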

For more details on which scoring functions are available and for more advanced configuration, see the Elasticsearch reference documentation for the function score query.

More Posts by Robert Prehn: