I had a spot of bother trying to construct an OR query using the Python elasticsearch_dsl library. My end game was to retrieve Elasticsearch documents that reference particular controlled terms (e.g. styles or materials) in their record.

Between terms I wanted the query to be constructed as an OR logical expression, so I'm interested in documents that contain either styles.id='X123' OR styles.id='X1234' OR materials.id='A12345'.

In Elasticsearch terms, this is the query I wanted to recreate:

{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}

I’m holding my list of ‘terms’ and the relevant ids in a dictionary called keyword_filters:

{
  'styles.id': ['X123', 'X1234'],
  'materials.id': ['A12345']
}
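Outside of elasticsearch_dsl, the target body is easy enough to build by hand. Here's a minimal plain-Python sketch that turns keyword_filters into the bool/should query above, with one terms clause per field. build_should_query is a hypothetical helper name of my own, not part of any library.

```python
from typing import Dict, List


def build_should_query(keyword_filters: Dict[str, List[str]]) -> dict:
    """Build a raw bool/should query body: one terms clause per field, OR-ed together.

    Hypothetical helper for illustration only; not part of elasticsearch_dsl.
    """
    should = [{"terms": {field: list(ids)}} for field, ids in keyword_filters.items()]
    return {"query": {"bool": {"should": should}}}


keyword_filters = {
    "styles.id": ["X123", "X1234"],
    "materials.id": ["A12345"],
}

body = build_should_query(keyword_filters)
print(body["query"]["bool"]["should"][1])  # {'terms': {'materials.id': ['A12345']}}
```

The catch is that I wanted to stay inside elasticsearch_dsl's Search object rather than hand-assemble dicts, which is where the trouble started.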

I thought my missteps with the elasticsearch_dsl syntax might be worth documenting as code snippets for potential reuse.

Filters with Elasticsearch_dsl

if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.filter('terms', **{dict_key: keyword_filters[dict_key]})
# print(json.dumps(s.to_dict(), indent=2))

This generates the following query:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}

Not surprisingly, this doesn't work: in filter context every clause must match. So here I am asking for docs where styles.id is "X123" or "X1234" AND materials.id is "A12345".
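To make the AND/OR difference concrete, here's a toy matcher in plain Python. It is not how Elasticsearch evaluates queries internally; it only models the rule that every filter (or must) clause has to match, while a bool containing nothing but should clauses needs at least one of them to match. The function names and the sample document are made up for illustration.

```python
from typing import Dict, List

Doc = Dict[str, List[str]]


def terms_match(doc: Doc, clause: dict) -> bool:
    """True if the doc holds at least one of the listed values for the field."""
    field, wanted = next(iter(clause["terms"].items()))
    return any(value in doc.get(field, []) for value in wanted)


def bool_matches(doc: Doc, bool_query: dict) -> bool:
    """Toy model of bool semantics, for terms clauses only."""
    required = bool_query.get("filter", []) + bool_query.get("must", [])
    should = bool_query.get("should", [])
    # Every filter/must clause has to match (AND).
    if not all(terms_match(doc, c) for c in required):
        return False
    # With no filter/must clauses, at least one should clause must match (OR).
    # When required clauses exist, should clauses are optional by default.
    if should and not required:
        return any(terms_match(doc, c) for c in should)
    return True


clauses = [
    {"terms": {"styles.id": ["X123", "X1234"]}},
    {"terms": {"materials.id": ["A12345"]}},
]

# Hypothetical document: tagged with a style but no material.
doc = {"styles.id": ["X123"], "materials.id": []}

print(bool_matches(doc, {"filter": clauses}))  # False: the materials clause fails
print(bool_matches(doc, {"should": clauses}))  # True: the styles clause is enough
```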

The form of the query is almost right, though. I just want to swap the word filter for should in the bool construct.

Randomly hacking stuff together

In my attempt to swap filter for should, I started hacking together some code.

if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.query('bool', should=[
                    Q('terms', **{dict_key: keyword_filters[dict_key]})])
# print(json.dumps(s.to_dict(), indent=2))

This results in the following, which is not at all what I am after:

{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}

It has the same effect as the filter query, except that because it is now in query context, the must clauses also contribute to the overall score.

If that were the effect I was after, it should really be written as:

if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.query('bool', must=[
                    Q('terms', **{dict_key: keyword_filters[dict_key]})])
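So why did my should clauses end up under must? As far as I can tell from the elasticsearch_dsl docs and source, calling s.query() repeatedly combines each new query with the existing one using &. When two bool queries are AND-ed, a bool with no must/filter clauses defaults to "at least one should clause must match", and if it holds only a single should clause, that clause is effectively required, so it gets moved into must. The sketch below models just that rule with plain dicts. and_bools and min_should_match are my own names, and this is a simplification, not the library's actual code.

```python
def min_should_match(bool_query: dict) -> int:
    """Default minimum_should_match: 0 if must/filter clauses exist, else 1."""
    return 0 if bool_query.get("must") or bool_query.get("filter") else 1


def and_bools(left: dict, right: dict) -> dict:
    """Simplified model of AND-ing two bool queries (must/filter/should only)."""
    result = {
        "must": left.get("must", []) + right.get("must", []),
        "filter": left.get("filter", []) + right.get("filter", []),
        "should": [],
    }
    for side in (left, right):
        should = side.get("should", [])
        if len(should) <= min_should_match(side):
            # "At least one of one" means required, so the clause becomes a must.
            result["must"].extend(should)
        elif not result["should"]:
            # A genuine OR-group survives as should.
            result["should"] = list(should)
        else:
            # Simplification: a second OR-group is nested under must.
            result["must"].append({"bool": {"should": list(should)}})
    # Drop empty clause lists for readability.
    return {key: value for key, value in result.items() if value}


t1 = {"terms": {"styles.id": ["X123", "X1234"]}}
t2 = {"terms": {"materials.id": ["A12345"]}}

# Each loop iteration in the hack adds a bool holding a single should clause.
print(and_bools({"should": [t1]}, {"should": [t2]}) == {"must": [t1, t2]})  # True
# One bool holding both should clauses keeps its OR semantics.
print(and_bools({"should": [t1, t2]}, {}) == {"should": [t1, t2]})  # True
```

If this reading is right, the fix is to put all the should clauses into one bool before handing it to s.query().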

Nesting Q with Elasticsearch_dsl to achieve should

There's a section in the official docs on Query Combination, but it doesn't quite reveal how to do what I want. This combination seems to do the trick:

from elasticsearch_dsl import Q

queries = []
if keyword_filters is not None:
    for dict_key in keyword_filters:
        queries.append(Q("terms", **{dict_key: keyword_filters[dict_key]}))
        # print(queries)
# Wrap all the terms clauses in a single bool so they are OR-ed together
s = s.query(Q('bool', should=queries))
# print(json.dumps(s.to_dict(), indent=2))

In this case, the queries list looks like this:

[
  Terms(styles__id=['X123', 'X1234']),
  Terms(materials__id=['A12345'])
]

and the search query that results:

{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}

Bingo! That's what I am after. I really ought to get to grips with how elasticsearch_dsl constructs boolean queries with its Q construct; it would save me ages hacking together the winning spell.
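A footnote on the Python side of the snippets above. Dots aren't allowed in Python keyword argument names, so the code passes field names like styles.id by unpacking a dict, **{dict_key: ...}; Python accepts arbitrary string keys there as long as the callee takes **kwargs. elasticsearch_dsl also lets you write a double underscore instead (styles__id=[...]) and translates it back to a dot, which is why the queries list repr shows Terms(styles__id=...). The stand-in fake_q below is hypothetical: it just mimics that translation so both spellings can be compared without the library installed.

```python
def fake_q(name, **params):
    """Hypothetical stand-in for a query factory; just records its arguments."""
    # Mimic the double-underscore-to-dot translation of field names.
    fields = {key.replace("__", "."): value for key, value in params.items()}
    return {name: fields}


# A dotted field name can't be written as a keyword argument directly,
# but dict unpacking gets it through **params.
via_unpacking = fake_q("terms", **{"styles.id": ["X123", "X1234"]})
# A double-underscore name ends up in the same place after translation.
via_underscores = fake_q("terms", styles__id=["X123", "X1234"])

print(via_unpacking)                     # {'terms': {'styles.id': ['X123', 'X1234']}}
print(via_unpacking == via_underscores)  # True
```

The dict-unpacking form is the one that suits keyword_filters, since the field names arrive as data rather than being known when the code is written.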