In our search app we have the concept of narrow search and broad search where filtered terms would be ANDed or ORed respectively. In a broad search for objects with colour=blue,yellow we return objects that were either blue OR yellow. In a narrow search we would return objects that were both blue AND yellow.

All Coloured Objects

Things got a bit more confusing when we added the concept of negation, e.g. colour=blue,yellow,-red, where a hyphen in front of a colour term negates it (here, NOT red).

For a broad search, if we say NOT red OR NOT green OR yellow OR blue you end up with almost everything (only objects that are both red and green, with no yellow or blue, are left out), which doesn’t seem terribly helpful when someone goes to the bother of entering colour=blue,yellow,-red,-green into a URL. It’s a dangerous game trying to second-guess what a user wants when they embark on a search, but surely it is not that?!

I thought instead that I wanted [NOT red OR NOT green] AND [blue OR yellow] for a broad search and [NOT red AND NOT green] AND [blue AND yellow] for a narrow search. However, I decided that wherever a negation is used, that’s a pretty strong statement of intent, so I’m now assuming that for a broad search the user would prefer [NOT red AND NOT green] AND [blue OR yellow].
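To get a feel for how differently these readings behave, the following is a minimal sketch in plain Python (no Elasticsearch involved) that evaluates each expression against every possible combination of the four colours. The function names are made up for illustration and are not part of our app.

```python
from itertools import combinations

COLOURS = "RGBY"

# Every subset of the four colours, including the empty one: 16 in total.
OBJECTS = [frozenset(combo)
           for size in range(len(COLOURS) + 1)
           for combo in combinations(COLOURS, size)]


def label(obj):
    """Render a colour set as e.g. 'BY', or '-' for a colourless object."""
    return "".join(c for c in COLOURS if c in obj) or "-"


def naive_broad(obj):
    # NOT red OR NOT green OR blue OR yellow
    return "R" not in obj or "G" not in obj or "B" in obj or "Y" in obj


def broad(obj):
    # [NOT red AND NOT green] AND [blue OR yellow]
    return "R" not in obj and "G" not in obj and ("B" in obj or "Y" in obj)


def narrow(obj):
    # [NOT red AND NOT green] AND [blue AND yellow]
    return "R" not in obj and "G" not in obj and "B" in obj and "Y" in obj


def matches(predicate):
    """Labels of all objects for which the predicate holds."""
    return [label(obj) for obj in OBJECTS if predicate(obj)]


for name, predicate in [("naive broad", naive_broad),
                        ("broad", broad),
                        ("narrow", narrow)]:
    hits = matches(predicate)
    print(f"{name}: {len(hits)} of {len(OBJECTS)} -> {hits}")
```

The all-OR reading matches 15 of the 16 combinations, while the broad reading I settled on keeps only the blue, yellow, and blue-and-yellow objects.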

Broad Search

My assumption is quite fortunate for me, as [NOT R OR NOT G] is in effect a should_not clause, which isn’t a supported feature, and I couldn’t fathom how to achieve it in elasticsearch or elasticsearch_dsl.

Here’s a quote from https://stackoverflow.com/a/35067135/3258059 showing how the should_not concept could really be translated to a must_not anyway:

Since should is roughly equivalent to a boolean OR (i.e. return documents where A or B or C is true), then one way to think of “should_not” would be roughly equivalent to NOT (A OR B OR C). In boolean logic, this is the same as NOT A AND NOT B AND NOT C. In this case, “should_not” behavior would be accomplished by simply adding all your clauses to the must_not section of the bool filter.
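The equivalence in that quote is De Morgan's law, and it can be checked by brute force over a truth table. The sketch below is plain Python with illustrative function names. It also shows that the quote's reading of should_not, NOT (A OR B OR C), is a different expression from one that ORs the individual negations together.

```python
from itertools import product


def should_not(a, b, c):
    # The quote's reading of "should_not": NOT (A OR B OR C)
    return not (a or b or c)


def must_not(a, b, c):
    # What must_not expresses: NOT A AND NOT B AND NOT C
    return (not a) and (not b) and (not c)


def any_negated(a, b, c):
    # A different expression: NOT A OR NOT B OR NOT C
    return (not a) or (not b) or (not c)


rows = list(product([False, True], repeat=3))

# De Morgan: the first two agree on every row of the truth table.
same = all(should_not(*row) == must_not(*row) for row in rows)

# The third does not: it is true whenever at least one input is false.
differs = [row for row in rows if any_negated(*row) != must_not(*row)]

print(same)
print(len(differs))
```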

I did loads of experimenting while trying to decide what was required and how I could possibly get the elasticsearch_dsl syntax to play ball. I thought it might be worth documenting some of the successes and failures, as I find the web quite short on examples of elasticsearch_dsl syntax for more complex queries.

If you want to follow along, you can recreate an index of objects with colours to experiment with. This is the bulk instruction that you can paste into Kibana to get combinations of coloured objects (I might have missed one) in an index called venn_colour.

POST _bulk
{ "index" : { "_index" : "venn_colour", "_id" : "obj1 RGBY" } }
{ "systemNumber" : "obj1 RGBY", "colours": [{"text":"red","id":"R"},{"text":"green","id":"G"},{"text":"blue","id":"B"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj2 RGB" } }
{ "systemNumber" : "obj2 RGB-", "colours": [{"text":"red","id":"R"},{"text":"green","id":"G"},{"text":"blue","id":"B"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj3 RG" } }
{ "systemNumber" : "obj3 RG--", "colours": [{"text":"red","id":"R"},{"text":"green","id":"G"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj4 R" } }
{ "systemNumber" : "obj4 R---", "colours": [{"text":"red","id":"R"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj5 GBY" } }
{ "systemNumber" : "obj5 -GBY", "colours": [{"text":"green","id":"G"},{"text":"blue","id":"B"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj6 BY" } }
{ "systemNumber" : "obj6 --BY", "colours": [{"text":"blue","id":"B"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj7 Y" } }
{ "systemNumber" : "obj7 ---Y", "colours": [{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj8 RBY" } }
{ "systemNumber" : "obj8 R-BY", "colours": [{"text":"red","id":"R"},{"text":"blue","id":"B"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj9 RY" } }
{ "systemNumber" : "obj9 R--Y", "colours": [{"text":"red","id":"R"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj10" } }
{ "systemNumber" : "obj10 ----", "colours": []}
{ "index" : { "_index" : "venn_colour", "_id" : "obj11 B" } }
{ "systemNumber" : "obj11 --B-", "colours": [{"text":"blue","id":"B"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj12 G" } }
{ "systemNumber" : "obj12 -G--", "colours": [{"text":"green","id":"G"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj13 GY" } }
{ "systemNumber" : "obj13 -G-Y", "colours": [{"text":"green","id":"G"},{"text":"yellow","id":"Y"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj14 GB" } }
{ "systemNumber" : "obj14 -GB-", "colours": [{"text":"green","id":"G"},{"text":"blue","id":"B"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj15 RB" } }
{ "systemNumber" : "obj15 R-B-", "colours": [{"text":"red","id":"R"},{"text":"blue","id":"B"}]}
{ "index" : { "_index" : "venn_colour", "_id" : "obj16 RGY" } }
{ "systemNumber" : "obj16 RG-Y", "colours": [{"text":"red","id":"R"},{"text":"green","id":"G"},{"text":"yellow","id":"Y"}]}
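If you would rather generate the bulk body than type it, here is a rough sketch that builds one document per colour combination using only the standard library. The function name and the numbering order are my own assumptions, so the ids will not line up one-to-one with the hand-written list above.

```python
import json
from itertools import combinations

COLOUR_NAMES = {"R": "red", "G": "green", "B": "blue", "Y": "yellow"}


def bulk_lines(index="venn_colour"):
    """Return the action and document lines for every colour combination."""
    lines = []
    number = 0
    # Largest combinations first, finishing with the colourless object.
    for size in range(len(COLOUR_NAMES), -1, -1):
        for combo in combinations(COLOUR_NAMES, size):
            number += 1
            letters = "".join(combo)
            # e.g. "R-BY": a dash marks each colour the object lacks
            pattern = "".join(c if c in combo else "-" for c in COLOUR_NAMES)
            doc_id = f"obj{number} {letters}".strip()
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps({
                "systemNumber": f"obj{number} {pattern}",
                "colours": [{"text": COLOUR_NAMES[c], "id": c} for c in combo],
            }))
    return lines


lines = bulk_lines()
print("POST _bulk")
print("\n".join(lines))
```

Four colours give 16 combinations, so the output is 32 lines: an action line followed by a document line for each object.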

In each Python extract shown below I am passing in the following two dictionaries, which we generate when the user passes these query params in the URL of our API: ?id_colour=B&id_colour=Y&id_colour=-R&id_colour=-G

keyword_filters = {'colours.id': ['B', 'Y']}
keyword_negation_filters = {'colours.id': ['R', 'G']}
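For context, this is roughly how a query string could be split into those two dictionaries. It is only a sketch: the PARAM_TO_FIELD mapping and the split_filters function are hypothetical names for illustration, not the code our API actually runs.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from URL parameter names to index fields.
PARAM_TO_FIELD = {"id_colour": "colours.id"}


def split_filters(query_string):
    """Separate plain terms from hyphen-prefixed (negated) terms."""
    keyword_filters = {}
    keyword_negation_filters = {}
    for param, values in parse_qs(query_string).items():
        field = PARAM_TO_FIELD.get(param)
        if field is None:
            continue  # ignore parameters that are not filters
        for value in values:
            if value.startswith("-"):
                # Strip the leading hyphen and record the term as a negation.
                keyword_negation_filters.setdefault(field, []).append(value[1:])
            else:
                keyword_filters.setdefault(field, []).append(value)
    return keyword_filters, keyword_negation_filters


filters, negations = split_filters(
    "id_colour=B&id_colour=Y&id_colour=-R&id_colour=-G")
print(filters)    # {'colours.id': ['B', 'Y']}
print(negations)  # {'colours.id': ['R', 'G']}
```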

This DSL extract is my first attempt. It gives me the desired outcome for both broad and narrow searches, but I find the elasticsearch syntax for the narrow search complex, as I’m forced to use constant_score to get a query to perform as a filter.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import Q

# `s` is an existing elasticsearch_dsl Search object and `narrow` is a
# boolean flag; both are set up elsewhere in the app.
for dict_key in keyword_filters:
    if narrow:
        # Narrow search: every term must match (AND).
        narrow_query = []
        for term in keyword_filters[dict_key]:
            narrow_query.append(Q("term", **{dict_key: term}))
        print("Narrow queries: ", narrow_query)
        s = s.query('constant_score', filter=Q('bool', must=narrow_query))
    else:
        # Broad search: any one of the terms may match (OR).
        s = s.filter('terms', **{dict_key: keyword_filters[dict_key]})

# Negated terms are excluded in both modes.
for dict_key in keyword_negation_filters:
    s = s.exclude('terms', **{dict_key: keyword_negation_filters[dict_key]})

This is the resultant elasticsearch syntax for a Broad Search where the results are [NOT red AND NOT green] AND [blue OR yellow]:

GET venn_colour/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "colours.id.keyword": [
              "B",
              "Y"
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "colours.id.keyword": [
                    "R",
                    "G"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  },
  "_source": {
    "includes": [
      "systemNumber"
      ]
  }
}

This is the resultant elasticsearch syntax for a Narrow Search where the results are [NOT red AND NOT green] AND [blue AND yellow]:

GET venn_colour/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "colours.id.keyword": [
                    "R",
                    "G"
                  ]
                }
              }
            ]
          }
        }
      ],
      "must": [
        {
          "constant_score": {
            "filter": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "colours.id.keyword": "B"
                    }
                  },
                  {
                    "term": {
                      "colours.id.keyword": "Y"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "_source": {
    "includes": [
      "systemNumber"
      ]
  }
}

Narrow Search

This DSL attempt uses the negation tilde ~, which elasticsearch_dsl supports for negating a query, and leads to a more consistent filter syntax, so I will go with this approach.

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search
    from elasticsearch_dsl.query import Q

    narrow_query = []
    broad_query = []
    negate_query = []

    for dict_key in keyword_filters:
        for term in keyword_filters[dict_key]:
            if narrow:
                narrow_query.append(Q("term", **{dict_key: term}))
            else:
                broad_query.append(Q("term", **{dict_key: term}))

    for dict_key in keyword_negation_filters:
        negate_query.append(~Q(
            'terms', **{dict_key: keyword_negation_filters[dict_key]}))

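    if narrow_query:
        # Narrow: all positive terms AND the negations must hold.
        full_query = narrow_query + negate_query
        s = s.filter(Q('bool', must=full_query))
    else:
        # Broad: one filter call per negated field, then OR the positive terms.
        for negation in negate_query:
            s = s.filter(negation)
        s = s.filter(Q('bool', should=broad_query))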

If you amend that last bit of code to wrap everything in a should, for a truly broad search, you end up with nearly everything, so I won’t do that.

else:
  full_query = broad_query + negate_query
  s = s.filter(Q('bool', should=full_query))
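To see why the all-should variant is so permissive, here is a small plain-Python model of it. It assumes that a bool query containing only should clauses matches when at least one clause matches, and it models the tilde negation of the terms query as NOT (red OR green). The names are illustrative only.

```python
from itertools import combinations

COLOURS = "RGBY"
OBJECTS = [set(combo)
           for size in range(len(COLOURS) + 1)
           for combo in combinations(COLOURS, size)]


def all_should(obj):
    # should: [blue, yellow, NOT (red OR green)]; any one clause is enough
    clauses = ["B" in obj, "Y" in obj, not ({"R", "G"} & obj)]
    return any(clauses)


def label(obj):
    return "".join(c for c in COLOURS if c in obj) or "-"


matched = [label(obj) for obj in OBJECTS if all_should(obj)]
missed = [label(obj) for obj in OBJECTS if not all_should(obj)]
print(len(matched), "matched; missed:", missed)
```

Under those assumptions only the red, green, and red-and-green objects are filtered out, and even the colourless object gets through.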

Here’s the resultant elasticsearch syntax for a Broad Search where the results are [NOT red AND NOT green] AND [blue OR yellow]:

GET venn_colour/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "colours.id.keyword": [
                    "R",
                    "G"
                  ]
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "colours.id.keyword": "B"
                }
              },
              {
                "term": {
                  "colours.id.keyword": "Y"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "_source": {
    "includes": [
      "systemNumber"
      ]
  }
}

This is the resultant elasticsearch syntax for a Narrow Search where the results are [NOT red AND NOT green] AND [blue AND yellow]:

GET venn_colour/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "colours.id.keyword": "B"
                }
              },
              {
                "term": {
                  "colours.id.keyword": "Y"
                }
              },
              {
                "bool": {
                  "must_not": [
                    {
                      "terms": {
                        "colours.id.keyword": [
                          "R",
                          "G"
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  },
  "_source": {
    "includes": [
      "systemNumber"
      ]
  }
}
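As a cross-check that needs no cluster, the sketch below builds the same two query bodies as plain dictionaries and runs them through a toy matcher. Both build_query and matches are hypothetical helpers written for this post. The matcher understands only term, terms and bool, and assumes a should list needs at least one hit when there are no must or filter clauses beside it.

```python
from itertools import combinations

FIELD = "colours.id.keyword"


def build_query(keyword_filters, keyword_negation_filters, narrow=False):
    """Build the search body as a plain dict (illustrative helper)."""
    positives = [{"term": {field: value}}
                 for field, values in keyword_filters.items()
                 for value in values]
    negations = [{"bool": {"must_not": [{"terms": {field: values}}]}}
                 for field, values in keyword_negation_filters.items()]
    if narrow and positives:
        filters = [{"bool": {"must": positives + negations}}]
    else:
        filters = negations + [{"bool": {"should": positives}}]
    return {"query": {"bool": {"filter": filters}}}


def matches(query, ids):
    """Evaluate a tiny subset of the query DSL against a set of colour ids."""
    (kind, body), = query.items()
    if kind == "term":
        (_, value), = body.items()
        return value in ids
    if kind == "terms":
        (_, wanted), = body.items()
        return any(value in ids for value in wanted)
    if kind == "bool":
        required = body.get("must", []) + body.get("filter", [])
        if not all(matches(q, ids) for q in required):
            return False
        if any(matches(q, ids) for q in body.get("must_not", [])):
            return False
        should = body.get("should", [])
        if should and not required:
            return any(matches(q, ids) for q in should)
        return True
    raise ValueError(f"unsupported query type: {kind}")


def hits(body):
    """Labels of every colour combination the query body matches."""
    labels = []
    for size in range(5):
        for combo in combinations("RGBY", size):
            if matches(body["query"], set(combo)):
                labels.append("".join(combo) or "-")
    return labels


broad_body = build_query({FIELD: ["B", "Y"]}, {FIELD: ["R", "G"]})
narrow_body = build_query({FIELD: ["B", "Y"]}, {FIELD: ["R", "G"]}, narrow=True)
print(hits(broad_body))   # ['B', 'Y', 'BY']
print(hits(narrow_body))  # ['BY']
```

Against the venn_colour index those labels correspond to obj11 B, obj7 Y and obj6 BY for the broad search, and obj6 BY alone for the narrow search.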