I had a spot of bother trying to construct an OR query using the Python elasticsearch_dsl library. My end game was to retrieve Elasticsearch documents that referenced particular controlled terms (e.g. styles or materials) in their record.
Between terms I wanted the query to be constructed as a logical OR expression, so I’m interested in documents that contain either styles.id='X123' OR styles.id='X1234' OR materials.id='A12345'.
In elasticsearch terms, this is the query I wanted to recreate:
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": ["A12345"]
          }
        }
      ]
    }
  }
}
I’m holding my list of ‘terms’ and the relevant ids in a dictionary called keyword_filters:
{
    'styles.id': ['X123', 'X1234'],
    'materials.id': ['A12345']
}
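To make the target concrete, here is a minimal sketch in plain Python (no Elasticsearch client involved) of how that dictionary maps onto the raw query body above. The helper name build_should_query is hypothetical and only there for illustration.

```python
import json

keyword_filters = {
    "styles.id": ["X123", "X1234"],
    "materials.id": ["A12345"],
}


def build_should_query(filters):
    """Hypothetical helper: one terms clause per field, ORed together via should."""
    clauses = [{"terms": {field: ids}} for field, ids in filters.items()]
    return {"query": {"bool": {"should": clauses}}}


body = build_should_query(keyword_filters)
print(json.dumps(body, indent=2))
```

The rest of this post is about getting elasticsearch_dsl to emit the same structure.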
I thought my mis-steps with the elasticsearch_dsl syntax might be worth documenting as code snippets for potential re-use.
Filters with Elasticsearch_dsl
if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.filter('terms', **{dict_key: keyword_filters[dict_key]})
    # print(json.dumps(s.to_dict(), indent=2))
This generates the following query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}
Not surprisingly, this doesn’t work: in filter context every clause must match. So here I am asking for docs where the styles.id is "X123" or "X1234" AND the materials.id is "A12345".
The form of the query is almost right though. I just want to switch out the word filter for should in the boolean construct.
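The difference between the two is easy to see with a toy model. This is a rough sketch in plain Python, not how Elasticsearch evaluates queries: the three documents are made up for illustration, and clause_matches and matches are hypothetical helpers. It treats filter as "all clauses match" and should (in a bool with no must or filter) as "at least one clause matches".

```python
keyword_filters = {
    "styles.id": ["X123", "X1234"],
    "materials.id": ["A12345"],
}

# Made-up documents, flattened to field -> list of ids, purely for illustration.
docs = [
    {"name": "doc_a", "styles.id": ["X123"], "materials.id": []},
    {"name": "doc_b", "styles.id": [], "materials.id": ["A12345"]},
    {"name": "doc_c", "styles.id": ["X1234"], "materials.id": ["A12345"]},
]


def clause_matches(doc, field, wanted):
    """Mimic one terms clause: true if the doc holds at least one wanted id."""
    return bool(set(doc.get(field, [])) & set(wanted))


def matches(doc, filters, combine):
    """combine=all models filter/must, combine=any models should."""
    return combine(clause_matches(doc, f, ids) for f, ids in filters.items())


filter_hits = [d["name"] for d in docs if matches(d, keyword_filters, all)]
should_hits = [d["name"] for d in docs if matches(d, keyword_filters, any)]
print(filter_hits)
print(should_hits)
```

Only the document carrying both a style and a material survives the filter version, whereas the should version keeps all three.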
Randomly hacking stuff together
In my attempt to switch the word filter for should, I started hacking together some code.
if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.query('bool', should=[
            Q('terms', **{dict_key: keyword_filters[dict_key]})])
    # print(json.dumps(s.to_dict(), indent=2))
This results in the following, which is not at all what I am after:
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}
It has the same effect as the filter query, only as this is now in query context, the bool must now contributes to the overall score.
If this were the effect I was after, it probably ought to be rewritten as:
if keyword_filters is not None:
    for dict_key in keyword_filters:
        s = s.query('bool', must=[
            Q('terms', **{dict_key: keyword_filters[dict_key]})])
Nesting Q with Elasticsearch_dsl to achieve should
There’s a section in the official docs on Query Combination but it doesn’t quite reveal how to do what I want. This combination seems to do the trick:
queries = []
if keyword_filters is not None:
    for dict_key in keyword_filters:
        queries.append(Q("terms", **{dict_key: keyword_filters[dict_key]}))
    # print(queries)
    s = s.query(Q('bool', should=queries))
    # print(json.dumps(s.to_dict(), indent=2))
In this case, the queries list looks like this:
[
    Terms(styles__id=['X123', 'X1234']),
    Terms(materials__id=['A12345'])
]
and the search query that results:
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "styles.id": [
              "X123",
              "X1234"
            ]
          }
        },
        {
          "terms": {
            "materials.id": [
              "A12345"
            ]
          }
        }
      ]
    }
  }
}
Bingo - that’s what I am after!
I probably ought to get to grips with how elasticsearch_dsl constructs boolean queries with its Q construct. It would save me ages trying to hack together the winning spell.