Optimize TaskStateCounts aggregate pipeline #617

tgjohnst · 2019-09-10T05:00:40Z

In its current state, MongoDB does not use indexes for $group stages in an aggregation pipeline. When the database holds many tasks, the simple aggregate-and-sum in TaskStateCounts therefore takes a long time: it has to read every document in the collection instead of using the index on the state field. In real-world usage, this was the longest-running frequent query in our logs, and it put significant load on our database.

According to the MongoDB documentation, $match and $sort can both use that index. Inserting a $match or a $sort stage at the beginning of the pipeline reduced query runtime roughly five-fold and ten-fold, respectively (measured over 1,000 test queries run by @kmavrommatis on our database). The improvement comes from $group operating on the output of the index-backed $match or $sort stage rather than scanning every document each time.

For additional reference, see the discussion at https://jira.mongodb.org/browse/SERVER-29444 and related issues proposing that simple $group calls like this one eventually be allowed to use covered indexes.

adamstruck

Nice catch and thanks for the contribution!

tgjohnst added 2 commits September 9, 2019 21:54

remove unnecessary bson.M aliases (gofmt fix)

2757874

ran gofmt and this is the proper syntax

adamstruck approved these changes Sep 10, 2019

adamstruck merged commit 5a151d5 into ohsu-comp-bio:master Sep 10, 2019