Couchbase: Scans and Pushdowns
Couchbase has numerous query optimizations, and the purpose is to get the query plan as much optimized as possible. Scans and pushdowns are one of these optimizations. Let’s first start with scans.
Note: Before we start, I assume that you have basic knowledge about Couchbase Index, Query and Data Services.
Scans
When a query request hits the Query Service, the service will transform predicates to ranges and give these ranges to Index Service. These ranges are Scans. If you explain a query plan that uses GSI(Global Secondary Index), you can see the scans section. For example:
As we can see in the picture, spans have “ranges”. In this range section, there are three properties. These are: “high”, “inclusion” and “low”. What is the meaning of these? Let’s see together.
- “High” means upper bound. If it does not exist, it indicates that there is no upper bound.
- “Low” means lower bound. If it does not exist, it indicates that there is no lower bound.
- “Inclusion” intimates if High and Low fields are involved.
Scans are not just for numbers. It is an explanation for predicates. Let’s see how scans behave in strings.
Our range starts with “Bruce” and goes until all combinations begin with the “Bruce” string. “f” is the next character after “e”(last character of the “Bruce” string), so that is our high upper bound.
Scans have a lot of expressions like these examples. You can check Couchbase’s document for more details.
So we now know what scan is. Query service produces scans and passes them to Index Service. For what? What will Index Service do with it? That question directs us to our second topic, pushdowns.
Pushdowns
Query Service has a lot of responsibility, a lot of work to do. Fortunately, it has a hero to run its business efficiently, and that is Index Service. Do not think that Index Service is just indexing data and sitting there. It has a lot of abilities like pagination, ordering, point, and range scans.
When Query Service produces its predicates and selects which indexes to use, it tries to address Index Service work. In that way, Query Service does less job and fewer data transfers from Index Service to Query Service. This is what pushdown is.
Index Service stores its index keys in sorted order. Let’s perceive this from an example. Assume that we have data that holds superheroes’ names and their heights. Here is an example:
Now we will create an index on height.
When we create the index on height, Index Service will sort it as I said before. If we now select “id” from data with “height”, Query Service will get the data from Index Service. It doesn’t have to go Data Service next because Index Service has “height” and “id”. You can say that we did not index the “id” field. That’s because Couchbase implicitly indexes document id.
We can imagine Index Service stores superheroes in sorted order by height like this:
Note: Captain America’s height: 188cm ~= 6'2"
Now let’s explain our select query and see what it has.
We can see that our index covers both height(1) and id(2).
At the start of the Pushdown section, I said Index Service has abilities like pagination. If Index Service stores data in sorted order, we can get data with Limit, Offset, Max, Min, etc. If you want to go deeper about these functionalities, you can check out official documents.
Conclusion
In this article, we learned what Scans and Pushdowns are in Couchbase. This article is my first blog post. I am open to any advice and comments.
Thank you for reading, May the force be with you!