[MDEV-26232] Performance Regression in queries against mixed BIGINT values Created: 2021-07-24 Updated: 2023-04-27
| Status: | Confirmed |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.4.20 |
| Fix Version/s: | 10.4, 10.5, 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Josh Bello | Assignee: | Oleksandr Byelkin |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | SELECT, performance, regression | ||
| Environment: | CentOS 7 |
| Description |
We are experiencing what appears to be a query-planner regression while testing an upgrade from MariaDB 10.2.39 to 10.4.20. The table definition we are running the query against is as follows:
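The original table definition is not reproduced here. A hypothetical schema consistent with the description (an unsigned BIGINT `token` column inside a composite primary key; every name other than `token` is invented) might look like:

```sql
-- Hypothetical schema: only the unsigned BIGINT 'token' column and the
-- composite primary key are taken from the report; all other names and
-- columns are invented for illustration.
CREATE TABLE tokens (
  account_id INT UNSIGNED NOT NULL,
  token      BIGINT UNSIGNED NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (account_id, token)
) ENGINE=InnoDB;
```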
The query that I am running is as follows:
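The exact query is likewise omitted; its described shape, an IN list mixing values below and above the signed BIGINT maximum, can be sketched as follows (the table name, column names, and literal values are illustrative, not the reporter's actual data):

```sql
-- Illustrative only: an IN list mixing values below and above
-- 9223372036854775807 (the signed BIGINT maximum), which is the
-- combination the report says triggers the slowdown.
SELECT *
FROM tokens
WHERE token IN (
  123456789,             -- fits in signed BIGINT
  9223372036854775806,   -- just below the signed maximum
  9223372036854775808,   -- above the signed maximum: representable only as unsigned
  18446744073709551615   -- unsigned BIGINT maximum
);
```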
The query returns 49 rows in 0.00 sec on MariaDB 10.2, but consistently takes 8-9 seconds to return the same 49 rows on MariaDB 10.4. We have been running similar queries for many years on MySQL 5.5 and MariaDB 10.2, against even much longer lists of unsigned BIGINT values, and the results have always been nearly instant. Here is the EXPLAIN EXTENDED output for the query on MariaDB 10.2:
Here is the EXPLAIN EXTENDED output on the query on MariaDB 10.4:
As you can see in the schema above, the 'token' column is an unsigned BIGINT. Of the values in the query, 24 are below the signed BIGINT maximum (9223372036854775807) and 25 are above it. Here is where it gets interesting: when we run the same query on MariaDB 10.4 but include only the 24 values below the signed BIGINT maximum, the results return in 0.002 seconds, with the following EXPLAIN EXTENDED output:
Running the same query again, but this time against only the 25 values above the signed BIGINT maximum, the results return in 0.001 seconds, with the following EXPLAIN EXTENDED output:
Finally, when running the same query against 24 values that straddle the signed BIGINT maximum (some above it, some below it), the query takes 3.305 seconds and EXPLAIN EXTENDED reports:
We have tried similar tests against table structures that use a simple primary key on 'token' instead of the composite key above, and the performance characteristics are the same: fast when the values do not straddle the signed BIGINT maximum, very slow when they do. The only workaround we have found is to use OR rather than IN, but that is not a solution that will work for us in all use cases.
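A sketch of the OR rewrite mentioned above, using the same illustrative names as before. Each equality predicate then carries a single literal, so no signed/unsigned mix has to be reconciled inside one IN list:

```sql
-- Workaround sketch: logically equivalent to the IN list, but each
-- predicate compares 'token' against one literal, so the planner never
-- has to pick a common type for a mixed-sign value list.
SELECT *
FROM tokens
WHERE token = 123456789
   OR token = 9223372036854775808
   OR token = 18446744073709551615;
```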
| Comments |
| Comment by Josh Bello [ 2021-07-26 ] |
For more complete information as we dig into this, please see the following JSON EXPLAIN outputs. First, a query over integers that span both the signed and unsigned BIGINT ranges:
Now for a similar quantity of integers, but ones that do not span both the signed and unsigned BIGINT ranges:
| Comment by Sergei Golubchik [ 2021-08-03 ] |
This is the effect of bugs like this one: small positive integers are considered signed, so when you mix small and big positive integers you get a DECIMAL comparison too. I think we can implement a fix where small positive integers are treated as "neither signed nor unsigned" and do not contribute to the decision "oh, we have a signed/unsigned mix, need to use decimal". Like this:
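A minimal way to observe the mixed-sign effect described in this comment (illustrative table and values, assuming a MariaDB 10.4 server; the proposed patch itself is not reproduced here):

```sql
-- Illustrative repro sketch. With values that all fit in signed BIGINT,
-- the IN list can be compared as integers and range access on the key
-- is available. Once a value above 9223372036854775807 is mixed in,
-- the comparison may be carried out as DECIMAL and the index range
-- access can be lost, per the analysis above.
CREATE TABLE t (token BIGINT UNSIGNED NOT NULL, PRIMARY KEY (token));

EXPLAIN EXTENDED
SELECT * FROM t WHERE token IN (1, 2, 9223372036854775806);   -- all signed-safe

EXPLAIN EXTENDED
SELECT * FROM t WHERE token IN (1, 2, 18446744073709551615);  -- mixed signed/unsigned
```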