One of the unittests in Moodle LMS started failing on MariaDB since testing docker image was upgraded to version 10.5.10.
The same unittest and the same query was passing on MariaDB 10.5.9. Also it passes on MySQL, Postgres, MsSQL and Oracle (all databases supported by Moodle).
I have created an SQL file to demonstrate the problem. It creates two database tables, fills them with the data and performs a query:
GROUP BY itemid) maxquery ON h.id = maxquery.id AND h.itemid = maxquery.itemid
JOIN grade_items gi ON gi.id = h.itemid
WHERE gi.courseid = 128000;
This query is slightly simplified from what we actually use in Moodle in order to demonstrate the problem.
On MariaDB 10.5.9 and all other databases it returns:
id itemtype itemmodule userid rawgrade
330004 course NULL 131000 NULL
330003 mod assign 131000 50.00000
On MariaDB 10.5.10 it returns:
id itemtype itemmodule userid rawgrade
330004 course NULL 131000 NULL
To make it even more interesting, the following query (using "LEFT JOIN") returns correct results. This is even more confusing because in the return values you can see that data in the grade_items table is present and it is actually an inner join.
Just as with the previous commenter, we are doing a left join against a derived table which contains aggregates, and there are 2 ON conditions in the join.
I tried to reduce the problem to the simplest possible test case, but I've found it extremely difficult to pin down. Sometimes we get the right results, and sometimes we don't, even when using the exact same SQL test script in the setup. Furthermore, sometimes we get the right results when the test data is first added, then a few minutes later, we will start getting the wrong results.
I have seen it change from correct to incorrect many times in my testing. But I have never seen it change back from incorrect to correct. Once it breaks, it seems to stay broken.
When we are getting the incorrect results, there are always 2 'LATERAL DERIVED' rows in the EXPLAIN output.
Daniel Howard
added a comment - I have encountered this issue again in v10.5.13.
Just as with the previous commenter, we are doing a left join against a derived table which contains aggregates, and there are 2 ON conditions in the join.
I tried to reduce the problem to the simplest possible test case, but I've found it extremely difficult to pin down. Sometimes we get the right results, and sometimes we don't, even when using the exact same SQL test script in the setup. Furthermore, sometimes we get the right results when the test data is first added, then a few minutes later, we will start getting the wrong results.
I have seen it change from correct to incorrect many times in my testing. But I have never seen it change back from incorrect to correct. Once it breaks, it seems to stay broken.
When we are getting the incorrect results, there are always 2 'LATERAL DERIVED' rows in the EXPLAIN output.
danhowardmws,
Please provide the query returning a wrong result and the output from EXPLAIN FORMAT=JSON for this query.
Igor Babaev (Inactive)
added a comment - danhowardmws ,
Please provide the query returning a wrong result and the output from EXPLAIN FORMAT=JSON for this query.
On version 10.5.13, the test script below reliably exhibits the bug. I run the setup script to create the tables and populate them with some dummy data. The query below counts the number of transaction_item rows for each (ledger_id, charge_id) pair. I've been careful in my test data to ensure that there is only ever 1 transaction_item row for each (ledger_id, charge_id) pair. Usually when I first run the query, I get the correct results (this is obvious because we see from_num_rows=1 on every row in the results set). After a short time (less than 1 minute for me), I start getting the incorrect results, and we see from_num_rows=2 on every row in the results set. I've captured below the output from EXPLAIN FORMAT=JSON for the same query, before and after it starts failing.
Note that that the number of rows of dummy data I have seems to be significant. The more rows I have, the faster the query starts giving incorrect results.
Setup:
DROPTABLEIF EXISTS transaction_items;
DROPTABLEIF EXISTS transactions;
DROPTABLEIF EXISTS charges;
DROPTABLEIF EXISTS ledgers;
CREATETABLE ledgers (
id BIGINT UNSIGNED NOTNULL AUTO_INCREMENT PRIMARYKEY,
nameVARCHAR(32)
);
CREATETABLE charges (
id BIGINT UNSIGNED NOTNULL AUTO_INCREMENT PRIMARYKEY,
Real-life problem that brought me here is that Openstack malfunctions because of this error (the amount of supposedly allocated VCPUs is way above the actual number, because the query used by "placement" API also accounts for other resources, especially memory, too).
Jens-U. Mozdzen
added a comment - - edited I see the same problem in MariaDB v10.6.5, using the queries provided by Daniel:
+----+----------------+--------------+---------------+
| id | from_ledger_id | to_ledger_id | from_num_rows |
+----+----------------+--------------+---------------+
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 2 |
| 5 | 3 | 2 | 2 |
| 8 | 3 | 2 | 2 |
| 10 | 3 | 2 | 2 |
| 12 | 3 | 2 | 2 |
| 13 | 3 | 2 | 2 |
| 18 | 1 | 2 | 2 |
+----+----------------+--------------+---------------+
8 rows in set (0,001 sec)
EXPLAIN output:
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "charges",
"access_type": "ALL",
"possible_keys": ["fk_charge_to_ledger"],
"rows": 20,
"filtered": 40,
"attached_condition": "charges.to_ledger_id = 2"
},
"table": {
"table_name": "<derived2>",
"access_type": "ref",
"possible_keys": ["key0"],
"key": "key0",
"key_length": "18",
"used_key_parts": ["ledger_id", "charge_id"],
"ref": ["mariadbtest.charges.from_ledger_id", "mariadbtest.charges.id"],
"rows": 2,
"filtered": 100,
"materialized": {
"lateral": 1,
"query_block": {
"select_id": 2,
"table": {
"table_name": "transaction_items",
"access_type": "ref",
"possible_keys": ["fk_items_transaction", "fk_items_charge"],
"key": "fk_items_charge",
"key_length": "9",
"used_key_parts": ["charge_id"],
"ref": ["mariadbtest.charges.id"],
"rows": 1,
"filtered": 100
},
"table": {
"table_name": "transactions",
"access_type": "eq_ref",
"possible_keys": ["PRIMARY", "fk_transactions_ledger"],
"key": "PRIMARY",
"key_length": "8",
"used_key_parts": ["id"],
"ref": ["mariadbtest.transaction_items.transaction_id"],
"rows": 1,
"filtered": 100
}
}
}
}
}
}
Real-life problem that brought me here is that Openstack malfunctions because of this error (the amount of supposedly allocated VCPUs is way above the actual number, because the query used by "placement" API also accounts for other resources, especially memory, too).
We met with the same issue from Openstack/Placement malfunction(as Jens-U.Mozdzen mentioned) on Mariadb version 10.5.13.
And using the SQL queried provided by Daniel, can confirm it breaks in versions 10.5.12, 10.5.13, and 10.6.5.
But in versions 10.5.15 and 10.6.7, the issue looks be fixed, and we cannot replicate the issue anymore after the upgrade. Although I cannot see any relevant info in their release notes.
Shi Yan
added a comment - We met with the same issue from Openstack/Placement malfunction(as Jens-U.Mozdzen mentioned) on Mariadb version 10.5.13.
And using the SQL queried provided by Daniel, can confirm it breaks in versions 10.5.12, 10.5.13, and 10.6.5.
But in versions 10.5.15 and 10.6.7, the issue looks be fixed, and we cannot replicate the issue anymore after the upgrade. Although I cannot see any relevant info in their release notes.
People
Igor Babaev (Inactive)
Marina Glancy
Votes:
1Vote for this issue
Watchers:
18Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
I have encountered this issue again in v10.5.13.
Just as with the previous commenter, we are doing a left join against a derived table which contains aggregates, and there are 2 ON conditions in the join.
I tried to reduce the problem to the simplest possible test case, but I've found it extremely difficult to pin down. Sometimes we get the right results, and sometimes we don't, even when using the exact same SQL test script in the setup. Furthermore, sometimes we get the right results when the test data is first added, then a few minutes later, we will start getting the wrong results.
I have seen it change from correct to incorrect many times in my testing. But I have never seen it change back from incorrect to correct. Once it breaks, it seems to stay broken.
When we are getting the incorrect results, there are always 2 'LATERAL DERIVED' rows in the EXPLAIN output.