[MDEV-32126] Assertion `!writer.checksum_len || writer.remains == 0' fails upon concurrent online ALTER and transactions with failing statements and binary log enabled Created: 2023-09-07 Updated: 2023-11-04 Resolved: 2023-11-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table |
| Affects Version/s: | 11.2 |
| Fix Version/s: | 11.2.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Elena Stepanova | Assignee: | Nikita Malyavin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | regression | ||
| Issue Links: |
|
||||||||
| Description |
|
Set to blocker as it's a bug in a new feature in a soon-to-be GA.
|
| Comments |
| Comment by Nikita Malyavin [ 2023-09-19 ] | ||||||||||||||||||||||
|
The smaller test, leading to the same problem, however a different assertion, has been found:
| ||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2023-09-19 ] | ||||||||||||||||||||||
|
Please review branch bb-11.2-nikita, commit b96b093d | ||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2023-09-23 ] | ||||||||||||||||||||||
|
There is a lot of interference with binlog_hton enabled by online alter, which I couldn't fight by adhoc state deductions. So I decided to carry online alter out, into a separate handlerton. This gave me a possibility to store connection-local online alter data as ha_data.
Also please consider my 132c56f1 commit among then, which creates online_alter.cc and moves all the producer side code there. It's optional, but really helps reading and editing. | ||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-09-25 ] | ||||||||||||||||||||||
this doesn't really explain anything. Please explain why the assert actually fires. You can explain here to avoid recommitting everything just to change a comment. Later, when everything is "ok to push" you can fix the comment in commit too together with other changes, if any | ||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2023-09-25 ] | ||||||||||||||||||||||
|
The high-level reason is that the assumption about that thd->binlog_get_cache_mngr() is, sufficiently, NULL, when we shouldn't run the binlog part of binlog_commit/binlog_rollback, is wrong, i.e. this condition:
is not enough. I don't know whether cache_mgr is used for statement logging or not. Here we have a mixed logging, and that is, some operations may have no row-base logging. However, cache_mngr exists, which makes me think, that a statement logging is done differently, and cache_mngr only serves row-based logging, since trx_cache is also empty at the moment of check. What's definite is that binlog_hton wouldn't have been set for SBR. It gets enabled during binlog_log_row_to_binlog, in THD::binlog_start_trans_and_stmt. In particular, trx_cache.restore_prev_position() is called during binlog_rollback, when it actually shouldn't have to – bacause it's SBR and binlog_hton is not set:
I tried a few approaches to fix that problem. I tried to only exit when trx_cache->before_stmt_pos is MY_OFF_T_UNDEF, but then DBUG_ASSERT(WSREP(thd) || rollback_online) breaks, meaning, that sometimes it's fine to have undefined trx_cache. But in that case, binlog_commit/binlog_rollback is supposed to work somehow – since the handlerton was installed for some reason. I tried to go deeper and take MY_OFF_T_UNDEF more carefully into account – actually, I tried to modify binlog_cache_data::empty() and write a more careful exit condition. That failed tests in random places, like it had an assertion failure in MYSQL_BIN_LOG::trx_group_commit_leader. So overall this demonstrates that binlog state consistency is quite fragile, and I don't want to play around with it. I had a success in extracting everything in a separate hton with no much price, but some benefits, like online_alter_cache_list is moved out of thd, and I call it debloating (yes, first I bloated it | ||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2023-10-11 ] | ||||||||||||||||||||||
|
Minor issues are resolved, head is now 8d21bf9f5. Waiting for the final verdict. | ||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-10-13 ] | ||||||||||||||||||||||
|
|