[MDEV-26464] InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0 during shutdown Created: 2021-08-23 Updated: 2023-08-07 Resolved: 2023-08-07 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4.22 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | need_rr, not-10.5, not-10.6, not-10.7 | ||
| Issue Links: |
|
||||||||
| Description |
|
Workflow:
Per marko:
2. A change buffer merge happens after the shutdown process has begun
3. The bug does not seem to be a duplicate of RQG
sdp:/data/Results/1629713923/TBR-1163/dev/shm/vardir/1629713923/217/1/rr |
| Comments |
| Comment by Marko Mäkelä [ 2022-08-02 ] |
|
Sorry, I had neglected this too long, and the rr replay trace is no longer available. We would need a new one, if we still care to debug and fix this before 10.4 reaches its end of life. |
| Comment by Vitaliy Zyukin [ 2023-06-01 ] |
|
I've provided steps to reproduce this issue on the Oracle forum: https://bugs.mysql.com/bug.php?id=85585 |
| Comment by Marko Mäkelä [ 2023-06-02 ] |
|
VitalyZyukin, thank you. I suspect that this bug is specific to the MariaDB Server 10.4 release only (and 10.2 and 10.3, which already reached EOL). In those releases, the I/O layer is very similar to MySQL 5.7. In MariaDB Server 10.5, the I/O subsystem was rewritten and there is only one buf_flush_page_cleaner thread and one buffer pool instance. Also, a reproducible test case would help a lot. Our internal InnoDB testing efforts are concentrated on later releases, mainly MariaDB Server 10.6 and later. A simple work-around would seem to be to set innodb_change_buffering=none. The default was changed in |
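The work-around mentioned above could be applied to a running server roughly as follows. This is only a sketch: the helper name `disable_change_buffering` is hypothetical, and `cursor` is assumed to be any Python DB-API cursor opened by an account that is allowed to change global variables. To keep the setting across restarts, `innodb_change_buffering=none` would also have to go into the `[mysqld]` section of the server configuration file.

```python
def disable_change_buffering(cursor):
    """Set innodb_change_buffering to 'none' and return the value now in effect.

    `cursor` is assumed to be a DB-API cursor on a MariaDB connection
    (hypothetical helper, not part of any MariaDB tooling).
    """
    # The variable is dynamic, so no server restart is needed.
    cursor.execute("SET GLOBAL innodb_change_buffering = 'none'")
    # Read the value back to confirm that the change took effect.
    cursor.execute("SELECT @@GLOBAL.innodb_change_buffering")
    return cursor.fetchone()[0]
```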
| Comment by Marko Mäkelä [ 2023-06-02 ] |
|
VitalyZyukin, based on your comment in the MySQL bug system, you reproduced this bug on MariaDB Server 10.4.10. (They used to remove or make private comments that mention MariaDB. Luckily it had not happened here yet.) Can you test a newer version? I think that this bug may have been fixed by |
| Comment by Vitaliy Zyukin [ 2023-06-05 ] |
|
Just in case, I'm going to duplicate the content here: I was able to reproduce it on MariaDB 10.4.10, a couple of times already. It doesn't always happen on our hosts; it happens once in 10 executions on average. The way to reproduce it on my end was to create 8 processes in Python, each deleting 100 records per DELETE query from one table. This procedure takes place when the database is prepared for a shutdown, meaning replication is OFF and no traffic is coming in or going out. Additional parameters: The following code is capable of deleting ~5 million records in 2 minutes:
```
with multiprocessing.Pool(8) as p:
def delete_range(self, table_name, from_id, to_id):
    for i in range(from_id, to_id, 100):
    cursor.close()
```
On my end the bug is still for the same function, UT_LIST_GET_LEN, but at a different location:
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.4.10-MariaDB-log
Thread pointer: 0x0 |
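The workload described in the comment above (8 processes, 100 rows per DELETE) might be sketched as follows. This is not the reporter's original script, which is only partially preserved here. The helpers `batch_bounds`, `split_range` and `run`, the `id` column, the `connect` factory and the `%s` parameter style (as used by drivers such as PyMySQL) are all assumptions made for illustration.

```python
import multiprocessing

BATCH_SIZE = 100  # rows per DELETE statement, as described in the report
WORKERS = 8       # number of processes, as described in the report


def batch_bounds(from_id, to_id, batch_size=BATCH_SIZE):
    """Yield half-open (start, end) id ranges covering [from_id, to_id)."""
    for start in range(from_id, to_id, batch_size):
        yield start, min(start + batch_size, to_id)


def split_range(from_id, to_id, workers=WORKERS):
    """Split [from_id, to_id) into at most `workers` contiguous chunks."""
    total = to_id - from_id
    if total <= 0:
        return []
    step = -(-total // workers)  # ceiling division
    return [(s, min(s + step, to_id)) for s in range(from_id, to_id, step)]


def delete_range(connect, table_name, from_id, to_id):
    """Delete rows with from_id <= id < to_id in batches; return the batch count.

    `connect` is a hypothetical zero-argument factory returning a DB-API
    connection; the `id` column name is an assumption.
    """
    conn = connect()
    cursor = conn.cursor()
    batches = 0
    try:
        for start, end in batch_bounds(from_id, to_id):
            # A table name cannot be a bound parameter, so it must be trusted input.
            cursor.execute(
                f"DELETE FROM `{table_name}` WHERE id >= %s AND id < %s",
                (start, end),
            )
            conn.commit()
            batches += 1
    finally:
        cursor.close()
        conn.close()
    return batches


def run(connect, table_name, from_id, to_id):
    """Fan the id range out over a process pool; return the total batch count."""
    jobs = [(connect, table_name, s, e) for s, e in split_range(from_id, to_id)]
    with multiprocessing.Pool(WORKERS) as p:
        return sum(p.starmap(delete_range, jobs))
```

For `run` to work, `connect` has to be a module-level function so that it can be pickled and sent to the worker processes. Each worker opens its own connection, since a database connection should not be shared between processes.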
| Comment by Vitaliy Zyukin [ 2023-06-05 ] |
|
Unfortunately, I do not have the resources to reproduce it on different MariaDB versions. My workaround was to keep `innodb_flush_log_at_trx_commit` at 1, which slowed down the deletion. |
| Comment by Marko Mäkelä [ 2023-06-06 ] |
|
If you are running MariaDB Server 10.4.10 on some hosts, it could be a good idea to upgrade at least some of them to the latest 10.4, to see if the bug is still reproducible. |