[MDEV-15641] InnoDB crash while committing table-rebuilding ALTER TABLE Created: 2018-03-23 Updated: 2021-02-16 Resolved: 2019-07-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 10.0, 10.1, 10.2, 10.3 |
| Fix Version/s: | 10.2.26, 10.3.17, 10.4.7 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | crash, hang, online-ddl, performance, upstream | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
Currently, mysql_inplace_alter_table() in sql/sql_table.cc does the following:
During this time, the ALTER TABLE thread is not doing anything useful. If this is a table-rebuilding ALTER TABLE operation (something that cannot be done as ALGORITHM=INSTANT (
I would like to consider adding a variant of the ha_innobase::inplace_alter_table() call that would let InnoDB invoke row_log_table_apply() on the next batch of log. Once this call returns, the caller would check whether the ALTER TABLE operation was killed, the MDL upgrade timed out, or the MDL was granted. As long as the lock wait continues, the storage engine would be called again to do useful work during the wait. |
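The proposed wait loop can be sketched roughly as follows. This is only an illustrative model, not MariaDB code: `MdlWait`, `RowLog`, `apply_batch()` and `wait_for_mdl_upgrade()` are hypothetical names invented for this sketch, and the log is modelled as a simple queue of integers rather than InnoDB's row log.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical outcome of polling the pending MDL upgrade request.
enum class MdlWait { Pending, Granted, Killed, TimedOut };

// Hypothetical stand-in for the log of concurrent DML that a
// table-rebuilding ALTER TABLE has to apply to the new table.
struct RowLog {
    std::deque<int> pending;   // buffered DML records, not yet applied
    std::size_t applied = 0;   // records applied so far

    // Apply at most `batch` records and return how many were applied.
    std::size_t apply_batch(std::size_t batch) {
        std::size_t n = 0;
        while (n < batch && !pending.empty()) {
            pending.pop_front();
            ++n;
        }
        applied += n;
        return n;
    }
};

// While the lock wait continues, keep handing control back to the
// "storage engine" so that it applies one more batch of log.
// `poll` is any callable that returns the current MdlWait state.
// A real implementation would block with a timeout instead of
// spinning once the log is empty; that is omitted here.
template <typename Poll>
MdlWait wait_for_mdl_upgrade(RowLog& log, Poll poll, std::size_t batch) {
    for (;;) {
        const MdlWait state = poll();
        if (state != MdlWait::Pending) {
            return state;      // granted, killed, or timed out
        }
        log.apply_batch(batch);
    }
}
```

With this shape, the caller still decides what happens on kill or timeout; the only change is that the waiting time is spent shrinking the log that remains to be applied at commit.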
| Comments |
| Comment by Marko Mäkelä [ 2018-03-23 ] | ||||||||||||||||||
|
row_log_table_apply() is actually invoked while holding both dict_sys->mutex and dict_operation_lock. If there is a lot of log to apply, this can cause InnoDB to crash.
| ||||||||||||||||||
| Comment by quantuml3ap [ 2018-03-23 ] | ||||||||||||||||||
|
Step 1: A table with ~25M rows.
Step 2: Have 2 sessions - S1 and S2. In S1 run:
In S2 do this:
Now when the alter awaits X MDL before the commit phase (post-alter phase), try this in S2 (deleting 18M records):
Now `SHOW PROCESSLIST` shows "committing alter table to storage engine". I saw this for ~15 min until MySQL crashed and restarted. Final observation: the #sql-ib... files were still there; I had to drop them, and realized the ALTER had not completed successfully. The column was not added. | ||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-01-28 ] | ||||||||||||||||||
|
The problematic code would be removed when | ||||||||||||||||||
| Comment by Valerii Kravchuk [ 2019-03-15 ] | ||||||||||||||||||
|
It seems the problem is even easier to hit when ALTERing a partitioned table; see upstream https://bugs.mysql.com/bug.php?id=94610 | ||||||||||||||||||
| Comment by Thirunarayanan Balathandayuthapani [ 2019-07-04 ] | ||||||||||||||||||
|
Currently, there are three things blocked while applying concurrently The approach to apply the log of concurrent DMLs before acquiring dict_sys latches and | ||||||||||||||||||
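The idea of applying the log of concurrent DML before acquiring the dictionary latches can be illustrated with a small two-phase model. This is a sketch under assumptions, not the actual patch: `RebuildLog`, `drain()` and `commit_rebuild()` are hypothetical names, a plain `std::mutex` stands in for the dict_sys latches, and the `before_latch` hook exists only to simulate DML that arrives between the two phases.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical model of the log written by concurrent DML.
struct RebuildLog {
    std::mutex append_mutex;            // protects `records`
    std::deque<int> records;            // records not yet applied
    std::size_t applied_unlatched = 0;  // applied without the latch
    std::size_t applied_latched = 0;    // applied while holding the latch

    void append(int rec) {
        std::lock_guard<std::mutex> g(append_mutex);
        records.push_back(rec);
    }

    // Take everything logged so far and "apply" it; returns the count.
    std::size_t drain() {
        std::deque<int> batch;
        {
            std::lock_guard<std::mutex> g(append_mutex);
            batch.swap(records);
        }
        return batch.size();
    }
};

// Phase 1 applies the bulk of the log with no dictionary latch held.
// Phase 2 takes the latch and applies only what arrived in between,
// so the latch hold time no longer depends on the total log size.
void commit_rebuild(RebuildLog& log, std::mutex& dict_latch,
                    const std::function<void()>& before_latch = [] {}) {
    log.applied_unlatched += log.drain();      // phase 1: unlatched
    before_latch();                            // test hook only
    std::lock_guard<std::mutex> g(dict_latch); // phase 2: latched
    log.applied_latched += log.drain();
}
```

The model assumes that no new DML can be logged once the latch is held; how that is guaranteed in the server is outside the scope of this sketch.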
| Comment by Thirunarayanan Balathandayuthapani [ 2019-07-05 ] | ||||||||||||||||||
|
Patch approved by kevg. I would like to run some RQG tests before pushing it to 10.2. |