[MDEV-15060] Assertion `0' failed in row_log_table_apply_op after instant ADD when the table is emptied during subsequent ALTER TABLE Created: 2018-01-24 Updated: 2018-05-03 Resolved: 2018-05-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3.2 |
| Fix Version/s: | 10.3.7 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | online-ddl |
| Issue Links: |
|
| Description |
|
Note: run the test case with --mem --repeat=N. For me, N=100 has always been enough so far, but it can vary between machines. --mem is important, at least on my machines; apparently the test is not fast or concurrent enough when it is run on disk.
Not reproducible on 10.2. Initially this problem was interchangeable with |
| Comments |
| Comment by Marko Mäkelä [ 2018-02-21 ] |
|
I would need a core dump so that I can investigate the contents of the online_log as well as the table definitions before and after the table-rebuilding online ALTER. |
| Comment by Elena Stepanova [ 2018-04-30 ] |
|
I've updated the description; hopefully with the test case you will get a core dump of your liking. If not, I can produce one for you, but it will be a non-ASAN core dump. |
| Comment by Marko Mäkelä [ 2018-05-02 ] |
|
I repeated this. In my core dump, the payload of log.tail.block is as follows (one record per line):
The assertion fails because the log record parser does not find a valid start byte (ROW_T_INSERT=0x41 or ROW_T_UPDATE=0x42 or ROW_T_DELETE=0x43) at offset 32. That offset is 4 bytes after the start of the third record. Notably, other records that start with 0x43,3 are 4 bytes shorter than the first one. I must find out why the 4 extra bytes are sometimes being written. |
| Comment by Marko Mäkelä [ 2018-05-03 ] |
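The symptom described above can be reproduced with a toy parser. This is only a sketch, not InnoDB code: the record lengths (16 and 12 bytes), the zero filler bytes, and the function name walk_records are assumptions chosen so that the arithmetic matches the comment. Only the three start-byte values and the 0x43,3 prefix come from the ticket.

```python
# Start bytes quoted in the comment above.
ROW_T_INSERT, ROW_T_UPDATE, ROW_T_DELETE = 0x41, 0x42, 0x43
VALID_START = {ROW_T_INSERT, ROW_T_UPDATE, ROW_T_DELETE}


def walk_records(block: bytes, assumed_len: int) -> list:
    """Step through `block` assuming every record is `assumed_len` bytes.

    Returns the offsets that were accepted as record starts and raises
    ValueError when the byte at the next offset is not a valid start byte.
    """
    offsets = []
    pos = 0
    while pos < len(block):
        if block[pos] not in VALID_START:
            raise ValueError(
                f"invalid start byte 0x{block[pos]:02x} at offset {pos}")
        offsets.append(pos)
        pos += assumed_len
    return offsets


# Hypothetical block: one 16-byte record followed by two 12-byte records,
# each starting with 0x43, 3 and padded with zero bytes.
long_rec = bytes([ROW_T_DELETE, 3]) + bytes(14)
short_rec = bytes([ROW_T_DELETE, 3]) + bytes(10)
block = long_rec + short_rec + short_rec   # record starts: 0, 16, 28

try:
    walk_records(block, assumed_len=16)
except ValueError as err:
    print(err)   # the parser lands at offset 32, 4 bytes into the third record
```

Because the parser assumes the longer length for every record, it overshoots the second record by 4 bytes and reads the middle of the third record as if it were a start byte.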
|
Here is a little simpler test:
I cannot repeat the failure if I declare the column as a INT NOT NULL (which would also be implied by PRIMARY KEY) or if I insert non-NULL values into a.
Decoded:
The problem appears to be that the parser expects the ROW_T_INSERT record to be longer. It looks like this is because purge emptied the table, and the table is no longer in the instant-add format. I wrote a DEBUG_SYNC version of the test that always crashes:
It is important that we wait for the purge after the INSERT and ROLLBACK have been logged. In that way, the log for the INSERT and the ROLLBACK will be written as if the table is in the instant-add format. When the log is applied, the source table would be in the plain format, and the log would be parsed incorrectly. |
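The ordering described above can be modelled as follows. This is a hedged sketch with made-up names and sizes (ToyTable, BASE_LEN, INSTANT_EXTRA, toy_purge), not the server's data structures: records are written using the table format in effect when the operation is logged, and parsed using the format in effect when the log is applied. The size and direction of the length difference are arbitrary here, and the ROLLBACK is modelled as a ROW_T_DELETE record.

```python
from dataclasses import dataclass

ROW_T_INSERT = 0x41
ROW_T_DELETE = 0x43
BASE_LEN = 8        # hypothetical record length in the plain format
INSTANT_EXTRA = 4   # hypothetical extra bytes in the instant-add format


@dataclass
class ToyTable:
    instant_format: bool
    rows: int = 0


def toy_record_len(instant_format: bool) -> int:
    return BASE_LEN + (INSTANT_EXTRA if instant_format else 0)


def log_op(log: bytearray, table: ToyTable, op: int) -> None:
    """Append one record, sized by the table format at logging time."""
    log.append(op)
    log.extend(bytes(toy_record_len(table.instant_format) - 1))


def toy_purge(table: ToyTable) -> None:
    """Model of purge: an emptied table reverts to the plain format."""
    if table.rows == 0:
        table.instant_format = False


def apply_log(log: bytes, instant_format: bool) -> int:
    """Parse the log with the given format; return the number of records."""
    step = toy_record_len(instant_format)
    pos = applied = 0
    while pos < len(log):
        if log[pos] not in (ROW_T_INSERT, ROW_T_DELETE):
            raise ValueError(f"invalid start byte at offset {pos}")
        applied += 1
        pos += step
    return applied


def run_scenario():
    table = ToyTable(instant_format=True)
    log = bytearray()
    log_op(log, table, ROW_T_INSERT)   # INSERT logged in instant-add format
    table.rows += 1
    log_op(log, table, ROW_T_DELETE)   # ROLLBACK logged in instant-add format
    table.rows -= 1
    toy_purge(table)                   # purge runs only after both are logged
    return table, log
```

In this model, apply_log(log, True) parses both records, while apply_log(log, table.instant_format) after the purge raises ValueError, because the parser steps through the log with the plain-format length.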