Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.6.1
Description
The attached copy of a data directory data_copy.tar.bz2 is from a run where according to rr replay the server had been executing ALTER TABLE when it was killed:
10.6 6fbf978eec4506eb46737ac4da00ea04403ae855 |
#11 0x000055b80bc9831d in ha_innobase::write_row (this=0x61d000a582b8, record=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6-MDEV-25062/storage/innobase/handler/ha_innodb.cc:7663
|
#12 0x000055b80b3a7320 in handler::ha_write_row (this=0x61d000a582b8, buf=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6-MDEV-25062/sql/handler.cc:7240
|
#13 0x000055b80ae631e4 in copy_data_between_tables (thd=0x62b0000cb218, from=0x61900042fe98, to=0x619000178998, create=..., ignore=false, order_num=0, order=0x0, copied=0x371c790da790, deleted=0x371c790da7b0,
|
keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x371c790dc120) at /data/Server/bb-10.6-MDEV-25062/sql/sql_table.cc:11090
|
#14 0x000055b80ae5d746 in mysql_alter_table (thd=0x62b0000cb218, new_db=0x62b0000cfc28, new_name=0x62b0000d0040, create_info=0x371c790dd5d0, table_list=0x62b0000d23c8, alter_info=0x371c790dd4a0, order_num=0,
|
order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.6-MDEV-25062/sql/sql_table.cc:10353
|
#15 0x000055b80afe8a77 in Sql_cmd_alter_table::execute (this=0x62b0000d2bf0, thd=0x62b0000cb218) at /data/Server/bb-10.6-MDEV-25062/sql/sql_alter.cc:550
|
#16 0x000055b80abf14a0 in mysql_execute_command (thd=0x62b0000cb218) at /data/Server/bb-10.6-MDEV-25062/sql/sql_parse.cc:5983
|
#17 0x000055b80abfd96d in mysql_parse (thd=0x62b0000cb218, rawbuf=0x62b0000d2238 "ALTER TABLE t3 ADD COLUMN IF NOT EXISTS col2_copy INT /* E_R Thread1 QNO 5315 CON_ID 59 */", length=91,
|
parser_state=0x371c790deb20) at /data/Server/bb-10.6-MDEV-25062/sql/sql_parse.cc:8016
|
The server was killed in the middle of this, and on accessing the table t3 we would get into trouble.
ssh pluto
|
rr replay /data/Results/1623345023/TBR-793/dev/shm/vardir/1623345023/59/1/rr/mysqld-2
|
This would crash while accessing the table t3 due to an invalid BLOB pointer that had been left behind due to the SIGKILL.
It should be sufficient to start up the server on the copy of the data directory:
sql/mariadbd --datadir /dev/shm/data_copy
|
CHECK TABLE t3 will report a number of row count mismatch to the client.
The correct recovery would have been to drop the incomplete #sql-alter- table. What happens here instead is that the valid table t3 was dropped and the invalid table was renamed to t3.
Attachments
Issue Links
- causes
-
MDEV-35813 Performance regression in INSERT…SELECT due to unnecessarily making InnoDB log durable
-
- Closed
-
- duplicates
-
MDEV-25751 Assertion `!rec_offs_any_null_extern(rec, offsets)' in row_merge_read_clustered_index
-
- Closed
-
- relates to
-
MDEV-25180 Atomic ALTER TABLE
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Description |
The attached copy of a data directory is from a run where according to {{rr replay}} the server had been executing {{ALTER TABLE}} when it was killed:
{noformat:title=10.6 6fbf978eec4506eb46737ac4da00ea04403ae855} #11 0x000055b80bc9831d in ha_innobase::write_row (this=0x61d000a582b8, record=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6- #12 0x000055b80b3a7320 in handler::ha_write_row (this=0x61d000a582b8, buf=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6- #13 0x000055b80ae631e4 in copy_data_between_tables (thd=0x62b0000cb218, from=0x61900042fe98, to=0x619000178998, create=..., ignore=false, order_num=0, order=0x0, copied=0x371c790da790, deleted=0x371c790da7b0, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x371c790dc120) at /data/Server/bb-10.6- #14 0x000055b80ae5d746 in mysql_alter_table (thd=0x62b0000cb218, new_db=0x62b0000cfc28, new_name=0x62b0000d0040, create_info=0x371c790dd5d0, table_list=0x62b0000d23c8, alter_info=0x371c790dd4a0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.6- #15 0x000055b80afe8a77 in Sql_cmd_alter_table::execute (this=0x62b0000d2bf0, thd=0x62b0000cb218) at /data/Server/bb-10.6- #16 0x000055b80abf14a0 in mysql_execute_command (thd=0x62b0000cb218) at /data/Server/bb-10.6- #17 0x000055b80abfd96d in mysql_parse (thd=0x62b0000cb218, rawbuf=0x62b0000d2238 "ALTER TABLE t3 ADD COLUMN IF NOT EXISTS col2_copy INT /* E_R Thread1 QNO 5315 CON_ID 59 */", length=91, parser_state=0x371c790deb20) at /data/Server/bb-10.6- {noformat} The server was killed in the middle of this, and on accessing the table {{t3}} we would get into trouble. {code:sh} ssh pluto rr replay /data/Results/1623345023/TBR-793/dev/shm/vardir/1623345023/59/1/rr/mysqld-2 {code} This would crash while accessing the table {{t3}} due to an invalid BLOB pointer that had been left behind due to the SIGKILL. It should be sufficient to start up the server on the copy of the data directory: {code:sh} sql/mariadbd --datadir /dev/shm/data_copy {code} {CHECK TABLE t3}} will report a number of row count mismatch to the client. The correct recovery would have been to drop the incomplete {{#sql-alter-}} table. What happens here instead is that the valid table {{t3}} was dropped and the invalid table was renamed to {{t3}}. |
The attached copy of a data directory [^data_copy.tar.bz2] is from a run where according to {{rr replay}} the server had been executing {{ALTER TABLE}} when it was killed:
{noformat:title=10.6 6fbf978eec4506eb46737ac4da00ea04403ae855} #11 0x000055b80bc9831d in ha_innobase::write_row (this=0x61d000a582b8, record=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6- #12 0x000055b80b3a7320 in handler::ha_write_row (this=0x61d000a582b8, buf=0x61a0004e4eb8 "\b\376\t\001") at /data/Server/bb-10.6- #13 0x000055b80ae631e4 in copy_data_between_tables (thd=0x62b0000cb218, from=0x61900042fe98, to=0x619000178998, create=..., ignore=false, order_num=0, order=0x0, copied=0x371c790da790, deleted=0x371c790da7b0, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x371c790dc120) at /data/Server/bb-10.6- #14 0x000055b80ae5d746 in mysql_alter_table (thd=0x62b0000cb218, new_db=0x62b0000cfc28, new_name=0x62b0000d0040, create_info=0x371c790dd5d0, table_list=0x62b0000d23c8, alter_info=0x371c790dd4a0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.6- #15 0x000055b80afe8a77 in Sql_cmd_alter_table::execute (this=0x62b0000d2bf0, thd=0x62b0000cb218) at /data/Server/bb-10.6- #16 0x000055b80abf14a0 in mysql_execute_command (thd=0x62b0000cb218) at /data/Server/bb-10.6- #17 0x000055b80abfd96d in mysql_parse (thd=0x62b0000cb218, rawbuf=0x62b0000d2238 "ALTER TABLE t3 ADD COLUMN IF NOT EXISTS col2_copy INT /* E_R Thread1 QNO 5315 CON_ID 59 */", length=91, parser_state=0x371c790deb20) at /data/Server/bb-10.6- {noformat} The server was killed in the middle of this, and on accessing the table {{t3}} we would get into trouble. {code:sh} ssh pluto rr replay /data/Results/1623345023/TBR-793/dev/shm/vardir/1623345023/59/1/rr/mysqld-2 {code} This would crash while accessing the table {{t3}} due to an invalid BLOB pointer that had been left behind due to the SIGKILL. It should be sufficient to start up the server on the copy of the data directory: {code:sh} sql/mariadbd --datadir /dev/shm/data_copy {code} {{CHECK TABLE t3}} will report a number of row count mismatch to the client. The correct recovery would have been to drop the incomplete {{#sql-alter-}} table. What happens here instead is that the valid table {{t3}} was dropped and the invalid table was renamed to {{t3}}. |
Status | Open [ 1 ] | In Progress [ 3 ] |
issue.field.resolutiondate | 2021-06-16 06:08:40.0 | 2021-06-16 06:08:40.097 |
Fix Version/s | 10.6.2 [ 25800 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Assignee | Michael Widenius [ monty ] | Marko Mäkelä [ marko ] |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue duplicates |
Workflow | MariaDB v3 [ 122658 ] | MariaDB v4 [ 159392 ] |
Link |
This issue causes |
I have looked at the binary log.
In the log, the phase of alter table is (from ddl_log.cc, the comment says it all):
case DDL_ALTER_TABLE_PHASE_COPIED:
{
char *from_end;
/*
New table is created and we have the query for the binary log.
We should remove the original table and in the next stage replace
it with the new one.
*/
This means that copy_between_tables() had executed fully and the data have been copied.
This tag is added here in sql_table.cc
ddl_log_update_phase(&ddl_log_state, DDL_ALTER_TABLE_PHASE_COPIED);
debug_crash_here("ddl_log_alter_after_log"); // Use new table
THD_STAGE_INFO(thd, stage_rename_result_table);
if (wait_while_table_is_used(thd, table, HA_EXTRA_PREPARE_FOR_RENAME))
goto err_new_table_cleanup;
Are you sure that the ddl log is for the above trace?
I don't see any way that the phase could be wrong if it crashed in copy_data_between tables().
Just before copy_between_tables, the phase is set to DDL_ALTER_TABLE_PHASE_CREATED, which is what it should be in the above case if the crash happened there.