[MDEV-20117] Assertion `0' failed in row_sel_get_clust_rec_for_mysql - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.4.7, 10.5
Fix Version/s: 10.4.9
Component/s: Storage Engine - InnoDB
Labels:
None

Description

Problem hit during RQG testing on

10.4.7 commit 819c40d694071e56adb62ded97561717edb59f23 2019-07-22

10.4.7 showed that problem in minimum already ~ 2019-07-15.

2019-07-22 18:28:23 24 [ERROR] InnoDB: Clustered record for sec rec not found index `col_int` of table `test`.`t1`

InnoDB: sec index record PHYSICAL RECORD: n_fields 2; compact format; info bits 0

 0: SQL NULL;

|  1: len 4; hex 80000004; asc     ;;

 InnoDB: clust index record PHYSICAL RECORD: n_fields 73; compact format; info bits 32

|  0: len 4; hex 80000003; asc     ;;

|  1: len 6; hex 0000000015f8; asc       ;;

|  2: len 7; hex 31000001851961; asc 1     a;;

|  3: len 4; hex 80000002; asc     ;;

|  4: len 4; hex ffffffff; asc     ;;

|  5: SQL NULL;

|  6: SQL NULL;

....

|  72: SQL NULL;

# 2019-07-22T18:28:34 [208245] | TRANSACTION 5624, ACTIVE 0 sec fetching rows

# 2019-07-22T18:28:34 [208245] | mysql tables in use 1, locked 1

# 2019-07-22T18:28:34 [208245] | 3 lock struct(s), heap size 1160, 5 row lock(s), undo log entries 2

# 2019-07-22T18:28:34 [208245] | MySQL thread id 24, OS thread handle 140235059529472, query id 2900 localhost 127.0.0.1 root Updating

# 2019-07-22T18:28:34 [208245] | DELETE FROM t1 WHERE col_int = 5 OR col_int IS NULL  /* E_R Thread9 QNO 2263 CON_ID 24 */

....

| mysqld: torage/innobase/row/row0sel.cc:3356: dberr_t row_sel_get_clust_rec_for_mysql(row_prebuilt_t*, dict_index_t*, const rec_t*, que_thr_t*, const rec_t**, ulint**, mem_heap_t**, dtuple_t**, mtr_t*): Assertion `0' failed.

| 190722 18:28:23 [ERROR] mysqld got signal 6 ;

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

MDEV-20117.tgz
3 kB
2019-07-24 09:45
MDEV-20117.yy
1 kB
2019-07-23 12:47
prt
73 kB
2019-07-24 09:46

Issue Links

is caused by

MDEV-15562 Instant DROP COLUMN or changing the order of columns

Closed

relates to

MDEV-20090 mysqld: dict0dict.ic:879: dict_field_t* dict_index_get_nth_field(const dict_index_t*, ulint): Assertion `pos < index->n_def' failed.

Closed

MDEV-20479 assertion failure in dict_table_get_nth_col() after INSTANT DROP COLUMN

Closed

Activity

Ascending order - Click to sort in descending order

View 8 older comments

Marko Mäkelä added a comment - 2019-09-05 15:40

I can still repeat the failure with MDEV-20117.tgz:

10.4 18af13b88ba580562981a190c25da128a2e9db26
Version: '10.4.8-MariaDB-debug-log' socket: '/dev/shm/10.4/mysql-test/var/tmp/18/mysqld.1.sock' port: 16340 Source distribution
2019-09-05 18:31:49 1 [ERROR] InnoDB: Trying to access update undo rec field 53 in index `PRIMARY` of table `test`.`t1` but index has only 8 fields Submit a detailed bug report to https://jira.mariadb.org/. Run also CHECK TABLE `test`.`t1`. n_fields = 3, i = 2
mysqld: /mariadb/10.4/storage/innobase/trx/trx0rec.cc:1682: byte trx_undo_update_rec_get_update(const byte , dict_index_t , ulint, trx_id_t, roll_ptr_t, ulint, mem_heap_t , upd_t **): Assertion `0' failed.
…
#7 0x0000557718307a9b in trx_undo_update_rec_get_update (ptr=0x55771ab8ccfe "\004\200", index=0x7f186412ed08, type=13, trx_id=53, roll_ptr=2, info_bits=139743404846360, heap=0x55771ab8cc40, upd=0x55771ab8c268) at /mariadb/10.4/storage/innobase/trx/trx0rec.cc:1682
#8 0x000055771828a27a in row_purge_parse_undo_rec (node=0x55771ab8c1f0, undo_rec=<optimized out>, updated_extern=<optimized out>, thr=<optimized out>) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1147
#9 row_purge (node=0x55771ab8c1f0, undo_rec=<optimized out>, thr=<optimized out>) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1259
#10 0x0000557718289121 in row_purge_step (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1321
#11 0x00005577181fe1bb in que_thr_step (thr=<optimized out>) at /mariadb/10.4/storage/innobase/que/que0que.cc:1037
#12 que_run_threads_low (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/que/que0que.cc:1099
#13 que_run_threads (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/que/que0que.cc:1139
#14 0x00005577182fb900 in trx_purge (n_purge_threads=<optimized out>, truncate=<optimized out>) at /mariadb/10.4/storage/innobase/trx/trx0purge.cc:1316
#15 0x00005577182da52f in srv_do_purge (n_total_purged=<optimized out>) at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2594
#16 srv_purge_coordinator_thread (arg=<optimized out>) at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2720

Presumably, at the time of the previous instant ALTER TABLE was executed, the table was empty, and dict_index_t::clear_instant() was invoked. This would cause the crash in purge, if there were any old undo log records of committed transactions waiting for purge. Those old records would become corrupted, because the clustered index would have fewer fields, or the fields could be moved to different positions.

Like I wrote in my previous comment, if we remove rec_is_alter_metadata() information, we must at the same time update SYS_TABLES.ID and dict_table_t::id in order to effectively delete old undo log records for the table. We should try to avoid this rewrite when it is not necessary (say, after removing metadata for instant ADD COLUMN, that is, rec_is_metadata() && !rec_is_alter_metadata()).

My test case was a failed attempt to reproduce the scenario, which should lead you towards the correct direction. It happened to demonstrate an unrelated bug ~~MDEV-20479~~, which you fixed.

Marko Mäkelä added a comment - 2019-09-05 15:40 I can still repeat the failure with MDEV-20117.tgz : 10.4 18af13b88ba580562981a190c25da128a2e9db26 Version: '10.4.8-MariaDB-debug-log' socket: '/dev/shm/10.4/mysql-test/var/tmp/18/mysqld.1.sock' port: 16340 Source distribution 2019-09-05 18:31:49 1 [ERROR] InnoDB: Trying to access update undo rec field 53 in index `PRIMARY` of table `test`.`t1` but index has only 8 fields Submit a detailed bug report to https://jira.mariadb.org/. Run also CHECK TABLE `test`.`t1`. n_fields = 3, i = 2 mysqld: /mariadb/10.4/storage/innobase/trx/trx0rec.cc:1682: byte *trx_undo_update_rec_get_update(const byte *, dict_index_t *, ulint, trx_id_t, roll_ptr_t, ulint, mem_heap_t *, upd_t **): Assertion `0' failed. … #7 0x0000557718307a9b in trx_undo_update_rec_get_update (ptr=0x55771ab8ccfe "\004\200", index=0x7f186412ed08, type=13, trx_id=53, roll_ptr=2, info_bits=139743404846360, heap=0x55771ab8cc40, upd=0x55771ab8c268) at /mariadb/10.4/storage/innobase/trx/trx0rec.cc:1682 #8 0x000055771828a27a in row_purge_parse_undo_rec (node=0x55771ab8c1f0, undo_rec=<optimized out>, updated_extern=<optimized out>, thr=<optimized out>) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1147 #9 row_purge (node=0x55771ab8c1f0, undo_rec=<optimized out>, thr=<optimized out>) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1259 #10 0x0000557718289121 in row_purge_step (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/row/row0purge.cc:1321 #11 0x00005577181fe1bb in que_thr_step (thr=<optimized out>) at /mariadb/10.4/storage/innobase/que/que0que.cc:1037 #12 que_run_threads_low (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/que/que0que.cc:1099 #13 que_run_threads (thr=0x55771ab8c138) at /mariadb/10.4/storage/innobase/que/que0que.cc:1139 #14 0x00005577182fb900 in trx_purge (n_purge_threads=<optimized out>, truncate=<optimized out>) at /mariadb/10.4/storage/innobase/trx/trx0purge.cc:1316 #15 0x00005577182da52f in srv_do_purge (n_total_purged=<optimized out>) at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2594 #16 srv_purge_coordinator_thread (arg=<optimized out>) at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2720 Presumably, at the time of the previous instant ALTER TABLE was executed, the table was empty, and dict_index_t::clear_instant() was invoked. This would cause the crash in purge, if there were any old undo log records of committed transactions waiting for purge. Those old records would become corrupted, because the clustered index would have fewer fields, or the fields could be moved to different positions. Like I wrote in my previous comment, if we remove rec_is_alter_metadata() information, we must at the same time update SYS_TABLES.ID and dict_table_t::id in order to effectively delete old undo log records for the table. We should try to avoid this rewrite when it is not necessary (say, after removing metadata for instant ADD COLUMN , that is, rec_is_metadata() && !rec_is_alter_metadata() ). My test case was a failed attempt to reproduce the scenario, which should lead you towards the correct direction. It happened to demonstrate an unrelated bug MDEV-20479 , which you fixed.

Thirunarayanan Balathandayuthapani added a comment - 2019-09-24 06:56

OK to push

Thirunarayanan Balathandayuthapani added a comment - 2019-09-24 06:56 OK to push

Marko Mäkelä added a comment - 2019-09-26 15:21

With the proposed patch, MDEV-20117.tgz still crashed for me.

I think that the proposed patch was removing the wrong code path. The path that it removed was for the case that the table is truly empty, and not even a hidden metadata record exists.

What we must do is to prevent the goto empty_table; when only the metadata record exists, but the metadata record is in the post-10.3 format. We must also adjust page_cur_delete_rec() so that when btr_cur_pessimistic_update() is invoked by innobase_instant_try(), the page type will not be reset:

diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc

index 14125272daf..ab3ec7ba249 100644

--- a/storage/innobase/handler/handler0alter.cc

+++ b/storage/innobase/handler/handler0alter.cc

@@ -5820,7 +5820,8 @@ static bool innobase_instant_try(

 	dberr_t err = DB_SUCCESS;

 	if (rec_is_metadata(rec, *index)) {

 		ut_ad(page_rec_is_user_rec(rec));

-		if (!page_has_next(block->frame)

+		if (!rec_is_alter_metadata(rec, *index)

+		    && !page_has_next(block->frame)

 		    && page_rec_is_last(rec, block->frame)) {

 			goto empty_table;

diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc

index ded90c1c4f8..c2bdaee8c13 100644

--- a/storage/innobase/page/page0cur.cc

+++ b/storage/innobase/page/page0cur.cc

@@ -2327,7 +2327,8 @@ page_cur_delete_rec(

 	/* The record must not be the supremum or infimum record. */

 	ut_ad(page_rec_is_user_rec(current_rec));

-	if (page_get_n_recs(page) == 1 && !recv_recovery_is_on()) {

+	if (page_get_n_recs(page) == 1 && !recv_recovery_is_on()

+	    && !rec_is_alter_metadata(current_rec, *index)) {

 		/* Empty the page, unless we are applying the redo log

 		during crash recovery. During normal operation, the

 		page_create_empty() gets logged as one of MLOG_PAGE_CREATE,

With this patch, the test in MDEV-20117.tgz runs for me without any issues for the full duration of 5 minutes and 20 seconds.

Writing a test case for this is tricky. I think that I will skip that. If I do not see any test failure from existing tests when omitting the fix of page_cur_delete_rec(), I will write a test case that covers that change. But that test case would not cover the original failure that is demonstrated by MDEV-20117.tgz.

Marko Mäkelä added a comment - 2019-09-26 15:21 With the proposed patch, MDEV-20117.tgz still crashed for me. I think that the proposed patch was removing the wrong code path. The path that it removed was for the case that the table is truly empty, and not even a hidden metadata record exists. What we must do is to prevent the goto empty_table; when only the metadata record exists, but the metadata record is in the post-10.3 format. We must also adjust page_cur_delete_rec() so that when btr_cur_pessimistic_update() is invoked by innobase_instant_try() , the page type will not be reset: diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc index 14125272daf..ab3ec7ba249 100644 --- a/storage/innobase/handler/handler0alter.cc +++ b/storage/innobase/handler/handler0alter.cc @@ -5820,7 +5820,8 @@ static bool innobase_instant_try( dberr_t err = DB_SUCCESS; if (rec_is_metadata(rec, *index)) { ut_ad(page_rec_is_user_rec(rec)); - if (!page_has_next(block->frame) + if (!rec_is_alter_metadata(rec, *index) + && !page_has_next(block->frame) && page_rec_is_last(rec, block->frame)) { goto empty_table; } diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc index ded90c1c4f8..c2bdaee8c13 100644 --- a/storage/innobase/page/page0cur.cc +++ b/storage/innobase/page/page0cur.cc @@ -2327,7 +2327,8 @@ page_cur_delete_rec( /* The record must not be the supremum or infimum record. */ ut_ad(page_rec_is_user_rec(current_rec)); - if (page_get_n_recs(page) == 1 && !recv_recovery_is_on()) { + if (page_get_n_recs(page) == 1 && !recv_recovery_is_on() + && !rec_is_alter_metadata(current_rec, *index)) { /* Empty the page, unless we are applying the redo log during crash recovery. During normal operation, the page_create_empty() gets logged as one of MLOG_PAGE_CREATE, With this patch, the test in MDEV-20117.tgz runs for me without any issues for the full duration of 5 minutes and 20 seconds. Writing a test case for this is tricky. I think that I will skip that. If I do not see any test failure from existing tests when omitting the fix of page_cur_delete_rec() , I will write a test case that covers that change. But that test case would not cover the original failure that is demonstrated by MDEV-20117.tgz .

Marko Mäkelä added a comment - 2019-09-26 16:48

I tried about an hour with a debugger, but I was unable to create a deterministic test case for the bug.

Marko Mäkelä added a comment - 2019-09-26 16:48 I tried about an hour with a debugger, but I was unable to create a deterministic test case for the bug.

Marko Mäkelä added a comment - 2019-09-27 13:59

kevg found and attempted to fix a different scenario where the bug occurs. In that scenario, the table is empty at the time the very first instant DROP COLUMN is executed. After the operation, purge would process an undo log record for updating a column that no longer exists. This scenario shows that instant DROP/reorder column must always insert or update the metadata record. I pushed a follow-up fix that addresses this scenario.

The scenario of MDEV-20117.tgz is that the table was emptied between instant ALTER TABLE operations. That one I fixed yesterday.

Marko Mäkelä added a comment - 2019-09-27 13:59 kevg found and attempted to fix a different scenario where the bug occurs. In that scenario, the table is empty at the time the very first instant DROP COLUMN is executed. After the operation, purge would process an undo log record for updating a column that no longer exists. This scenario shows that instant DROP /reorder column must always insert or update the metadata record. I pushed a follow-up fix that addresses this scenario. The scenario of MDEV-20117.tgz is that the table was emptied between instant ALTER TABLE operations. That one I fixed yesterday.

MariaDB Server

Assertion `0' failed in row_sel_get_clust_rec_for_mysql

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration