[MDEV-30183] Assertion `!memcmp(rec_trx_id, old_pk_trx_id->data, 6 + 7)' failed in row_log_table_apply_update Created: 2022-12-09  Updated: 2023-03-07  Resolved: 2023-03-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.8.7, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.11.3, 11.0.2, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed

Attachments: File TBR-1503.cfg     File TBR-1503.yy    
Issue Links:
Blocks
blocks MDEV-26198 Assertion `0' failed in row_log_table... Closed
Problem/Incident
is caused by MDEV-26938 support descending indexes internally... Closed

 Description   

origin/10.8 b3c254339be29d3be24c16fa0c150ff8ebd0bfda 2022-12-07T09:43:13+02:00
and also bb-10.11-MDEV-29694
Scenario
1. Start the server and generate some unimportant data
2. Thread 1 creates a table
     CREATE TABLE t1 ( col_int INTEGER , col_varchar VARCHAR(500) ) ENGINE = InnoDB ROW_FORMAT = Dynamic ;
      sleeps 1s, runs than
      ALTER TABLE t1 ADD PRIMARY KEY ( col_varchar ) ;
      and exits.
      Threads 2 runs in a loop a random picked SQL of
      UPDATE t1 SET col_int  = { $my_int= $prng->int(513, 4096) } ; COMMIT |
      UPDATE t1 SET col_int  = { $my_int= $prng->int(513, 4096) } ; COMMIT |
      UPDATE t1 SET col_int  = { $my_int= $prng->int(513, 4096) } ; COMMIT |
      INSERT INTO t1 (col_int) VALUES ( { $my_int= $prng->int(513, 4096) } ); COMMIT ;
 
Sooner or later one of the test runs doing what is described above hits
# 2022-12-08T18:43:44 [4136740] | [rr 4137876 40173]mysqld: /data/Server/10.8H/storage/innobase/row/row0log.cc:1975: dberr_t row_log_table_apply_update(que_thr_t*, ulint, const mrec_t*, const rec_offs*, mem_heap_t*, mem_heap_t*, row_merge_dup_t*, const dtuple_t*): Assertion `!memcmp(rec_trx_id, old_pk_trx_id->data, 6 + 7)' failed.
 
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f16fd11e859 in __GI_abort () at abort.c:79
#2  0x00007f16fd11e729 in __assert_fail_base (fmt=0x7f16fd2b4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55579c7f94c0 "!memcmp(rec_trx_id, old_pk_trx_id->data, 6 + 7)", 
    file=0x55579c7f6860 "/data/Server/10.8H/storage/innobase/row/row0log.cc", line=1975, function=<optimized out>) at assert.c:92
#3  0x00007f16fd12ff36 in __GI___assert_fail (assertion=0x55579c7f94c0 "!memcmp(rec_trx_id, old_pk_trx_id->data, 6 + 7)", file=0x55579c7f6860 "/data/Server/10.8H/storage/innobase/row/row0log.cc", line=1975, 
    function=0x55579c7f91c0 "dberr_t row_log_table_apply_update(que_thr_t*, ulint, const mrec_t*, const rec_offs*, mem_heap_t*, mem_heap_t*, row_merge_dup_t*, const dtuple_t*)") at assert.c:101
#4  0x000055579b61f76b in row_log_table_apply_update (thr=0x6190004552c0, new_trx_id_col=1, mrec=0x763a41de7081 "", offsets=0x6030001933c0, offsets_heap=0x621000297d00, heap=0x621000296900, dup=0x763a41dde940, old_pk=0x621000296988)
    at /data/Server/10.8H/storage/innobase/row/row0log.cc:1975
#5  0x000055579b622d22 in row_log_table_apply_op (thr=0x6190004552c0, new_trx_id_col=1, dup=0x763a41dde940, error=0x763a41dde2b0, offsets_heap=0x621000297d00, heap=0x621000296900, mrec=0x763a41de7081 "", mrec_end=0x763a41de7098 "", offsets=0x6030001933c0)
    at /data/Server/10.8H/storage/innobase/row/row0log.cc:2393
#6  0x000055579b626ff6 in row_log_table_apply_ops (thr=0x6190004552c0, dup=0x763a41dde940, stage=0x60200001e030) at /data/Server/10.8H/storage/innobase/row/row0log.cc:2763
#7  0x000055579b627dc7 in row_log_table_apply (thr=0x6190004552c0, old_table=0x618000090908, table=0x763a41de0160, stage=0x60200001e030, new_table=0x618000091508) at /data/Server/10.8H/storage/innobase/row/row0log.cc:2871
#8  0x000055579b30f562 in ha_innobase::inplace_alter_table (this=0x61d0007ec2b8, altered_table=0x763a41de0160, ha_alter_info=0x763a41ddfa60) at /data/Server/10.8H/storage/innobase/handler/handler0alter.cc:8682
#9  0x000055579a47fc34 in handler::ha_inplace_alter_table (this=0x61d0007ec2b8, altered_table=0x763a41de0160, ha_alter_info=0x763a41ddfa60) at /data/Server/10.8H/sql/handler.h:4601
#10 0x000055579a45ca1c in mysql_inplace_alter_table (thd=0x62b0001dc218, table_list=0x629000dd93b0, table=0x619000442598, altered_table=0x763a41de0160, ha_alter_info=0x763a41ddfa60, target_mdl_request=0x763a41ddfb60, ddl_log_state=0x763a41ddf880, 
    trigger_param=0x763a41de05a0, alter_ctx=0x763a41de1000, partial_alter=@0x763a41ddf5f0: false, start_alter_id=@0x763a41ddf620: 0, if_exists=false) at /data/Server/10.8H/sql/sql_table.cc:7682
#11 0x000055579a47262c in mysql_alter_table (thd=0x62b0001dc218, new_db=0x62b0001e0cf0, new_name=0x62b0001e1110, create_info=0x763a41de24a0, table_list=0x629000dd93b0, alter_info=0x763a41de2370, order_num=0, order=0x0, ignore=false, if_exists=false)
    at /data/Server/10.8H/sql/sql_table.cc:10777
#12 0x000055579a5fdae8 in Sql_cmd_alter_table::execute (this=0x629000dd9bd8, thd=0x62b0001dc218) at /data/Server/10.8H/sql/sql_alter.cc:557
#13 0x000055579a1cc432 in mysql_execute_command (thd=0x62b0001dc218, is_called_from_prepared_stmt=false) at /data/Server/10.8H/sql/sql_parse.cc:5991
#14 0x000055579a1d8496 in mysql_parse (thd=0x62b0001dc218, rawbuf=0x629000dd9238 "ALTER TABLE t1 ADD PRIMARY KEY ( col_varchar ) /* E_R Thread1 QNO 5 CON_ID 18 */", length=80, parser_state=0x763a41de39b0) at /data/Server/10.8H/sql/sql_parse.cc:8016
#15 0x000055579a1b0f3b in dispatch_command (command=COM_QUERY, thd=0x62b0001dc218, packet=0x629000dcf219 " ALTER TABLE t1 ADD PRIMARY KEY ( col_varchar ) /* E_R Thread1 QNO 5 CON_ID 18 */ ", packet_length=82, blocking=true)
    at /data/Server/10.8H/sql/sql_parse.cc:1894
#16 0x000055579a1ae4a8 in do_command (thd=0x62b0001dc218, blocking=true) at /data/Server/10.8H/sql/sql_parse.cc:1407
#17 0x000055579a5e3d93 in do_handle_one_connection (connect=0x608000003db8, put_in_cache=true) at /data/Server/10.8H/sql/sql_connect.cc:1415
#18 0x000055579a5e371b in handle_one_connection (arg=0x608000003db8) at /data/Server/10.8H/sql/sql_connect.cc:1317
#19 0x0000677e02a84609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f16fd21b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
 
pluto:/data/results/1670524974/TBR-1503$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
 
RQG
===
# GIT_SHOW: HEAD -> experimental c0cd00de14dd52daa87b155e44a5e4a6f9e67e4d 2022-09-22T16:32:22+02:00
# rqg.pl  : Version 4.0.6 (2022-09)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --max_gd_duration=1200 \
# --rpl_mode=none \
# --engine=InnoDB \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--log-output=none \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--sync-binlog=1 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--innodb-buffer-pool-size=5M \
# --mysqld=--connect_timeout=60 \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--innodb_page_size=4K \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--log-bin \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--net_write_timeout=60 \
# --reporters=Backtrace,Deadlock1 \
# --validators=None \
# --threads=2 \
# --grammar=TBR-1503.yy \
# <local settings>
 
 



 Comments   
Comment by Matthias Leich [ 2022-12-09 ]

Recommended replay variant:
./REPLAY_SIMP.sh TBR-1503.cfg <path to MariaDB install> TBR-1503.yy

Comment by Marko Mäkelä [ 2023-02-27 ]

A condition was replaced with the failing debug assertion in MDEV-15250. It is not straightforward to fix this, because the original logic related to managing BLOB pointers was removed in MDEV-15250. Here is part of the removed code:

-               if (memcmp(rec_trx_id, old_pk_trx_id->data,
-                          DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN)) {
-                       /* The ROW_T_UPDATE was logged for a different
-                       DB_TRX_ID,DB_ROLL_PTR. This is possible if an
-                       earlier ROW_T_INSERT or ROW_T_UPDATE was diverted
-                       because some BLOBs were missing due to rolling
-                       back the initial insert or due to purging
-                       the old BLOB values of an update. */
-                       ut_ad(log->blobs);
-                       if (error != DB_SUCCESS) {
-                               ut_ad(error == DB_MISSING_HISTORY);
-                               /* Some BLOBs are missing, so we are
-                               interpreting this ROW_T_UPDATE as
-                               ROW_T_DELETE (see *1).
-                               Because this is a different row,
-                               we will do nothing. */
-                               error = DB_SUCCESS;
-                       } else {
-                               /* Because the user record is missing due to
-                               BLOBs that were missing when processing
-                               an earlier log record, we should
-                               interpret the ROW_T_UPDATE as ROW_T_INSERT.
-                               However, there is a different user record
-                               with the same PRIMARY KEY value already. */
-                               error = DB_DUPLICATE_KEY;
-                       }
-
-                       goto func_exit;
-               }

thiru, can you please experiment with the grammar and see what would be the correct course of action?

In the trace that I analyzed, both old_pk and the current record would have an empty string in the PRIMARY KEY column.

Comment by Thirunarayanan Balathandayuthapani [ 2023-03-02 ]

Below test case can able to repeat the failure in given version :

commit b3c254339be29d3be24c16fa0c150ff8ebd0bfda (HEAD)
Merge: 0751bfbcaf0 9e27e53dfac
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Wed Dec 7 09:43:13 2022 +0200
 
    Merge 10.7 into 10.8
 

Test case is

--source include/have_innodb.inc
set sql_mode="";
CREATE TABLE t1(col_int int, col_varchar varchar(500))ENGINE=InnoDB;
INSERT INTO t1(col_int) values(2560);
--source include/wait_all_purged.inc
set debug_sync="row_log_table_apply1_before SIGNAL con1_begin WAIT_FOR con1_commit";
send ALTER TABLE t1 ADD PRIMARY KEY ( col_varchar);
 
connect(con1,localhost,root,,,);
SET DEBUG_SYNC="now WAIT_FOR con1_begin";
update t1 set col_int=2178;
insert into t1(col_int) values(3016);
UPDATE t1 set col_int=2802;
SET DEBUG_SYNC="now SIGNAL con1_commit";
 
connection default;
reap;
SHOW CREATE TABLE t1;
DROP TABLE t1;

stack trace:

mariadbd: ../storage/innobase/row/row0log.cc:1975: dberr_t row_log_table_apply_update(que_thr_t*, ulint, const mrec_t*, const rec_offs*, mem_heap_t*, mem_heap_t*, row_merge_dup_t*, const dtuple_t*): Assertion `!memcmp(rec_trx_id, old_pk_trx_id->data, 6 + 7)' failed.
230302 17:48:49 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.8.7-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=153
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63906 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f7e94000db8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7ed07a4cd8 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0x564bbc7b6c00]
sql/signal_handler.cc:236(handle_fatal_signal)[0x564bbbecb734]
sigaction.c:0(__restore_rt)[0x7f7edd7363c0]
??:0(gsignal)[0x7f7edd21f03b]
??:0(abort)[0x7f7edd1fe859]
/lib/x86_64-linux-gnu/libc.so.6(+0x22729)[0x7f7edd1fe729]
??:0(__assert_fail)[0x7f7edd210006]
row/row0log.cc:1980(row_log_table_apply_update(que_thr_t*, unsigned long, unsigned char const*, unsigned short const*, mem_block_info_t*, mem_block_info_t*, row_merge_dup_t*, dtuple_t const*))[0x564bbc4e86db]
row/row0log.cc:2393(row_log_table_apply_op(que_thr_t*, unsigned long, row_merge_dup_t*, dberr_t*, mem_block_info_t*, mem_block_info_t*, unsigned char const*, unsigned char const*, unsigned short*))[0x564bbc4ea0d7]
row/row0log.cc:2763(row_log_table_apply_ops(que_thr_t*, row_merge_dup_t*, ut_stage_alter_t*))[0x564bbc4ec35d]
row/row0log.cc:2871(row_log_table_apply(que_thr_t*, dict_table_t*, TABLE*, ut_stage_alter_t*, dict_table_t*))[0x564bbc4eca1e]
handler/handler0alter.cc:8682(ha_innobase::inplace_alter_table(TABLE*, Alter_inplace_info*))[0x564bbc36238e]
sql/handler.h:4602(handler::ha_inplace_alter_table(TABLE*, Alter_inplace_info*))[0x564bbbc359b7]
sql/sql_table.cc:7682(mysql_inplace_alter_table(THD*, TABLE_LIST*, TABLE*, TABLE*, Alter_inplace_info*, MDL_request*, st_ddl_log_state*, TRIGGER_RENAME_PARAM*, Alter_table_ctx*, bool&, unsigned long long&, bool))[0x564bbbc25d31]
sql/sql_table.cc:10777(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool))[0x564bbbc2f654]
sql/sql_alter.cc:557(Sql_cmd_alter_table::execute(THD*))[0x564bbbce7621]
sql/sql_parse.cc:5991(mysql_execute_command(THD*, bool))[0x564bbbb0f093]
sql/sql_parse.cc:8016(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x564bbbb1520b]
sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x564bbbb0169a]
sql/sql_parse.cc:1407(do_command(THD*, bool))[0x564bbbb00059]
sql/sql_connect.cc:1415(do_handle_one_connection(CONNECT*, bool))[0x564bbbcdc0f8]
sql/sql_connect.cc:1319(handle_one_connection)[0x564bbbcdbe59]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x564bbc22ff9f]
nptl/pthread_create.c:478(start_thread)[0x7f7edd72a609]
??:0(clone)[0x7f7edd2fb163]

Comment by Thirunarayanan Balathandayuthapani [ 2023-03-02 ]

Same test case fails in 10.6 with DB_DUPLICATE_KEY for alter. I think when we're applying log for insert operation, col_varchar column will have empty string like
the existing row. So if we're making col_varchar as primary key then it should throw error. But in 10.8+, row_ins_duplicate_online() has the following changes:

@@ -2191,7 +2193,7 @@ row_ins_duplicate_online(
 
        ulint trx_id_len;
 
-       if (fields == n_uniq + 2
+       if (fields == n_uniq
            && memcmp(rec_get_nth_field(rec, offsets, n_uniq, &trx_id_len),
                      reset_trx_id, DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN)) {
                ut_ad(trx_id_len == DATA_TRX_ID_LEN);

This leads to DB_SUCCESS_LOCKED_REC error instead of DB_DUPLICATE_KEY. This changes done as a part of the following commit:

commit 358921ce32203a9a8dd277a5ba7ac177c9e79e53
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Wed Jan 26 11:01:39 2022 +0200
 
    MDEV-26938 Support descending indexes internally in InnoDB
    
    This is loosely based on the InnoDB changes in
    mysql/mysql-server@97fd8b1b6993340b361fa7f85da86a308f0b5e0c
    that I had developed in 2015 or 2016.

Comment by Thirunarayanan Balathandayuthapani [ 2023-03-03 ]

Patch is in bb-10.8-thiru

Comment by Marko Mäkelä [ 2023-03-03 ]

Thank you, the fix looks good to me. It is reverting an accidental change that I made in MDEV-26938. We used to compare not only the PRIMARY KEY, but also the DB_TRX_ID, DB_ROLL_PTR.

OK to push after mleich has run the stress tests.

Generated at Thu Feb 08 10:14:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.