[MDEV-32174] ROW_FORMAT=COMPRESSED table corruption due to ROLLBACK Created: 2023-09-15  Updated: 2024-01-30

Status: Confirmed
Project: MariaDB Server
Component/s: Data Manipulation - Insert, Storage Engine - InnoDB
Affects Version/s: 10.11.3, 10.6.12
Fix Version/s: 10.6

Type: Bug Priority: Critical
Reporter: Bart-Jan Hilbrands Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: None
Environment:

OS: Ubuntu 22.04 (Reproduced on both an Azure VM and a regular workstation laptop.)
Mariadb: 10.6.12 and 10.11.3 (and presumably all versions in between).
mydumper: 0.10.0 (If relevant: the problem occurred with every mydumper version we tried.)


Attachments: Text File MDEV-32174_ps.txt     Text File configuration.txt     Text File mariadbd_full_bt_all_threads.txt     Text File restore-failure-error-log.txt     Text File syslog-restore-10.6.15.txt     Text File table_structure.txt    
Issue Links:
Relates
relates to MDEV-16281 Implement parallel CREATE INDEX, ALTE... Open
relates to MDEV-31441 BLOB corruption on UPDATE of PRIMARY ... Closed
relates to MDEV-31817 SIGSEGV after btr_page_get_father_blo... Closed
relates to MDEV-30882 Crash on ROLLBACK of DELETE or UPDATE... Closed

 Description   

After upgrading to Ubuntu 22.04, and alongside that to MariaDB 10.6, we started experiencing table corruption during our nightly restores. Because these restores are to staging servers, loss of data was acceptable, so we worked around the problem by simply removing the corrupted table. But this obviously won't fly should we ever have to do a production restore (which happens very rarely).
The database portion of the restores is done using mydumper with the default of 4 threads.
What makes the problem particularly nasty is that it cannot be consistently reproduced: it seemed to happen in roughly 1 out of every 7 restores (where each restore takes about 40-50 minutes).
This made us think it was related to parallelism, so we tried running mydumper single-threaded, which did not solve the problem.
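
For reference, the dump/restore flow looks roughly like this (a sketch only; paths, database name and credentials are placeholders):

mydumper --database mydb --outputdir /backups/nightly --threads 4
myloader --database mydb --directory /backups/nightly --overwrite-tables --threads 4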
We have also tried upgrading to various versions of mariadb/mydumper, most notably:

mariadb   mydumper
10.6.12   0.15
10.11.3   0.10
10.11.3   0.15

But with all of the above version combinations the problem still occurred.
Eventually we found that we could no longer reproduce the problem while running MariaDB 10.5.21 with mydumper 0.10, but we are still unsure of the underlying cause.

Provided files:
The included table structure is just one table of our db, as we were able to reproduce the problem by restoring backups of only this table.
Because reproduction is quite time-consuming and we would have to generate a large amount of dummy data (existing dumps all contain customer data), we have not included a database dump for now.
But we still wanted to create this bug report in case you might already see something strange based on the included table structure and error log.
If there is no immediately apparent problem and you still want to look further into this, we would of course be happy to provide a database dump.

Update:
After running on 10.6.15 I was again able to reproduce it. I generated a stacktrace from the core dump (mariadbd_full_bt_all_threads.txt).



 Comments   
Comment by Daniel Black [ 2023-09-15 ]

Hi bjhilbrands,

To give us a little more detail:

sudo apt-get install mariadb-server-core-dbg

Then, using abrt-retrace to start a gdb session, run thread apply all bt -frame-arguments all full.
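
A minimal sketch of that procedure (the binary and core file paths are assumptions; adjust to your setup):

sudo apt-get install mariadb-server-core-dbg
gdb /usr/sbin/mariadbd /path/to/core
(gdb) set logging file mariadbd_full_bt_all_threads.txt
(gdb) set logging on
(gdb) thread apply all bt -frame-arguments all full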

If you are able to test with 10.6.15, that would be much appreciated (repo config).

Or even 10.11, though an earlier version will tell us where to fix it more reliably.

Comment by Marko Mäkelä [ 2023-09-15 ]

If the innodb_buffer_pool_size is much smaller than the database, this report could be a duplicate of MDEV-31767.

Comment by Bart-Jan Hilbrands [ 2023-09-15 ]

Hi, thank you for the quick feedback. I am testing it on 10.6.15 right now; I'll get back to you with my findings and a full stacktrace if the problem indeed still occurs.

Comment by Bart-Jan Hilbrands [ 2023-09-15 ]

I was able to reproduce it again with 10.6.15. One thing I noticed that was different is that mariadb is still running properly (before, it was in a restart loop). Now it is running fine with status "To roll back: 1 transactions, 5668 rows". The stacktrace I generated from the core dump is attached as mariadbd_full_bt_all_threads.txt; I also added output from the syslog in case that is useful: syslog-restore-10.6.15.txt.
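
That "To roll back" progress can be followed in the error log while the background rollback runs; a sketch, assuming the default Ubuntu log location:

tail -f /var/log/mysql/error.log | grep "To roll back"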

Comment by Daniel Black [ 2023-09-15 ]

I think the backtrace is still being processed with the 10.6.12 symbols rather than the 10.6.15 ones. Can you install mariadb-server-core-10.6-dbgsym which should come from the mariadb repo. After that re-resolve the previous core dump. Thanks for the confirmation and quick response.

Comment by Bart-Jan Hilbrands [ 2023-09-15 ]

I see, updated the attachment with the fixed version.

Comment by Daniel Black [ 2023-09-18 ]

Thanks bjhilbrands.

page_zip_dir_find is returning a null pointer, aborting the thread:

#8  0x00007f0eb02b7476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#9  0x00007f0eb029d7f3 in __GI_abort () at ./stdlib/abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7f0e983b9d00, sa_sigaction = 0x7f0e983b9d00}, sa_mask = {__val = {398, 139701063841440, 3432, 94106376343816, 139701062207568, 398, 94106364354817, 9, 139701063841440, 398, 398, 1, 94106376343816, 139701063824896, 139701062160471, 1103806595077}}, sa_flags = -1337395328, sa_restorer = 0x7f0eb048b600 <_IO_file_jumps>}
        sigs = {__val = {32, 139700660311452, 139701063841056, 15840675240469249536, 9, 15840675240469249536, 139701063841888, 139700660311648, 94106376288112, 139701062769284, 4297, 398, 94106376343816, 398, 139700660311424, 139701062205293}}
#10 0x00005596d86b1d3e in ut_dbg_assertion_failed (expr=expr@entry=0x5596d9213770 "slot_rec", file=file@entry=0x5596d9213230 "./storage/innobase/page/page0zip.cc", line=line@entry=4297) at ./storage/innobase/ut/ut0dbg.cc:60
No locals.
#11 0x00005596d8ddb888 in page_zip_dir_delete (block=block@entry=0x7f0e8c0f46a0, rec=<optimized out>, index=index@entry=0x7f0e5c0535f0, offsets=offsets@entry=0x7f0e983b97d0, free=0x0, mtr=mtr@entry=0x7f0e983b9d00) at ./storage/innobase/page/page0zip.cc:4297
        page_zip = 0x7f0e8c0f46d8
        page_free = <optimized out>
        garbage = <optimized out>
        slot_rec = 0x0
        n_recs = <optimized out>
        page_n_recs = <optimized out>
        slot_free = <optimized out>
        slot_len = <optimized out>
#12 0x00005596d8dcb06c in page_cur_delete_rec (cursor=<optimized out>, offsets=0x7f0e983b97d0, mtr=0x7f0e983b9d00) at ./storage/innobase/page/page0cur.cc:2362
        cur_dir_slot = <optimized out>
        current_rec = 0x7f0e92306b62 ""
        prev_rec = <optimized out>
        next_rec = <optimized out>
        cur_n_owned = 8
        rec = <optimized out>
        index = 0x7f0e5c0535f0
        block = 0x7f0e8c0f46a0
        cur_slot_no = 152
        slot_rec = <optimized out>
        data_size = <optimized out>
        extra_size = <optimized out>
#13 0x00005596d8e8807e in btr_cur_optimistic_delete (cursor=cursor@entry=0x7f0e983b9aa0, flags=flags@entry=0, mtr=mtr@entry=0x7f0e983b9d00) at ./storage/innobase/btr/btr0cur.cc:4619
        page = <optimized out>
        page_zip = <optimized out>
        block = 0x7f0e8c0f46a0
        rec = 0x7f0e92306b62 ""
        heap = 0x0
        offsets_ = {300, 3, 32774, 4, 8, 12, 0, 0, 11824, 23557, 32526, 0, 39664, 38971, 0, 0, 1, 0, 0, 0, 64144, 27638, 32526, 0, 11456, 35854, 32526, 0, 11824, 23557, 32526, 0, 1, 0, 0, 0, 64144, 27638, 32526, 0, 38992, 38971, 32526, 0, 64032, 27638, 32526, 0, 11824, 23557, 32526, 0, 64032, 27638, 32526, 0, 11824, 23557, 32526, 0, 39664, 38971, 32526, 0, 40608, 38971, 32526, 0, 42204, 55522, 21910, 0, 11456, 35854, 32526, 0, 4088, 0, 0, 0, 1, 0, 0, 0, 46830, 37311, 32526, 0, 39136, 38971, 32526, 0, 0, 0, 0, 0, 300, 15, 32780, 4, 10, 17, 21, 25, 176, 176, 185, 227, 231, 240, 244, 248, 252, 498, 0, 0, 40192, 38971, 32526, 0, 4089, 0, 0, 0, 11456, 35854, 32526, 0, 47973, 37311, 32526, 0, 40192, 38971, 32526, 0, 39168, 38971, 32526, 0, 36003, 55517, 21910, 0, 4, 0, 0, 0, 47973, 37311, 32526, 0, 39312, 38971, 32526, 0, 46156, 55516, 21910, 0, 41, 0, 0, 0, 47973, 37311, 32526, 0, 47340, 37311, 32526, 0, 49140, 37311, 32526, 0, 47340, 37311, 32526, 0, 2, 0, 0, 0, 16384, 0, 0, 0, 39392, 38971, 32526, 0, 33970, 37311, 32526, 0, 49140, 37311, 32526, 0...}
        offsets = <optimized out>
        err = DB_SUCCESS
#14 0x00005596d8f1eef5 in row_undo_ins_remove_sec_low (mode=<optimized out>, index=<optimized out>, entry=<optimized out>, thr=<optimized out>) at ./storage/innobase/row/row0uins.cc:318
        btr_cur = 0x7f0e983b9aa0
        pcur = {btr_cur = {page_cur = {index = 0x7f0e5c0535f0, rec = 0x7f0e92304070 "", offsets = 0x0, block = 0x7f0e8c0f46a0}, purge_node = 0x0, thr = 0x0, flag = BTR_CUR_BINARY, tree_height = 3, up_match = 1, up_bytes = 0, low_match = 3, low_bytes = 0, n_fields = 0, n_bytes = 0, fold = 0, path_arr = 0x0, rtr_info = 0x0}, latch_mode = BTR_MODIFY_LEAF, old_rec = 0x0, old_n_core_fields = 0, old_n_fields = 0, rel_pos = 0, block_when_stored = {m_block = 0x0, m_page_id = {m_id = 0}}, modify_clock = 0, pos_state = BTR_PCUR_IS_POSITIONED, search_mode = PAGE_CUR_LE, trx_if_known = 0x0, old_rec_buf = 0x0, buf_size = 0}
        err = DB_SUCCESS
        mtr = {m_last = 0x7f0e8c0f46a0, m_last_offset = 48, m_log_mode = 0, m_modifications = 1, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {<small_vector_base> = {BeginX = 0x7f0e983b9d20, Size = 2, Capacity = 16}, small = {{object = 0x7f0e5c053768, type = MTR_MEMO_S_LOCK}, {object = 0x7f0e8c0f46a0, type = MTR_MEMO_PAGE_X_MODIFY}, {object = 0x7f0e8c0f46a0, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x7f0e8c097900, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x7f0e8c0468c0, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x7f0e8c002080, type = MTR_MEMO_PAGE_X_MODIFY}, {object = 0x7f0e8c050c80, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x7f0e8c089080, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x7f0e5c04e4f8, type = MTR_MEMO_SPACE_X_LOCK}, {object = 0x7f0e8c08c780, type = MTR_MEMO_PAGE_SX_FIX}, {object = 0x7f0e8c050c80, type = 2956040257}, {object = 0x7f0e68000090, type = 0}, {object = 0x7, type = MTR_MEMO_PAGE_X_FIX}, {object = 0x428c0e2cc0, type = 1152}, {object = 0x7f0e8c139800, type = 605081585}, {object = 0x57, type = 1200}}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7f0e983b9e48, prev = 0x7f0e983b9e48}}, size_ = 1}, m_size = 19, m_first_block = {<ilist_node<void>> = {next = 0x7f0e983b9e28, prev = 0x7f0e983b9e28}, m_data = "<omitted customer data>"..., m_used = 19}}, m_user_space = 0x7f0e5c04e4f8, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
        modify_leaf = true
#15 0x00005596d8f1f4f4 in row_undo_ins_remove_sec (thr=0x7f0e68095bf8, entry=0x7f0e6bfc6430, index=0x7f0e5c0535f0) at ./storage/innobase/row/row0uins.cc:353
        err = <optimized out>
        n_tries = 0
        retry = <optimized out>
        err = <optimized out>
        n_tries = <optimized out>
#16 row_undo_ins_remove_sec_rec (thr=0x7f0e68095bf8, node=0x7f0e6bf6fa20) at ./storage/innobase/row/row0uins.cc:549
        entry = 0x7f0e6bfc6430
        err = DB_SUCCESS
        index = 0x7f0e5c0535f0
        heap = 0x7f0e6bfc63c8
        func_exit = <optimized out>
        err = <optimized out>
        index = <optimized out>
        heap = <optimized out>
        entry = <optimized out>
#17 row_undo_ins (node=node@entry=0x7f0e6bf6fa20, thr=thr@entry=0x7f0e68095bf8) at ./storage/innobase/row/row0uins.cc:601
        err = <optimized out>
        dict_locked = false
#18 0x00005596d8e2ab7b in row_undo (thr=0x7f0e68095bf8, node=0x7f0e6bf6fa20) at ./storage/innobase/row/row0undo.cc:401
        err = <optimized out>
        err = <optimized out>
#19 row_undo_step (thr=thr@entry=0x7f0e68095bf8) at ./storage/innobase/row/row0undo.cc:452
        err = <optimized out>
        node = 0x7f0e6bf6fa20
        trx = 0x7f0ea62aa680
#20 0x00005596d8de36f2 in que_thr_step (thr=0x7f0e68095bf8) at ./storage/innobase/que/que0que.cc:586
        node = 0x7f0e6bf6fa20
        old_thr = 0x7f0e68095bf8
        trx = <optimized out>
        type = <optimized out>
        node = <optimized out>
        old_thr = <optimized out>
        trx = <optimized out>
        type = <optimized out>
#21 que_run_threads_low (thr=<optimized out>) at ./storage/innobase/que/que0que.cc:644
        next_thr = <optimized out>
        trx = <optimized out>
#22 que_run_threads (thr=0x7f0e68095bf8) at ./storage/innobase/que/que0que.cc:664
        trx = 0x7f0ea62aa680
        loop = <optimized out>
#23 0x00005596d8e4b414 in trx_t::rollback_low (this=this@entry=0x7f0ea62aa680, savept=savept@entry=0x7f0ea62aafc0) at ./storage/innobase/trx/trx0roll.cc:125
        thr = 0x7f0e680e8ad8
        heap = 0x7f0e680e8978
        roll_node = 0x7f0e680e89e0
#24 0x00005596d8e4a300 in trx_t::rollback (savept=0x7f0ea62aafc0, this=0x7f0ea62aa680) at ./storage/innobase/trx/trx0roll.cc:176
No locals.
#25 trx_t::rollback (savept=0x7f0ea62aafc0, this=0x7f0ea62aa680) at ./storage/innobase/trx/trx0roll.cc:163
No locals.
#26 trx_rollback_last_sql_stat_for_mysql (trx=trx@entry=0x7f0ea62aa680) at ./storage/innobase/trx/trx0roll.cc:307
        err = <optimized out>
#27 0x00005596d8d4d2e0 in innobase_rollback (hton=<optimized out>, thd=0x7f0e68000c68, rollback_trx=<optimized out>) at ./storage/innobase/handler/ha_innodb.cc:4705
        trx = 0x7f0ea62aa680
        error = <optimized out>
#28 0x00005596d8a4eed0 in ha_rollback_trans (thd=thd@entry=0x7f0e68000c68, all=all@entry=false) at ./sql/handler.cc:2224
        err = <optimized out>
        ht = <optimized out>
        error = 0
        trans = <optimized out>
        ha_info = 0x7f0e68003628
        ha_info_next = <optimized out>
        is_real_trans = <optimized out>
#29 0x00005596d892f9d9 in trans_rollback_stmt (thd=thd@entry=0x7f0e68000c68) at ./sql/transaction.cc:535
No locals.
#30 0x00005596d8802e50 in mysql_execute_command (thd=0x7f0e68000c68, is_called_from_prepared_stmt=<optimized out>) at ./sql/sql_parse.cc:6077
        multi_delete_error = <optimized out>
        wsrep_error_label = <optimized out>
        error = <optimized out>
        res = 1
        up_result = <optimized out>
        lex = 0x7f0e68004d88
        select_lex = <optimized out>
        first_table = <optimized out>
        all_tables = 0x7f0e68013130
        unit = 0x7f0e68004e50
        have_table_map_for_update = <optimized out>
        rpl_filter = <optimized out>
        ots = {ctx = 0x7f0e68004a58, traceable = false}
        orig_binlog_format = <optimized out>
        orig_current_stmt_binlog_format = <optimized out>
#31 0x00005596d88085f7 in mysql_parse (thd=0x7f0e68000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at ./sql/sql_parse.cc:8053
        found_semicolon = <optimized out>
        error = <optimized out>
        lex = 0x7f0e68004d88
        err = <optimized out>
#32 0x00005596d880addd in dispatch_command (command=COM_QUERY, thd=0x7f0e68000c68, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at ./sql/sql_parse.cc:1993
        beginning_of_next_stmt = <optimized out>
        length = <optimized out>
        packet_end = 0x7f0e3b861df4 ""
        parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7f0e68000c68, m_ptr = 0x7f0e3b861df5 "\b", m_tok_start = 0x7f0e3b861df5 "\b", m_tok_end = 0x7f0e3b861df5 "\b", m_end_of_query = 0x7f0e3b861df4 "", m_tok_start_prev = 0x7f0e3b861df4 "", m_buf = 0x7f0e3afb4e00 "INSERT INTO `steps` VALUES\n(11626625,40592,907,\"<omitted customer data>"..., m_buf_length = 9097204, m_echo = true, m_echo_saved = true, m_cpp_buf = 0x7f0e38000070 "INSERT INTO `steps` VALUES\n(11626625,40592,907,\"<omitted customer data>"..., m_cpp_ptr = 0x7f0e388ad064 "", m_cpp_tok_start = 0x7f0e388ad064 "", m_cpp_tok_start_prev = 0x7f0e388ad064 "", m_cpp_tok_end = 0x7f0e388ad064 "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x0, m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 26245, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = NO_COMMENT, m_cpp_text_start = 0x7f0e388acf97 "<omitted customer data>"..., m_cpp_text_end = 0x7f0e388ad062 "\")", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 <repeats 12 times>}}, m_lock_type = TL_READ_DEFAULT, m_mdl_type = MDL_SHARED_READ}, m_digest_psi = 0x0}
        net = 0x7f0e68000f20
        error = false
        do_end_of_statement = true
        drop_more_results = false
        __FUNCTION__ = "dispatch_command"
        res = <optimized out>
#33 0x00005596d880cbe8 in do_command (thd=0x7f0e68000c68, blocking=blocking@entry=true) at ./sql/sql_parse.cc:1409
        return_value = <optimized out>
        packet = 0x7f0e3919adb8 "\003INSERT INTO `steps` VALUES\n(11626625,40592,907,\"<omitted customer data>"...
        packet_length = <optimized out>
        net = 0x7f0e68000f20
        command = COM_QUERY
        __FUNCTION__ = "do_command"
#34 0x00005596d891efef in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at ./sql/sql_connect.cc:1416
        create_user = true
        thr_create_utime = <optimized out>
        thd = 0x7f0e68000c68
#35 0x00005596d891f33d in handle_one_connection (arg=arg@entry=0x5596db163948) at ./sql/sql_connect.cc:1318
        connect = 0x5596db163948
#36 0x00005596d8ca4ce6 in pfs_spawn_thread (arg=0x5596db1180f8) at ./storage/perfschema/pfs.cc:2201
        typed_arg = 0x5596db1180f8
        user_arg = 0x5596db163948
        user_start_routine = 0x5596d891f2e0 <handle_one_connection(void*)>
        pfs = <optimized out>
        klass = <optimized out>
#37 0x00007f0eb0309b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140720411833360, -1557864560374536914, 139700660323904, 1, 139701062244432, 140720411833712, 1476171432797185326, 1476118612295460142}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#38 0x00007f0eb039ba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Comment by Marko Mäkelä [ 2023-11-21 ]

The posted stack trace is specific to ROW_FORMAT=COMPRESSED. There is an open bug MDEV-31441 with a test case that reproduces corruption on 10.4; it may be related to this failure.

Starting with 10.6, MDEV-515 allows a bulk load mechanism that will use table-level locking and undo logging when inserting into an empty table or partition. This mechanism is enabled by SET unique_checks=0,foreign_key_checks=0.
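
A minimal sketch of those preconditions (the table name is a placeholder; whether the bulk path is actually taken also depends on the target table being empty):

mariadb -u root -D test -e "
SET unique_checks=0, foreign_key_checks=0;
BEGIN;
INSERT INTO empty_table VALUES (1),(2),(3);
COMMIT;"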

My understanding is that myloader would by default attempt to use multiple threads for loading a single table. That will obviously not work with the table-level locking. MDEV-16281 exists for improving the data loading.

Comment by Bart-Jan Hilbrands [ 2023-11-23 ]

As for the relation to MDEV-31441: if they are indeed the same problem, that would mean it was fixed in 10.5 (at least as far as we can tell; we have been running 10.5 for five months now and have not seen the problem reoccur).

Regarding the table-level locking, we did run myloader with the

--threads 1

option before downgrading to 10.5. But while the problem seemed much less likely to occur, it did eventually still happen.

Comment by Marko Mäkelä [ 2023-11-23 ]

bjhilbrands, sorry, actually I meant MDEV-31574 and not MDEV-31441.

The famous quote of E. W. Dijkstra is applicable here: Program testing can be used to show the presence of bugs, but never to show their absence!

Depending on when the rollback of transactions or the purge of committed history is executed, the physical layout of index pages or the entire index tree could be very different for the same logical contents. Also in MDEV-31817 we are able to reproduce some corruption on a ROW_FORMAT=COMPRESSED table.

Comment by Marko Mäkelä [ 2023-11-23 ]

It turns out that MDEV-31574 had already been fixed in 10.5 by MDEV-12353. But I am rather certain that there are other bugs that cause corruption of ROW_FORMAT=COMPRESSED tables. We’d only need some data for reproducing them.

Comment by Bart-Jan Hilbrands [ 2023-11-24 ]

Working on generating some data right now! It's going to take a while, mostly because I'll have to make sure the problem is still reproducible with the generated data (I don't see a reason why not), and reproducing it might take several hours.

Comment by Bart-Jan Hilbrands [ 2023-11-24 ]

I have some data ready, but the file is too big; https://mariadb.com/kb/en/reporting-bugs/ mentions uploading to a public FTP server, but we are reluctant to use that. Would it be okay to invite you by email to a private SharePoint folder that contains the archived dump?

Comment by Marko Mäkelä [ 2023-11-24 ]

bjhilbrands, I sent you an email.

Comment by Bart-Jan Hilbrands [ 2023-11-24 ]

Shared the drive. Perhaps some more detailed information: it always seems to be an index of the steps table that gets corrupted, but that might also be a statistical thing, as the vast majority of the data are steps.

Comment by Marko Mäkelä [ 2023-11-24 ]

Thank you. I will get to this next week, after fixing MDEV-31441.

Comment by Marko Mäkelä [ 2023-11-27 ]

bjhilbrands, can you please reduce the dump so that the error would reproduce much faster? For example:

  1. remove any tables other than steps, and any tables that it might be connected to via FOREIGN KEY constraints (unless myloader does set foreign_key_checks=0)
  2. remove any non-unique indexes on the table steps other than the one where it crashes; in GDB, execute frame 11 to get to the stack frame of page_zip_dir_delete() and then print index->name to see the name of the index (see the sketch below)
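
The corresponding GDB session would look roughly like this (the binary and core file paths are assumptions):

gdb /usr/sbin/mariadbd /path/to/core
(gdb) frame 11
(gdb) print index->name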

I have had a single-threaded load of the dump running against a debug-instrumented server for over 102 CPU minutes (some 75 minutes of real time) now, and it still has not crashed. The data directory has grown to 4.3 GiB during this time. I think that the main cause of the slowdown could be the cmake -DWITH_INNODB_EXTRA_DEBUG:BOOL=ON a.k.a. page_zip_validate() checks, which I think need to be enabled to get to the root cause as early as possible.

I am attempting to load the dump as follows:

myloader --directory /dev/shm/data-dump2 --socket /dev/shm/mariadb.sock -u root --threads=1

Comment by Bart-Jan Hilbrands [ 2023-11-27 ]

Uploaded a new package in the same drive containing only the steps table. Reproducing it with one thread is indeed quite painful. In fact, I had some 10 successful restores in a row using --threads=1 while looking into this, which then resulted in us rolling out a fix to only run myloader single-threaded everywhere. It then ran fine for about a week before the same problem occurred (but this time only on a server). Just as a sanity check I'll try to actually reproduce it locally using just 1 thread.

Comment by Bart-Jan Hilbrands [ 2023-11-27 ]

It does make me wonder if there were other factors involved that caused the failure during the single threaded restore.

Comment by Bart-Jan Hilbrands [ 2023-11-27 ]

If it proves too time consuming to reproduce it single threaded, it might also be the most productive to wait for MDEV-16281 and see if it is still reproducible.

Comment by Marko Mäkelä [ 2023-11-29 ]

bjhilbrands, the stack trace of your crash is during the rollback of an INSERT statement. I suppose that the reason is a deadlock between concurrent INSERT transactions; these can occur when there are multiple indexes in a table. I am not familiar with mydumper or myloader, but it would make sense to create the indexes only after loading the data. This should not only prevent the deadlocks and rollbacks from occurring, but also speed up the operation, because only one index tree would have to be updated during the data load, which would allow more efficient use of the buffer pool. Can you check whether this makes the crash go away?
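
One way to express this, assuming a hypothetical secondary index idx_c2 on a column c2 of steps (the real index names and columns are in table_structure.txt):

mariadb -u root -D test -e "ALTER TABLE steps DROP INDEX idx_c2;"
myloader --directory /path/to/dump --threads=1 -u root
mariadb -u root -D test -e "ALTER TABLE steps ADD INDEX idx_c2 (c2);"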

In MDEV-30882 I already noted that there are hard-to-fix problems with rollback in ROW_FORMAT=COMPRESSED tables; I was not able to fully fix the bug, and a page overflow during a rollback is possible. I have a data load running without --threads=1, but my build option cmake -DWITH_INNODB_EXTRA_DEBUG:BOOL=ON is slowing it down extremely. Only 2 CPU threads are occupied.

Comment by Marko Mäkelä [ 2023-11-29 ]

I have had the data loading running for 6.5 CPU hours (more than 3 wall-clock hours), and there has been no crash so far. The data directory has grown to only 5.1 GiB during this time. The build option that I am using will cause very frequent calls to page_zip_decompress(), which could easily slow down things by an order of magnitude. In any case, this does not look like a feasible option for reproducing the bug.

Comment by Marko Mäkelä [ 2023-11-29 ]

bjhilbrands, the data load had finally completed on my debug build, without any crash:

10:24:18 /dev/shm/10.6/mysql-test$ myloader --directory /dev/shm/data-dump-steps-only --socket /dev/shm/mariadb.sock -u root
15:50:30 /dev/shm/10.6/mysql-test$

The data directory size grew to 5.8 GiB, of which 4.0 GiB is the ib_logfile0. The server was shut down successfully, after consuming 706 minutes and 33 seconds of CPU time:

10.6 20b0ec9aae38faaa4d249f67315cc5394eaea22e

2023-11-29 16:36:48 0 [Note] InnoDB: Shutdown completed; log sequence number 13600918514; transaction id 1814

According to the LSN output, it had written some 13 GiB to the circular ib_logfile0.
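
The figure follows from the LSN being a byte count of redo log written (assuming the initial LSN of the fresh data directory was negligible); a quick shell check:

echo "scale=1; 13600918514 / 1024^3" | bc
12.6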

Comment by Roel Van de Paar [ 2023-11-30 ]

marko mentioned the potential ROLLBACK issues in this ticket during re-testing of MDEV-31817 (that bug did not reproduce anymore in the patched branch 10.6-MDEV-32371 at commit 963a88be2e89100260e09e46e90e2b3271444899), so I added ROLLBACK and START TRANSACTION to the test using a random selection, as follows:

#!/bin/bash
# Start server with --max_connections=10000 --log-bin
# Set variables and ensure ramloc is a ramdisk or tmpfs (i.e. /dev/shm)
 
user="root"
socket="./socket.sock"
db="test"
client="./bin/mariadb"
errorlog="./log/master.err"
ramloc="/dev/shm"
threads=2000   # Number of concurrent threads
queries=100    # Number of t1/t2 INSERTs per thread/per test round
rounds=999999  # Number of max test rounds
 
# Setup
${client} -u ${user} -S ${socket} -D ${db} -e "
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT, c2 INT NOT NULL, PRIMARY KEY (c1), UNIQUE KEY u1 (c1,c2)) ENGINE=InnoDB AUTO_INCREMENT=1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4; 
CREATE TABLE t2 (c1 DATETIME NOT NULL, c2 DOUBLE NOT NULL, t1_c1 INT NOT NULL, PRIMARY KEY (t1_c1,c1)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
"
 
insert_rows(){
  SQL=
  for ((i=0;i<${queries};i++)); do
    rand=$((RANDOM % 4 + 1))
    if [ ${rand} -eq 1 ]; then
      SQL="${SQL}INSERT INTO t1 (c2) VALUES (0);INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));"
    elif [ ${rand} -eq 2 ]; then
      SQL="${SQL}START TRANSACTION;"
    else
      SQL="${SQL}ROLLBACK;"
    fi
  done
  ${client} -u ${user} -S ${socket} -D ${db} -e "${SQL}"
  rm -f ${ramloc}/${prefix}_md_proc_${1}  # Thread done
}
 
abort(){ jobs -p | xargs -P100 kill >/dev/null 2>&1; rm -Rf ${ramloc}/${prefix}_md_proc_*; exit 1; }
trap abort SIGINT
 
count=0
prefix="$(echo "${RANDOM}${RANDOM}${RANDOM}" | cut -b1-5)"
rm -f ${ramloc}/${prefix}_md_proc_*
for ((r=0;r<${rounds};r++)); do
  for ((i=0;i<${threads};i++)); do
    if [ ! -r ${ramloc}/${prefix}_md_proc_${i} ]; then  # Thread idle
      touch ${ramloc}/${prefix}_md_proc_${i}  # Thread busy
      insert_rows ${i} &
      if [ $[ ${count} % 1000 ] -eq 0 ]; then
        ${client} -u ${user} -S ${socket} -D ${db} -e "TRUNCATE TABLE t1;"
      fi
      count=$[ ${count} + 1 ]
      if [ $[ ${count} % 100 ] -eq 0 ]; then  # Limit disk I/O, check once every new 100 threads
        echo "Count: ${count}" | tee lastcount.log
        TAIL="$(tail -n10 ${errorlog} | tr -d '\n')"
        if [[ "${TAIL}" == *"ERROR"* ]]; then
          echo '*** Error found:'
          grep -i 'ERROR' log/master.err
          abort
        elif [[ "${TAIL}" == *"down complete"* ]]; then
          echo '*** Server shutdown'
          abort
        elif ! ${client}-admin ping -u ${user} -S ${socket} > /dev/null 2>&1; then
          echo '*** Server gone (killed/crashed)'
          abort
        fi
      fi
    fi
  done
done

And within 111800 iterations observed:

10.6-MDEV-32371 963a88be2e89100260e09e46e90e2b3271444899 (Optimized)

Count: 111500
Count: 111600
Count: 111700
Count: 111800
ERROR 1034 (HY000) at line 1: Index for table 't1' is corrupt; try to repair it
ERROR 1034 (HY000) at line 1: Index for table 't1' is corrupt; try to repair it
ERROR 1034 (HY000) at line 1: Index for table 't1' is corrupt; try to repair it
Count: 111900
*** Error found:
2023-11-30 14:17:25 117465 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117470 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117468 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117468 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it
2023-11-30 14:17:25 117465 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it
2023-11-30 14:17:25 117470 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it

Reflected in the error log as:

10.6.17 963a88be2e89100260e09e46e90e2b3271444899 (Optimized)

2023-11-30 13:51:15 0 [Note] /test/MARKO_MD281123-mariadb-10.6.17-linux-x86_64-opt/bin/mariadbd: ready for connections.
Version: '10.6.17-MariaDB-log'  socket: '/test/MARKO_MD281123-mariadb-10.6.17-linux-x86_64-opt/socket.sock'  port: 11471  MariaDB Server
2023-11-30 13:51:33 2392 [Warning] Aborted connection 2392 to db: 'unconnected' user: 'unauthenticated' host: 'localhost' (This connection closed normally without authentication)
2023-11-30 13:52:37 5935 [Note] InnoDB: Number of pools: 2
2023-11-30 14:17:25 117465 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117470 [Note] InnoDB: btr_page_get_parent 2
2023-11-30 14:17:25 117470 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117468 [Note] InnoDB: btr_page_get_parent 2
2023-11-30 14:17:25 117468 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-11-30 14:17:25 117468 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it
2023-11-30 14:17:25 117465 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it
2023-11-30 14:17:25 117470 [ERROR] mariadbd: Index for table 't1' is corrupt; try to repair it

The extra [Note] debug messages are the result of this commit.
I will try to see if I can get an rr trace; however, MDEV-31817 has proven to be rr-elusive, which likely means this issue will be too.

Comment by Marko Mäkelä [ 2023-11-30 ]

Roel, thank you. It is now clear that the hard-to-fix ROLLBACK issue on ROW_FORMAT=COMPRESSED tables that I identified in MDEV-30882 is rather easy to hit.

I believe that the reason why I was not able to reproduce the problem with bjhilbrands’s data dump is that the relative timing of the SQL threads was different due to the enabled UNIV_ZIP_DEBUG checks, and far fewer rollbacks (due to deadlock or lock wait timeout) took place.

Comment by Bart-Jan Hilbrands [ 2023-11-30 ]

Thank you indeed! This specific restore takes about 30 minutes to complete for me locally, and it had to run five times before the problem re-occurred. So even if it was still reproducible with the additional debug checks as you mentioned, it probably would have taken multiple days of non-stop loading for the problem to occur, which I agree is not feasible.

Comment by Roel Van de Paar [ 2023-11-30 ]

If it matters for the analysis, the script will occasionally hit a duplicate key error:

Count: 1900
Count: 2000
ERROR 1062 (23000) at line 1: Duplicate entry '1610-2023-11-30 19:10:05' for key 'PRIMARY'
ERROR 1062 (23000) at line 1: Duplicate entry '2862-2023-11-30 19:10:05' for key 'PRIMARY'
Count: 2100

Comment by Marko Mäkelä [ 2024-01-25 ]

Roel, are you able to reproduce this?

Comment by Roel Van de Paar [ 2024-01-30 ]

marko Hi! Sorry, I have been busy with Replication and Spider. I set up a new rr run with:

cmake -DCMAKE_C{,XX}_FLAGS='-Og -march=native -mtune=native -Wno-error=maybe-uninitialized -Wno-error=unused-result' -DCMAKE_BUILD_TYPE=Debug -DPLUGIN_PERFSCHEMA=NO -DWITH_DBUG_TRACE=OFF -DWITH_INNODB_EXTRA_DEBUG=ON && make -j128

However, I immediately noticed that the processing (done count) seemed to be going much slower, though perhaps that is caused by rr slowness.
It looks like threads are getting stuck. I uploaded MDEV-32174_ps.txt for your review. I will let it run, though at this rate it will take maybe a week or so to hit the same issue.

Comment by Roel Van de Paar [ 2024-01-30 ]

Confirmed that the slowness is due to rr. Without it, the count increases considerably faster. Also, the processlist, while still showing 15-20 second old threads at times, does continually clear up.

Comment by Roel Van de Paar [ 2024-01-30 ]

The rr instance ran into:

10.6.17 12d05c8266567ea9a62ccabc3180a90129bf81e0

Version: '10.6.17-MariaDB-debug-log'  socket: '/test/MDEV-32174_mariadb-10.6.17-linux-x86_64_2/socket.sock'  port: 12864  Source distribution
2024-01-30 13:43:02 1571 [Note] InnoDB: Number of pools: 2
2024-01-30 14:08:01 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
240130 14:08:01 [ERROR] mysqld got signal 6 ;

10.6.17 12d05c8266567ea9a62ccabc3180a90129bf81e0

Core was generated by `/test/MDEV-32174_mariadb-10.6.17-linux-x86_64_2/bin/mariadbd --no-defaults --ma'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x39ef7f1ee700 (LWP 389013))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x000055ca95c706fe in my_write_core (sig=sig@entry=6) at /test/10.6_opt/mysys/stacktrace.c:424
#2  0x000055ca954e8a55 in handle_fatal_signal (sig=6) at /test/10.6_opt/sql/signal_handler.cc:357
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x000046220f678859 in __GI_abort () at abort.c:79
#6  0x000055ca95a69f84 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.6_opt/storage/innobase/ut/ut0ut.cc:515
#7  0x000055ca95a1256f in srv_monitor_task () at /test/10.6_opt/storage/innobase/include/ut0ut.h:393
#8  0x000055ca95c16003 in tpool::thread_pool_generic::timer_generic::run (this=0x55ca97277540) at /test/10.6_opt/tpool/tpool_generic.cc:344
#9  tpool::thread_pool_generic::timer_generic::execute (arg=0x55ca97277540) at /test/10.6_opt/tpool/tpool_generic.cc:364
#10 0x000055ca95c16f06 in tpool::task::execute (this=0x55ca97277580) at /test/10.6_opt/tpool/task.cc:37
#11 0x000055ca95c14a9f in tpool::thread_pool_generic::worker_main (this=0x55ca971bd820, thread_var=0x55ca971bdfe0) at /test/10.6_opt/tpool/tpool_generic.cc:583
#12 0x000055ca95c15e52 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
#13 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#14 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#15 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#17 0x000048d363f37de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x000008d07419b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x000046220f775133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

marko Do you want to check the rr trace for this?

rr: /test/MDEV-32174_mariadb-10.6.17-linux-x86_64_2/rr/latest-trace

Related: MDEV-24911

Comment by Marko Mäkelä [ 2024-01-30 ]

Roel, I think that most hangs or deadlocks involve only 2 or 3 threads. The rr replay trace that you produced for my analysis contains 2,013 threads. Due to the way rr record works, this can very easily lead to unfair scheduling and starvation of threads. Such “fake hangs” are best avoided by reducing the number of concurrent threads or some configuration parameters, or even by patching the source code, such as reducing OS_AIO_N_PENDING_IOS_PER_THREAD from 256 to 1, which we did to uncover MDEV-32861 and to reduce the number of false positive hang reports.
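
A sketch of that kind of source tweak (the constant name is from the text above; its exact location in the source tree is an assumption):

grep -rn "OS_AIO_N_PENDING_IOS_PER_THREAD" storage/innobase/
# edit the definition found above, changing 256 to 1, then rebuild:
make -j"$(nproc)"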

I believe that this particular bug (a corruption on ROLLBACK) should be reproducible with 1 connection thread, or maybe at most 2, if the rollbacks are provoked not by explicit ROLLBACK statements or by duplicate key errors, but by deadlocks or lock wait timeouts between conflicting transactions. I do not see any reason to use thousands of concurrent connections.

I tried to find the latest threads that are contending on dict_sys.latch, but the trace is huge and the system is loaded with other processes. Already halfway through the trace (at when 2000000), the dict_sys.latch_ex_wait_start was at its final value 1706582093182145 (LANG=C TZ=UTC0 date --date @1706582093.182145 converts it to a readable timestamp). The interesting part of the trace should be somewhere after when 1534726 (timestamp 1706582052682666). In a few minutes I might find it; I only do this to prove that this is a false alarm.

Comment by Marko Mäkelä [ 2024-01-30 ]

The final wait for exclusive dict_sys.latch started in Thread 1014 at when 1680126, in the master callback task. This will block any subsequent waits for dict_sys.latch until the exclusive latch has been granted and released. Here is the stack trace:

10.6 12d05c8266567ea9a62ccabc3180a90129bf81e0

#13 0x000055ca95b3a4a2 in ssux_lock_impl<false>::wr_lock (this=0x55ca965a2340 <dict_sys+64>) at /test/10.6_opt/storage/innobase/dict/dict0dict.cc:966
#14 dict_sys_t::lock_wait (this=this@entry=0x55ca965a2300 <dict_sys>) at /test/10.6_opt/storage/innobase/dict/dict0dict.cc:961
#15 0x000055ca9581f65e in dict_sys_t::lock (this=this@entry=0x55ca965a2300 <dict_sys>) at /test/10.6_opt/storage/innobase/include/dict0dict.h:1524
#16 0x000055ca95b3ee6b in dict_sys_t::evict_table_LRU (this=0x55ca965a2300 <dict_sys>, half=half@entry=false) at /test/10.6_opt/storage/innobase/dict/dict0dict.cc:1300
#17 0x000055ca95a0f5c5 in srv_master_do_idle_tasks (counter_time=counter_time@entry=2321294823926) at /test/10.6_opt/storage/innobase/srv/srv0srv.cc:1465
#18 0x000055ca95a11a4c in srv_master_callback () at /test/10.6_opt/storage/innobase/srv/srv0srv.cc:1516
#19 0x000055ca95c16003 in tpool::thread_pool_generic::timer_generic::run (this=0x55ca972864f0) at /test/10.6_opt/tpool/tpool_generic.cc:344

Everything that I have observed so far suggests that between this point and the end of the rr replay trace, one thread will be holding dict_sys.latch in shared mode. We will need to identify this thread and find out what is blocking it. If it turns out that rr record is neglecting this thread (not executing it at all), then this is the type of “fake hang” that I suspected and expected. To avoid an occasional rr replay bug where reverse-continue may miss breakpoints or watchpoints, I restarted the trace slightly before the final wait started. At this point, the dict_sys.latch was still unoccupied:

watch -l dict_sys.latch_ex_wait_start._M_i
watch -l dict_sys.latch.readers._M_i
display dict_sys.latch_ex_wait_start
display dict_sys.latch
display dict_sys.latch_ex
run 1600000

At when 1600089, dict_sys.latch was both acquired and released in shared mode by Thread 6, and reacquired at when 1600174. At this rate it will take several minutes before when 1680126 (the start of the wait in Thread 1014) is reached.

Comment by Marko Mäkelä [ 2024-01-30 ]

The requests to acquire and release a shared dict_sys.latch in Thread 6 come from the purge coordinator task. Here is one stack trace:

10.6 12d05c8266567ea9a62ccabc3180a90129bf81e0

#3  0x000055ca95a219d1 in purge_sys_t::close_and_reopen (
    this=this@entry=0x55ca96e99e40 <purge_sys>, id=<optimized out>, 
    thd=thd@entry=0x55ca97392688, mdl=mdl@entry=0x5a8517a51ae8)
    at /test/10.6_opt/storage/innobase/trx/trx0purge.cc:1237
#4  0x000055ca95a28811 in trx_purge_attach_undo_recs (
    n_purge_threads=n_purge_threads@entry=4, thd=thd@entry=0x55ca97392688)
    at /test/10.6_opt/storage/innobase/trx/trx0purge.cc:1349
#5  0x000055ca95a28c1a in trx_purge (n_tasks=n_tasks@entry=4, 
    history_size=3911) at /test/10.6_opt/storage/innobase/trx/trx0purge.cc:1453
#6  0x000055ca95a1232d in purge_coordinator_state::do_purge (
    this=0x55ca96e99020 <purge_state>)
    at /test/10.6_opt/storage/innobase/srv/srv0srv.cc:1638
#7  purge_coordinator_callback ()
    at /test/10.6_opt/storage/innobase/srv/srv0srv.cc:1716

However, I realized that the dict_sys.latch was released after when 1680126. The trace is executing extremely slowly; this is the latest that I got after several hours of waiting:

10.6 12d05c8266567ea9a62ccabc3180a90129bf81e0

Thread 2015 hit Hardware watchpoint 2: -location dict_sys.latch.readers._M_i
 
Old value = 2147483648
New value = 0
ssux_lock_impl<false>::wr_unlock (this=0x55ca965a2340 <dict_sys+64>) at /test/10.6_opt/storage/innobase/include/srw_lock.h:306
306	    writer.wr_unlock();
1: dict_sys.latch_ex_wait_start = {<std::__atomic_base<unsigned long long>> = {_M_i = 0}, <No data fields>}
2: dict_sys.latch_ex = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0}, <No data fields>}}
3: dict_sys.latch = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 2147483649}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}
Current event: 2788949
[Switching to Thread 350035.350041]
 
Thread 6 hit Hardware watchpoint 2: -location dict_sys.latch.readers._M_i
 
Old value = 0
New value = 1
0x000055ca95a20e42 in std::__atomic_base<unsigned int>::compare_exchange_weak (__m2=std::memory_order_relaxed, __m1=std::memory_order_acquire, __i2=1, __i1=<synthetic pointer>: <optimized out>, this=<optimized out>) at /usr/include/c++/9/bits/atomic_base.h:448
448	      compare_exchange_weak(__int_type& __i1, __int_type __i2,
1: dict_sys.latch_ex_wait_start = {<std::__atomic_base<unsigned long long>> = {_M_i = 0}, <No data fields>}
2: dict_sys.latch_ex = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0}, <No data fields>}}
3: dict_sys.latch = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 1}, <No data fields>}}
Current event: 2789015
 
Thread 6 hit Hardware watchpoint 2: -location dict_sys.latch.readers._M_i
 
Old value = 1
New value = 0
0x000055ca95a21123 in ssux_lock_impl<false>::rd_unlock (this=<optimized out>) at /usr/include/c++/9/bits/atomic_base.h:549
549	      fetch_sub(__int_type __i,
1: dict_sys.latch_ex_wait_start = {<std::__atomic_base<unsigned long long>> = {_M_i = 0}, <No data fields>}
2: dict_sys.latch_ex = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0}, <No data fields>}}
3: dict_sys.latch = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}
Current event: 2789015

That is, a shared dict_sys.latch was released, reacquired and released at this point, and acquired in shared mode after that. The final wait must have started somewhere between that and the final when count (slightly above 4 million).

I have never seen rr replay traces execute this slowly when there is no other load on the system. I think it must be due to the huge number of threads and the constant context switches between them. I will assume that this is a “fake hang”.

Please try to reproduce the bug of this ticket with a more reasonable configuration.

Comment by Marko Mäkelä [ 2024-01-30 ]

The last I observed before logging off the environment was a shared latch release at when 2790836. The final exclusive lock wait should have started somewhere after that.
