MariaDB Server / MDEV-29190

10.6.8 (+uring compiled in but not used) hangs after sudden load spikes

Details

    Description

      We are in the process of troubleshooting a 10.6.8 (compiled with uring) host that shows complete stalls after it gets sudden load spikes.

      # ldd /opt/wmf-mariadb106/bin/mysqld | grep uring
      	liburing.so.1 => /lib/x86_64-linux-gnu/liburing.so.1 (0x00007f4d12123000)
      

      Along with Daniel and Vicențiu we got some stack traces.
      See the attached files for the full output.
      We also got some more specific ones requested by Marko; they are tracked at https://phabricator.wikimedia.org/P32031, but we are also pasting them here for the record.

      root@db1132:/srv/tmp# gdb /opt/wmf-mariadb106/bin/mysqld core.3344701
      GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
      Copyright (C) 2021 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
      Type "show copying" and "show warranty" for details.
      This GDB was configured as "x86_64-linux-gnu".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <https://www.gnu.org/software/gdb/bugs/>.
      Find the GDB manual and other documentation resources online at:
          <http://www.gnu.org/software/gdb/documentation/>.
       
      For help, type "help".
      Type "apropos word" to search for commands related to "word"...
      Reading symbols from /opt/wmf-mariadb106/bin/mysqld...
      [New LWP 3344701]
      [New LWP 3344702]
      [New LWP 3344703]
      [New LWP 3344704]
      [New LWP 3344707]
      [New LWP 3344708]
      [New LWP 3344709]
      [New LWP 3344710]
      [New LWP 3344711]
      [New LWP 3344712]
      [New LWP 3344713]
      [New LWP 1390476]
      [New LWP 1394265]
      [New LWP 1396255]
      [New LWP 1396643]
      [New LWP 1397334]
      [New LWP 1397339]
      [New LWP 1397368]
      [New LWP 1397405]
      [New LWP 1397494]
      [New LWP 1397698]
      [New LWP 1397721]
      [New LWP 1397753]
      [New LWP 1397815]
      [New LWP 1397919]
      [New LWP 1397927]
      [New LWP 1398086]
      [New LWP 1398119]
      [New LWP 1398179]
      [New LWP 1398180]
      [New LWP 1398239]
      [New LWP 1398242]
      [New LWP 1398278]
      [New LWP 1398528]
      [New LWP 1398717]
      [New LWP 1398775]
      [New LWP 1398821]
      [New LWP 1398855]
      [New LWP 1398860]
      [New LWP 1398892]
      [New LWP 1398913]
      [New LWP 1398935]
      [New LWP 1399058]
      [New LWP 1399138]
      [New LWP 1399140]
      [New LWP 1399146]
      [New LWP 1399179]
      [New LWP 1399181]
      [New LWP 1399226]
      [New LWP 1399339]
      [New LWP 1399480]
      [New LWP 1399491]
      [New LWP 1399517]
      [New LWP 1399586]
      [New LWP 1399629]
      [New LWP 1399639]
      [New LWP 1399770]
      [New LWP 1399867]
      [New LWP 1399877]
      [New LWP 1399878]
      [New LWP 1399960]
      [New LWP 1400005]
      [New LWP 1400015]
      [New LWP 1400037]
      [New LWP 1400043]
      [New LWP 1400080]
      [New LWP 1400081]
      [New LWP 1400082]
      [New LWP 1400084]
      [New LWP 1400086]
      [New LWP 1400196]
      [New LWP 1400212]
      [New LWP 1400266]
      [New LWP 1400267]
      [New LWP 1400288]
      [New LWP 1400292]
      [New LWP 1400358]
      [New LWP 1400364]
      [New LWP 1400393]
      [New LWP 1400394]
      [New LWP 1400396]
      [New LWP 1400402]
      [New LWP 1401080]
      [New LWP 1401164]
      [New LWP 1401169]
      [New LWP 1401173]
      [New LWP 1401181]
      [New LWP 1402234]
      [New LWP 1402235]
      [New LWP 1402370]
      [New LWP 1402379]
      [New LWP 1402403]
      [New LWP 1402405]
      [New LWP 1402407]
      [New LWP 1402419]
      [New LWP 1402437]
      [New LWP 1402458]
      [New LWP 1402477]
      [New LWP 1402493]
      [New LWP 1402514]
      [New LWP 1402521]
      [New LWP 1402522]
      [New LWP 1402536]
      [New LWP 1402537]
      [New LWP 1402539]
      [New LWP 1402591]
      [New LWP 1402606]
      [New LWP 1402661]
      [New LWP 1402721]
      [New LWP 1402737]
      [New LWP 1402748]
      [New LWP 1402776]
      [New LWP 1402777]
      [New LWP 1402778]
      [New LWP 1402780]
      [New LWP 1402781]
      [New LWP 1402783]
      [New LWP 1402784]
      [New LWP 1402792]
      [New LWP 1402793]
      [New LWP 1402797]
      [New LWP 1402807]
      [New LWP 1402830]
      [New LWP 1402832]
      [New LWP 1402902]
      [New LWP 1402903]
      [New LWP 1402917]
      [New LWP 1402935]
      [New LWP 1402949]
      [New LWP 1402953]
      [New LWP 1402954]
      [New LWP 1402963]
      [New LWP 1402965]
      [New LWP 1402982]
      [New LWP 1402983]
      [New LWP 1402985]
      [New LWP 1402994]
      [New LWP 1402998]
      [New LWP 1403004]
      [New LWP 1403018]
      [New LWP 1403019]
      [New LWP 1403035]
      [New LWP 1403036]
      [New LWP 1403037]
      [New LWP 1403057]
      [New LWP 1403071]
      [New LWP 1403072]
      [New LWP 1403073]
      [New LWP 1403074]
      [New LWP 1403078]
      [New LWP 1403097]
      [New LWP 1403098]
      [New LWP 1403099]
      [New LWP 1403118]
      [New LWP 1403119]
      [New LWP 1403123]
      [New LWP 1403124]
      [New LWP 1403125]
      [New LWP 1403167]
      [New LWP 1403168]
      [New LWP 1403169]
      [New LWP 1403170]
      [New LWP 1403171]
      [New LWP 1403172]
      [New LWP 1403182]
      [New LWP 1403183]
      [New LWP 1403185]
      [New LWP 1403241]
      [New LWP 1403242]
      [New LWP 1403243]
      [New LWP 1403245]
      [New LWP 1403247]
      [New LWP 1403248]
      [New LWP 1403249]
      [New LWP 1403258]
      [New LWP 1403259]
      [New LWP 1403260]
      [New LWP 1403261]
      [New LWP 1403266]
      [New LWP 1403267]
      [New LWP 1403268]
      [New LWP 1403269]
      [New LWP 1403270]
      [New LWP 1403271]
      [New LWP 1403272]
      [New LWP 1403273]
      [New LWP 1403274]
      [New LWP 1403275]
      [New LWP 1403277]
      [New LWP 1403279]
      [New LWP 1403280]
      [New LWP 1403282]
      [New LWP 1403283]
      [New LWP 1403284]
      [New LWP 1403285]
      [New LWP 1403288]
      [New LWP 1403290]
      [New LWP 1403291]
      [New LWP 1403297]
      [New LWP 1403298]
      [New LWP 1403299]
      [New LWP 1403301]
      [New LWP 1403303]
      [New LWP 1403304]
      [New LWP 1403305]
      [New LWP 1403307]
      [New LWP 1403311]
      [New LWP 1403313]
      [New LWP 1403314]
      [New LWP 1403316]
      [New LWP 1403318]
      [New LWP 1403320]
      [New LWP 1403324]
      [New LWP 1403325]
      [New LWP 1403326]
      [New LWP 1403327]
      [New LWP 1403328]
      [New LWP 1403329]
      [New LWP 1403330]
      [New LWP 1403333]
      [New LWP 1403334]
      [New LWP 1403336]
      [New LWP 1403339]
      [New LWP 1403341]
      [New LWP 1403345]
      [New LWP 1403346]
      [New LWP 1403347]
      [New LWP 1403349]
      [New LWP 1403350]
      [New LWP 1403352]
      [New LWP 1403355]
      [New LWP 1403356]
      [New LWP 1403357]
      [New LWP 1403359]
      [New LWP 1403362]
      [New LWP 1403363]
      [New LWP 1403365]
      [New LWP 1403366]
      [New LWP 1403367]
      [New LWP 1403368]
      [New LWP 1403369]
      [New LWP 1403373]
      [New LWP 1403375]
      [New LWP 1403380]
      [New LWP 1403381]
      [New LWP 1403383]
      [New LWP 1403384]
      [New LWP 1403385]
      [New LWP 1403387]
      [New LWP 1403388]
      [New LWP 1403389]
      [New LWP 1403398]
      [New LWP 1403400]
      [New LWP 1403422]
      [New LWP 1403435]
      [New LWP 1403515]
      [New LWP 1403554]
      [New LWP 1403592]
      [New LWP 1403721]
      [New LWP 1403730]
      [New LWP 1403750]
      [New LWP 1403785]
      [New LWP 1403840]
      [New LWP 1403843]
      [New LWP 1403869]
      [New LWP 1404226]
      [New LWP 1404615]
      [New LWP 1404754]
      [New LWP 1404763]
      [New LWP 1404819]
      [New LWP 1404821]
      [New LWP 1404829]
      [New LWP 1404852]
      [New LWP 1404913]
      [New LWP 1404918]
      [New LWP 1404921]
      [New LWP 1404946]
      [New LWP 1404963]
      [New LWP 1404965]
      [New LWP 1404966]
      [New LWP 1404976]
      [New LWP 1404984]
      [New LWP 1404985]
      [New LWP 1404987]
      [New LWP 1405028]
      [New LWP 1476733]
      [New LWP 1516656]
      [New LWP 1527625]
      [New LWP 1532592]
      [New LWP 1532599]
      [New LWP 1532680]
      [New LWP 1532682]
      [New LWP 1532683]
      [New LWP 1532684]
      [New LWP 1532685]
      [New LWP 1532686]
      [New LWP 1532687]
      [New LWP 1532692]
      [New LWP 1532736]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      --Type <RET> for more, q to quit, c to continue without paging--c
      Core was generated by `/opt/wmf-mariadb106/bin/mysqld'.
      #0  0x00007fac896ec3ff in poll () from /lib/x86_64-linux-gnu/libc.so.6
      [Current thread is 1 (Thread 0x7fac893631c0 (LWP 3344701))]
      (gdb) thread 11
      [Switching to thread 11 (Thread 0x7fac808e3700 (LWP 3344713))]
      #0  0x00007fac896f19b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) frame 6
      #6  btr_cur_search_to_nth_level_func (index=index@entry=0x7f4c5406d500, level=level@entry=0, tuple=tuple@entry=0x7f49a9189ab0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7fac808e05f0, ahi_latch=<optimized out>,
          mtr=<optimized out>, autoinc=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc:2229
      2229	/root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc: No such file or directory.
      (gdb) p *index
      $1 = {static MAX_N_FIELDS = 1023, id = 6376, heap = 0x7f4c5405a008, name = {m_name = 0x7f4c5406d690 "PRIMARY"}, table = 0x7f4c54059c90, page = 3, merge_threshold = 50, type = 3, trx_id_offset = 0, n_user_defined_cols = 2, nulls_equal = 0, n_uniq = 2, n_def = 5,
        n_fields = 5, n_nullable = 1, n_core_fields = 5, n_core_null_bytes = 1, static NO_CORE_NULL_BYTES = 255, static DICT_INDEXES_ID = 3, cached = 1, to_be_dropped = 0, online_status = 0, uncommitted = 0, fields = 0x7f4c5406d730, parser = 0x0, new_vcol_info = 0x0,
        indexes = {prev = 0x0, next = 0x7f4c5406dcc0}, search_info = 0x7f4c5406d828, online_log = 0x0, stat_n_diff_key_vals = 0x7f4c5406d7f8, stat_n_sample_sizes = 0x7f4c5406d808, stat_n_non_null_key_vals = 0x7f4c5406d818, stat_index_size = 1320173,
        stat_n_leaf_pages = 1151220, stats_error_printed = false, stat_defrag_modified_counter = 8582, stat_defrag_n_pages_freed = 0, stat_defrag_n_page_split = 8582, stat_defrag_data_size_sample = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, stat_defrag_sample_next_slot = 0, rtr_ssn = {
          m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, rtr_track = 0x0, trx_id = 0, zip_pad = {mutex = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
                  __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, <No data fields>}, pad = {m = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 1408}, <No data fields>}},
          success = 125, failure = 0, n_rounds = 0}, lock = {lock = {pfs_psi = 0x7fac835f9b40, lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648},
              readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 213}, <No data fields>}, static WRITER = 2147483648}}, recursive = 65536, writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8,
              _M_i = 140378867906304}, <No data fields>}, static FOR_IO = <optimized out>, static RECURSIVE_X = 1, static RECURSIVE_U = 65536, static RECURSIVE_MAX = <optimized out>}}
      (gdb)
       
      (gdb) thread 11
      [Switching to thread 11 (Thread 0x7fac808e3700 (LWP 3344713))]
      #0  0x00007fac896f19b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) frame 6
      #6  btr_cur_search_to_nth_level_func (index=index@entry=0x7f4c5406d500, level=level@entry=0, tuple=tuple@entry=0x7f49a9189ab0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7fac808e05f0, ahi_latch=<optimized out>,
          mtr=<optimized out>, autoinc=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc:2229
      2229	/root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc: No such file or directory.
      (gdb) p index->table->space->chain.start->name
      $1 = 0x564d2408d618 "./enwiki/user_properties.ibd"
       
       
      (gdb) thread 11
      [Switching to thread 11 (Thread 0x7fac808e3700 (LWP 3344713))]
      #0  0x00007fac896f19b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) frame 6
      #6  btr_cur_search_to_nth_level_func (index=index@entry=0x7f4c5406d500, level=level@entry=0, tuple=tuple@entry=0x7f49a9189ab0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7fac808e05f0, ahi_latch=<optimized out>,
          mtr=<optimized out>, autoinc=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc:2229
      2229	/root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc: No such file or directory.
      (gdb) p index->table->space->chain.start->handle
      $1 = {m_file = 133, m_psi = 0x7fac8302afc0}
      
      

      Attachments

        1. gdb_db1132_24th_aug.txt
          5.32 MB
        2. gdb.txt
          126 kB
        3. io_slots.txt
          124 kB
        4. mariadb_full_bt_all_threads.txt
          8.01 MB


          Activity

            marostegui, this looks trivial. In mariadb_full_bt_all_threads.txt, I do not find any occurrence of thread_routine. Hence, this ought to be a duplicate of MDEV-28665, whose fix is included in the upcoming 10.6.9 release.

            Due to this bug, InnoDB would stop processing any page writes. That will eventually stop buffer pool eviction and log writes (because the log checkpoint cannot advance).

            marko Marko Mäkelä added a comment

            marko What's not included in the MDEV text is the startup log for the server:

            Jul 19 13:38:06 db1132 mysqld[3344701]: 2022-07-19 13:38:06 0 [Warning] mysqld: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd
            Jul 19 13:38:06 db1132 mysqld[3344701]: 2022-07-19 13:38:06 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
            

            cvicentiu Vicențiu Ciorbaru added a comment

            If this hang was not actually using the io_uring interface (due to some mlock() failure on startup), like danblack told me earlier, then there is not supposed to be any thread_routine among the stacks, but the "simulated AIO" would be used. In the core dump, you should find which thread might be holding waited-for page latches, like this one:

            #7  mtr_t::page_lock (this=this@entry=0x7f4cecb0fe60, block=block@entry=0x7fac2802c380, rw_latch=rw_latch@entry=1) at /root/mariadb-10.6.8/storage/innobase/mtr/mtr0mtr.cc:1240
                    fix_type = MTR_MEMO_PAGE_S_FIX
            

            This block=0x7fac2802c380 (page_id=5458903433219 a.k.a. 0x4F700000003, the primary key root page of some table, possibly user_properties) is being waited for by very many threads. It is possible that this block is in the tpool queue for block writes, but it has not been processed for some reason. wlad knows those data structures better than me, so he can suggest commands for finding blocks that are waiting for writes. I only know that those blocks that are in buf_pool.flush_list with oldest_modification_>1 are waiting to be written out.
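            As a side note for readers decoding these numbers: an InnoDB page identifier packs the tablespace id into the upper 32 bits and the page number into the lower 32 bits, which is how 5458903433219 (0x4F700000003) reads as tablespace 0x4F7, page 3. A minimal standalone sketch of that decoding (just the bit arithmetic, not MariaDB code):

            #include <cstdint>
            #include <cstdio>

            // Decode a 64-bit InnoDB page identifier: the upper 32 bits are the
            // tablespace id, the lower 32 bits are the page number within it.
            int main() {
              const uint64_t m_id = 5458903433219ULL;       // 0x4F700000003 from the trace
              const uint32_t space_id = m_id >> 32;         // 0x4F7
              const uint32_t page_no  = m_id & 0xFFFFFFFF;  // 3, the index root page
              std::printf("space=0x%X page=%u\n", space_id, page_no);
              return 0;
            }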

            Could you also check the following?

            print *(buf_page_t*)0x7fac2802c380
            

            This only works if the core dump includes the buffer pool (which is not the default ever since MDEV-10814 was implemented).

            marko Marko Mäkelä added a comment

            Yes, the buffer_pool wasn't included. Our hosts have almost 400GB of buffer pool configured.

            marostegui Manuel Arostegui added a comment

            Does the core dump include write_slots (defined in storage/innobase/os/os0file.cc)? That would be the queue that tpool/aio_simulated.cc should have been processing. Even though you can’t dereference IORequest::bpage, the pointer value should be there if it was in the queue.

            I realized that many of the blocked threads are actually waiting for a shared lock. A write would not block them, but a read (in case the block had been evicted from the buffer pool previously) would, because the block would be exclusively latched until the read has completed. So, you should check read_slots as well.

            Reads can be synchronous or asynchronous (triggered by buf_read_page_low()). I do not see anything like buf_read_page among the stack traces. It could be that an asynchronous read was scheduled, but for some reason not processed by the simulated AIO code path.

            marko Marko Mäkelä added a comment

            Is this what you meant? Let me know if you have more specific things I can check for you; I would be happy to run them:

            (gdb) p write_slots
            $4 = std::unique_ptr<io_slots> = {get() = 0x564d23f2efd0}
            (gdb) p read_slots
            $5 = std::unique_ptr<io_slots> = {get() = 0x564d23f06e80}
            

            marostegui Manuel Arostegui added a comment

            wlad, can you please help marostegui to extract the data?

            marko Marko Mäkelä added a comment

            print *(io_slots*)0x564d23f2efd0
            

            might be a step forward.

            If I remember correctly, the contents are a ring buffer, so even if we find a matching buffer pool block address there, it might belong to a removed entry.

            marko Marko Mäkelä added a comment

            gdb.txt
            See attached.
            Thanks Marko!

            marostegui Manuel Arostegui added a comment
            wlad Vladislav Vaintroub added a comment - - edited

            marko, io_slots are "unused" slots, cached IO control blocks, which are borrowed for the duration of async IO.
            We do not have a special registry for "used" slots, those that represent in-flight IO. The premise was that OS-level AIO works, and we would not need to debug Linux kernel bugs. If that premise has changed, specifically for Linux we could also add some kind of doubly linked list of in-flight IOs (sketched below). I would rather not generalize that to OSes where IO does not have such glaring bugs.
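            A rough illustration of what such an in-flight registry could look like (purely hypothetical; none of these names exist in the tree, and the real tpool code has no such structure): each control block would link itself into a global doubly linked list when the IO is submitted and unlink on completion, so a debugger could walk the list in a core dump.

            #include <cstdio>
            #include <mutex>

            // Hypothetical intrusive doubly linked list of in-flight IO control blocks,
            // so a core dump could be walked to find submitted-but-unfinished requests.
            struct inflight_io {
              int          fd   = -1;       // file handle the IO was submitted on
              inflight_io* prev = nullptr;
              inflight_io* next = nullptr;
            };

            struct inflight_registry {
              std::mutex   mtx;
              inflight_io* head = nullptr;

              void add(inflight_io* io) {    // called when the async IO is submitted
                std::lock_guard<std::mutex> g(mtx);
                io->next = head;
                io->prev = nullptr;
                if (head) head->prev = io;
                head = io;
              }
              void remove(inflight_io* io) { // called from the completion callback
                std::lock_guard<std::mutex> g(mtx);
                if (io->prev) io->prev->next = io->next; else head = io->next;
                if (io->next) io->next->prev = io->prev;
              }
            };

            int main() {
              inflight_registry reg;
              inflight_io io;
              io.fd = 133;                   // e.g. a read on file handle 133
              reg.add(&io);                  // IO submitted
              for (inflight_io* p = reg.head; p; p = p->next)  // what gdb would walk
                std::printf("in flight: fd=%d\n", p->fd);
              reg.remove(&io);               // IO completed
              return 0;
            }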


            marostegui, I forgot that the m_userdata of each individual entry needs to be explicitly converted to IORequest. But, let me try to find the block address 0x7fac2802c380 in that format:

            (gdb) p (char[8])0x7fac2802c380
            $1 = "\200\303\002(\254\177\000"
            

            There is no occurrence of the string

            \254\177
            

            in gdb.txt, but the data looks otherwise rather reasonable to me.
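            (The (char[8]) cast above shows the pointer value as its in-memory byte sequence; on little-endian x86-64 the address 0x7fac2802c380 is stored least-significant byte first, so \254\177 (0xAC 0x7F) are its high-order bytes. A small standalone sketch of that byte ordering, reusing the address from the trace purely as a value:)

            #include <cstdint>
            #include <cstdio>
            #include <cstring>

            int main() {
              const uint64_t addr = 0x7fac2802c380ULL;   // block address from the core dump
              unsigned char bytes[8];
              std::memcpy(bytes, &addr, sizeof bytes);   // copy out the in-memory representation
              // On little-endian hardware this prints 80 c3 02 28 ac 7f 00 00,
              // matching gdb's "\200\303\002(\254\177\000" octal rendering.
              for (unsigned char b : bytes)
                std::printf("%02x ", b);
              std::printf("\n");
              return 0;
            }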

            Somewhere else, cvicentiu wrote that the file handle of the table enwiki.user_properties was 133. I do not find that in the dump, but there is m_fh = 125.

            Can you also dump the read_slots? It could be more interesting.

            print *(io_slots*)0x564d23f06e80
            

            marko Marko Mäkelä added a comment

            Yeah, what cvicentiu mentioned was:

            Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
            --Type <RET> for more, q to quit, c to continue without paging--c
            Core was generated by `/opt/wmf-mariadb106/bin/mysqld'.
            #0  0x00007fac896ec3ff in poll () from /lib/x86_64-linux-gnu/libc.so.6
            [Current thread is 1 (Thread 0x7fac893631c0 (LWP 3344701))]
            (gdb) thread 11
            [Switching to thread 11 (Thread 0x7fac808e3700 (LWP 3344713))]
            #0  0x00007fac896f19b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
            (gdb) frame 6
            #6  btr_cur_search_to_nth_level_func (index=index@entry=0x7f4c5406d500, level=level@entry=0, tuple=tuple@entry=0x7f49a9189ab0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7fac808e05f0, ahi_latch=<optimized out>,
                mtr=<optimized out>, autoinc=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc:2229
            2229	/root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc: No such file or directory.
            (gdb) p index->table->space->chain.start->handle
            $1 = {m_file = 133, m_psi = 0x7fac8302afc0}
            

            That was at: https://phabricator.wikimedia.org/P32031#134291

            See attached the output for the requested command:

            print *(io_slots*)0x564d23f06e80
            

            io_slots.txt

            marostegui Manuel Arostegui added a comment

            In io_slots.txt there are several occurrences of 0x7fac (little-endian: \254\177), but no match for 0x7fac2802c380. Also, there is no m_fh=133 there. I do not have any other ideas on how we could find more information about that problematic block. It may still be a bug in the simulated AIO code, but I am not familiar with that code.

            marko Marko Mäkelä added a comment

            Theoretically, this could be the same bug as MDEV-27983, but without having access to the buffer pool contents, we will never know that. Next time such a hang occurs, it would be a good idea to attach gdb to the server process and extract as much data as possible.

            marko Marko Mäkelä added a comment
            marostegui Manuel Arostegui added a comment - - edited

            So we kept the server in the state it was in after the hang, and this is what we ran:

            gcore $PID
            

            Is there something else more specific you'd like us to get when this happens again? Or would this be enough to start with: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-full-backtraces-for-all-threads-from-a-running-mysqld-process

            The "good" thing about all this is that the mysqld process doesn't seem to be crashing.

            wlad Vladislav Vaintroub added a comment - - edited

            The io_slots printout says that all IOs in question (in this case read IOs) have finished. I derive that from
            "m_cache = std::vector of length 1024, capacity 1024"

            Explanation: Borrowing an aiocb from the io_slots cache ends up in std::vector::pop_back(), which decreases the vector length. If vector length == vector capacity, then nothing is borrowed. Not only that, the callback function was also called, because the tpool::aiocb is returned to the cache within the IO callback.

            So, there is no read AIO pending.
            marko, FYI

            danblack Daniel Black added a comment -

            m_fh=125, page.ibd (link to all fds), which from the PROCESSLIST doesn't show up in an active table query (nor in the more complete queries in INNODB STATUS).

            SHOW ENGINE INNODB STATUS lists Pending reads 0, so at least buf_page_t::read_complete was started.


            marostegui, based on what I currently know, this could be a duplicate of MDEV-27983. Coincidentally, sstamm posted there some new GDB output on the day after this bug was filed.

            marko Marko Mäkelä added a comment
            wlad Vladislav Vaintroub added a comment - - edited

            Previously, we established that all the background IOs have finished, and all post-IO callbacks have finished, too.

            Now there are 2 threads that are hanging: one foreground (hanging in btr_cur_search_to_nth_level_func, deep down in srw_mutex_impl<true>::wait), and one background (purge_coordinator_task, deep down in buf_page_get_low reading the page).

            Why they are hanging is a mystery to me, but I think the author of srw_mutex_impl<true>::wait can perhaps find out the holder of the mutex, and perhaps what that holder is doing, and eventually find the place where the holder left without unlocking the synchronization primitive.


            wlad, according to the information that my previous comment referred to, the number of threads waiting for each blocked page latch matches the number of waiters that the lock word knows about. In other words, it appears that each of those page latches is being held in exclusive mode. My primary suspect is that a page read request was not fully processed.

            marko Marko Mäkelä added a comment

            marko we've had two more occurrences of this and the hosts haven't been touched yet. I will be talking to danblack anyway to see if we can do some troubleshooting on the side, but is there anything specific you'd like me to run/capture on those two hosts that could help you?

            Thanks

            marostegui Manuel Arostegui added a comment

            marostegui, if the use of the InnoDB adaptive hash index is enabled, then MDEV-29336 might explain this.

            marko Marko Mäkelä added a comment

            Thanks for the fast answer; in our case, it is disabled:

            root@db1132.eqiad.wmnet[(none)]> show global variables like 'innodb_adaptive_hash_index';
            +----------------------------+-------+
            | Variable_name              | Value |
            +----------------------------+-------+
            | innodb_adaptive_hash_index | OFF   |
            +----------------------------+-------+
            1 row in set (0.002 sec)
            

            marostegui Manuel Arostegui added a comment

            Per my chat with Daniel, here are some more traces: gdb_db1132_24th_aug.txt
            Commands run:

            set print static-members off
            set print frame-arguments all
            thread apply all backtrace full
            thread apply all -s -- frame apply level 2 -s print/x *this
            p *read_slots
            p *write_slots
            

            As p *read_slots and p *write_slots didn't work, I included the output of p read_slots and p write_slots instead.

            marostegui Manuel Arostegui added a comment

            The read_slots and write_slots are wlad’s creation, and he should be able to help you dump their contents. I don’t think that the following at the end of gdb_db1132_24th_aug.txt is very helpful:

            Could not find operator*.
            Could not find operator*.
            $46 = {_M_t = {<std::__uniq_ptr_impl<io_slots, std::default_delete<io_slots> >> = {_M_t = {<std::_Tuple_impl<0, io_slots*, std::default_delete<io_slots> >> = {<std::_Tuple_impl<1, std::default_delete<io_slots> >> = {<std::_Head_base<1, std::default_delete<io_slots>, true>> = {<std::default_delete<io_slots>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, io_slots*, false>> = {_M_head_impl = 0x55f7467b5e80}, <No data fields>}, <No data fields>}}, <No data fields>}}
            $47 = {_M_t = {<std::__uniq_ptr_impl<io_slots, std::default_delete<io_slots> >> = {_M_t = {<std::_Tuple_impl<0, io_slots*, std::default_delete<io_slots> >> = {<std::_Tuple_impl<1, std::default_delete<io_slots> >> = {<std::_Head_base<1, std::default_delete<io_slots>, true>> = {<std::default_delete<io_slots>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, io_slots*, false>> = {_M_head_impl = 0x55f7467e9fd0}, <No data fields>}, <No data fields>}}, <No data fields>}}
            

            I can identify 3 InnoDB latches in the output:

            sed -ne 's/^\(#2 .*this=\)/\1/p' < thread_apply_all_backtrace |sort|uniq -c
            

            The input file was the prefix of gdb_db1132_24th_aug.txt that contains the first output of the command thread apply all backtrace. The output was:

                  1 #2  srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7eff4d04bd10) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
                 39 #2  srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f22741195f8) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
                  1 #2  ssux_lock_impl<true>::wr_wait (this=0x7f22741195f8, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
                  1 #2  ssux_lock_impl<true>::wr_wait (this=this@entry=0x7f1d1408a538, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
            

            I omitted the output related to std::condition_variable, which supposedly is for tpool::thread_pool_generic::wait_for_tasks.

            The address 0x7f22741195f8 occurs in 39+1 stack frames. Let us check each of the InnoDB latches:

            Thread 53 (Thread 0x7f00148c1700 (LWP 4008608) "mysqld"):
            #2  srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7eff4d04bd10) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
            $22 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
            

            The page is exclusively latched, and one waiter (Thread 53) exists. This thread is waiting for a shared latch on page id 4131762166058 = 0x3c20037592a. There is no sign of the thread that would hold the exclusive latch. It would have helped to dump the entire contents of the page latch, including the writer member, which would identify the owning thread of the exclusive lock.

            Thread 65 (Thread 0x7eff7b767700 (LWP 4008909) "mysqld"):
            #2  ssux_lock_impl<true>::wr_wait (this=this@entry=0x7f1d1408a538, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
            $17 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
            

            Here we have a little more information about the lock object. The output includes the first component sux_lock::lock but not the subsequent fields. The blocked thread is attempting to lock page identifier 4131760041487 = 0x3c20016ee0f exclusively, but some other thread is apparently holding it in shared mode. In the attachment, I did not find any references to the block in other threads.

            Last, let us check the most popular latch, for page identifier 4131758538755 = 0x3c200000003. That is the clustered index root page of tablespace 0x3c2 (the same tablespace that was also the subject of the other page latch conflicts). The most information about the latch contents is in this output:

            Thread 12 (Thread 0x7f5f586ac700 (LWP 2484735) "mysqld"):
            #2  ssux_lock_impl<true>::wr_wait (this=0x7f22741195f8, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
            $45 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000028}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
            

            The first component is 0x28 (40), meaning that one thread is holding an exclusive latch (or waiting for readers to exit), and 39 threads are waiting for the exclusive latch to be released. Coincidentally, there are 39 threads waiting for a shared latch, and Thread 12 is waiting for an exclusive latch to be granted. It is a mystery which thread is holding the exclusive latch that is blocking everyone else.
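            As a reading aid for these lock words (an illustration of the interpretation given in this comment, not the actual srw_lock implementation): the top bit of the 32-bit writer word marks that the latch is held exclusively (or that a writer is waiting for readers to drain), and the low 31 bits count the threads that have acquired or are queued on it, so 0x80000028 reads as one holder plus 39 waiters.

            #include <cstdint>
            #include <cstdio>

            // Decode the writer lock words quoted above under that interpretation:
            // HOLDER in the top bit, acquirer count (holder + waiters) in the rest.
            int main() {
              const uint32_t HOLDER = 0x80000000;
              const uint32_t words[] = {0x80000002, 0x80000001, 0x80000028};
              for (uint32_t w : words) {
                const bool     held      = w & HOLDER;   // exclusive holder present
                const uint32_t acquirers = w & ~HOLDER;  // holder + queued waiters
                std::printf("0x%08X: held=%d waiters=%u\n",
                            w, held, held ? acquirers - 1 : acquirers);
              }
              return 0;
            }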

            The following commands should have answered the question: which threads are holding the exclusive latches?

            print/x *(block_lock*)0x7eff4d04bd10
            print/x *(block_lock*)0x7f1d1408a538
            print/x *(block_lock*)0x7f22741195f8
            

            marko Marko Mäkelä added a comment

            This looks very much like MDEV-27983.

            marko Marko Mäkelä added a comment

            Thanks Marko for the extensive review; here is the output of the requested commands:

            (gdb) print/x *(block_lock*)0x7eff4d04bd10
            $1 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
                      _M_i = 0x80000002}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
                    static _S_alignment = 0x4, _M_i = 0x80000000}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x1,
              writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x7f0014ae7700}, <No data fields>},
              static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
             
            (gdb) print/x *(block_lock*)0x7f1d1408a538
            $2 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
                      _M_i = 0x80000001}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
                    static _S_alignment = 0x4, _M_i = 0x80000001}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x0,
              writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x0}, <No data fields>},
              static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
             
            (gdb) print/x *(block_lock*)0x7f22741195f8
            $3 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
                      _M_i = 0x80000028}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
                    static _S_alignment = 0x4, _M_i = 0x80000002}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x0,
              writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x0}, <No data fields>},
              static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
            

            marostegui Manuel Arostegui added a comment

            marostegui, the only non-nullptr writer thread identifier that we got is 0x7f0014ae7700. For future reference, you could have tried to identify the thread with the following:

            thread find 0x7f0014ae7700
            

            In gdb_db1132_24th_aug.txt, I found a match for this:

            mariadb-10.6.8

            Thread 72 (Thread 0x7f0014ae7700 (LWP 4009073) "mysqld"):
            #0  0x00007f5faf4b1c61 in clock_nanosleep () from target:/lib/x86_64-linux-gnu/libc.so.6
            No symbol table info available.
            #1  0x00007f5faf4b7443 in nanosleep () from target:/lib/x86_64-linux-gnu/libc.so.6
            No symbol table info available.
            #2  0x000055f744758bbb in std::this_thread::sleep_for<long, std::ratio<1l, 1000000l> > (__rtime=<optimized out>, __rtime=<optimized out>) at /usr/include/c++/10/thread:401
                    __s = {__r = <optimized out>}
                    __ns = {__r = <optimized out>}
                    __ts = {tv_sec = 0, tv_nsec = 94044}
                    __s = {__r = <optimized out>}
                    __ns = {__r = <optimized out>}
                    __ts = {tv_sec = <optimized out>, tv_nsec = <optimized out>}
            #3  buf_page_get_low (page_id={m_id = 4131762166058}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/buf/buf0buf.cc:2812
            

            This is trying to access page 0x3c20037592a (again the same tablespace 0x3c2) while apparently holding the page latch. Let us see what is blocking it:

            	if (UNIV_UNLIKELY(!block->page.frame)) {
            		if (!block->page.lock.x_lock_try()) {
            			/* The page is being read or written, or
            			another thread is executing buf_zip_decompress()
            			in buf_page_get_low() on it. */
            			block->page.unfix();
            			std::this_thread::sleep_for(
            				std::chrono::microseconds(100));
            			goto loop;
            		}
             
            		buf_block_t *new_block = buf_LRU_get_free_block(false);
            		buf_block_init_low(new_block);
             
            wait_for_unfix:
            		mysql_mutex_lock(&buf_pool.mutex);
            		page_hash_latch& hash_lock=buf_pool.page_hash.lock_get(chain);
             
            		/* It does not make sense to use
            		transactional_lock_guard here, because buf_relocate()
            		would likely make a  memory transaction too large. */
            		hash_lock.lock();
             
            		/* block->page.lock implies !block->page.can_relocate() */
            		ut_ad(&block->page == buf_pool.page_hash.get(page_id, chain));
             
            		/* Wait for any other threads to release their buffer-fix
            		on the compressed-only block descriptor.
            		FIXME: Never fix() before acquiring the lock.
            		Only in buf_page_get_gen(), buf_page_get_low(), buf_page_free()
            		we are violating that principle. */
            		state = block->page.state();
             
            		switch (state) {
            		case buf_page_t::UNFIXED + 1:
            		case buf_page_t::IBUF_EXIST + 1:
            		case buf_page_t::REINIT + 1:
            			break;
            		default:
            			ut_ad(state < buf_page_t::READ_FIX);
             
            			if (state < buf_page_t::UNFIXED + 1) {
            				ut_ad(state > buf_page_t::FREED);
            				ut_ad(mode == BUF_GET_POSSIBLY_FREED
            				      || mode == BUF_PEEK_IF_IN_POOL);
            				block->page.unfix();
            				block->page.lock.x_unlock();
            				hash_lock.unlock();
            				buf_LRU_block_free_non_file_page(new_block);
            				mysql_mutex_unlock(&buf_pool.mutex);
            				return nullptr;
            			}
             
            			mysql_mutex_unlock(&buf_pool.mutex);
            			hash_lock.unlock();
            			std::this_thread::sleep_for(
            				std::chrono::microseconds(100));
            			goto wait_for_unfix;
            		}
            

            The block pointers are unavailable in the output, but I found that Thread 53 is waiting on the same page identifier:

            mariadb-10.6.8

            #6  sux_lock<ssux_lock_impl<true> >::s_lock (this=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/include/sux_lock.h:362
            No locals.
            #7  buf_page_get_low (page_id={m_id = 4131762166058}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/buf/buf0buf.cc:2698
            

            Thread 53 is blocked waiting for a shared latch on the page, because Thread 72 is holding the exclusive page latch.

            It seems possible to me that Thread 72 is waiting for Thread 53 to release its buffer-fix on the block. It would also be helpful to know the value of block->page.zip.fix.
            I am guessing here, but it seems that print &((buf_page_t*)0)->lock would return 0x18, that is, the offset of the lock member within buf_page_t. If that is the case, the following would show the entire block descriptor:

            print *(buf_block_t*)0x7eff4d04bcf8
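
            As a side note, the arithmetic behind that offset guess can be sketched with simplified stand-in types. The struct layout and the lock-word address below are assumptions chosen only so that the numbers line up with the print command above; the real definitions live in storage/innobase/include/buf0buf.h:

            // Sketch of the pointer arithmetic: if a stack trace shows a thread
            // blocked on the address of block->page.lock, subtracting the offset of
            // the lock member (the guessed 0x18) yields the address of the enclosing
            // buf_page_t, which equals the buf_block_t, because page is its first member.
            #include <cstddef>
            #include <cstdint>
            #include <cstdio>

            struct fake_page_id  { uint64_t m_id; };
            struct fake_sux_lock { uint64_t word[3]; };      // stand-in for sux_lock
            struct fake_buf_page {                           // stand-in for buf_page_t
              fake_page_id id_;                              // offset 0x00
              void        *hash;                             // offset 0x08
              uint64_t     oldest_modification_;             // offset 0x10
              fake_sux_lock lock;                            // offset 0x18 (the guess)
            };
            struct fake_buf_block {                          // stand-in for buf_block_t
              fake_buf_page page;                            // first member, offset 0
            };

            int main() {
              // Hypothetical lock-word address a blocked thread might be waiting on
              // (0x7eff4d04bcf8 + 0x18, i.e. derived from the address printed above).
              uintptr_t lock_addr  = 0x7eff4d04bd10;
              uintptr_t block_addr = lock_addr - offsetof(fake_buf_page, lock);
              std::printf("offset of page.lock = %#zx\n", offsetof(fake_buf_page, lock));
              std::printf("buf_block_t*        = %#lx\n", (unsigned long)block_addr);
              return 0;
            }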
            

            This code is related to ROW_FORMAT=COMPRESSED pages. Can you confirm that the table uses ROW_FORMAT=COMPRESSED?

            marko Marko Mäkelä added a comment

            At least for MDEV-27983 I can confirm that we are using ROW_FORMAT=COMPRESSED

            sstamm Sebastian Stamm added a comment

            Thanks Marko.
            For what it's worth:

            (gdb) print *(buf_block_t*)0x7eff4d04bcf8
            $2 = {page = {id_ = {m_id = 4131762166058}, hash = 0x0, oldest_modification_ = {m = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 0}, <No data fields>}}, static NOT_USED = 0, static MEMORY = 1, static REMOVE_HASH = 2, static FREED = 3, static UNFIXED = 536870912, static IBUF_EXIST = 1073741824, static REINIT = 1610612736, static READ_FIX = 2147483648, static WRITE_FIX = 2684354560, static WRITE_FIX_IBUF = 3221225472, static WRITE_FIX_REINIT = 3758096384, static LRU_MASK = 3758096384, lock = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483650}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483648}, <No data fields>}, static WRITER = 2147483648}, recursive = 1, writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 139638323705600}, <No data fields>}, static FOR_IO = 18446744073709551615, static RECURSIVE_X = 1, static RECURSIVE_U = 65536, static RECURSIVE_MAX = <optimized out>}, frame = 0x0, zip = {data = 0x7f2bc3594000 "4\213p/", m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 4, fix = {m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 536870914}, <No data fields>}}}, list = {prev = 0x0, next = 0x0}, LRU = {prev = 0x7efa50be5fc8, next = 0x7f5f4c0c7880}, old = 1, freed_page_clock = 0, access_time = {m_counter = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}}, unzip_LRU = {prev = 0x95, next = 0x0}, modify_clock = 0, n_bytes = 136, n_fields = 0, n_hash_helps = 0, left_side = false, curr_n_fields = 204, curr_n_bytes = 3842, curr_left_side = 0, index = 0x7efda684f868}
            

            I cannot yet confirm whether we use ROW_FORMAT=COMPRESSED, as I would need to "unstall" the host, and then we'd lose the ability to get traces from it. I am pretty sure we use COMPRESSED on that host, but not 100% sure until I restart it.

            marostegui Manuel Arostegui added a comment

            It looks like the host responded after a few minutes, and we do use Compressed for the majority of tables on that DB.

            root@db1132.eqiad.wmnet[information_schema]> select table_name,row_format from tables where table_schema='enwiki';
            +------------------------------+------------+
            | table_name                   | row_format |
            +------------------------------+------------+
            | ores_model                   | Compressed |
            | ores_classification          | Compressed |
            | babel                        | Compressed |
            | bv2017_edits                 | Compressed |
            | securepoll_strike            | Compressed |
            | page_assessments_projects    | Compressed |
            | oldimage                     | Compressed |
            | mathoid                      | Compressed |
            | page                         | Compressed |
            | slots                        | Compressed |
            | linter                       | Compressed |
            | archive                      | Compressed |
            | change_tag_def               | Compressed |
            | recentchanges                | Compressed |
            | templatelinks                | Compressed |
            | content_models               | Compressed |
            | job                          | Compressed |
            | ip_changes                   | Compressed |
            | logging                      | Compressed |
            | discussiontools_subscription | Dynamic    |
            | revision                     | Compressed |
            | actor                        | Compressed |
            | comment                      | Compressed |
            | global_block_whitelist       | Compressed |
            | externallinks                | Compressed |
            | linktarget                   | Dynamic    |
            | mcdc2021_edits               | Dynamic    |
            | page_assessments             | Compressed |
            | image                        | Compressed |
            | ipblocks                     | Compressed |
            | watchlist                    | Compressed |
            | watchlist_expiry             | Compressed |
            | sites                        | Compressed |
            | ucoc_edits                   | Dynamic    |
            | content                      | Compressed |
            | wikilove_log                 | Compressed |
            | bv2022_edits                 | Dynamic    |
            | securepoll_votes             | Compressed |
            | site_identifiers             | Compressed |
            | searchindex                  | Compressed |
            | flaggedrevs                  | Compressed |
            | user_properties              | Compressed |
            | geo_tags                     | Compressed |
            | __wmf_checksums              | Compressed |
            | abuse_filter_log             | Compressed |
            | category                     | Compressed |
            | pagelinks                    | Compressed |
            | optin_survey_old             | Compressed |
            | abuse_filter                 | Compressed |
            | abuse_filter_action          | Compressed |
            | abuse_filter_history         | Compressed |
            | arbcom1_vote                 | Compressed |
            | filearchive                  | Compressed |
            | bv2009_edits                 | Compressed |
            | bv2011_edits                 | Compressed |
            | categorylinks                | Compressed |
            | imagelinks                   | Compressed |
            | site_stats                   | Compressed |
            | cu_log                       | Compressed |
            | cur                          | Compressed |
            | cu_changes                   | Compressed |
            | page_restrictions            | Compressed |
            | langlinks                    | Compressed |
            | uploadstash                  | Compressed |
            | exarchive                    | Compressed |
            | exrevision                   | Compressed |
            | user_newtalk                 | Compressed |
            | flaggedpage_config           | Compressed |
            | flaggedpage_pending          | Compressed |
            | flaggedpages                 | Compressed |
            | flaggedrevs_promote          | Compressed |
            | page_props                   | Compressed |
            | flaggedrevs_statistics       | Compressed |
            | flaggedrevs_tracking         | Compressed |
            | flaggedtemplates             | Compressed |
            | change_tag                   | Compressed |
            | hidden                       | Compressed |
            | interwiki                    | Compressed |
            | l10n_cache                   | Compressed |
            | securepoll_cookie_match      | Compressed |
            | bv2021_edits                 | Dynamic    |
            | log_search                   | Compressed |
            | revision_comment_temp        | Compressed |
            | module_deps                  | Compressed |
            | moodbar_feedback             | Compressed |
            | moodbar_feedback_response    | Compressed |
            | user                         | Compressed |
            | querycache_info              | Compressed |
            | pagetriage_page              | Compressed |
            | pagetriage_page_tags         | Compressed |
            | pagetriage_tags              | Compressed |
            | pif_edits                    | Compressed |
            | objectcache                  | Compressed |
            | slot_roles                   | Compressed |
            | protected_titles             | Compressed |
            | querycache                   | Compressed |
            | bv2015_edits                 | Compressed |
            | querycachetwo                | Compressed |
            | redirect                     | Compressed |
            | securepoll_entity            | Compressed |
            | securepoll_lists             | Compressed |
            | securepoll_msgs              | Compressed |
            | securepoll_properties        | Compressed |
            | securepoll_questions         | Compressed |
            | securepoll_voters            | Compressed |
            | spoofuser                    | Compressed |
            | text                         | Compressed |
            | transcache                   | Compressed |
            | updatelog                    | Compressed |
            | user_former_groups           | Compressed |
            | user_groups                  | Compressed |
            | vote_log                     | Compressed |
            | iwlinks                      | Compressed |
            | bv2013_edits                 | Compressed |
            | securepoll_options           | Compressed |
            | wbc_entity_usage             | Compressed |
            | betafeatures_user_counts     | Compressed |
            | pagetriage_log               | Compressed |
            | transcode                    | Compressed |
            | securepoll_elections         | Compressed |
            | ipblocks_restrictions        | Compressed |
            +------------------------------+------------+
            

            marostegui Manuel Arostegui added a comment

            The field block->page.zip.fix contains the value 536870914, that is, 0x20000002 or buf_page_t::UNFIXED + 2. One of the references would be held by Thread 53 and the other by Thread 72, which are blocking each other.
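
            As a quick sanity check of that arithmetic, a small stand-alone sketch that decodes the observed value using the constants from the buf_block_t dump above (illustrative only, not server code):

            // Decode block->page.zip.fix: the low bits count buffer-fixes, the high
            // bits encode the page state (constants copied from the dump above).
            #include <cstdint>
            #include <cstdio>

            int main() {
              const uint32_t UNFIXED  = 536870912;   // 0x20000000
              const uint32_t READ_FIX = 2147483648;  // 0x80000000
              const uint32_t observed = 536870914;   // the zip.fix value in the dump

              std::printf("observed              = %#x\n", observed);                       // 0x20000002
              std::printf("observed - UNFIXED    = %u buffer-fixes\n", observed - UNFIXED); // 2
              std::printf("read- or write-fixed? = %s\n",
                          observed >= READ_FIX ? "yes" : "no");                             // no
              // So no I/O is in progress; the two fixes are the ones held by
              // Thread 53 and Thread 72.
              return 0;
            }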

            The reason for the hang is clear now. There is no ongoing I/O. What apparently happened is that two threads requested the same ROW_FORMAT=COMPRESSED block at the same time, while the block exists only in compressed form in the buffer pool (buf_page_t::frame contains nullptr).
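
            A minimal toy model of that wait cycle, built on standard-library primitives rather than the InnoDB ones (illustrative only; timeouts are added so the demo terminates instead of hanging like the server):

            // "thread 72" holds the exclusive page latch and waits for the buffer-fix
            // count to drop to 1; "thread 53" holds one of the buffer-fixes and waits
            // for a shared latch on the same page. Neither can make progress.
            #include <atomic>
            #include <chrono>
            #include <cstdio>
            #include <shared_mutex>
            #include <thread>

            int main() {
              std::shared_timed_mutex page_latch;   // stand-in for block->page.lock
              std::atomic<int> fix_count{2};        // UNFIXED + 2, as observed

              std::thread t72([&] {
                std::unique_lock<std::shared_timed_mutex> x(page_latch); // exclusive latch
                for (int i = 0; i < 100; ++i) {     // wait_for_unfix-style polling
                  if (fix_count.load() == 1) { std::puts("t72: can proceed"); return; }
                  std::this_thread::sleep_for(std::chrono::milliseconds(1));
                }
                std::puts("t72: still waiting for the other buffer-fix to be released");
              });

              std::this_thread::sleep_for(std::chrono::milliseconds(5)); // let t72 latch first

              std::thread t53([&] {
                // Holds one buffer-fix and wants an s-latch. In the real server this
                // wait is unbounded, and the fix is only released after the latch is
                // acquired -- which is exactly why neither thread can make progress.
                if (page_latch.try_lock_shared_for(std::chrono::milliseconds(20))) {
                  page_latch.unlock_shared();
                  fix_count.fetch_sub(1);
                  std::puts("t53: got the s-latch");
                } else {
                  std::puts("t53: still waiting for the s-latch, buffer-fix not released");
                }
              });

              t72.join();
              t53.join();
              return 0;
            }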

            This hang might have been introduced by MDEV-27058.

            marko Marko Mäkelä added a comment

            Thread 53 is waiting here:

            	if (state > buf_page_t::READ_FIX && state < buf_page_t::WRITE_FIX) {
            		if (mode == BUF_PEEK_IF_IN_POOL) {
            ignore_block:
            			ut_ad(mode == BUF_GET_POSSIBLY_FREED
            			      || mode == BUF_PEEK_IF_IN_POOL);
            			block->unfix();
            			if (err) {
            				*err = DB_CORRUPTION;
            			}
            			return nullptr;
            		}
             
            		/* A read-fix is released after block->page.lock
            		in buf_page_t::read_complete() or
            		buf_pool_t::corrupted_evict(), or
            		after buf_zip_decompress() in this function. */
            		block->page.lock.s_lock();
            

            When it started waiting, the block was read-fixed, but that is no longer the case, because we observed state == buf_page_t::UNFIXED + 2. After the read was completed, Thread 72 grabbed an exclusive page latch.

            This is a little tricky, because the buffer-fix is protecting the block from relocation. After the ROW_FORMAT=COMPRESSED block was read into the buffer pool, its block descriptor must be relocated, because the block descriptors for uncompressed page frames will be preallocated. A possible fix might be that Thread 53 would invoke s_lock_try(). If that fails, it should release the buffer-fix and look up the block from buf_pool.page_hash again. In this way, Thread 72 should be able to proceed, and Thread 53 would not attempt to access a freed block descriptor.
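
            A minimal sketch of that suggested retry, reusing the toy model above (standard-library primitives only; an illustration of the idea, not the actual patch):

            // If "thread 53" uses a try-lock and releases its buffer-fix whenever the
            // attempt fails (re-fixing after the next page_hash lookup), "thread 72"
            // can observe the fix count drop to 1 and proceed, breaking the cycle.
            #include <atomic>
            #include <chrono>
            #include <cstdio>
            #include <shared_mutex>
            #include <thread>

            int main() {
              std::shared_timed_mutex page_latch;   // stand-in for block->page.lock
              std::atomic<int> fix_count{2};        // two buffer-fixes, as observed

              std::thread t72([&] {
                std::unique_lock<std::shared_timed_mutex> x(page_latch);
                while (fix_count.load() != 1)       // wait_for_unfix-style polling
                  std::this_thread::sleep_for(std::chrono::microseconds(100));
                std::puts("t72: other buffer-fix released, proceeding");
              });                                   // x-latch released when t72 is done

              std::this_thread::sleep_for(std::chrono::milliseconds(5)); // let t72 latch first

              std::thread t53([&] {
                for (;;) {
                  if (page_latch.try_lock_shared()) {          // s_lock_try() equivalent
                    std::puts("t53: got the s-latch after retrying the lookup");
                    page_latch.unlock_shared();
                    fix_count.fetch_sub(1);
                    return;
                  }
                  // Failed: release the buffer-fix, back off, then redo the page_hash
                  // lookup (modelled here by simply re-fixing before the next attempt).
                  fix_count.fetch_sub(1);
                  std::this_thread::sleep_for(std::chrono::milliseconds(1));
                  fix_count.fetch_add(1);
                }
              });

              t72.join();
              t53.join();
              return 0;
            }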

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - This hang was introduced in MDEV-27058 .

            This is an exact duplicate of the case that was reported in MDEV-27983.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              marostegui Manuel Arostegui
              Votes: 0
              Watchers: 9
