MariaDB Server / MDEV-36228

SIGABRT in page_rec_check from page_rec_is_supremum

Details

    Description

      CS 11.8.1 6f1161aa34cbb178b00fc24cbc46e2e0e2af767a (Debug) Build 05/03/2025

      Core was generated by `/test/MD050325-mariadb-11.8.1-linux-x86_64-dbg/bin/mariadbd --no-defaults --max'.
      Program terminated with signal SIGABRT, Aborted.
      #0  0x0000556cd0ecc351 in page_rec_check (rec=0x53853879e71f "") at include/page0page.inl:315

      [Current thread is 1 (Thread 0x7c1d22df86c0 (LWP 1498795))]
      (gdb) bt
      #0  0x0000556cd0ecc351 in page_rec_check (rec=0x53853879e71f "") at include/page0page.inl:315
      #1  0x0000556cd0ede1a5 in page_rec_is_supremum (rec=0x53853879e71f "") at include/page0page.inl:167
      #2  0x0000556cd100eb95 in page_simple_validate_new (page=0x53853879c000 "") at /test/11.8_dbg/storage/innobase/page/page0page.cc:1875
      #3  0x0000556cd0ff6ba7 in page_cur_delete_rec (cursor=0x478154351508, offsets=0x7c1d22df6a00, mtr=0x7c1d22df6e48) at /test/11.8_dbg/storage/innobase/page/page0cur.cc:2566
      #4  0x0000556cd11ca517 in btr_cur_optimistic_delete (cursor=0x478154351508, flags=0, mtr=0x7c1d22df6e48) at /test/11.8_dbg/storage/innobase/btr/btr0cur.cc:4444
      #5  0x0000556cd1324931 in row_undo_ins_remove_clust_rec (node=0x478154351498) at /test/11.8_dbg/storage/innobase/row/row0uins.cc:195
      #6  0x0000556cd132284d in row_undo_ins (node=0x478154351498, thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/row/row0uins.cc:597
      #7  0x0000556cd11072c0 in row_undo (node=0x478154351498, thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/row/row0undo.cc:401
      #8  0x0000556cd1106f9c in row_undo_step (thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/row/row0undo.cc:442
      #9  0x0000556cd1031fed in que_thr_step (thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/que/que0que.cc:551
      #10 0x0000556cd10315f3 in que_run_threads_low (thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/que/que0que.cc:609
      #11 0x0000556cd10313a4 in que_run_threads (thr=0x4781540aa518) at /test/11.8_dbg/storage/innobase/que/que0que.cc:629
      #12 0x0000556cd115cf5f in trx_t::rollback_low (this=0x542a5b0a0680, savept=0x0) at /test/11.8_dbg/storage/innobase/trx/trx0roll.cc:121
      #13 0x0000556cd115ddd1 in trx_rollback_for_mysql (trx=0x542a5b0a0680) at /test/11.8_dbg/storage/innobase/trx/trx0roll.cc:218
      #14 0x0000556cd0ec6d17 in innobase_rollback (thd=0x478154000d58, rollback_trx=true) at /test/11.8_dbg/storage/innobase/handler/ha_innodb.cc:4765
      #15 0x0000556cd0ac46f2 in ha_rollback_trans (thd=0x478154000d58, all=true) at /test/11.8_dbg/sql/handler.cc:2344
      #16 0x0000556cd09dd88e in xa_trans_force_rollback (thd=0x478154000d58) at /test/11.8_dbg/sql/xa.cc:412
      #17 0x0000556cd09df591 in trans_xa_detach (thd=0x478154000d58) at /test/11.8_dbg/sql/xa.cc:898
      #18 0x0000556cd061309a in THD::cleanup (this=0x478154000d58) at /test/11.8_dbg/sql/sql_class.cc:1673
      #19 0x0000556cd04f937a in unlink_thd (thd=0x478154000d58) at /test/11.8_dbg/sql/mysqld.cc:2865
      #20 0x0000556cd088ba65 in do_handle_one_connection (connect=0x500687ce948, put_in_cache=true) at /test/11.8_dbg/sql/sql_connect.cc:1426
      #21 0x0000556cd088b79e in handle_one_connection (arg=0x500687b6818) at /test/11.8_dbg/sql/sql_connect.cc:1327
      #22 0x00002262041fbaa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
      #23 0x0000226204288a34 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
      

      rr trace:

      rr:/data/MDEV-36228/rr$ rr replay ./latest-trace
      


        Activity

          Roel, I spent quite a bit of time on this, assuming that the SIGABRT is genuine. But I don’t see anything actually wrong in the execution, and it seems to me that an external process invoked killall -ABRT mariadbd or something similar. In MDEV-36231 this was more obvious.

          Can you please double-check how the test harness works? I would suggest enabling more optimization and also setting cmake -DPLUGIN_PERFSCHEMA=NO -DWITH_DBUG_TRACE=OFF to avoid a potentially extreme slowdown due to an excessive number of conditional branches. The disassembly of page_rec_check() suggests to me that not much optimization was enabled.

          The current instruction at the time of the SIGABRT is only dereferencing the frame pointer register (rbp), and that memory address is valid. If it weren’t, I would expect a SIGSEGV rather than SIGABRT to be triggered.

          As you can see from the stack trace, the current thread is rolling back a transaction, apparently after a client disconnect. That transaction had written 131,102 undo log records. By the time the process is forcibly killed by SIGABRT, it has rolled back most of them, but still 43,797 are awaiting rollback.

          (Comment by Marko Mäkelä)
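As a quick check of the figures in the comment above (131,102 undo log records written, 43,797 still awaiting rollback), the following shell arithmetic computes how much of the rollback had completed when the SIGABRT arrived. It uses only those two quoted numbers; the variable names are illustrative, and nothing here estimates how long the remaining rollback would have taken.

```shell
#!/usr/bin/env bash
# Figures quoted in the comment above.
total_undo=131102      # undo log records written by the transaction
remaining_undo=43797   # records still awaiting rollback at the time of SIGABRT

rolled_back=$((total_undo - remaining_undo))
# Integer percentage; bash arithmetic truncates toward zero.
pct_done=$((100 * rolled_back / total_undo))

echo "rolled back: ${rolled_back} of ${total_undo} (~${pct_done}%), remaining: ${remaining_undo}"
```

This prints "rolled back: 87305 of 131102 (~66%), remaining: 43797", i.e. roughly two thirds of the undo log had been processed.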

          Thank you for the analysis, Marko!

          I checked the test code and logs. See here for the full findings.

          Summary for this ticket:
          1. There was a genuine shutdown issue, i.e. mariadb-admin shutdown hung for 90 seconds.
          2. At that point a SIGABRT was sent to ensure the rr trace would be valid.
          3. The final stack itself may thus be a simple point-in-time capture and may or may not be related to the shutdown hang.

          Given that you mentioned a rollback was running, perhaps this is what caused mariadb-admin shutdown to hang for 90 seconds? It seems quite long, though the server was heavily loaded during this time.

          As for the optimizations, -DPLUGIN_PERFSCHEMA=NO and -DWITH_DBUG_TRACE=OFF were indeed used, as well as turning off various other plugins etc. Otherwise, this was a debug build (we test both optimized/release builds and debug builds). Here is the full build command for reference:

          cmake . -DCMAKE_C_COMPILER=/usr/bin/clang -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DWITH_SSL=bundled -DBUILD_CONFIG=mysql_release -DWITH_UNIT_TESTS=0 -DWITH_TOKUDB=0 -DWITH_JEMALLOC=no -DFEATURE_SET=community -DDEBUG_EXTNAME=OFF -DWITH_EMBEDDED_SERVER=0 -DENABLE_DOWNLOADS=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/tmp/boost_291196 -DENABLED_LOCAL_INFILE=1 -DENABLE_DTRACE=0 -DWITH_{SAFEMALLOC,NUMA}=OFF -DWITH_UNIT_TESTS=OFF -DCONC_WITH_{UNITTEST,SSL}=OFF -DPLUGIN_PERFSCHEMA=NO -DWITH_DBUG_TRACE=OFF -DWITH_ZLIB=bundled -DWITH_ROCKSDB=1 -DWITH_PAM=ON -DWITH_MARIABACKUP=0 -DFORCE_INSOURCE_BUILD=1 -DWITHOUT_GROUP_REPLICATION=1 -DWITH_INNODB_EXTRA_DEBUG=ON -DMYSQL_MAINTAINER_MODE=OFF -DWARNING_AS_ERROR='' -DCMAKE_BUILD_TYPE=Debug
          

          (Comment by Roel Van de Paar)

          I think that you must allow more time than 90 seconds for any shutdown to complete, or you need to run with innodb_fast_shutdown=3 so that ongoing transactions will not be rolled back before shutdown.

          (Comment by Marko Mäkelä)
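A test harness could follow this advice by waiting considerably longer than 90 seconds before falling back to SIGABRT, and optionally setting innodb_fast_shutdown=3 first so that active transactions are not rolled back during shutdown. The sketch below is hypothetical: the function names, the socket and PID arguments, the root user and the 600-second default are assumptions for illustration, not the harness used in this report. It relies on innodb_fast_shutdown being a dynamic InnoDB variable that accepts the value 3 in MariaDB.

```shell
#!/usr/bin/env bash
# Hypothetical harness helpers (not part of MariaDB or the original harness).

# Poll once per second until PID has exited or TIMEOUT seconds have elapsed.
# Returns 0 if the process exited in time, 1 on timeout.
wait_for_exit() {
  local pid=$1 timeout=$2 waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Request a normal shutdown and send SIGABRT (to keep the rr trace usable)
# only after a generous timeout. Socket path, PID, user and the 600 s
# default are assumptions supplied by the caller.
shutdown_server() {
  local socket=$1 pid=$2 timeout=${3:-600}
  # Optional: do not roll back active transactions during shutdown.
  mariadb --socket="$socket" -uroot -e 'SET GLOBAL innodb_fast_shutdown=3' || true
  mariadb-admin --socket="$socket" -uroot shutdown &
  if wait_for_exit "$pid" "$timeout"; then
    return 0
  fi
  echo "shutdown did not finish within ${timeout}s; sending SIGABRT" >&2
  kill -ABRT "$pid"
  return 1
}
```

A call such as shutdown_server /tmp/mariadb.sock 12345 900 (hypothetical socket path and PID) would allow up to 15 minutes before the SIGABRT fallback. Note that kill -0 also fails when the caller lacks permission to signal the process, so the harness should run as the same user as mariadbd.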

          People

            Roel Van de Paar
