MariaDB Server / MDEV-24096

Server crash, InnoDB fatal error, Assertion `first_free <= srv_page_size - 8' failed in trx_undo_page_report_modify

Details

    Description

      See further information in the comments and attached test case

      10.5 504d4c1ff6e0cecde9adf

      mysqld: /10.5/storage/innobase/trx/trx0rec.cc:807: uint16_t trx_undo_page_report_modify(buf_block_t*, trx_t*, dict_index_t*, const rec_t*, const rec_offs*, const upd_t*, ulint, const dtuple_t*, mtr_t*): Assertion `first_free <= srv_page_size - 8' failed.
      201102 14:59:50 [ERROR] mysqld got signal 6 ;
       
      Server version: 10.5.7-MariaDB-debug-log
       
      linux/raise.c:51(__GI_raise)[0x7fbf4e2fb7bb]
      stdlib/abort.c:81(__GI_abort)[0x7fbf4e2e6535]
      intl/loadmsgcat.c:1177(_nl_load_domain)[0x7fbf4e2e640f]
      ??:0(__assert_fail)[0x7fbf4e2f4102]
      trx/trx0rec.cc:809(trx_undo_page_report_modify(buf_block_t*, trx_t*, dict_index_t*, unsigned char const*, unsigned short const*, upd_t const*, unsigned long, dtuple_t const*, mtr_t*))[0x562753e9dff8]
      trx/trx0rec.cc:2018(trx_undo_report_row_operation(que_thr_t*, dict_index_t*, dtuple_t const*, upd_t const*, unsigned long, unsigned char const*, unsigned short const*, unsigned long*))[0x562753ea6c5f]
      btr/btr0cur.cc:5362(btr_cur_del_mark_set_clust_rec(buf_block_t*, unsigned char*, dict_index_t*, unsigned short const*, que_thr_t*, dtuple_t const*, mtr_t*))[0x562753f722c2]
      row/row0upd.cc:2683(row_upd_del_mark_clust_rec(upd_node_t*, dict_index_t*, unsigned short*, que_thr_t*, unsigned long, bool, mtr_t*))[0x562753def3a7]
      row/row0upd.cc:2860(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x562753df05ce]
      row/row0upd.cc:2992(row_upd(upd_node_t*, que_thr_t*))[0x562753df12e2]
      row/row0upd.cc:3136(row_upd_step(que_thr_t*))[0x562753df2246]
      row/row0mysql.cc:1849(row_update_for_mysql(row_prebuilt_t*))[0x562753d3b164]
      handler/ha_innodb.cc:8491(ha_innobase::delete_row(unsigned char const*))[0x5627539aebeb]
      sql/handler.cc:7261(handler::ha_delete_row(unsigned char const*))[0x562752eece40]
      sql/sql_delete.cc:280(TABLE::delete_row())[0x562753357580]
      sql/sql_delete.cc:797(mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long, select_result*))[0x56275334e856]
      sql/sql_parse.cc:4829(mysql_execute_command(THD*))[0x5627526e3199]
      sql/sql_parse.cc:8044(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x5627526f8b66]
      sql/sql_parse.cc:1875(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x5627526cf7c7]
      sql/sql_parse.cc:1353(do_command(THD*))[0x5627526cc0d5]
      sql/sql_connect.cc:1410(do_handle_one_connection(CONNECT*, bool))[0x562752af883c]
      sql/sql_connect.cc:1314(handle_one_connection)[0x562752af8199]
      perfschema/pfs.cc:2203(pfs_spawn_thread)[0x5627537b818c]
      nptl/pthread_create.c:487(start_thread)[0x7fbf4edb6fa3]
      x86_64/clone.S:97(clone)[0x7fbf4e3bd4cf]
      

          Activity

            marko Marko Mäkelä added a comment -

            MDEV-23672 broke trx_undo_log_v_idx(), which writes information about indexed virtual columns (MDEV-5800) to an InnoDB undo log page. Note that unique BLOBs (MDEV-371) also use indexed virtual columns internally.

            The checks around trx_undo_left() are insufficient, and that function is inherently unsafe because it returns an unsigned value. If the size limit was ever exceeded, we would exceed it again and again.

            Exceeding the limit will not only corrupt the undo log page (which was caught by the assertion later when reading that undo log page), but it can also corrupt any adjacent pages in the InnoDB buffer pool (up to 65536 bytes, I think). With the default innodb_page_size=16k, that would amount to up to 3 adjacent ‘random’ pages in the buffer pool. With innodb_page_size=4k it would be up to 15 unrelated pages. The buffer pool is also used for allocating record lock bitmaps and the adaptive hash index. Thus, the stray writes could cause arbitrary ‘phase of moon’ crashes.
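
The hazard of an unsigned "bytes left" value and the page counts quoted in the comment above can be illustrated with a small standalone sketch. This is not the InnoDB code: bytes_left_unchecked(), bytes_left_checked(), adjacent_pages_at_risk() and the reserved parameter are hypothetical names chosen for illustration. The page-count helper further assumes that the 65536-byte reach is measured from the start of the current page, as a 16-bit offset would imply.

```cpp
#include <cstdint>

// Hypothetical model, NOT the InnoDB source. All names and the
// `reserved` parameter are illustrative assumptions.

// Unsafe pattern: the subtraction happens in unsigned arithmetic, so once
// first_free has moved past page_size - reserved, the result wraps around
// to a very large value instead of becoming negative. A caller that tests
// "needed <= bytes_left" would then keep writing past the limit.
inline uint32_t bytes_left_unchecked(uint32_t page_size,
                                     uint32_t first_free,
                                     uint32_t reserved) {
  return page_size - first_free - reserved;
}

// Safer pattern: report "no room" explicitly instead of wrapping.
// Returns false and leaves *left untouched when the limit is exceeded.
inline bool bytes_left_checked(uint32_t page_size,
                               uint32_t first_free,
                               uint32_t reserved,
                               uint32_t* left) {
  if (reserved > page_size || first_free > page_size - reserved) {
    return false;
  }
  *left = page_size - reserved - first_free;
  return true;
}

// Whole neighbouring pages covered by a 65536-byte range that starts at
// the beginning of the current page (assumption: 16-bit offsets).
inline uint32_t adjacent_pages_at_risk(uint32_t page_size) {
  return 65536u / page_size - 1u;
}
```

Under these assumptions, bytes_left_unchecked(16384, 16380, 10) wraps to 4294967290, while bytes_left_checked() with the same arguments returns false. adjacent_pages_at_risk(16384) is 3 and adjacent_pages_at_risk(4096) is 15, matching the figures given in the comment.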

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch looks OK to me

            elenst Elena Stepanova added a comment -

            I've run the tests I had for this issue, each repeatedly: the first RQG test (used to produce the rr profile), the second RQG test (the source of the provided MTR test), and the MTR test itself.
            The MTR test didn't fail on either the patched 10.4 or the patched 10.5, while without the patch it failed on both (more or less deterministically on my machine).
            The second RQG test didn't fail on either the patched 10.4 or the patched 10.5. Without the patch, it fails non-deterministically on 10.5 but doesn't fail on 10.4.
            The first RQG test fails non-deterministically on the patched 10.4 and 10.5: on debug builds with symptoms similar to MDEV-22564 and other versioning-related issues, or with diagnostics area assertion failures; on non-debug builds with MDEV-21987. It did so before the patch as well, interleaved with the problem at hand, which was one of the reasons why it took so long to narrow down the test case.

            Thus, the results of the MTR test and the second RQG test are somewhat reassuring, while the result of the first RQG test is inconclusive. I think it's acceptable for an emergency patch, but more testing is definitely needed.

            marko Marko Mäkelä added a comment - Thank you. I filed MDEV-24156 for follow-up work.

            danblack Daniel Black added a comment - For all those watching patiently, packages have been released: https://mariadb.org/mariadb-10-5-8-10-4-17-10-3-27-and-10-2-36-now-available/

            People

              marko Marko Mäkelä
              alice Alice Sherepa