MariaDB Server / MDEV-36138

Server null-pointer crash at startup when tmptables left in --tmpdir

Details

    Description

      When the server starts up, it looks in --tmpdir for leftover .frm files from temporary tables (e.g. from a prior crash). After the latest release, if any such .frm files are found and opened successfully, the server crashes on a null-pointer access and cannot start until the .frm files have been manually removed from --tmpdir or the server has been downgraded to an earlier release without the problem.

      The bug was introduced by this patch from MDEV-35840:

      commit 78157c4765f2c086fabe183d51d7734ecffdbdd8
      Author: Yuchen Pei <ycp@mariadb.com>
      Date:   Tue Jan 14 17:47:08 2025 +1100
       
          MDEV-35840 Eliminate -warray-bounds triggered by TABLE_SHARE::db_type()
       
          The warnings are triggered with -O3
      

      diff --git a/sql/sql_base.cc b/sql/sql_base.cc
      index 5c03ba3d42d..aaa86e7bfa0 100644
      --- a/sql/sql_base.cc
      +++ b/sql/sql_base.cc
      @@ -8953,8 +8953,9 @@ my_bool mysql_rm_tmp_tables(void)
                 memcpy(path_copy, path, path_len - ext_len);
                 path_copy[path_len - ext_len]= 0;
                 init_tmp_table_share(thd, &share, "", 0, "", path_copy);
      +          handlerton *ht= share.db_type();
                 if (!open_table_def(thd, &share))
      -            share.db_type()->drop_table(share.db_type(), path_copy);
      +            ht->drop_table(share.db_type(), path_copy);
                 free_table_share(&share);
               }
               /*
      

      share.db_type() is set by the call to open_table_def(), so after the patch ht is always NULL, and the code crashes whenever open_table_def() returns 0.
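The ordering problem can be reproduced in isolation. The sketch below uses hypothetical stand-in types (MockEngine, MockShare, mock_open_table_def), not the server's handlerton or TABLE_SHARE, and it is not necessarily how commit 0fa141ebb46 fixes the bug. It only contrasts reading the engine pointer before the open call with reading it after a successful open.

```cpp
#include <string>

// Hypothetical stand-in for handlerton; counts drop calls instead of deleting.
struct MockEngine {
  int dropped = 0;
  void drop_table(const std::string & /*path*/) { ++dropped; }
};

// Hypothetical stand-in for TABLE_SHARE.
struct MockShare {
  MockEngine *engine = nullptr;  // unset until the table definition is opened
  MockEngine *db_type() const { return engine; }
};

// Mimics the contract described above: returns false (0) on success and
// only then makes db_type() meaningful.
bool mock_open_table_def(MockShare &share, MockEngine &engine) {
  share.engine = &engine;
  return false;
}

// Ordering from the regression: the pointer is cached before the open call,
// so the cached value is still nullptr afterwards.
MockEngine *engine_read_before_open(MockShare &share, MockEngine &engine) {
  MockEngine *ht = share.db_type();
  mock_open_table_def(share, engine);
  return ht;
}

// Ordering from before the patch: the pointer is read only after a
// successful open, so it refers to the engine that owns the table.
MockEngine *engine_read_after_open(MockShare &share, MockEngine &engine) {
  if (!mock_open_table_def(share, engine))
    return share.db_type();
  return nullptr;
}
```

Dereferencing the result of engine_read_before_open corresponds to the crash at startup; the result of engine_read_after_open can safely be used to call drop_table.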

      This regression is quite severe, as it leaves the server in a state where it cannot be started and gives the user no indication of what is needed to resolve the problem. The fix therefore needs to go into 10.5.

          Activity

            ycp Yuchen Pei added a comment -

            Hi serg, ptal thanks:

            0fa141ebb46 upstream/bb-10.5-mdev-36138 MDEV-36138 Server null-pointer crash at startup when tmptables left in --tmpdir
            


            serg Sergei Golubchik added a comment -

            0fa141ebb46 is ok to push, thanks

            knielsen Kristian Nielsen added a comment -

            It would be good to have some indication of the seriousness of this bug: which situations trigger it, and how many users are likely to be affected?

            It does trigger in real life (it was found because it happened in our own Buildbot), but it does not seem to occur for all temporary tables (not sure if they unlink their open .frm or maybe don't create a .frm in the first place). It makes a big difference whether this will affect a large majority of server crashes or only occurs in special situations (a specific storage engine or something?).

            The test case in the patch 0fa141ebb46 does not show a real scenario where the bug can trigger; it manually copies a .frm into the tmpdir.

            ycp Yuchen Pei added a comment -

            knielsen: do you have a link to the buildbot failure?

            ycp Yuchen Pei added a comment -

            Thanks for the review - pushed 0fa141ebb4639c5c6c4b5d990f448a932fd095a8 to 10.5


            knielsen Kristian Nielsen added a comment -

            ycp It's not a buildbot failure; it happened on our own servers running the Buildbot infrastructure.
            There is a MariaDB slave of the Buildbot master database (used for cross-reference maybe). It crashed due to OOM, and then it could not start again because of this bug.

            There is a zulip discussion about the incident:
            https://mariadb.zulipchat.com/#narrow/channel/118759-general/topic/MariaDB.2010.2E11.2E11.20not.20recovering.20from.20crash

            monty Michael Widenius added a comment -

            Normally temporary tables do not have a .frm file, only storage engine files in /tmp.
            I found at least one case where a .frm file is created.
            When creating a temporary table with federatedx that uses discovery, a .frm file will be created in /tmp and the problem described in this ticket will happen.

            For example:

            CREATE temporary TABLE foo3 ENGINE=FEDERATED CONNECTION='mysql://monty@storm:3307/test/foo';

            When the table definition is provided, no tmp table is created:

            MariaDB [test2]> create temporary table foo (a int) engine=federated connection='server_one';

            elenst Elena Stepanova added a comment -

            It fits: the setup for the multi-CI cross-reference in the foundation buildbot is rather complicated, and it does involve a federated table.

            ycp Yuchen Pei added a comment - - edited

            I tried monty's scenario in the testcase federated.federatedx (see below), but it does not crash either. The first ls outputs a #sql....frm file as expected, but inside mysql_rm_tmp_tables() during the server restart the tmpdir already appears to be empty, so the file could have been removed by a cleanup during server shutdown. Or it could be due to some extra setup of this specific test.

            modified   mysql-test/suite/federated/federatedx.test
            @@ -84,16 +84,15 @@ DROP TABLE IF EXISTS federated.t1;
             
             # # correct connection, same named tables
             --replace_result $SLAVE_MYPORT SLAVE_PORT
            -eval CREATE TABLE federated.t1 (
            -    `id` int(20) NOT NULL,
            -    `group` int NOT NULL default 0,
            -    `a\\\\b` InT NOT NULL default 0,
            -    `a\\\\` int NOT NULL default 0,
            -    `name` varchar(32) NOT NULL default ''
            -    )
            +eval CREATE TEMPORARY TABLE federated.t1
               ENGINE="FEDERATED" DEFAULT CHARSET=latin1
               CONNECTION='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1';
             
            +let $MYSQLD_TMPDIR=`SELECT @@tmpdir`;
            +exec ls $MYSQLD_TMPDIR;
            +--source include/restart_mysqld.inc
            +exec ls $MYSQLD_TMPDIR;
            +
             INSERT INTO federated.t1 (id, name) VALUES (1, 'foo');
             INSERT INTO federated.t1 (id, name) VALUES (2, 'fee');
             INSERT INTO federated.t1 (id, `group`) VALUES (3, 42);
             


            People

              ycp Yuchen Pei
              knielsen Kristian Nielsen