Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.1.25, 10.1.26
    • Fix Version/s: N/A
    • Component/s: wsrep
    • Labels: None
    • Environment: Ubuntu 16.04.2, MariaDB 10.1.25 and 10.1.26

    Description

      We have a 3-node Galera Cluster, and this weekend we tried rebooting each node one at a time for an EC2 instance upgrade. When the servers came back online, they each crashed with signal 11 while trying to rejoin the cluster.

      The crash occurs during WSREP recovery. If I set wsrep_on=OFF, MySQL starts up without crashing, but it crashes again when I dynamically set wsrep_on=ON.

      Nothing shows up in the other two nodes' logs while the joining node is starting up, before it crashes. All ports are open between the Galera nodes. Each node is running MariaDB 10.1.25, but I did upgrade one node to 10.1.26 to see if the problem was fixed there, and it exhibited the same behavior. The only way I was able to get the nodes to rejoin the cluster was to force an SST. However, the data directory is 1.8TB, so that is far from ideal for each node restart.

      I've attached the wsrep_recovery log and apport crash file, but it doesn't contain the core dump for some reason. I've also uploaded the my.cnf and a mariadb.cnf config file containing the Galera Cluster related config options.

      Attachments

        1. _usr_sbin_mysqld.0.crash
          17 kB
        2. mariadb.cnf
          0.9 kB
        3. my.cnf
          2 kB
        4. wsrep_recovery.KbkcqG
          5 kB

          Activity

            Could you provide the output of the following from one of the nodes:

            use mysql;
            show create table servers\G
            show table status like "servers";
            select count(*) from servers;
            

            anikitin Andrii Nikitin (Inactive) added a comment

            I strongly believe that I was able to reproduce a similar crash by altering mysql.servers from MyISAM to InnoDB and then trying to re-join one node.

            Core was generated by `/home/a/env1/m6-10.1.25/../_depot/m-tar/10.1.25/bin/mysqld --defaults-file=/hom'.
            Program terminated with signal SIGSEGV, Segmentation fault.
            #0  0x00007f960fb8daa7 in kill () at ../sysdeps/unix/syscall-template.S:84
            84	../sysdeps/unix/syscall-template.S: No such file or directory.
            [Current thread is 1 (Thread 0x7f9611211640 (LWP 18478))]
            (gdb) bt
            #0  0x00007f960fb8daa7 in kill () at ../sysdeps/unix/syscall-template.S:84
            #1  0x000000000076a425 in handle_fatal_signal (sig=11) at /home/buildbot/buildbot/build/sql/signal_handler.cc:308
            #2  <signal handler called>
            #3  0x00000000006f5bff in wsrep_commit (hton=<optimized out>, thd=0x7f960f38b008, all=false)
                at /home/buildbot/buildbot/build/sql/wsrep_hton.cc:296
            #4  0x000000000076c60e in commit_one_phase_2 (thd=0x7f960f38b008, all=false, trans=0x7f960f38e038, is_real_trans=true)
                at /home/buildbot/buildbot/build/sql/handler.cc:1556
            #5  0x0000000000771f78 in ha_commit_one_phase (all=<optimized out>, thd=<optimized out>, this=<optimized out>, thd=<optimized out>, 
                this=<optimized out>) at /home/buildbot/buildbot/build/sql/handler.cc:1537
            #6  ha_commit_trans (thd=0x7f960f38b008, all=<optimized out>) at /home/buildbot/buildbot/build/sql/handler.cc:1404
            #7  0x00000000006a16db in trans_commit_stmt (thd=0x7f960f38b008) at /home/buildbot/buildbot/build/sql/transaction.cc:434
            #8  0x000000000056dc2b in close_mysql_tables (thd=0x0) at /home/buildbot/buildbot/build/sql/sql_base.cc:9333
            #9  0x000000000068c962 in servers_reload (thd=0x7f960f38b008) at /home/buildbot/buildbot/build/sql/sql_servers.cc:278
            #10 0x000000000068cafe in servers_init (dont_read_servers_table=<optimized out>) at /home/buildbot/buildbot/build/sql/sql_servers.cc:174
            #11 0x0000000000521221 in init_server_components () at /home/buildbot/buildbot/build/sql/mysqld.cc:5355
            #12 0x0000000000521875 in mysqld_main (argc=98, argv=0x7f960f04fc58) at /home/buildbot/buildbot/build/sql/mysqld.cc:5737
            #13 0x00007f960fb783f1 in __libc_start_main (main=0x516da0 <main(int, char**)>, argc=24, argv=0x7ffd86e25648, init=<optimized out>, 
                fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd86e25638) at ../csu/libc-start.c:291
            #14 0x0000000000516ce9 in _start ()
            

            While we should add protection against this behavior:

            • I was able to re-join the problem node by starting it offline (WSREP_ON=OFF), altering mysql.servers back to MyISAM, and restarting with WSREP_ON=ON
            • it looks like the trigger for the issue was the table being converted to InnoDB, which requires dedicated troubleshooting
            anikitin Andrii Nikitin (Inactive) added a comment
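
            The workaround described above can be sketched in SQL as follows. This is a minimal sketch, not a verified procedure: it assumes the affected node has already been started with wsrep_on=OFF, that it is run by an account allowed to alter tables in the mysql schema, and that the node is restarted with wsrep_on=ON afterwards.

```sql
-- Assumption: this node was started with wsrep_on=OFF, so the change
-- below stays local to this node and is not replicated.

-- 1. Confirm the current storage engine of mysql.servers.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'mysql'
   AND TABLE_NAME = 'servers';

-- 2. Convert the table back to MyISAM.
ALTER TABLE mysql.servers ENGINE = MyISAM;

-- 3. Verify the change before restarting the node with wsrep_on=ON.
SHOW TABLE STATUS FROM mysql LIKE 'servers';
```

            If the Engine column in the last statement still shows InnoDB, the node should not be restarted with wsrep_on=ON yet.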

            Here is the output from those commands:

            MariaDB [(none)]> use mysql;
            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A
             
            Database changed
            MariaDB [mysql]> show create table servers\G
            *************************** 1. row ***************************
                   Table: servers
            Create Table: CREATE TABLE `servers` (
              `Server_name` char(64) NOT NULL DEFAULT '',
              `Host` char(64) NOT NULL DEFAULT '',
              `Db` char(64) NOT NULL DEFAULT '',
              `Username` char(80) NOT NULL DEFAULT '',
              `Password` char(64) NOT NULL DEFAULT '',
              `Port` int(4) NOT NULL DEFAULT '0',
              `Socket` char(64) NOT NULL DEFAULT '',
              `Wrapper` char(64) NOT NULL DEFAULT '',
              `Owner` char(64) NOT NULL DEFAULT '',
              PRIMARY KEY (`Server_name`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='MySQL Foreign Servers table'
            1 row in set (0.00 sec)
             
            MariaDB [mysql]> show table status like 'servers';
            +---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------------+
            | Name    | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment                     |
            +---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------------+
            | servers | InnoDB |      10 | Compact    |    0 |              0 |       16384 |               0 |            0 |         0 |           NULL | 2017-09-24 05:09:02 | NULL        | NULL       | utf8_general_ci |     NULL |                | MySQL Foreign Servers table |
            +---------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------------+
            1 row in set (0.00 sec)
             
            MariaDB [mysql]> select count(*) from servers;
            +----------+
            | count(*) |
            +----------+
            |        0 |
            +----------+
            1 row in set (0.00 sec)
            

            The mysql.servers table does indeed appear to be InnoDB instead of MyISAM. As far as I know we didn't perform any explicit conversion from MyISAM to InnoDB on this table. The table structure was created from a mysqldump from another Galera Cluster running MariaDB 10.1.14.

            btraywick Bryan Traywick added a comment

            So do you have that dump available? Is the table InnoDB or MyISAM in it? Maybe you have the following option configured on some of the nodes?

            show variables like 'enforce_storage_engine';

            anikitin Andrii Nikitin (Inactive) added a comment
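
            Alongside the enforce_storage_engine check, it may help to list which tables in the mysql schema are not MyISAM. The query below is a sketch that assumes the standard information_schema.TABLES view. A row in the result is not necessarily a problem, since some system tables use other engines by default; in this issue, the table of interest is mysql.servers.

```sql
-- Show whether a storage engine is being enforced on this node.
SHOW GLOBAL VARIABLES LIKE 'enforce_storage_engine';

-- List tables in the mysql schema whose engine is not MyISAM.
-- Assumption: a non-MyISAM row is only a hint for further inspection.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'mysql'
   AND ENGINE <> 'MyISAM'
 ORDER BY TABLE_NAME;
```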

            I do have the dump available and it is InnoDB in the dump:

            --
            -- Table structure for table `servers`
            --
             
            DROP TABLE IF EXISTS `servers`;
            /*!40101 SET @saved_cs_client     = @@character_set_client */;
            /*!40101 SET character_set_client = utf8 */;
            CREATE TABLE `servers` (
              `Server_name` char(64) NOT NULL DEFAULT '',
              `Host` char(64) NOT NULL DEFAULT '',
              `Db` char(64) NOT NULL DEFAULT '',
              `Username` char(80) NOT NULL DEFAULT '',
              `Password` char(64) NOT NULL DEFAULT '',
              `Port` int(4) NOT NULL DEFAULT '0',
              `Socket` char(64) NOT NULL DEFAULT '',
              `Wrapper` char(64) NOT NULL DEFAULT '',
              `Owner` char(64) NOT NULL DEFAULT '',
              PRIMARY KEY (`Server_name`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='MySQL Foreign Servers table';
            

            The enforce_storage_engine variable is blank in the new MariaDB 10.1.25 cluster and doesn't appear to be present at all on the older 10.1.14 cluster (I just get an empty set when I try to show it).

            btraywick Bryan Traywick added a comment

            Thank you for the confirmation. It should be safe to simply convert the table back to MyISAM to avoid the problem, but it may be a good idea to do that during downtime.

            anikitin Andrii Nikitin (Inactive) added a comment

            Thank you so much Andrii. I was able to recreate the issue in our staging cluster by converting mysql.servers to InnoDB on one of the nodes and restarting that node. I was then able to start up that node with wsrep_on=OFF, convert the table back to MyISAM, and restart MySQL with wsrep_on=ON, after which it rejoined the cluster without an SST.

            We will be converting the table back to MyISAM in our production cluster tonight and will restart a node to ensure it doesn't need a full resync. I will report back with confirmation once that has gone successfully but this appears to be the fix we are looking for.

            btraywick Bryan Traywick added a comment - edited

            Thanks again Andrii. We converted the table back to MyISAM and were able to restart MySQL with only an IST sync required. As a final test I also tried restarting MySQL on one of the nodes in the older Galera Cluster running MariaDB 10.1.14 and we didn't run into this crash despite the mysql.servers table being InnoDB there as well. So it's likely a change introduced between 10.1.14 and 10.1.25. The older cluster is also running Ubuntu 14.04 and the 10.1.25 cluster is running 16.04 so it could be something to do with the systemd init scripts.

            btraywick Bryan Traywick added a comment
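
            After a restart like the one described above, the node's cluster membership can be checked with the wsrep status variables. This is a sketch only; the values noted in the comments are what a healthy node in a 3-node cluster would typically report, not output captured from this cluster. Whether the state transfer was an IST or an SST is normally visible in the node's error log rather than in these variables.

```sql
-- Typically 'Synced' once the node has caught up with the cluster.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

-- Typically 'Primary' when the node belongs to the primary component.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- Number of nodes currently in the cluster (3 expected here).
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

-- 'ON' when the node is ready to accept queries.
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
```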

            10.1 is EOL.

            janlindstrom Jan Lindström added a comment

            People

              jplindst Jan Lindström (Inactive)
              btraywick Bryan Traywick