Details

      Description

      I've recently upgraded a few MariaDB Galera clusters from 10.1 to 10.2. Since this upgrade, MariaDB / Galera has been extremely unstable on every single cluster.

      One of the issue I'm seeing quite frequently is the combination of "query end" never ending, which appears to lock the table metadata. This issue is compounded by the fact that simple SELECT statements apparently need a metadata lock on these tables in order to complete.

      The processes with "query end" hanging around cannot be killed. And because SELECT statements are waiting on metadata locks, they cannot complete either. The only "work around" if you could call it that is to literally take the entire cluster offline and start it up again with --wsrep-new-cluster

      This is happening anywhere from every couple of hours to every few days, on clusters that otherwise lasted months of uptime and only ever saw downtime during upgrades.

      Here is an example process list

      MariaDB [(none)]> show processlist;
      +--------+-------------+--------------------+-----------+---------+--------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id     | User        | Host               | db        | Command | Time   | State                           | Info                                                                                                 | Progress |
      +--------+-------------+--------------------+-----------+---------+--------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      |      2 | system user |                    | NULL      | Sleep   |   2913 | Waiting for table metadata lock | UPDATE `prfix_file` SET `file_views`=`file_views`+1 WHERE (`file_hash`=UNHEX('36bae2de8f3399e891a30b |    0.000 |
      |      1 | system user |                    | NULL      | Sleep   | 888882 | wsrep aborter idle              | NULL                                                                                                 |    0.000 |
      |      3 | system user |                    | NULL      | Daemon  |   NULL | InnoDB purge coordinator        | NULL                                                                                                 |    0.000 |
      |      5 | system user |                    | NULL      | Daemon  |   NULL | InnoDB purge worker             | NULL                                                                                                 |    0.000 |
      |      4 | system user |                    | NULL      | Daemon  |   NULL | InnoDB purge worker             | NULL                                                                                                 |    0.000 |
      |      6 | system user |                    | NULL      | Daemon  |   NULL | InnoDB purge worker             | NULL                                                                                                 |    0.000 |
      |      7 | system user |                    | NULL      | Daemon  |   NULL | InnoDB shutdown handler         | NULL                                                                                                 |    0.000 |
      | 485377 | user000     | xxx.xx.xx.53:11779 | database1 | Query   |   2913 | query end                       | INSERT INTO `prfix_user` (`user_name`) VALUES ('new user name')                                      |    0.000 |
      | 485430 | user000     | xxx.xx.xx.52:46210 | database1 | Query   |   2788 | query end                       | UPDATE `prfix_anchor` SET `anchor_clicks`=`anchor_clicks`+1 WHERE (`anchor_id`='36964')              |    0.000 |
      | 485933 | user000     | xxx.xx.xx.52:47277 | database1 | Query   |   1410 | query end                       | DELETE FROM `prfix_session` WHERE (`access`<1503723562)                                              |    0.000 |
      | 485993 | user000     | xxx.xx.xx.54:44107 | database1 | Query   |   1242 | query end                       | INSERT INTO `prfix_anchor` (`anchor_source`, `anchor_dest`) VALUES ('https:/www.example.com/some123' |    0.000 |
      | 486023 | user000     | xxx.xx.xx.52:47490 | database1 | Query   |   1163 | query end                       | UPDATE `prfix_anchor` SET `anchor_clicks`=`anchor_clicks`+1 WHERE (`anchor_id`='29042')              |    0.000 |
      | 486150 | user000     | xxx.xx.xx.51:60057 | database1 | Query   |    761 | query end                       | UPDATE `prfix_anchor` SET `anchor_clicks`=`anchor_clicks`+1 WHERE (`anchor_id`='5240')               |    0.000 |
      | 486223 | user000     | xxx.xx.xx.52:48073 | database1 | Query   |    497 | query end                       | UPDATE `prfix_anchor` SET `anchor_clicks`=`anchor_clicks`+1 WHERE (`anchor_id`='25304')              |    0.000 |
      | 486254 | user000     | xxx.xx.xx.51:60409 | database1 | Query   |    395 | query end                       | UPDATE `prfix_gallery` SET `gallery_views`=`gallery_views`+1 WHERE (`gallery_id`='114141')           |    0.000 |
      | 486285 | user000     | xxx.xx.xx.52:48281 | database1 | Query   |    269 | query end                       | UPDATE `prfix_anchor` SET `anchor_clicks`=`anchor_clicks`+1 WHERE (`anchor_id`='13309')              |    0.000 |
      | 486360 | root        | localhost          | NULL      | Query   |      0 | init                            | show processlist                                                                                     |    0.000 |
      | 486363 | user000     | xxx.xx.xx.52:48581 | database1 | Query   |     47 | Waiting for table metadata lock | SELECT SQL_NO_CACHE ga.*, us.user_id, us.user_name, us.user_url, th.thumb_hash, th.file_hash, fl.fil |    0.000 |
      | 486364 | user000     | xxx.xx.xx.53:21831 | database1 | Query   |     46 | Waiting for table metadata lock | SELECT us.*, ga.*, th.thumb_hash FROM `prfix_gallery` AS `ga` LEFT JOIN `prfix_file_thumb` AS `th` O |    0.000 |
      | 486365 | user000     | xxx.xx.xx.51:60884 | database1 | Query   |     34 | Waiting for table metadata lock | SELECT us.*, ga.*, th.thumb_hash FROM `prfix_gallery` AS `ga` LEFT JOIN `prfix_file_thumb` AS `th` O |    0.000 |
      | 486366 | user000     | xxx.xx.xx.51:60887 | database1 | Query   |     31 | Waiting for table metadata lock | SELECT * FROM `prfix_user` AS `us` LEFT JOIN `prfix_file_thumb` AS `th` ON (`th`.`file_hash`=`us`.`u |    0.000 |
      | 486373 | user000     | xxx.xx.xx.54:62339 | database1 | Query   |     26 | Waiting for table metadata lock | SELECT * FROM `prfix_user` AS `us` LEFT JOIN `prfix_file_thumb` AS `th` ON (`th`.`file_hash`=`us`.`u |    0.000 |
      | 486375 | user000     | xxx.xx.xx.51:60907 | database1 | Query   |     17 | Waiting for table metadata lock | SELECT us.*, ga.*, th.thumb_hash FROM `prfix_gallery` AS `ga` LEFT JOIN `prfix_file_thumb` AS `th` O |    0.000 |
      | 486381 | user000     | xxx.xx.xx.51:60934 | database1 | Query   |      6 | Waiting for table metadata lock | SELECT * FROM `prfix_user` AS `us` LEFT JOIN `prfix_file_thumb` AS `th` ON (`th`.`file_hash`=`us`.`u |    0.000 |
      +--------+-------------+--------------------+-----------+---------+--------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jplindst Jan Lindström
                Reporter:
                darkain Vincent Milum Jr
              • Votes:
                2 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: