MariaDB Server / MDEV-7667

aborting stop slave doesn't recover

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Cannot Reproduce
    • 10.0.17
    • N/A
    • Replication
    • None
    • rhel5 x86_64

    Description

      Aborted a slave thread (accidentally) and the server stayed in an unresponsive state with regard to shutdown or replication commands.

      [root@slave01]#  mysql
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 61015
      Server version: 10.0.17-MariaDB-log MariaDB Server
       
      Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [(none)]> stop slave; set global #skip-slave-start
      ;Ctrl-C -- query killed. Continuing normally.
      ^[[A
       
       
       
       
       
      \c
      Ctrl-C -- query killed. Continuing normally.
      ERROR 2013 (HY000): Lost connection to MySQL server during query
          -> Ctrl-C -- exit!
      Aborted
      [root@slave01]# 
       
      [root@slave01]#  mysql
      MariaDB [(none)]> show slave status\G                     
       
       
       
       
      Ctrl-C -- query killed. Continuing normally.
      Ctrl-C -- query killed. Continuing normally.
      ERROR 2013 (HY000): Lost connection to MySQL server during query
       
      [root@slave01]# ps -ef
      ...
      mysql    10731 10380 61 12:28 pts/8    01:01:05 /usr/sbin/mysqld --basedir=/usr --datadir=/u01/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/
      ...
      [root@slave01]#  more /var/lib/mysql/mysqld.log
       
      ....
      150305 13:40:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:45:44 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:50:45 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:53:10 [ERROR] Slave SQL: Error 'Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist' on query. Default database: 'drupal_prod'. Query: 'INSERT INTO `drupal_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('50'))', Gtid 0-8-822547760, Internal MariaDB error code: 1146
      150305 13:53:10 [Warning] Slave: Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist Error_code: 1146
      150305 13:53:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.027842' position 32786153
      150305 13:53:10 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.027842' at position 32786954
      150305 13:55:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 14:00:48 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      ....
       
      [root@slave01]#  mysql
       
      MariaDB [(none)]> select 1;
      +---+
      | 1 |
      +---+
      | 1 |
      +---+
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> show processlist;
      +-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
      | Id    | User                | Host               | db                       | Command | Time | State                                                                          | Info                           | Progress |
      +-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
      |     3 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     4 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     5 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     6 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     7 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     8 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     9 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    10 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    11 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    12 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    13 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    14 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    15 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    16 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    17 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    18 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    19 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    20 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    21 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    22 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |   209 | replication         | 10.244.17.27:32946 | NULL                     | Query   |  143 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      |  2296 | system user         |                    | NULL                     | Connect | 5825 | Waiting for master to send event                                               | NULL                           |    0.000 |
      | 60758 | system user         |                    | NULL                     | Connect |  387 | Waiting for room in worker thread event queue                                  | NULL                           |    0.000 |
      | 60996 | root                | localhost          | NULL                     | Query   |  199 | Killing slave                                                                  | stop slave                     |    0.000 |
      | 61010 | cactiuser           | 10.5.0.66:34300    | NULL                     | Query   |  199 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
      | 61012 | mmm_monitor         | 10.244.17.7:45478  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61013 | mmm_monitor         | 10.244.17.7:45479  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61015 | root                | localhost          | NULL                     | Killed  |  175 | init                                                                           | stop slave                     |    0.000 |
      | 61017 | mmm_monitor         | 10.244.17.7:45558  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61018 | mmm_monitor         | 10.244.17.7:45559  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61019 | cactiuser           | 10.5.0.66:34308    | NULL                     | Query   |  184 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
      | 61021 | mmm_monitor         | 10.244.17.7:45625  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61022 | mmm_monitor         | 10.244.17.7:45626  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61027 | mmm_monitor         | 10.244.17.7:45773  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61028 | mmm_monitor         | 10.244.17.7:45774  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61034 | mmm_monitor         | 10.244.17.7:45871  | NULL                     | Query   |  164 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
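
The processlist above is easier to read when grouped by State: the 20 `system user` threads with Ids 3 to 22 all share one wait state, while the monitoring connections pile up behind `SHOW SLAVE STATUS`. The following is a minimal sketch of such a summary, assuming the text is the table output of the `mysql` client; the function name `count_states` and the shortened `SAMPLE` rows are illustrative, not part of the original report.

```python
from collections import Counter

# Shortened excerpt of the SHOW PROCESSLIST output (illustrative sample only).
SAMPLE = """
+-------+-------------+--------------------+------+---------+------+---------------+-------------------+----------+
| Id    | User        | Host               | db   | Command | Time | State         | Info              | Progress |
+-------+-------------+--------------------+------+---------+------+---------------+-------------------+----------+
|     3 | system user |                    | NULL | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
|     4 | system user |                    | NULL | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
|   209 | replication | 10.244.17.27:32946 | NULL | Query   |  143 | init          | SHOW SLAVE STATUS |    0.000 |
| 60996 | root        | localhost          | NULL | Query   |  199 | Killing slave | stop slave        |    0.000 |
"""


def count_states(processlist_text):
    """Count rows per State value in mysql-client table output."""
    header = None
    state_idx = None
    counts = Counter()
    for raw in processlist_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip "+---+" borders, prompts, and blank lines
        # Drop the empty strings produced by the leading and trailing "|".
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells
            state_idx = header.index("State")
            continue
        if len(cells) == len(header):  # ignore truncated or malformed rows
            counts[cells[state_idx]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(n, state)
```

The sketch splits on `|`, so a row whose Info column itself contains a `|` would be skipped by the length check rather than miscounted.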

          Activity

            elenst Elena Stepanova added a comment - - edited

            Hi,

            Could you please attach your cnf file(s) from the slave (and master if possible), or point at the JIRA issue where I can find them if you already provided them before in earlier bug reports?
            What version does the master run?
            Do you happen to have a stack trace from the stuck slave?
            Do you know at what point in time, relative to the error log, you issued stop slave – was it before or after the SQL thread aborted?

            danblack Daniel Black added a comment -

            > What version does the master run?

            10.0.15

            > Do you happen to have a stack trace from the stuck slave?

            No, the availability of a debug symbols package (MDEV-572 - although this is RHEL) would greatly assist in providing stack traces.

            > Do you know at what point in time, relative to the error log, you issued stop slave – was it before or after the SQL thread aborted?

            After. I had a stopped slave - exactly like MDEV-7668 (different slave - same stuck state)

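
For reference on the stack trace question above: a common way to capture one from a running but stuck `mysqld` is to attach `gdb` in batch mode and dump the backtrace of every thread. The sketch below assumes `gdb` is installed and that the caller is allowed to attach to the process; the helper names are hypothetical, and without a debug symbols package the frames may show little more than addresses.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build the gdb command line that prints all thread backtraces for `pid`."""
    return ["gdb", "--batch", "-p", str(pid), "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Run gdb against the process and return its output as text.

    Assumption: gdb is on PATH and attaching is permitted (typically root).
    """
    result = subprocess.run(
        gdb_backtrace_argv(pid), capture_output=True, text=True
    )
    return result.stdout
```

With the process from the `ps -ef` output in the description, the call would be `dump_backtraces(10731)`.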

            elenst Elena Stepanova added a comment -

            knielsen,

            Is it one of the possible replication problems mentioned in MDEV-7668, or is it a totally different issue?

            knielsen Kristian Nielsen added a comment -

            > Is it one of the possible replication problems mentioned in MDEV-7668, or is
            > it a totally different issue?

            It's almost certainly a different issue.

            It looks like insufficient error handling in STOP SLAVE. A kill (CTRL-C
            internally does KILL, I believe) manifests itself as an error return from
            whatever function detected the signal. If that error return is not handled
            correctly, the server might get stuck in some odd state...

            So one would need to check all error paths in STOP SLAVE, I suppose (since the
            actual point where the kill was detected is probably not available?)
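
The failure mode suggested above can be illustrated with a toy model. This is not MariaDB's code: the class, the lock, and the `KilledError` exception are hypothetical stand-ins, and a Python exception plays the role of the error return. The sketch only shows how an early exit on a kill can leave shared state held, so that later status commands block, and how cleanup on every exit path avoids that.

```python
import threading


class KilledError(Exception):
    """Hypothetical stand-in for the error produced when a session is killed."""


class ReplicationControl:
    """Toy model: one lock shared by the stop and status commands."""

    def __init__(self):
        self.lock = threading.Lock()
        self.stopping = False

    def _wait_for_workers(self, killed):
        # Stand-in for waiting on worker threads; a kill surfaces as an error.
        if killed:
            raise KilledError("session killed while waiting")

    def stop_unsafe(self, killed=False):
        self.lock.acquire()
        self.stopping = True
        self._wait_for_workers(killed)  # an error here skips the cleanup below
        self.stopping = False
        self.lock.release()

    def stop_safe(self, killed=False):
        with self.lock:  # released on every exit path
            self.stopping = True
            try:
                self._wait_for_workers(killed)
            finally:
                self.stopping = False  # state is reset even when killed

    def show_status(self):
        # Non-blocking attempt so the demo reports "blocked" instead of hanging.
        if not self.lock.acquire(blocking=False):
            return "blocked"
        try:
            return "ok"
        finally:
            self.lock.release()


def status_after_killed_stop(stop_method_name):
    """Kill a stop command mid-wait, then check whether status still works."""
    control = ReplicationControl()
    try:
        getattr(control, stop_method_name)(killed=True)
    except KilledError:
        pass
    return control.show_status()
```

In this model `status_after_killed_stop("stop_unsafe")` leaves the lock held, which mirrors the hanging `SHOW SLAVE STATUS` sessions in the processlist, while the `stop_safe` variant leaves the model usable after the kill.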

            elenst Elena Stepanova added a comment -

            Okay, thanks. I'll try to run some tests with a focus on stopping/killing the slave; maybe I'll get lucky reproducing it.
            danblack Daniel Black added a comment -

            might be MDEV-7126


            elenst Elena Stepanova added a comment -

            and/or MDEV-8039?

            elenst Elena Stepanova added a comment -

            MDEV-7126 was fixed in 10.0.18, so if the problem is not happening anymore, it might have been it.

            People

              elenst Elena Stepanova
              danblack Daniel Black