[MDEV-7667] aborting stop slave doesn't recover Created: 2015-03-05  Updated: 2019-03-31  Resolved: 2019-03-31

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.17
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Daniel Black
Assignee: Elena Stepanova
Resolution: Cannot Reproduce
Votes: 1
Labels: None
Environment:

rhel5 x86_64


Attachments: File my.cnf-db-master     File my.cnf-db-slave    
Issue Links:
Relates
relates to MDEV-7126 replication slave - deadlock in termi... Closed
relates to MDEV-8039 parallel replication slave - indefina... Closed

 Description   

Aborted a slave thread (accidentally), and the server stayed in an unresponsive state with regard to shutdown or replication commands.

[root@slave01]#  mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 61015
Server version: 10.0.17-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> stop slave; set global #skip-slave-start
;Ctrl-C -- query killed. Continuing normally.
^[[A
 
 
 
 
 
\c
Ctrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
    -> Ctrl-C -- exit!
Aborted
[root@slave01]# 
 
[root@slave01]#  mysql
MariaDB [(none)]> show slave status\G                     
 
 
 
 
Ctrl-C -- query killed. Continuing normally.
Ctrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
 
[root@slave01]# ps -ef
...
mysql    10731 10380 61 12:28 pts/8    01:01:05 /usr/sbin/mysqld --basedir=/usr --datadir=/u01/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/
...
[root@slave01]#  more /var/lib/mysql/mysqld.log
 
....
150305 13:40:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
150305 13:45:44 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
150305 13:50:45 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
150305 13:53:10 [ERROR] Slave SQL: Error 'Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist' on query. Default database: 'drupal_prod'. Query: 'INSERT INTO `drupal_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('50'))', Gtid 0-8-822547760, Internal MariaDB error code: 1146
150305 13:53:10 [Warning] Slave: Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist Error_code: 1146
150305 13:53:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.027842' position 32786153
150305 13:53:10 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.027842' at position 32786954
150305 13:55:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
150305 14:00:48 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
....
 
[root@slave01]#  mysql
 
MariaDB [(none)]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show processlist;
+-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
| Id    | User                | Host               | db                       | Command | Time | State                                                                          | Info                           | Progress |
+-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
|     3 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     4 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     5 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     6 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     7 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     8 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|     9 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    10 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    11 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    12 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    13 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    14 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    15 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    16 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    17 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    18 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    19 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    20 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    21 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|    22 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
|   209 | replication         | 10.244.17.27:32946 | NULL                     | Query   |  143 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
|  2296 | system user         |                    | NULL                     | Connect | 5825 | Waiting for master to send event                                               | NULL                           |    0.000 |
| 60758 | system user         |                    | NULL                     | Connect |  387 | Waiting for room in worker thread event queue                                  | NULL                           |    0.000 |
| 60996 | root                | localhost          | NULL                     | Query   |  199 | Killing slave                                                                  | stop slave                     |    0.000 |
| 61010 | cactiuser           | 10.5.0.66:34300    | NULL                     | Query   |  199 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
| 61012 | mmm_monitor         | 10.244.17.7:45478  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61013 | mmm_monitor         | 10.244.17.7:45479  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61015 | root                | localhost          | NULL                     | Killed  |  175 | init                                                                           | stop slave                     |    0.000 |
| 61017 | mmm_monitor         | 10.244.17.7:45558  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61018 | mmm_monitor         | 10.244.17.7:45559  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61019 | cactiuser           | 10.5.0.66:34308    | NULL                     | Query   |  184 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
| 61021 | mmm_monitor         | 10.244.17.7:45625  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61022 | mmm_monitor         | 10.244.17.7:45626  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61027 | mmm_monitor         | 10.244.17.7:45773  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61028 | mmm_monitor         | 10.244.17.7:45774  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
| 61034 | mmm_monitor         | 10.244.17.7:45871  | NULL                     | Query   |  164 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
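
A processlist this long is easier to read when grouped by user and state. The following is a minimal sketch, not part of the original report, that tallies rows of the ASCII table printed by SHOW PROCESSLIST. The function name `summarize_processlist` and the abbreviated `sample` are hypothetical, and the parser assumes that no column value contains a `|` character.

```python
from collections import Counter


def summarize_processlist(text):
    """Count (User, State) pairs in SHOW PROCESSLIST ASCII-table output.

    Assumes the nine-column layout shown in this report
    (Id, User, Host, db, Command, Time, State, Info, Progress)
    and that no cell contains a literal '|'.
    """
    counts = Counter()
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blanks
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 8 or cells[0] == "Id":
            continue  # skip the header row and malformed rows
        counts[(cells[1], cells[6])] += 1
    return counts


# Abbreviated sample with the same columns as the output above.
sample = """\
| Id    | User        | Host      | db   | Command | Time | State         | Info       | Progress |
|     3 | system user |           | NULL | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 60996 | root        | localhost | NULL | Query   |  199 | Killing slave | stop slave | 0.000 |
"""

for (user, state), n in summarize_processlist(sample).most_common():
    print(n, user, state)
```

Run against the full listing, such a tally would group the worker threads that share one wait state and separate them from the client connections stuck in `init` behind the `stop slave` that is in `Killing slave`.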



 Comments   
Comment by Elena Stepanova [ 2015-03-05 ]

Hi,

Could you please attach your cnf file(s) from the slave (and master if possible), or point at the JIRA issue where I can find them if you already provided them before in earlier bug reports?
What version does the master run?
Do you happen to have a stack trace from the stuck slave?
Do you know at what point in time, relative to the error log, you issued stop slave – was it before or after the SQL thread aborted?

Comment by Daniel Black [ 2015-03-05 ]

> What version does the master run?

10.0.15

> Do you happen to have a stack trace from the stuck slave?

No, the availability of a debug symbols package (MDEV-572 - although this is RHEL) would greatly assist in providing stack traces.

> Do you know at what point in time, relative to the error log, you issued stop slave – was it before or after the SQL thread aborted?

After. I had a stopped slave - exactly like MDEV-7668 (different slave - same stuck state)

Comment by Elena Stepanova [ 2015-03-09 ]

knielsen,

Is it one of the possible replication problems mentioned in MDEV-7668, or is it a totally different issue?

Comment by Kristian Nielsen [ 2015-03-09 ]

> Is it one of the possible replication problems mentioned in MDEV-7668, or is
> it a totally different issue?

It's almost certainly a different issue.

It looks like insufficient error handling in STOP SLAVE. A kill (CTRL-C
internally does KILL, I believe) manifests itself as an error return from
whatever function detected the signal. If that error return is not handled
correctly, the server might get stuck in some odd state...

So one would need to check all error paths in STOP SLAVE, I suppose (since the
actual point where the kill was detected is probably not available?)
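
The failure pattern suspected above can be illustrated with a toy model. This is a sketch of the general class of bug only, not MariaDB's code: `ToySlave`, `KilledError`, `stop_buggy`, `stop_safe` and `status` are all hypothetical names, and whether STOP SLAVE actually fails this way was never confirmed in this report.

```python
import threading


class KilledError(Exception):
    """Stands in for the error return produced when a query is killed."""


class ToySlave:
    """Toy model: a lock guards slave state, as a stand-in for server mutexes."""

    def __init__(self):
        self.run_lock = threading.Lock()
        self.stopping = False

    def _wait_for_workers(self, killed):
        # A kill shows up as an error from whatever call noticed it.
        if killed:
            raise KilledError("query killed")

    def stop_buggy(self, killed=False):
        self.run_lock.acquire()
        self.stopping = True
        self._wait_for_workers(killed)  # an error here skips the cleanup below
        self.stopping = False
        self.run_lock.release()

    def stop_safe(self, killed=False):
        with self.run_lock:  # released on every path, including errors
            self.stopping = True
            try:
                self._wait_for_workers(killed)
            finally:
                self.stopping = False

    def status(self):
        """Analogue of SHOW SLAVE STATUS: needs the same lock."""
        if not self.run_lock.acquire(timeout=0.1):
            return "blocked"
        try:
            return "stopping" if self.stopping else "idle"
        finally:
            self.run_lock.release()


def run_killed_stop(stop):
    """Call a stop method as if the client pressed Ctrl-C mid-command."""
    try:
        stop(killed=True)
    except KilledError:
        pass


buggy, safe = ToySlave(), ToySlave()
run_killed_stop(buggy.stop_buggy)
run_killed_stop(safe.stop_safe)
print(buggy.status(), safe.status())
```

In the buggy variant the error path returns with the lock still held, so every later status call blocks, which resembles the queue of SHOW SLAVE STATUS connections in the processlist. The safe variant releases the lock on every path, which is the property one would audit for in each error path.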

Comment by Elena Stepanova [ 2015-03-09 ]

Okay, thanks. I'll try to run some tests with a focus on stopping/killing the slave; maybe I'll get lucky and reproduce it.

Comment by Daniel Black [ 2015-05-12 ]

might be MDEV-7126

Comment by Elena Stepanova [ 2015-05-12 ]

and/or MDEV-8039?

Comment by Elena Stepanova [ 2019-03-31 ]

MDEV-7126 was fixed in 10.0.18, so if the problem is not happening anymore, it might have been it.

Generated at Thu Feb 08 07:21:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.