[MDEV-22400] SHUTDOWN on slave may failed in some case Created: 2020-04-28  Updated: 2020-05-12  Resolved: 2020-05-05

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4.9, 10.4.11
Fix Version/s: 10.4.12

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Andrei Elkin
Resolution: Duplicate Votes: 0
Labels: None
Environment:

OSX siera 10.12.6


Attachments: PNG File Capture d’écran 2020-04-28 à 23.20.22.png    

 Description   

Playing with replication manager we have 2 modes of shutdown when orchetrator="local"
Means the toolkit manage starting and stoping mariadb process with golang command

1 - send kill process id
2 - send shutdown

When 1-2 is working all the time on master
2 - failed on slave

mysqld process is not stop but stay for ever in stoping without being able to connect
Only a kill -9 can resolve the issue

when 2 but with stop slave before it succeed

Note that more experiment on debian with 10.5 last github code without semi-sync on do not experiment similar issue , i will report more testing with osx to refined the issue



 Comments   
Comment by VAROQUI Stephane [ 2020-04-28 ]

Semi sync as nothing to so with that can reproduce with regular replication

Comment by VAROQUI Stephane [ 2020-04-28 ]

dtruss of process that does not stop
https://pastebin.com/cw6gTfwW

Comment by VAROQUI Stephane [ 2020-04-28 ]

Ps:
apple 42030 2,4 0,9 3425072 150788 s000 S+ 10:18 1:22.11 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3331/init/etc/mysql/my.cnf --port=3331 --server-id=3331 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3331/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3331/db14808896910362857200.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3331/var/db14808896910362857200.pid
apple 42048 0,4 0,9 3416744 154124 s000 S+ 10:18 0:19.17 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid
a

42048 is the salve afater shutdown

lrwxr-xr-x 1 apple staff 61 28 avr 14:55 var@ -> /var/lib/replication-manager/backup1/127.0.0.1_3331/init/data

Comment by VAROQUI Stephane [ 2020-04-28 ]

Can reproduce with mariadb db client shutdown

mysql -h 127.0.0.1 -P3332 -pmariadb -uroot
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 146
Server version: 10.4.11-MariaDB-log Homebrew

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> shutdown
-> ;
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> exit
Bye
macbook-pro-de-apple-2:mysql apple$ ps aux | grep mysqld
apple 42030 4,0 0,8 3425072 138308 s000 S+ 10:18 2:51.27 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3331/init/etc/mysql/my.cnf --port=3331 --server-id=3331 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3331/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3331/db14808896910362857200.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3331/var/db14808896910362857200.pid
apple 73214 0,0 0,0 2451236 828 s001 S+ 11:07 0:00.00 grep mysqld
apple 73057 0,0 3,4 3415720 574628 s000 S+ 11:06 0:01.11 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid

Comment by VAROQUI Stephane [ 2020-04-28 ]

can reproduce without repman

apple 73057 0,0 3,4 3415720 574628 s000 S+ 11:06 0:01.11 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid
macbook-pro-de-apple-2:mysql apple$ kill -9 73057
macbook-pro-de-apple-2:mysql apple$ /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid &
[1] 73421
macbook-pro-de-apple-2:mysql apple$ 2020-04-28 23:09:31 0 [Note] Using unique option prefix 'ignore_db_dir' is error-prone and can break in the future. Please use the full name 'ignore_db_dirs' instead.
2020-04-28 23:09:31 0 [Note] /usr/local/bin/mysqld (mysqld 10.4.11-MariaDB-log) starting as process 73421 ...
2020-04-28 23:09:31 0 [Warning] Could not increase number of max_open_files to more than 256 (request: 2587)
2020-04-28 23:09:31 0 [Warning] Although a path was specified for the --log-slow-queries option, log tables are used. To enable logging to files use the --log-output=file option.
2020-04-28 23:09:31 0 [Warning] Setting lower_case_table_names=2 because file system for /var/lib/replication-manager/backup1/127.0.0.1_3332/var/ is case insensitive

macbook-pro-de-apple-2:mysql apple$ mysql -h 127.0.0.1 -P3332 -pmariadb -uroot
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 30
Server version: 10.4.11-MariaDB-log Homebrew

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> shutdown;
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> exit
Bye
macbook-pro-de-apple-2:mysql apple$ ps aux | grep mysqld
apple 42030 23,4 0,8 3425072 138336 s000 S+ 10:18 2:57.22 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3331/init/etc/mysql/my.cnf --port=3331 --server-id=3331 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3331/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3331/db14808896910362857200.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3331/var/db14808896910362857200.pid
apple 73464 0,0 0,0 2442020 808 s001 S+ 11:09 0:00.00 grep mysqld
apple 73421 0,0 3,4 3417768 573440 s001 S 11:09 0:00.60 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid

Comment by VAROQUI Stephane [ 2020-04-28 ]

Nota ; replication is GTID conservative not a single write have been done on the cluster

Comment by VAROQUI Stephane [ 2020-04-28 ]

Comment by VAROQUI Stephane [ 2020-04-28 ]

Nota , can reproduce with a write on the master before shutdown

Comment by VAROQUI Stephane [ 2020-04-28 ]

Can reproduce with no gtid

change master to master_use_gtid=no;
Query OK, 0 rows affected (0.025 sec)
 
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [(none)]> show slave status\G 
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 127.0.0.1
                   Master_User: root
                   Master_Port: 3331
                 Connect_Retry: 5
               Master_Log_File: log-bin.000004
           Read_Master_Log_Pos: 469
                Relay_Log_File: relay-bin.000002
                 Relay_Log_Pos: 553
         Relay_Master_Log_File: log-bin.000004
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 469
               Relay_Log_Space: 856
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 3331
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: No
                   Gtid_IO_Pos: 0-3331-2
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
 
MariaDB [(none)]> shutdown 
    -> ;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> exit 
Bye
macbook-pro-de-apple-2:mysql apple$ ps aux | grep mysqld 
apple            42030   0,1  0,8  3425380 139436 s000  S+   10:18     4:10.58 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3331/init/etc/mysql/my.cnf --port=3331 --server-id=3331 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3331/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3331/db14808896910362857200.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3331/var/db14808896910362857200.pid
apple            75973   0,0  0,0  2461476    856 s001  S+   11:30     0:00.00 grep mysqld
apple            75768   0,0  3,4  3418792 574788 s001  S    11:28     0:01.16 /usr/local/bin/mysqld --defaults-file=/var/lib/replication-manager/backup1/127.0.0.1_3332/init/etc/mysql/my.cnf --port=3332 --server-id=3332 --datadir=/var/lib/replication-manager/backup1/127.0.0.1_3332/var --socket=/var/lib/replication-manager/backup1/127.0.0.1_3332/db9955699208988733892.sock --user=apple --bind-address=0.0.0.0 --pid_file=/var/lib/replication-manager/backup1/127.0.0.1_3332/var/db9955699208988733892.pid

Comment by Andrei Elkin [ 2020-05-05 ]

MDEV-20821 mariadb parallel slave server shutdown hang - v10.4.8
must be the parent.
I set the status straight to Duplicate as the relation is so apparent.
Feel free to reopen if 10.4.12 would do the same for you.

Comment by VAROQUI Stephane [ 2020-05-05 ]

Thanks all getting such fast process

Generated at Thu Feb 08 09:14:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.