Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20247

Replication hangs with "preparing" and never starts

    XMLWordPrintable

Details

    Description

      Hi,

      maybe first the setup. Given are 2 FreeBSD nodes running MariaDB 10.4.6 in a master <- -> master replication setup. Care has been taken that both nodes don't insert something in the same DB so that's no problem.
      Additionally there's a slave running Devuan ASCII (Debian Jessie) also running MariaDB 10.4.6 installed from the repository you provide.
      The slave isn't running 24x7 and primarily used for offsite backups.

      The Devuan installation is fresh, since you no longer provide an init script and expect everyone is running systemd, the init script from a previous version has been taken.
      Once installation is completed, the following steps have been taken (additionally on the masters).
      Renaming the 'root' user to another one, dumping the mysql.user event and correcting the definer to match the new 'root' user.
      At one of the masters everything except the mysql DB has been dumped with:

      mysqldump -u blah -p --master-data=1 --databases db1 db2 db3 > dump.sql
      

      That dump.sql has been copied and source'd at the slave, including the "change master" things.The host IP, username and password have been set and a "start slave" has been triggered.

      Then the fun starts as replication on the slave never does anything:

      *************************** 1. row ***************************
                      Slave_IO_State: NULL
                         Master_Host: master_host_ip
                         Master_User: local-dbrep
                         Master_Port: 3306
                       Connect_Retry: 60
                     Master_Log_File: mysql-bin.000084
                 Read_Master_Log_Pos: 1917045
                      Relay_Log_File: slave-relay.000001
                       Relay_Log_Pos: 4
               Relay_Master_Log_File: mysql-bin.000084
                    Slave_IO_Running: Preparing
                   Slave_SQL_Running: Yes
                     Replicate_Do_DB: 
                 Replicate_Ignore_DB: mysql,mysql
                  Replicate_Do_Table: 
              Replicate_Ignore_Table: db1.session,db2.session
             Replicate_Wild_Do_Table: 
         Replicate_Wild_Ignore_Table: 
                          Last_Errno: 0
                          Last_Error: 
                        Skip_Counter: 0
                 Exec_Master_Log_Pos: 1917045
                     Relay_Log_Space: 512
                     Until_Condition: None
                      Until_Log_File: 
                       Until_Log_Pos: 0
                  Master_SSL_Allowed: No
                  Master_SSL_CA_File: 
                  Master_SSL_CA_Path: 
                     Master_SSL_Cert: 
                   Master_SSL_Cipher: 
                      Master_SSL_Key: 
               Seconds_Behind_Master: NULL
       Master_SSL_Verify_Server_Cert: No
                       Last_IO_Errno: 0
                       Last_IO_Error: 
                      Last_SQL_Errno: 0
                      Last_SQL_Error: 
         Replicate_Ignore_Server_Ids: 
                    Master_Server_Id: 1
                      Master_SSL_Crl: 
                  Master_SSL_Crlpath: 
                          Using_Gtid: No
                         Gtid_IO_Pos: 
             Replicate_Do_Domain_Ids: 
         Replicate_Ignore_Domain_Ids: 
                       Parallel_Mode: conservative
                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
                    Slave_DDL_Groups: 0
      Slave_Non_Transactional_Groups: 0
          Slave_Transactional_Groups: 0
      1 row in set (0.000 sec)
      

      The log reports that the slave has been connected to the master, at the master the log reports a connection from the slave.
      A show full processlist reports

      *************************** 1. row ***************************
            Id: 1
          User: system user
          Host:
            db: NULL
       Command: Daemon
          Time: NULL
         State: InnoDB purge worker
          Info: NULL
      Progress: 0.000
      *************************** 2. row ***************************
            Id: 2
          User: system user
          Host:
            db: NULL
       Command: Daemon
          Time: NULL
         State: InnoDB purge worker
          Info: NULL
      Progress: 0.000
      *************************** 3. row ***************************
            Id: 3
          User: system user
          Host:
            db: NULL
       Command: Daemon
          Time: NULL
         State: InnoDB purge worker
          Info: NULL
      Progress: 0.000
      *************************** 4. row ***************************
            Id: 4
          User: system user
          Host:
            db: NULL
       Command: Daemon
          Time: NULL
         State: InnoDB purge coordinator
          Info: NULL
      Progress: 0.000
      *************************** 5. row ***************************
            Id: 5
          User: system user
          Host:
            db: NULL
       Command: Daemon
          Time: NULL
         State: InnoDB shutdown handler
          Info: NULL
      Progress: 0.000
      *************************** 6. row ***************************
            Id: 42
          User: my_new_root_user
          Host: localhost
            db: NULL
       Command: Query
          Time: 0
         State: Init
          Info: show full processlist
      Progress: 0.000
      *************************** 7. row ***************************
            Id: 43
          User: system user
          Host:
            db: NULL
       Command: Connect
          Time: 690
         State: NULL
          Info: NULL
      Progress: 0.000
      *************************** 8. row ***************************
            Id: 44
          User: system user
          Host:
            db: NULL
       Command: Slave_SQL
          Time: 690
         State: Slave has read all relay log; waiting for the slave I/O thread to update it
          Info: NULL
      Progress: 0.000
      8 rows in set (0.000 sec)
      

      No replication happens at all at the slave, while everything between the 2 masters works fine.

      Manually doing a "stop slave;" at the slave hangs and nothing happens. Additionally, while the slave is still running, doing a "mysqlcheck mysql --auto-repair --optimize" the process hangs at the "mysql.gtid_slave_pos" table.

      Additionally trying to shutdown MariaDB via "mysqladmin shutdown" the process gets struck, strace doesn't show anything and a processlist shows "flushing relay log" which, after being kept running for 30 minutes, was killed manually.

      The entire setup has been working with MariaDB 10.1.2 and it's guaranteed to work with MySQL 8.0 without any problem at all.

      Attachments

        1. gdb_out_1047.txt
          54 kB
        2. my.cnf
          2 kB
        3. mysqld.err
          0.4 kB
        4. slave_db.err
          3 kB
        5. slave_my.cnf
          2 kB
        6. trace
          11 kB

        Issue Links

          Activity

            People

              thiru Thirunarayanan Balathandayuthapani
              phobos Phobos
              Votes:
              9 Vote for this issue
              Watchers:
              23 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.