Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16685

Replication data drift

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.26, 10.0.30, 10.0.33
    • Fix Version/s: 10.0
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      CentOS 7.2-7.4
      hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid)
      10Gbps lan

      Description

      We've configured 4 mariadb 10.0.xx slaves, the master also has the same major-minor version and about the same hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid), everything was running fine until we noticed some data was missing in the slaves (that's when we read all over data drift in mysql/mariadb replication), after reading thousands of articles and keep trying to find what happened we noticed that the data is missing (in the slaves) but the schemas are being updated, for example if I created a new db with some tables it will show up in the slaves but not the data that I could potentially insert, the same it's been happening to the rest of the databases/tables (and there are thousands), however we see that no errors are being displayed with `SHOW SLAVE STATUS\G`, but the replication is partially working, we were analyzing the files using mysqlbinlog in the slaves to get to the conclusion that data wasn't being sent to the slaves, as there is a lot of data it's a big effort to reconfigure/copy and start one slave again (also knowing that it could get into the same issue in a few weeks)

      Here are some config parameters of interest:

      Master Config:

          [mariadb]
          
          #InnoDB
          innodb_buffer_pool_populate     = 1
          innodb_buffer_pool_size         = 95G
          innodb_file_per_table           = 1
          innodb_corrupt_table_action     = warn
          innodb_support_xa               = 1
          innodb_open_files               = 100000
          innodb_flush_neighbors          = 0
          innodb_write_io_threads         = 64
          innodb_log_file_size            = 1G
          innodb_log_files_in_group       = 4
          innodb_log_buffer_size          = 128M
          innodb_flush_log_at_trx_commit  = 0
          innodb_flush_method             = O_DIRECT
          innodb_io_capacity              = 2000
          innodb_buffer_pool_instances    = 8
          innodb_stats_on_metadata        = 0
          
          wait_timeout                    = 30
       
          datadir                         = /var/lib/mysql/data
          tmpdir                          = /var/lib/mysql/tmp
          socket                          = /var/lib/mysql/mysql.sock
          skip-name-resolve
          sql-mode                        = NO_ENGINE_SUBSTITUTION
          open_files_limit                = 100000
          max_connections                 = 1300
          
          #Disable query cache
          query-cache-size                = 0M
          query-cache-limit               = 1M
          
          # Recommended by Percona
          table_open_cache                = 8192
          thread_cache_size               = 256
          
          ## Replication
          server-id                       = 12345
          log-bin                         = /var/lib/mysqllogs/master-bin-log
          relay-log                       = /var/lib/mysqllogs/master-relay-log
          relay-log-space-limit           = 16G
          slave_parallel_threads          = 8
          read-only                       = 0
          log-slave-updates               = 1
          binlog-format                   = ROW
          skip_slave_start
          
          ## Logging
          log-output                      = FILE
          slow-query-log                  = 1
          slow-query-log-file             = /var/lib/mysql/slow-log
          log-slow-slave-statements
          long-query-time                 = 2
      

      Slave config (only different values, skipping server-id, log-bin, relay-log)

          read-only                       = 1
          sync-binlog                     = 1
          slave_skip_errors               = 1062
      

      Master status (Note that the master status was added several hours after the slave status):

          MariaDB [(none)]> show master status;
          +--------------------+----------+--------------+------------------+
          | File               | Position | Binlog_Do_DB | Binlog_Ignore_DB |
          +--------------------+----------+--------------+------------------+
          | master-bin-log.021135 | 13151964 |              |                  |
          +--------------------+----------+--------------+------------------+
          1 row in set (0.00 sec)
      

      Slave status:

          MariaDB [(none)]> show slave status\G
          *************************** 1. row ***************************
                         Slave_IO_State: Waiting for master to send event
                            Master_Host: 10.10.10.1
                            Master_User: replusr
                            Master_Port: 3306
                          Connect_Retry: 60
                        Master_Log_File: master-bin-log.021114
                    Read_Master_Log_Pos: 711412247
                         Relay_Log_File: slave1-relay-log.000044
                          Relay_Log_Pos: 711412536
                  Relay_Master_Log_File: master-bin-log.021114
                       Slave_IO_Running: Yes
                      Slave_SQL_Running: Yes
                        Replicate_Do_DB: 
                    Replicate_Ignore_DB: 
                     Replicate_Do_Table: 
                 Replicate_Ignore_Table: 
                Replicate_Wild_Do_Table: 
            Replicate_Wild_Ignore_Table: 
                             Last_Errno: 0
                             Last_Error: 
                           Skip_Counter: 0
                    Exec_Master_Log_Pos: 711412247
                        Relay_Log_Space: 711412876
                        Until_Condition: None
                         Until_Log_File: 
                          Until_Log_Pos: 0
                     Master_SSL_Allowed: No
                     Master_SSL_CA_File: 
                     Master_SSL_CA_Path: 
                        Master_SSL_Cert: 
                      Master_SSL_Cipher: 
                         Master_SSL_Key: 
                  Seconds_Behind_Master: 0
          Master_SSL_Verify_Server_Cert: No
                          Last_IO_Errno: 0
                          Last_IO_Error: 
                         Last_SQL_Errno: 0
                         Last_SQL_Error: 
            Replicate_Ignore_Server_Ids: 
                       Master_Server_Id: 95724
                         Master_SSL_Crl: 
                     Master_SSL_Crlpath: 
                             Using_Gtid: Slave_Pos
                            Gtid_IO_Pos: 0-95724-49194074113
          1 row in set (0.00 sec)
      

      We also investigated percona tools like pt-table-checksum and pt-table-sync but they are useless if the data is not getting inserted in the binlog, it could be a bug?

      The config for the slaves is similar to what's displayed here, so my questions are:

      Why mysql keeps replicating schemas but not data?
      Is there any way to fix the replication without having to start over again?
      Any other idea?

      Thanks.

        Attachments

          Activity

            People

            Assignee:
            Elkin Andrei Elkin
            Reporter:
            gabriel.garrido Gabriel Garrido
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:

                Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.