[MDEV-16685] Replication data drift - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.0.26, 10.0.30, 10.0.33
Fix Version/s: N/A
Component/s: Replication
Labels:
None
Environment:
CentOS 7.2-7.4
hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid)
10Gbps lan

Description

We've configured 4 mariadb 10.0.xx slaves, the master also has the same major-minor version and about the same hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid), everything was running fine until we noticed some data was missing in the slaves (that's when we read all over data drift in mysql/mariadb replication), after reading thousands of articles and keep trying to find what happened we noticed that the data is missing (in the slaves) but the schemas are being updated, for example if I created a new db with some tables it will show up in the slaves but not the data that I could potentially insert, the same it's been happening to the rest of the databases/tables (and there are thousands), however we see that no errors are being displayed with `SHOW SLAVE STATUS\G`, but the replication is partially working, we were analyzing the files using mysqlbinlog in the slaves to get to the conclusion that data wasn't being sent to the slaves, as there is a lot of data it's a big effort to reconfigure/copy and start one slave again (also knowing that it could get into the same issue in a few weeks)

Here are some config parameters of interest:

Master Config:

    [mariadb]

    #InnoDB

    innodb_buffer_pool_populate     = 1

    innodb_buffer_pool_size         = 95G

    innodb_file_per_table           = 1

    innodb_corrupt_table_action     = warn

    innodb_support_xa               = 1

    innodb_open_files               = 100000

    innodb_flush_neighbors          = 0

    innodb_write_io_threads         = 64

    innodb_log_file_size            = 1G

    innodb_log_files_in_group       = 4

    innodb_log_buffer_size          = 128M

    innodb_flush_log_at_trx_commit  = 0

    innodb_flush_method             = O_DIRECT

    innodb_io_capacity              = 2000

    innodb_buffer_pool_instances    = 8

    innodb_stats_on_metadata        = 0

    wait_timeout                    = 30

    datadir                         = /var/lib/mysql/data

    tmpdir                          = /var/lib/mysql/tmp

    socket                          = /var/lib/mysql/mysql.sock

    skip-name-resolve

    sql-mode                        = NO_ENGINE_SUBSTITUTION

    open_files_limit                = 100000

    max_connections                 = 1300

    #Disable query cache

    query-cache-size                = 0M

    query-cache-limit               = 1M

    # Recommended by Percona

    table_open_cache                = 8192

    thread_cache_size               = 256

    ## Replication

    server-id                       = 12345

    log-bin                         = /var/lib/mysqllogs/master-bin-log

    relay-log                       = /var/lib/mysqllogs/master-relay-log

    relay-log-space-limit           = 16G

    slave_parallel_threads          = 8

    read-only                       = 0

    log-slave-updates               = 1

    binlog-format                   = ROW

    skip_slave_start

    ## Logging

    log-output                      = FILE

    slow-query-log                  = 1

    slow-query-log-file             = /var/lib/mysql/slow-log

    log-slow-slave-statements

    long-query-time                 = 2

Slave config (only different values, skipping server-id, log-bin, relay-log)

    read-only                       = 1

    sync-binlog                     = 1

    slave_skip_errors               = 1062

Master status (Note that the master status was added several hours after the slave status):

    MariaDB [(none)]> show master status;

    +--------------------+----------+--------------+------------------+

    | File               | Position | Binlog_Do_DB | Binlog_Ignore_DB |

    +--------------------+----------+--------------+------------------+

    | master-bin-log.021135 | 13151964 |              |                  |

    +--------------------+----------+--------------+------------------+

    1 row in set (0.00 sec)

Slave status:

    MariaDB [(none)]> show slave status\G

    *************************** 1. row ***************************

                   Slave_IO_State: Waiting for master to send event

                      Master_Host: 10.10.10.1

                      Master_User: replusr

                      Master_Port: 3306

                    Connect_Retry: 60

                  Master_Log_File: master-bin-log.021114

              Read_Master_Log_Pos: 711412247

                   Relay_Log_File: slave1-relay-log.000044

                    Relay_Log_Pos: 711412536

            Relay_Master_Log_File: master-bin-log.021114

                 Slave_IO_Running: Yes

                Slave_SQL_Running: Yes

                  Replicate_Do_DB:

              Replicate_Ignore_DB:

               Replicate_Do_Table:

           Replicate_Ignore_Table:

          Replicate_Wild_Do_Table:

      Replicate_Wild_Ignore_Table:

                       Last_Errno: 0

                       Last_Error:

                     Skip_Counter: 0

              Exec_Master_Log_Pos: 711412247

                  Relay_Log_Space: 711412876

                  Until_Condition: None

                   Until_Log_File:

                    Until_Log_Pos: 0

               Master_SSL_Allowed: No

               Master_SSL_CA_File:

               Master_SSL_CA_Path:

                  Master_SSL_Cert:

                Master_SSL_Cipher:

                   Master_SSL_Key:

            Seconds_Behind_Master: 0

    Master_SSL_Verify_Server_Cert: No

                    Last_IO_Errno: 0

                    Last_IO_Error:

                   Last_SQL_Errno: 0

                   Last_SQL_Error:

      Replicate_Ignore_Server_Ids:

                 Master_Server_Id: 95724

                   Master_SSL_Crl:

               Master_SSL_Crlpath:

                       Using_Gtid: Slave_Pos

                      Gtid_IO_Pos: 0-95724-49194074113

    1 row in set (0.00 sec)

We also investigated percona tools like pt-table-checksum and pt-table-sync but they are useless if the data is not getting inserted in the binlog, it could be a bug?

The config for the slaves is similar to what's displayed here, so my questions are:

Why mysql keeps replicating schemas but not data?
Is there any way to fix the replication without having to start over again?
Any other idea?

Thanks.

Attachments

Activity

People

Assignee:: Andrei Elkin

Reporter:: Gabriel Garrido (Inactive)

Votes:: 1 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2018-07-04 14:51

Updated:: 2022-10-08 14:24

Resolved:: 2022-10-08 14:24

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.