Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.0.26, 10.0.30, 10.0.33
-
None
-
CentOS 7.2-7.4
hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid)
10Gbps lan
Description
We've configured 4 mariadb 10.0.xx slaves, the master also has the same major-minor version and about the same hardware (32 cores, 128Gbs ram, 4.4Tb of data on ssd raid), everything was running fine until we noticed some data was missing in the slaves (that's when we read all over data drift in mysql/mariadb replication), after reading thousands of articles and keep trying to find what happened we noticed that the data is missing (in the slaves) but the schemas are being updated, for example if I created a new db with some tables it will show up in the slaves but not the data that I could potentially insert, the same it's been happening to the rest of the databases/tables (and there are thousands), however we see that no errors are being displayed with `SHOW SLAVE STATUS\G`, but the replication is partially working, we were analyzing the files using mysqlbinlog in the slaves to get to the conclusion that data wasn't being sent to the slaves, as there is a lot of data it's a big effort to reconfigure/copy and start one slave again (also knowing that it could get into the same issue in a few weeks)
Here are some config parameters of interest:
Master Config:
[mariadb]
|
|
#InnoDB
|
innodb_buffer_pool_populate = 1
|
innodb_buffer_pool_size = 95G
|
innodb_file_per_table = 1
|
innodb_corrupt_table_action = warn
|
innodb_support_xa = 1
|
innodb_open_files = 100000
|
innodb_flush_neighbors = 0
|
innodb_write_io_threads = 64
|
innodb_log_file_size = 1G
|
innodb_log_files_in_group = 4
|
innodb_log_buffer_size = 128M
|
innodb_flush_log_at_trx_commit = 0
|
innodb_flush_method = O_DIRECT
|
innodb_io_capacity = 2000
|
innodb_buffer_pool_instances = 8
|
innodb_stats_on_metadata = 0
|
|
wait_timeout = 30
|
 |
datadir = /var/lib/mysql/data
|
tmpdir = /var/lib/mysql/tmp
|
socket = /var/lib/mysql/mysql.sock
|
skip-name-resolve
|
sql-mode = NO_ENGINE_SUBSTITUTION
|
open_files_limit = 100000
|
max_connections = 1300
|
|
#Disable query cache
|
query-cache-size = 0M
|
query-cache-limit = 1M
|
|
# Recommended by Percona
|
table_open_cache = 8192
|
thread_cache_size = 256
|
|
## Replication
|
server-id = 12345
|
log-bin = /var/lib/mysqllogs/master-bin-log
|
relay-log = /var/lib/mysqllogs/master-relay-log
|
relay-log-space-limit = 16G
|
slave_parallel_threads = 8
|
read-only = 0
|
log-slave-updates = 1
|
binlog-format = ROW
|
skip_slave_start
|
|
## Logging
|
log-output = FILE
|
slow-query-log = 1
|
slow-query-log-file = /var/lib/mysql/slow-log
|
log-slow-slave-statements
|
long-query-time = 2
|
Slave config (only different values, skipping server-id, log-bin, relay-log)
read-only = 1
|
sync-binlog = 1
|
slave_skip_errors = 1062
|
Master status (Note that the master status was added several hours after the slave status):
MariaDB [(none)]> show master status;
|
+--------------------+----------+--------------+------------------+
|
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
|
+--------------------+----------+--------------+------------------+
|
| master-bin-log.021135 | 13151964 | | |
|
+--------------------+----------+--------------+------------------+
|
1 row in set (0.00 sec)
|
Slave status:
MariaDB [(none)]> show slave status\G
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: 10.10.10.1
|
Master_User: replusr
|
Master_Port: 3306
|
Connect_Retry: 60
|
Master_Log_File: master-bin-log.021114
|
Read_Master_Log_Pos: 711412247
|
Relay_Log_File: slave1-relay-log.000044
|
Relay_Log_Pos: 711412536
|
Relay_Master_Log_File: master-bin-log.021114
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0
|
Last_Error:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 711412247
|
Relay_Log_Space: 711412876
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: 0
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 0
|
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 95724
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: Slave_Pos
|
Gtid_IO_Pos: 0-95724-49194074113
|
1 row in set (0.00 sec)
|
We also investigated percona tools like pt-table-checksum and pt-table-sync but they are useless if the data is not getting inserted in the binlog, it could be a bug?
The config for the slaves is similar to what's displayed here, so my questions are:
Why mysql keeps replicating schemas but not data?
Is there any way to fix the replication without having to start over again?
Any other idea?
Thanks.