[MDEV-15350] Problem with slave replication Created: 2018-02-19  Updated: 2019-09-22  Resolved: 2019-09-05

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2.12, 10.2.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Antonio Fernandes Assignee: Andrei Elkin
Resolution: Not a Bug Votes: 0
Labels: find_row(-1)
Environment:

Master and slave with CentOS 7.4 x86_64



 Description   

Since, I believe, 10.2.12 I'm having issues with a simple master-slave replication.

Every day with a particular query/replication event, the slave tops the CPU and lags behind never finishing.

It presents something like "Delete_rows_log_event::find_row(-1)". I didn't change anything in the app.

How can you better trace the problem so that I can make a informed issue?

BTW, it appears that it happens when deleting rows from MyISAM table without PK.

Best regards,
António Fernandes



 Comments   
Comment by Elena Stepanova [ 2018-02-19 ]

From which version did you upgrade to 10.2? Could it have been 10.1 or earlier?

To get more information, run SHOW SLAVE STATUS when you see it happen, several times with some time interval. The first thing to pay attention to is whether the slave position is changing at all. If it does not, then the slave indeed hangs on this particular event, or at least executes it for a very long time. If the position does change, only very slowly, that's a different matter.

Then, search the binary log file for the position that SHOW SLAVE STATUS shows (be it always the same position or a slowly changed one), and see what kind of event is being executed. If you can, attach it to the issue along with your cnf files and the output of SHOW SLAVE STATUS that you've collected; or, if the event contains confidential information, you can upload it to ftp.askmonty.org/private.

Additionally, if the slave indeed hangs and doesn't change the position at all, you could try to get the stack trace from the running server.

Comment by Antonio Fernandes [ 2018-02-19 ]

I'm upgrading as new releases are available. From 10.2.6 I have used all. It was 10.2.12.

I'll reply tomorrow (since the offending replication event occurs at night).

Comment by Antonio Fernandes [ 2018-02-20 ]

SHOW SLAVE STATUS

Slave_IO_State Master_Host Master_User Master_Port Connect_Retry Master_Log_File Read_Master_Log_Pos Relay_Log_File Relay_Log_Pos Relay_Master_Log_File Slave_IO_Running Slave_SQL_Running Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno Last_Error Skip_Counter Exec_Master_Log_Pos Relay_Log_Space Until_Condition Until_Log_File Until_Log_Pos Master_SSL_Allowed Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert Last_IO_Errno Last_IO_Error Last_SQL_Errno Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id Master_SSL_Crl Master_SSL_Crlpath Using_Gtid Gtid_IO_Pos Replicate_Do_Domain_Ids Replicate_Ignore_Domain_Ids Parallel_Mode SQL_Delay SQL_Remaining_Delay Slave_SQL_Running_State
Waiting for master to send event 192.168.30.46 repllpvs2 3306 60 lpvs1-mysql-bin.000023 108508087 lpvs2-relay-bin.000078 505807544 lpvs1-mysql-bin.000022 Yes Yes             0   0 641922625 1104002383 None   0 No           23624 No 0   0     1     No       none 0   Delete_rows_log_event::find_row(-1)
Comment by Antonio Fernandes [ 2018-02-20 ]

Another run

Slave_IO_State Master_Host Master_User Master_Port Connect_Retry Master_Log_File Read_Master_Log_Pos Relay_Log_File Relay_Log_Pos Relay_Master_Log_File Slave_IO_Running Slave_SQL_Running Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno Last_Error Skip_Counter Exec_Master_Log_Pos Relay_Log_Space Until_Condition Until_Log_File Until_Log_Pos Master_SSL_Allowed Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert Last_IO_Errno Last_IO_Error Last_SQL_Errno Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id Master_SSL_Crl Master_SSL_Crlpath Using_Gtid Gtid_IO_Pos Replicate_Do_Domain_Ids Replicate_Ignore_Domain_Ids Parallel_Mode SQL_Delay SQL_Remaining_Delay Slave_SQL_Running_State
Waiting for master to send event 192.168.30.46 repllpvs2 3306 60 lpvs1-mysql-bin.000023 120126020 lpvs2-relay-bin.000078 505807544 lpvs1-mysql-bin.000022 Yes Yes             0   0 641922625 1114819549 None   0 No           26940 No 0   0     1     No       none 0   Delete_rows_log_event::find_row(-1)
Comment by Elena Stepanova [ 2018-02-20 ]

So, the event in question is 505807544 in lpvs1-mysql-bin.000022. Can you find out what it is?

Comment by Elena Stepanova [ 2018-03-24 ]

Alternatively, you can upload the whole binlog to ftp.askmonty.org/private.

Comment by Antonio Fernandes [ 2018-03-26 ]

Hi Elena,

I've, since, upgraded the server to have more RAM and change the config accordingly. Today, I'll be re-syncing the slave. Let's see how it goes.

Best regards,
António

Comment by Antonio Fernandes [ 2018-03-27 ]

It keeps happening

Schema:

CREATE TABLE `tbl_facturas_detalhe` (
`Armazem` varchar(3) DEFAULT '71',
`Urgente` varchar(1) DEFAULT NULL,
`Guia` varchar(10) DEFAULT NULL,
`Factura` varchar(15) DEFAULT NULL,
`CodCliente` varchar(7) DEFAULT NULL,
`Data` date DEFAULT NULL,
`Tipo` varchar(5) DEFAULT NULL,
`Atendedor` varchar(50) DEFAULT NULL,
`Vendedor` varchar(6) DEFAULT NULL,
`NumeroPedidoCliente` varchar(15) DEFAULT NULL,
`NumeroQuiterPedidoOrigem` varchar(20) DEFAULT NULL,
`Marca` varchar(3) DEFAULT NULL,
`Referencia` varchar(50) DEFAULT NULL,
`Designacao` varchar(50) DEFAULT NULL,
`CodigoDesconto` varchar(10) DEFAULT NULL,
`TabelaDescontoVenda` varchar(5) DEFAULT NULL,
`DescontoStock` decimal(6,2) DEFAULT NULL,
`DescontoUrgente` decimal(6,2) DEFAULT NULL,
`Familia` varchar(10) DEFAULT NULL,
`FamiliaMCoutinho` varchar(10) DEFAULT NULL,
`Qtd` int(11) DEFAULT NULL,
`PVenda` decimal(10,2) DEFAULT NULL,
`Desconto` decimal(10,2) DEFAULT NULL,
`VVenda` decimal(10,2) DEFAULT NULL,
`VPCMP` decimal(10,2) DEFAULT NULL,
`Margem` decimal(10,2) DEFAULT NULL,
`Percentagem` decimal(6,2) DEFAULT NULL,
KEY `Referencia` (`Referencia`),
KEY `Factura` (`Factura`),
KEY `Guia` (`Guia`),
KEY `CodCliente` (`CodCliente`),
KEY `Data` (`Data`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
PARTITION BY RANGE COLUMNS(`Data`)
(PARTITION `p01` VALUES LESS THAN ('2006-01-01') ENGINE = MyISAM,
PARTITION `p02` VALUES LESS THAN ('2007-01-01') ENGINE = MyISAM,
PARTITION `p03` VALUES LESS THAN ('2008-01-01') ENGINE = MyISAM,
PARTITION `p04` VALUES LESS THAN ('2009-01-01') ENGINE = MyISAM,
PARTITION `p05` VALUES LESS THAN ('2010-01-01') ENGINE = MyISAM,
PARTITION `p06` VALUES LESS THAN ('2011-01-01') ENGINE = MyISAM,
PARTITION `p07` VALUES LESS THAN ('2012-01-01') ENGINE = MyISAM,
PARTITION `p08` VALUES LESS THAN ('2013-01-01') ENGINE = MyISAM,
PARTITION `p09` VALUES LESS THAN ('2014-01-01') ENGINE = MyISAM,
PARTITION `p10` VALUES LESS THAN ('2015-01-01') ENGINE = MyISAM,
PARTITION `p11` VALUES LESS THAN ('2016-01-01') ENGINE = MyISAM,
PARTITION `p12` VALUES LESS THAN ('2017-01-01') ENGINE = MyISAM,
PARTITION `p13` VALUES LESS THAN ('2018-01-01') ENGINE = MyISAM,
PARTITION `p14` VALUES LESS THAN ('2019-01-01') ENGINE = MyISAM,
PARTITION `p15` VALUES LESS THAN ('2020-01-01') ENGINE = MyISAM,
PARTITION `p99` VALUES LESS THAN (MAXVALUE) ENGINE = MyISAM)

Comment by Antonio Fernandes [ 2018-03-27 ]
Slave_IO_State Waiting for master to send event
Master_Host 192.168.30.221
Master_User repllpvs2
Master_Port 3306
Connect_Retry 60
Master_Log_File lpvs1-mysql-bin.000064
Read_Master_Log_Pos 65557889
Relay_Log_File lpvs2-relay-bin.000140
Relay_Log_Pos 405820055
Relay_Master_Log_File lpvs1-mysql-bin.000063
Slave_IO_Running Yes
Slave_SQL_Running Yes
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno 0
Last_Error
Skip_Counter 0
Exec_Master_Log_Pos 484742991
Relay_Log_Space 1059071280
Until_Condition None
Until_Log_File
Until_Log_Pos 0
Master_SSL_Allowed No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master 29261
Master_SSL_Verify_Server_Cert No
Last_IO_Errno 0
Last_IO_Error
Last_SQL_Errno 0
Last_SQL_Error
Replicate_Ignore_Server_Ids
Master_Server_Id 1
Master_SSL_Crl
Master_SSL_Crlpath
Using_Gtid No
Gtid_IO_Pos
Replicate_Do_Domain_Ids
Replicate_Ignore_Domain_Ids
Parallel_Mode none
SQL_Delay 0
SQL_Remaining_Delay
Slave_SQL_Running_State Delete_rows_log_event::find_row(-1)
Comment by Antonio Fernandes [ 2018-03-27 ]

Uploaded MDEV-15350.tgz

Comment by Antonio Fernandes [ 2018-03-28 ]

Upgraded the slave to 10.2.14 and the problem persists.

Comment by Antonio Fernandes [ 2018-06-18 ]

Problem does not exist for InnoDB. Closing.

Comment by Elena Stepanova [ 2018-07-17 ]

Thanks for letting us know.
So, it was still happening for you with MyISAM on 10.2.14?

Comment by Antonio Fernandes [ 2018-07-24 ]

Yes, it was. I believe it's related to the fact that the table does not have a primary key.
With InnoDB, at least, it has (not the schema but within InnoDB).

But that's just my suspicion.

Thanks

Comment by Andrei Elkin [ 2019-09-05 ]

ampf, "Delete_rows_log_event::find_row(-1)" status of a slave applier thread can be explained by lack
of unique key on the table. I am closing the issue, since you suspected the same and as I gather basing on
your table properties.

Comment by Andrei Elkin [ 2019-09-05 ]

Rows log event applying on unique-key-less tables may be slow.

Generated at Thu Feb 08 08:20:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.