[MDEV-17348] Replication stops with error 1062 Created: 2018-10-02  Updated: 2020-06-22  Resolved: 2020-06-22

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.3.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Benjamin Setzer Assignee: Sujatha Sivakumar (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: need_feedback

Attachments: Text File master_status.txt     Text File master_variables.txt     Zip Archive masterconf.zip     Text File slave_status.txt     Text File slave_variables.txt     Zip Archive slaveconf.zip    

 Description   

Sep 03 07:30:17 svsdb02b mysqld[10408]: 2018-09-03  7:30:17 11 [ERROR] Slave SQL: Error 'Duplicate entry '383' for key 'PRIMARY'' on query. Default database: 'svs'. Query: 'INSERT INTO `invoice_customer_snapshots` (year, month) VALUES (2018, 8)', Gtid 0-23278185-3223146, Internal MariaDB error code: 1062
Sep 03 07:30:17 svsdb02b mysqld[10408]: 2018-09-03  7:30:17 11 [Warning] Slave: Duplicate entry '383' for key 'PRIMARY' Error_code: 1062
Sep 03 07:30:17 svsdb02b mysqld[10408]: 2018-09-03  7:30:17 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000050' position 38275946; GTID position '0-23278185-3223145'
Sep 03 07:30:17 svsdb02b mysqld[10408]: 2018-09-03  7:30:17 11 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.000050' at position 38275946; GTID position '0-23278185-3223145'
Sep 03 07:55:48 svsdb02b mysqld[10408]: 2018-09-03  7:55:48 10 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Sep 03 07:55:48 svsdb02b mysqld[10408]: 2018-09-03  7:55:48 10 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000050', position 78892171; GTID position 0-23278185-3231670

Oct  1 04:49:32 svsdb02b mysqld[10408]: 2018-10-01  4:49:32 83771 [ERROR] Slave SQL: Error 'Duplicate entry '384' for key 'PRIMARY'' on query. Default database: 'svs'. Query: 'INSERT INTO `invoice_customer_snapshots` (year, month) VALUES (2018, 9)', Gtid 0-23278185-9619862, Internal MariaDB error code: 1062
Oct  1 04:49:32 svsdb02b mysqld[10408]: 2018-10-01  4:49:32 83771 [Warning] Slave: Duplicate entry '384' for key 'PRIMARY' Error_code: 1062
Oct  1 04:49:32 svsdb02b mysqld[10408]: 2018-10-01  4:49:32 83771 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000103' position 64738944; GTID position '0-23278185-9619861'
Oct  1 04:49:32 svsdb02b mysqld[10408]: 2018-10-01  4:49:32 83771 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.000103' at position 64738944; GTID position '0-23278185-9619861'

CREATE TABLE `invoice_customer_snapshots` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `year` mediumint(8) unsigned NOT NULL,
  `month` tinyint(3) unsigned NOT NULL,
  `creation_date` timestamp NOT NULL DEFAULT current_timestamp(),
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`,`year`,`month`)
) ENGINE=InnoDB AUTO_INCREMENT=385 DEFAULT CHARSET=utf8

When we connected manually to the slave, the row didn't exist, but the auto-increment counter had been incremented. (So row 384 was missing on the slave, but the counter was already at 385.)
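This slave-side state can be inspected with queries along these lines (the exact statements are an assumption; the values cited are those from the report above):

```sql
-- Assumed inspection queries, not taken from the original report:
-- compare the highest existing id on the slave ...
SELECT MAX(id) FROM invoice_customer_snapshots;
-- ... with the AUTO_INCREMENT counter shown in the table definition:
SHOW CREATE TABLE invoice_customer_snapshots;
-- Per the report: the row with id 384 was missing, yet AUTO_INCREMENT was already 385.
```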

We checked the master; the insert appeared only once in the binlog.
We migrated from version 10.0.30 to 10.3.8, and this error started happening after the migration. (Edit: we migrated via mysqldump onto a completely new host.)
This script runs on the 1st workday of each month, and it has crashed both times so far.
We run binlog_format=MIXED.
Because we couldn't find the bug, we resynced the whole slave to clear the error state.

I can reproduce this in our staging environment, so if you have questions I should be able to produce this error again.
This script runs 20 times in parallel, and each script tries to insert the period if it doesn't already exist. This likely relates to the error and explains why we only see it on this particular table, because usually we don't do such crazy stunts.
When I run this serially (instead of in parallel) in dev, replication works fine.
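A hypothetical reconstruction of the pattern described (the application's actual statement is not in the ticket; table and column names are from the CREATE TABLE above). Note that the table has no unique key on (year, month) alone, so the existence check and the insert are separate, non-atomic steps:

```sql
-- Sketch: each of the 20 parallel workers runs something like this.
-- Two workers can both see "period missing" and insert concurrently,
-- which races on the auto-increment counter; under binlog_format=MIXED
-- this is the kind of concurrency the report suspects.
INSERT INTO invoice_customer_snapshots (year, month)
SELECT 2018, 9 FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM invoice_customer_snapshots
    WHERE year = 2018 AND month = 9
);
```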

We plan to rework the architecture by next month so that this only runs once, which lessens the urgency for us.

Possibly the same as: https://jira.mariadb.org/browse/MDEV-14566



 Comments   
Comment by Sachin Setiya (Inactive) [ 2018-10-04 ]

Hi Benjamin Setzer!

Can you provide my.cnf, SHOW GLOBAL VARIABLES, and SHOW GLOBAL STATUS from both sides, plus mysqlbinlog output for some range with Gtid 0-23278185-3223146 in it?

Comment by Benjamin Setzer [ 2018-10-08 ]

I added the conf dir as a zip, because we have some custom files there.
The SHOW GLOBAL x output was done with mysql svs "SHOW GLOBAL variables" > $.txt
Which binlog do you need? Master/slave? The relay log from the slave?
And how much? It contains real user data; the more you need, the more I have to check that nothing sensitive leaks. Also, the original log is gone, but I can reproduce this error in dev.

I tried this in dev with binlog_format=ROW; the error then disappears and everything works as expected.

Comment by Sachin Setiya (Inactive) [ 2018-10-08 ]

Hi Benjamin Setzer

Actually, the master/slave status/variables files you provided are wrong; you can do something like this:

echo "show global status" | mysql  > file_name

>> Which binlog do you need? Master/slave? The relay log from the slave?
The master and slave binary logs (the slave relay log is not needed).
>> And how much?

A couple of hundred GTIDs before the error.
Run

MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000002 |    19208 |              |                  |
+-------------------+----------+--------------+------------------+
 

Use this file name (master-bin.000002):

SHOW BINLOG EVENTS IN 'binary_log_file_name'

If it contains too much log, you can simply select the last few hundred lines using the cursor; and if there is less log, you can decrease the master-bin.XXXXX number and run SHOW BINLOG EVENTS again.
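A concrete form of the suggested command (the file name and positions below are placeholders, not values from this ticket):

```sql
-- Adjust file name and positions to your own SHOW MASTER STATUS output.
SHOW BINLOG EVENTS IN 'master-bin.000002' FROM 4 LIMIT 200;
```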

Since it is a dev machine and you are able to reproduce the error, is it possible to get gdb access to your machine?

Comment by Benjamin Setzer [ 2018-10-08 ]

I fixed the status/variables files; I forgot the -e switch.
As for access, I have to ask.

Comment by Andrei Elkin [ 2018-10-08 ]

Benjamin Setzer, sachin.setiya.007 Let me jump in. For offline analysis we need the mysqlbinlog output of the events preceding the one that failed with the duplicate-key error:

mysqlbinlog --start-position=pos_start --stop-position=pos_stop binlog.file

Here pos_start and pos_stop are offsets in the binlog file. It was suggested that pos_start be the offset of a GTID (transaction) roughly 100 before the failing one, and that pos_stop be the offset of a GTID that follows the failing one.
You can find them as the number following '# at ', as in this example:

# at 469
#181008 15:13:24 server id 1  end_log_pos 511 CRC32 0x885b83c5  GTID 0-1-2

Alternatively, feel free to upload your binlog; I hope it can be made small enough to handle since, as you wrote, the case is reproducible.

Comment by Benjamin Setzer [ 2018-10-11 ]

Cleaning up the binlog is more work than I anticipated,
so I probably won't have time to clean up some 100 GTIDs.

There are talks about giving one of you access. You are employed at MariaDB, right?
This would involve a Non-Disclosure Agreement.
This will probably take a while.

Comment by Andrei Elkin [ 2018-10-11 ]

Yes, we're members of the MariaDB engineering staff. Gaining access should definitely simplify resolution, as would your being able to reproduce it with a relatively small binlog.

Thank you, Benjamin. Andrei Elkin @ MariaDB.com

Comment by Andrei Elkin [ 2019-07-16 ]

According to sujatha.sivakumar's search, the issue may be caused by https://bugs.mysql.com/bug.php?id=88321, whose fix is going to be available in a new upstream release.

Comment by Sujatha Sivakumar (Inactive) [ 2019-12-05 ]

Hello Benjamin Setzer,

The upstream fix for https://bugs.mysql.com/bug.php?id=88321 is available in MariaDB 10.3.17 onwards.

commit 7153e155424de4b42bc0f96ceb6433e4ed1f827a
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date: Tue Jul 23 11:01:44 2019 +0300
Revert "MDEV-8827 Duplicate key with auto increment"
This reverts commit ef47b62551b0f37770e5d174ea028150c5b71fd8.
The parent commit 07ba5560da805c766da68811692272dd9eebdaf7
which is a backport of
mysql/mysql-server@1198267c331b045b9cad26be72b1a5b4f8930a79
fixes the issue differently.

Can you please verify the reported bug scenario on the latest 10.3 code?

Thank you.

Comment by Andrei Elkin [ 2020-06-15 ]

Maybe the upstream bug fixes have dealt with the case, yet that bug was described as requiring auto_increment_increment > 1, which does not seem to apply here.

Comment by Benjamin Setzer [ 2020-06-15 ]

I no longer work at this place, so sadly I can't test it anymore, sorry.

Comment by Andrei Elkin [ 2020-06-15 ]

Benjamin Setzer, thank you for the efforts. If we can't complete this without help, you may have to file a few more! Otherwise, just report them as they appear.

Comment by Andrei Elkin [ 2020-06-22 ]

Closed in agreement with Sujatha. If a bug with this symptom still exists, work will proceed from a new ticket.

Generated at Thu Feb 08 08:35:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.