[MDEV-11912] Debian package postinst script breaks chained replication Created: 2017-01-25  Updated: 2017-05-17  Resolved: 2017-05-17

Status: Closed
Project: MariaDB Server
Component/s: Packaging, Platform Debian, Replication
Affects Version/s: 10.1.20, 10.1.21
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Anton Avramov Assignee: Sergei Golubchik
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

The problem was discovered on Debian jessie and Ubuntu trusty with version 10.1.20; however, I presume it is valid for the other major versions as well.



 Description   

As part of the upgrade, the postinst script of the deb package starts a mysqld instance as follows:
MYSQL_BOOTSTRAP="/usr/sbin/mysqld --bootstrap --user=mysql --disable-log-bin --skip-grant-tables --default-storage-engine=myisam"

Note that binary logging is disabled, but the --skip-slave-start option is missing.
If you have a slave chain set up with --log-slave-updates, this brief startup during the upgrade starts the slave threads, which begin downloading and executing binary logs. However, because binary logging is disabled, those events are never written to disk and will therefore not be picked up by the other slaves in the chain.
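A minimal sketch of the fix implied by this report, assuming the postinst keeps the same variable as in the excerpt above: add --skip-slave-start so that slave threads can never be started during the bootstrap run. (Whether the flag is needed at all is debated in the comments.)

```shell
# Hypothetical amended postinst line: identical to the original
# except for the added --skip-slave-start.
MYSQL_BOOTSTRAP="/usr/sbin/mysqld --bootstrap --user=mysql \
  --disable-log-bin --skip-slave-start --skip-grant-tables \
  --default-storage-engine=myisam"
```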



 Comments   
Comment by Daniel Black [ 2017-01-26 ]

Good diagnosis and fix, lukav.

Comment by Daniel Black [ 2017-01-26 ]

Though maybe the --skip-slave-start should be part of --bootstrap because of its use by mysql_install_db serg?

Comment by Otto Kekäläinen [ 2017-01-27 ]

lukav Thanks for the report! Which repository did you use to install MariaDB? What is the exact version number and revision?

What do you think the solution should be, and if you have a suggestion, did you test it and can you verify it worked?

Comment by Anton Avramov [ 2017-01-27 ]

I mirror the http://ftp.osuosl.org/pub/mariadb/repo/10.1/debian repository and the servers install from that mirror.
Since this is an automated process that runs at a time when replication is not critical, at first I simply fixed the replication by hand, since the skipped records were not important. It took me some time to realize that the implications could be worse.
I can confirm that this was observed with 10.1.20+maria-1~wheezy, 10.1.21+maria-1~wheezy, 10.1.20+maria-1~jessie.

I haven't implemented or tested a solution, since I don't repackage the debs; I just mirror them.
gtid_strict_mode stops replication in this situation and I can restore it manually, but I hope you will fix the problem so I don't have to do this anymore.

Comment by Sergei Golubchik [ 2017-03-02 ]

lukav, as far as I can see in the source code, --bootstrap does automatically imply --skip-slave-start. Moreover, the server starts the slave threads only after checking whether it is in bootstrap mode, so even if --skip-slave-start were not implied, the slaves would not have been started anyway: the server exits immediately after bootstrap.

Maybe what you're seeing is caused by something completely different...

Comment by Anton Avramov [ 2017-03-02 ]

Hmmm...
OK, let me think. Is it possible, then, for the following to happen?
Prerequisite: I have a cron job that runs every minute and updates a datetime column in a table on each slave.
1. The upgrade process starts the server in bootstrap mode.
2. While it is running, the cron job updates the record and increases the GTID; however, because of --bootstrap, the update does not get recorded in the binlog.
3. The server is then restarted normally.
4. The other master reconnects and reads the log, but a GTID is missing, and because of gtid_strict_mode=1 it stops with the error 'missing gtid, but both prior and a subsequent exists' (quoting from memory), which is what is observed.

Since this is master<->master replication, the same error is observed on both sides.

What do you think?
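The four-step timeline above can be illustrated with a toy check (plain shell, no MariaDB involved; the seq_no values are invented for illustration) showing the kind of hole in the GTID sequence that gtid_strict_mode would flag:

```shell
# Illustrative sketch only (not MariaDB code).  Suppose the peer's binlog
# carries seq_nos 1 2 4 5: seq_no 3 was executed while binary logging was
# disabled and was never written, so the reading side sees a gap.
binlogged="1 2 4 5"
prev=0
gap=""
for seq in $binlogged; do
  # A jump of more than 1 between consecutive seq_nos means a hole.
  [ $((seq - prev)) -gt 1 ] && gap="$((prev + 1))"
  prev=$seq
done
echo "missing seq_no: $gap"   # missing seq_no: 3
```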

Comment by Sergei Golubchik [ 2017-03-02 ]

In --bootstrap mode the server doesn't listen on a socket or TCP port, so your cron job shouldn't have been able to connect to it.

Comment by Anton Avramov [ 2017-03-02 ]

I suggest we wait for it to happen again, and then I will report further details.

Comment by Sergei Golubchik [ 2017-05-17 ]

lukav, I'll close this issue, since there has been no new information for two months.

But don't worry, when you have something — just add a comment and I'll reopen the issue.
