[MDEV-9383] Server fails to read master.info after upgrade 10.0 -> 10.1 - Jira

Igor Pashev created issue - 2016-01-08 16:34

Igor Pashev added a comment - 2016-01-08 16:38

--skip-slave-start does not help

Igor Pashev added a comment - 2016-01-08 16:38 --skip-slave-start does not help

Elena Stepanova added a comment - 2016-01-11 14:29

ip1981,

Please paste a bigger unabridged log fragment, covering the previous server session (before the upgrade) if you still have it, when it started and shut down normally, and continuing onto the startup attempt of the upgraded server, until it aborted. If there is any confidential information in there – IPs or whatever – you can obfuscate it.

Please also attach your cnf file(s) and paste the output of ls -l of the directory where your binary logs, index files and such reside.

Thanks.

Elena Stepanova added a comment - 2016-01-11 14:29 ip1981 , Please paste a bigger unabridged log fragment, covering the previous server session (before the upgrade) if you still have it, when it started and shut down normally, and continuing onto the startup attempt of the upgraded server, until it aborted. If there is any confidential information in there – IPs or whatever – you can obfuscate it. Please also attach your cnf file(s) and paste the output of ls -l of the directory where your binary logs, index files and such reside. Thanks.

Elena Stepanova made changes - 2016-01-11 14:29

Field	Original Value	New Value
Labels		need_feedback

Igor Pashev made changes - 2016-01-21 06:36

Attachment

log.txt [ 41405 ]

Igor Pashev made changes - 2016-01-21 06:36

Attachment

mysqld.cnf [ 41406 ]

Igor Pashev added a comment - 2016-01-21 06:41

Attached log and cnf file. I found some IRC conversation as well

2015-10-29 11:34:36 Jippi Hi, I just upgraded a slave from 10.0.21 to 10.1.8  - looks like something went wrong with the gtid replication - the slave refuses to start up - I’m getting “ERROR 1201 (HY000): Could not initialize master info structure for ''; more error messages can be found in the MariaDB error log” when running ‘start slave’ and “[ERROR] Failed to initialize master info using_gtid” in my log file

2015-10-29 11:34:42 Jippi anyone know what the cause for that could be

2015-10-29 11:35:07 paulo-tada joined chan

2015-10-29 11:35:16 Jippi my mysql.gtid_slave_pos table looks correct and with the last values before upgrade

2015-10-29 11:36:23 Jippi selecting  @@GLOBAL.gtid_slave_pos also looks correct

2015-10-29 11:38:10 Jippi even doing the full ‘change master …’ fail with the same error

2015-10-29 11:39:14 Jippi hmm okay, so a “reset slave;” fixed it

I'd tried to reproduce the issue with fresh setup, but couldn't: migration went flawlessly.

Igor Pashev added a comment - 2016-01-21 06:41 Attached log and cnf file. I found some IRC conversation as well 2015-10-29 11:34:36 Jippi Hi, I just upgraded a slave from 10.0.21 to 10.1.8 - looks like something went wrong with the gtid replication - the slave refuses to start up - I’m getting “ERROR 1201 (HY000): Could not initialize master info structure for ''; more error messages can be found in the MariaDB error log” when running ‘start slave’ and “[ERROR] Failed to initialize master info using_gtid” in my log file 2015-10-29 11:34:42 Jippi anyone know what the cause for that could be 2015-10-29 11:35:07 paulo-tada joined chan 2015-10-29 11:35:16 Jippi my mysql.gtid_slave_pos table looks correct and with the last values before upgrade 2015-10-29 11:36:23 Jippi selecting @@GLOBAL.gtid_slave_pos also looks correct 2015-10-29 11:38:10 Jippi even doing the full ‘change master …’ fail with the same error 2015-10-29 11:39:14 Jippi hmm okay, so a “reset slave;” fixed it I'd tried to reproduce the issue with fresh setup, but couldn't: migration went flawlessly.

Igor Pashev made changes - 2016-01-21 06:41

Attachment

relay.ls.txt.gz [ 41407 ]

Igor Pashev made changes - 2016-01-21 06:43

Attachment

master-info.txt [ 41408 ]

Elena Stepanova made changes - 2016-01-21 13:27

Labels

need_feedback

Igor Pashev added a comment - 2016-01-21 15:10

One more thing. When it's started with -skip-slave-start mysqld terminated after [ERROR] Aborting. While without -skip-slave-start it keeps running, socket is created, but can't be read (mysql cli hangs), strace shows recvfrom as the last call.

Igor Pashev added a comment - 2016-01-21 15:10 One more thing. When it's started with - skip-slave-start mysqld terminated after [ERROR] Aborting . While without -skip-slave-start it keeps running, socket is created, but can't be read (mysql cli hangs), strace shows recvfrom as the last call.

Elena Stepanova added a comment - 2016-01-22 01:18

ip1981,

Thank you. Could you please pack all *.info files and upload them to our FTP (ftp.askmonty.org/private)? As you can see yourself, there is nothing remotely confidential in those files, except for maybe host names/addresses (you can obfuscate those if you want); but the server fails to parse some of them, it would be useful to see why.

Is it intentional that you have so many master connections? Seems to be a lot of garbage in there, maybe some of it stale or inconsistent.

Elena Stepanova added a comment - 2016-01-22 01:18 ip1981 , Thank you. Could you please pack all *.info files and upload them to our FTP (ftp.askmonty.org/private)? As you can see yourself, there is nothing remotely confidential in those files, except for maybe host names/addresses (you can obfuscate those if you want); but the server fails to parse some of them, it would be useful to see why. Is it intentional that you have so many master connections? Seems to be a lot of garbage in there, maybe some of it stale or inconsistent.

Elena Stepanova made changes - 2016-01-22 01:18

Labels

need_feedback

Elena Stepanova added a comment - 2016-02-19 03:10

Please comment to re-open if you have further information on the issue.

Elena Stepanova added a comment - 2016-02-19 03:10 Please comment to re-open if you have further information on the issue.

Elena Stepanova made changes - 2016-02-19 03:10

Component/s		OTHER [ 10125 ]
Fix Version/s		N/A [ 14700 ]
Resolution		Incomplete [ 4 ]
Status	Open [ 1 ]	Closed [ 6 ]

Anton Avramov added a comment - 2016-03-17 14:23

I have the same problem trying to upgrade from ubuntu .deb packages.
I have uploaded tar.gz to the ftp with the config folder .info files and syslog cut that represents stopping the old instance and running the new.

I also had some old replications but have managed to downgrade and remove all of them. The problem still persists.

Anton Avramov added a comment - 2016-03-17 14:23 I have the same problem trying to upgrade from ubuntu .deb packages. I have uploaded tar.gz to the ftp with the config folder .info files and syslog cut that represents stopping the old instance and running the new. I also had some old replications but have managed to downgrade and remove all of them. The problem still persists.

Elena Stepanova made changes - 2016-03-17 14:29

Assignee		Elena Stepanova [ elenst ]
Resolution	Incomplete [ 4 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Elena Stepanova made changes - 2016-03-17 14:32

Labels

need_feedback

Andreas Leathley added a comment - 2016-04-03 18:45

I had the same problem when upgrading von 10.0 to 10.1, while an upgrade from 5.5 to 10.1 worked without problems. In both instances I uninstalled MariaDB and then installed the new version (this worked best so far). I noticed the following lines were missing in master.info in the 10.0->10.1 upgrade:

do_domain_ids=0
ignore_domain_ids=0
END_MARKER

When I added these lines, my master configuration could be read again and the replication slave worked as expected. It seems the 10.0 to 10.1 upgrade does not add these lines and then cannot read the configuration - maybe because "using_gtid=0" was already in the master configuration, while this is (of course) missing when upgrading from 5.5. I think MariaDB 10.1 should check the master.info for such a 10.0 configuration and also update it if necessary.

Andreas Leathley added a comment - 2016-04-03 18:45 I had the same problem when upgrading von 10.0 to 10.1, while an upgrade from 5.5 to 10.1 worked without problems. In both instances I uninstalled MariaDB and then installed the new version (this worked best so far). I noticed the following lines were missing in master.info in the 10.0->10.1 upgrade: do_domain_ids=0 ignore_domain_ids=0 END_MARKER When I added these lines, my master configuration could be read again and the replication slave worked as expected. It seems the 10.0 to 10.1 upgrade does not add these lines and then cannot read the configuration - maybe because "using_gtid=0" was already in the master configuration, while this is (of course) missing when upgrading from 5.5. I think MariaDB 10.1 should check the master.info for such a 10.0 configuration and also update it if necessary.

Anton Avramov added a comment - 2016-04-04 12:21

I confirm that the solution proposed by Mr. Leathley fix the problem for me.
I'va installed the new package via apt-get install mariadb-server.
The installation failed as before, on configuring the instance.
I then manually edit master-[myslavename].info file and added:
do_domain_ids=0
ignore_domain_ids=0
just after "using_gtid=0", preserving the last line.

Then I've rerun the apt-get command and this time it was all fine so far.
The slave continued as normal and the upgrade procedures ware finished, as far as I can tell.

Anton Avramov added a comment - 2016-04-04 12:21 I confirm that the solution proposed by Mr. Leathley fix the problem for me. I'va installed the new package via apt-get install mariadb-server. The installation failed as before, on configuring the instance. I then manually edit master- [myslavename] .info file and added: do_domain_ids=0 ignore_domain_ids=0 just after "using_gtid=0", preserving the last line. Then I've rerun the apt-get command and this time it was all fine so far. The slave continued as normal and the upgrade procedures ware finished, as far as I can tell.

Elena Stepanova added a comment - 2016-04-06 14:27 - edited

lukav@lukav.com, thanks a lot for providing the files, it's much clearer now.

The problem with the info file is not the three missing lines – 10.1 is perfectly fine starting without them, – but the extra empty line at the end of the file. I see it in lukav@lukav.com's archive, and apparently, iquito had it too. I get the same exact error if I insert an empty line manually into my info file.
By adding the three lines, you got rid of the empty one, and even if you didn't, it doesn't matter anymore because it's after the END MARKER now.

The question is where this empty line came from – I am not getting it by just firing the recent 10.0 server and shutting it down, apparently, something else caused it.

Elena Stepanova added a comment - 2016-04-06 14:27 - edited lukav@lukav.com , thanks a lot for providing the files, it's much clearer now. The problem with the info file is not the three missing lines – 10.1 is perfectly fine starting without them, – but the extra empty line at the end of the file. I see it in lukav@lukav.com 's archive, and apparently, iquito had it too. I get the same exact error if I insert an empty line manually into my info file. By adding the three lines, you got rid of the empty one, and even if you didn't, it doesn't matter anymore because it's after the END MARKER now. The question is where this empty line came from – I am not getting it by just firing the recent 10.0 server and shutting it down, apparently, something else caused it.

Andreas Leathley added a comment - 2016-04-06 14:48

Ah so the END MARKER solved it. Well I can only describe my case, where the situation was:

Master running MariaDB 5.5.34
Slave running MariaDB 5.5.34

I updated the slave from 5.5.34 to 10.0.10 about two years ago, and did not change anything else as far as I remember (no CHANGE MASTER TO or any other manual changes to master.info), the slave then continued to work until a few days ago, when I changed the master (to MariaDB 10.1.13) and upgraded MariaDB on the slave from 10.0.10 to 10.1.13. When I then started the slave, the errors about master.info came up. Upgrading from 5.5.34 to 10.1.13 on other slaves worked flawlessly. When comparing I noticed that MariaDB 10.0.10 added the "using_gtid=0" in master.info, and 10.1.13 did not change master.info in any way, so I would guess 10.0.10 also added the newline at the end, which lead to the problems in 10.1.13.

Andreas Leathley added a comment - 2016-04-06 14:48 Ah so the END MARKER solved it. Well I can only describe my case, where the situation was: Master running MariaDB 5.5.34 Slave running MariaDB 5.5.34 I updated the slave from 5.5.34 to 10.0.10 about two years ago, and did not change anything else as far as I remember (no CHANGE MASTER TO or any other manual changes to master.info), the slave then continued to work until a few days ago, when I changed the master (to MariaDB 10.1.13) and upgraded MariaDB on the slave from 10.0.10 to 10.1.13. When I then started the slave, the errors about master.info came up. Upgrading from 5.5.34 to 10.1.13 on other slaves worked flawlessly. When comparing I noticed that MariaDB 10.0.10 added the "using_gtid=0" in master.info, and 10.1.13 did not change master.info in any way, so I would guess 10.0.10 also added the newline at the end, which lead to the problems in 10.1.13.

Anton Avramov added a comment - 2016-04-06 14:58

I can also say that this was installation regularly upgraded.
I've check one of our clients multi master setups and can say that some of the Info files have those extra files, some doesn't.
Some have more that one extra lines.
Some have something like:
using_gtid=1
0

Others have:
using_gtid=1
=0

I can upload the files if this would help?
But I think that obviously there was a bug at some point that introduced those newlines, and the new 10.1 should just remove or ignore them.

Anton Avramov added a comment - 2016-04-06 14:58 I can also say that this was installation regularly upgraded. I've check one of our clients multi master setups and can say that some of the Info files have those extra files, some doesn't. Some have more that one extra lines. Some have something like: using_gtid=1 0 Others have: using_gtid=1 =0 I can upload the files if this would help? But I think that obviously there was a bug at some point that introduced those newlines, and the new 10.1 should just remove or ignore them.

Elena Stepanova added a comment - 2016-04-06 15:56 - edited

iquito, it is actually a very good story. So, you did not use multi-master replication at all? It narrows down the search considerably. Are you using parallel replication?

lukav@lukav.com, yes, I agree that the new server should ignore erroneous lines; the problem is, if we don't know why erroneous lines appear, we don't know where the server should expect and ignore them – is it at the end of the file only, or is it after using_gtid (which is not the end of the file in 10.1, so if 10.1 still has the bug that causes those empty lines sporadically, they can appear between using_gtid and do_domain_ids), or can it be in a random place of the file at all, in which case ignoring will be way more complicated.

Elena Stepanova added a comment - 2016-04-06 15:56 - edited iquito , it is actually a very good story. So, you did not use multi-master replication at all? It narrows down the search considerably. Are you using parallel replication? lukav@lukav.com , yes, I agree that the new server should ignore erroneous lines; the problem is, if we don't know why erroneous lines appear, we don't know where the server should expect and ignore them – is it at the end of the file only, or is it after using_gtid (which is not the end of the file in 10.1, so if 10.1 still has the bug that causes those empty lines sporadically, they can appear between using_gtid and do_domain_ids), or can it be in a random place of the file at all, in which case ignoring will be way more complicated.

Andreas Leathley added a comment - 2016-04-06 16:02

No, I only use "regular" replication, with just the out-of-the-box settings - so just the default parallel replication settings (which would be "conservative" mode now, I guess), and nothing fancy at all.

Andreas Leathley added a comment - 2016-04-06 16:02 No, I only use "regular" replication, with just the out-of-the-box settings - so just the default parallel replication settings (which would be "conservative" mode now, I guess), and nothing fancy at all.

Andreas Leathley added a comment - 2016-04-06 16:09

I seems as though in both our cases the additional newline was inserted after the using_gtid line, which was added by MariaDB 10.0, that is why I would point my finger at 10.0.(10 in my case) - but of course I don't know if there are more scenarios which are possible/likely.

Andreas Leathley added a comment - 2016-04-06 16:09 I seems as though in both our cases the additional newline was inserted after the using_gtid line, which was added by MariaDB 10.0, that is why I would point my finger at 10.0.(10 in my case) - but of course I don't know if there are more scenarios which are possible/likely.

Andreas Leathley made changes - 2016-04-06 16:13

Attachment

master.info.example [ 41992 ]

Andreas Leathley added a comment - 2016-04-06 16:15

I still have the master.info which was used in 5.5.34 (which was then upgraded to 10.0.10 and then to 10.1.13) - maybe that helps, I only anonymized IP, username and password. master.info.example

Unfortunately I don't have the version used in 10.0.10 anymore, I just know that the only "noticeable" difference was the enable_gtid=0 (and probably the newline at the end).

Andreas Leathley added a comment - 2016-04-06 16:15 I still have the master.info which was used in 5.5.34 (which was then upgraded to 10.0.10 and then to 10.1.13) - maybe that helps, I only anonymized IP, username and password. master.info.example Unfortunately I don't have the version used in 10.0.10 anymore, I just know that the only "noticeable" difference was the enable_gtid=0 (and probably the newline at the end).

Elena Stepanova added a comment - 2016-04-06 16:19 - edited

iquito, thank you.

Elena Stepanova added a comment - 2016-04-06 16:19 - edited iquito , thank you.

Elena Stepanova made changes - 2016-04-06 21:58

Status

Stalled [ 10000 ]

Confirmed [ 10101 ]

Elena Stepanova added a comment - 2016-04-06 22:06

Thanks all.
Old 10.0 did all kinds of crazy things with master.info, but even the current one can produce this empty line (or something other than empty, but it seems only an empty line causes the problem).

So, the issue is two-fold. In 10.0, do not add garbage at the end of the file beyond the line count.
Here is how I'm getting an empty line on 10.0, reproducible on 10.0 commit f8adeccd78bff80725a95b73447e34a5f4528179.

# Assuming a server with server-id=N --log-bin --port=3306 is already running

# Start a fresh 10.0 server with --log-bin --server-id=2 --port=3307, otherwise defaults

change master to master_host='127.0.0.1', master_port=3306, master_user='root';

start slave;

stop slave;

change master to master_log_pos=4000;

# At this point, an empty line appears in master.info after using_gtid=0

The second part of the problem is 10.1 – as lukav@lukav.com said above, it shouldn't be this sensitive to a trailing empty line.

shutdown the 10.0 server which was used above
start a fresh 10.1 server with the same options on the datadir

behold the errors

2016-04-07  0:53:53 140486837634944 [Note] Server socket created on IP: '::'.

2016-04-07  0:53:53 140486837634944 [ERROR] Failed to initialize master info using_gtid

2016-04-07  0:53:53 140486837634944 [ERROR] Error reading master configuration

2016-04-07  0:53:53 140486837634944 [ERROR] Failed to initialize the master info structure

2016-04-07  0:53:53 140486837634944 [ERROR] Failed to allocate memory for the Master Info structure

2016-04-07  0:53:53 140486837634944 [Note] /bld/10.1/bin/mysqld: ready for connections.

Version: '10.1.13-MariaDB-debug'  socket: '/bld/10.1/data/tmp/mysql.sock'  port: 3307  Source distribution

Reproducible on 10.1 commit 4b6a3518e4dc9088d1f42cd9bc487d06137d2760.

Elena Stepanova added a comment - 2016-04-06 22:06 Thanks all. Old 10.0 did all kinds of crazy things with master.info, but even the current one can produce this empty line (or something other than empty, but it seems only an empty line causes the problem). So, the issue is two-fold. In 10.0, do not add garbage at the end of the file beyond the line count. Here is how I'm getting an empty line on 10.0, reproducible on 10.0 commit f8adeccd78bff80725a95b73447e34a5f4528179. # Assuming a server with server-id=N --log-bin --port=3306 is already running # Start a fresh 10.0 server with --log-bin --server-id=2 --port=3307, otherwise defaults change master to master_host= '127.0.0.1' , master_port=3306, master_user= 'root' ; start slave; stop slave; change master to master_log_pos=4000; # At this point, an empty line appears in master.info after using_gtid=0 The second part of the problem is 10.1 – as lukav@lukav.com said above, it shouldn't be this sensitive to a trailing empty line. shutdown the 10.0 server which was used above start a fresh 10.1 server with the same options on the datadir behold the errors 2016-04-07 0:53:53 140486837634944 [Note] Server socket created on IP: '::'. 2016-04-07 0:53:53 140486837634944 [ERROR] Failed to initialize master info using_gtid 2016-04-07 0:53:53 140486837634944 [ERROR] Error reading master configuration 2016-04-07 0:53:53 140486837634944 [ERROR] Failed to initialize the master info structure 2016-04-07 0:53:53 140486837634944 [ERROR] Failed to allocate memory for the Master Info structure 2016-04-07 0:53:53 140486837634944 [Note] /bld/10.1/bin/mysqld: ready for connections. Version: '10.1.13-MariaDB-debug' socket: '/bld/10.1/data/tmp/mysql.sock' port: 3307 Source distribution Reproducible on 10.1 commit 4b6a3518e4dc9088d1f42cd9bc487d06137d2760.

Elena Stepanova made changes - 2016-04-06 22:07

Component/s		Replication [ 10100 ]
Component/s	OTHER [ 10125 ]
Fix Version/s		10.0 [ 16000 ]
Fix Version/s		10.1 [ 16100 ]
Fix Version/s	N/A [ 14700 ]
Affects Version/s		10.0 [ 16000 ]
Affects Version/s		10.1 [ 16100 ]
Assignee	Elena Stepanova [ elenst ]	Kristian Nielsen [ knielsen ]
Environment	Linux, Multi-source replication	Linux

Igor Pashev made changes - 2016-04-07 05:18

Description

I tried to upgrade the server 10.0.22 -> 10.1.10.
Old version was shutdown, but new failed to start:
{noformat}
2016-01-08 22:25:00 7f8017479700 InnoDB: Error: Column last_update in table "mysql"."innodb_table_stats" is INT UNSIGNED NOT NULL but should be BINARY(4) NOT NULL (type mismatch).
2016-01-08 22:25:00 7f8017479700 InnoDB: Error: Fetch of persistent statistics requested for table "mysql"."gtid_slave_pos" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2016-01-08 22:25:01 140188123334592 [Note] Server socket created on IP: '::'.
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dhk.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dth.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dsg.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-delivery.info' failed
2016-01-08 22:25:01 140188123334592 [Warning] Reading of some Master_info entries failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize multi master structures
2016-01-08 22:25:01 140188123334592 [ERROR] Aborting
{noformat}

I guess I can't user mysql_upgrade when the server is not running.

I tried to upgrade the server 10.0.22 -> 10.1.10.
Old version was shutdown, but new failed to start:
{noformat}
2016-01-08 22:25:00 7f8017479700 InnoDB: Error: Column last_update in table "mysql"."innodb_table_stats" is INT UNSIGNED NOT NULL but should be BINARY(4) NOT NULL (type mismatch).
2016-01-08 22:25:00 7f8017479700 InnoDB: Error: Fetch of persistent statistics requested for table "mysql"."gtid_slave_pos" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2016-01-08 22:25:01 140188123334592 [Note] Server socket created on IP: '::'.
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dhk.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dth.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-bob@002dsg.info' failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize master info using_gtid
2016-01-08 22:25:01 140188123334592 [ERROR] Error reading master configuration
2016-01-08 22:25:01 140188123334592 [ERROR] Initialized Master_info from 'master-delivery.info' failed
2016-01-08 22:25:01 140188123334592 [Warning] Reading of some Master_info entries failed
2016-01-08 22:25:01 140188123334592 [ERROR] Failed to initialize multi master structures
2016-01-08 22:25:01 140188123334592 [ERROR] Aborting
{noformat}

I guess I can't use mysql_upgrade when the server is not running.

Kristian Nielsen added a comment - 2016-04-07 07:13

Seems the bug was introduced with this commit:

commit a50ddebb5cfa7b79540d155e8e41c7a07c4c3fbf

Author: Nirbhay Choubey <nirbhay@mariadb.com>

Date:   Wed Dec 3 22:30:48 2014 -0500

    MDEV-6593 : domain_id based replication filters

    Implementation for domain ID based filtering of replication events.

When the function read_mi_key_from_file() sees an empty line, it handles it
incorrectly and returns whatever randomly happened to be the old key (which
is probably "using_gtid"). Thus the code tries to parse a phantom line
"using_gtid=", and fails.

(There seems to be more potential problems with this code? For example, what
happens if it sees a line like "using_gtid" or "do_domain_ids", without any
"=" sign?)

Kristian Nielsen added a comment - 2016-04-07 07:13 Seems the bug was introduced with this commit: commit a50ddebb5cfa7b79540d155e8e41c7a07c4c3fbf Author: Nirbhay Choubey <nirbhay@mariadb.com> Date: Wed Dec 3 22:30:48 2014 -0500 MDEV-6593 : domain_id based replication filters Implementation for domain ID based filtering of replication events. When the function read_mi_key_from_file() sees an empty line, it handles it incorrectly and returns whatever randomly happened to be the old key (which is probably "using_gtid"). Thus the code tries to parse a phantom line "using_gtid=", and fails. (There seems to be more potential problems with this code? For example, what happens if it sees a line like "using_gtid" or "do_domain_ids", without any "=" sign?)

Kristian Nielsen added a comment - 2016-04-07 08:12

BTW, the issue with leaving extra stuff at the end of master.info is fixed in 10.1 with END_MARKER.

Kristian Nielsen added a comment - 2016-04-07 08:12 BTW, the issue with leaving extra stuff at the end of master.info is fixed in 10.1 with END_MARKER.

Anton Avramov made changes - 2016-04-07 08:22

Attachment		MDEV-9383-100grama.sample [ 41993 ]
Attachment		MDEV-9383-ck.sample [ 41994 ]
Attachment		MDEV-9383-miss60.sample [ 41995 ]

Anton Avramov added a comment - 2016-04-07 08:24

MDEV-9383-100grama.sample MDEV-9383-ck.sample MDEV-9383-miss60.sample

I've compiled the ending of the .info files in some of our clients multi-masters.
I've used the command: tail ~~n 6 /var/lib/mysql/master~~*.info > /tmp/~~MDEV-9383~~-[host].sample

I think this would give you a good samples of the "damages" the bug made

Anton Avramov added a comment - 2016-04-07 08:24 MDEV-9383-100grama.sample MDEV-9383-ck.sample MDEV-9383-miss60.sample I've compiled the ending of the .info files in some of our clients multi-masters. I've used the command: tail n 6 /var/lib/mysql/master *.info > /tmp/ MDEV-9383 - [host] .sample I think this would give you a good samples of the "damages" the bug made

Kristian Nielsen made changes - 2016-04-07 12:11

Summary

Upgrade 10.0.22 -> 10.1.10: unable to start

Server fails to read master.info after upgrade 10.0 -> 10.1

Kristian Nielsen added a comment - 2016-04-07 12:53

Suggested fix:

http://lists.askmonty.org/pipermail/commits/2016-April/009251.html

Kristian Nielsen added a comment - 2016-04-07 12:53 Suggested fix: http://lists.askmonty.org/pipermail/commits/2016-April/009251.html

Kristian Nielsen added a comment - 2016-04-07 13:06

BTW, what happens here is that when a new master.info file is written, it
does not truncate the file (for performance reasons).

So if the new file happens to be shorter, extra junk is left at the
end. This has always been so, but it has become more of a problem now that
MySQL and MariaDB are each adding new lines to the file without
coordinating.

In 10.1, a line "END_MARKER" is added at the bottom to easily avoid reading
any left-over junk, but this change was deemed too dangerous for stable 10.0
(which is fine, since 10.0 is not going to add anothing more to
master.info).

But then apparently some bugs were introduced with the previously mentioned
patch, which could cause 10.1 to incorrectly interpret some cases of extra
junk left by 10.0, causing the problems seen here. In particular, an empty
line was treated as an empty "using_gtid=" line, causing the failure.

An easy work-around, until a fixed 10.1 release becomes available, is to
just remove anything after the "using_gtid=X" line (but do not remove the
newline character at the end of that line).

Kristian Nielsen added a comment - 2016-04-07 13:06 BTW, what happens here is that when a new master.info file is written, it does not truncate the file (for performance reasons). So if the new file happens to be shorter, extra junk is left at the end. This has always been so, but it has become more of a problem now that MySQL and MariaDB are each adding new lines to the file without coordinating. In 10.1, a line "END_MARKER" is added at the bottom to easily avoid reading any left-over junk, but this change was deemed too dangerous for stable 10.0 (which is fine, since 10.0 is not going to add anothing more to master.info). But then apparently some bugs were introduced with the previously mentioned patch, which could cause 10.1 to incorrectly interpret some cases of extra junk left by 10.0, causing the problems seen here. In particular, an empty line was treated as an empty "using_gtid=" line, causing the failure. An easy work-around, until a fixed 10.1 release becomes available, is to just remove anything after the "using_gtid=X" line (but do not remove the newline character at the end of that line).

Kristian Nielsen added a comment - 2016-04-08 08:58

Pushed to 10.1

Kristian Nielsen added a comment - 2016-04-08 08:58 Pushed to 10.1

Kristian Nielsen made changes - 2016-04-08 08:58

Fix Version/s		10.1.14 [ 21804 ]
Fix Version/s	10.0 [ 16000 ]
Fix Version/s	10.1 [ 16100 ]
Resolution		Fixed [ 1 ]
Status	Confirmed [ 10101 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2021-12-06 21:42

Workflow

MariaDB v3 [ 73534 ]

MariaDB v4 [ 149983 ]

MariaDB Server

Server fails to read master.info after upgrade 10.0 -> 10.1

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration