[MXS-1589] maxscale stops working when binlogrouter is erroring Created: 2017-12-22 Updated: 2018-01-16 Resolved: 2018-01-16 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | binlogrouter |
| Affects Version/s: | 2.2.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Maikel Punie | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: | centos7 3.10.0-693.5.2.el7.x86_64 |
| Description |
|
When we start the binlog router we always end up in the same situation: it starts connecting to the master, reads some info and then times out. On reconnect it reports an error in the binlog.
Config for the binlog router:
The master is running: Server version: 10.1.29-MariaDB MariaDB Server
The moment this happens, maxscale stops accepting new connections on any listener. Existing connections keep working. |
| Comments |
| Comment by Massimiliano Pinto (Inactive) [ 2017-12-22 ] |
|
Hi Maikel Punie,
When maxscale reports that no heartbeat was received, would it be possible to issue SHOW SLAVE STATUS\G in the maxscale mysql connection? If not, please provide at least SHOW SLAVE STATUS\G
The CHANGE MASTER I see is without the previous set of
In this case maxscale should get the first GTID available in the master binlogs.
I see:
2017-12-22 10:57:39 notice : [binlogrouter] BinlogSVC: attempting to connect to master server [10.255.10.32]:3306, binlog='mysql-bin.000036', pos=749450152, GTID=0-1-1064389133
Are the GTID and pos the right ones?
The error
2017-12-22 10:57:42 error : (929418) [binlogrouter] Error packet in binlog stream.mysql-bin.000030 @ 4423
is related to a binlog file/pos that is not the same as in the previous log line (binlog='mysql-bin.000036', pos=749450152).
Would you also mind looking into the GTID database in $binlogdir? You can easily check whether it keeps the right data.
Additionally, does BLR work well without GTID master registration? You can check it by setting: mariadb10_master_gtid=0, |
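The file/position mismatch between the two log lines quoted in the comment above can be checked mechanically. The following is a minimal Python sketch and not part of MaxScale: the regular expressions are assumptions fitted only to the two log lines quoted in this issue, and the helper names `parse_connect` and `parse_error` are hypothetical.

```python
import re

# Assumption: these patterns are fitted to the two log lines quoted in this
# issue; other MaxScale versions may format the messages differently.
CONNECT_RE = re.compile(
    r"binlog='(?P<file>[^']+)', pos=(?P<pos>\d+), GTID=(?P<gtid>[\d-]+)"
)
ERROR_RE = re.compile(
    r"Error packet in binlog stream\.(?P<file>\S+) @ (?P<pos>\d+)"
)


def parse_connect(line):
    """Return (file, pos, gtid) from a connect notice, or None if no match."""
    m = CONNECT_RE.search(line)
    return (m["file"], int(m["pos"]), m["gtid"]) if m else None


def parse_error(line):
    """Return (file, pos) from an 'Error packet' line, or None if no match."""
    m = ERROR_RE.search(line)
    return (m["file"], int(m["pos"])) if m else None


# The two log lines exactly as quoted in the comment above.
connect_line = (
    "2017-12-22 10:57:39 notice : [binlogrouter] BinlogSVC: attempting to "
    "connect to master server [10.255.10.32]:3306, binlog='mysql-bin.000036', "
    "pos=749450152, GTID=0-1-1064389133"
)
error_line = (
    "2017-12-22 10:57:42 error : (929418) [binlogrouter] Error packet in "
    "binlog stream.mysql-bin.000030 @ 4423"
)

requested = parse_connect(connect_line)
failed = parse_error(error_line)
print("requested:", requested)
print("error at: ", failed)
print("same binlog file:", requested[0] == failed[0])
```

For the quoted lines the file names differ (mysql-bin.000036 requested, mysql-bin.000030 in the error), which is the mismatch pointed out above.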
| Comment by Maikel Punie [ 2017-12-22 ] |
|
In the sql command line: SET @@global.gtid_slave_pos='0-1-22545514499';
The show slave status at the moment of the error:
In the gtid db I see the following at the moment of the error:
This last line looks strange. |
| Comment by Massimiliano Pinto (Inactive) [ 2017-12-22 ] |
|
Hi Maikel Punie,
Indeed, the last line is wrong.
I can't see the output of "show slave status \G".
Please add here the last files in your master: SHOW BINARY LOGS;
and the last events in mysql-bin.000030:
MariaDB> show binlog events in 'mysql-bin.000030'; |
| Comment by Maikel Punie [ 2017-12-22 ] |
|
The output of the show slave is aligned more to the right (scroll and then you can see it). I'm now running in a mode without gtid enabled to see if this reproduces the issue. |
| Comment by Massimiliano Pinto (Inactive) [ 2017-12-22 ] |
|
I see now:
Last_Error: The binlog on the master is missing the GTID 0-1-1070679239 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled
and
Master_Log_File: mysql-bin.000030
It looks like the master is sending the information "mysql-bin.000030".
It would be very useful to take the output of ngrep before and just after the error:
Replace lo with the right interface name, and the port as well.
You should be able to copy here the last good transmission from the master, say, the events about GTID=0-1-1064389139 (the last one I see in SHOW SLAVE STATUS\G), and the transmission with the error. |
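MariaDB GTIDs have the form domain_id-server_id-sequence_number, so the values quoted in this thread can be put side by side. The following Python sketch only parses and orders them; the `Gtid` type and `parse_gtid` helper are hypothetical names, and ordering by sequence number is only meaningful within a single replication domain (all values here are in domain 0).

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text):
    """Parse a single MariaDB GTID of the form 'domain-server-sequence'."""
    domain, server, seq = text.strip().split("-")
    return Gtid(int(domain), int(server), int(seq))


# GTIDs exactly as quoted in the comments of this issue.
quoted = {
    "connect notice in the maxscale log": "0-1-1064389133",
    "last one seen in SHOW SLAVE STATUS": "0-1-1064389139",
    "Last_Error (reported missing on master)": "0-1-1070679239",
    "SET @@global.gtid_slave_pos": "0-1-22545514499",
}

# List the values in ascending sequence-number order.
for label, text in sorted(quoted.items(), key=lambda kv: parse_gtid(kv[1]).seq_no):
    gtid = parse_gtid(text)
    print(f"{gtid.seq_no:>12}  {label}")
```

The listing makes it easy to see which quoted sequence numbers lie before or after the one named in Last_Error; it does not by itself say which of them exist in the master binlogs.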
| Comment by Maikel Punie [ 2017-12-22 ] |
|
OK, without gtid I see that maxscale also stops working, but I don't see any errors in the logs. Restarting now with gtid enabled again. |
| Comment by Maikel Punie [ 2017-12-22 ] |
|
OK, when gtid is enabled and the slave is stopped and restarted, I run into this error. Looking at binlog file 30, I have the following info in there:
mysqlbinlog mysql-bin.000030 |
| Comment by Massimiliano Pinto (Inactive) [ 2017-12-22 ] |
|
This is the maxscale binlog file: since GTID can return any position in a file, maxscale has written an Ignorable log event in order to keep its binlog file without holes.
It would be interesting to see the content of the master binlog file mysql-bin.000030.
Is that mysql-bin.000036 ?
As a next step I suggest stopping maxscale and wiping gtid_maps.db and master.ini. Then start maxscale, configure master replication and set the GTID you want, say 0-1-1070679127, which is in mysql-bin.000038.
Try to capture all the network traffic using "ngrep" until the error is seen.
Massimiliano |
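The "wipe gtid_maps.db and master.ini" step can be sketched as a small helper. This is a Python illustration under stated assumptions, not a MaxScale tool: it assumes both files live directly in the binlog directory, that maxscale has already been stopped, and the function name `wipe_binlog_state` and its `dry_run` flag are hypothetical.

```python
from pathlib import Path

# The two state files named in the comment above.
STATE_FILES = ("gtid_maps.db", "master.ini")


def wipe_binlog_state(binlogdir, dry_run=True):
    """Remove the binlog router state files from binlogdir.

    Assumption: maxscale is stopped and both files sit directly in binlogdir.
    Returns the paths that were removed (or would be removed when dry_run
    is True). Binlog files themselves are left untouched.
    """
    removed = []
    for name in STATE_FILES:
        path = Path(binlogdir) / name
        if path.exists():
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling it with the default `dry_run=True` first only lists what would be deleted; passing `dry_run=False` performs the deletion. Starting maxscale and configuring master replication afterwards remain manual steps.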
| Comment by Maikel Punie [ 2018-01-02 ] |
|
Hmm, I can't seem to reproduce this anymore. |
| Comment by markus makela [ 2018-01-02 ] |
|
Thanks for updating, we'll keep the issue open until we've had time to try and reproduce this on our side. |
| Comment by Johan Wikman [ 2018-01-16 ] |
|
Please reopen if problem reappears. |