[MDEV-6518] Binlog is not synced to disk when binlog is changed in runtime Created: 2014-08-01 Updated: 2014-08-15 Due: 2014-09-05 Resolved: 2014-08-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.1.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Michal Zubkowicz | Assignee: | Elena Stepanova |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 6 |
||
| Attachments: |
|
| Description |
|
I changed binlog_format at runtime from STATEMENT to MIXED, and after that the binlog was not synced to disk for a few days; after a restart, all changes were lost. |
| Comments |
| Comment by Elena Stepanova [ 2014-08-05 ] |
|
Please provide error logs from master and slave, and binary logs from master. You can upload them to our ftp.askmonty.org/private. Thanks. |
| Comment by Elena Stepanova [ 2014-08-12 ] |
|
Hi Michal, Thanks for the logs. I have some questions about them. The log in the archive named "master" is in fact also a slave log. Do you have chained replication, or multi-master replication, or what is your setup, exactly? The master log has several server restarts. Can you pinpoint the time when you changed the binlog format and after which restart your changes were lost? The master log says at some point that a table was marked as crashed. The previous shutdowns seem to be clean, so there is no obvious reason why tables would have been damaged. Could it be that you had a disk problem or something? The master log (which as mentioned earlier is also a slave log) shows several replication errors at different times. Are you aware of those and sure they don't have anything to do with your data loss? Finally, the slave log also shows several replication errors, which could account for the slave not replicating from the master. Whether they are related or not depends on the timing of your observations. |
| Comment by Michal Zubkowicz [ 2014-08-12 ] |
|
It was master-master replication. The replication errors are not related to the data loss, because all data in RAM was OK. All backups are OK; the data was just not synced to disk. There was also a gap in the modification times of the binlogs. |
| Comment by Elena Stepanova [ 2014-08-12 ] |
|
Sorry, that doesn't help much with the investigation. Could you please describe exactly what you did and what you observed, as precisely as you can, only facts, for now without drawing any conclusions? Like,
Also, please attach your cnf files (but they don't replace describing the problem as above, only supplement it). Thanks. |
| Comment by Michal Zubkowicz [ 2014-08-12 ] |
|
I had master-master replication with the default binlog format (statement). Replication is checked every minute by a script, so I get a report when something is wrong. There was nothing unusual. |
| Comment by Elena Stepanova [ 2014-08-12 ] |
|
Hi Michal, Thanks a lot, it is much easier to look into the problem this way. However, with the provided binlogs, I don't see the gap you are talking about in the time interval that you are giving. Both sets of binlogs, "master" and "slave", seem to have a continuous sequence of events for 26.07, 27.07, 28.07, 29.07. In 'master' set it is in mysql-bin-2.000023, and in 'slave' set it is in mysql-bin-2.000006. If there was a data loss, the logs need to be analyzed event by event, to see if the events responsible for the data in question are present there; but apparently the logs were synced to disk somehow. If you have examples of events that should have been present in the binary logs but are not, we can try to look deeper into it. Also, I don't quite understand the connection you make between allegedly not synced binary logs and the data loss. |
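To make the "event by event" check concrete, here is a minimal sketch that scans the text output of `mysqlbinlog` and reports large jumps between consecutive event timestamps. It assumes the usual event header form (`#YYMMDD HH:MM:SS server id N  end_log_pos N`); the function names and the one-hour threshold are hypothetical, chosen only for illustration. In a master-master setup, events that originated on the other server carry their original timestamps, so treat the result as a hint rather than proof of a gap.

```python
import re
from datetime import datetime, timedelta

# Event header as printed by mysqlbinlog, e.g.
# "#140726 23:59:58 server id 1  end_log_pos 256 ..."
# (assumption: the hour may be space-padded, e.g. "#140729  0:00:01").
EVENT_HEADER = re.compile(
    r"^#(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})\s+server id\s+\d+"
)


def event_times(lines):
    """Yield a datetime for every event header line found in `lines`."""
    for line in lines:
        match = EVENT_HEADER.match(line)
        if match:
            date, hh, mm, ss = match.groups()
            day = datetime.strptime(date, "%y%m%d")
            yield day.replace(hour=int(hh), minute=int(mm), second=int(ss))


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (previous, next) timestamp pairs that are further apart
    than `threshold` (hypothetical default: one hour)."""
    gaps = []
    prev = None
    for ts in event_times(lines):
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# Hand-written sample lines in the assumed format, not taken from the real logs.
sample = [
    "# at 4",
    "#140726 23:59:58 server id 1  end_log_pos 256 \tQuery\tthread_id=7",
    "# at 256",
    "#140729  0:00:01 server id 2  end_log_pos 512 \tQuery\tthread_id=9",
    "#140729  0:05:00 server id 2  end_log_pos 768 \tXid = 42",
]

for before, after in find_gaps(sample):
    print(f"no events between {before} and {after}")
```

To use it on a real log, save the output of `mysqlbinlog mysql-bin-2.000023` to a text file and pass the open file object to `find_gaps` instead of `sample`.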
| Comment by Michal Zubkowicz [ 2014-08-12 ] |
|
On both servers there is a gap in binlogs: The same range of data is missing on both servers. So I assumed there was a problem with disk sync after the binlog format change, because it was the only change I had made, nothing else was touched, and the same gap was in the binlog files and the data files. Maybe my conclusion was wrong. |
| Comment by Elena Stepanova [ 2014-08-12 ] |
|
Timestamps on the logs don't mean much. What's important is the contents. Binary logs can be rotated for only a few reasons:
In your case, it is always the latter. Your binary logs are not even close to 1 GB, which is the default value for max_binlog_size and is not modified in your cnf. But between 2014-07-26 and 2014-07-29, you didn't have a single server restart, so the binary log didn't rotate even once. Now, I don't know why your server usually restarts so frequently. If this is news to you, you might want to check your environment. All in all, I don't see anything wrong with the binary logs at all. |
| Comment by Michal Zubkowicz [ 2014-08-15 ] |
|
Thank you very much for the explanation. You are right. I still don't understand how the data disappeared; it was certainly not deleted accidentally, because the gap was in many tables. I will try to investigate it and reopen this issue if I find anything. |