[MDEV-8315] recurrently we got LOST_EVENTS on slave Created: 2015-06-13 Updated: 2015-08-24 Resolved: 2015-08-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Documentation, Replication |
| Affects Version/s: | 10.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | bulepage | Assignee: | Ian Gilfillan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | replication |
| Environment: | Ubuntu 12.04.5 LTS |
| Attachments: |
|
| Description |
|
After we upgraded to 10.0.19-MariaDB-1~precise-log, our replication started stopping every week. On the slave we got a LOST_EVENTS error.
We use parallel replication without GTID.
|
| Comments |
| Comment by bulepage [ 2015-06-13 ] |
|
Attached our mysql-relay log.
| Comment by Elena Stepanova [ 2015-06-13 ] |
|
Hi, From which version did you upgrade to 10.0.19? Please attach your cnf file(s) from the master. If possible, it would also be useful to enable and capture a general log on the master until the next failure like that occurs; it would help us understand which query causes the problem with the binlog. However, please be aware that in a busy environment the general log can cause some performance degradation and require quite a lot of disk space.
| Comment by bulepage [ 2015-06-15 ] |
|
Hi,
| Comment by bulepage [ 2015-06-15 ] |
|
The event happens every Friday night. I will try to enable general_log for that time, but we have high traffic then.
| Comment by bulepage [ 2015-06-20 ] |
|
LOST_EVENTS occurred again at the same time, but this time I had enabled general_log (SET GLOBAL log_output = 'file'; SET GLOBAL general_log_file = 'queries.log'; SET GLOBAL general_log = 1;).
There was no LOST_EVENTS in mysqld-relay-bin.002854, nor in bin-log.008287.
The time may be important!
I examined queries.log, but I did not find any interesting query.
| Comment by Elena Stepanova [ 2015-06-21 ] |
|
Can you upload queries.log to our ftp.askmonty.org/private?
| Comment by bulepage [ 2015-06-21 ] |
|
I uploaded it.
| Comment by Elena Stepanova [ 2015-06-23 ] |
|
Thanks for the data. My theory is that the problem was caused by a huge event which exceeds your 8G max_binlog_stmt_cache_size (you have a smaller max_binlog_cache_size, but I don't know if exceeding it can ever cause LOST_EVENTS error; exceeding max_binlog_stmt_cache_size certainly can). My main suspect is connection 363340. I don't know if I can paste the query, but you can find it yourself in the query log. The connection did only 3 things:
So, maybe the procedure that it called produces, in some circumstances, a really, really big event. Naturally, it would be slow, so it could take time (~25 min in this case) to prepare the DML, then it would take time (~30 min) to execute, and eventually it fails to write the event. Could you maybe inspect the procedure to see if that is possible? (Maybe it does not produce any binlog-related events at all, which will immediately show that the theory is wrong).
| Comment by bulepage [ 2015-06-23 ] |
|
At this time a stored procedure, p_data_from_cegek_teszt, starts; it copies a lot of big tables from one database to another.
drop table IF EXISTS dst_db dst_table
I examined our process logs: p_data_from_cegek_teszt started at 21:50:09 and ended at 22:46:35.
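As a quick sanity check, the run time implied by the two timestamps above can be compared with the rough estimate in the previous comment (~25 min to prepare plus ~30 min to execute). This is only an illustrative sketch in Python; the variable names are made up for the example, and it assumes both timestamps fall on the same evening.

```python
from datetime import datetime

# Timestamps quoted from the process logs (same evening assumed).
fmt = "%H:%M:%S"
start = datetime.strptime("21:50:09", fmt)
end = datetime.strptime("22:46:35", fmt)

# Wall-clock duration of p_data_from_cegek_teszt.
elapsed = end - start
print(elapsed)  # 0:56:26

# Rough estimate from the earlier comment: ~25 min + ~30 min.
estimate_minutes = 25 + 30
print(round(elapsed.total_seconds() / 60), "min measured vs", estimate_minutes, "min estimated")
```

The measured run of about 56 minutes is close to the estimated 55 minutes, which fits the suspicion that this procedure is the one behind connection 363340.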
| Comment by Elena Stepanova [ 2015-06-23 ] |
|
So, do you think the explanation is plausible?
In the latter case, it might happen that you have some other procedures that cause the same problem; then the exercise will need to be repeated for them too.
| Comment by bulepage [ 2015-06-23 ] |
|
The p_data_from_cegek_teszt procedure is very old and unnecessary. It can be disabled, but we have a lot of other processes (stored procedures). Earlier we often got " I think a good solution is to increase max_binlog_stmt_cache_size, but I worry about max_binlog_stmt_cache_size. I have decided to increase max_binlog_stmt_cache_size to 16 777 216 tomorrow. Do you agree?
| Comment by Elena Stepanova [ 2015-06-23 ] |
|
(I assume you mean you'll set it to 16G, not to 16M as above). The variable description in the KB does not look quite accurate. At the very least, the default value is 18446744073709547520 and not 4G as it says; so, I don't know if the rest is true, maybe it was meant for 32-bit systems.
| Comment by bulepage [ 2015-06-24 ] |
|
I changed global max_binlog_stmt_cache_size to 17179869184.
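The byte values in the last few comments are easy to misread, so here is a small illustrative check of what they correspond to in binary units. It is a sketch only: the helper `describe` and the constant names are hypothetical and are not part of MariaDB.

```python
# Binary (1024-based) size units.
KIB = 1024
MIB = KIB ** 2
GIB = KIB ** 3

def describe(n_bytes: int) -> str:
    """Return a human-readable size when the value is a whole number of GiB or MiB."""
    if n_bytes % GIB == 0:
        return f"{n_bytes // GIB} GiB"
    if n_bytes % MIB == 0:
        return f"{n_bytes // MIB} MiB"
    return f"{n_bytes} bytes"

proposed = 16_777_216                  # value first proposed in the thread
applied = 17_179_869_184               # value actually set on 2015-06-24
default = 18_446_744_073_709_547_520   # default value quoted in the thread

print(describe(proposed))        # 16 MiB
print(describe(applied))         # 16 GiB
print(default == 2**64 - 4096)   # True
```

So the first proposal (16 777 216) would have been 16 MiB, far below the 8G already configured, while the value finally applied is 16 GiB. The quoted default is 4096 bytes short of 2^64.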
| Comment by bulepage [ 2015-06-24 ] |
|
Shall I enable general_log on Friday night?
| Comment by Elena Stepanova [ 2015-06-24 ] |
|
If you can, please do, just in case the problem occurs again.
| Comment by bulepage [ 2015-06-29 ] |
|
The event did not occur.
| Comment by bulepage [ 2015-07-25 ] |
|
Can I help?
| Comment by Elena Stepanova [ 2015-08-01 ] |
|
bulepage, did it ever happen again after you had increased the cache size?
| Comment by bulepage [ 2015-08-03 ] |
|
The event did not occur again.
| Comment by Elena Stepanova [ 2015-08-20 ] |
|
In this case it is a configuration problem – naturally, if an event exceeds hard limits, it gets lost.
| Comment by Ian Gilfillan [ 2015-08-24 ] |
|
The documentation has been updated (it was based on an old MySQL version and a Windows bug limiting the size, not relevant to MariaDB).