[MDEV-9960] Binary log not rotating - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 5.5(EOL), 10.0(EOL)
Fix Version/s: 10.1.30
Component/s: Locking, Storage Engine - InnoDB
Labels:
None
Environment:

Hide
3 node Galera Cluster
Linux version 3.13.0-48-generic (buildd@orlo) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #80-Ubuntu SMP Thu Mar 12 11:16:15 U
TC 2015 (Ubuntu 3.13.0-48.80-generic 3.13.11-ckt16)

Show
3 node Galera Cluster Linux version 3.13.0-48-generic ( buildd@orlo ) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #80-Ubuntu SMP Thu Mar 12 11:16:15 U TC 2015 (Ubuntu 3.13.0-48.80-generic 3.13.11-ckt16)

Sprint:
10.1.15, 10.1.20, 10.1.22, 10.1.29

Description

There is an intermittent bug whereby the binary log does not rotate . The binary log size is set at 100MB yet there are times it does not rotate and exceeds 100MB .

The binary log has sometimes grown to 60GB in size . Commands like FLUSH LOGS do not achieve rotation . When a binary log file becomes more than 10 days old, it is supposed to be deleted by the database server , this is not happening once hit by the bug .

The Mariadb node has to be restarted to rotate the binary log once it grows above 100MB . Parameters like max-bin-log and expire-logs-days are configured correctly .

max_binlog_size | 104857600
expire_logs_days | 8

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

configuration.txt
12 kB
2016-05-10 15:28
mysql-bin.001551
300 kB
2016-05-10 15:27

Issue Links

includes

MDEV-9510 Segmentation fault in binlog thread causes crash

Closed

relates to

MDEV-8746 max_binlog_size was exceeded

Closed

MDEV-11610 Logrotate to only FLUSH LOCAL ERROR LOGS, ENGINE LOGS, GENERAL LOGS

Closed

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2016-04-22 11:26

nirbhay_c,
Another complaint like ~~MDEV-8746~~, and also about MariaDB Galera cluster.

Elena Stepanova added a comment - 2016-04-22 11:26 nirbhay_c , Another complaint like MDEV-8746 , and also about MariaDB Galera cluster.

John Burwell added a comment - 2016-05-03 20:29

We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.

John Burwell added a comment - 2016-05-03 20:29 We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 20:56

jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 20:56 jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?

John Burwell added a comment - 2016-05-03 21:13

nirbhay_c MariaDB server only – no Galera usage. Would the previous 20 bin log events be sufficient?

John Burwell added a comment - 2016-05-03 21:13 nirbhay_c MariaDB server only – no Galera usage. Would the previous 20 bin log events be sufficient?

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 21:19

Yes.. and please also do share your server's configurations.

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 21:19 Yes.. and please also do share your server's configurations.

John Burwell added a comment - 2016-05-03 21:25

nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket.

Thank you for your quick response and assistance with this issue. It is greatly appreciated.

John Burwell added a comment - 2016-05-03 21:25 nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket. Thank you for your quick response and assistance with this issue. It is greatly appreciated.

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 21:39

jburwell In case you are not comfortable sharing those details here
in open, you could instead upload them to our private ftp server.
https://mariadb.com/kb/en/mariadb/reporting-bugs/

Nirbhay Choubey (Inactive) added a comment - 2016-05-03 21:39 jburwell In case you are not comfortable sharing those details here in open, you could instead upload them to our private ftp server. https://mariadb.com/kb/en/mariadb/reporting-bugs/

John Burwell added a comment - 2016-05-10 15:29

nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.

John Burwell added a comment - 2016-05-10 15:29 nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.

John Burwell added a comment - 2016-05-24 10:42

nirbhay_c have you had a chance to review the attached bin log and configuration information?

John Burwell added a comment - 2016-05-24 10:42 nirbhay_c have you had a chance to review the attached bin log and configuration information?

Arjen Lentz added a comment - 2016-06-18 00:25

Is someone working on this? There's clearly an issue, experienced by different people.

We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space.
We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on.

Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?

Arjen Lentz added a comment - 2016-06-18 00:25 Is someone working on this? There's clearly an issue, experienced by different people. We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space. We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on. Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?

Arjen Lentz added a comment - 2016-06-18 00:25

As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related?
If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues.

Below is a snippet from SHOW ENGINE INNODB STATUS
--Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1

Arjen Lentz added a comment - 2016-06-18 00:25 As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related? If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues. Below is a snippet from SHOW ENGINE INNODB STATUS --Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1

Nirbhay Choubey (Inactive) added a comment - 2016-12-06 22:09

I tried performing some log rotations while having an active long transaction, and rotation
worked as expected.
In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list
(TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread.
In that case, the thread that triggers the log rotation will remain stuck as well.

Nirbhay Choubey (Inactive) added a comment - 2016-12-06 22:09 I tried performing some log rotations while having an active long transaction, and rotation worked as expected. In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list (TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread. In that case, the thread that triggers the log rotation will remain stuck as well.

Daniel Black added a comment - 2017-03-01 11:30

Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in ~~MDEV-11610~~. The code for it is here https://github.com/MariaDB/server/pull/323. In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0.

If you have any issues with this patch please note it on the github PR or in ~~MDEV-11610~~.
If you approve of this intermediary solution please vote for issue ~~MDEV-11610~~.

Daniel Black added a comment - 2017-03-01 11:30 Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in MDEV-11610 . The code for it is here https://github.com/MariaDB/server/pull/323 . In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0. If you have any issues with this patch please note it on the github PR or in MDEV-11610 . If you approve of this intermediary solution please vote for issue MDEV-11610 .

Andrei Elkin added a comment - 2017-10-23 10:45

Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).

Andrei Elkin added a comment - 2017-10-23 10:45 Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).

Andrei Elkin added a comment - 2017-10-25 09:04

The current issue seems to aggregate two separate ones. ~~MDEV-9510~~ covers the cluster replication one (there's a patch at testing now).
What is behind the 2nd no-cluster 5.5-based remains open.

Andrei Elkin added a comment - 2017-10-25 09:04 The current issue seems to aggregate two separate ones. MDEV-9510 covers the cluster replication one (there's a patch at testing now). What is behind the 2nd no-cluster 5.5-based remains open.

Andrei Elkin added a comment - 2017-12-12 18:40 - edited

The very reported cluster-related issue has been fixed by ~~MDEV-9510~~. I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one.

Thanks.

Andrei Elkin added a comment - 2017-12-12 18:40 - edited The very reported cluster -related issue has been fixed by MDEV-9510 . I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one. Thanks.

People

Assignee:: Andrei Elkin

Reporter:: TYNDALE BANZA

Votes:: 2 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 2016-04-21 01:27

Updated:: 2017-12-18 14:38

Resolved:: 2017-12-12 18:40

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration