3 node Galera Cluster
Linux version 3.13.0-48-generic ( buildd@orlo ) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #80-Ubuntu SMP Thu Mar 12 11:16:15 U
TC 2015 (Ubuntu 3.13.0-48.80-generic 3.13.11-ckt16)
10.1.15, 10.1.20, 10.1.22, 10.1.29
Description
There is an intermittent bug whereby the binary log does not rotate . The binary log size is set at 100MB yet there are times it does not rotate and exceeds 100MB .
The binary log has sometimes grown to 60GB in size . Commands like FLUSH LOGS do not achieve rotation . When a binary log file becomes more than 10 days old, it is supposed to be deleted by the database server , this is not happening once hit by the bug .
The Mariadb node has to be restarted to rotate the binary log once it grows above 100MB . Parameters like max-bin-log and expire-logs-days are configured correctly .
We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.
John Burwell
added a comment - We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.
jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?
Nirbhay Choubey (Inactive)
added a comment - jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?
nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket.
Thank you for your quick response and assistance with this issue. It is greatly appreciated.
John Burwell
added a comment - nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket.
Thank you for your quick response and assistance with this issue. It is greatly appreciated.
Nirbhay Choubey (Inactive)
added a comment - jburwell In case you are not comfortable sharing those details here
in open, you could instead upload them to our private ftp server.
https://mariadb.com/kb/en/mariadb/reporting-bugs/
nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.
John Burwell
added a comment - nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.
Is someone working on this? There's clearly an issue, experienced by different people.
We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space.
We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on.
Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?
Arjen Lentz
added a comment - Is someone working on this? There's clearly an issue, experienced by different people.
We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space.
We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on.
Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?
As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related?
If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues.
Below is a snippet from SHOW ENGINE INNODB STATUS
--Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
Arjen Lentz
added a comment - As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related?
If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues.
Below is a snippet from SHOW ENGINE INNODB STATUS
--Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
waiters flag 1
--Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
a writer (thread id 139873440749312) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 257
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
--Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore:
Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
I tried performing some log rotations while having an active long transaction, and rotation
worked as expected.
In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list
(TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread.
In that case, the thread that triggers the log rotation will remain stuck as well.
Nirbhay Choubey (Inactive)
added a comment - I tried performing some log rotations while having an active long transaction, and rotation
worked as expected.
In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list
(TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread.
In that case, the thread that triggers the log rotation will remain stuck as well.
Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in MDEV-11610. The code for it is here https://github.com/MariaDB/server/pull/323. In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0.
If you have any issues with this patch please note it on the github PR or in MDEV-11610.
If you approve of this intermediary solution please vote for issue MDEV-11610.
Daniel Black
added a comment - Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in MDEV-11610 . The code for it is here https://github.com/MariaDB/server/pull/323 . In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0.
If you have any issues with this patch please note it on the github PR or in MDEV-11610 .
If you approve of this intermediary solution please vote for issue MDEV-11610 .
Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).
Andrei Elkin
added a comment - Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).
The current issue seems to aggregate two separate ones. MDEV-9510 covers the cluster replication one (there's a patch at testing now).
What is behind the 2nd no-cluster 5.5-based remains open.
Andrei Elkin
added a comment - The current issue seems to aggregate two separate ones. MDEV-9510 covers the cluster replication one (there's a patch at testing now).
What is behind the 2nd no-cluster 5.5-based remains open.
The very reported cluster-related issue has been fixed by MDEV-9510. I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one.
Thanks.
Andrei Elkin
added a comment - - edited The very reported cluster -related issue has been fixed by MDEV-9510 . I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one.
Thanks.
People
Andrei Elkin
TYNDALE BANZA
Votes:
2Vote for this issue
Watchers:
10Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":2000.9000000953674,"ttfb":808.2000002861023,"pageVisibility":"visible","entityId":56493,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"c8458d45-8412-4b60-a73a-174f2335fc9f","navigationType":0,"readyForUser":2088.5,"redirectCount":0,"resourceLoadedEnd":1728.4000000953674,"resourceLoadedStart":822.5999999046326,"resourceTiming":[{"duration":327.1000003814697,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":822.5999999046326,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":822.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1149.7000002861023,"responseStart":0,"secureConnectionStart":0},{"duration":327.09999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":822.9000000953674,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":822.9000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1150,"responseStart":0,"secureConnectionStart":0},{"duration":602.8000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":823.0999999046326,"connectEnd":823.0999999046326,"connectStart":823.0999999046326,"domainLookupEnd":823.0999999046326,"domainLookupStart":823.0999999046326,"fetchStart":823.0999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1155.7000002861023,"responseEnd":1425.9000000953674,"responseStart":1182.8000001907349,"secureConnectionStart":823.0999999046326},{"duration":891.6999998092651,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":823.4000000953674,"connectEnd":823.4000000953674,"connectStart":823.4000000953674,"domainLookupEnd":823.4000000953674,"domainLookupStart":823.4000000953674,"fetchStart":823.4000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":1155.5,"responseEnd":1715.0999999046326,"responseStart":1181.8000001907349,"secureConnectionStart":823.4000000953674},{"duration":357.2000002861023,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":823.5999999046326,"connectEnd":823.5999999046326,"connectStart":823.5999999046326,"domainLookupEnd":823.5999999046326,"domainLookupStart":823.5999999046326,"fetchStart":823.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1155.8000001907349,"responseEnd":1180.8000001907349,"responseStart":1179.7000002861023,"secureConnectionStart":823.5999999046326},{"duration":409.2999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":823.7000002861023,"connectEnd":823.7000002861023,"connectStart":823.7000002861023,"domainLookupEnd":823.7000002861023,"domainLookupStart":823.7000002861023,"fetchStart":823.7000002861023,"redirectEnd":0,"redirectStart":0,"requestStart":1164.0999999046326,"responseEnd":1233,"responseStart":1231.5999999046326,"secureConnectionStart":823.7000002861023},{"duration":381.19999980926514,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":823.9000000953674,"connectEnd":823.9000000953674,"connectStart":823.9000000953674,"domainLookupEnd":823.9000000953674,"domainLookupStart":823.9000000953674,"fetchStart":823.9000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":1165.8000001907349,"responseEnd":1205.0999999046326,"responseStart":1202.5999999046326,"secureConnectionStart":823.9000000953674},{"duration":339.80000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":824.0999999046326,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":824.0999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1163.9000000953674,"responseStart":0,"secureConnectionStart":0},{"duration":381.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":824.3000001907349,"connectEnd":824.3000001907349,"connectStart":824.3000001907349,"domainLookupEnd":824.3000001907349,"domainLookupStart":824.3000001907349,"fetchStart":824.3000001907349,"redirectEnd":0,"redirectStart":0,"requestStart":1168.3000001907349,"responseEnd":1205.4000000953674,"responseStart":1203.5999999046326,"secureConnectionStart":824.3000001907349},{"duration":340.30000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":824.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":824.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1164.8000001907349,"responseStart":0,"secureConnectionStart":0},{"duration":395.1000003814697,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":824.5999999046326,"connectEnd":824.5999999046326,"connectStart":824.5999999046326,"domainLookupEnd":824.5999999046326,"domainLookupStart":824.5999999046326,"fetchStart":824.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1170.0999999046326,"responseEnd":1219.7000002861023,"responseStart":1206,"secureConnectionStart":824.5999999046326},{"duration":860.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":867.0999999046326,"connectEnd":867.0999999046326,"connectStart":867.0999999046326,"domainLookupEnd":867.0999999046326,"domainLookupStart":867.0999999046326,"fetchStart":867.0999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1674.7000002861023,"responseEnd":1727.5999999046326,"responseStart":1723.3000001907349,"secureConnectionStart":867.0999999046326},{"duration":842.3000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":886.0999999046326,"connectEnd":886.0999999046326,"connectStart":886.0999999046326,"domainLookupEnd":886.0999999046326,"domainLookupStart":886.0999999046326,"fetchStart":886.0999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1705.3000001907349,"responseEnd":1728.4000000953674,"responseStart":1725.2000002861023,"secureConnectionStart":886.0999999046326},{"duration":338.59999990463257,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":1597.2000002861023,"connectEnd":1597.2000002861023,"connectStart":1597.2000002861023,"domainLookupEnd":1597.2000002861023,"domainLookupStart":1597.2000002861023,"fetchStart":1597.2000002861023,"redirectEnd":0,"redirectStart":0,"requestStart":1904.2000002861023,"responseEnd":1935.8000001907349,"responseStart":1934.0999999046326,"secureConnectionStart":1597.2000002861023}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":545,"responseStart":808,"responseEnd":886,"domLoading":813,"domInteractive":2191,"domContentLoadedEventStart":2191,"domContentLoadedEventEnd":2246,"domComplete":2786,"loadEventStart":2786,"loadEventEnd":2787,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":2151},{"name":"bigPipe.sidebar-id.end","time":2151.800000190735},{"name":"bigPipe.activity-panel-pipe-id.start","time":2152.0999999046326},{"name":"bigPipe.activity-panel-pipe-id.end","time":2155.5},{"name":"activityTabFullyLoaded","time":2273.9000000953674}],"measures":[],"correlationId":"665061b2f4fe1e","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":127,"dbReadsTimeInMs":15,"dbConnsTimeInMs":24,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
nirbhay_c,
Another complaint like
MDEV-8746, and also about MariaDB Galera cluster.