Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 5.5(EOL), 10.0(EOL)
    • 10.1.30
    • None
    • 10.1.15, 10.1.20, 10.1.22, 10.1.29

    Description

      There is an intermittent bug whereby the binary log does not rotate . The binary log size is set at 100MB yet there are times it does not rotate and exceeds 100MB .

      The binary log has sometimes grown to 60GB in size . Commands like FLUSH LOGS do not achieve rotation . When a binary log file becomes more than 10 days old, it is supposed to be deleted by the database server , this is not happening once hit by the bug .

      The Mariadb node has to be restarted to rotate the binary log once it grows above 100MB . Parameters like max-bin-log and expire-logs-days are configured correctly .

      max_binlog_size | 104857600
      expire_logs_days | 8

      Attachments

        Issue Links

          Activity

            nirbhay_c,
            Another complaint like MDEV-8746, and also about MariaDB Galera cluster.

            elenst Elena Stepanova added a comment - nirbhay_c , Another complaint like MDEV-8746 , and also about MariaDB Galera cluster.
            jburwell John Burwell added a comment -

            We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.

            jburwell John Burwell added a comment - We are seeing the same behavior on a non-replicated, single node MariaDB 5.5.40 instance. We have set expire_logs_days to 10 and max_binlog_size to 307100. With these settings in place, we continue to see unbounded binary log growth.

            jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - jburwell Is it MariaDB server or MariaDB Galera server that you are using? Could you share the last few events from the binary log file? What are the config options?
            jburwell John Burwell added a comment -

            nirbhay_c MariaDB server only – no Galera usage. Would the previous 20 bin log events be sufficient?

            jburwell John Burwell added a comment - nirbhay_c MariaDB server only – no Galera usage. Would the previous 20 bin log events be sufficient?

            Yes.. and please also do share your server's configurations.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - Yes.. and please also do share your server's configurations.
            jburwell John Burwell added a comment -

            nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket.

            Thank you for your quick response and assistance with this issue. It is greatly appreciated.

            jburwell John Burwell added a comment - nirbhay_c I have requested permission to share this information with you. As soon as I receive it, I will attach the requested information to this ticket. Thank you for your quick response and assistance with this issue. It is greatly appreciated.

            jburwell In case you are not comfortable sharing those details here
            in open, you could instead upload them to our private ftp server.
            https://mariadb.com/kb/en/mariadb/reporting-bugs/

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - jburwell In case you are not comfortable sharing those details here in open, you could instead upload them to our private ftp server. https://mariadb.com/kb/en/mariadb/reporting-bugs/
            jburwell John Burwell added a comment -

            nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.

            jburwell John Burwell added a comment - nirbhay_c I apologize for my delayed response. Per your request, I have attached a recent binary log file (mysql-bin.001551) and the sanitized results of SHOW VARIABLES (configuration.txt). Thank you again for your help.
            jburwell John Burwell added a comment -

            nirbhay_c have you had a chance to review the attached bin log and configuration information?

            jburwell John Burwell added a comment - nirbhay_c have you had a chance to review the attached bin log and configuration information?
            arjen Arjen Lentz added a comment -

            Is someone working on this? There's clearly an issue, experienced by different people.

            We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space.
            We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on.

            Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?

            arjen Arjen Lentz added a comment - Is someone working on this? There's clearly an issue, experienced by different people. We appreciate it's an intermittent problem and the config may not clarify much, but the problem can be really damaging on a production system as it can make it run out of space. We looked at workarounds such as use logrotate issuing the appropriate SQL commands "manually", however this to appears to not work (mysqladmin flush logs). So there's something problematic going on. Has someone just had a look at the source code to see if anything eye-catching is happening there in the logic?
            arjen Arjen Lentz added a comment -

            As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related?
            If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues.

            Below is a snippet from SHOW ENGINE INNODB STATUS
            --Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore:
            Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
            waiters flag 1
            --Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore:
            Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
            waiters flag 1
            --Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore:
            Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
            waiters flag 1
            --Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore:
            Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1
            waiters flag 1
            --Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
            X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
            a writer (thread id 139873440749312) has reserved it in mode exclusive
            number of readers 0, waiters flag 1, lock_word: 0
            Last time read locked in file btr0cur.cc line 257
            Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
            --Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
            X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
            a writer (thread id 139873440749312) has reserved it in mode exclusive
            number of readers 0, waiters flag 1, lock_word: 0
            Last time read locked in file btr0cur.cc line 257
            Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
            --Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore:
            X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock'
            a writer (thread id 139873440749312) has reserved it in mode exclusive
            number of readers 0, waiters flag 1, lock_word: 0
            Last time read locked in file btr0cur.cc line 257
            Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257
            --Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore:
            Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1

            arjen Arjen Lentz added a comment - As a related thought - there is a scenario where the log doesn't get rotated/expired if there are still open transactions (semaphore waits related to those), systems sometimes have long-running transactions for perfectly valid reasons. Perhaps the two are related? If we can get clarity on what things (in the code logic) might prevent the log rotation/expiry from working, we can (in our various production environments) look at those issues. Below is a snippet from SHOW ENGINE INNODB STATUS --Thread 139891669952256 has waited at lock0lock.cc line 7638 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139930203956992 has waited at lock0lock.cc line 6756 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139895193466624 has waited at lock0lock.cc line 7548 for 571.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139873440749312 has waited at lock0lock.cc line 3806 for 570.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1 waiters flag 1 --Thread 139873432356608 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139873415571200 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139873423963904 has waited at btr0cur.cc line 257 for 570.00 seconds the semaphore: X-lock on RW-latch at 0x7f3cb9fff640 '&block->lock' a writer (thread id 139873440749312) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 257 Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.21/storage/xtradb/btr/btr0cur.cc line 257 --Thread 139930173536000 has waited at lock0lock.cc line 5014 for 570.00 seconds the semaphore: Mutex at 0x7f36d8507068 '&lock_sys->mutex', lock var 1

            I tried performing some log rotations while having an active long transaction, and rotation
            worked as expected.
            In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list
            (TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread.
            In that case, the thread that triggers the log rotation will remain stuck as well.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - I tried performing some log rotations while having an active long transaction, and rotation worked as expected. In 10.1, one possible reason that could delay log rotation will be the failure to lock LOCK_xid_list (TC_LOG_BINLOG::mark_xids_active) during rotate() if it's held by some other thread. In that case, the thread that triggers the log rotation will remain stuck as well.
            danblack Daniel Black added a comment -

            Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in MDEV-11610. The code for it is here https://github.com/MariaDB/server/pull/323. In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0.

            If you have any issues with this patch please note it on the github PR or in MDEV-11610.
            If you approve of this intermediary solution please vote for issue MDEV-11610.

            danblack Daniel Black added a comment - Folks, In light of the number of log-rotation based stalls, crashes I created the workaround in MDEV-11610 . The code for it is here https://github.com/MariaDB/server/pull/323 . In essence it removes rotation of binary and relays logs from the logrotate script and injects a SET LOCAL SQL_BIN_LOG=0. If you have any issues with this patch please note it on the github PR or in MDEV-11610 . If you approve of this intermediary solution please vote for issue MDEV-11610 .
            Elkin Andrei Elkin added a comment -

            Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).

            Elkin Andrei Elkin added a comment - Lack of expected rotation is a symptom of the parent of the current issue and is explained in its patch (to be reviewed atm).
            Elkin Andrei Elkin added a comment -

            The current issue seems to aggregate two separate ones. MDEV-9510 covers the cluster replication one (there's a patch at testing now).
            What is behind the 2nd no-cluster 5.5-based remains open.

            Elkin Andrei Elkin added a comment - The current issue seems to aggregate two separate ones. MDEV-9510 covers the cluster replication one (there's a patch at testing now). What is behind the 2nd no-cluster 5.5-based remains open.
            Elkin Andrei Elkin added a comment - - edited

            The very reported cluster-related issue has been fixed by MDEV-9510. I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one.

            Thanks.

            Elkin Andrei Elkin added a comment - - edited The very reported cluster -related issue has been fixed by MDEV-9510 . I am not sure whether followup comments mentioning 5.5 still present an outstanding issue. If they are valid could you please make us known via either writing to this ticket, or better to report a new one. Thanks.

            People

              Elkin Andrei Elkin
              tbanza TYNDALE BANZA
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.