Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28680

bin log interrupted and hung for minutes after transaction commit

    XMLWordPrintable

Details

    Description

      *Hi,

      In our production lab we found one issue that the bin log update got interrupted and hung there for minutes after C++ API mysql_query with "commit" param returned success.

      Timeline:*

      1. Transaction committed by calling of mysql_query with "commit" param which returned success (product log which confirmed successful return of mysql_query):

      +++ 2022/05/26 11:32:46.065 DB MEDIUM remotedb:11310 E:105110 S:2657 (RemotedbNotify.cpp 1026 v-0001 169.254.64.1 oam-a)
      INFO: Completed notify connection attempt.

      2. bin log started to be updated, but only updated the first 3 records:

      1. at 2296562
        #220526 11:32:46 server id 1 end_log_pos 2296604 CRC32 0x30537424 GTID 0-1-29252 trans
        /Unable to render embedded object: File (100001 SET @@session.gtid_seq_no=29252*//*) not found./;
        START TRANSACTION
        /!/;
      2. at 2296604
        #220526 11:32:46 server id 1 end_log_pos 2297037 CRC32 0xed2efe0a Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 1 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      3. at 2297037
        #220526 11:32:46 server id 1 end_log_pos 2297470 CRC32 0xa716708c Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 2 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      4. at 2297470
        #220526 11:32:46 server id 1 end_log_pos 2297903 CRC32 0x3144ec9e Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 3 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      5. at 2297903
        #220526 11:32:46 server id 1 end_log_pos 2297934 CRC32 0x3214e514 Xid = 249264
        COMMIT/!/;

      *3. About >6m later, it started to write bin log for the remaining records for that transaction (Note in-between there were bin logs writes for other transactions): *

      1. at 2443522
        #220526 11:39:18 server id 1 end_log_pos 2443961 CRC32 0xd83cc209 Query thread_id=72 exec_time=0 error_code=0
        SET TIMESTAMP=1653565158/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 824, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4001000000000000000000000000000000', schema_class = 'cfg' where operation = 1 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'gateway_family_base' and process_name = 'SSPcntl_agt'
        /!/;
        ......
      2. at 2498796
        #220526 11:39:20 server id 1 end_log_pos 2499229 CRC32 0x2b4c10a9 Query thread_id=72 exec_time=0 error_code=0
        SET TIMESTAMP=1653565160/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4001000000000000000000000000000000', schema_class = 'cfg' where operation = 3 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;

      Then finally the DB contents were overwritten incorrectly by the delayed bin log updates causing product failures.

      *Mariadb Server version: 10.4.21-MariaDB-log Source distribution

      Attached config file my.cnf and bin log as well as decoded bin log.

      Your advice is highly appreciated.*

      Attachments

        1. my.cnf
          3 kB
        2. bin.log
          5.06 MB
        3. mysql-lcpdb-bin.000003
          4.38 MB

        Activity

          People

            Unassigned Unassigned
            liberiul Andy Liu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.