Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28681

bin log interrupted and hung for minutes after transaction commit

Details

    Description

      *Hi,

      In our production lab we found one issue that the bin log update got interrupted and hung there for minutes after C++ API mysql_query with "commit" param returned success.

      Timeline:*

      1. Transaction committed by calling of mysql_query with "commit" param which returned success (product log which confirmed successful return of mysql_query):

      +++ 2022/05/26 11:32:46.065 DB MEDIUM remotedb:11310 E:105110 S:2657 (RemotedbNotify.cpp 1026 v-0001 169.254.64.1 oam-a)
      INFO: Completed notify connection attempt.

      2. bin log started to be updated, but only updated the first 3 records:

      1. at 2296562
        #220526 11:32:46 server id 1 end_log_pos 2296604 CRC32 0x30537424 GTID 0-1-29252 trans
        /Unable to render embedded object: File (100001 SET @@session.gtid_seq_no=29252*//*) not found./;
        START TRANSACTION
        /!/;
      2. at 2296604
        #220526 11:32:46 server id 1 end_log_pos 2297037 CRC32 0xed2efe0a Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 1 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      3. at 2297037
        #220526 11:32:46 server id 1 end_log_pos 2297470 CRC32 0xa716708c Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 2 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      4. at 2297470
        #220526 11:32:46 server id 1 end_log_pos 2297903 CRC32 0x3144ec9e Query thread_id=11783 exec_time=0 error_code=0
        SET TIMESTAMP=1653564766/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4101000000000000000000000000000000', schema_class = 'cfg' where operation = 3 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;
      5. at 2297903
        #220526 11:32:46 server id 1 end_log_pos 2297934 CRC32 0x3214e514 Xid = 249264
        COMMIT/!/;

      *3. About >6m later, it started to write bin log for the remaining records for that transaction (Note in-between there were bin logs writes for other transactions): *

      1. at 2443522
        #220526 11:39:18 server id 1 end_log_pos 2443961 CRC32 0xd83cc209 Query thread_id=72 exec_time=0 error_code=0
        SET TIMESTAMP=1653565158/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 824, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4001000000000000000000000000000000', schema_class = 'cfg' where operation = 1 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'gateway_family_base' and process_name = 'SSPcntl_agt'
        /!/;
        ......
      2. at 2498796
        #220526 11:39:20 server id 1 end_log_pos 2499229 CRC32 0x2b4c10a9 Query thread_id=72 exec_time=0 error_code=0
        SET TIMESTAMP=1653565160/!/;
        update db01host.notify_remdb_operation set process_id = 11312, objlen = 48, flag = 0, socket = 171, retry_count = 0, host_ip = x'00a9fe4001000000000000000000000000000000', schema_class = 'cfg' where operation = 3 and notification_type = 2 and remote_ip = x'00a9fe4001000000000000000000000000000000' and table_name = 'resource_parms' and process_name = 'SSPcntl_agt'
        /!/;

      Then finally the DB contents were overwritten incorrectly by the delayed bin log updates causing product failures.

      *Mariadb Server version: 10.4.21-MariaDB-log Source distribution

      Attached config file my.cnf and bin log as well as decoded bin log.

      Your advice is highly appreciated.*

      Attachments

        1. my.cnf
          3 kB
        2. bin.log
          5.06 MB
        3. mysql-lcpdb-bin.000003
          4.38 MB

        Activity

          liberiul Andy Liu created issue -
          Elkin Andrei Elkin added a comment -

          > it started to write bin log for the remaining records for that transaction

          I don't see what you call p.3 binlog snippet relevant to p.2 fully logged trx.
          Notice specifically p.3 thread_id=72 is different from one in p.2.

          Could you please explain what do you mean?

          Elkin Andrei Elkin added a comment - > it started to write bin log for the remaining records for that transaction I don't see what you call p.3 binlog snippet relevant to p.2 fully logged trx. Notice specifically p.3 thread_id=72 is different from one in p.2. Could you please explain what do you mean?
          Elkin Andrei Elkin added a comment -

          > , it started to write bin log for the remaining records for that transaction

          'That' transaction is fully committed in binlog, having thread_id 11783. In the follow up that the thread id is different.
          Could you please explain in details what got wrong in your case.

          Elkin Andrei Elkin added a comment - > , it started to write bin log for the remaining records for that transaction 'That' transaction is fully committed in binlog, having thread_id 11783. In the follow up that the thread id is different. Could you please explain in details what got wrong in your case.
          Elkin Andrei Elkin made changes -
          Field Original Value New Value
          Comment [ > it started to write bin log for the remaining records for that transaction ]
          Elkin Andrei Elkin made changes -
          Comment [ > it started to write bin log for the remaining records for that transaction ]
          liberiul Andy Liu added a comment - - edited

          Thanks for looking at this.

          So what does thread_id in binlog signify and under what condition should it be changed? Does it mean new connection initiated from client?

          Here are details of what was going on (note they were all running under single product thread context):

          1. Start one transaction and loop reading "db01host.notify_remdb_operation" table by calling C++ mysql_real_query with statement param "select * from db01host.notify_remdb_operation where remote_ip = x'00a9fe4001000000000000000000000000000000' and process_name = 'SSPcntl_agt'"

          2. Also start another transaction to handle all tuples read from p.1. For each tuple, we just reset its retry_count field to 0 and write back to DB via calling mysql_real_query with update statement.

          3. After all tuples matching specified "remote_ip" and "process_name" were handled, the first transaction got "rollback" and the second transaction got "commit" via mysql_query which returned success as confirmed from product logs.

          The binlog snippets (both p.2 and p.3 in description) should all belong to the same transaction (the second) because they all have the same process_id = 11312 which was the old one. In between p.2 and p.3 snippets, the process_id had been updated to the new one "process_id = 64602".
          So regarding p.3 binlog there should be no way for a new transaction to get the old process_id 11312 after all tuples had been updated to the new value 64602. That's why we say p.2 and p.3 binlogs in description belong to the same committed transaction. However p.3 bin log was updated >6m later than p.2 and overwrote the latest correct DB tuples with wrong tuples with old process_id = 11312.

          liberiul Andy Liu added a comment - - edited Thanks for looking at this. So what does thread_id in binlog signify and under what condition should it be changed? Does it mean new connection initiated from client? Here are details of what was going on (note they were all running under single product thread context): 1. Start one transaction and loop reading "db01host.notify_remdb_operation" table by calling C++ mysql_real_query with statement param "select * from db01host.notify_remdb_operation where remote_ip = x'00a9fe4001000000000000000000000000000000' and process_name = 'SSPcntl_agt'" 2. Also start another transaction to handle all tuples read from p.1. For each tuple, we just reset its retry_count field to 0 and write back to DB via calling mysql_real_query with update statement. 3. After all tuples matching specified "remote_ip" and "process_name" were handled, the first transaction got "rollback" and the second transaction got "commit" via mysql_query which returned success as confirmed from product logs. The binlog snippets (both p.2 and p.3 in description) should all belong to the same transaction (the second) because they all have the same process_id = 11312 which was the old one. In between p.2 and p.3 snippets, the process_id had been updated to the new one "process_id = 64602". So regarding p.3 binlog there should be no way for a new transaction to get the old process_id 11312 after all tuples had been updated to the new value 64602. That's why we say p.2 and p.3 binlogs in description belong to the same committed transaction. However p.3 bin log was updated >6m later than p.2 and overwrote the latest correct DB tuples with wrong tuples with old process_id = 11312.
          Elkin Andrei Elkin added a comment -

          > The binlog snippets (both p.2 and p.3 in description) should all belong to the same transaction (the second) because they all have the same process_id = 11312 which was the old one.

          This assumption bases on your application logics. But indeed we see in the database different connection identifiers (that's what thread_id is) for blocks of sql code that you think one transaction. A transaction can be associated with at most one connection.

          Elkin Andrei Elkin added a comment - > The binlog snippets (both p.2 and p.3 in description) should all belong to the same transaction (the second) because they all have the same process_id = 11312 which was the old one. This assumption bases on your application logics. But indeed we see in the database different connection identifiers (that's what thread_id is) for blocks of sql code that you think one transaction. A transaction can be associated with at most one connection.

          One way to reconcile expectations and reality here would be to temporarily enable the general log and then to match it with the binary log, to see which updates come from where.

          elenst Elena Stepanova added a comment - One way to reconcile expectations and reality here would be to temporarily enable the general log and then to match it with the binary log, to see which updates come from where.
          elenst Elena Stepanova made changes -
          Status Open [ 1 ] Needs Feedback [ 10501 ]
          liberiul Andy Liu added a comment -

          Thanks for the advice.
          We'll try to reproduce with general log enabled for further clue. Meanwhile we'll be also looking at this from product perspective.

          liberiul Andy Liu added a comment - Thanks for the advice. We'll try to reproduce with general log enabled for further clue. Meanwhile we'll be also looking at this from product perspective.

          liberiul, any new discoveries?

          elenst Elena Stepanova added a comment - liberiul , any new discoveries?
          liberiul Andy Liu added a comment -

          It's found that the replication got delayed significantly under extreme high CPU usage. The statement executed on master host was not applied to slave host until minutes later. The delayed replication events finally overrode the statements queried to slave and caused the issue.

          Would you advise:
          1) Are there any tweaking options to improve mariadb performance or reduce CPU usage? It is noticed that the commit may take long time to finish (>1m).
          2) How to assure the replication timing based on timestamp?
          e.g.
          Event A: master update tuple T at 1:00;
          Event B: slave updates tuple T at 1:02;
          The event A should be replicated and applied to slave no later than event B. Otherwise the DB value on slave would be mistakenly overridden by the obsolete one.

          liberiul Andy Liu added a comment - It's found that the replication got delayed significantly under extreme high CPU usage. The statement executed on master host was not applied to slave host until minutes later. The delayed replication events finally overrode the statements queried to slave and caused the issue. Would you advise: 1) Are there any tweaking options to improve mariadb performance or reduce CPU usage? It is noticed that the commit may take long time to finish (>1m). 2) How to assure the replication timing based on timestamp? e.g. Event A: master update tuple T at 1:00; Event B: slave updates tuple T at 1:02; The event A should be replicated and applied to slave no later than event B. Otherwise the DB value on slave would be mistakenly overridden by the obsolete one.
          Elkin Andrei Elkin added a comment -

          liberiul, to the 2nd of the two, it looks like you'd be interested in master_wait_gtid. That is the B's updater would invoke the function prior to submit the update. That obviously requires A and B updaters coordination on the matter of the sequence number of replication events. (B must be aware of the last A's update's gtid).

          To the 1st question, it looks like the request is about the slave threads scheduling prioritization (which we don't provide unfortunately) unless your whole slave server was busy.

          Elkin Andrei Elkin added a comment - liberiul , to the 2nd of the two, it looks like you'd be interested in master_wait_gtid . That is the B's updater would invoke the function prior to submit the update. That obviously requires A and B updaters coordination on the matter of the sequence number of replication events. (B must be aware of the last A's update's gtid). To the 1st question, it looks like the request is about the slave threads scheduling prioritization (which we don't provide unfortunately) unless your whole slave server was busy.
          liberiul Andy Liu made changes -
          Status Needs Feedback [ 10501 ] Open [ 1 ]
          liberiul Andy Liu added a comment -

          Thanks for the explanation. This ticket can be closed.

          liberiul Andy Liu added a comment - Thanks for the explanation. This ticket can be closed.
          serg Sergei Golubchik made changes -
          Fix Version/s N/A [ 14700 ]
          Resolution Won't Fix [ 2 ]
          Status Open [ 1 ] Closed [ 6 ]

          People

            Unassigned Unassigned
            liberiul Andy Liu
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.