Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.1(EOL), 10.2(EOL), 10.3(EOL)
    • 10.1.35, 10.2.17, 10.3.9
    • Galera
    • None
    • CentOS 7.4 (with microcode_ctl-2.1-22.2.el7.x86_64, AFAIK withdrawn because of stability issues)

    Description

      hi,

      this might be duplicate if MDEV-15747, but is for Verision 10.2.12 so I ma not sure

      We ran in this bug last month, but I had no time to look deeper into this until now.

      It was on a Cluster with 2 nodes. The problems started on node "DB2", and shifted to node "DB1" when trying to restart "DB2". The restart failed, so I rebooted the node.

      here is the timeline of what happend when on node DB2:
      08:00:24 - problem on node DB2
      10:27:58 - trying to restart mariadb (failed!)

      from /var/log/messages:
      Mar 13 10:29:28 DB2 systemd: mariadb.service stop-sigterm timed out. Skipping SIGKILL.
      Mar 13 10:30:59 DB2 systemd: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
      Mar 13 10:30:59 DB2 systemd: Stopped MariaDB 10.2.12 database server.
      Mar 13 10:30:59 DB2 systemd: Unit mariadb.service entered failed state.
      Mar 13 10:30:59 DB2 systemd: mariadb.service failed.
      Mar 13 10:31:13 DB2 systemd: Starting MariaDB 10.2.12 database server...
      Mar 13 10:32:43 DB2 systemd: mariadb.service start-pre operation timed out. Terminating.
      Mar 13 10:32:43 DB2 systemd: Failed to start MariaDB 10.2.12 database server.
      Mar 13 10:32:43 DB2 systemd: Unit mariadb.service entered failed state.
      Mar 13 10:32:43 DB2 systemd: mariadb.service failed.

      10:41 - rebooted node DB2

      Attachments

        Issue Links

          Activity

            Dimov Vasil (Inactive) added a comment - - edited

            Just for the records, a copy from IM:

            I do not see a problem in your patch. If it fixes the problem, then it should be ok. Some minor things: make sure that the checks will not crash due to trx or trx->mysql_thd being NULL (I am not sure if that is possible). Also in the test - you do not need galera_bf_background_statistics.opt with --innodb_stats_persistent=ON inside, it should be the default and you already check @@innodb_stats_persistent at the beginning of the test. In "--eval SELECT $wsrep_local_bf_aborts_after - $wsrep_local_bf_aborts_before = 1 AS ..." I would remove the " = 1" part because it would mask any failure as 0 instead of providing the actual difference for analysis. E.g. if you just do select new - old as ... and the .result file contains 1 and anything else is returned, like 0 or 2 or 3 or 5 you will actually see that value in the failure message from mtr.
            

            Dimov Vasil (Inactive) added a comment - - edited Just for the records, a copy from IM: I do not see a problem in your patch. If it fixes the problem, then it should be ok. Some minor things: make sure that the checks will not crash due to trx or trx->mysql_thd being NULL (I am not sure if that is possible). Also in the test - you do not need galera_bf_background_statistics.opt with --innodb_stats_persistent=ON inside, it should be the default and you already check @@innodb_stats_persistent at the beginning of the test. In "--eval SELECT $wsrep_local_bf_aborts_after - $wsrep_local_bf_aborts_before = 1 AS ..." I would remove the " = 1" part because it would mask any failure as 0 instead of providing the actual difference for analysis. E.g. if you just do select new - old as ... and the .result file contains 1 and anything else is returned, like 0 or 2 or 3 or 5 you will actually see that value in the failure message from mtr.

            Dimov, I believe that in MariaDB’s mysql-test-run, InnoDB persistent statistics are disabled by default.

            marko Marko Mäkelä added a comment - Dimov , I believe that in MariaDB’s mysql-test-run , InnoDB persistent statistics are disabled by default.

            Ok, then of course, the `.opt` file is necessary.

            Dimov Vasil (Inactive) added a comment - Ok, then of course, the `.opt` file is necessary.
            ktchow_ogcio Chow King Tak added a comment -

            I am using version 10.3.9 but I just encountered the BF error. It blocked all transactions and I needed to restart the node. Is there any configuration in the my.cnf I have to adjust to avoid the error?

            The following are some of the error messages:

            2018-11-22 8:53:28 0 [Note] InnoDB: WSREP: BF lock wait long for trx:0x2129df query: DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'¢Yõ[^S^B
            TRANSACTION 2173407, ACTIVE 42084 sec starting index read
            mysql tables in use 1, locked 1
            LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
            MySQL thread id 2, OS thread handle 140537284146944, query id 5316319 Delete_rows_log_event::find_row(278043)
            DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'
            2018-11-22 8:53:29 0 [Note] InnoDB: WSREP: BF lock wait long for trx:0x2129df query: DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'¢Yõ[^S^B
            TRANSACTION 2173407, ACTIVE 42085 sec starting index read
            mysql tables in use 1, locked 1
            LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
            MySQL thread id 2, OS thread handle 140537284146944, query id 5316319 Delete_rows_log_event::find_row(278043)
            DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'

            =====================================
            2018-11-22 08:53:29 0x7fd0f9431700 INNODB MONITOR OUTPUT
            =====================================
            Per second averages calculated from the last 16 seconds
            -----------------
            BACKGROUND THREAD
            -----------------
            srv_master_thread loops: 8323 srv_active, 0 srv_shutdown, 59026 srv_idle
            srv_master_thread log flush and writes: 67349
            ----------
            SEMAPHORES
            ----------
            OS WAIT ARRAY INFO: reservation count 212583
            OS WAIT ARRAY INFO: signal count 659238
            RW-shared spins 0, rounds 1190209, OS waits 78610
            RW-excl spins 0, rounds 10315112, OS waits 63310
            RW-sx spins 213949, rounds 3318375, OS waits 31377
            Spin rounds per wait: 1190209.00 RW-shared, 10315112.00 RW-excl, 15.51 RW-sx
            ------------
            TRANSACTIONS
            ------------
            Trx id counter 2173554
            Purge done for trx's n:o < 2173504 undo n:o < 0 state: running but idle
            History list length 3
            LIST OF TRANSACTIONS FOR EACH SESSION:
            ---TRANSACTION 422011681601128, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 422011681596232, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 2173426, ACTIVE 41969 sec
            4 lock struct(s), heap size 1136, 129 row lock(s), undo log entries 258
            MySQL thread id 17, OS thread handle 140537170228992, query id 5333660 kpcp-portalapp1.egis.hksarg 10.88.223.9 e3logexp10 Init
            commit
            TABLE LOCK table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock mode IX
            RECORD LOCKS space id 22 page no 3 n bits 272 index PRIMARY of table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock_mode X locks rec but not gap
            RECORD LOCKS space id 22 page no 4 n bits 256 index PRIMARY of table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock_mode X locks rec but not gap
            Record lock, heap no 2 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
            0: len 8; hex 800000000001590c; asc Y ;;
            1: len 6; hex 0000002129f2; asc !) ;;
            2: len 7; hex 00000008230110; asc # ;;
            3: len 5; hex 4353464953; asc CSFIS;;
            4: len 2; hex 4b50; asc KP;;
            5: len 10; hex 6373666973696e747031; asc csfisintp1;;
            6: len 11; hex 703063736669736d646231; asc p0csfismdb1;;
            7: len 2; hex 4b50; asc KP;;
            8: len 5; hex 99a05b26c2; asc [& ;;
            9: len 3; hex 52554e; asc RUN;;
            10: len 5; hex 99a16b52c2; asc kR ;;
            11: len 5; hex 99a16b5300; asc kS ;;
            Record lock, heap no 3 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
            0: len 8; hex 800000000001590e; asc Y ;;
            1: len 6; hex 0000002129f2; asc !) ;;
            2: len 7; hex 0000000823013a; asc # :;;
            3: len 4; hex 45504159; asc EPAY;;
            4: len 2; hex 4b50; asc KP;;
            5: len 9; hex 65706179696e747031; asc epayintp1;;
            6: len 10; hex 7031657061796d646231; asc p1epaymdb1;;
            7: len 2; hex 4b50; asc KP;;
            8: len 5; hex 99a0480ade; asc H ;;
            9: len 3; hex 52554e; asc RUN;;
            10: len 5; hex 99a16b52c2; asc kR ;;
            11: len 5; hex 99a16b5300; asc kS ;;

            Record lock, heap no 4 PHYSICAL RECORD: n_fields 12; compact format; info bits 32
            0: len 8; hex 8000000000015910; asc Y ;;
            1: len 6; hex 0000002129f2; asc !) ;;
            2: len 7; hex 00000008230164; asc # d;;
            3: len 5; hex 4f47455053; asc OGEPS;;
            4: len 2; hex 4b50; asc KP;;
            5: len 10; hex 6f67657073696e747031; asc ogepsintp1;;
            6: len 12; hex 70306f676570736d64627332; asc p0ogepsmdbs2;;
            7: len 2; hex 5350; asc SP;;
            8: len 5; hex 99a04e0aed; asc N ;;
            9: len 3; hex 52554e; asc RUN;;
            10: len 5; hex 99a16b52c2; asc kR ;;
            11: len 5; hex 99a16b5300; asc kS ;;

            ktchow_ogcio Chow King Tak added a comment - I am using version 10.3.9 but I just encountered the BF error. It blocked all transactions and I needed to restart the node. Is there any configuration in the my.cnf I have to adjust to avoid the error? The following are some of the error messages: 2018-11-22 8:53:28 0 [Note] InnoDB: WSREP: BF lock wait long for trx:0x2129df query: DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'¢Yõ[^S^B TRANSACTION 2173407, ACTIVE 42084 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s) MySQL thread id 2, OS thread handle 140537284146944, query id 5316319 Delete_rows_log_event::find_row(278043) DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP' 2018-11-22 8:53:29 0 [Note] InnoDB: WSREP: BF lock wait long for trx:0x2129df query: DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP'¢Yõ[^S^B TRANSACTION 2173407, ACTIVE 42085 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s) MySQL thread id 2, OS thread handle 140537284146944, query id 5316319 Delete_rows_log_event::find_row(278043) DELETE FROM pe3logex1.tmp_app_to_db_status_report WHERE site='SP' ===================================== 2018-11-22 08:53:29 0x7fd0f9431700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 16 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 8323 srv_active, 0 srv_shutdown, 59026 srv_idle srv_master_thread log flush and writes: 67349 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 212583 OS WAIT ARRAY INFO: signal count 659238 RW-shared spins 0, rounds 1190209, OS waits 78610 RW-excl spins 0, rounds 10315112, OS waits 63310 RW-sx spins 213949, rounds 3318375, OS waits 31377 Spin rounds per wait: 1190209.00 RW-shared, 10315112.00 RW-excl, 15.51 RW-sx ------------ TRANSACTIONS ------------ Trx id counter 2173554 Purge done for trx's n:o < 2173504 undo n:o < 0 state: running but idle History list length 3 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 422011681601128, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 422011681596232, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 2173426, ACTIVE 41969 sec 4 lock struct(s), heap size 1136, 129 row lock(s), undo log entries 258 MySQL thread id 17, OS thread handle 140537170228992, query id 5333660 kpcp-portalapp1.egis.hksarg 10.88.223.9 e3logexp10 Init commit TABLE LOCK table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock mode IX RECORD LOCKS space id 22 page no 3 n bits 272 index PRIMARY of table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock_mode X locks rec but not gap RECORD LOCKS space id 22 page no 4 n bits 256 index PRIMARY of table `pe3logex1`.`db_instance_restart_time_to_portal` trx id 2173426 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 12; compact format; info bits 32 0: len 8; hex 800000000001590c; asc Y ;; 1: len 6; hex 0000002129f2; asc !) ;; 2: len 7; hex 00000008230110; asc # ;; 3: len 5; hex 4353464953; asc CSFIS;; 4: len 2; hex 4b50; asc KP;; 5: len 10; hex 6373666973696e747031; asc csfisintp1;; 6: len 11; hex 703063736669736d646231; asc p0csfismdb1;; 7: len 2; hex 4b50; asc KP;; 8: len 5; hex 99a05b26c2; asc [& ;; 9: len 3; hex 52554e; asc RUN;; 10: len 5; hex 99a16b52c2; asc kR ;; 11: len 5; hex 99a16b5300; asc kS ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 12; compact format; info bits 32 0: len 8; hex 800000000001590e; asc Y ;; 1: len 6; hex 0000002129f2; asc !) ;; 2: len 7; hex 0000000823013a; asc # :;; 3: len 4; hex 45504159; asc EPAY;; 4: len 2; hex 4b50; asc KP;; 5: len 9; hex 65706179696e747031; asc epayintp1;; 6: len 10; hex 7031657061796d646231; asc p1epaymdb1;; 7: len 2; hex 4b50; asc KP;; 8: len 5; hex 99a0480ade; asc H ;; 9: len 3; hex 52554e; asc RUN;; 10: len 5; hex 99a16b52c2; asc kR ;; 11: len 5; hex 99a16b5300; asc kS ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 12; compact format; info bits 32 0: len 8; hex 8000000000015910; asc Y ;; 1: len 6; hex 0000002129f2; asc !) ;; 2: len 7; hex 00000008230164; asc # d;; 3: len 5; hex 4f47455053; asc OGEPS;; 4: len 2; hex 4b50; asc KP;; 5: len 10; hex 6f67657073696e747031; asc ogepsintp1;; 6: len 12; hex 70306f676570736d64627332; asc p0ogepsmdbs2;; 7: len 2; hex 5350; asc SP;; 8: len 5; hex 99a04e0aed; asc N ;; 9: len 3; hex 52554e; asc RUN;; 10: len 5; hex 99a16b52c2; asc kR ;; 11: len 5; hex 99a16b5300; asc kS ;;
            gaopeng peng gao added a comment -

            Hi All:

            Like Chow King Tak, we also use mariadb 10.3.9 with gelera, one node stuck for ever.finally, we restart cluster,
            errlog like ,

            2022-09-14 9:33:43 59 [Note] InnoDB: WSREP: BF lock wait long for trx:0x12a95746f query: insert into *** () TRANSACTION 5009405039, ACTIVE 3706 sec inserting
            mysql tables in use 1, locked 1
            2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
            MySQL thread id 59, OS thread handle 139771672282880, query id 2495060706 Write_rows_log_event::write_row(842660726)
            insert into *** ()

            Here insert transaction stuck 3706s,state is inserting .

            Have any fixed here ? Thanks..

            gaopeng peng gao added a comment - Hi All: Like Chow King Tak, we also use mariadb 10.3.9 with gelera, one node stuck for ever.finally, we restart cluster, errlog like , 2022-09-14 9:33:43 59 [Note] InnoDB: WSREP: BF lock wait long for trx:0x12a95746f query: insert into *** () TRANSACTION 5009405039, ACTIVE 3706 sec inserting mysql tables in use 1, locked 1 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1 MySQL thread id 59, OS thread handle 139771672282880, query id 2495060706 Write_rows_log_event::write_row(842660726) insert into *** () Here insert transaction stuck 3706s,state is inserting . Have any fixed here ? Thanks..

            People

              jplindst Jan Lindström (Inactive)
              rs@rodlauer.co.at Reinhard Sojka
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.