Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13754

TRANSACTION sometimes stay in "KILLLED' STATE for inifinitve time!

Details

    Description

      Hi dudes, after upgrade from mariadb 10.1 to 10.2.8, some simple transaction is running for infinitive times...

      SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where Time > 9000\G
      

      I have 19 transaction that is running for this time

      they won't die with kill command..

                 ID: 10245887
               USER: novid
               HOST: 
                 DB: api
            COMMAND: Killed
               TIME: 106767
              STATE: NULL
               INFO: insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
            TIME_MS: 106767343.552
              STAGE: 0
          MAX_STAGE: 0
           PROGRESS: 0.000
        MEMORY_USED: 92848
      EXAMINED_ROWS: 0
           QUERY_ID: 300183941
        INFO_BINARY: insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
                TID: 15109
      

      and with SHOW ENGINE INNODB STATUS\G

      ---TRANSACTION 46385651188, ACTIVE 107004 sec
      8 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 4
      MySQL thread id 10245887, OS thread handle 0x7f898dcb6700, query id 300183941 api
      insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
      Trx read view will not see trx with id >= 46385651191, sees < 46385650307
      

      changing in these transaction is so small (2 or 3) so waiting is not for transaction rollback.

      Thread stay in killed state and so some record is lock for infinitive time

      Attachments

        1. 42003.gdb
          97 kB
        2. 43205.gdb
          114 kB
        3. after.gdb
          92 kB
        4. after-restart.gdb
          67 kB
        5. before.gdb
          92 kB
        6. before-restart.gdb
          90 kB
        7. innodb-status
          48 kB
        8. my.cnf
          6 kB

        Issue Links

          Activity

            novid,

            Is there anything in the error log – errors, warnings, maybe even unusual notes?
            Did you check the disk space?
            How did the transactions get killed? As you said, they are very small, so what happened before that made them be killed – were they waiting on a lock too long, or..?
            Please also attach your cnf file(s).

            elenst Elena Stepanova added a comment - novid , Is there anything in the error log – errors, warnings, maybe even unusual notes? Did you check the disk space? How did the transactions get killed? As you said, they are very small, so what happened before that made them be killed – were they waiting on a lock too long, or..? Please also attach your cnf file(s).
            novid novid added a comment -

            Some transactions randomly run for infinitive time... and lock some rows for infinitive time

            At syslog , I got this errors when application access to rows that is locked.

             Sep  8 11:18:50 sv17 mysqld[47544]: 2017-09-08 11:18:50 139718467409664 [Warning] Aborted connection 5710374 to db: 'api' user: 'novid' host: '172.16.77.201' (Lock wait timeout exceeded; try restarting transaction)
            

            I got more information from innodb status

            show engine innodb status\G
             
            =====================================
            2017-09-08 13:15:02 0x7f12f254b700 INNODB MONITOR OUTPUT
            =====================================
            Per second averages calculated from the last 19 seconds
            -----------------
            BACKGROUND THREAD
            -----------------
            srv_master_thread loops: 35578 srv_active, 0 srv_shutdown, 128 srv_idle
            srv_master_thread log flush and writes: 35705
            ----------
            SEMAPHORES
            ----------
            OS WAIT ARRAY INFO: reservation count 4842727
            --Thread 139718442252032 has waited at buf0buf.cc line 4145 for 0.00 seconds the semaphore:
            S-lock on RW-latch at 0x7f1a8fc10568 created in file buf0buf.cc line 1478
            a writer (thread id 0) has reserved it in mode  exclusive
            number of readers 0, waiters flag 1, lock_word: 0
            Last time read locked in file row0sel.cc line 3340
            Last time write locked in file buf0buf.cc line 5349
            OS WAIT ARRAY INFO: signal count 3350362
            RW-shared spins 0, rounds 3624185, OS waits 1967284
            RW-excl spins 0, rounds 79887179, OS waits 2596656
            RW-sx spins 55955, rounds 1253737, OS waits 27193
            Spin rounds per wait: 3624185.00 RW-shared, 79887179.00 RW-excl, 22.41 RW-sx
            ------------
            TRANSACTIONS
            ------------
            Trx id counter 46468946036
            Purge done for trx's n:o < 46468944596 undo n:o < 0 state: running but idle
            History list length 228
            LIST OF TRANSACTIONS FOR EACH SESSION:
            ---TRANSACTION 421268236506000, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236487016, not started estimating records in index range
            mysql tables in use 1, locked 0
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236490632, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236507808, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236488824, not started estimating records in index range
            mysql tables in use 1, locked 0
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236503288, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236487920, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236380344, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236471648, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            ---TRANSACTION 421268236486112, not started
            0 lock struct(s), heap size 1136, 0 row lock(s)
            mysql tables in use 3, locked 3
            1 lock struct(s), heap size 1136, 0 row lock(s)
            MySQL thread id 9411053, OS thread handle 0x7f12bc1e7700, query id 251372917 172.16.77.126 novid update
            insert into `oauth_s` (`id`, `expire_time`, `session_id`, `created_at`, `updated_at`) values ('REzPa1eqBaX3cjj', '1505292302', '13472555', '2017-09-08 08:45:02', '2017-09-08 08:45:02')
            
            

            before killing transaction

             
            SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where Time > 1000\G
             
            *************************** 4. row ***************************
                       ID: 2299660
                     USER: novid
                     HOST: 172.16.77.238:49159
                       DB: api
                  COMMAND: Execute
                     TIME: 16896
                    STATE: query end
                     INFO: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
                  TIME_MS: 16896741.939
                    STAGE: 0
                MAX_STAGE: 0
                 PROGRESS: 0.000
              MEMORY_USED: 92848
            EXAMINED_ROWS: 1
                 QUERY_ID: 58525251
              INFO_BINARY: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
                      TID: 13065
            
            

            after killing the transactions, their status is changed from “Execute” to “Killed” but the rows are still locked

            kill 2299660
                       ID: 2299660
                     USER: novid
                     HOST: 172.16.77.238:49159
                       DB: api
                  COMMAND: Killed
                     TIME: 17210
                    STATE: query end
                     INFO: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
                  TIME_MS: 17210265.710
                    STAGE: 0
                MAX_STAGE: 0
                 PROGRESS: 0.000
              MEMORY_USED: 92848
            EXAMINED_ROWS: 1
                 QUERY_ID: 58525251
              INFO_BINARY: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
                      TID: 13065
            

            To fix this problem, I have to restart mariadb every midnight, after restart I got these alarms at syslog

             : 2017-09-08  3:19:08 139793750386880 [Note] InnoDB: 22 transaction(s) which must be rolled back or cleaned up in total 56 row operations to undo
            Sep  8 03:19:10 sv17 mysqld[47544]: 2017-09-08  3:19:10 139718481258240 [Note] InnoDB: Starting in background the rollback of recovered transactions
            Sep  8 03:19:12 sv17 mysqld[47544]: 2017-09-08  3:19:12 139718481258240 [Note] InnoDB: Rollback of non-prepared transactions completed
            

            my config is attached
            my.cnf

            novid novid added a comment - Some transactions randomly run for infinitive time... and lock some rows for infinitive time At syslog , I got this errors when application access to rows that is locked. Sep 8 11:18:50 sv17 mysqld[47544]: 2017-09-08 11:18:50 139718467409664 [Warning] Aborted connection 5710374 to db: 'api' user: 'novid' host: '172.16.77.201' (Lock wait timeout exceeded; try restarting transaction) I got more information from innodb status show engine innodb status\G   ===================================== 2017-09-08 13:15:02 0x7f12f254b700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 19 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 35578 srv_active, 0 srv_shutdown, 128 srv_idle srv_master_thread log flush and writes: 35705 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 4842727 --Thread 139718442252032 has waited at buf0buf.cc line 4145 for 0.00 seconds the semaphore: S-lock on RW-latch at 0x7f1a8fc10568 created in file buf0buf.cc line 1478 a writer (thread id 0) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file row0sel.cc line 3340 Last time write locked in file buf0buf.cc line 5349 OS WAIT ARRAY INFO: signal count 3350362 RW-shared spins 0, rounds 3624185, OS waits 1967284 RW-excl spins 0, rounds 79887179, OS waits 2596656 RW-sx spins 55955, rounds 1253737, OS waits 27193 Spin rounds per wait: 3624185.00 RW-shared, 79887179.00 RW-excl, 22.41 RW-sx ------------ TRANSACTIONS ------------ Trx id counter 46468946036 Purge done for trx 's n:o < 46468944596 undo n:o < 0 state: running but idle History list length 228 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421268236506000, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236487016, not started estimating records in index range mysql tables in use 1, locked 0 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236490632, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236507808, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236488824, not started estimating records in index range mysql tables in use 1, locked 0 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236503288, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236487920, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236380344, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236471648, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421268236486112, not started 0 lock struct(s), heap size 1136, 0 row lock(s) mysql tables in use 3, locked 3 1 lock struct(s), heap size 1136, 0 row lock(s) MySQL thread id 9411053, OS thread handle 0x7f12bc1e7700, query id 251372917 172.16.77.126 novid update insert into `oauth_s` (`id`, `expire_time`, `session_id`, `created_at`, `updated_at`) values (' REzPa1eqBaX3cjj ', ' 1505292302 ', ' 13472555 ', ' 2017-09-08 08:45:02 ', ' 2017-09-08 08:45:02') before killing transaction   SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where Time > 1000\G   *************************** 4. row *************************** ID: 2299660 USER : novid HOST: 172.16.77.238:49159 DB: api COMMAND: Execute TIME : 16896 STATE: query end INFO: update `drivers` set `has_traffic_license` = '0' , `updated_at` = '2017-09-08 08:37:29' where `id` = '176766' TIME_MS: 16896741.939 STAGE: 0 MAX_STAGE: 0 PROGRESS: 0.000 MEMORY_USED: 92848 EXAMINED_ROWS: 1 QUERY_ID: 58525251 INFO_BINARY: update `drivers` set `has_traffic_license` = '0' , `updated_at` = '2017-09-08 08:37:29' where `id` = '176766' TID: 13065 after killing the transactions, their status is changed from “Execute” to “Killed” but the rows are still locked kill 2299660 ID: 2299660 USER : novid HOST: 172.16.77.238:49159 DB: api COMMAND: Killed TIME : 17210 STATE: query end INFO: update `drivers` set `has_traffic_license` = '0' , `updated_at` = '2017-09-08 08:37:29' where `id` = '176766' TIME_MS: 17210265.710 STAGE: 0 MAX_STAGE: 0 PROGRESS: 0.000 MEMORY_USED: 92848 EXAMINED_ROWS: 1 QUERY_ID: 58525251 INFO_BINARY: update `drivers` set `has_traffic_license` = '0' , `updated_at` = '2017-09-08 08:37:29' where `id` = '176766' TID: 13065 To fix this problem, I have to restart mariadb every midnight, after restart I got these alarms at syslog : 2017-09-08 3:19:08 139793750386880 [Note] InnoDB: 22 transaction(s) which must be rolled back or cleaned up in total 56 row operations to undo Sep 8 03:19:10 sv17 mysqld[47544]: 2017-09-08 3:19:10 139718481258240 [Note] InnoDB: Starting in background the rollback of recovered transactions Sep 8 03:19:12 sv17 mysqld[47544]: 2017-09-08 3:19:12 139718481258240 [Note] InnoDB: Rollback of non-prepared transactions completed my config is attached my.cnf
            novid novid added a comment -

            I change many parameters these days without any successfully!

            transaction-isolation = READ-COMMITTED
            innodb_deadlock_detect = OFF

            The only solution at mariadb 10.2 for rollback transaction is restarting server at midnight :| after that all transactions Rollback successfully...

            novid novid added a comment - I change many parameters these days without any successfully! transaction-isolation = READ-COMMITTED innodb_deadlock_detect = OFF The only solution at mariadb 10.2 for rollback transaction is restarting server at midnight :| after that all transactions Rollback successfully...

            novid,

            So, the problem starts even before you attempt to kill threads, the queries are already hanging.
            Can you get a stack trace from the running server (with hanging transactions) before you kill them, and then another one when the queries are in 'Killed' state, before you restart the server? Run gdb --batch --eval-command="thread apply all bt" <path to mysqld> <pid> or whatever equivalent you have on your system.

            The InnoDB status appears to be truncated, could you please attach the full output?

            Does it always happen on INSERTs?

            You didn't say if you have checked the disk space and possible disk errors, if you haven't yet, please do.

            elenst Elena Stepanova added a comment - novid , So, the problem starts even before you attempt to kill threads, the queries are already hanging. Can you get a stack trace from the running server (with hanging transactions) before you kill them, and then another one when the queries are in 'Killed' state, before you restart the server? Run gdb --batch --eval-command="thread apply all bt" <path to mysqld> <pid> or whatever equivalent you have on your system. The InnoDB status appears to be truncated, could you please attach the full output? Does it always happen on INSERTs? You didn't say if you have checked the disk space and possible disk errors, if you haven't yet, please do.
            novid novid added a comment - - edited

            Hi dude, thanks for the response,

            We've just upgraded to mariadb 10.2.9 but the problem still exist.

            yes, the problem starts before I attempt to kill them.

            The problem has occurred on two tables and it's happen on Inserts and updates . That is the second time we've changed our logic codes, but we can't fix the problem.

            we've just moved to servers with 8 ssds (have been raid10 with physical raid controller) and 256 GB memory. the past server had hdd storages. unfortunately we experience the problems on two both of them and the main point is that we've never seen any problem on mariadb 10.1 and another point is that we've just seen the problem on two tables. so I think so the problem's came from mariadb 10.2 we monitor all servers with netdata, and specially I monitor mariadb with pt-stalk, I can't see any strange things.

            We've never had backup from database (I've got from slave) or we never can't stop mysql, after running systemctl stop mysql, I'm waiting for flushing dirty pages to disk and then I run kill -9 mysql!

            I'm attaching two files now , gdb's output before I kill transactions and then after kill them. before.gdb after.gdb

            I'll send three files for you at midnight, gdb's output before restart and after restart and innodb status. please waiting for them.

            I add them.. before-restart.gdb after-restart.gdb innodb-status

            After 42003 seconds from restarting mysql server, we've had 10 locked transactions.

            I've attached gdb's output after 42003 seconds and before killing them. 42003.gdb

            I've attached gdb's output after 43205 seconds and after killing them. 43205.gdb

            I've hoped to see this new commit. does this commit solve our problem? is there any way for testing it?

            thanks a lot.

            novid novid added a comment - - edited Hi dude, thanks for the response, We've just upgraded to mariadb 10.2.9 but the problem still exist. yes, the problem starts before I attempt to kill them. The problem has occurred on two tables and it's happen on Inserts and updates . That is the second time we've changed our logic codes, but we can't fix the problem. we've just moved to servers with 8 ssds (have been raid10 with physical raid controller) and 256 GB memory. the past server had hdd storages. unfortunately we experience the problems on two both of them and the main point is that we've never seen any problem on mariadb 10.1 and another point is that we've just seen the problem on two tables. so I think so the problem's came from mariadb 10.2 we monitor all servers with netdata, and specially I monitor mariadb with pt-stalk, I can't see any strange things. We've never had backup from database (I've got from slave) or we never can't stop mysql, after running systemctl stop mysql, I'm waiting for flushing dirty pages to disk and then I run kill -9 mysql! I'm attaching two files now , gdb's output before I kill transactions and then after kill them. before.gdb after.gdb I'll send three files for you at midnight, gdb's output before restart and after restart and innodb status. please waiting for them. I add them.. before-restart.gdb after-restart.gdb innodb-status After 42003 seconds from restarting mysql server, we've had 10 locked transactions. I've attached gdb's output after 42003 seconds and before killing them. 42003.gdb I've attached gdb's output after 43205 seconds and after killing them. 43205.gdb I've hoped to see this new commit. does this commit solve our problem? is there any way for testing it? thanks a lot.

            I stumbled upon this ticket while searching for something else.
            I had a quick look at before-restart.gdb and 42003.gdb, but unfortunately the stack traces are incomplete. I do not see any reference to InnoDB functions in the stack traces. I do see a large number of ?? function names, which would suggest that the core dump was analyzed in a different environment than where it was generated, possibly with a mismatching set of available libraries.

            I would guess that the reported problem was fixed under MDEV-13983 (10.2.25) and had been introduced in MariaDB 10.2.8 in MDEV-11896. Could that be the case?

            marko Marko Mäkelä added a comment - I stumbled upon this ticket while searching for something else. I had a quick look at before-restart.gdb and 42003.gdb , but unfortunately the stack traces are incomplete. I do not see any reference to InnoDB functions in the stack traces. I do see a large number of ?? function names, which would suggest that the core dump was analyzed in a different environment than where it was generated, possibly with a mismatching set of available libraries. I would guess that the reported problem was fixed under MDEV-13983 (10.2.25) and had been introduced in MariaDB 10.2.8 in MDEV-11896 . Could that be the case?

            People

              Unassigned Unassigned
              novid novid
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.