[MDEV-13754] TRANSACTION sometimes stay in "KILLLED' STATE for inifinitve time! Created: 2017-09-07  Updated: 2020-08-28  Resolved: 2020-08-28

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.2.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: novid Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback

Attachments: File 42003.gdb     File 43205.gdb     File after-restart.gdb     File after.gdb     File before-restart.gdb     File before.gdb     HTML File innodb-status     File my.cnf    
Issue Links:
Relates
relates to MDEV-11896 thd_get_error_context_description rac... Closed
relates to MDEV-13983 Mariadb becomes unresponsive Closed

 Description   

Hi dudes, after upgrade from mariadb 10.1 to 10.2.8, some simple transaction is running for infinitive times...

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where Time > 9000\G

I have 19 transaction that is running for this time

they won't die with kill command..

           ID: 10245887
         USER: novid
         HOST: 
           DB: api
      COMMAND: Killed
         TIME: 106767
        STATE: NULL
         INFO: insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
      TIME_MS: 106767343.552
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 92848
EXAMINED_ROWS: 0
     QUERY_ID: 300183941
  INFO_BINARY: insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
          TID: 15109

and with SHOW ENGINE INNODB STATUS\G

---TRANSACTION 46385651188, ACTIVE 107004 sec
8 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 4
MySQL thread id 10245887, OS thread handle 0x7f898dcb6700, query id 300183941 api
insert into `driver_status_changelog` (`status`, `driver_id`, `updated_at`, `created_at`) values ('4', '65767', '2017-09-06 09:22:04', '2017-09-06 09:22:04')
Trx read view will not see trx with id >= 46385651191, sees < 46385650307

changing in these transaction is so small (2 or 3) so waiting is not for transaction rollback.

Thread stay in killed state and so some record is lock for infinitive time



 Comments   
Comment by Elena Stepanova [ 2017-09-07 ]

novid,

Is there anything in the error log – errors, warnings, maybe even unusual notes?
Did you check the disk space?
How did the transactions get killed? As you said, they are very small, so what happened before that made them be killed – were they waiting on a lock too long, or..?
Please also attach your cnf file(s).

Comment by novid [ 2017-09-08 ]

Some transactions randomly run for infinitive time... and lock some rows for infinitive time

At syslog , I got this errors when application access to rows that is locked.

 Sep  8 11:18:50 sv17 mysqld[47544]: 2017-09-08 11:18:50 139718467409664 [Warning] Aborted connection 5710374 to db: 'api' user: 'novid' host: '172.16.77.201' (Lock wait timeout exceeded; try restarting transaction)

I got more information from innodb status

show engine innodb status\G
 
=====================================
2017-09-08 13:15:02 0x7f12f254b700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 19 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 35578 srv_active, 0 srv_shutdown, 128 srv_idle
srv_master_thread log flush and writes: 35705
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 4842727
--Thread 139718442252032 has waited at buf0buf.cc line 4145 for 0.00 seconds the semaphore:
S-lock on RW-latch at 0x7f1a8fc10568 created in file buf0buf.cc line 1478
a writer (thread id 0) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0sel.cc line 3340
Last time write locked in file buf0buf.cc line 5349
OS WAIT ARRAY INFO: signal count 3350362
RW-shared spins 0, rounds 3624185, OS waits 1967284
RW-excl spins 0, rounds 79887179, OS waits 2596656
RW-sx spins 55955, rounds 1253737, OS waits 27193
Spin rounds per wait: 3624185.00 RW-shared, 79887179.00 RW-excl, 22.41 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 46468946036
Purge done for trx's n:o < 46468944596 undo n:o < 0 state: running but idle
History list length 228
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421268236506000, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236487016, not started estimating records in index range
mysql tables in use 1, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236490632, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236507808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236488824, not started estimating records in index range
mysql tables in use 1, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236503288, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236487920, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236380344, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236471648, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421268236486112, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
mysql tables in use 3, locked 3
1 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 9411053, OS thread handle 0x7f12bc1e7700, query id 251372917 172.16.77.126 novid update
insert into `oauth_s` (`id`, `expire_time`, `session_id`, `created_at`, `updated_at`) values ('REzPa1eqBaX3cjj', '1505292302', '13472555', '2017-09-08 08:45:02', '2017-09-08 08:45:02')

before killing transaction

 
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where Time > 1000\G
 
*************************** 4. row ***************************
           ID: 2299660
         USER: novid
         HOST: 172.16.77.238:49159
           DB: api
      COMMAND: Execute
         TIME: 16896
        STATE: query end
         INFO: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
      TIME_MS: 16896741.939
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 92848
EXAMINED_ROWS: 1
     QUERY_ID: 58525251
  INFO_BINARY: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
          TID: 13065

after killing the transactions, their status is changed from “Execute” to “Killed” but the rows are still locked

kill 2299660
           ID: 2299660
         USER: novid
         HOST: 172.16.77.238:49159
           DB: api
      COMMAND: Killed
         TIME: 17210
        STATE: query end
         INFO: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
      TIME_MS: 17210265.710
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 92848
EXAMINED_ROWS: 1
     QUERY_ID: 58525251
  INFO_BINARY: update `drivers` set `has_traffic_license` = '0', `updated_at` = '2017-09-08 08:37:29' where `id` = '176766'
          TID: 13065

To fix this problem, I have to restart mariadb every midnight, after restart I got these alarms at syslog

 : 2017-09-08  3:19:08 139793750386880 [Note] InnoDB: 22 transaction(s) which must be rolled back or cleaned up in total 56 row operations to undo
Sep  8 03:19:10 sv17 mysqld[47544]: 2017-09-08  3:19:10 139718481258240 [Note] InnoDB: Starting in background the rollback of recovered transactions
Sep  8 03:19:12 sv17 mysqld[47544]: 2017-09-08  3:19:12 139718481258240 [Note] InnoDB: Rollback of non-prepared transactions completed

my config is attached
my.cnf

Comment by novid [ 2017-09-19 ]

I change many parameters these days without any successfully!

transaction-isolation = READ-COMMITTED
innodb_deadlock_detect = OFF

The only solution at mariadb 10.2 for rollback transaction is restarting server at midnight :| after that all transactions Rollback successfully...

Comment by Elena Stepanova [ 2017-10-01 ]

novid,

So, the problem starts even before you attempt to kill threads, the queries are already hanging.
Can you get a stack trace from the running server (with hanging transactions) before you kill them, and then another one when the queries are in 'Killed' state, before you restart the server? Run gdb --batch --eval-command="thread apply all bt" <path to mysqld> <pid> or whatever equivalent you have on your system.

The InnoDB status appears to be truncated, could you please attach the full output?

Does it always happen on INSERTs?

You didn't say if you have checked the disk space and possible disk errors, if you haven't yet, please do.

Comment by novid [ 2017-10-02 ]

Hi dude, thanks for the response,

We've just upgraded to mariadb 10.2.9 but the problem still exist.

yes, the problem starts before I attempt to kill them.

The problem has occurred on two tables and it's happen on Inserts and updates . That is the second time we've changed our logic codes, but we can't fix the problem.

we've just moved to servers with 8 ssds (have been raid10 with physical raid controller) and 256 GB memory. the past server had hdd storages. unfortunately we experience the problems on two both of them and the main point is that we've never seen any problem on mariadb 10.1 and another point is that we've just seen the problem on two tables. so I think so the problem's came from mariadb 10.2 we monitor all servers with netdata, and specially I monitor mariadb with pt-stalk, I can't see any strange things.

We've never had backup from database (I've got from slave) or we never can't stop mysql, after running systemctl stop mysql, I'm waiting for flushing dirty pages to disk and then I run kill -9 mysql!

I'm attaching two files now , gdb's output before I kill transactions and then after kill them. before.gdb after.gdb

I'll send three files for you at midnight, gdb's output before restart and after restart and innodb status. please waiting for them.

I add them.. before-restart.gdb after-restart.gdb innodb-status

After 42003 seconds from restarting mysql server, we've had 10 locked transactions.

I've attached gdb's output after 42003 seconds and before killing them. 42003.gdb

I've attached gdb's output after 43205 seconds and after killing them. 43205.gdb

I've hoped to see this new commit. does this commit solve our problem? is there any way for testing it?

thanks a lot.

Comment by Marko Mäkelä [ 2020-07-31 ]

I stumbled upon this ticket while searching for something else.
I had a quick look at before-restart.gdb and 42003.gdb, but unfortunately the stack traces are incomplete. I do not see any reference to InnoDB functions in the stack traces. I do see a large number of ?? function names, which would suggest that the core dump was analyzed in a different environment than where it was generated, possibly with a mismatching set of available libraries.

I would guess that the reported problem was fixed under MDEV-13983 (10.2.25) and had been introduced in MariaDB 10.2.8 in MDEV-11896. Could that be the case?

Generated at Thu Feb 08 08:08:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.