[MDEV-12467] encryption.create_or_replace hangs during DROP TABLE Created: 2017-04-07  Updated: 2018-05-04  Resolved: 2017-04-21

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB
Affects Version/s: 10.1.23, 10.2
Fix Version/s: 10.1.23, 10.2.6

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None
Environment:

gcc-5.4.0
x86_64 (and ppc64le - x86_64 backtrace attached only)
RelWithDebInfo
ubuntu-16.04


Attachments: File 10.2-encryption.mtr.log.bz2    
Issue Links:
Problem/Incident
is caused by MDEV-11581 Mariadb starts innodb encryption thre... Closed
is caused by MDEV-11738 Mariadb uses 100% of several of my 8 ... Closed
Relates
relates to MDEV-11929 During delete: InnoDB: Assertion fail... Closed
relates to MDEV-12694 test failure: encryption.create_or_re... Closed

 Description   

Not the same as MDEV-9359.

Revision 428a922cd0284b5fbdf97f74118209a6a9b4fb4c (mariadb-upstream/10.2)

2017-04-07  0:57:25 140222316631808 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-04-07 00:57:25 0x7f880d5f5700  InnoDB: Assertion failure in file /source/storage/innobase/ut/ut0ut.cc line 844
 
 
 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/build/build/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/build/build'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
62      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f880d5f5700 (LWP 23328))]
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x000055685118295a in my_write_core (sig=sig@entry=6) at /source/mysys/stacktrace.c:477
#2  0x0000556850c16464 in handle_fatal_signal (sig=6) at /source/sql/signal_handler.cc:299
#3  <signal handler called>
#4  0x00007f881c1a0428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f881c1a202a in __GI_abort () at abort.c:89
#6  0x00005568509e28ed in ut_dbg_assertion_failed (expr=expr@entry=0x0, file=file@entry=0x5568512b1f70 "/source/storage/innobase/ut/ut0ut.cc", line=line@entry=844) at /source/storage/innobase/ut/ut0dbg.cc:60
#7  0x0000556850f03471 in ib::fatal::~fatal (this=0x7f880d5f4be0, __in_chrg=<optimized out>) at /source/storage/innobase/ut/ut0ut.cc:844
#8  0x0000556850ea7f95 in srv_error_monitor_thread () at /source/storage/innobase/srv/srv0srv.cc:1956
#9  0x00007f881cde36ba in start_thread (arg=0x7f880d5f5700) at pthread_create.c:333
#10 0x00007f881c27182d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
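The crash above is the watchdog doing its job. srv_error_monitor_thread() aborted the server because a semaphore wait had lasted more than 600 seconds, so the crashing thread's backtrace does not show the code that actually hung. Below is a minimal C sketch of that kind of check. The names (check_waits(), FATAL_WAIT_SECONDS, enum wait_verdict) are hypothetical. The sketch also leaves out any extra bookkeeping the real monitor does before declaring a wait fatal.

```c
/* Hypothetical watchdog check in the spirit of srv_error_monitor_thread();
   this is not InnoDB code. */
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define FATAL_WAIT_SECONDS 600 /* threshold quoted in the log message */

enum wait_verdict { WAIT_OK, WAIT_FATAL };

/* Given the start times of all currently blocked waits, report the longest
   wait (if longest != NULL) and whether it exceeds the fatal threshold. */
static enum wait_verdict check_waits(const time_t *wait_started, size_t n,
				     time_t now, double *longest)
{
	double worst = 0.0;

	for (size_t i = 0; i < n; i++) {
		double waited = difftime(now, wait_started[i]);
		if (waited > worst) {
			worst = waited;
		}
	}
	if (longest != NULL) {
		*longest = worst;
	}
	/* The real server intentionally crashes at this point. */
	return worst > FATAL_WAIT_SECONDS ? WAIT_FATAL : WAIT_OK;
}
```

A watchdog like this only detects that something is stuck. The root cause has to be found in the other threads' stacks, which is what the comments below do.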



 Comments   
Comment by Marko Mäkelä [ 2017-04-21 ]

I encountered this very same issue when testing MDEV-12545. With that patch applied, the hang reproduces every time.
In the reported case, just like on my local system, we have this hang:

Thread 28 (Thread 0x7f0385b91700 (LWP 24959)):
#0  0x00007f038cfc7c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x000056159e95764c in os_thread_sleep (tm=tm@entry=20000) at /source/storage/innobase/os/os0thread.cc:220
#2  0x000056159eb16ccb in fil_space_crypt_close_tablespace (space=space@entry=0x7f033c035cd0) at /source/storage/innobase/fil/fil0crypt.cc:2359
#3  0x000056159eb05d32 in fil_check_pending_operations (id=id@entry=6, operation=operation@entry=FIL_OPERATION_DELETE, space=space@entry=0x7f0385b89938, path=path@entry=0x7f0385b89930) at /source/storage/innobase/fil/fil0fil.cc:2940
#4  0x000056159eb1047c in fil_delete_tablespace (id=id@entry=6, buf_remove=buf_remove@entry=BUF_REMOVE_FLUSH_NO_WRITE) at /source/storage/innobase/fil/fil0fil.cc:3078
#5  0x000056159e9b7192 in row_drop_single_table_tablespace (table_flags=33, filepath=0x7f033c033fe0 "./test/t1.ibd", tablename=<optimized out>, space_id=6) at /source/storage/innobase/row/row0mysql.cc:3566
#6  row_drop_table_for_mysql (name=name@entry=0x7f0385b8a4d0 "test/t1", trx=trx@entry=0x7f038673bb28, drop_db=<optimized out>, create_failed=create_failed@entry=0, nonatomic=<optimized out>, nonatomic@entry=true) at /source/storage/innobase/row/row0mysql.cc:4088

I got it for a different table, but that does not matter.
I believe that this is the relevant fix:

@@ -2174,6 +2168,8 @@ DECLARE_THREAD(fil_crypt_thread)(
 				/* If space is marked as stopping, release
 				space and stop rotation. */
 				if (thr.space->is_stopping()) {
+					fil_crypt_complete_rotate_space(
+						&new_state, &thr);
 					fil_space_release(thr.space);
 					thr.space = NULL;
 					break;
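The single-threaded C sketch below models why the one-line fix matters. It assumes, as the backtrace suggests, two things. First, that DROP TABLE (through fil_space_crypt_close_tablespace()) keeps polling until a per-tablespace count of active encryption threads drops to zero. Second, that fil_crypt_complete_rotate_space() is what decrements that count. If the "space is stopping" early exit skips that call, the count never returns to zero and the dropping thread sleeps forever. Every name in the sketch (struct space_model, crypt_thread_step(), wait_for_rotation_to_stop(), and so on) is hypothetical and is not an InnoDB API.

```c
/* Hypothetical model of the MDEV-12467 hang; these are not InnoDB APIs. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for a tablespace's key-rotation bookkeeping. */
struct space_model {
	atomic_int active_threads; /* encryption threads working on the space */
	atomic_bool stopping;      /* set once DROP TABLE has started */
};

static void space_model_init(struct space_model *s)
{
	atomic_init(&s->active_threads, 0);
	atomic_init(&s->stopping, false);
}

/* Stand-in for fil_crypt_complete_rotate_space(): deregister the thread. */
static void complete_rotate(struct space_model *s)
{
	atomic_fetch_sub(&s->active_threads, 1);
}

/* One pass of an encryption thread over a space. With apply_fix == false,
   the early-exit path forgets to deregister, as before the patch. */
static void crypt_thread_step(struct space_model *s, bool apply_fix)
{
	atomic_fetch_add(&s->active_threads, 1); /* start rotating the space */
	if (atomic_load(&s->stopping)) {
		if (apply_fix) {
			complete_rotate(s); /* the call added by the fix */
		}
		return; /* release the space and stop rotation */
	}
	/* ... rotate pages here ... */
	complete_rotate(s);
}

/* Stand-in for the DROP side: poll until no encryption thread is active.
   The backtrace shows os_thread_sleep(20000), i.e. 20 ms per poll, with no
   upper bound; this sketch gives up after max_polls so it cannot hang. */
static bool wait_for_rotation_to_stop(struct space_model *s, int max_polls)
{
	const struct timespec pause = {0, 1000000}; /* 1 ms */

	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&s->active_threads) == 0) {
			return true;
		}
		nanosleep(&pause, NULL);
	}
	return false;
}
```

Suppose the stopping flag is set and crypt_thread_step() runs with apply_fix false. It leaves active_threads at 1, so wait_for_rotation_to_stop() times out. With apply_fix true, the count returns to 0 and the wait succeeds immediately. The real wait loop has no bound, which is consistent with the server eventually being killed by the semaphore watchdog.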

I would also do some other cleanup in the same fix, removing some unnecessary null-pointer checks.

I believe that this could hang in 10.1 as well.

Comment by Marko Mäkelä [ 2017-04-21 ]

The error was introduced in 10.1 a few days after the 10.1.22 release.

Comment by Marko Mäkelä [ 2017-04-21 ]

bb-10.1-marko

Comment by Jan Lindström (Inactive) [ 2017-04-21 ]

OK to push.

Comment by Daniel Black [ 2017-04-23 ]

Thanks, marko and jplindst.

Generated at Thu Feb 08 07:57:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.