[MDEV-20399] SEGV in fil_system_t::keyrotate_next , InnoDB, table encrypted Created: 2019-08-21  Updated: 2022-02-10  Resolved: 2020-08-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.35, 10.3.26, 10.4.16, 10.5.7

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: need_rr, not-10.1

Attachments: File MDEV-20399.cfg     File MDEV-20399.prt     File MDEV-20399.sh     File MDEV-20399.yy    
Issue Links:
Blocks
is blocked by MDEV-16115 encryption.create_or_replace fails in... Closed
Relates
relates to MDEV-23447 SIGSEGV in fil_system_t::keyrotate_next Closed
relates to MDEV-17596 Assertion `block->page.flush_observer... Closed

 Description   

Problem found during RQG testing. Valid for debug but also non debug builds.
.... 
2019-08-21 21:59:48 139863730599424 [Note] 10.2/bld_fast//sql/mysqld: ready for connections.
Version: '10.2.27-MariaDB-log'  socket:  ... port: 29020  Source distribution
190821 21:59:59 [ERROR] mysqld got signal 11 ;
...
    There is no query mentioned at all.
Thread 1 (Thread 0x7f342a7fc700 (LWP 265625)):
#0  my_read (Filedes=Filedes@entry=90, Buffer=Buffer@entry=0x7f342a7f9930 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax data size", ' ' <repeats 13 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax stack size", ' ' <repeats 12 times>, "8388608", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax core file size        unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax resident set          unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax processes", ' ' <repeats 13 times>, "768201", ' ' <repeats 15 times>, "768201", ' ' <repeats 15 times>, "processes \nMax open files", ' ' <repeats 12 times>, "40960", ' ' <repeats 16 times>, "1048576", ' ' <repeats 14 times>, "files     \nMax locked memory         65536", ' ' <repeats 16 times>, "65536", ' ' <repeats 16 times>, "bytes     \nMax address space         unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax file locks", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "locks     \nMax pending signals       768201", ' ' <repeats 15 times>, "768201", ' ' <repeats 15 times>, "signals   \nMax msgqueue size         819200", ' ' <repeats 15 times>, "819200", ' ' <repeats 15 times>, "bytes     \nMax nice priority         0", ' ' <repeats 20 times>, "0", ' ' <repeats 20 times>, "\nMax realtime priority     0", ' ' <repeats 20 times>, "0", ' ' <repeats 20 times>, "\nMax realtime timeout      unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "us        \n", Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at mysys/my_read.c:63
#1  0x0000557a40400584 in output_core_info () at sql/signal_handler.cc:66
#2  0x0000557a404008f3 in handle_fatal_signal (sig=11) at sql/signal_handler.cc:339
#3  <signal handler called>
#4  fil_system_t::keyrotate_next (this=<optimized out>, prev_space=0x0, recheck=recheck@entry=true, key_version=key_version@entry=0) at storage/innobase/fil/fil0fil.cc:6025
#5  0x0000557a4075b694 in fil_crypt_find_space_to_rotate (recheck=<optimized out>, state=<optimized out>, key_state=<optimized out>) at storage/innobase/fil/fil0crypt.cc:1472
#6  fil_crypt_thread () at storage/innobase/fil/fil0crypt.cc:2183
#7  0x00007f348e8b06da in start_thread (arg=0x7f342a7fc700) at pthread_create.c:456
#8  0x00007f348dd42d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
 
MDEV-20399 -- The protocol of my RQG run against 10.2 commit 6c06defb5f6ddccba1a3d7b03ba34dbb83c7ef69 2019-08-19 compiled without debug.
MDEV-20399.yy -- Simplified RQG grammar
MDEV-20399.cfg - Config file for replaying the problem with rqg_batch.pl
MDEV-20399.sh - Shellscript doing the replay
      The call is ./MDEV-20399.sh MDEV-20399.cfg <Binaries>
 RQG "expects" to find the required plugins like file_key_management.so in <Binaries>/lib/plugin/
The test is capable to replay also
     https://jira.mariadb.org/browse/MDEV-17596
     ... Assertion `block->page.flush_observer == __null || block->page.flush_observer == observer' failed in buf_flush_note_modification ... 
It replays MDEV-17596 far way more frequent than MDEV-20399.



 Comments   
Comment by Roel Van de Paar [ 2020-06-02 ]

SET GLOBAL innodb_encryption_threads=5;
SET GLOBAL innodb_encryption_rotate_key_age=0;
SELECT SLEEP(5);  # Somewhat delayed crash happens during sleep

Leads to:

10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24

Core was generated by `/test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, 
    Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited       "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0)
    at /test/10.5_dbg/mysys/my_read.c:63
[Current thread is 1 (Thread 0x152de8ffd700 (LWP 3360052))]
(gdb) bt
(gdb) (gdb) #0  0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited       "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at /test/10.5_dbg/mysys/my_read.c:63
#1  0x0000560b6d73fdd0 in output_core_info () at /test/10.5_dbg/sql/signal_handler.cc:68
#2  0x0000560b6d740367 in handle_fatal_signal (sig=11) at /test/10.5_dbg/sql/signal_handler.cc:326
#3  <signal handler called>
#4  fil_system_t::keyrotate_next (this=<optimized out>, prev_space=0x152e0a8c6898, recheck=recheck@entry=false, key_version=key_version@entry=0) at /test/10.5_dbg/storage/innobase/fil/fil0fil.cc:4674
#5  0x0000560b6de828f2 in fil_crypt_find_space_to_rotate (recheck=<optimized out>, state=<optimized out>, key_state=<optimized out>) at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:1517
#6  fil_crypt_thread () at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:2051
#7  0x0000152e0c0666db in start_thread (arg=0x152de8ffd700) at pthread_create.c:463
#8  0x0000152e0b46488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.2.33 (dbg), 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.5.4 (dbg)

Bug confirmed not present in:
MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (opt), 10.4.14 (opt), 10.5.4 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

Comment by Marko Mäkelä [ 2020-08-08 ]

I think that in order to fix MDEV-16115, we should get rid of multiple encryption threads that mark pages dirty in the buffer pool. Maybe this problem would be fixed by that too.

Comment by Marko Mäkelä [ 2020-08-10 ]

In MDEV-22258 it was demonstrated that a larger value of innodb_encryption_threads may improve throughput. So, the fix of MDEV-16115 cannot limit the number of threads to 1.

Roel, mleich, is the SIGSEGV still repeatable with the MDEV-16115 fix? The SIGSEGV in the Description ought to have been fixed by MDEV-16115, because after that fix, we only execute that code after checking that the opinter is not null:

	const bool remove = ((!recheck || prev_space->crypt_data)
			     && (!key_version == !srv_encrypt_tables));

I suppose that Roel’s crash might be repeatable, because that SIGSEGV occurred in

    while (it != fil_system.rotation_list.end() &&
           (UT_LIST_GET_LEN(it->chain) == 0 || it->is_stopping()))
      ++it;

apparently when dereferencing it.

So far, I have pushed the MDEV-16115 fix to 10.1, 10.2, 10.3.

Comment by Marko Mäkelä [ 2020-08-10 ]

I have merged the MDEV-16115 fix to 10.4 and 10.5 as well.

Comment by Roel Van de Paar [ 2020-08-11 ]

Testcase with SLEEP in comment 2020-06-02 is still repeatable.

I will log a different bug for this as discussed on Slack. Logged as MDEV-23447

Comment by Marko Mäkelä [ 2020-08-12 ]

I believe that both causes of this crash were fixed in MDEV-16115 and MDEV-23447.

Generated at Thu Feb 08 08:59:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.