Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20399

SEGV in fil_system_t::keyrotate_next , InnoDB, table encrypted

Details

    Description

      Problem found during RQG testing. Valid for debug but also non debug builds.
      .... 
      2019-08-21 21:59:48 139863730599424 [Note] 10.2/bld_fast//sql/mysqld: ready for connections.
      Version: '10.2.27-MariaDB-log'  socket:  ... port: 29020  Source distribution
      190821 21:59:59 [ERROR] mysqld got signal 11 ;
      ...
          There is no query mentioned at all.
      Thread 1 (Thread 0x7f342a7fc700 (LWP 265625)):
      #0  my_read (Filedes=Filedes@entry=90, Buffer=Buffer@entry=0x7f342a7f9930 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax data size", ' ' <repeats 13 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax stack size", ' ' <repeats 12 times>, "8388608", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax core file size        unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax resident set          unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax processes", ' ' <repeats 13 times>, "768201", ' ' <repeats 15 times>, "768201", ' ' <repeats 15 times>, "processes \nMax open files", ' ' <repeats 12 times>, "40960", ' ' <repeats 16 times>, "1048576", ' ' <repeats 14 times>, "files     \nMax locked memory         65536", ' ' <repeats 16 times>, "65536", ' ' <repeats 16 times>, "bytes     \nMax address space         unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "bytes     \nMax file locks", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "locks     \nMax pending signals       768201", ' ' <repeats 15 times>, "768201", ' ' <repeats 15 times>, "signals   \nMax msgqueue size         819200", ' ' <repeats 15 times>, "819200", ' ' <repeats 15 times>, "bytes     \nMax nice priority         0", ' ' <repeats 20 times>, "0", ' ' <repeats 20 times>, "\nMax realtime priority     0", ' ' <repeats 20 times>, "0", ' ' <repeats 20 times>, "\nMax realtime timeout      unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "us        \n", Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at mysys/my_read.c:63
      #1  0x0000557a40400584 in output_core_info () at sql/signal_handler.cc:66
      #2  0x0000557a404008f3 in handle_fatal_signal (sig=11) at sql/signal_handler.cc:339
      #3  <signal handler called>
      #4  fil_system_t::keyrotate_next (this=<optimized out>, prev_space=0x0, recheck=recheck@entry=true, key_version=key_version@entry=0) at storage/innobase/fil/fil0fil.cc:6025
      #5  0x0000557a4075b694 in fil_crypt_find_space_to_rotate (recheck=<optimized out>, state=<optimized out>, key_state=<optimized out>) at storage/innobase/fil/fil0crypt.cc:1472
      #6  fil_crypt_thread () at storage/innobase/fil/fil0crypt.cc:2183
      #7  0x00007f348e8b06da in start_thread (arg=0x7f342a7fc700) at pthread_create.c:456
      #8  0x00007f348dd42d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
       
      MDEV-20399 -- The protocol of my RQG run against 10.2 commit 6c06defb5f6ddccba1a3d7b03ba34dbb83c7ef69 2019-08-19 compiled without debug.
      MDEV-20399.yy -- Simplified RQG grammar
      MDEV-20399.cfg - Config file for replaying the problem with rqg_batch.pl
      MDEV-20399.sh - Shellscript doing the replay
            The call is ./MDEV-20399.sh MDEV-20399.cfg <Binaries>
       RQG "expects" to find the required plugins like file_key_management.so in <Binaries>/lib/plugin/
      The test is capable to replay also
           https://jira.mariadb.org/browse/MDEV-17596
           ... Assertion `block->page.flush_observer == __null || block->page.flush_observer == observer' failed in buf_flush_note_modification ... 
      It replays MDEV-17596 far way more frequent than MDEV-20399.
      

      Attachments

        1. MDEV-20399.cfg
          39 kB
        2. MDEV-20399.prt
          135 kB
        3. MDEV-20399.sh
          7 kB
        4. MDEV-20399.yy
          0.5 kB

        Issue Links

          Activity

            SET GLOBAL innodb_encryption_threads=5;
            SET GLOBAL innodb_encryption_rotate_key_age=0;
            SELECT SLEEP(5);  # Somewhat delayed crash happens during sleep
            

            Leads to:

            10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24

            Core was generated by `/test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
            Program terminated with signal SIGSEGV, Segmentation fault.
            #0  0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, 
                Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited       "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0)
                at /test/10.5_dbg/mysys/my_read.c:63
            [Current thread is 1 (Thread 0x152de8ffd700 (LWP 3360052))]
            (gdb) bt
            (gdb) (gdb) #0  0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units     \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds   \nMax file size", ' ' <repeats 13 times>, "unlimited       "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at /test/10.5_dbg/mysys/my_read.c:63
            #1  0x0000560b6d73fdd0 in output_core_info () at /test/10.5_dbg/sql/signal_handler.cc:68
            #2  0x0000560b6d740367 in handle_fatal_signal (sig=11) at /test/10.5_dbg/sql/signal_handler.cc:326
            #3  <signal handler called>
            #4  fil_system_t::keyrotate_next (this=<optimized out>, prev_space=0x152e0a8c6898, recheck=recheck@entry=false, key_version=key_version@entry=0) at /test/10.5_dbg/storage/innobase/fil/fil0fil.cc:4674
            #5  0x0000560b6de828f2 in fil_crypt_find_space_to_rotate (recheck=<optimized out>, state=<optimized out>, key_state=<optimized out>) at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:1517
            #6  fil_crypt_thread () at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:2051
            #7  0x0000152e0c0666db in start_thread (arg=0x152de8ffd700) at pthread_create.c:463
            #8  0x0000152e0b46488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            Bug confirmed present in:
            MariaDB: 10.2.33 (dbg), 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.5.4 (dbg)

            Bug confirmed not present in:
            MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (opt), 10.4.14 (opt), 10.5.4 (opt)
            MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

            Roel Roel Van de Paar added a comment - SET GLOBAL innodb_encryption_threads=5; SET GLOBAL innodb_encryption_rotate_key_age=0; SELECT SLEEP(5); # Somewhat delayed crash happens during sleep Leads to: 10.5.4 8569dac1ec9f6853a0b2f3ea9bcbda67644ead24 Core was generated by `/test/MD260520-mariadb-10.5.4-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at /test/10.5_dbg/mysys/my_read.c:63 [Current thread is 1 (Thread 0x152de8ffd700 (LWP 3360052))] (gdb) bt (gdb) (gdb) #0 0x0000560b6df9754c in my_read (Filedes=Filedes@entry=39, Buffer=Buffer@entry=0x152de8ffb230 "Limit", ' ' <repeats 21 times>, "Soft Limit", ' ' <repeats 11 times>, "Hard Limit", ' ' <repeats 11 times>, "Units \nMax cpu time", ' ' <repeats 14 times>, "unlimited", ' ' <repeats 12 times>, "unlimited", ' ' <repeats 12 times>, "seconds \nMax file size", ' ' <repeats 13 times>, "unlimited "..., Count=Count@entry=4096, MyFlags=MyFlags@entry=0) at /test/10.5_dbg/mysys/my_read.c:63 #1 0x0000560b6d73fdd0 in output_core_info () at /test/10.5_dbg/sql/signal_handler.cc:68 #2 0x0000560b6d740367 in handle_fatal_signal (sig=11) at /test/10.5_dbg/sql/signal_handler.cc:326 #3 <signal handler called> #4 fil_system_t::keyrotate_next (this=<optimized out>, prev_space=0x152e0a8c6898, recheck=recheck@entry=false, key_version=key_version@entry=0) at /test/10.5_dbg/storage/innobase/fil/fil0fil.cc:4674 #5 0x0000560b6de828f2 in fil_crypt_find_space_to_rotate (recheck=<optimized out>, state=<optimized out>, key_state=<optimized out>) at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:1517 #6 fil_crypt_thread () at /test/10.5_dbg/storage/innobase/fil/fil0crypt.cc:2051 #7 0x0000152e0c0666db in start_thread (arg=0x152de8ffd700) at pthread_create.c:463 #8 0x0000152e0b46488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Bug confirmed present in: MariaDB: 10.2.33 (dbg), 10.3.24 (dbg), 10.3.24 (opt), 10.4.14 (dbg), 10.5.4 (dbg) Bug confirmed not present in: MariaDB: 10.1.46 (dbg), 10.1.46 (opt), 10.2.33 (opt), 10.4.14 (opt), 10.5.4 (opt) MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

            I think that in order to fix MDEV-16115, we should get rid of multiple encryption threads that mark pages dirty in the buffer pool. Maybe this problem would be fixed by that too.

            marko Marko Mäkelä added a comment - I think that in order to fix MDEV-16115 , we should get rid of multiple encryption threads that mark pages dirty in the buffer pool. Maybe this problem would be fixed by that too.

            In MDEV-22258 it was demonstrated that a larger value of innodb_encryption_threads may improve throughput. So, the fix of MDEV-16115 cannot limit the number of threads to 1.

            Roel, mleich, is the SIGSEGV still repeatable with the MDEV-16115 fix? The SIGSEGV in the Description ought to have been fixed by MDEV-16115, because after that fix, we only execute that code after checking that the opinter is not null:

            	const bool remove = ((!recheck || prev_space->crypt_data)
            			     && (!key_version == !srv_encrypt_tables));
            

            I suppose that Roel’s crash might be repeatable, because that SIGSEGV occurred in

                while (it != fil_system.rotation_list.end() &&
                       (UT_LIST_GET_LEN(it->chain) == 0 || it->is_stopping()))
                  ++it;
            

            apparently when dereferencing it.

            So far, I have pushed the MDEV-16115 fix to 10.1, 10.2, 10.3.

            marko Marko Mäkelä added a comment - In MDEV-22258 it was demonstrated that a larger value of innodb_encryption_threads may improve throughput. So, the fix of MDEV-16115 cannot limit the number of threads to 1. Roel , mleich , is the SIGSEGV still repeatable with the MDEV-16115 fix? The SIGSEGV in the Description ought to have been fixed by MDEV-16115 , because after that fix, we only execute that code after checking that the opinter is not null: const bool remove = ((!recheck || prev_space->crypt_data) && (!key_version == !srv_encrypt_tables)); I suppose that Roel ’s crash might be repeatable, because that SIGSEGV occurred in while (it != fil_system.rotation_list.end() && (UT_LIST_GET_LEN(it->chain) == 0 || it->is_stopping())) ++it; apparently when dereferencing it . So far, I have pushed the MDEV-16115 fix to 10.1, 10.2, 10.3.

            I have merged the MDEV-16115 fix to 10.4 and 10.5 as well.

            marko Marko Mäkelä added a comment - I have merged the MDEV-16115 fix to 10.4 and 10.5 as well.
            Roel Roel Van de Paar added a comment - - edited

            Testcase with SLEEP in comment 2020-06-02 is still repeatable.

            I will log a different bug for this as discussed on Slack. Logged as MDEV-23447

            Roel Roel Van de Paar added a comment - - edited Testcase with SLEEP in comment 2020-06-02 is still repeatable. I will log a different bug for this as discussed on Slack. Logged as MDEV-23447

            I believe that both causes of this crash were fixed in MDEV-16115 and MDEV-23447.

            marko Marko Mäkelä added a comment - I believe that both causes of this crash were fixed in MDEV-16115 and MDEV-23447 .

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.