Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12396

MariaDB Server SEGV crash when importing tablespace for a large table (4GB)

Details

    Description

      I'm trying to import a database by bringing over the .ibd files from another server (which were saved after tables were flushed and locked), running the "alter table X discard tablespace" for each table and then copying the file over and running "alter table X import tablespace". I disable foreign key checks prior to importing each table.

      the sequence of import statements works correctly for most tables. When it reaches a table that is about 4GB large, but with large records, I get a SEGV crash. This happens consistently for the same table. Weird thing is that a 6GB table (but "thin") works OK.

      Error in log is:

      2017-03-29 9:25:56 139871748651776 [Note] InnoDB: Sync to disk
      2017-03-29 9:26:00 139871748651776 [Note] InnoDB: Sync to disk - done!
      2017-03-29 9:26:00 139871748651776 [Note] InnoDB: Phase I - Update all pages
      2017-03-29 9:27:18 139871748651776 [Note] InnoDB: Sync to disk
      2017-03-29 9:27:21 139871748651776 [Note] InnoDB: Sync to disk - done!
      2017-03-29 9:27:22 139871748651776 [Note] InnoDB: Phase III - Flush changes to disk
      2017-03-29 9:27:22 139871748651776 [Note] InnoDB: Phase IV - Flush complete
      170329 9:27:22 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.1.22-MariaDB
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=1
      max_threads=502
      thread_count=1
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1233714 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f3622a42008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f366de47110 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f366e984fbb]
      /usr/sbin/mysqld(handle_fatal_signal+0x4d5)[0x7f366e4dfbe5]
      /lib64/libpthread.so.0(+0xf7e0)[0x7f366dae17e0]
      /usr/sbin/mysqld(+0x7a4fd0)[0x7f366e6b5fd0]
      /usr/sbin/mysqld(+0x7fa283)[0x7f366e70b283]
      /usr/sbin/mysqld(+0x72c2cd)[0x7f366e63d2cd]
      /usr/sbin/mysqld(+0x73327d)[0x7f366e64427d]
      /usr/sbin/mysqld(_Z34mysql_discard_or_import_tablespaceP3THDP10TABLE_LISTb+0xe8)[0x7f366e3ef2f8]
      /usr/sbin/mysqld(_ZN33Sql_cmd_discard_import_tablespace7executeEP3THD+0xc8)[0x7f366e436d88]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x152d)[0x7f366e36b03d]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x324)[0x7f366e372934]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x22a3)[0x7f366e3753b3]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x151)[0x7f366e375931]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x17f)[0x7f366e4348bf]
      /usr/sbin/mysqld(handle_one_connection+0x47)[0x7f366e4349f7]
      /usr/sbin/mysqld(+0x960c8d)[0x7f366e871c8d]
      /lib64/libpthread.so.0(+0x7aa1)[0x7f366dad9aa1]
      /lib64/libc.so.6(clone+0x6d)[0x7f366bfbdaad]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f364a42c020): is an invalid pointer
      Connection ID (thread ID): 230
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.

      We think the query pointer is invalid, but we will try to print it anyway.
      Query: ALTER TABLE RTODATA_T IMPORT TABLESPACE

      170329 09:27:22 mysqld_safe Number of processes running now: 0
      170329 09:27:22 mysqld_safe mysqld restarted

      From the output it would seem like the calculation of memory usage says that MariaDB could use up to a little bit more than 1GB of memory, which should be OK since this server has 8GB of memory.

      Am I doing something wrong? I tried searching for this kind of crash but couldn't find much.

      Attachments

        Issue Links

          Activity

            This report could be a duplicate of MDEV-12419 which was fixed as part of MDEV-12253.
            Does the large import work with MariaDB 10.1.23?

            marko Marko Mäkelä added a comment - This report could be a duplicate of MDEV-12419 which was fixed as part of MDEV-12253 . Does the large import work with MariaDB 10.1.23?

            On a somewhat related note, Valgrind reported an error for a memcpy() operation that is executed when importing encrypted files:
            http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10056/steps/test/logs/stdio

            encryption.innodb-discard-import-change 'cbc,innodb' w2 [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2017-05-17 07:42:11
            line
            ==31112== Thread 27:
            ==31112== Source and destination overlap in memcpy(0x11bb8000, 0x11bb8000, 16384)
            ==31112==    at 0x4C2C51C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
            ==31112==    by 0x10048A5: fil_iterate(fil_iterator_t const&, buf_block_t*, PageCallback&) (fil0fil.cc:6047)
            ==31112==    by 0x1005392: fil_tablespace_iterate(dict_table_t*, unsigned long, PageCallback&) (fil0fil.cc:6283)
            ==31112==    by 0xE03456: row_import_for_mysql(dict_table_t*, row_prebuilt_t*) (row0import.cc:3496)
            

            I do not think that the buffers should legitimately overlap in any case here:

            			if (encrypted) {
            				memcpy(writeptr + (i * size),
            					row_compressed ? block->page.zip.data :
            					block->frame, size);
            			}
            

            I was unable to repeat this failure locally.
            The arithmetics done to n_bytes in fil_iterate() look a bit suspicious to me. Even more alarming is this code:

            		byte*		io_buffer = iter.io_buffer;
            		block->frame = io_buffer;
             
            		if (row_compressed) {
            …
            			block->page.zip.data = block->frame + UNIV_PAGE_SIZE;
            …
            			io_buffer = block->page.zip.data;
            		}
            

            Could block->page.zip.data point past the end of the iter.io_buffer array, overlapping with writeptr (which points to iter.crypt_io_buffer)?

            marko Marko Mäkelä added a comment - On a somewhat related note, Valgrind reported an error for a memcpy() operation that is executed when importing encrypted files: http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10056/steps/test/logs/stdio encryption.innodb-discard-import-change 'cbc,innodb' w2 [ fail ] Found warnings/errors in server log file! Test ended at 2017-05-17 07:42:11 line ==31112== Thread 27: ==31112== Source and destination overlap in memcpy(0x11bb8000, 0x11bb8000, 16384) ==31112== at 0x4C2C51C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018) ==31112== by 0x10048A5: fil_iterate(fil_iterator_t const&, buf_block_t*, PageCallback&) (fil0fil.cc:6047) ==31112== by 0x1005392: fil_tablespace_iterate(dict_table_t*, unsigned long, PageCallback&) (fil0fil.cc:6283) ==31112== by 0xE03456: row_import_for_mysql(dict_table_t*, row_prebuilt_t*) (row0import.cc:3496) I do not think that the buffers should legitimately overlap in any case here: if (encrypted) { memcpy(writeptr + (i * size), row_compressed ? block->page.zip.data : block->frame, size); } I was unable to repeat this failure locally. The arithmetics done to n_bytes in fil_iterate() look a bit suspicious to me. Even more alarming is this code: byte* io_buffer = iter.io_buffer; block->frame = io_buffer;   if (row_compressed) { … block->page.zip.data = block->frame + UNIV_PAGE_SIZE; … io_buffer = block->page.zip.data; } Could block->page.zip.data point past the end of the iter.io_buffer array, overlapping with writeptr (which points to iter.crypt_io_buffer)?
            marko Marko Mäkelä added a comment - - edited

            I pushed some clean-up to 10.1 so that page checksums will be validated in one function.
            I did not find anything that could explain this crash.

            marko Marko Mäkelä added a comment - - edited I pushed some clean-up to 10.1 so that page checksums will be validated in one function. I did not find anything that could explain this crash.

            I merged the clean-up to 10.2, so it will be in both 10.1.32 and 10.2.14 (and 10.3.6). The main functional difference should be that IMPORT no longer fills the error log with an unbounded number of messages about retrying partial page reads when you try to import an .ibd file that is shorter than innodb_page_size. For 10.2, there is also the difference that IMPORT will now continue to read and write multiple pages at a time, like it used to work. page_compressed tables will grow during import; it looks like hole-punching has not been implemented yet.

            The originally reported error ought to occur somewhere in the final page flushing after doing some final adjustments through the buffer pool.

            rgallastpsi, I wonder if you have enabled innodb_use_mtflush, which is known to cause occasional hangs (MDEV-10843); maybe it could cause crashes as well?
            In MariaDB 10.2, an alternative to innodb_use_mtflush was provided by innodb_page_cleaners, and accordingly, innodb_use_mtflush was deprecated in 10.2 (MDEV-13674) and removed in 10.3 (MDEV-13690).

            marko Marko Mäkelä added a comment - I merged the clean-up to 10.2, so it will be in both 10.1.32 and 10.2.14 (and 10.3.6). The main functional difference should be that IMPORT no longer fills the error log with an unbounded number of messages about retrying partial page reads when you try to import an .ibd file that is shorter than innodb_page_size . For 10.2, there is also the difference that IMPORT will now continue to read and write multiple pages at a time, like it used to work. page_compressed tables will grow during import; it looks like hole-punching has not been implemented yet. The originally reported error ought to occur somewhere in the final page flushing after doing some final adjustments through the buffer pool. rgallastpsi , I wonder if you have enabled innodb_use_mtflush , which is known to cause occasional hangs ( MDEV-10843 ); maybe it could cause crashes as well? In MariaDB 10.2, an alternative to innodb_use_mtflush was provided by innodb_page_cleaners , and accordingly, innodb_use_mtflush was deprecated in 10.2 ( MDEV-13674 ) and removed in 10.3 ( MDEV-13690 ).

            rgallastpsi, I just noticed that this ticket is still open.
            We made significant improvements to InnoDB table encryption in MariaDB, especially in 10.1.23,10.2.6 but also in later. Do you still experience this kind of problems with more recent versions?

            marko Marko Mäkelä added a comment - rgallastpsi , I just noticed that this ticket is still open. We made significant improvements to InnoDB table encryption in MariaDB, especially in 10.1.23,10.2.6 but also in later. Do you still experience this kind of problems with more recent versions?

            People

              marko Marko Mäkelä
              rgallastpsi Rafael Gallastegui
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.