Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14290

Mariadb 10.2 RocksDB crash on binlog switch to new file

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.2(EOL), 10.3(EOL)
    • 10.2.12
    • None
    • Debian 9 x64, Mariadb 10.2

    Description

          04 23:04:33 mon-db-st mysqld[1180]: 2017-11-04 23:04:33 139846979504320 [Note] Reading of all Master_info entries succeded
      Nov 04 23:04:33 mon-db-st mysqld[1180]: 2017-11-04 23:04:33 139846979504320 [Note] Added new Master_info '' to hash table
      Nov 04 23:04:33 mon-db-st mysqld[1180]: 2017-11-04 23:04:33 139846979504320 [Note] /usr/sbin/mysqld: ready for connections.
      Nov 04 23:04:33 mon-db-st mysqld[1180]: Version: '10.2.10-MariaDB-10.2.10+maria~stretch-log'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
      Nov 04 23:04:34 mon-db-st systemd[1]: Started MariaDB database server.
      Nov 04 23:05:20 mon-db-st mysqld[1180]: 2017-11-04 23:05:20 139843793712896 [ERROR] Invalid (old?) table or database name '.rocksdb'
      Nov 04 23:05:20 mon-db-st mysqld[1180]: 2017-11-04 23:05:20 139843793712896 [ERROR] Invalid (old?) table or database name '.binlog'
      Nov 04 23:08:49 mon-db-st mysqld[1180]: 171104 23:08:49 [ERROR] mysqld got signal 11 ;
      Nov 04 23:08:49 mon-db-st mysqld[1180]: This could be because you hit a bug. It is also possible that this binary
      Nov 04 23:08:49 mon-db-st mysqld[1180]: or one of the libraries it was linked against is corrupt, improperly built,
      Nov 04 23:08:49 mon-db-st mysqld[1180]: or misconfigured. This error can also be caused by malfunctioning hardware.
      Nov 04 23:08:49 mon-db-st mysqld[1180]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Nov 04 23:08:49 mon-db-st mysqld[1180]: We will try our best to scrape up some info that will hopefully help
      Nov 04 23:08:49 mon-db-st mysqld[1180]: diagnose the problem, but since we have already crashed,
      Nov 04 23:08:49 mon-db-st mysqld[1180]: something is definitely wrong and this may fail.
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Server version: 10.2.10-MariaDB-10.2.10+maria~stretch-log
      Nov 04 23:08:49 mon-db-st mysqld[1180]: key_buffer_size=33554432
      Nov 04 23:08:49 mon-db-st mysqld[1180]: read_buffer_size=1048576
      Nov 04 23:08:49 mon-db-st mysqld[1180]: max_used_connections=6
      Nov 04 23:08:49 mon-db-st mysqld[1180]: max_threads=302
      Nov 04 23:08:49 mon-db-st mysqld[1180]: thread_count=13
      Nov 04 23:08:49 mon-db-st mysqld[1180]: It is possible that mysqld could use up to
      Nov 04 23:08:49 mon-db-st mysqld[1180]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 966889 K  bytes of memory
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Hope that's ok; if not, decrease some variables in the equation.
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Thread pointer: 0x7f2fc00009a8
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Attempting backtrace. You can use the following information to find out
      Nov 04 23:08:49 mon-db-st mysqld[1180]: where mysqld died. If you see no messages after this, something went
      Nov 04 23:08:49 mon-db-st mysqld[1180]: terribly wrong...
      Nov 04 23:08:49 mon-db-st mysqld[1180]: stack_bottom = 0x7f308c1ded18 thread_stack 0x49000
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x56063106bc2e]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /usr/sbin/mysqld(handle_fatal_signal+0x3cd)[0x560630b0a8dd]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f30a947c0c0]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x97)[0x560630bd8e97]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /usr/sbin/mysqld(binlog_background_thread+0x323)[0x560630bd9573]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f30a9472494]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f30a7743aff]
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Trying to get some variables.
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Some pointers may be invalid and cause the dump to abort.
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Query (0x0):
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Connection ID (thread ID): 6
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Status: NOT_KILLED
      Nov 04 23:08:49 mon-db-st mysqld[1180]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
      Nov 04 23:08:49 mon-db-st mysqld[1180]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      Nov 04 23:08:49 mon-db-st mysqld[1180]: information that should help you find out what is causing the crash.
      Nov 04 23:08:50 mon-db-st systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
      

      I just try move some data from table to table,
      src table TokuDB, dst table RocksDB
      When binlog reach maximum size and new binlog must be created, mariadb just crash with above info.

      Thanks

      Attachments

        Issue Links

          Activity

            • MyRocks' implementation of rocksdb_checkpoint_request looks trivial.
            • Putting a breakpoint there makes the crash go away
            psergei Sergei Petrunia added a comment - MyRocks' implementation of rocksdb_checkpoint_request looks trivial. Putting a breakpoint there makes the crash go away

            commit 77872e4519a9ebba2cd2ab4f5794faccf0a1f879
            Author: Elena Stepanova <elenst@montyprogram.com>
            Date:   Mon Nov 27 12:04:51 2017 +0200
             
                Additions to the list of unstable tests for 10.2.11
            

            Steps to reproduce:

            [mysqld]
            binlog-format=row
            log_bin=pslp
            sync_binlog=1
            server-id=12
            

            after server start:

            set global max_binlog_size=1024000;
            CREATE TABLE `tslap2` (
              `pk` int(11) NOT NULL AUTO_INCREMENT,
              `a` varchar(255) DEFAULT NULL,
              `b` varchar(255) DEFAULT NULL,
              `c` varchar(255) DEFAULT NULL,
              `d` varchar(255) DEFAULT NULL,
              PRIMARY KEY (`pk`)
            ) ENGINE=ROCKSDB AUTO_INCREMENT=28175 DEFAULT CHARSET=latin1;
            

            file1.pl:

            #!/usr/bin/perl
             
            for ($i =0; $i < 1000*1000; $i++) { 
              print "insert into tslap2 (a,b,c,d) values (repeat(uuid(),7),repeat(uuid(),7),repeat(uuid(),7),repeat(uuid(),7));\n";
            }
            

            Then

            perl file1.pl | mysql -uroot -A --socket=/tmp/mysql4.sock test
            

            psergei Sergei Petrunia added a comment - commit 77872e4519a9ebba2cd2ab4f5794faccf0a1f879 Author: Elena Stepanova <elenst@montyprogram.com> Date: Mon Nov 27 12:04:51 2017 +0200   Additions to the list of unstable tests for 10.2.11 Steps to reproduce: [mysqld] binlog-format=row log_bin=pslp sync_binlog=1 server-id=12 after server start: set global max_binlog_size=1024000; CREATE TABLE `tslap2` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `a` varchar(255) DEFAULT NULL, `b` varchar(255) DEFAULT NULL, `c` varchar(255) DEFAULT NULL, `d` varchar(255) DEFAULT NULL, PRIMARY KEY (`pk`) ) ENGINE=ROCKSDB AUTO_INCREMENT=28175 DEFAULT CHARSET=latin1; file1.pl: #!/usr/bin/perl   for ($i =0; $i < 1000*1000; $i++) { print "insert into tslap2 (a,b,c,d) values (repeat(uuid(),7),repeat(uuid(),7),repeat(uuid(),7),repeat(uuid(),7));\n"; } Then perl file1.pl | mysql -uroot -A --socket=/tmp/mysql4.sock test

            Reproducible on 1d526f31fbb417ecb3e17b1c965050ae18fe3997, too

            psergei Sergei Petrunia added a comment - Reproducible on 1d526f31fbb417ecb3e17b1c965050ae18fe3997, too

            On

            commit 2c1e4d4d7a174c180cfcac5e245840aec8458b77
            Author: Sergei Petrunia <psergey@askmonty.org>
            Date:   Tue Dec 5 16:33:38 2017 +0300
             
                MDEV-14563: Wrong query plan for query with no PK
            

            The failure is reproducible on the tree. But even a patch like this:

            diff --git a/sql/log.cc b/sql/log.cc
            index 93d7a00..52e24b2 100644
            --- a/sql/log.cc
            +++ b/sql/log.cc
            @@ -9535,6 +9535,8 @@ TC_LOG_BINLOG::mark_xids_active(ulong binlog_id, uint xid_count)
               {
                 if (b->binlog_id == binlog_id)
                 {
            +      fprintf(stderr,"TC_LOG_BINLOG::mark_xids_active binlog_id=%ld,  %ld + %ld\n",
            +              binlog_id, b->xid_count, xid_count);
                   b->xid_count += xid_count;
                   break;
                 }
            @@ -9575,7 +9577,9 @@ TC_LOG_BINLOG::mark_xid_done(ulong binlog_id, bool write_checkpoint)
               {
                 if (b->binlog_id == binlog_id)
                 {
            +      //fprintf(stderr,"TC_LOG_BINLOG::mark_xids_active %d %d\n", binlog_id, xid_count);
                   --b->xid_count;
            +      fprintf(stderr, "TC_LOG_BINLOG::mark_xid_done (x-1= %ld)\n", b->xid_count);
             
                   DBUG_ASSERT(b->xid_count >= 0); // catch unmatched (++) decrement
             
            
            

            makes it go away. It seems to be a race condition, and I still have no idea about its cause

            psergei Sergei Petrunia added a comment - On commit 2c1e4d4d7a174c180cfcac5e245840aec8458b77 Author: Sergei Petrunia <psergey@askmonty.org> Date: Tue Dec 5 16:33:38 2017 +0300   MDEV-14563: Wrong query plan for query with no PK The failure is reproducible on the tree. But even a patch like this: diff --git a/sql/log.cc b/sql/log.cc index 93d7a00..52e24b2 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -9535,6 +9535,8 @@ TC_LOG_BINLOG::mark_xids_active(ulong binlog_id, uint xid_count) { if (b->binlog_id == binlog_id) { + fprintf(stderr,"TC_LOG_BINLOG::mark_xids_active binlog_id=%ld, %ld + %ld\n", + binlog_id, b->xid_count, xid_count); b->xid_count += xid_count; break; } @@ -9575,7 +9577,9 @@ TC_LOG_BINLOG::mark_xid_done(ulong binlog_id, bool write_checkpoint) { if (b->binlog_id == binlog_id) { + //fprintf(stderr,"TC_LOG_BINLOG::mark_xids_active %d %d\n", binlog_id, xid_count); --b->xid_count; + fprintf(stderr, "TC_LOG_BINLOG::mark_xid_done (x-1= %ld)\n", b->xid_count); DBUG_ASSERT(b->xid_count >= 0); // catch unmatched (++) decrement makes it go away. It seems to be a race condition, and I still have no idea about its cause

            Didn't find any issues in Andrei's patch. Pushed it to 10.2

            psergei Sergei Petrunia added a comment - Didn't find any issues in Andrei's patch. Pushed it to 10.2

            People

              psergei Sergei Petrunia
              Nefelim4ag Timofey
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.