Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16575

rocksdb.bulk_load_errors fails in buildbot with wrong result

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/8855

      rocksdb.bulk_load_errors                 w2 [ fail ]
              Test ended at 2018-06-24 23:19:29
       
      CURRENT_TEST: rocksdb.bulk_load_errors
      --- D:/winx64-packages/build/src/storage/rocksdb/mysql-test/rocksdb/r/bulk_load_errors.result	2018-06-24 22:01:26.000000000 +0000
      +++ D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\r\bulk_load_errors.reject	2018-06-24 23:19:29.697502800 +0000
      @@ -37,7 +37,7 @@
       pk
       10
       11
      -FOUND 1 /RocksDB: Error [0-9]+ finalizing last SST file while disconnecting/ in rocksdb.bulk_load_errors.2.err
      +NOT FOUND /RocksDB: Error [0-9]+ finalizing last SST file while disconnecting/ in rocksdb.bulk_load_errors.2.err
       TRUNCATE TABLE t1;
       SET rocksdb_bulk_load_allow_unsorted=1;
       SET rocksdb_bulk_load=1;
       
      mysqltest: Result length mismatch
      

      Attachments

        Activity

          Investigation: this should be "Found error 198". The cause of the error is that the datafiles (or is it the MemTable?) still have the data (and/or tombstones) with the same keys as the ones being loaded.

          The upstream has this error.

          psergei Sergei Petrunia added a comment - Investigation: this should be "Found error 198". The cause of the error is that the datafiles (or is it the MemTable?) still have the data (and/or tombstones) with the same keys as the ones being loaded. The upstream has this error.

          Looking at the failing test:

          # Make sure we get an error in log when we disconnect and do not assert the server
          --connect (con1,localhost,root,,)
          SET rocksdb_bulk_load=1;
          INSERT INTO t1 VALUES(1);
          INSERT INTO t1 VALUES(2);
          INSERT INTO t1 VALUES(20);
          INSERT INTO t1 VALUES(21);
          --connection default
          --disconnect con1
           
          SELECT * FROM t1;
           
          --source include/wait_until_count_sessions.inc
           
          --let SEARCH_FILE=$LOG2
          --let SEARCH_PATTERN=RocksDB: Error [0-9]+ finalizing last SST file while disconnecting
          --source include/search_pattern_in_file.inc
          

          dosconnect con1 will cause myrocks::rocksdb_close_connection to be called, which will try to finish the bulk load and hit the error.

          How can we be sure that search_pattern_in_file.inc is called AFTER the bulk load mode cleanup is finished? It seems the author of the testcase tried to achieve that by calling wait_until_count_sessions.inc.

          The problem is that it doesn't help in MariaDB. I have modified the code in myrocks::rocksdb_close_connection to loop forever, and then I can see that the disconnecting thread is stuck here:

            
            Breakpoint 2, myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168
          (gdb) wher
            #0  myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168
            #1  0x0000555555d8099f in closecon_handlerton (thd=0x7fff64000b00, plugin=0x7fff64007de0, unused=0x0) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:783
            #2  0x0000555555b26141 in plugin_foreach_with_mask (thd=0x7fff64000b00, func=0x555555d80924 <closecon_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0) at /home/psergey/dev-git/10.2-r11/sql/sql_plugin.cc:2398
            #3  0x0000555555d809ed in ha_close_connection (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:798
            #4  0x0000555555abf5b3 in THD::free_connection (this=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/sql_class.cc:1432
            #5  0x0000555555a27896 in unlink_thd (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:2916
            #6  0x0000555555a27d53 in one_thread_per_connection_end (thd=0x7fff64000b00, put_in_cache=true) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:3061
            #7  0x0000555555c55cf1 in do_handle_one_connection (connect=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1354
            #8  0x0000555555c55995 in handle_one_connection (arg=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1241
            #9  0x00007ffff64fa6aa in start_thread (arg=0x7ffff40f7700) at pthread_create.c:333
            #10 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
          (gdb) break "/home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3174"
          

          while wait_for_count_sessions.inc would not have succeeded: Threads_connected status variable shows 1, I_S.processlist doesn't include the thread being disconnected, etc, etc.

          psergei Sergei Petrunia added a comment - Looking at the failing test: # Make sure we get an error in log when we disconnect and do not assert the server --connect (con1,localhost,root,,) SET rocksdb_bulk_load=1; INSERT INTO t1 VALUES(1); INSERT INTO t1 VALUES(2); INSERT INTO t1 VALUES(20); INSERT INTO t1 VALUES(21); --connection default --disconnect con1   SELECT * FROM t1;   --source include/wait_until_count_sessions.inc   --let SEARCH_FILE=$LOG2 --let SEARCH_PATTERN=RocksDB: Error [0-9]+ finalizing last SST file while disconnecting --source include/search_pattern_in_file.inc dosconnect con1 will cause myrocks::rocksdb_close_connection to be called, which will try to finish the bulk load and hit the error. How can we be sure that search_pattern_in_file.inc is called AFTER the bulk load mode cleanup is finished? It seems the author of the testcase tried to achieve that by calling wait_until_count_sessions.inc . The problem is that it doesn't help in MariaDB. I have modified the code in myrocks::rocksdb_close_connection to loop forever, and then I can see that the disconnecting thread is stuck here: Breakpoint 2, myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168 (gdb) wher #0 myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168 #1 0x0000555555d8099f in closecon_handlerton (thd=0x7fff64000b00, plugin=0x7fff64007de0, unused=0x0) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:783 #2 0x0000555555b26141 in plugin_foreach_with_mask (thd=0x7fff64000b00, func=0x555555d80924 <closecon_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0) at /home/psergey/dev-git/10.2-r11/sql/sql_plugin.cc:2398 #3 0x0000555555d809ed in ha_close_connection (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:798 #4 0x0000555555abf5b3 in THD::free_connection (this=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/sql_class.cc:1432 #5 0x0000555555a27896 in unlink_thd (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:2916 #6 0x0000555555a27d53 in one_thread_per_connection_end (thd=0x7fff64000b00, put_in_cache=true) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:3061 #7 0x0000555555c55cf1 in do_handle_one_connection (connect=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1354 #8 0x0000555555c55995 in handle_one_connection (arg=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1241 #9 0x00007ffff64fa6aa in start_thread (arg=0x7ffff40f7700) at pthread_create.c:333 #10 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 (gdb) break "/home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3174" while wait_for_count_sessions.inc would not have succeeded: Threads_connected status variable shows 1, I_S.processlist doesn't include the thread being disconnected, etc, etc.

          The upstream does not have this problem: If I force the execution to loop in myrocks::rocksdb_close_connection(), the thread that is being terminated is still counted in Threads_connected counter:

          MySQL [test]> show status like 'Threads_connected';
          +-------------------+-------+
          | Variable_name     | Value |
          +-------------------+-------+
          | Threads_connected | 2     |
          +-------------------+-------+
          

          psergei Sergei Petrunia added a comment - The upstream does not have this problem: If I force the execution to loop in myrocks::rocksdb_close_connection(), the thread that is being terminated is still counted in Threads_connected counter: MySQL [test]> show status like 'Threads_connected'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Threads_connected | 2 | +-------------------+-------+

          I don't see any ways to make sure that myrocks::rocksdb_close_connection is finished, except A) using DEBUG_SYNC (which is only available in debug builds), or B) properly shutting down the server and starting it back.

          Implementing B.

          psergei Sergei Petrunia added a comment - I don't see any ways to make sure that myrocks::rocksdb_close_connection is finished, except A) using DEBUG_SYNC (which is only available in debug builds), or B) properly shutting down the server and starting it back. Implementing B.

          People

            psergei Sergei Petrunia
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.