[MDEV-16575] rocksdb.bulk_load_errors fails in buildbot with wrong result Created: 2018-06-25  Updated: 2018-08-13  Resolved: 2018-08-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB, Tests
Affects Version/s: 10.2
Fix Version/s: 10.2.17, 10.3.9

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None


 Description   

http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/8855

rocksdb.bulk_load_errors                 w2 [ fail ]
        Test ended at 2018-06-24 23:19:29
 
CURRENT_TEST: rocksdb.bulk_load_errors
--- D:/winx64-packages/build/src/storage/rocksdb/mysql-test/rocksdb/r/bulk_load_errors.result	2018-06-24 22:01:26.000000000 +0000
+++ D:\winx64-packages\build\src\storage\rocksdb\mysql-test\rocksdb\r\bulk_load_errors.reject	2018-06-24 23:19:29.697502800 +0000
@@ -37,7 +37,7 @@
 pk
 10
 11
-FOUND 1 /RocksDB: Error [0-9]+ finalizing last SST file while disconnecting/ in rocksdb.bulk_load_errors.2.err
+NOT FOUND /RocksDB: Error [0-9]+ finalizing last SST file while disconnecting/ in rocksdb.bulk_load_errors.2.err
 TRUNCATE TABLE t1;
 SET rocksdb_bulk_load_allow_unsorted=1;
 SET rocksdb_bulk_load=1;
 
mysqltest: Result length mismatch



 Comments   
Comment by Sergei Petrunia [ 2018-08-13 ]

Investigation: this should be "Found error 198". The cause of the error is that the datafiles (or is it the MemTable?) still have the data (and/or tombstones) with the same keys as the ones being loaded.

The upstream has this error.

Comment by Sergei Petrunia [ 2018-08-13 ]

Looking at the failing test:

# Make sure we get an error in log when we disconnect and do not assert the server
--connect (con1,localhost,root,,)
SET rocksdb_bulk_load=1;
INSERT INTO t1 VALUES(1);
INSERT INTO t1 VALUES(2);
INSERT INTO t1 VALUES(20);
INSERT INTO t1 VALUES(21);
--connection default
--disconnect con1
 
SELECT * FROM t1;
 
--source include/wait_until_count_sessions.inc
 
--let SEARCH_FILE=$LOG2
--let SEARCH_PATTERN=RocksDB: Error [0-9]+ finalizing last SST file while disconnecting
--source include/search_pattern_in_file.inc

dosconnect con1 will cause myrocks::rocksdb_close_connection to be called, which will try to finish the bulk load and hit the error.

How can we be sure that search_pattern_in_file.inc is called AFTER the bulk load mode cleanup is finished? It seems the author of the testcase tried to achieve that by calling wait_until_count_sessions.inc.

The problem is that it doesn't help in MariaDB. I have modified the code in myrocks::rocksdb_close_connection to loop forever, and then I can see that the disconnecting thread is stuck here:

  
  Breakpoint 2, myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168
(gdb) wher
  #0  myrocks::rocksdb_close_connection (hton=0x5555579ac410, thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3168
  #1  0x0000555555d8099f in closecon_handlerton (thd=0x7fff64000b00, plugin=0x7fff64007de0, unused=0x0) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:783
  #2  0x0000555555b26141 in plugin_foreach_with_mask (thd=0x7fff64000b00, func=0x555555d80924 <closecon_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0) at /home/psergey/dev-git/10.2-r11/sql/sql_plugin.cc:2398
  #3  0x0000555555d809ed in ha_close_connection (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/handler.cc:798
  #4  0x0000555555abf5b3 in THD::free_connection (this=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/sql_class.cc:1432
  #5  0x0000555555a27896 in unlink_thd (thd=0x7fff64000b00) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:2916
  #6  0x0000555555a27d53 in one_thread_per_connection_end (thd=0x7fff64000b00, put_in_cache=true) at /home/psergey/dev-git/10.2-r11/sql/mysqld.cc:3061
  #7  0x0000555555c55cf1 in do_handle_one_connection (connect=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1354
  #8  0x0000555555c55995 in handle_one_connection (arg=0x5555587596c0) at /home/psergey/dev-git/10.2-r11/sql/sql_connect.cc:1241
  #9  0x00007ffff64fa6aa in start_thread (arg=0x7ffff40f7700) at pthread_create.c:333
  #10 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) break "/home/psergey/dev-git/10.2-r11/storage/rocksdb/ha_rocksdb.cc:3174"

while wait_for_count_sessions.inc would not have succeeded: Threads_connected status variable shows 1, I_S.processlist doesn't include the thread being disconnected, etc, etc.

Comment by Sergei Petrunia [ 2018-08-13 ]

The upstream does not have this problem: If I force the execution to loop in myrocks::rocksdb_close_connection(), the thread that is being terminated is still counted in Threads_connected counter:

MySQL [test]> show status like 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 2     |
+-------------------+-------+

Comment by Sergei Petrunia [ 2018-08-13 ]

I don't see any ways to make sure that myrocks::rocksdb_close_connection is finished, except A) using DEBUG_SYNC (which is only available in debug builds), or B) properly shutting down the server and starting it back.

Implementing B.

Generated at Thu Feb 08 08:29:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.