LevelDB storage engine (MDEV-3841)

[MDEV-4053] LevelDB: DELETE hangs in state System lock on concurrent DELETE / INSERT flow Created: 2013-01-15  Updated: 2013-02-06  Resolved: 2013-02-06

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: leveldb
Environment:

Linux 32-bit


Issue Links:
Relates

 Description   

Important: It seems that it's a 32bit-specific issue, it is easily reproducible on two different 32-bit machines (slow openSUSE 12.2, decent CentOS 5.8 in a VM), but I could never reproduce it on 64 bit. Please try a 32-bit machine if possible.

If the test is run with Valgrind, it also causes valgrind warnings (see at the bottom of the description); but again, I could only get them on 32-bit machines.

process list:

+----+------+-----------------+------+---------+------+-------------+-----------------------------------------------------------------------------------------------------------------+
| Id | User | Host            | db   | Command | Time | State       | Info                                                                                                            |
+----+------+-----------------+------+---------+------+-------------+-----------------------------------------------------------------------------------------------------------------+
|  6 | root | localhost:49747 | test | Query   |  371 | System lock | DELETE FROM `B` WHERE `pk` IN ( 2515140608 , 8 , 'n' , 188 , 203 ) ORDER BY `col_char_8`,`col_int`,`pk` LIMIT 3 |
|  3 | root | localhost:49742 | NULL | Query   |    0 | init        | show full processlist                                                                                           |
+----+------+-----------------+------+---------+------+-------------+-----------------------------------------------------------------------------------------------------------------+

Stack trace from a thread with DELETE on the living server:

Thread 3 (Thread 0xa5ab2b40 (LWP 24549)):
#0  0x0895a118 in my_tosort_unicode (uni_plane=0x90535a0 <my_unicase_default>, wc=0xa5aaff58, flags=993) at mysql-5.6-leveldb/strings/ctype-utf8.c:4842
#1  0x0895b067 in my_hash_sort_utf8 (cs=0x9058300 <my_charset_utf8_general_ci>, s=0xd877880 "test.B", slen=6, n1=0xa5aaff9c, n2=0xa5aaff98) at mysql-5.6-leveldb/strings/ctype-utf8.c:5568
#2  0x085ee363 in calc_hash (hash=0x9098160 <ddl_manager>, key=0xd877880 "test.B", length=6) at mysql-5.6-leveldb/mysys/hash.c:46
#3  0x085ee801 in my_hash_first (hash=0x9098160 <ddl_manager>, key=0xd877880 "test.B", length=6, current_record=0xa5ab000c) at mysql-5.6-leveldb/mysys/hash.c:248
#4  0x085ee74a in my_hash_search (hash=0x9098160 <ddl_manager>, key=0xd877880 "test.B", length=6) at mysql-5.6-leveldb/mysys/hash.c:216
#5  0x0892077d in Table_ddl_manager::find (this=0x9098160 <ddl_manager>, table_name=0xd877880 "test.B", table_name_len=6) at mysql-5.6-leveldb/storage/leveldb/ldb_datadic.cc:566
#6  0x0891dfb3 in Primary_key_comparator::get_hashnr (key_arg=0xd877880 "test.B", key_len=11) at mysql-5.6-leveldb/storage/leveldb/ha_leveldb.cc:1453
#7  0x089361ed in calc_hash (hash=0x90981a0 <row_locks>, key=0xd877880 "test.B", keylen=11) at mysql-5.6-leveldb/mysys/lf_hash.c:302
#8  0x089365f3 in lf_hash_delete (hash=0x90981a0 <row_locks>, pins=0xa594af40, key=0xd877880, keylen=11) at mysql-5.6-leveldb/mysys/lf_hash.c:431
#9  0x08921215 in LockTable::release_lock (this=0x90981a0 <row_locks>, pins=0xa594af40, own_lock=0xa594f578) at mysql-5.6-leveldb/storage/leveldb/ldb_locks.cc:277
#10 0x0891ced7 in ha_leveldb::rnd_next (this=0xae80d38, buf=0xae80f00  <incomplete sequence \373>) at mysql-5.6-leveldb/storage/leveldb/ha_leveldb.cc:967
#11 0x0824edc0 in handler::ha_rnd_next (this=0xae80d38, buf=0xae80f00  <incomplete sequence \373>) at mysql-5.6-leveldb/sql/handler.cc:2603
#12 0x084c2dc3 in find_all_keys (param=0xa5ab0410, select=0xaf90060, fs_info=0xa5ab03d8, buffpek_pointers=0xa5ab0500, tempfile=0xa5ab0460, pq=0x0, found_rows=0xa5ab0b38) at mysql-5.6-leveldb/sql/filesort.cc:764
#13 0x084c1970 in filesort (thd=0xae7e8b8, table=0xafb9500, filesort=0xa5ab0b20, sort_positions=true, examined_rows=0xa5ab0b40, found_rows=0xa5ab0b38) at mysql-5.6-leveldb/sql/filesort.cc:337
#14 0x0852593a in mysql_delete (thd=0xae7e8b8, table_list=0xaf8f430, conds=0xaf8fb58, order_list=0xae804f4, limit=3, options=0) at mysql-5.6-leveldb/sql/sql_delete.cc:302
#15 0x083a7eb3 in mysql_execute_command (thd=0xae7e8b8) at mysql-5.6-leveldb/sql/sql_parse.cc:3469
#16 0x083aead8 in mysql_parse (thd=0xae7e8b8, rawbuf=0xaf8f2d8 "DELETE FROM `B` WHERE `pk` IN ( 2515140608 , 8 , 'n' , 188 , 203 ) ORDER BY `col_char_8`,`col_int`,`pk` LIMIT 3", length=111, parser_state=0xa5ab1758) at mysql-5.6-leveldb/sql/sql_parse.cc:6097
#17 0x083a34be in dispatch_command (command=COM_QUERY, thd=0xae7e8b8, packet=0xae81141 "DELETE FROM `B` WHERE `pk` IN ( 2515140608 , 8 , 'n' , 188 , 203 ) ORDER BY `col_char_8`,`col_int`,`pk` LIMIT 3", packet_length=111) at mysql-5.6-leveldb/sql/sql_parse.cc:1312
#18 0x083a28ee in do_command (thd=0xae7e8b8) at mysql-5.6-leveldb/sql/sql_parse.cc:1036
#19 0x083722a9 in do_handle_one_connection (thd_arg=0xae7e8b8) at mysql-5.6-leveldb/sql/sql_connect.cc:969
#20 0x08371e52 in handle_one_connection (arg=0xae7e8b8) at mysql-5.6-leveldb/sql/sql_connect.cc:885
#21 0x08650cc7 in pfs_spawn_thread (arg=0xae41a68) at mysql-5.6-leveldb/storage/perfschema/pfs.cc:1853
#22 0xb76c4e32 in start_thread () from /lib/libpthread.so.0
#23 0xb74348ee in clone () from /lib/libc.so.6

RQG data template (save as randgen/leveldb-bug-1.zz):

 
$tables = {
	names => ['A','B'],
        rows => [0,10],
	pk => [ 'int' ],
	engines => [ 'LevelDB' ]
};
 
$fields = {
        types => [ 'int', 'char(8)' ],
        indexes => [ undef ]
};
 

RQG grammar (save as randgen/leveldb-bug-1.yy):

 
query:
	insert_replace | delete ;
 
subselect:
	SELECT _field FROM _table WHERE `pk` = value ;
 
insert_replace:
	INSERT INTO _table ( _field_no_pk ) SELECT _field FROM _table ORDER BY _field_list LIMIT large_digit;
 
delete:
	DELETE FROM _table where_delete ORDER BY _field_list LIMIT small_digit ;
 
where_delete:
	WHERE _field IN ( value , value , value , value , value ) |
	WHERE _field BETWEEN small_digit AND large_digit ;
 
large_digit:
	5 | 6 | 7 | 8 ;
 
small_digit:
	1 | 2 | 3 | 4 ;
 
value:
	_digit | _tinyint_unsigned | _varchar(1) | _int_unsigned ;
 

RQG command line (fix dsn and make sure that your server is running and doesn't have anything in test schema; for higher probability of getting the problem, try running with valgrind):

perl gentest.pl --gendata=leveldb-bug-1.zz --threads=4 --queries=100M --duration=300 --dsn=dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test --grammar=leveldb-bug-1.yy

Valgrind warnings:

==26754== Thread 23:
==26754== Use of uninitialised value of size 4
==26754==    at 0x40AC794: leveldb::crc32c::Extend(unsigned int, char const*, unsigned int) (in /home/elenst/leveldb-1.7.0/libleveldb.so.1.7)
==26754==    by 0x4093278: leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned int) (in /home/elenst/leveldb-1.7.0/libleveldb.so.1.7)
==26754==    by 0x4093499: leveldb::log::Writer::AddRecord(leveldb::Slice const&) (in /home/elenst/leveldb-1.7.0/libleveldb.so.1.7)
==26754==    by 0x4087B03: leveldb::DBImpl::Write(leveldb::WriteOptions const&, leveldb::WriteBatch*) (in /home/elenst/leveldb-1.7.0/libleveldb.so.1.7)
==26754==    by 0x892768D: Ldb_transaction::commit() (ha_leveldb.cc:241)
==26754==    by 0x89265E6: ha_leveldb::external_lock(THD*, int) (ha_leveldb.cc:1222)
==26754==    by 0x8258023: handler::ha_external_lock(THD*, int) (handler.cc:7050)
==26754==    by 0x84E04CD: unlock_external(THD*, TABLE**, unsigned int) (lock.cc:646)
==26754==    by 0x84DFBAF: mysql_unlock_tables(THD*, st_mysql_lock*) (lock.cc:389)
==26754==    by 0x8349B5A: close_thread_tables(THD*) (sql_base.cc:1446)
==26754==    by 0x83AEF99: mysql_execute_command(THD*) (sql_parse.cc:4903)
==26754==    by 0x83B1705: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6097)
==26754==    by 0x83A6097: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1312)
==26754==    by 0x83A54C7: do_command(THD*) (sql_parse.cc:1036)
==26754==    by 0x83749B6: do_handle_one_connection(THD*) (sql_connect.cc:969)
==26754==    by 0x837455F: handle_one_connection (sql_connect.cc:885)
 



 Comments   
Comment by Elena Stepanova [ 2013-01-17 ]

Added information about 32-bit specifics and valgrind warnings.

Comment by Elena Stepanova [ 2013-01-17 ]

Most universal way to get the hang at the moment seems to be to run the test with threads=4 and with valgrind (works equally on both my 32-bit machines). Whether valgrind warnings are produced or not, might vary, but a thread always hangs.
I've updated the command line to threads=4.

Comment by Sergei Petrunia [ 2013-01-21 ]

Could not repeat the hang. Instead, I get a valgrind failure when running this load:

==1341== Invalid read of size 1
==1341== at 0x8A8523D: my_utf8_uni (ctype-utf8.c:5282)
==1341== by 0x8A85BF5: my_strnncoll_utf8 (ctype-utf8.c:5674)
==1341== by 0x86ADE65: hashcmp (hash.c:359)
==1341== by 0x86ADC65: my_hash_first_from_hash_value (hash.c:275)
==1341== by 0x86ADBAC: my_hash_first (hash.c:248)
==1341== by 0x86ADABD: my_hash_search (hash.c:216)
==1341== by 0x86E5854: Table_ddl_manager::find(unsigned char*, unsigned int) (ldb_datadic.cc:566)
==1341== by 0x86E4BB8: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:154)
==1341== by 0x86E0BAF: Primary_key_comparator::compare_keys(unsigned char const*, unsigned int, unsigned char const*, unsigned int) (ha_leveldb.cc:182)
==1341== by 0x8A5E49B: lfind (lf_hash.c:114)
==1341== by 0x8A5E9D8: lsearch (lf_hash.c:274)
==1341== by 0x8A5F203: lf_hash_search (lf_hash.c:479)
==1341== Address 0x1856d1fb is 3 bytes inside a block of size 12 free'd
==1341== at 0x4025DFA: free (vg_replace_malloc.c:323)
==1341== by 0x86C872E: my_free (my_malloc.c:130)
==1341== by 0x86E76AC: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:279)
==1341== by 0x86E3E1E: ha_leveldb::write_row(unsigned char*) (ha_leveldb.cc:881)
==1341== by 0x8230964: handler::ha_write_row(unsigned char*) (handler.cc:7131)
==1341== by 0x83C3570: write_record(THD*, TABLE*, COPY_INFO*, COPY_INFO*) (sql_insert.cc:1858)
==1341== by 0x83C39AE: select_insert::send_data(List<Item>&) (sql_insert.cc:3630)
==1341== by 0x83AFBAB: end_send(JOIN*, st_join_table*, bool) (sql_executor.cc:2719)
==1341== by 0x83B609F: evaluate_join_record(JOIN*, st_join_table*) (sql_executor.cc:1591)
==1341== by 0x83B64FE: QEP_tmp_table::end_send() (sql_executor.cc:4307)
==1341== by 0x83B6D52: sub_select_op(JOIN*, st_join_table*, bool) (sql_executor.cc:1045)
==1341== by 0x83B685B: sub_select(JOIN*, st_join_table*, bool) (sql_executor.cc:1199)
==1341==
==1341== Invalid read of size 1
==1341== at 0x8A5B2D9: ha_key_cmp (my_compare.c:302)
==1341== by 0x86E4A86: LDBSE_KEYDEF::cmp_keys(char const*, char const*, unsigned int) (ldb_datadic.h:38)
==1341== by 0x86E4C14: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:158)
==1341== by 0x86E0BAF: Primary_key_comparator::compare_keys(unsigned char const*, unsigned int, unsigned char const*, unsigned int) (ha_leveldb.cc:182)
==1341== by 0x8A5E49B: lfind (lf_hash.c:114)
==1341== by 0x8A5E9D8: lsearch (lf_hash.c:274)
==1341== by 0x8A5F203: lf_hash_search (lf_hash.c:479)
==1341== by 0x86E78EC: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned int, int) (ldb_locks.cc:125)
==1341== by 0x86E388C: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:964)
==1341== by 0x822DE7A: handler::ha_rnd_next(unsigned char*) (handler.cc:2603)
==1341== by 0x853B496: find_all_keys(Sort_param*, SQL_SELECT*, Filesort_info*, st_io_cache*, st_io_cache*, Bounded_queue<unsigned char, unsigned char>, unsigned long long) (filesort.cc:764)
==1341== by 0x853C95D: filesort(THD*, TABLE*, Filesort*, bool, unsigned long long*, unsigned long long*) (filesort.cc:337)
==1341== Address 0x1856d203 is 11 bytes inside a block of size 12 free'd
==1341== at 0x4025DFA: free (vg_replace_malloc.c:323)
==1341== by 0x86C872E: my_free (my_malloc.c:130)
==1341== by 0x86E76AC: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:279)
==1341== by 0x86E3E1E: ha_leveldb::write_row(unsigned char*) (ha_leveldb.cc:881)
==1341== by 0x8230964: handler::ha_write_row(unsigned char*) (handler.cc:7131)
==1341== by 0x83C3570: write_record(THD*, TABLE*, COPY_INFO*, COPY_INFO*) (sql_insert.cc:1858)
==1341== by 0x83C39AE: select_insert::send_data(List<Item>&) (sql_insert.cc:3630)
==1341== by 0x83AFBAB: end_send(JOIN*, st_join_table*, bool) (sql_executor.cc:2719)
==1341== by 0x83B609F: evaluate_join_record(JOIN*, st_join_table*) (sql_executor.cc:1591)
==1341== by 0x83B64FE: QEP_tmp_table::end_send() (sql_executor.cc:4307)
==1341== by 0x83B6D52: sub_select_op(JOIN*, st_join_table*, bool) (sql_executor.cc:1045)
==1341== by 0x83B685B: sub_select(JOIN*, st_join_table*, bool) (sql_executor.cc:1199)

Comment by Sergei Petrunia [ 2013-01-21 ]

A similar warning observed by Elena:

==27344== Thread 23:
==27344== Invalid read of size 1
==27344== at 0xDCE115: my_utf8_uni (ctype-utf8.c:5282)
==27344== by 0xDCEBC8: my_strnncoll_utf8 (ctype-utf8.c:5674)
==27344== by 0xA234EE: hashcmp (hash.c:360)
==27344== by 0xA232D1: my_hash_first_from_hash_value (hash.c:275)
==27344== by 0xA2321D: my_hash_first (hash.c:248)
==27344== by 0xA2313C: my_hash_search (hash.c:216)
==27344== by 0xA62E80: Table_ddl_manager::find(unsigned char*, unsigned int) (ldb_datadic.cc:566)
==27344== by 0xA60B63: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:154)
==27344== by 0xA5D108: Primary_key_comparator::compare_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:182)
==27344== by 0xDA33C1: lfind (lf_hash.c:116)
==27344== by 0xDA37B5: lsearch (lf_hash.c:274)
==27344== by 0xDA3F0B: lf_hash_search (lf_hash.c:479)
==27344== by 0xA63650: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:125)
==27344== by 0xA5EF15: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:964)
==27344== by 0x618275: handler::ha_rnd_next(unsigned char*) (handler.cc:2603)
==27344== by 0x8DA193: find_all_keys(Sort_param*, SQL_SELECT*, Filesort_info*, st_io_cache*, st_io_cache*, Bounded_queue<unsigned char, unsigned char>, unsigned long long) (filesort.cc:764)
==27344== Address 0x18795070 is 0 bytes inside a block of size 11 free'd
==27344== at 0x4C282E0: free (vg_replace_malloc.c:366)
==27344== by 0xA3CB8A: my_free (my_malloc.c:130)
==27344== by 0xA63B53: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:279)
==27344== by 0xA5EFB0: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:973)
==27344== by 0x618275: handler::ha_rnd_next(unsigned char*) (handler.cc:2603)
==27344== by 0x8DA193: find_all_keys(Sort_param*, SQL_SELECT*, Filesort_info*, st_io_cache*, st_io_cache*, Bounded_queue<unsigned char, unsigned char>, unsigned long long) (filesort.cc:764)
==27344== by 0x8D8C05: filesort(THD*, TABLE*, Filesort*, bool, unsigned long long*, unsigned long long*) (filesort.cc:337)
==27344== by 0x9493BF: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:302)
==27344== by 0x79F7CF: mysql_execute_command(THD*) (sql_parse.cc:3469)
==27344== by 0x7A6859: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6097)
==27344== by 0x79A696: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1312)

Comment by Sergei Petrunia [ 2013-01-21 ]

An interesting observation is that the problem is only repeatable on 32-bit systems. Could it be that LF_HASH has some bug for 32-bit systems, which one can see only when items deleted from the hash are immediately freed?

Comment by Elena Stepanova [ 2013-01-21 ]

The warning that you quoted above (from me) was from a 64-bit system, so it's not strictly related to 32-bit. The initial failure was, indeed, only reproducible for me on 32-bit, as mentioned in the description. But after bugfix for MDEV-4064, revno 4500, I started getting both the quoted valgrind warning and the hang on a 64-bit machine too.

Comment by Elena Stepanova [ 2013-02-06 ]

I am not getting these failures anymore on the following revision:

revision-id: psergey@askmonty.org-20130201180328-ocmbh9uvcoedmihp
revno: 4591
branch-nick: mysql-5.6-leveldb

(neither the hang nor the valgrind error), neither on a 32-bit nor on 64-bit machine, so I presume it has been fixed with the last changes.

Generated at Thu Feb 08 06:53:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.