|
It's not one error, note the "Further errors of this kind will be suppressed." part of the message.
The valgrind warning in this issue shows that the accessed data has been freed by ha_leveldb::unlock_row(). The problem seems to be unrelated to ha_leveldb::unlock_row(). If I comment out all of the code in that function, I get:
- 2013-05-20T16:23:29 ==27196== Thread 24:
- 2013-05-20T16:23:29 ==27196== Invalid read of size 1
- 2013-05-20T16:23:29 ==27196== at 0x4C2DCC0: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
- 2013-05-20T16:23:29 ==27196== by 0xE0F33A: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:162)
- 2013-05-20T16:23:29 ==27196== by 0xE08AD4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:191)
- 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
- 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
- 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)
- 2013-05-20T16:23:29 ==27196== by 0xE12A93: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
- 2013-05-20T16:23:29 ==27196== by 0xE0F49E: Ldb_transaction::get_lock(unsigned char const*, unsigned long, bool*) (ha_leveldb.cc:243)
- 2013-05-20T16:23:29 ==27196== by 0xE0D03D: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1908)
- 2013-05-20T16:23:29 ==27196== by 0xE0CE41: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1848)
- 2013-05-20T16:23:29 ==27196== by 0x63C125: handler::ha_rnd_next(unsigned char*) (handler.cc:2621)
- 2013-05-20T16:23:29 ==27196== by 0x98CB08: rr_sequential(READ_RECORD*) (records.cc:478)
- 2013-05-20T16:23:29 ==27196== by 0x9A29AA: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:354)
- 2013-05-20T16:23:29 ==27196== by 0x7DB57A: mysql_execute_command(THD*) (sql_parse.cc:3508)
- 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
- 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
- 2013-05-20T16:23:29 ==27196== Address 0x1e07bd46 is 6 bytes inside a block of size 8 free'd
- 2013-05-20T16:23:29 ==27196== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
- 2013-05-20T16:23:29 ==27196== by 0xAAE672: my_free (my_malloc.c:137)
- 2013-05-20T16:23:29 ==27196== by 0xE130BB: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
- 2013-05-20T16:23:29 ==27196== by 0xE0F551: Ldb_transaction::release_locks() (ha_leveldb.cc:266)
- 2013-05-20T16:23:29 ==27196== by 0xE0F8B3: Ldb_transaction::rollback() (ha_leveldb.cc:337)
- 2013-05-20T16:23:29 ==27196== by 0xE08D37: leveldb_rollback(handlerton*, THD*, bool) (ha_leveldb.cc:426)
- 2013-05-20T16:23:29 ==27196== by 0x63935E: ha_rollback_low(THD*, bool) (handler.cc:1529)
- 2013-05-20T16:23:29 ==27196== by 0x7137AC: TC_LOG_DUMMY::rollback(THD*, bool) (log.h:118)
- 2013-05-20T16:23:29 ==27196== by 0x639599: ha_rollback_trans(THD*, bool) (handler.cc:1595)
- 2013-05-20T16:23:29 ==27196== by 0x8AA3C0: trans_rollback(THD*) (transaction.cc:317)
- 2013-05-20T16:23:29 ==27196== by 0x7DDBEA: mysql_execute_command(THD*) (sql_parse.cc:4233)
- 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
- 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
- 2013-05-20T16:23:29 ==27196== by 0x7D5209: do_command(THD*) (sql_parse.cc:1038)
- 2013-05-20T16:23:29 ==27196== by 0x79B884: do_handle_one_connection(THD*) (sql_connect.cc:977)
- 2013-05-20T16:23:29 ==27196== by 0x79B288: handle_one_connection (sql_connect.cc:893)
- 2013-05-20T16:23:29 ==27196==
- 2013-05-20T16:23:29 ==27196== Thread 27:
- 2013-05-20T16:23:29 ==27196== Invalid read of size 1
- 2013-05-20T16:23:29 ==27196== at 0x4C2DCAB: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
- 2013-05-20T16:23:29 ==27196== by 0xE0F33A: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:162)
- 2013-05-20T16:23:29 ==27196== by 0xE08AD4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:191)
- 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
- 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
- 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)
- 2013-05-20T16:23:29 ==27196== by 0xE12A93: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
- 2013-05-20T16:23:29 ==27196== by 0xE0F49E: Ldb_transaction::get_lock(unsigned char const*, unsigned long, bool*) (ha_leveldb.cc:243)
- 2013-05-20T16:23:29 ==27196== by 0xE0D03D: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1908)
- 2013-05-20T16:23:29 ==27196== by 0xE0CE41: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1848)
- 2013-05-20T16:23:29 ==27196== by 0x63C125: handler::ha_rnd_next(unsigned char*) (handler.cc:2621)
- 2013-05-20T16:23:29 ==27196== by 0x98CB08: rr_sequential(READ_RECORD*) (records.cc:478)
- 2013-05-20T16:23:29 ==27196== by 0x9A29AA: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:354)
- 2013-05-20T16:23:29 ==27196== by 0x7DB57A: mysql_execute_command(THD*) (sql_parse.cc:3508)
- 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
- 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
- 2013-05-20T16:23:29 ==27196== Address 0x1edbc650 is 0 bytes inside a block of size 8 free'd
- 2013-05-20T16:23:29 ==27196== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
- 2013-05-20T16:23:29 ==27196== by 0xAAE672: my_free (my_malloc.c:137)
- 2013-05-20T16:23:29 ==27196== by 0xE130BB: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
- 2013-05-20T16:23:29 ==27196== by 0xE0F551: Ldb_transaction::release_locks() (ha_leveldb.cc:266)
- 2013-05-20T16:23:29 ==27196== by 0xE0F8B3: Ldb_transaction::rollback() (ha_leveldb.cc:337)
- 2013-05-20T16:23:29 ==27196== by 0xE0F59F: Ldb_transaction::commit() (ha_leveldb.cc:275)
- 2013-05-20T16:23:29 ==27196== by 0xE08CEA: leveldb_commit(handlerton*, THD*, bool) (ha_leveldb.cc:405)
- 2013-05-20T16:23:29 ==27196== by 0x639179: ha_commit_low(THD*, bool) (handler.cc:1475)
- 2013-05-20T16:23:29 ==27196== by 0x713772: TC_LOG_DUMMY::commit(THD*, bool) (log.h:115)
- 2013-05-20T16:23:29 ==27196== by 0x638FC8: ha_commit_trans(THD*, bool) (handler.cc:1421)
- 2013-05-20T16:23:29 ==27196== by 0x8A9ED1: trans_commit(THD*) (transaction.cc:219)
- 2013-05-20T16:23:29 ==27196== by 0x7DDA48: mysql_execute_command(THD*) (sql_parse.cc:4202)
- 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
- 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
- 2013-05-20T16:23:29 ==27196== by 0x7D5209: do_command(THD*) (sql_parse.cc:1038)
- 2013-05-20T16:23:29 ==27196== by 0x79B884: do_handle_one_connection(THD*) (sql_connect.cc:977)
- 2013-05-20T16:23:29 ==27196==
|
|
I tried to repeat the problem under valgrind's vgdb (with valgrind 3.7.0). I used --duration=1200 for RQG (because vgdb makes execution slower), and the test ran without error.
mariadb@perro:~$ grep Query: /home/mariadb/bzr/mysql-5.6-leveldb-mdev4377/mysql-test/var/master-data//../log/master.err | wc -l
16813
That is, it ran even more queries than without vgdb.
However, at the end of the test, I get:
(gdb) monitor v.info n_errs_found
n_errs_found 0 (vgdb-error 1)
no errors.
|
|
I think I have figured out what's going on. The valgrind warning is about hitting freed data with a stack like this:
- 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
- 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
- 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)
The bug is caused by the following sequence of events:
Thread1>
calls get_lock(), execution enters lf_hash_search, lsearch, lfind.
lfind() enters the "if (!DELETED(link))
{ ... }
" statement, and reaches key_comp() call inside it.
Thread2>
calls release_lock(), which calls lf_hash_delete(). lf_hash_delete removes an element from the
LF_HASH (this is the same element that lfind is looking at).
The element in LF_HASH has a pointer to a malloced rowkey. The rowkey is freed also. (***)
Thread1>
key_comp starts to do actual comparison of records. However, the rowkey was freed at step (***).
(Comparison still works, because free() doesn't remove the data).
|