LevelDB Storage Engine MS2 (MDEV-4201)

[MDEV-4377] LevelDB: Valgrind warnings 'Invalid read' on delete in transactional context Created: 2013-04-07  Updated: 2013-05-24  Resolved: 2013-05-24

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Won't Fix Votes: 0
Labels: leveldb

Attachments: File mdev4377-xpl-diff.diff     File runall.pl-valgrind-fix.diff    
Issue Links:
Relates

 Description   

==7397== Invalid read of size 1
==7397==    at 0x4C2DCAB: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7397==    by 0xD4E512: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:164)
==7397==    by 0xD480D4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:193)
==7397==    by 0xDB565D: lfind (lf_hash.c:116)
==7397==    by 0xDB5A51: lsearch (lf_hash.c:274)
==7397==    by 0xDB61A7: lf_hash_search (lf_hash.c:479)
==7397==    by 0xD51603: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
==7397==    by 0xD4C3C5: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1869)
==7397==    by 0xD4C1B5: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1810)
==7397==    by 0x61B141: handler::ha_rnd_next(unsigned char*) (handler.cc:2603)
==7397==    by 0x93D746: rr_sequential(READ_RECORD*) (records.cc:478)
==7397==    by 0x9518CE: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:353)
==7397==    by 0x7A3A2E: mysql_execute_command(THD*) (sql_parse.cc:3506)
==7397==    by 0x7AAAFA: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6149)
==7397==    by 0x79E821: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1334)
==7397==    by 0x79D971: do_command(THD*) (sql_parse.cc:1036)
==7397==  Address 0x1e11f590 is 0 bytes inside a block of size 8 free'd
==7397==    at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7397==    by 0xA46A2E: my_free (my_malloc.c:130)
==7397==    by 0xD51BA2: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
==7397==    by 0xD4E63B: Ldb_transaction::release_last_lock() (ha_leveldb.cc:235)
==7397==    by 0xD4BA46: ha_leveldb::unlock_row() (ha_leveldb.cc:1640)
==7397==    by 0x9518B8: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:405)
==7397==    by 0x7A3A2E: mysql_execute_command(THD*) (sql_parse.cc:3506)
==7397==    by 0x7AAAFA: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6149)
==7397==    by 0x79E821: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1334)
==7397==    by 0x79D971: do_command(THD*) (sql_parse.cc:1036)
==7397==    by 0x768AF4: do_handle_one_connection(THD*) (sql_connect.cc:969)
==7397==    by 0x768519: handle_one_connection (sql_connect.cc:885)
==7397==    by 0xD8A167: pfs_spawn_thread (pfs.cc:1853)
==7397==    by 0x58D0E99: start_thread (pthread_create.c:308)
==7397==    by 0x63EBCBC: clone (clone.S:112)

revision-id: psergey@askmonty.org-20130406172740-fid7u9tkz5ahlc4c
revno: 4823
branch-nick: mysql-5.6-leveldb

RQG grammar (test.yy):

query_init:
	SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ ; SET AUTOCOMMIT=OFF ;
 
query:
	transaction | delete | insert ;
 
transaction:
	START TRANSACTION | COMMIT | ROLLBACK ;
 
delete:
	DELETE FROM _table WHERE _field = _digit LIMIT _digit ;
 
insert:
	INSERT INTO _table ( `pk`, `col_int` ) VALUES ( NULL, _digit );

RQG data template (test.zz):

$tables = {
        names => [ 't1' ],
        rows => [ '1000' ],
        engines => [ 'LevelDB' ],
        pk => [ 'int auto_increment' ]
};

RQG command line:

perl ./runall.pl  --threads=4 --duration=300 --queries=100M  --valgrind --reporters=ValgrindErrors --grammar=test.yy --basedir=/data/bzr/mysql-5.6-leveldb --gendata=test.zz



 Comments   
Comment by Elena Stepanova [ 2013-04-07 ]

For now I only have a concurrent test, failed to convert it into a reasonably reliable MTR test case despite all effort. Please try it out, if it doesn't work for you as is, I'll try to think of something else.
INSERT in the test flow isn't really important, the problem happens even if there is only DELETE. INSERT is there just to maintain non-empty table a little longer.

Comment by Sergei Petrunia [ 2013-05-17 ]

I was unable to repeat with the latest tree.

(Here's RQG output:
https://gist.github.com/spetrunia/5599365. I interpret it as the bug didn't repeat. Is it correct?)

tip cset was:
------------------------------------------------------------
revno: 4872
revision-id: psergey@askmonty.org-20130516143440-b62g2jcint27n7ht
parent: psergey@askmonty.org-20130516112650-8fcygul5v8mxifu6
committer: Sergey Petrunya <psergey@askmonty.org>
branch nick: mysql-5.6-leveldb
timestamp: Thu 2013-05-16 18:34:40 +0400
message:
MDEV-4201: LevelDB Storage Engine MS2

  • Add @@leveldb_max_row_locks variable, limit # of locks a transaction may hold
Comment by Sergei Petrunia [ 2013-05-20 ]

I could reproduce on perro.

I don't know what causes the problem yet. I have one idea: it seems, valgrind warning only happens when we've got an error of this kind:

Query: DELETE FROM `t1` WHERE `col_datetime` = 6 LIMIT 4 failed: 1205 Lock wait timeout exceeded; try restarting transaction. Further errors of this kind will be suppressed.

Could it be that lock-wait-timeout-error and unlock_row() work incorrectly together?

Comment by Sergei Petrunia [ 2013-05-20 ]

It's not one error, note the "Further errors of this kind will be suppressed." part of the message.

The valgrind warning in this issue shows that the accessed data has been freed by ha_leveldb::unlock_row(). The problem seems to be unrelated to ha_leveldb::unlock_row(). If I comment out all of the code in that function, I get:

  1. 2013-05-20T16:23:29 ==27196== Thread 24:
  2. 2013-05-20T16:23:29 ==27196== Invalid read of size 1
  3. 2013-05-20T16:23:29 ==27196== at 0x4C2DCC0: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
  4. 2013-05-20T16:23:29 ==27196== by 0xE0F33A: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:162)
  5. 2013-05-20T16:23:29 ==27196== by 0xE08AD4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:191)
  6. 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
  7. 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
  8. 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)
  9. 2013-05-20T16:23:29 ==27196== by 0xE12A93: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
  10. 2013-05-20T16:23:29 ==27196== by 0xE0F49E: Ldb_transaction::get_lock(unsigned char const*, unsigned long, bool*) (ha_leveldb.cc:243)
  11. 2013-05-20T16:23:29 ==27196== by 0xE0D03D: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1908)
  12. 2013-05-20T16:23:29 ==27196== by 0xE0CE41: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1848)
  13. 2013-05-20T16:23:29 ==27196== by 0x63C125: handler::ha_rnd_next(unsigned char*) (handler.cc:2621)
  14. 2013-05-20T16:23:29 ==27196== by 0x98CB08: rr_sequential(READ_RECORD*) (records.cc:478)
  15. 2013-05-20T16:23:29 ==27196== by 0x9A29AA: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:354)
  16. 2013-05-20T16:23:29 ==27196== by 0x7DB57A: mysql_execute_command(THD*) (sql_parse.cc:3508)
  17. 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
  18. 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
  19. 2013-05-20T16:23:29 ==27196== Address 0x1e07bd46 is 6 bytes inside a block of size 8 free'd
  20. 2013-05-20T16:23:29 ==27196== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
  21. 2013-05-20T16:23:29 ==27196== by 0xAAE672: my_free (my_malloc.c:137)
  22. 2013-05-20T16:23:29 ==27196== by 0xE130BB: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
  23. 2013-05-20T16:23:29 ==27196== by 0xE0F551: Ldb_transaction::release_locks() (ha_leveldb.cc:266)
  24. 2013-05-20T16:23:29 ==27196== by 0xE0F8B3: Ldb_transaction::rollback() (ha_leveldb.cc:337)
  25. 2013-05-20T16:23:29 ==27196== by 0xE08D37: leveldb_rollback(handlerton*, THD*, bool) (ha_leveldb.cc:426)
  26. 2013-05-20T16:23:29 ==27196== by 0x63935E: ha_rollback_low(THD*, bool) (handler.cc:1529)
  27. 2013-05-20T16:23:29 ==27196== by 0x7137AC: TC_LOG_DUMMY::rollback(THD*, bool) (log.h:118)
  28. 2013-05-20T16:23:29 ==27196== by 0x639599: ha_rollback_trans(THD*, bool) (handler.cc:1595)
  29. 2013-05-20T16:23:29 ==27196== by 0x8AA3C0: trans_rollback(THD*) (transaction.cc:317)
  30. 2013-05-20T16:23:29 ==27196== by 0x7DDBEA: mysql_execute_command(THD*) (sql_parse.cc:4233)
  31. 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
  32. 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
  33. 2013-05-20T16:23:29 ==27196== by 0x7D5209: do_command(THD*) (sql_parse.cc:1038)
  34. 2013-05-20T16:23:29 ==27196== by 0x79B884: do_handle_one_connection(THD*) (sql_connect.cc:977)
  35. 2013-05-20T16:23:29 ==27196== by 0x79B288: handle_one_connection (sql_connect.cc:893)
  36. 2013-05-20T16:23:29 ==27196==
  37. 2013-05-20T16:23:29 ==27196== Thread 27:
  38. 2013-05-20T16:23:29 ==27196== Invalid read of size 1
  39. 2013-05-20T16:23:29 ==27196== at 0x4C2DCAB: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
  40. 2013-05-20T16:23:29 ==27196== by 0xE0F33A: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:162)
  41. 2013-05-20T16:23:29 ==27196== by 0xE08AD4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:191)
  42. 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
  43. 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
  44. 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)
  45. 2013-05-20T16:23:29 ==27196== by 0xE12A93: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
  46. 2013-05-20T16:23:29 ==27196== by 0xE0F49E: Ldb_transaction::get_lock(unsigned char const*, unsigned long, bool*) (ha_leveldb.cc:243)
  47. 2013-05-20T16:23:29 ==27196== by 0xE0D03D: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1908)
  48. 2013-05-20T16:23:29 ==27196== by 0xE0CE41: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1848)
  49. 2013-05-20T16:23:29 ==27196== by 0x63C125: handler::ha_rnd_next(unsigned char*) (handler.cc:2621)
  50. 2013-05-20T16:23:29 ==27196== by 0x98CB08: rr_sequential(READ_RECORD*) (records.cc:478)
  51. 2013-05-20T16:23:29 ==27196== by 0x9A29AA: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:354)
  52. 2013-05-20T16:23:29 ==27196== by 0x7DB57A: mysql_execute_command(THD*) (sql_parse.cc:3508)
  53. 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
  54. 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
  55. 2013-05-20T16:23:29 ==27196== Address 0x1edbc650 is 0 bytes inside a block of size 8 free'd
  56. 2013-05-20T16:23:29 ==27196== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
  57. 2013-05-20T16:23:29 ==27196== by 0xAAE672: my_free (my_malloc.c:137)
  58. 2013-05-20T16:23:29 ==27196== by 0xE130BB: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
  59. 2013-05-20T16:23:29 ==27196== by 0xE0F551: Ldb_transaction::release_locks() (ha_leveldb.cc:266)
  60. 2013-05-20T16:23:29 ==27196== by 0xE0F8B3: Ldb_transaction::rollback() (ha_leveldb.cc:337)
  61. 2013-05-20T16:23:29 ==27196== by 0xE0F59F: Ldb_transaction::commit() (ha_leveldb.cc:275)
  62. 2013-05-20T16:23:29 ==27196== by 0xE08CEA: leveldb_commit(handlerton*, THD*, bool) (ha_leveldb.cc:405)
  63. 2013-05-20T16:23:29 ==27196== by 0x639179: ha_commit_low(THD*, bool) (handler.cc:1475)
  64. 2013-05-20T16:23:29 ==27196== by 0x713772: TC_LOG_DUMMY::commit(THD*, bool) (log.h:115)
  65. 2013-05-20T16:23:29 ==27196== by 0x638FC8: ha_commit_trans(THD*, bool) (handler.cc:1421)
  66. 2013-05-20T16:23:29 ==27196== by 0x8A9ED1: trans_commit(THD*) (transaction.cc:219)
  67. 2013-05-20T16:23:29 ==27196== by 0x7DDA48: mysql_execute_command(THD*) (sql_parse.cc:4202)
  68. 2013-05-20T16:23:29 ==27196== by 0x7E287E: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6161)
  69. 2013-05-20T16:23:29 ==27196== by 0x7D6101: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1336)
  70. 2013-05-20T16:23:29 ==27196== by 0x7D5209: do_command(THD*) (sql_parse.cc:1038)
  71. 2013-05-20T16:23:29 ==27196== by 0x79B884: do_handle_one_connection(THD*) (sql_connect.cc:977)
  72. 2013-05-20T16:23:29 ==27196==
Comment by Sergei Petrunia [ 2013-05-21 ]

Patch from Elena for RQG to start server with valgrind opts

Comment by Sergei Petrunia [ 2013-05-21 ]

Attempted to debug this under valgrind (on perro.askmonty.org), without success.

If I start valgrind with vgdb support, then the test reaches its end without hitting an error:

GDB p
(gdb) monitor v.info n_errs_found
n_errs_found 0 (vgdb-error 1)

Comment by Sergei Petrunia [ 2013-05-21 ]

The problem is not easily repeatable under valgrind with "--duration=300".
The problem repeats in multiple attempts with "--duration=600" option.

It runs this many queries:
grep Query: /home/mariadb/bzr/mysql-5.6-leveldb-mdev4377/mysql-test/var/master-data//../log/master.err | wc -l
13782

Comment by Sergei Petrunia [ 2013-05-21 ]

I tried to repeat the problem under valgrind's vgdb (with valgrind 3.7.0). I used --duration=1200 for RQG (because vgdb makes execution slower), and the test ran without error.

mariadb@perro:~$ grep Query: /home/mariadb/bzr/mysql-5.6-leveldb-mdev4377/mysql-test/var/master-data//../log/master.err | wc -l
16813

That is, it ran even more queries than without vgdb.

However, at the end of the test, I get:

(gdb) monitor v.info n_errs_found
n_errs_found 0 (vgdb-error 1)

no errors.

Comment by Sergei Petrunia [ 2013-05-22 ]

I think I have figured out what's going on. The valgrind warning is about hitting freed data with a stack like this:

  1. 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
  2. 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
  3. 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)

The bug is caused by the following sequence of events:

Thread1>
calls get_lock(), execution enters lf_hash_search, lsearch, lfind.
lfind() enters the "if (!DELETED(link))

{ ... }

" statement, and reaches key_comp() call inside it.

Thread2>
calls release_lock(), which calls lf_hash_delete(). lf_hash_delete removes an element from the
LF_HASH (this is the same element that lfind is looking at).
The element in LF_HASH has a pointer to a malloced rowkey. The rowkey is freed also. (***)

Thread1>
key_comp starts to do actual comparison of records. However, the rowkey was freed at step (***).
(Comparison still works, because free() doesn't remove the data).

Comment by Sergei Petrunia [ 2013-05-22 ]

A patch to aid debugging. It does
1. [obsolete] remove ha_leveldb::unlock_row() implementation. This is orthogonal to the bug.
2. Damage rowkeys before free()'ing them. Put 0xFEAD magic number as first two bytes.

Comment by Sergei Petrunia [ 2013-05-22 ]

When valgrind stops on error inside lfind(), we get

(gdb) p link
$6 = 505940640
(gdb) p cursor->curr->link
$7 = 505940641

which means that DELETED(link) went from being FALSE to TRUE while we were inside the key_comp() call.

Comment by Sergei Petrunia [ 2013-05-22 ]

With the above patch, the stderr gets this message:

a parameter was freed (2)!

Note the "(2)". the message without (2) was not printed. This matches the guess about error scenario posted above.

Generated at Thu Feb 08 06:55:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.