Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4201 LevelDB Storage Engine MS2
  3. MDEV-4377

LevelDB: Valgrind warnings 'Invalid read' on delete in transactional context

Details

    • Technical task
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • None
    • None
    • None

    Description

      ==7397== Invalid read of size 1
      ==7397==    at 0x4C2DCAB: bcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==7397==    by 0xD4E512: Primary_key_comparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const (ha_leveldb.cc:164)
      ==7397==    by 0xD480D4: compare_mem_comparable_keys(unsigned char const*, unsigned long, unsigned char const*, unsigned long) (ha_leveldb.cc:193)
      ==7397==    by 0xDB565D: lfind (lf_hash.c:116)
      ==7397==    by 0xDB5A51: lsearch (lf_hash.c:274)
      ==7397==    by 0xDB61A7: lf_hash_search (lf_hash.c:479)
      ==7397==    by 0xD51603: LockTable::get_lock(LF_PINS*, unsigned char const*, unsigned long, int) (ldb_locks.cc:132)
      ==7397==    by 0xD4C3C5: ha_leveldb::rnd_next_with_direction(unsigned char*, bool) (ha_leveldb.cc:1869)
      ==7397==    by 0xD4C1B5: ha_leveldb::rnd_next(unsigned char*) (ha_leveldb.cc:1810)
      ==7397==    by 0x61B141: handler::ha_rnd_next(unsigned char*) (handler.cc:2603)
      ==7397==    by 0x93D746: rr_sequential(READ_RECORD*) (records.cc:478)
      ==7397==    by 0x9518CE: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:353)
      ==7397==    by 0x7A3A2E: mysql_execute_command(THD*) (sql_parse.cc:3506)
      ==7397==    by 0x7AAAFA: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6149)
      ==7397==    by 0x79E821: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1334)
      ==7397==    by 0x79D971: do_command(THD*) (sql_parse.cc:1036)
      ==7397==  Address 0x1e11f590 is 0 bytes inside a block of size 8 free'd
      ==7397==    at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==7397==    by 0xA46A2E: my_free (my_malloc.c:130)
      ==7397==    by 0xD51BA2: LockTable::release_lock(LF_PINS*, Row_lock*) (ldb_locks.cc:308)
      ==7397==    by 0xD4E63B: Ldb_transaction::release_last_lock() (ha_leveldb.cc:235)
      ==7397==    by 0xD4BA46: ha_leveldb::unlock_row() (ha_leveldb.cc:1640)
      ==7397==    by 0x9518B8: mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) (sql_delete.cc:405)
      ==7397==    by 0x7A3A2E: mysql_execute_command(THD*) (sql_parse.cc:3506)
      ==7397==    by 0x7AAAFA: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6149)
      ==7397==    by 0x79E821: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1334)
      ==7397==    by 0x79D971: do_command(THD*) (sql_parse.cc:1036)
      ==7397==    by 0x768AF4: do_handle_one_connection(THD*) (sql_connect.cc:969)
      ==7397==    by 0x768519: handle_one_connection (sql_connect.cc:885)
      ==7397==    by 0xD8A167: pfs_spawn_thread (pfs.cc:1853)
      ==7397==    by 0x58D0E99: start_thread (pthread_create.c:308)
      ==7397==    by 0x63EBCBC: clone (clone.S:112)

      revision-id: psergey@askmonty.org-20130406172740-fid7u9tkz5ahlc4c
      revno: 4823
      branch-nick: mysql-5.6-leveldb

      RQG grammar (test.yy):

      query_init:
      	SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ ; SET AUTOCOMMIT=OFF ;
       
      query:
      	transaction | delete | insert ;
       
      transaction:
      	START TRANSACTION | COMMIT | ROLLBACK ;
       
      delete:
      	DELETE FROM _table WHERE _field = _digit LIMIT _digit ;
       
      insert:
      	INSERT INTO _table ( `pk`, `col_int` ) VALUES ( NULL, _digit );

      RQG data template (test.zz):

      $tables = {
              names => [ 't1' ],
              rows => [ '1000' ],
              engines => [ 'LevelDB' ],
              pk => [ 'int auto_increment' ]
      };

      RQG command line:

      perl ./runall.pl  --threads=4 --duration=300 --queries=100M  --valgrind --reporters=ValgrindErrors --grammar=test.yy --basedir=/data/bzr/mysql-5.6-leveldb --gendata=test.zz

      Attachments

        Activity

          I tried to repeat the problem under valgrind's vgdb (with valgrind 3.7.0). I used --duration=1200 for RQG (because vgdb makes execution slower), and the test ran without error.

          mariadb@perro:~$ grep Query: /home/mariadb/bzr/mysql-5.6-leveldb-mdev4377/mysql-test/var/master-data//../log/master.err | wc -l
          16813

          That is, it ran even more queries than without vgdb.

          However, at the end of the test, I get:

          (gdb) monitor v.info n_errs_found
          n_errs_found 0 (vgdb-error 1)

          no errors.

          psergei Sergei Petrunia added a comment - I tried to repeat the problem under valgrind's vgdb (with valgrind 3.7.0). I used --duration=1200 for RQG (because vgdb makes execution slower), and the test ran without error. mariadb@perro:~$ grep Query: /home/mariadb/bzr/mysql-5.6-leveldb-mdev4377/mysql-test/var/master-data//../log/master.err | wc -l 16813 That is, it ran even more queries than without vgdb. However, at the end of the test, I get: (gdb) monitor v.info n_errs_found n_errs_found 0 (vgdb-error 1) no errors.

          I think I have figured out what's going on. The valgrind warning is about hitting freed data with a stack like this:

          1. 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116)
          2. 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274)
          3. 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479)

          The bug is caused by the following sequence of events:

          Thread1>
          calls get_lock(), execution enters lf_hash_search, lsearch, lfind.
          lfind() enters the "if (!DELETED(link))

          { ... }

          " statement, and reaches key_comp() call inside it.

          Thread2>
          calls release_lock(), which calls lf_hash_delete(). lf_hash_delete removes an element from the
          LF_HASH (this is the same element that lfind is looking at).
          The element in LF_HASH has a pointer to a malloced rowkey. The rowkey is freed also. (***)

          Thread1>
          key_comp starts to do actual comparison of records. However, the rowkey was freed at step (***).
          (Comparison still works, because free() doesn't remove the data).

          psergei Sergei Petrunia added a comment - I think I have figured out what's going on. The valgrind warning is about hitting freed data with a stack like this: 2013-05-20T16:23:29 ==27196== by 0xE854D3: lfind (lf_hash.c:116) 2013-05-20T16:23:29 ==27196== by 0xE858C7: lsearch (lf_hash.c:274) 2013-05-20T16:23:29 ==27196== by 0xE8601D: lf_hash_search (lf_hash.c:479) The bug is caused by the following sequence of events: Thread1> calls get_lock(), execution enters lf_hash_search, lsearch, lfind. lfind() enters the "if (!DELETED(link)) { ... } " statement, and reaches key_comp() call inside it. Thread2> calls release_lock(), which calls lf_hash_delete(). lf_hash_delete removes an element from the LF_HASH (this is the same element that lfind is looking at). The element in LF_HASH has a pointer to a malloced rowkey. The rowkey is freed also. (***) Thread1> key_comp starts to do actual comparison of records. However, the rowkey was freed at step (***). (Comparison still works, because free() doesn't remove the data).

          A patch to aid debugging. It does
          1. [obsolete] remove ha_leveldb::unlock_row() implementation. This is orthogonal to the bug.
          2. Damage rowkeys before free()'ing them. Put 0xFEAD magic number as first two bytes.

          psergei Sergei Petrunia added a comment - A patch to aid debugging. It does 1. [obsolete] remove ha_leveldb::unlock_row() implementation. This is orthogonal to the bug. 2. Damage rowkeys before free()'ing them. Put 0xFEAD magic number as first two bytes.

          When valgrind stops on error inside lfind(), we get

          (gdb) p link
          $6 = 505940640
          (gdb) p cursor->curr->link
          $7 = 505940641

          which means that DELETED(link) went from being FALSE to TRUE while we were inside the key_comp() call.

          psergei Sergei Petrunia added a comment - When valgrind stops on error inside lfind(), we get (gdb) p link $6 = 505940640 (gdb) p cursor->curr->link $7 = 505940641 which means that DELETED(link) went from being FALSE to TRUE while we were inside the key_comp() call.

          With the above patch, the stderr gets this message:

          a parameter was freed (2)!

          Note the "(2)". the message without (2) was not printed. This matches the guess about error scenario posted above.

          psergei Sergei Petrunia added a comment - With the above patch, the stderr gets this message: a parameter was freed (2)! Note the "(2)". the message without (2) was not printed. This matches the guess about error scenario posted above.

          People

            psergei Sergei Petrunia
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.