Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16768

Replication aborts with ER_KEY_NOT_FOUND or replica crashes on table with key on BLOB and without PK

Details

    Description

      Run with --mysqld=--plugin-load-add=ha_rocksdb

      --source include/have_binlog_format_row.inc
      --source include/master-slave.inc
       
      CREATE TABLE t1 (b BLOB, i INT, KEY(b(8))) ENGINE=RocksDB;
      INSERT INTO t1 VALUES (REPEAT('a',9),1);
       
      UPDATE t1 SET i = 2;
       
      --sync_slave_with_master
       
      # Cleanup
      --connection master
      DROP TABLE t1;
      --source include/rpl_end.inc
      

      10.2 1cc1d0429da

      2018-07-16 19:11:54 140503152559872 [ERROR] mysqld: Can't find record in 't1'
      2018-07-16 19:11:54 140503152559872 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 923, Gtid 0-1-3, Internal MariaDB error code: 1032
      2018-07-16 19:11:54 140503152559872 [Warning] Slave: Can't find record in 't1' Error_code: 1032
      2018-07-16 19:11:54 140503152559872 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 726
      

      Reproducible on 10.5-10.11 regardless of parallel replication or parallel replication mode.
      On 11.2 and higher versions the replica crashes or fails with ASAN errors (depending on the build type):

      11.2 12a91b57e27b979819924cf89614e6e51f24b37b

      ==846681==ERROR: AddressSanitizer: use-after-poison on address 0x6190001159d8 at pc 0x7fe202fe27ce bp 0x7fe1f3258e10 sp 0x7fe1f3258e08
      READ of size 8 at 0x6190001159d8 thread T36
          #0 0x7fe202fe27cd in myrocks::Rdb_field_packing::get_field_in_table(TABLE const*) const /data/bld/11.2-asan/storage/rocksdb/rdb_datadic.cc:3500
          #1 0x7fe202fd5929 in myrocks::Rdb_key_def::pack_record(TABLE const*, unsigned char*, unsigned char const*, unsigned char*, myrocks::Rdb_string_writer*, bool, long long, unsigned int, unsigned int*, char const*) const /data/bld/11.2-asan/storage/rocksdb/rdb_datadic.cc:1378
          #2 0x7fe202fd3787 in myrocks::Rdb_key_def::pack_index_tuple(TABLE*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long const&) const /data/bld/11.2-asan/storage/rocksdb/rdb_datadic.cc:1024
          #3 0x7fe202ddac66 in myrocks::ha_rocksdb::index_read_map_impl(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function, st_key_range const*) /data/bld/11.2-asan/storage/rocksdb/ha_rocksdb.cc:8579
          #4 0x7fe202dd9f54 in myrocks::ha_rocksdb::index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function) /data/bld/11.2-asan/storage/rocksdb/ha_rocksdb.cc:8491
          #5 0x559a7c35b4f4 in handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function) /data/bld/11.2-asan/sql/handler.cc:3750
          #6 0x559a7c768878 in Rows_log_event::find_row(rpl_group_info*) /data/bld/11.2-asan/sql/log_event_server.cc:7677
          #7 0x559a7c76b27c in Update_rows_log_event::do_exec_row(rpl_group_info*) /data/bld/11.2-asan/sql/log_event_server.cc:8066
          #8 0x559a7c75396d in Rows_log_event::do_apply_event(rpl_group_info*) /data/bld/11.2-asan/sql/log_event_server.cc:5226
          #9 0x559a7c719f04 in Log_event::apply_event(rpl_group_info*) /data/bld/11.2-asan/sql/log_event.cc:3922
          #10 0x559a7b755473 in apply_event_and_update_pos_apply /data/bld/11.2-asan/sql/slave.cc:3949
          #11 0x559a7b75620a in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*) /data/bld/11.2-asan/sql/slave.cc:4125
          #12 0x559a7b75836c in exec_relay_log_event /data/bld/11.2-asan/sql/slave.cc:4551
          #13 0x559a7b7608de in handle_slave_sql /data/bld/11.2-asan/sql/slave.cc:5788
          #14 0x559a7cb1cfed in pfs_spawn_thread /data/bld/11.2-asan/storage/perfschema/pfs.cc:2201
          #15 0x7fe20b4a8043 in start_thread nptl/pthread_create.c:442
          #16 0x7fe20b52861b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      0x6190001159d8 is located 344 bytes inside of 1040-byte region [0x619000115880,0x619000115c90)
      allocated by thread T36 here:
          #0 0x7fe20beb89cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
          #1 0x559a7d88fe35 in my_malloc /data/bld/11.2-asan/mysys/my_malloc.c:93
          #2 0x559a7d8631c6 in root_alloc /data/bld/11.2-asan/mysys/my_alloc.c:66
          #3 0x559a7d864b1f in alloc_root /data/bld/11.2-asan/mysys/my_alloc.c:332
          #4 0x559a7d8652e0 in multi_alloc_root /data/bld/11.2-asan/mysys/my_alloc.c:405
          #5 0x559a7bdd011a in copy_keys_from_share(TABLE*, st_mem_root*) /data/bld/11.2-asan/sql/table.cc:4142
          #6 0x559a7bdd1ec3 in open_table_from_share(THD*, TABLE_SHARE*, st_mysql_const_lex_string const*, unsigned int, unsigned int, unsigned int, TABLE*, bool, List<String>*) /data/bld/11.2-asan/sql/table.cc:4382
          #7 0x559a7b8347af in open_table(THD*, TABLE_LIST*, Open_table_context*) /data/bld/11.2-asan/sql/sql_base.cc:2247
          #8 0x559a7b83f9b6 in open_and_process_table /data/bld/11.2-asan/sql/sql_base.cc:4180
          #9 0x559a7b84250d in open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) /data/bld/11.2-asan/sql/sql_base.cc:4666
          #10 0x559a7b8473cd in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) /data/bld/11.2-asan/sql/sql_base.cc:5640
          #11 0x559a7b86b9fc in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int) /data/bld/11.2-asan/sql/sql_base.h:531
          #12 0x559a7c7517fb in Rows_log_event::do_apply_event(rpl_group_info*) /data/bld/11.2-asan/sql/log_event_server.cc:4972
          #13 0x559a7c719f04 in Log_event::apply_event(rpl_group_info*) /data/bld/11.2-asan/sql/log_event.cc:3922
          #14 0x559a7b755473 in apply_event_and_update_pos_apply /data/bld/11.2-asan/sql/slave.cc:3949
          #15 0x559a7b75620a in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*) /data/bld/11.2-asan/sql/slave.cc:4125
          #16 0x559a7b75836c in exec_relay_log_event /data/bld/11.2-asan/sql/slave.cc:4551
          #17 0x559a7b7608de in handle_slave_sql /data/bld/11.2-asan/sql/slave.cc:5788
          #18 0x559a7cb1cfed in pfs_spawn_thread /data/bld/11.2-asan/storage/perfschema/pfs.cc:2201
          #19 0x7fe20b4a8043 in start_thread nptl/pthread_create.c:442
       
      Thread T36 created by T34 here:
          #0 0x7fe20be49726 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
          #1 0x559a7cb18d28 in my_thread_create /data/bld/11.2-asan/storage/perfschema/my_thread.h:52
          #2 0x559a7cb1d3dc in pfs_spawn_thread_v1 /data/bld/11.2-asan/storage/perfschema/pfs.cc:2252
          #3 0x559a7b73ab11 in inline_mysql_thread_create /data/bld/11.2-asan/include/mysql/psi/mysql_thread.h:1139
          #4 0x559a7b742fbd in start_slave_thread(unsigned int, void* (*)(void*), st_mysql_mutex*, st_mysql_mutex*, st_mysql_cond*, unsigned int volatile*, unsigned long volatile*, Master_info*) /data/bld/11.2-asan/sql/slave.cc:1152
          #5 0x559a7b743aa0 in start_slave_threads(THD*, bool, bool, Master_info*, char const*, char const*, int) /data/bld/11.2-asan/sql/slave.cc:1278
          #6 0x559a7bacb163 in start_slave(THD*, Master_info*, bool) /data/bld/11.2-asan/sql/sql_repl.cc:3277
          #7 0x559a7b9f2852 in mysql_execute_command(THD*, bool) /data/bld/11.2-asan/sql/sql_parse.cc:4286
          #8 0x559a7ba0b421 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/bld/11.2-asan/sql/sql_parse.cc:7938
          #9 0x559a7b9e3023 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/bld/11.2-asan/sql/sql_parse.cc:1894
          #10 0x559a7b9dfd60 in do_command(THD*, bool) /data/bld/11.2-asan/sql/sql_parse.cc:1407
          #11 0x559a7bebb061 in do_handle_one_connection(CONNECT*, bool) /data/bld/11.2-asan/sql/sql_connect.cc:1439
          #12 0x559a7bebaa1d in handle_one_connection /data/bld/11.2-asan/sql/sql_connect.cc:1341
          #13 0x559a7cb1cfed in pfs_spawn_thread /data/bld/11.2-asan/storage/perfschema/pfs.cc:2201
          #14 0x7fe20b4a8043 in start_thread nptl/pthread_create.c:442
       
      Thread T34 created by T0 here:
          #0 0x7fe20be49726 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
          #1 0x559a7cb18d28 in my_thread_create /data/bld/11.2-asan/storage/perfschema/my_thread.h:52
          #2 0x559a7cb1d3dc in pfs_spawn_thread_v1 /data/bld/11.2-asan/storage/perfschema/pfs.cc:2252
          #3 0x559a7b6128c5 in inline_mysql_thread_create /data/bld/11.2-asan/include/mysql/psi/mysql_thread.h:1139
          #4 0x559a7b62aee8 in create_thread_to_handle_connection(CONNECT*) /data/bld/11.2-asan/sql/mysqld.cc:6241
          #5 0x559a7b62b50d in create_new_thread(CONNECT*) /data/bld/11.2-asan/sql/mysqld.cc:6303
          #6 0x559a7b62b7f8 in handle_accepted_socket(st_mysql_socket, st_mysql_socket) /data/bld/11.2-asan/sql/mysqld.cc:6365
          #7 0x559a7b62c17c in handle_connections_sockets() /data/bld/11.2-asan/sql/mysqld.cc:6489
          #8 0x559a7b62a765 in mysqld_main(int, char**) /data/bld/11.2-asan/sql/mysqld.cc:6136
          #9 0x559a7b611948 in main /data/bld/11.2-asan/sql/main.cc:34
          #10 0x7fe20b4461c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
       
      SUMMARY: AddressSanitizer: use-after-poison /data/bld/11.2-asan/storage/rocksdb/rdb_datadic.cc:3500 in myrocks::Rdb_field_packing::get_field_in_table(TABLE const*) const
      Shadow bytes around the buggy address:
        0x0c328001aae0: 00 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0c328001aaf0: f7 f7 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c328001ab00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c328001ab10: 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00 00
        0x0c328001ab20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00
      =>0x0c328001ab30: 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7[f7]f7 f7 f7 f7
        0x0c328001ab40: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0c328001ab50: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0c328001ab60: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0c328001ab70: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
        0x0c328001ab80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
      Shadow byte legend (one shadow byte represents 8 application bytes):
        Addressable:           00
        Partially addressable: 01 02 03 04 05 06 07 
        Heap left redzone:       fa
        Freed heap region:       fd
        Stack left redzone:      f1
        Stack mid redzone:       f2
        Stack right redzone:     f3
        Stack after return:      f5
        Stack use after scope:   f8
        Global redzone:          f9
        Global init order:       f6
        Poisoned by user:        f7
        Container overflow:      fc
        Array cookie:            ac
        Intra object redzone:    bb
        ASan internal:           fe
        Left alloca redzone:     ca
        Right alloca redzone:    cb
      ==846681==ABORTING
      

      Attachments

        Activity

          Upstream is not affected, they store 8 as length:

          (gdb) print new_packed_size
            $7 = 23
          (gdb)  x/23c m_sk_packed_tuple
            0x7fffe00196c0:	0 '\000'	0 '\000'	1 '\001'	6 '\006'	1 '\001'	97 'a'	97 'a'	97 'a'
            0x7fffe00196c8:	97 'a'	97 'a'	97 'a'	97 'a'	97 'a'	0 '\000'	8 '\b'	0 '\000'
            0x7fffe00196d0:	0 '\000'	0 '\000'	0 '\000'	0 '\000'	0 '\000'	0 '\000'	1 '\001'
          

          Field_blob::make_sort_key() has this piece:

                uint key_length= MY_MIN(blob_length, length);
          

          It was added by this patch:

          https://github.com/facebook/mysql-5.6/commit/fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac

          commit fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac
          Author: Manuel Ung <mung@fb.com>
          Date:   Tue Jun 14 09:50:58 2016 -0700
           
              Fix length in Field_blob::make_sort_key
              
              Summary:
              For the binary collation in Field_blob::make_sort_key, we write out the length of the entire blob, even though the key itself may be shorter. We can miss out on duplicate key errors because of this:
          ...
          

          psergei Sergei Petrunia added a comment - Upstream is not affected, they store 8 as length: (gdb) print new_packed_size $7 = 23 (gdb) x/23c m_sk_packed_tuple 0x7fffe00196c0: 0 '\000' 0 '\000' 1 '\001' 6 '\006' 1 '\001' 97 'a' 97 'a' 97 'a' 0x7fffe00196c8: 97 'a' 97 'a' 97 'a' 97 'a' 97 'a' 0 '\000' 8 '\b' 0 '\000' 0x7fffe00196d0: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 1 '\001' Field_blob::make_sort_key() has this piece: uint key_length= MY_MIN(blob_length, length); It was added by this patch: https://github.com/facebook/mysql-5.6/commit/fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac commit fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac Author: Manuel Ung <mung@fb.com> Date: Tue Jun 14 09:50:58 2016 -0700   Fix length in Field_blob::make_sort_key Summary: For the binary collation in Field_blob::make_sort_key, we write out the length of the entire blob, even though the key itself may be shorter. We can miss out on duplicate key errors because of this: ...

          Another question is, why does MyRocks not use variable-length-varchar encoding in this case?
          Consider two cases:

          create table t4 ( pk int not null primary key, b blob, key(b(100)))engine=rocksdb;
          insert into t4 values (1, 'abc');
          

          Here, secondary key length is 111 bytes.

          create table t5 ( pk int not null primary key, b varbinary(120), key(b(100)))engine=rocksdb;
          insert into t5 values (1, 'abc');
          

          Here, key length is 18 bytes. (Which IIRC is 18= (indexnr 4) + (null_flag 1) + (8 chunk_size) + (4 pk_value))

          psergei Sergei Petrunia added a comment - Another question is, why does MyRocks not use variable-length-varchar encoding in this case? Consider two cases: create table t4 ( pk int not null primary key , b blob, key (b(100)))engine=rocksdb; insert into t4 values (1, 'abc' ); Here, secondary key length is 111 bytes. create table t5 ( pk int not null primary key , b varbinary(120), key (b(100)))engine=rocksdb; insert into t5 values (1, 'abc' ); Here, key length is 18 bytes. (Which IIRC is 18= (indexnr 4) + (null_flag 1) + (8 chunk_size) + (4 pk_value))

          The above is not a critical limitation.

          psergei Sergei Petrunia added a comment - The above is not a critical limitation.

          There's a bunch of problems with the commit 87609324e0ee:

          • it breaks sys_vars.max_sort_length_func. The test result is correct, commit introduced a bug. Because Field_blob::make_sort_key works correctly, exactly as it should, although not how RocksDB wants it to. The correct fix would be to fix RocksDB not to expect Field_blob::make_sort_key to do something it should not.
          • It comes with a complex replication test, instead of a simple test, as in the comment above.
          • it changes on-disk format
          • it's in 10.3 only, so 10.2 is still not fixed

          I've fixed first two points in my commit 66750323e57 (it's only in bb-10.3-serg now, so commit hash might still change). Don't know if it's possible to avoid on-disk format changes.

          serg Sergei Golubchik added a comment - There's a bunch of problems with the commit 87609324e0ee: it breaks sys_vars.max_sort_length_func . The test result is correct, commit introduced a bug. Because Field_blob::make_sort_key works correctly, exactly as it should, although not how RocksDB wants it to. The correct fix would be to fix RocksDB not to expect Field_blob::make_sort_key to do something it should not. It comes with a complex replication test, instead of a simple test, as in the comment above. it changes on-disk format it's in 10.3 only, so 10.2 is still not fixed I've fixed first two points in my commit 66750323e57 (it's only in bb-10.3-serg now, so commit hash might still change). Don't know if it's possible to avoid on-disk format changes.

          It's not in review, as far as I can see. There is nothing to review, there was an old patch that was wrong and was reverted two years ago. Since then I don't see any other attempts to fix the bug.

          serg Sergei Golubchik added a comment - It's not in review, as far as I can see. There is nothing to review, there was an old patch that was wrong and was reverted two years ago. Since then I don't see any other attempts to fix the bug.

          People

            psergei Sergei Petrunia
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.