MariaDB Server / MDEV-35691

Invalid access, use-after-free, on rli->description_event_for_exec

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.5
    • Fix Version/s: 10.5
    • Component/s: Replication

    Description

      This is a race in parallel replication. Slave worker threads are incorrectly accessing rli->relay_log.description_event_for_exec, which is not synchronised and so is not safe to access from multiple threads. This is similar to MDEV-29322; see the explanation here:

      https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/message/V6PKEFOYFC67MLPIK6DUTJSA3JALR2BB/

      This particular problem was seen in an MSAN test failure giving this error:

      CURRENT_TEST: rpl.rpl_mdev6020
      mysqltest: At line 44: failed in 'select master_pos_wait('master-bin.000002', 343, 300, '')': 2013: Lost connection to server during query
      …
      Uninitialized bytes in MemcmpInterceptorCommon at offset 0 inside [0x7110000a2b50, 3)
      ==43126==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x55da6f74325d in __interceptor_memcmp (/home/buildbot/amd64-debian-11-msan-clang-16/build/sql/mariadbd+0x7f825d) (BuildId: 47389b241f533028aa4089e9d5370350f3b1107b)
          #1 0x55da6f839509 in Version::cmp(Version const&) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event.h:2771:12
          #2 0x55da6f839509 in Version::operator>(Version const&) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event.h:2791:55
          #3 0x55da6f839509 in rpl_master_has_bug(Relay_log_info const*, unsigned int, bool, bool (*)(void const*), void const*, bool) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/slave.cc:8378:18
          #4 0x55da702c706e in Field_string::compatible_field_size(unsigned int, Relay_log_info const*, unsigned short, int*) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/field.cc:7655:7
          #5 0x55da6fea08a0 in Field::rpl_conv_type_from_same_data_type(unsigned short, Relay_log_info const*, Conv_param const&) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_utility_server.cc:421:8
          #6 0x55da6fea08a0 in Field_longstr::rpl_conv_type_from(Conv_source const&, Relay_log_info const*, Conv_param const&) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_utility_server.cc:558:12
          #7 0x55da6fea325e in can_convert_field_to(Field*, Conv_source const&, Relay_log_info const*, Conv_param const&) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_utility_server.cc:857:3
          #8 0x55da6fea325e in table_def::compatible_with(THD*, rpl_group_info*, TABLE*, TABLE**) const /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_utility_server.cc:954:30
          #9 0x55da70749b71 in Rows_log_event::do_apply_event(rpl_group_info*) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event_server.cc:5687:30
          #10 0x55da6f83d6bb in Log_event::apply_event(rpl_group_info*) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event.h:1512:10
          #11 0x55da6f82928d in apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/slave.cc:3942:19
          #12 0x55da70048337 in rpt_handle_event(rpl_parallel_thread::queued_event*, rpl_parallel_thread*) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_parallel.cc:66:8
          #13 0x55da70040323 in handle_rpl_parallel_thread /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/rpl_parallel.cc:1549:18
          #14 0x55da70c3d1b0 in pfs_spawn_thread /home/buildbot/amd64-debian-11-msan-clang-16/build/storage/perfschema/pfs.cc:2201:3
          #15 0x7fba43167ea6 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7ea6) (BuildId: 255e355c207aba91a59ae1f808e3b4da443abf0c)
          #16 0x7fba42c65ace in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfbace) (BuildId: f3654f4d10c3f54ac5dabcb8fc67a2b8d1409dc4)
        Uninitialized value was created by a heap deallocation
          #0 0x55da6f738d10 in free (/home/buildbot/amd64-debian-11-msan-clang-16/build/sql/mariadbd+0x7edd10) (BuildId: 47389b241f533028aa4089e9d5370350f3b1107b)
          #1 0x55da70769d79 in Log_event::operator delete(void*, unsigned long) /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event.h:1389:5
          #2 0x55da70769d79 in Format_description_log_event::~Format_description_log_event() /home/buildbot/amd64-debian-11-msan-clang-16/build/sql/log_event.h:2844:3
      

      The function rpl_master_has_bug() ends up getting called from a worker thread and accesses the description_event_for_exec:

        const Version &master_ver=
          rli->relay_log.description_event_for_exec->server_version_split;
      

      This is wrong, as the description_event_for_exec changes whenever the SQL driver thread encounters a binlog rotation from the master. So the worker threads will potentially use the wrong format description event or, as here, access freed memory trying to use an old one.
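
      The lifetime problem can be modelled in a few lines. This is only a toy sketch: FormatDescription, DriverState and WorkerView are hypothetical stand-ins, not MariaDB classes, and a std::weak_ptr is used purely so the stale state can be observed safely. In the server the worker reads through an unsynchronised pointer, where the same sequence of events is a use-after-free.

```cpp
#include <memory>
#include <string>

// Toy stand-in for a format description event (hypothetical type).
struct FormatDescription {
    std::string server_version;
};

// Toy stand-in for the state owned by the SQL driver thread.
struct DriverState {
    std::shared_ptr<FormatDescription> description_event_for_exec;

    // On a binlog rotation from the master, the old event is dropped
    // and a new one takes its place.
    void rotate(const std::string& new_version) {
        description_event_for_exec =
            std::make_shared<FormatDescription>(FormatDescription{new_version});
    }
};

// What a worker "remembers" about the event that was current when its
// work was queued. The weak_ptr only makes the expiry observable.
using WorkerView = std::weak_ptr<FormatDescription>;

// True if the event the worker started with has been freed or replaced.
bool worker_view_is_stale(const WorkerView& view, const DriverState& driver) {
    std::shared_ptr<FormatDescription> seen = view.lock();
    return !seen || seen != driver.description_event_for_exec;
}
```

      After a call to rotate(), a view taken before the rotation reports as stale: the object it referred to is gone, and the driver's current event describes a different binlog.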

      Accessing potentially freed memory is somewhat serious. But it looks like this is only doing a memcmp() of size 3 on potentially re-used memory, so it may not cause any real problems in practice.

      As for a fix, the rpl_master_has_bug() function seems like it should maybe be replaced/removed. It iterates over a linear list looking for a specific bug number, and this seems to be done for every field (of string type?) of each row event applied by the slave, which is surely overly expensive - apart from being wrong in accessing the description_event_for_exec, which is owned by a different thread.

      This particular call, rpl_master_has_bug(37426 ...), is for a bug in MySQL 5.1.28 from 2008, so it is surely no longer needed.
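
      For illustration, the kind of lookup described above can be sketched as follows. This is a simplified model, not the code in slave.cc: the bug numbers and version ranges in the table are made up, and master_has_bug() is a hypothetical name. The point is that every call walks the whole table and does 3-byte memcmp() comparisons against the master version.

```cpp
#include <cstring>

// Three-byte version (major, minor, patch), compared with memcmp().
struct Version {
    unsigned char v[3];
    int cmp(const Version& other) const { return std::memcmp(v, other.v, 3); }
};

struct BugRange {
    unsigned bug_id;
    Version introduced_in;  // first affected version
    Version fixed_in;       // first fixed version
};

// Hypothetical entries; none of these are real bug numbers or ranges.
static const BugRange kKnownBugs[] = {
    {1001, {{5, 0, 10}}, {{5, 0, 20}}},
    {1002, {{5, 1, 0}},  {{5, 1, 30}}},
    {1003, {{5, 1, 5}},  {{5, 1, 8}}},
};

// Linear scan: cost is paid again on every call, for every field checked.
bool master_has_bug(const Version& master, unsigned bug_id) {
    for (const BugRange& b : kKnownBugs) {
        if (b.bug_id == bug_id &&
            b.introduced_in.cmp(master) <= 0 &&
            b.fixed_in.cmp(master) > 0) {
            return true;
        }
    }
    return false;
}
```

      In this model the master version is passed in by value. In the server it is read through rli->relay_log.description_event_for_exec on each call, which is where the worker thread touches memory it does not own.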

      More generally, it must be wrong to memcmp() the version in format description event repeatedly when applying different events. Instead, when reading the format description event, it should check at that point whether the originating master version has each specific bug of interest and set a bit in a bitmap accordingly. And then when reading other events, any bug of interest for that event should be copied into a flag in that event.

      This way, the format description event will be used correctly to determine how to read each event, and the event will after that be self-describing and can be applied without needing to reference the format description event. And we avoid the overhead of iterating over a list of ancient bug work-arounds during the execution of normal events.
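
      A minimal sketch of that scheme, under stated assumptions: all names here (MasterBug, compute_master_bugs, read_rows_event and so on) are hypothetical, the bug flags and version ranges are invented, and the real event classes carry far more state. The version comparison happens once per format description event, and each later event only carries the bits it needs.

```cpp
#include <cstdint>
#include <cstring>

struct Version {
    unsigned char v[3];  // major, minor, patch
};

inline bool version_less(const Version& a, const Version& b) {
    return std::memcmp(a.v, b.v, 3) < 0;
}

// One bit per master bug of interest (hypothetical bugs).
enum MasterBug : std::uint32_t {
    BUG_A = 1u << 0,
    BUG_B = 1u << 1,
};

struct BugRange {
    std::uint32_t flag;
    Version introduced_in;  // first affected version
    Version fixed_in;       // first fixed version
};

// Hypothetical ranges, for illustration only.
static const BugRange kBugRanges[] = {
    {BUG_A, {{5, 1, 0}}, {{5, 1, 20}}},
    {BUG_B, {{5, 0, 0}}, {{5, 2, 0}}},
};

// Run once, when the format description event is read.
std::uint32_t compute_master_bugs(const Version& master) {
    std::uint32_t bits = 0;
    for (const BugRange& r : kBugRanges) {
        if (!version_less(master, r.introduced_in) &&
            version_less(master, r.fixed_in)) {
            bits |= r.flag;
        }
    }
    return bits;
}

struct FormatDescription {
    Version server_version;
    std::uint32_t master_bugs;  // bitmap computed at read time
};

FormatDescription read_format_description(const Version& v) {
    return FormatDescription{v, compute_master_bugs(v)};
}

// Bugs that matter when applying a rows event (hypothetical selection).
constexpr std::uint32_t kRowsEventBugs = BUG_A;

struct RowsEvent {
    std::uint32_t bug_flags;  // copied in when the event is read
};

// Called where the event is read, with the format description in hand.
RowsEvent read_rows_event(const FormatDescription& fd) {
    return RowsEvent{fd.master_bugs & kRowsEventBugs};
}

// Called by the worker at apply time; touches only the event itself.
bool needs_workaround(const RowsEvent& ev, std::uint32_t bug) {
    return (ev.bug_flags & bug) != 0;
}
```

      With this shape, a worker applying a RowsEvent never dereferences the format description event, so a later rotation in the driver thread cannot invalidate anything the worker reads.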

          People

            Assignee: Andrei Elkin (Elkin)
            Reporter: Kristian Nielsen (knielsen)