[MDEV-26272] The macro MASTER_INFO_VAR invokes undefined behaviour Created: 2021-07-29  Updated: 2024-01-03  Resolved: 2023-10-30

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Brandon Nesterenko
Resolution: Fixed Votes: 0
Labels: UBSAN, affects-tests

Issue Links:
Blocks
blocks MDEV-33157 runtime error: call to function wsrep... Closed
blocks MDEV-33158 The macro MYSQL_THDVAR_ULONG leads to... Confirmed
blocks MDEV-33159 The macro my_offsetof() invokes undef... Confirmed
blocks MDEV-33160 show_status_array() calls various fun... Closed
blocks MDEV-33161 my_hash_walk_action leads to undefine... Confirmed
Relates
relates to MDEV-31802 GCC 13: sql/sys_vars.inl: runtime err... Open
relates to MDEV-25454 Make MariaDB server UBSAN safe Confirmed
relates to MDEV-26814 UBSAN: runtime error: applying non-ze... Confirmed
relates to MDEV-27355 Fix UBSAN's Findings in Replication Closed

 Description   

Since MariaDB 10.4.19 most tests succeed when the code is compiled with GCC 9 or 10 and cmake -DWITH_UBSAN (and UBSAN is not instructed to crash on errors by setting an environment variable). But, a clang build would crash on bootstrap, due to dereferencing an invalid constant pointer:

10.4

UndefinedBehaviorSanitizer:DEADLYSIGNAL
==619510==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000010 (pc 0x0000006de2b4 bp 0x000200000000 sp 0x7ffe3fc3bbc0 T619510)
==619510==The signal is caused by a READ memory access.
==619510==Hint: address points to the zero page.
    #0 0x6de2b4 in __cxx_global_var_init.1130 /mariadb/10.4/sql/sys_vars.cc:5256:8
    #1 0x6de2b4 in _GLOBAL__sub_I_sys_vars.cc /mariadb/10.4/sql/sys_vars.cc
    #2 0x23fc874 in __libc_csu_init (/dev/shm/10.4/sql/mysqld+0x23fc874)
    #3 0x7fdedcfb8c99 in __libc_start_main csu/../csu/libc-start.c:264:6
    #4 0x6ee309 in _start (/dev/shm/10.4/sql/mysqld+0x6ee309)

The crashing instruction is attempting to dereference an invalid constant pointer:

   0x00000000006de295 <+36517>:	movabs $0x9ddfea08eb382d69,%r14

If applying any offset to the null pointer is undefined behaviour, then surely also applying an offset to an invalid constant pointer should be undefined behaviour:

#define MASTER_INFO_VAR(X) my_offsetof(Master_info, X), sizeof(((Master_info *)0x10)->X)
#define my_offsetof(TYPE, MEMBER) PTR_BYTE_DIFF(&((TYPE *)0x10)->MEMBER, 0x10)

(Note: offsetof is defined for POD, but not for C++ objects. Any use of the macro my_offsetof on C++ objects ought to be undefined behaviour.

The following patch, which removes all use of MASTER_INFO_VAR but not my_offsetof, allows the bootstrap to succeed on clang:

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index ba1d477882f..90064ac94f5 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -8991,7 +8991,7 @@ static int get_options(int *argc_ptr, char ***argv_ptr)
   /* Remember if max_user_connections was 0 at startup */
   max_user_connections_checking= global_system_variables.max_user_connections != 0;
 
-#ifdef HAVE_REPLICATION
+#if 0 // ifdef HAVE_REPLICATION
   {
     sys_var *max_relay_log_size_var, *max_binlog_size_var;
     /* If max_relay_log_size is 0, then set it to max_binlog_size */
diff --git a/sql/rpl_rli.cc b/sql/rpl_rli.cc
index c0ade40b461..5d41552df1b 100644
--- a/sql/rpl_rli.cc
+++ b/sql/rpl_rli.cc
@@ -82,7 +82,9 @@ Relay_log_info::Relay_log_info(bool is_slave_recovery)
   group_relay_log_name[0]= event_relay_log_name[0]=
     group_master_log_name[0]= 0;
   until_log_name[0]= ign_master_log_name_end[0]= 0;
+#if 0
   max_relay_log_size= global_system_variables.max_relay_log_size;
+#endif
   bzero((char*) &info_file, sizeof(info_file));
   bzero((char*) &cache_buf, sizeof(cache_buf));
   mysql_mutex_init(key_relay_log_info_run_lock, &run_lock, MY_MUTEX_INIT_FAST);
diff --git a/sql/sql_class.h b/sql/sql_class.h
index 8e2b6eed338..7a5a886a814 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -595,12 +595,14 @@ typedef struct system_variables
   ulonglong default_regex_flags;
   ulonglong max_mem_used;
 
+#if 0
   /**
      Place holders to store Multi-source variables in sys_var.cc during
      update and show of variables.
   */
   ulonglong slave_skip_counter;
   ulonglong max_relay_log_size;
+#endif
 
   ha_rows select_limit;
   ha_rows max_join_size;
diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
index c52a8f742a8..05dd872796c 100644
--- a/sql/sys_vars.cc
+++ b/sql/sys_vars.cc
@@ -5215,6 +5215,7 @@ bool update_multi_source_variable(sys_var *self_var, THD *thd,
   return result;
 }
 
+#if 0
 static bool update_slave_skip_counter(sys_var *self, THD *thd, Master_info *mi)
 {
   if (mi->rli.slave_running)
@@ -5272,6 +5273,7 @@ static Sys_var_multi_source_ulonglong Sys_max_relay_log_size(
        MASTER_INFO_VAR(rli.max_relay_log_size),
        VALID_RANGE(0, 1024L*1024*1024), DEFAULT(0), BLOCK_SIZE(IO_SIZE),
        ON_UPDATE(update_max_relay_log_size));
+#endif
 
 static Sys_var_charptr Sys_slave_skip_errors(
        "slave_skip_errors", "Tells the slave thread to continue "

With this patch, clang-12 is reporting some errors that are not reported by GCC 10.2.1:

main.func_math                           [ fail ]  Found warnings/errors in server log file!
        Test ended at 2021-07-29 10:12:13
line
/mariadb/10.4/strings/ctype.c:1151:46: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/strings/ctype.c:1151:46 in 
/mariadb/10.4/sql/sql_select.cc:3766:22: runtime error: applying non-zero offset 4020089388120 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/sql/sql_select.cc:3766:22 in 
/mariadb/10.4/sql/sql_select.cc:3225:32: runtime error: applying non-zero offset 936 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/sql/sql_select.cc:3225:32 in 
/mariadb/10.4/sql/sql_show.cc:3828:7: runtime error: call to function rpl_semi_sync_master_show_clients(THD*, st_mysql_show_var*, char*) through pointer to incorrect function type 'int (*)(THD *, st_mysql_show_var *, void *, system_status_var *, enum_var_type)'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/sql/sql_show.cc:3828:7 in 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/strings/decimal.c:1161:8 in 
/mariadb/10.4/strings/json_lib.c:1688:26: runtime error: index -1 out of bounds for type 'json_path_step_t [32]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /mariadb/10.4/strings/json_lib.c:1688:26 in 
^ Found warnings in /dev/shm/10.4/mysql-test/var/log/mysqld.1.err



 Comments   
Comment by Marko Mäkelä [ 2021-11-24 ]

Because my_offsetof() appears to exercise undefined pointer arithmetics, we’d better replace it with the standard macro offsetof() where possible:

  • MDL_key has standard layout
  • MY_LOCALE turns out to have standard layout as well
  • PFS_events_statements was trivially transformed to standard layout by moving the base class to a data member.

With a small test program you can easily experiment when offsetof is not allowed. https://godbolt.org offers convenient access to many compilers, including GCC 4.8.5, which is the oldest that we currently support.

#include <type_traits>
 
class B {
    int m1;
public:
    int m2= 1;
    B(): m1(2) {}
    bool eq() const { return m1==m2; }
    static int o2() { return __builtin_offsetof(B, m2); }
};
 
int f()
{
    return std::is_standard_layout<B>::value;
}
 
int g() { return B::o2(); }

Apart from Master_info, which is the antagonist in this bug, we have a number of data types on which my_offsetof() is being invoked and offsetof() cannot be invoked due to non-standard layout:

  • Table_triggers_list
  • TABLE_LIST
  • ACL_ROLE
  • ACL_USER_BASE

These classes either derive from a base class (similar to how PFS_events_statements did before my cleanup) or they encapsulate nonstandard-layout objects, such as List<String>.

Comment by Brandon Nesterenko [ 2023-05-22 ]

Raised the priority to critical as MTR cannot run at all with Clang/UBSAN

Comment by Brandon Nesterenko [ 2023-06-09 ]

Hi Andrei!

This is ready for review: PR 2665

Comment by Brandon Nesterenko [ 2023-10-30 ]

Fixed in 10.4 as e52777f1a

No merge conflicts observed when cherry-picking into 11.3.

Generated at Thu Feb 08 09:44:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.