[MDEV-25454] Make MariaDB server UBSAN safe Created: 2021-04-19  Updated: 2024-01-03

Status: Confirmed
Project: MariaDB Server
Component/s: Server, Tests
Fix Version/s: 10.4, 10.5, 10.6

Type: Task Priority: Major
Reporter: Michael Widenius Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-21341 Fix UBSAN failures Confirmed
relates to MDEV-22742 UBSAN: Many overflow issues in string... Closed
relates to MDEV-24193 UBSAN: sql/sql_acl.cc:9985:29: runtim... Open
relates to MDEV-24198 UBSAN: sql/sql_type_int.h:91:42: runt... Open
relates to MDEV-24510 Assertion `tmp != ((long long) 0x8000... Closed
relates to MDEV-26272 The macro MASTER_INFO_VAR invokes und... Closed
relates to MDEV-26814 UBSAN: runtime error: applying non-ze... Confirmed
relates to MDEV-26817 runtime error: index 24320 out of bou... Closed
relates to MDEV-26839 UBSAN: null pointer passed as argumen... Confirmed
relates to MDEV-26840 UBSAN: load of value 3200171710, whic... Open
relates to MDEV-33157 runtime error: call to function wsrep... Closed
relates to MDEV-33158 The macro MYSQL_THDVAR_ULONG leads to... Confirmed
relates to MDEV-33159 The macro my_offsetof() invokes undef... Confirmed
relates to MDEV-33160 show_status_array() calls various fun... Closed
relates to MDEV-28374 UBSAN: runtime error: signed integer... Confirmed
relates to MDEV-29473 UBSAN: Signed integer overflow: X * Y... Closed

 Description   

The task to fix that we can run all mtr tests with a MariaDB server compiled with
UBSAN (UndefinedBehaviorSanitizer) and TSAN (ThreadSanitizer) and fix all run time errors (if possible).

gcc --fsanitize=undefined works at least with gcc 7.5.0 and up.



 Comments   
Comment by Michael Widenius [ 2021-04-19 ]

UBSAN did find a lot of potential and real errors in the MariaDB server code. Great TOOL!
Here is an extract from the commit message that explains what was fixed:

The 'special' cases where we disable, suppress or circumvent UBSAN are:

  • ref10 source (as here we intentionally do some shifts that UBSAN
    complains about.
  • x86 version of optimized int#korr() methods. UBSAN do not like unaligned
    memory access of integers. Fixed by using byte_order_generic.h when
    compiling with UBSAN
  • We use smaller thread stack with ASAN and UBSAN, which forced me to
    disable a few tests that prints the thread stack size.
  • Verifying class types does not work for shared libraries. I added
    suppression in mysql-test-run.pl for this case.
  • Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
    safe to have overflows (two cases, in item_func.cc).

Things fixed:

  • Don't left shift signed values
    (byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
  • Don't assign not non existing values to enum variables.
  • Ensure that bool and enum values are properly initialized in
    constructors. This was needed as UBSAN checks that these types has
    correct values when one copies an object.
    (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
  • Ensure we do not called handler functions on unallocated objects or
    deleted objects.
    (events.cc, sql_acl.cc).
  • Fixed bugs in Item_sp::Item_sp() where we did not call constructor
    on Query_arena object.
  • Fixed several cast of objects to an incompatible class!
    (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
    sql_select.cc ...)
  • Ensure we do not do integer arithmetic that causes over or underflows.
    This includes also ++ and – of integers.
    (Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
  • Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
    value_type is initialized to this instead of to -1, which is not a valid
    enum value for json_value_types.
  • Ensure we do not call memcpy() when second argument could be null.
  • Fixed that Item_func_str::make_empty_result() creates an empty string
    instead of a null string (safer as it ensures we do not do arithmetic
    on null strings).

Other things:

  • Changed struct st_position to an OBJECT and added an initialization
    function to it to ensure that we do not copy or use uninitialized
    members. The change to a class was also motived that we used "struct
    st_position" and POSITION randomly trough the code which was
    confusing.
  • Notably big rewrite in sql_acl.cc to avoid using deleted objects.
  • Changed in sql_partition to use '^' instead of '-'. This is safe as
    the operator is either 0 or 0x8000000000000000ULL.
  • Added check for select_nr < INT_MAX in JOIN::build_explain() to
    avoid bug when get_select() could return NULL.
  • Reordered elements in POSITION for better alignment.
  • Changed sql_test.cc::print_plan() to use pointers instead of objects.
  • Fixed bug in find_set() where could could execute '1 << -1'.
  • Added variable have_sanitizer, used by mtr. (This variable was before
    only in 10.5 and up). It can now have one of two values:
    ASAN or UBSAN.
  • Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
    it virtual. This was an effort to get UBSAN to work with loaded storage
    engines. I kept the change as the new place is better.
  • Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
    in tabutil.cpp.

Changes that should not be needed but had to be done to suppress warnings
from UBSAN:

  • Added static_cast<<uint16_t>> around shift to get rid of a LOT of
    compiler warnings when using UBSAN.
  • Had to change some '/' of 2 base integers to shift to get rid of
    some compile time warnings.
Comment by Marko Mäkelä [ 2021-04-20 ]

This should fix the bulk of the failures that are mentioned in MDEV-21341.

I tested the following:

cmake -DWITH_UBSAN=ON -DWITH_ASAN=ON …

and running with

ASAN_OPTIONS=abort_on_error=1 UBSAN_OPTIONS=halt_on_error=1 ./mtr --force --max-test-fail=0 …

I am seeing many more failures when setting those environment variables. The combination of ASAN and UBSAN seems to find more than just plain UBSAN.

Last, when clang is used, the server would still crash in static initialization, already during

mariadbd --verbose --help

which prevents any mtr tests from being run. I believe that the problem is related to my_offsetof. There are 5 source files that would fail to compile with GCC 10.2.1 if I make the following change:

diff --git a/include/my_global.h b/include/my_global.h
index e999e555bf7..186a4c51a4a 100644
--- a/include/my_global.h
+++ b/include/my_global.h
@@ -839,7 +839,7 @@ typedef long long	my_ptrdiff_t;
   and related routines are refactored.
 */
 
-#define my_offsetof(TYPE, MEMBER) PTR_BYTE_DIFF(&((TYPE *)0x10)->MEMBER, 0x10)
+#define my_offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
 
 #define NullS		(char *) 0
 

I believe that after removing the use of my_offsetof that GCC would flag invalid with that patch, we should be close to having the UBSAN build work also with clang.

Comment by Roel Van de Paar [ 2021-10-14 ]

A fix for MDEV-22742 would be greatly appreciated. UBSAN Runs are clogged up with similar issues in strings/decimal.c with different stacks, even when generalized.

Comment by Marko Mäkelä [ 2021-11-01 ]

The most prominent abuse of MY_OFFSETOF() is in the macro MASTER_INFO_VAR (MDEV-26272). In that ticket, I have posted a patch that allows a clang -fsanitize=undefined built server to bootstrap.

Comment by Roel Van de Paar [ 2021-11-29 ]

Logged TSAN safe as MDEV-27138: Make MariaDB server TSAN safe

Generated at Thu Feb 08 09:37:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.