Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33755

Some tests cause MSAN alarms due to uninitialized Item_func::not_null_tables_cache

Details

    • Bug
    • Status: In Review (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5
    • 10.5
    • None
    • None

    Description

      This was found as part of MDEV-33478. Reproducible under MSAN of clang-16 or higher.

      $ LD_LIBRARY_PATH="$HOME"/msan-libs ./mtr main.subselect_sj
      <...>
      mysqltest: At line 80: query 'explain extended
      select * from t1 left join t2 on (t2.a= t1.a and t2.a in (select pk from t10))' failed: 2013: Lost connection to MySQL server during query
       
       
      Server [mysqld.1 - pid: 42990, winpid: 42990, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      $ /home/oleg/server/build/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/home/oleg/server/build/mysql-test/var/my.cnf --log-output=file --core-file --loose-debug-sync-timeout=300
      <...>
      Version: '10.5.25-MariaDB-debug-log'  socket: '/home/oleg/server/build/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
      ==42991==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x5611105e3d91 in Item_func::not_null_tables() const /home/oleg/server/sql/item_func.cc:632:3
          #1 0x56111057e436 in Item_cond::fix_fields(THD*, Item**) /home/oleg/server/sql/item_cmpfunc.cc:5001:38
          #2 0x56110fd0d314 in add_cond_and_fix(THD*, Item**, Item*) /home/oleg/server/sql/sql_select.cc:11318:12
          #3 0x56110fd0d314 in make_join_select(JOIN*, SQL_SELECT*, Item*) /home/oleg/server/sql/sql_select.cc:12314:13
          #4 0x56110fce0032 in JOIN::optimize_stage2() /home/oleg/server/sql/sql_select.cc:2663:7
          #5 0x56110fce9277 in JOIN::optimize_inner() /home/oleg/server/sql/sql_select.cc:2412:9
          #6 0x56110fcdd3c1 in JOIN::optimize() /home/oleg/server/sql/sql_select.cc:1740:10
          #7 0x56110fcc7577 in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /home/oleg/server/sql/sql_select.cc:4868:19
          #8 0x56110fdb7ec9 in mysql_explain_union(THD*, st_select_lex_unit*, select_result*) /home/oleg/server/sql/sql_select.cc:28121:10
          #9 0x56110fc3536c in execute_sqlcom_select(THD*, TABLE_LIST*) /home/oleg/server/sql/sql_parse.cc:6356:12
          #10 0x56110fc1ec65 in mysql_execute_command(THD*) /home/oleg/server/sql/sql_parse.cc:4022:12
          #11 0x56110fc0d1fb in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /home/oleg/server/sql/sql_parse.cc:8196:18
          #12 0x56110fc048b3 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /home/oleg/server/sql/sql_parse.cc:1891:7
          #13 0x56110fc0e50c in do_command(THD*) /home/oleg/server/sql/sql_parse.cc:1375:17
          #14 0x561110059471 in do_handle_one_connection(CONNECT*, bool) /home/oleg/server/sql/sql_connect.cc:1415:11
          #15 0x561110058fbb in handle_one_connection /home/oleg/server/sql/sql_connect.cc:1317:5
          #16 0x561110dcfab0 in pfs_spawn_thread /home/oleg/server/storage/perfschema/pfs.cc:2201:3
          #17 0x7f02f33ce133 in start_thread nptl/pthread_create.c:442:8
          #18 0x7f02f344e7db in clone3 misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
        Memory was marked as uninitialized
          #0 0x56110f88308d in __msan_allocated_memory (/home/oleg/server/build/sql/mariadbd+0x76008d) (BuildId: ef49369ab45d1225724bdc731e5cb048238867a4)
          #1 0x561111b79e06 in my_malloc /home/oleg/server/mysys/my_malloc.c:114:7
       
      SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/oleg/server/sql/item_func.cc:632:3 in Item_func::not_null_tables() const
      

      Attachments

        Issue Links

          Activity

            oleg.smirnov Oleg Smirnov added a comment - - edited

            This issue is similar to MDEV-33665 in terms of Item_func::not_null_tables_cache is left uninitialized after quick_fix_field() call. In contrast to MDEV-33665, here we deal with another descendant of Item_func: Item_func_trig_cond. So this solution works, as well as overriding quick_fix_field() in Item_func_trig_cond:

            diff --git a/sql/item_cmpfunc.h b/sql/item_cmpfunc.h
            index c35a6ec30fc..8a704ae2ac2 100644
            --- a/sql/item_cmpfunc.h
            +++ b/sql/item_cmpfunc.h
            @@ -674,16 +674,22 @@ class Item_func_trig_cond: public Item_bool_func
             public:
            +  void quick_fix_field() override
            +  {
            +    Item_bool_func::quick_fix_field();
            +    eval_not_null_tables(nullptr);
            +  }
            

            Local patching similar to the one implemented at MDEV-33655 helps also:

            diff --git a/sql/sql_select.cc b/sql/sql_select.cc
            index aac35d6ef17..0ad0a620f07 100644
            --- a/sql/sql_select.cc
            +++ b/sql/sql_select.cc
            @@ -12339,7 +12339,10 @@ make_join_select(JOIN *join,SQL_SELECT *select,COND *cond)
                         DBUG_PRINT("info", ("Item_func_trig_cond %p",
                                             tmp_cond));
                         if (tmp_cond)
            +            {
                           tmp_cond->quick_fix_field();
            +              tmp_cond->eval_not_null_tables(nullptr);
            +            }
                        /* Add the predicate to other pushed down predicates */
                         DBUG_PRINT("info", ("Item_cond_and"));
                         *sel_cond_ref= !(*sel_cond_ref) ? 
            

            oleg.smirnov Oleg Smirnov added a comment - - edited This issue is similar to MDEV-33665 in terms of Item_func::not_null_tables_cache is left uninitialized after quick_fix_field() call. In contrast to MDEV-33665 , here we deal with another descendant of Item_func : Item_func_trig_cond . So this solution works, as well as overriding quick_fix_field() in Item_func_trig_cond : diff --git a/sql/item_cmpfunc.h b/sql/item_cmpfunc.h index c35a6ec30fc..8a704ae2ac2 100644 --- a/sql/item_cmpfunc.h +++ b/sql/item_cmpfunc.h @@ -674,16 +674,22 @@ class Item_func_trig_cond: public Item_bool_func public: + void quick_fix_field() override + { + Item_bool_func::quick_fix_field(); + eval_not_null_tables(nullptr); + } Local patching similar to the one implemented at MDEV-33655 helps also: diff --git a/sql/sql_select.cc b/sql/sql_select.cc index aac35d6ef17..0ad0a620f07 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -12339,7 +12339,10 @@ make_join_select(JOIN *join,SQL_SELECT *select,COND *cond) DBUG_PRINT("info", ("Item_func_trig_cond %p", tmp_cond)); if (tmp_cond) + { tmp_cond->quick_fix_field(); + tmp_cond->eval_not_null_tables(nullptr); + } /* Add the predicate to other pushed down predicates */ DBUG_PRINT("info", ("Item_cond_and")); *sel_cond_ref= !(*sel_cond_ref) ?
            oleg.smirnov Oleg Smirnov added a comment - - edited

            The following patch fixes 28 tests from the main suite:

            diff --git a/sql/item_func.cc b/sql/item_func.cc
            index f6fa613cba6..1ea998310ae 100644
            --- a/sql/item_func.cc
            +++ b/sql/item_func.cc
            @@ -407,6 +407,7 @@ Item_func::quick_fix_field()
                     (*arg)->quick_fix_field();
                 }
               }
            +  eval_not_null_tables(nullptr);
               fixed= 1;
             }
            

            As an option, eval_not_null_tables(nullptr) can be added after each call to quick_fix_field() as it's done at MDEV-33665 where the item is a descendant of the classs Item_func.

            oleg.smirnov Oleg Smirnov added a comment - - edited The following patch fixes 28 tests from the main suite: diff --git a/sql/item_func.cc b/sql/item_func.cc index f6fa613cba6..1ea998310ae 100644 --- a/sql/item_func.cc +++ b/sql/item_func.cc @@ -407,6 +407,7 @@ Item_func::quick_fix_field() (*arg)->quick_fix_field(); } } + eval_not_null_tables(nullptr); fixed= 1; } As an option, eval_not_null_tables(nullptr) can be added after each call to quick_fix_field() as it's done at MDEV-33665 where the item is a descendant of the classs Item_func .
            oleg.smirnov Oleg Smirnov added a comment -

            One more idea from Monty: initialize not_null_tables_cache with 0 during construction but mark the memory as undefined: MEM_UNDEFINED(&value, sizeof(value)).
            To silence MSAN but enable valgrind to detect undefined memory usage the following macro can be employed:

            #ifdef HAVE_MEM_CHECK
            MEM_UNDEFINED(&value, sizeof(value))
            #endif
            

            So only valgrind which is smarter will see this memory as undefined but not MSAN (no there'll be no false alarms). A new macro MEM_UNDEFINED_VALGRIND(&value, sizeof(value)) can be introduced for that purpose.

            oleg.smirnov Oleg Smirnov added a comment - One more idea from Monty: initialize not_null_tables_cache with 0 during construction but mark the memory as undefined: MEM_UNDEFINED(&value, sizeof(value)) . To silence MSAN but enable valgrind to detect undefined memory usage the following macro can be employed: #ifdef HAVE_MEM_CHECK MEM_UNDEFINED(&value, sizeof(value)) #endif So only valgrind which is smarter will see this memory as undefined but not MSAN (no there'll be no false alarms). A new macro MEM_UNDEFINED_VALGRIND(&value, sizeof(value)) can be introduced for that purpose.
            oleg.smirnov Oleg Smirnov added a comment -

            monty, I implemented your proposed solution. Please review the pull request

            oleg.smirnov Oleg Smirnov added a comment - monty , I implemented your proposed solution. Please review the pull request
            danblack Daniel Black added a comment -

            psergei showed in https://jira.mariadb.org/browse/MDEV-33478?focusedCommentId=281579&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-281579 that it wasn't a false positive. There was really a path where it wasn't set to a value.

            The "recent implementations of MemorySanitizer have added a caller-site check" that Marko mentioned in MDEV-33478 which is why its new.

            If MSAN caught it then silencing its warning an allowing valgrind which assumedly missed it isn't a good solution.

            danblack Daniel Black added a comment - psergei showed in https://jira.mariadb.org/browse/MDEV-33478?focusedCommentId=281579&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-281579 that it wasn't a false positive. There was really a path where it wasn't set to a value. The "recent implementations of MemorySanitizer have added a caller-site check" that Marko mentioned in MDEV-33478 which is why its new. If MSAN caught it then silencing its warning an allowing valgrind which assumedly missed it isn't a good solution.
            oleg.smirnov Oleg Smirnov added a comment -

            danblack, let me try to explain the rationale.

            We did not initialize not_null_tables_cache, firstly, because we didn't want to do useless work during the construction and secondly, to be able to catch accessing uninitialized not_null_tables_cache with memory sanitizers. If we set it to 0 during the construction, we lose this ability.

            However, new MSAN issues errors not only on accessing uninitialized variables, but also on copying of uninitialized values. We don't see such copying as problem. If neither original Item_func object accesses not_null_tables_cache nor its copy does, why would we want to see those MSAN errors?

            At the same time, Valgrind is smarter than MSAN in this sense and it does not complain on copying of uninitialized values until they are accessed. That is why we decided to silence only MSAN leaving Valgrind to do its work.

            Does it sound reasonable?

            oleg.smirnov Oleg Smirnov added a comment - danblack , let me try to explain the rationale. We did not initialize not_null_tables_cache, firstly, because we didn't want to do useless work during the construction and secondly, to be able to catch accessing uninitialized not_null_tables_cache with memory sanitizers. If we set it to 0 during the construction, we lose this ability. However, new MSAN issues errors not only on accessing uninitialized variables, but also on copying of uninitialized values. We don't see such copying as problem. If neither original Item_func object accesses not_null_tables_cache nor its copy does, why would we want to see those MSAN errors? At the same time, Valgrind is smarter than MSAN in this sense and it does not complain on copying of uninitialized values until they are accessed. That is why we decided to silence only MSAN leaving Valgrind to do its work. Does it sound reasonable?
            danblack Daniel Black added a comment -

            I understand the will not to initialize on construction because there was supposed to be always a path to initialize it. Your's and psergei shows there was other locations to initialize it, but decided against it for one reason or another. A constructor initialization was almost accepted.

            The MSAN showed that the copy was accessing not_null_tables_cache uninitialized in the function Item_func::not_null_tables very clearly in:

            10.6 53c6c823dc7cafefffdc93c79661cfb146ff8641 MDEV-33478

            #0  0x000055c776594018 in __msan_warning_with_origin_noreturn ()
            #1  0x000055c77688bae2 in Item_func::not_null_tables (this=<optimized out>) at /mariadb/10.6/sql/item_func.cc:624
            #2  0x000055c7767b42d7 in Item_cond::eval_not_null_tables (this=<optimized out>, opt_arg=<optimized out>) at /mariadb/10.6/sql/item_cmpfunc.cc:5187
            #3  0x000055c7767b3708 in Item_cond::fix_fields (this=0x711000075970, thd=0x72b00004d018, ref=<optimized out>) at /mariadb/10.6/sql/item_cmpfunc.cc:5078
            #4  0x000055c777218e1c in make_cond_for_table_from_pred (thd=0x72b00004d018, root_cond=0x711000075470, cond=0x711000075470, tables=13835058055282163713, used_table=used_table@entry=1, 
                join_tab_idx_arg=join_tab_idx_arg@entry=0, exclude_expensive_cond=false, retain_ref_cond=<optimized out>, is_top_and_level=<optimized out>) at /mariadb/10.6/sql/sql_select.cc:23938
            

            Saying valgrind is smarter because it doesn't error is just false. If fact that valgrind didn't error is a deficiency in its capability because it should have caught the same error.

            So the proposed current patch adds a VALGRIND_MEM_UNDEFINED so it will error at the same location when valgrind gets the capability? Its just setting a trap for the future.

            If there's parts in the the optimizer that invalidate not_null_table_cache then call MEM_UNDEFINED at those locations like Item_func::quick_fix_field() (from PR description) and let all memory sanitizers find the errors to the best of their capability.

            danblack Daniel Black added a comment - I understand the will not to initialize on construction because there was supposed to be always a path to initialize it. Your's and psergei shows there was other locations to initialize it, but decided against it for one reason or another. A constructor initialization was almost accepted. The MSAN showed that the copy was accessing not_null_tables_cache uninitialized in the function Item_func::not_null_tables very clearly in: 10.6 53c6c823dc7cafefffdc93c79661cfb146ff8641 MDEV-33478 #0 0x000055c776594018 in __msan_warning_with_origin_noreturn () #1 0x000055c77688bae2 in Item_func::not_null_tables (this=<optimized out>) at /mariadb/10.6/sql/item_func.cc:624 #2 0x000055c7767b42d7 in Item_cond::eval_not_null_tables (this=<optimized out>, opt_arg=<optimized out>) at /mariadb/10.6/sql/item_cmpfunc.cc:5187 #3 0x000055c7767b3708 in Item_cond::fix_fields (this=0x711000075970, thd=0x72b00004d018, ref=<optimized out>) at /mariadb/10.6/sql/item_cmpfunc.cc:5078 #4 0x000055c777218e1c in make_cond_for_table_from_pred (thd=0x72b00004d018, root_cond=0x711000075470, cond=0x711000075470, tables=13835058055282163713, used_table=used_table@entry=1, join_tab_idx_arg=join_tab_idx_arg@entry=0, exclude_expensive_cond=false, retain_ref_cond=<optimized out>, is_top_and_level=<optimized out>) at /mariadb/10.6/sql/sql_select.cc:23938 Saying valgrind is smarter because it doesn't error is just false. If fact that valgrind didn't error is a deficiency in its capability because it should have caught the same error. So the proposed current patch adds a VALGRIND_MEM_UNDEFINED so it will error at the same location when valgrind gets the capability? Its just setting a trap for the future. If there's parts in the the optimizer that invalidate not_null_table_cache then call MEM_UNDEFINED at those locations like Item_func::quick_fix_field() (from PR description) and let all memory sanitizers find the errors to the best of their capability.

            People

              monty Michael Widenius
              oleg.smirnov Oleg Smirnov
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.