Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12679

purge_sys_t::~purge_sys_t(): Assertion `latch.magic_n == 0' failed on --bootstrap (CMAKE_BUILD_TYPE=Debug)

Details

    Description

      10.2 at commit ce8ee7d90b4f7dd784e1d7587093e671b35303f0

      When compiled on Power with CMAKE_BUILD_TYPE=Debug, this fails on all test instigations.

       cmake ../mariadb-10.2 -DCMAKE_BUILD_TYPE=Debug  -DCMAKE_C_COMPILER=/opt/at10.0/bin/gcc -DCMAKE_CXX_COMPILER=/opt/at10.0/bin/g++ '-DCMAKE_C_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_C_FLAGS=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS=-m64 -O3 -g -mtune=native' -DWITH_SSL=system -DCMAKE_INSTALL_PREFIX=/mysql/mariadb-upstream-Debug-10.0-at10.0-rc1  -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DWITHOUT_FEDERATED_STORAGE_ENGINE=1 -DWITHOUT_FEDERATEDX_STORAGE_ENGINE=1 -DWITHOUT_TOKUDB_STORAGE_ENGINE=1
      

      Same assertion fail with gcc-5 and gcc-6

      $ gcc-5 --version
      gcc-5 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
      $ gcc-6 --version
      gcc-6 (Ubuntu 6.3.0-12ubuntu1.1) 6.3.0 20170406
      

      (cd mysql-test/; ./mtr )
       
      2017-05-04 12:44:40 70367348770832 [Note] Plugin 'INNODB_SYS_VIRTUAL' is disabled.
      2017-05-04 12:44:40 70367348770832 [Note] Plugin 'INNODB_TABLESPACES_SCRUBBING' is disabled.
      mysqld: /home/danielgb/mariadb-10.2/storage/innobase/trx/trx0purge.cc:235: purge_sys_t::~purge_sys_t(): Assertion `latch.magic_n == 0' failed.
      170504 12:44:42 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.2.6-MariaDB-debug
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467287 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(my_print_stacktrace+0x60)[0x457f4f60]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(handle_fatal_signal+0x440)[0x44fc76e0]
      [0x3fffaccc04d8]
      /lib/powerpc64le-linux-gnu/libc.so.6(gsignal+0x40)[0x3fffac19edb0]
      /lib/powerpc64le-linux-gnu/libc.so.6(abort+0x2b4)[0x3fffac1a1314]
      /lib/powerpc64le-linux-gnu/libc.so.6(+0x34290)[0x3fffac194290]
      /lib/powerpc64le-linux-gnu/libc.so.6(__assert_fail+0x64)[0x3fffac194384]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(+0xbb9d64)[0x453f9d64]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(+0xb90318)[0x453d0318]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(+0x9cb8ec)[0x4520b8ec]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(_Z22ha_finalize_handlertonP13st_plugin_int+0x8c)[0x44fcb61c]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(+0x54c4cc)[0x44d8c4cc]
      mysys/stacktrace.c:267(my_print_stacktrace)[0x44d94728]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(_Z15plugin_shutdownv+0x5b4)[0x44d95de4]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(+0x45fb60)[0x44c9fb60]
      trx/trx0purge.cc:235(purge_sys_t::~purge_sys_t())[0x44ca3890]
      /home/danielgb/build-mariadb-10.2-gcc/sql/mysqld(_Z11mysqld_mainiPPc+0x10b8)[0x44cadb58]
      srv/srv0start.cc:2834(innodb_shutdown())[0x44c827e8]
      /lib/powerpc64le-linux-gnu/libc.so.6(+0x2309c)[0x3fffac18309c]
      /lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fffac183298]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      Writing a core file
      

      mysqld: /home/danielgb/mariadb-10.2/storage/innobase/trx/trx0purge.cc:235: purge_sys_t::~purge_sys_t(): Assertion `latch.magic_n == 0' failed.
       
      Thread 1 "mysqld" received signal SIGABRT, Aborted.
      0x00003fffb745edb0 in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:54
      54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      (gdb) bt
      #0  0x00003fffb745edb0 in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:54
      #1  0x00003fffb7461270 in __GI_abort () at abort.c:89
      #2  0x00003fffb7454290 in __assert_fail_base (fmt=<optimized out>, assertion=0x211bb570 "latch.magic_n == 0", 
          file=0x211bb500 "/home/danielgb/mariadb-10.2/storage/innobase/trx/trx0purge.cc", line=<optimized out>, function=<optimized out>) at assert.c:92
      #3  0x00003fffb7454384 in __GI___assert_fail (assertion=0x211bb570 "latch.magic_n == 0", file=0x211bb500 "/home/danielgb/mariadb-10.2/storage/innobase/trx/trx0purge.cc", 
          line=<optimized out>, function=0x211bab40 <purge_sys_t::~purge_sys_t()::__PRETTY_FUNCTION__> "purge_sys_t::~purge_sys_t()") at assert.c:101
      #4  0x0000000020bb9d64 in purge_sys_t::~purge_sys_t (this=<optimized out>, __in_chrg=<optimized out>) at /home/danielgb/mariadb-10.2/storage/innobase/trx/trx0purge.cc:235
      #5  0x0000000020b90318 in innodb_shutdown () at /home/danielgb/mariadb-10.2/storage/innobase/srv/srv0start.cc:2834
      #6  0x00000000209cb8ec in innobase_end (hton=<optimized out>, type=<optimized out>) at /home/danielgb/mariadb-10.2/storage/innobase/handler/ha_innodb.cc:4593
      #7  0x000000002078b61c in ha_finalize_handlerton (plugin=0x22103328) at /home/danielgb/mariadb-10.2/sql/handler.cc:451
      #8  0x000000002054c4cc in plugin_deinitialize (plugin=0x22103328, ref_check=true) at /home/danielgb/mariadb-10.2/sql/sql_plugin.cc:1209
      #9  0x0000000020554728 in reap_plugins () at /home/danielgb/mariadb-10.2/sql/sql_plugin.cc:1286
      #10 0x0000000020555de4 in plugin_shutdown () at /home/danielgb/mariadb-10.2/sql/sql_plugin.cc:1938
      #11 0x000000002045fb60 in clean_up (print_message=<optimized out>) at /home/danielgb/mariadb-10.2/sql/mysqld.cc:2201
      #12 0x0000000020463890 in unireg_abort (exit_code=<optimized out>) at /home/danielgb/mariadb-10.2/sql/mysqld.cc:2120
      #13 0x000000002046db58 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/danielgb/mariadb-10.2/sql/mysqld.cc:5947
      #14 0x00000000204427e8 in main (argc=<optimized out>, argv=<optimized out>) at /home/danielgb/mariadb-10.2/sql/main.cc:25
      

      Attachments

        Activity

          Thanks, danblack!
          Is it possible that your build is defining neither UNIV_DEBUG nor DBUG_OFF? In that case, the ut_d() would have no effect, but ut_ad() would map to assert(). In that case, this should fail every time. The problem would appear to be that rw_lock_free_func() is not calling the destructor (in the dump, magic_n is at RW_LOCK_MAGIC_N, which rw_lock_t::rw_lock_t() assigned it to) but the ut_ad() assertion in purge_sys_t::~purge_sys_t() expected that to happen.

          If the above guess is correct, then this should be a regression caused by MDEV-12269 (making UNIV_DEBUG depend on DBUG_OFF).

          marko Marko Mäkelä added a comment - Thanks, danblack ! Is it possible that your build is defining neither UNIV_DEBUG nor DBUG_OFF? In that case, the ut_d() would have no effect, but ut_ad() would map to assert(). In that case, this should fail every time. The problem would appear to be that rw_lock_free_func() is not calling the destructor (in the dump, magic_n is at RW_LOCK_MAGIC_N, which rw_lock_t::rw_lock_t() assigned it to) but the ut_ad() assertion in purge_sys_t::~purge_sys_t() expected that to happen. If the above guess is correct, then this should be a regression caused by MDEV-12269 (making UNIV_DEBUG depend on DBUG_OFF).

          I can reproduce this locally on a Debian GNU/Linux AMD64 system with the following CMake invocation:

          cmake .. -DCMAKE_BUILD_TYPE=Debug  -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ '-DCMAKE_C_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_C_FLAGS=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS=-m64 -O3 -g -mtune=native' -DWITH_SSL=system -DWITHOUT_FEDERATED_STORAGE_ENGINE=1 -DWITHOUT_FEDERATEDX_STORAGE_ENGINE=1 -DWITHOUT_TOKUDB_STORAGE_ENGINE=1
          

          When using make VERBOSE=1, I see that the flags -DUNIV_DEBUG -DUNIV_SYNC_DEBUG are being defined and DBUG_OFF is not being defined.
          At the machine code level, I see that rw_lock_t::~rw_lock_t() is being invoked (it is inlined in rw_lock_free_func()) and that the assignment in the destructor is apparently being optimized away:

          	virtual ~rw_lock_t()
          	{
          		ut_ad(magic_n == RW_LOCK_MAGIC_N);
          		magic_n = 0;
          	}
          

          It looks like the correct action (for now) is to relax the debug assertion in purge_sys_t::~purge_sys_t():

          diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
          index b21ec75c3a6..38ffe0cfe7e 100644
          --- a/storage/innobase/trx/trx0purge.cc
          +++ b/storage/innobase/trx/trx0purge.cc
          @@ -231,8 +231,10 @@ purge_sys_t::~purge_sys_t()
           	view.close();
           	rw_lock_free(&latch);
           	/* rw_lock_free() already called latch.~rw_lock_t(); tame the
          -	debug assertions when the destructor will be called once more. */
          -	ut_ad(latch.magic_n == 0);
          +	debug assertions when the destructor will be called once more.
          +	Sometimes (maybe when compiled with -O3) rw_lock_t::~rw_lock_t()
          +	will not assign the magic_n=0; we tolerate either value here. */
          +	ut_ad(latch.magic_n == 0 || latch.magic_n == RW_LOCK_MAGIC_N);
           	ut_d(latch.magic_n = RW_LOCK_MAGIC_N);
           	mutex_free(&pq_mutex);
           	os_event_destroy(event);
          

          I do not know if this is a compiler bug or if C++ allows the assignment to be optimized away.

          marko Marko Mäkelä added a comment - I can reproduce this locally on a Debian GNU/Linux AMD64 system with the following CMake invocation: cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ '-DCMAKE_C_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS_RELWITHDEBINFO=-m64 -O3 -g -mtune=native' '-DCMAKE_C_FLAGS=-m64 -O3 -g -mtune=native' '-DCMAKE_CXX_FLAGS=-m64 -O3 -g -mtune=native' -DWITH_SSL=system -DWITHOUT_FEDERATED_STORAGE_ENGINE=1 -DWITHOUT_FEDERATEDX_STORAGE_ENGINE=1 -DWITHOUT_TOKUDB_STORAGE_ENGINE=1 When using make VERBOSE=1, I see that the flags -DUNIV_DEBUG -DUNIV_SYNC_DEBUG are being defined and DBUG_OFF is not being defined. At the machine code level, I see that rw_lock_t::~rw_lock_t() is being invoked (it is inlined in rw_lock_free_func()) and that the assignment in the destructor is apparently being optimized away: virtual ~rw_lock_t() { ut_ad(magic_n == RW_LOCK_MAGIC_N); magic_n = 0; } It looks like the correct action (for now) is to relax the debug assertion in purge_sys_t::~purge_sys_t(): diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index b21ec75c3a6..38ffe0cfe7e 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -231,8 +231,10 @@ purge_sys_t::~purge_sys_t() view.close(); rw_lock_free(&latch); /* rw_lock_free() already called latch.~rw_lock_t(); tame the - debug assertions when the destructor will be called once more. */ - ut_ad(latch.magic_n == 0); + debug assertions when the destructor will be called once more. + Sometimes (maybe when compiled with -O3) rw_lock_t::~rw_lock_t() + will not assign the magic_n=0; we tolerate either value here. */ + ut_ad(latch.magic_n == 0 || latch.magic_n == RW_LOCK_MAGIC_N); ut_d(latch.magic_n = RW_LOCK_MAGIC_N); mutex_free(&pq_mutex); os_event_destroy(event); I do not know if this is a compiler bug or if C++ allows the assignment to be optimized away.

          In my final patch, I decided to duplicate the assignment magic_n=0 in rw_lock_free_func() instead.

          marko Marko Mäkelä added a comment - In my final patch , I decided to duplicate the assignment magic_n=0 in rw_lock_free_func() instead.
          danblack Daniel Black added a comment -

          Well done marko and thanks. Accessing a objects members after an explicit destructor call does sound a bit suspicious.
          http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf 3.8 talks about an object lifetime ending on destructor and there after properties aren't valid.

          danblack Daniel Black added a comment - Well done marko and thanks. Accessing a objects members after an explicit destructor call does sound a bit suspicious. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf 3.8 talks about an object lifetime ending on destructor and there after properties aren't valid.

          danblack, thanks for reporting this!
          Ultimately, we should make rw_lock_t use proper constructors and destructors, instead of splitting some of the work to rw_lock_create_func() and rw_lock_free_func().

          marko Marko Mäkelä added a comment - danblack , thanks for reporting this! Ultimately, we should make rw_lock_t use proper constructors and destructors, instead of splitting some of the work to rw_lock_create_func() and rw_lock_free_func().

          People

            marko Marko Mäkelä
            danblack Daniel Black
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.