MariaDB Server / MDEV-6340

MariaDB 10.0.12 fatal "Lost connection" error w/ GCC 4.9 'Release' build; workaround ~ CFLAGS="-fno-delete-null-pointer-checks"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.12
    • Fix Version/s: 10.0.13
    • Component/s: None
    • Labels: None

    Description

      After a clean install of MariaDB 10.0.11, running a completely new drush install from clean Drupal v7.28 source produces the following fatal error and crash:

          SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

          Activity

            grantk GrantK added a comment - edited

            GCC 4.9.1 is neither released nor shipping with any distribution; GCC 4.9.0 is.

            Is the decision, then, to simply ignore RELEASE builds of MariaDB being broken with RELEASE GCC, and kick the ball down the road to GCC 4.9.1, whenever it's released?

            How, exactly, do we RE-OPEN this?


            serg Sergei Golubchik added a comment -

            I've reopened it.

            But 4.9.0 is pretty much the bleeding edge; most distributions don't ship it (and, as you can see, they have good reasons not to). On the other hand, 4.9.1 is already in Mageia Cauldron (which is in the development stage and won't be declared stable anytime soon).

            I will try to see if we can change something in MariaDB to avoid this gcc bug. But given that it is apparently a gcc bug, and everything I wrote above, this won't be a high-priority bug, sorry.

            grantk GrantK added a comment -

            Can you provide a reference to the specific GCC bug that you suggest is fixed?

            In apparent reference to

            "Operational Notification – Changes in gcc Code Optimization Can Cause a Crash in BIND"
            https://kb.isc.org/article/AA-01167

            as pointed out by showaz in BIND's IRC channel, the BIND dev team posted a GCC bug here,

            "GCC 4.9 generates incorrect object code"
            https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61236

            for which a similar workaround is

            -fno-delete-null-pointer-checks

            The GCC team resolved that bug as INVALID and, as a result, the BIND team committed fixes to their repository branches to address the crash and work around the optimization issue.

            In that bug report, it's glibc that's called into question, not GCC.

            Note, as posted above, the end of the MariaDB backtrace:

            ...
            /lib64/libpthread.so.0(+0x80db)[0x7fcf4ff230db]
            /lib64/libc.so.6(clone+0x6d)[0x7fcf4ebd390d]

            So, is it in fact GCC, as you've ascribed, or glibc/other, that's involved in the MariaDB crashes?


            serg Sergei Golubchik added a comment -

            jplindst, please take a look at the following patch:

            === modified file 'storage/innobase/include/lock0lock.h'
            --- storage/innobase/include/lock0lock.h        2014-05-07 15:32:23 +0000
            +++ storage/innobase/include/lock0lock.h        2014-07-30 19:36:42 +0000
            @@ -277,31 +277,31 @@
             UNIV_INTERN
             dberr_t
             lock_rec_insert_check_and_lock(
             /*===========================*/
                    ulint           flags,  /*!< in: if BTR_NO_LOCKING_FLAG bit is
                                            set, does nothing */
                    const rec_t*    rec,    /*!< in: record after which to insert */
                    buf_block_t*    block,  /*!< in/out: buffer block of rec */
                    dict_index_t*   index,  /*!< in: index */
                    que_thr_t*      thr,    /*!< in: query thread */
                    mtr_t*          mtr,    /*!< in/out: mini-transaction */
                    ibool*          inherit)/*!< out: set to TRUE if the new
                                            inserted record maybe should inherit
                                            LOCK_GAP type locks from the successor
                                            record */
            -       __attribute__((nonnull, warn_unused_result));
            +       __attribute__((nonnull(2,3,4,6,7), warn_unused_result));
             /*********************************************************************//**

            (the same for xtradb, of course).

            Here's why: the old declaration promises that thr can never be NULL, and gcc-4.9.0 trusts that and optimizes accordingly. But in fact, the function starts with

            lock_rec_insert_check_and_lock(
            /*===========================*/
                    ...
                    ibool*          inherit)
            {
                    ...
                    if (flags & BTR_NO_LOCKING_FLAG) {
                            return(DB_SUCCESS);
                    }

                    trx = thr_get_trx(thr);

            so when BTR_NO_LOCKING_FLAG is set, thr can be NULL (and it is NULL in this stack trace: btr_insert_on_non_leaf_level_func → btr_cur_optimistic_insert → btr_cur_ins_lock_and_undo → lock_rec_insert_check_and_lock). The patch fixes this by removing the nonnull attribute for thr (parameter 5), while keeping it for the other pointer parameters. Another solution would be to move the check for BTR_NO_LOCKING_FLAG out of the function and keep the nonnull attribute.

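For readers following the patch, here is a minimal sketch of the same idea outside InnoDB. Everything in it (the `sketch_` names, the flag value, the struct layouts) is a hypothetical stand-in and not the real InnoDB code. It only shows how listing parameter indices in `nonnull(...)` leaves one pointer free to be NULL on an early-return path.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real InnoDB definitions. */
#define SKETCH_NO_LOCKING_FLAG 2UL

struct sketch_trx { int id; };
struct sketch_thr { struct sketch_trx *trx; };

enum { SKETCH_SUCCESS = 0, SKETCH_LOCKED = 1 };

/* Only parameter 3 (out) is promised to be non-NULL. thr (parameter 2)
   is deliberately left out of the index list, so the compiler is not
   told that it can never be NULL. */
int sketch_check_and_lock(unsigned long flags,
                          struct sketch_thr *thr,
                          int *out)
        __attribute__((nonnull(3), warn_unused_result));

int sketch_check_and_lock(unsigned long flags,
                          struct sketch_thr *thr,
                          int *out)
{
        if (flags & SKETCH_NO_LOCKING_FLAG) {
                /* Early return: thr is never touched on this path,
                   so a NULL thr is acceptable here. */
                *out = -1;
                return SKETCH_SUCCESS;
        }

        /* thr is dereferenced only when locking is actually requested. */
        *out = thr->trx->id;
        return SKETCH_LOCKED;
}
```

With this declaration, a call such as `sketch_check_and_lock(SKETCH_NO_LOCKING_FLAG, NULL, &out)` stays within the declared contract. Had the declaration used a bare `nonnull`, the same call would break the promise made to the compiler, which is the situation described in the comment above.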

            jplindst Jan Lindström (Inactive) added a comment -

            The patch is correct; I just do not follow why this function is called at all if BTR_NO_LOCKING_FLAG is set. Removing the call(s) could need a deeper fix.
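The alternative raised in the thread (check the flag in the caller and keep the blanket nonnull attribute) might look roughly like the sketch below. All names and the flag value are hypothetical and chosen only for illustration; the real InnoDB call chain is more involved, which is presumably why a deeper fix would be needed.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real InnoDB definitions. */
#define ALT_NO_LOCKING_FLAG 2UL

struct alt_thr { int trx_id; };

/* Every pointer parameter is promised non-NULL, as in the old declaration. */
static int alt_lock_check(struct alt_thr *thr, int *trx_id_out)
        __attribute__((nonnull, warn_unused_result));

static int alt_lock_check(struct alt_thr *thr, int *trx_id_out)
{
        /* Safe to dereference: callers must never pass NULL here. */
        *trx_id_out = thr->trx_id;
        return 1;
}

/* The caller tests the flag itself, so the nonnull callee is never
   reached with a NULL thr. Returns 0 if locking was skipped, otherwise
   the result of the lock check. */
int alt_insert(unsigned long flags, struct alt_thr *thr, int *trx_id_out)
{
        if (flags & ALT_NO_LOCKING_FLAG) {
                return 0; /* locking skipped; thr may be NULL on this path */
        }
        return alt_lock_check(thr, trx_id_out);
}
```

The trade-off is that every call site has to repeat the flag test, whereas the patch above changes only the declaration.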

            People

              serg Sergei Golubchik
              grantk GrantK