Details

    • Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects versions: 5.1.67, 5.2.14, 5.3.12, 5.5.36, 10.0.9
    • Fix versions: 5.5.37, 10.0.10
    • None
    • None
    • Environment: Windows Server 2012 (only?)

    Description

      As reported by Elena: mysqld.exe crashes on shutdown on Windows Server 2012. Possibly only the debug build is affected.

      The stacktrace looks like this:

       	mysqld.exe!_db_enter_(const char * _func_, const char * _file_, unsigned int _line_, _db_stack_frame_ * _stack_frame_)  Line 1101 + 0x5 bytes	C
      >	mysqld.exe!my_free(void * ptr)  Line 209	C
       	mysqld.exe!delete_dynamic(st_dynamic_array * array)  Line 302	C
       	mysqld.exe!cleanup_instrument_config()  Line 238	C++
       	mysqld.exe!cleanup_performance_schema()  Line 165	C++
       	mysqld.exe!shutdown_performance_schema()  Line 209	C++
       	mysqld.exe!mysqld_exit(int exit_code)  Line 1968	C++
       	mysqld.exe!unireg_abort(int exit_code)  Line 1948	C++
       	mysqld.exe!win_main(int argc, char * * argv)  Line 5441	C++
       	mysqld.exe!mysql_service(void * p)  Line 5560	C++
       	mysqld.exe!mysqld_main(int argc, char * * argv)  Line 5753	C++
       	mysqld.exe!main(int argc, char * * argv)  Line 26	C++
       	mysqld.exe!__tmainCRTStartup()  Line 278 + 0x19 bytes	C
       	mysqld.exe!mainCRTStartup()  Line 189	C

      CORRECTION: the trace above omits the topmost frames of the stack. They look like this:

       	vrfcore.dll!000007fdca3537ed() 	
       	[Frames below may be incorrect and/or missing, no symbols loaded for vrfcore.dll]	
       	vfbasics.dll!000007fdca2ea777() 	
      >	mysqld.exe!code_state()  Line 345	C
       	mysqld.exe!_db_enter_(const char * _func_, const char * _file_, unsigned int _line_, _db_stack_frame_ * _stack_frame_)  Line 1101 + 0x5 bytes	C
       	mysqld.exe!my_free(void * ptr)  Line 209	C

        Activity

          While debugging, I see that the crash happens inside this call:

          pthread_mutex_init(&THR_LOCK_dbug, NULL);

          pthread_mutex_init translates to InitializeCriticalSection on Windows. InitializeCriticalSection only requires a pointer to valid memory, which is the case here.

          Comment by psergei (Sergei Petrunia).

          My guess is that we're trying to initialize a critical section at an address where a critical section is already initialized. MSDN mentions that CRITICAL_SECTION objects cannot be moved in memory, so an attempt to initialize one on top of another may be considered an invalid operation.

          Comment by psergei (Sergei Petrunia).
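
          To illustrate the hypothesis, here is a minimal sketch of the kind of check a verifier might apply: it refuses to initialize a lock that is still live. It uses pthreads rather than the Windows API, and tracked_lock, tracked_init and tracked_destroy are hypothetical names made up for this example; none of this comes from the MariaDB sources or from Application Verifier itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wrapper: a mutex plus a flag saying whether it is live. */
typedef struct {
    pthread_mutex_t mutex;
    bool live;
} tracked_lock;

/* Returns 0 on success, -1 when the lock is already initialized. */
int tracked_init(tracked_lock *lock)
{
    if (lock->live)
        return -1;              /* initializing over a live lock */
    int rc = pthread_mutex_init(&lock->mutex, NULL);
    assert(rc == 0);
    (void)rc;                   /* silence unused warning under NDEBUG */
    lock->live = true;
    return 0;
}

/* Returns 0 on success, -1 when the lock was never initialized. */
int tracked_destroy(tracked_lock *lock)
{
    if (!lock->live)
        return -1;              /* destroying a lock that is not live */
    int rc = pthread_mutex_destroy(&lock->mutex);
    assert(rc == 0);
    (void)rc;
    lock->live = false;
    return 0;
}
```

          With this wrapper, a second tracked_init on the same object is rejected until tracked_destroy has been called in between, which is the pairing the hypothesis says is missing.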

          The following patch makes the crash go away:

          === modified file 'dbug/dbug.c'
          --- dbug/dbug.c 2013-11-20 11:05:39 +0000
          +++ dbug/dbug.c 2014-03-20 13:45:44 +0000
          @@ -342,6 +342,7 @@ static CODE_STATE *code_state(void)
               sstdout->file= stdout;
               sstderr->file= stderr;
               pthread_mutex_init(&THR_LOCK_dbug, NULL);
          +       fprintf(stderr, "psergey: initing THR_LOCK_dbug\n");
               bzero(&init_settings, sizeof(init_settings));
               init_settings.out_file= sstderr;
               init_settings.flags=OPEN_APPEND;
          @@ -1642,6 +1643,9 @@ void _db_end_()
           
             cs->stack= &init_settings;
             FreeState(cs, 0);
          +  //psergey:
          +  fprintf(stderr, "psergey: freeing THR_LOCK_dbug\n");
          +  pthread_mutex_destroy(&THR_LOCK_dbug);
             init_done= 0;
           }

          When running with the patch, I see:

          psergey: initing THR_LOCK_dbug

          (standard messages about MariaDB startup)

          psergey: freeing THR_LOCK_dbug
          psergey: initing THR_LOCK_dbug

          Comment by psergei (Sergei Petrunia).
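
          The lifecycle the patch establishes can be sketched in isolation as follows. This is only an illustration, assuming the lazy initialization is guarded by a flag such as init_done, as the diff context suggests. demo_lock stands in for THR_LOCK_dbug, and demo_ensure_init, demo_end and demo_live_locks are hypothetical names that do not exist in dbug.c.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_lock;   /* stands in for THR_LOCK_dbug */
static int init_done = 0;
static int init_calls = 0;          /* bookkeeping for the example only */
static int destroy_calls = 0;

/* Lazy initialization on first use, analogous to code_state(). */
void demo_ensure_init(void)
{
    if (!init_done) {
        int rc = pthread_mutex_init(&demo_lock, NULL);
        assert(rc == 0);
        (void)rc;
        init_calls++;
        init_done = 1;
    }
}

/* Teardown, analogous to _db_end_() with the patch applied:
   the mutex is destroyed before the flag is reset. */
void demo_end(void)
{
    if (init_done) {
        int rc = pthread_mutex_destroy(&demo_lock);
        assert(rc == 0);
        (void)rc;
        destroy_calls++;
        init_done = 0;
    }
}

/* Number of initializations not yet matched by a destroy. */
int demo_live_locks(void)
{
    return init_calls - destroy_calls;
}
```

          Because every reset of init_done is preceded by a destroy, a later lazy re-initialization (like the second "initing" message in the output above) always starts from a destroyed mutex, and demo_live_locks() never exceeds 1.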

          So, it could be that this particular Windows Server 2012 machine started being picky about programs placing one CRITICAL_SECTION object over another.

          Comment by psergei (Sergei Petrunia).

          As Sergei found out, the machine had Application Verifier enabled for mysqld.exe, which is what made it picky about these critical section specifics.

          The crash only happens under the verifier on debug builds, both 5.5 and 10.0. I don't know whether the verifier is pointing at a real problem here; if it is, it should probably be fixed. If it is just a verifier quirk that does not reveal a code flaw, I suppose it can be left as is.

          How it happened:
          Application Verifier is used in buildbot tests on this machine.
          As a precaution, we turn appverif off for mysqld.exe at the first test step, then turn it on for one test run on a non-debug build, turn it off again as soon as that test finishes, and turn it off once more at the next step before collecting the data. So it should always be off except during that single test run.
          Apparently, the server was rebooted during this very test run, as sometimes happens with Windows machines and their critical updates. It turns out that the appverif configuration is sticky: once turned on, it stays on until it is explicitly turned off. After that, the server started crashing at the build step, where bootstrap is run and the initial data is created. When buildbot fails at a build step it goes no further, so it never reached the first test step, where the verifier would have been turned off. I will add switching it off at the very beginning of the factory as another precaution.

          Comment by elenst (Elena Stepanova).

          serg, please review the fix (assuming the fprintf calls will be removed).

          Comment by psergei (Sergei Petrunia).

          People

            serg Sergei Golubchik
            psergei Sergei Petrunia