[MDEV-10291] main.bootstrap fails sporadically in buildbot on Windows Created: 2016-06-26  Updated: 2017-02-10  Resolved: 2017-02-10

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.2
Fix Version/s: 10.2.4

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: 10.2-ga

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
Sprint: 10.2.2-1, 10.2.4-3

 Description   

http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/392/steps/test/logs/stdio

main.bootstrap                           w4 [ fail ]
        Test ended at 2016-06-26 18:38:17
 
CURRENT_TEST: main.bootstrap
mysqltest: At line 45: command "$MYSQLD_BOOTSTRAP_CMD < $MYSQLTEST_VARDIR/tmp/long_query.sql >> $MYSQLTEST_VARDIR/tmp/bootstrap.log 2>&1" failed with wrong error: 255
 
The result from queries just before the failure was:
drop table if exists t1;
drop table t1;
drop table t1;
ERROR 42S02: Unknown table 'test.t1'
set @my_max_allowed_packet= @@max_allowed_packet;
set global max_allowed_packet=100*@@max_allowed_packet;
 
 - saving 'D:/winx64-packages/build/mysql-test/var/4/log/main.bootstrap/' to 'D:/winx64-packages/build/mysql-test/var/log/main.bootstrap/'
 - found 'mysqld.dmp' (0/5)
 
Retrying test main.bootstrap, attempt(2/3)...
 
worker[4] > Restart  - not started

It happened a few times recently.



 Comments   
Comment by Vladislav Vaintroub [ 2016-10-12 ]

Crash callstack (from a mysqld.dmp on buildbot)

 	ntdll.dll!RtlpWaitOnCriticalSection()	Unknown
 	ntdll.dll!RtlpEnterCriticalSectionContended()	Unknown
 	mysqld.exe!Statement_map::~Statement_map() Line 3963	C++
 	mysqld.exe!THD::~THD() Line 1790	C++
 	mysqld.exe!THD::`scalar deleting destructor'(unsigned int)	C++
>	mysqld.exe!do_handle_bootstrap(THD * thd=0x0000008c80cf6508) Line 1086	C++
 	mysqld.exe!handle_bootstrap(void * arg=0x0000008c80cf6508) Line 1066	C++
 	mysqld.exe!pthread_start(void * p=0x0000000000000000) Line 62	C
 	mysqld.exe!_callthreadstartex() Line 376	C
 	mysqld.exe!_threadstartex(void * ptd=0x0000008c80cfeff0) Line 354	C
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

Mostly likely, there is a memory overwrite here that corrupts critical section. callstack shows RtlpEnterCriticalSectionContended, but this critical section cannot be contended, as there is only a single active thread.

Comment by Andrii Nikitin (Inactive) [ 2017-02-10 ]

I see at http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/2542/steps/test/logs/stdio that main thread is being aborted when crash happens:

mysqld!my_thread_global_end(void) [d:\winx64-debug\build\src\mysys\my_thr_init.c @ 203]
mysqld!my_end(int infoflag = 0n0) [d:\winx64-debug\build\src\mysys\my_init.c @ 210]
mysqld!mysqld_exit(int exit_code = 0n0) [d:\winx64-debug\build\src\sql\mysqld.cc @ 2149]
mysqld!unireg_abort(int exit_code = 0n0) [d:\winx64-debug\build\src\sql\mysqld.cc @ 2123]
mysqld!win_main(int argc = 0n54, char ** argv = 0x00000075`f9b15cf8) [d:\winx64-debug\build\src\sql\mysqld.cc @ 6018]
mysqld!mysql_service(void * p = 0x00000000`00000000) [d:\winx64-debug\build\src\sql\mysqld.cc @ 6156]
mysqld!mysqld_main(int argc = 0n54, char ** argv = 0x00000075`f9b03710) [d:\winx64-debug\build\src\sql\mysqld.cc @ 6349]

Edited: it looks that crash happens because mutex LOCK_prepared_stmt_count is referenced from ~Statement_map() of bootstrap thread is already destroyed in mysqld_exit()

Comment by Andrii Nikitin (Inactive) [ 2017-02-10 ]

It looks I can repeat the crash manually by injecting few Sleep() calls as below:

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index 23ac568d95..bc908740a4 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -2345,6 +2345,8 @@ static void clean_up_mutexes()
   mysql_rwlock_destroy(&LOCK_system_variables_hash);
   mysql_mutex_destroy(&LOCK_short_uuid_generator);
   mysql_mutex_destroy(&LOCK_prepared_stmt_count);
+  bzero(&LOCK_prepared_stmt_count, sizeof (LOCK_prepared_stmt_count));
+  Sleep(5000);
   mysql_mutex_destroy(&LOCK_error_messages);
   mysql_cond_destroy(&COND_thread_count);
   mysql_cond_destroy(&COND_thread_cache);
@@ -6384,6 +6386,7 @@ static void bootstrap(MYSQL_FILE *file)
     bootstrap_error=-1;
     DBUG_VOID_RETURN;
   }
+  Sleep(3000);
   /* Wait for thread to die */
   mysql_mutex_lock(&LOCK_thread_count);
   while (in_bootstrap)
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
index 5348245449..58f6cc7a4e 100644
--- a/sql/sql_parse.cc
+++ b/sql/sql_parse.cc
@@ -1085,6 +1085,7 @@ void do_handle_bootstrap(THD *thd)
 
 end:
   in_bootstrap= FALSE;
+  Sleep(4000);
   delete thd;
   mysql_mutex_lock(&LOCK_thread_count);
   mysql_cond_broadcast(&COND_thread_count);

In my understanding this may happen only on single (or very overloaded) CPU when do_handle_bootstrap() completes before main thread executes "while (in_bootstrap)"

Generated at Thu Feb 08 07:41:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.