[MDEV-9722] mysqld.exe hangs after upgrade Created: 2016-03-14  Updated: 2017-08-24  Resolved: 2017-07-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 5.5.48
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Eid Badr Assignee: Oleksandr Byelkin
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Windows 7, 64bit


Attachments: File mysqld.001     File mysqld.002     File mysqld.003     File mysqld.004     File mysqld.005     File mysqld.006     File mysqld.007     File mysqld.008     File mysqld.009     File mysqld.010     File mysqld.011     File mysqld.012     File mysqld.013     File mysqld.014     File mysqld.015     File mysqld.crc     Zip Archive mysqld.zip     Text File windbg_output.txt     Text File windbg_output2_1b20_2016-05-20_19-33-35-246.txt    
Issue Links:
Duplicate
is duplicated by MDEV-13635 MariaDB Server 10.2.6 hangs up during... Open

 Description   

After upgrading from 5.5.47 to 5.5.48 MariaDB Server hangs.

When our application hangs connecting with MariaDB using MySQL Workbench is not possible anymore.

Our own application hangs waiting for a response from MariaDB but it doesn't get anything. After 1 hour of waiting it gets a timeout.

Attached is the minidump from MariaDB. (mysqld.001 - mysqld.015 --> mysqld.dmp)

And here is the stacktrace from our application:

14  Id: d4c.d54 Suspend: 0 Teb: 00007ff6`86286000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`02e4ec58 00007fff`48f9298f ntdll!ZwWaitForSingleObject+0xa
00000000`02e4ec60 00007fff`48f96a08 mswsock!SockWaitForSingleObject+0x152
00000000`02e4ece0 00007fff`4ad82707 mswsock!WSPRecv+0x622
*** WARNING: Unable to verify checksum for libmariadb.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for libmariadb.dll - 
00000000`02e4ede0 00007fff`2d2c9996 ws2_32!recv+0x197
00000000`02e4ee80 00007fff`2d2c9a58 libmariadb!mysql_ssl_set+0x8059
00000000`02e4eec0 00007fff`2d2ca36e libmariadb!mysql_ssl_set+0x811b
00000000`02e4eef0 00007fff`2d2ca49d libmariadb!mysql_ssl_set+0x8a31
00000000`02e4ef30 00007fff`2d2d1212 libmariadb!mysql_ssl_set+0x8b60
00000000`02e4ef80 00007fff`2d2d2b9b libmariadb!mysql_ssl_set+0xf8d5
*** WARNING: Unable to verify checksum for sqlapi.dll
00000000`02e4efb0 00007fff`3006bf18 libmariadb!mysql_ssl_set+0x1125e
00000000`02e4eff0 00007fff`3001ce16 sqlapi!ImyCursor::Execute+0x7f8
00000000`02e4f160 00000000`60faa03e sqlapi!SACommand::Execute+0x56
00000000`02e4f1d0 00007ff6`86f2c381 dllsqlapiext!C_SACommand::Execute+0x7e [e:\ziscvs_b5810_x64\all\all_os\sqlapiext\c_sacommand.cpp @ 152]
00000000`02e4f240 00007ff6`869fec08 zis_data!db_select_zis_object_group_def_which_contains_zisoid+0x111 [e:\ziscvs_b5810_x64\all\all_os\datenbank\dbzisobjectgroupdef.cpp @ 2140]
00000000`02e4f350 00007ff6`868720c9 zis_data!C_DBQuery::GetZISObjectGroupDefWhichContainsZISOid+0x48 [e:\ziscvs_b5810_x64\all\all_os\data\c_dbquery.cpp @ 555]
00000000`02e4f3f0 00007ff6`86a99490 zis_data!C_DataView::DeleteZISOidFromAllZISObjectGroupDefs+0x79 [e:\ziscvs_b5810_x64\all\all_os\data\c_dataview.cpp @ 1346]
00000000`02e4f4e0 00007ff6`86839265 zis_data!C_SessionDefDataView::GetInsertDBCommandList+0x190 [e:\ziscvs_b5810_x64\all\all_os\data\c_sessiondefdataview.cpp @ 71]
00000000`02e4f600 00007ff6`86aa50bf zis_data!C_SessionManager::InsertSessionIntoDB+0x1c5 [e:\ziscvs_b5810_x64\all\all_os\data\c_sessionmanager.cpp @ 2743]
00000000`02e4f740 00007ff6`86aa550f zis_data!C_SessionManagerWorker::CommandInsertSessionIntoDB+0x5f [e:\ziscvs_b5810_x64\all\all_os\data\c_sessionmanagerworker.cpp @ 216]
00000000`02e4f850 00007ff6`86aa614e zis_data!C_SessionManagerWorker::ExecuteCommand+0x5f [e:\ziscvs_b5810_x64\all\all_os\data\c_sessionmanagerworker.cpp @ 187]
00000000`02e4f930 00007ff6`86b4601c zis_data!C_SessionManagerWorker::Run+0x2fe [e:\ziscvs_b5810_x64\all\all_os\data\c_sessionmanagerworker.cpp @ 127]
00000000`02e4fad0 00007fff`334fe5e1 zis_data!C_ACETask<ACE_Refcounted_Auto_Ptr_With_Single_Mutex<C_MigrateEUStatusTablesCommand,ACE_Thread_Mutex>,long>::svc+0x11c [e:\ziscvs_b5810_x64\all\all_os\tools\c_acetask.h @ 1214]
00000000`02e4fb60 00007fff`334feec1 ACE!ACE_Task_Base::svc_run+0x41 [e:\ziscvs_b5810_x64\all\vendor\ace_wrappers\ace\task.cpp @ 275]
00000000`02e4fb90 00007fff`334fee14 ACE!ACE_Thread_Adapter::invoke_i+0x71 [e:\ziscvs_b5810_x64\all\vendor\ace_wrappers\ace\thread_adapter.cpp @ 161]
00000000`02e4fbe0 00000000`65111d9f ACE!ACE_Thread_Adapter::invoke+0xd4 [e:\ziscvs_b5810_x64\all\vendor\ace_wrappers\ace\thread_adapter.cpp @ 96]
00000000`02e4fc20 00000000`65111e3b msvcr100!_callthreadstartex+0x17
00000000`02e4fc50 00007fff`4a9b13d2 msvcr100!_threadstartex+0x7f
00000000`02e4fc80 00007fff`4c645454 kernel32!BaseThreadInitThunk+0x22
00000000`02e4fcb0 00000000`00000000 ntdll!RtlUserThreadStart+0x34



 Comments   
Comment by Vladislav Vaintroub [ 2016-03-15 ]

attaching mysqld.dmp, zipped, complete

Comment by Vladislav Vaintroub [ 2016-03-15 ]

Attached output of Windbg. it looks like almost 400 threads are waiting, and there is some kind of deadlock.
most of the threads wait for LOCK_open, in open_table, close_tables and similar places, one thread however that is somewhere in "SHOW CREATE" is waiting on condition variable and is probably holding the lock , that others wait for

mysqld!pthread_cond_timedwait
mysqld!translog_flush_wait_for_end
mysqld!translog_flush
mysqld!pagecache_fwrite
mysqld!flush_cached_blocks
mysqld!flush_pagecache_blocks_int
mysqld!flush_pagecache_blocks_with_filter
mysqld!_ma_once_end_block_record
mysqld!maria_close
mysqld!closefrm
mysqld!intern_close_table
mysqld!free_cache_entry
mysqld!close_thread_table
mysqld!close_open_tables
mysqld!close_thread_tables
mysqld_show_create

Comment by Elena Stepanova [ 2016-03-16 ]

eidbadr, as I understand, the problem happens regularly for you? Could you maybe set up a monitor for the processlist (periodic execution of SHOW FULL PROCESSLIST, so we at least see where the problem starts?
It's better if the monitoring is done by a persistent connection, because new ones will be locked out, but if it's difficult, set it up any way that is suitable for you.

Comment by Elena Stepanova [ 2016-04-14 ]

wlad, do you think you would be able to investigate it further based on the information that you already have?

Comment by Elena Stepanova [ 2016-04-14 ]

wlad, do you think you'd be able to investigate it further based on the information that you already have?

Comment by Vladislav Vaintroub [ 2016-04-15 ]

We actually made some progress in investigation after a session with sanja couple of weeks back. He could remember what exactly happened here. IIRC the conclusion was that there should not have been a "wait" in this thread because current LSN is higher, but it was not obvious how waiting for past LSN was possible.

Comment by Eid Badr [ 2016-05-17 ]

The problem is still present in version 5.5.49.
Here is a fulldump
https://webapps.leutek.de/index.php/s/6JjLnwJLAv0Jb13
Password 123456, link expires on 24.05.2016

Due to deadlines and other work, we have not yet implemented the processlist monitor. Is this still necessary?

Comment by Vladislav Vaintroub [ 2016-05-19 ]

eidbadr could you describe the workload you run ?
I see selects, inserts, deletes, "show create table" in https://jira.mariadb.org/secure/attachment/41940/windbg_output.txt , but maybe you can give a better idea about what is running there usually.

Comment by Vladislav Vaintroub [ 2016-05-20 ]

windbg_output2_1b20_2016-05-20_19-33-35-246.txt Attaching another windbg dump of all threads. hang on the same place basically, waiting for lsn to be flushed.

2 threads are waiting for different LSN to be flushed (one thread for 30082577968, another one for 30082578109). All other threads seems to hang waiting for LOCK_open

 2      mysqld.exe!pthread_cond_timedwait(pthread_cond_t * cond=0x0000000000000000, _RTL_CRITICAL_SECTION * mutex=0x00007ff7d15b6da5, const timespec * abstime=0x00000055a7159000)
 3      mysqld.exe!translog_flush_wait_for_end(__int64 lsn=30082577968)
 4      mysqld.exe!translog_flush(__int64 lsn=367408970136)
 5      mysqld.exe!ma_commit(st_ma_transaction * trn=0x00000007010fb630)
 6      mysqld.exe!ha_maria::implicit_commit(THD * thd=0x00000055a6cfd700, bool new_trn=false)
 7      mysqld.exe!mysql_execute_command(THD * thd=0x00000055a6cfd790)
 8      mysqld.exe!mysql_parse(THD * thd=0x00000055a6cfae70, char * rawbuf=0x00000055a82bea40, unsigned 
 

and

 5      mysqld.exe!translog_flush_wait_for_end(__int64 lsn=30082578109)
 6      mysqld.exe!translog_flush(__int64 lsn=0)
 7      mysqld.exe!pagecache_fwrite(st_pagecache * pagecache=0x000000546ca7e310, st_pagecache_file * filedesc=0x00000055a7cb98a0, unsigned char * buffer=0x0000000000000001, unsigned __int64 pageno=36, pagecache_page_type type=PAGECACHE_PLAIN_PAGE, int flags=36)
 8      mysqld.exe!flush_cached_blocks(st_pagecache * pagecache=0x0000000000000002, st_pagecache_file * file=0x00000055adf367f8, st_pagecache_block_link * * cache=0x0000000000000000, st_pagecache_block_link * * end=0x00007ff7ffffffff, flush_type type=FLUSH_RELEASE, int * first_errno=0x0000000000000000)
 9      mysqld.exe!flush_pagecache_blocks_int(st_pagecache * pagecache=0x00007ff7d2712b60, st_pagecache_file * file=0x0000000000000001, flush_type type=-764335264, pagecache_flush_filter_result (pagecache_page_type, unsigned __int64, __int64, void *) * filter=0x0000000000000000, void * filter_arg=0x0000000000000000)
 10     mysqld.exe!flush_pagecache_blocks_with_filter(st_pagecache * pagecache=0x00000055adf35ed0, st_pagecache_file * file=0x0000000000000000, flush_type type=FLUSH_IGNORE_CHANGED, pagecache_flush_filter_result (pagecache_page_type, unsigned __int64, __int64, void *) * filter=0x0000000000000000, void * filter_arg=0x0000000000000000)
 11     mysqld.exe!_ma_once_end_block_record(st_maria_share * share=0x0000000000000002)
 12     mysqld.exe!maria_close(st_maria_handler * info=0x00000055a7cbda00)
 13     mysqld.exe!closefrm(TABLE * table=0x0000005599e05b10, bool free_share=128)
 14     mysqld.exe!intern_close_table(TABLE * table=0x000000554bcc37ee)
 15     mysqld.exe!free_cache_entry(TABLE * table=0x000000554bcc37ee)
 16     mysqld.exe!open_table(THD * thd=0x00007ff7d25fa850, TABLE_LIST * table_list=0x00007ff7d17bec86, st_mem_root * mem_root=0x0000000000000007, Open_table_context * ot_ctx=0x00007ff7d194a70c)
 17     mysqld.exe!open_and_process_table(THD * thd=0x00000055979ff700, LEX * lex=0x00000055a7cbe2d0, TABLE_LIST * tables=0x0000000000000000, unsigned int * counter=0x00000055a103c998, unsigned int flags=0, Prelocking_strategy * prelocking_strategy=0x00000055a7cbe2d0, bool has_prelocking_list=false, Open_table_context * ot_ctx=0x00000055a7cbe110, st_mem_root * new_frm_mem=0x00000055a7cbe140)
 18     mysqld.exe!open_tables(THD * thd=0x00007ff700000004, TABLE_LIST * * start=0x00007ff700000000, unsigned int * counter=0x00000055a7cbe230, unsigned int flags=0, Prelocking_strategy * prelocking_strategy=0x00000055a7cbe2d0)
 19     mysqld.exe!open_and_lock_tables(THD * thd=0x00007ff700000001, TABLE_LIST * tables=0x00000055a103c998, bool derived=false, unsigned int flags=2701379601, Prelocking_strategy * prelocking_strategy=0x00000055a7cbe2d0)
 20     mysqld.exe!mysql_insert(THD * thd=0x00000055979fea00, TABLE_LIST * table_list=0x00000055a7cbe580, List<Item> & fields={...}, List<List<Item> > & values_list=<struct at NULL>, List<Item> & update_fields={...}, List<Item> & update_values={...}, enum_duplicates duplic=DUP_ERROR, bool ignore=false)
 21     mysqld.exe!mysql_execute_command(THD * thd=0x00000055979fea70)

Here is the log_descriptor struct with relevant info

>Debug.Print log_descriptor

{pagecache=0x00007ff7d27134e0 {mysqld.exe!st_pagecache maria_log_pagecache_var} {mem_size=2097152 min_warm_blocks=...} ...}
    pagecache: 0x00007ff7d27134e0 {mysqld.exe!st_pagecache maria_log_pagecache_var} {mem_size=2097152 min_warm_blocks=...}
    flags: 0
    open_flags: 32770
    log_file_max_size: 1073741824
    server_version: 50549
    server_id: 1
    buffer_capacity_chunk_2: 1047552
    half_buffer_capacity_chunk_2: 523776
    page_overhead: 7
    page_capacity_chunk_2: 8184
    directory: 0x00007ff7d1ef9c58 "d:\\leutek\\mariadb\\data\\"
    open_files: {buffer=0x00000054545969b0 "ШYTT" elements=1 max_element=10 ...}
    max_file: 7
    min_file: 7
    open_files_lock: {m_rwlock={srwlock={Ptr=0x0000000000000000 } have_exclusive_srwlock=0 lock={DebugInfo=0x0000000000000000 <NULL> ...} ...} ...}
    directory_fd: -1
    buffers: 0x00007ff7d1ef9f58 {{buffer=0x00007ff7d1ef9f58 "w\x4" last_lsn=30082576829 prev_last_lsn=30082576007 ...}, ...}
    dirty_buffer_mask: 8 '\b'
    dirty_buffer_mask_lock: {m_mutex={DebugInfo=0xffffffffffffffff {Type=??? CreatorBackTraceIndex=??? CriticalSection=??? ...} ...} ...}
    horizon: 30082578112
    bc: {ptr=0x00007ff7d21fb9f0 }
    max_lsn: 30082578109
    flushed: 30082575408
    sent_to_disk: 30082578109
    log_start: 30073560955
    previous_flush_horizon: 30082574187
    in_buffers_only: 30082572288
    sent_to_disk_lock: {m_mutex={DebugInfo=0xffffffffffffffff {Type=??? CreatorBackTraceIndex=??? CriticalSection=??? ...} ...} ...}
    log_flush_lock: {m_mutex={DebugInfo=0x00000055c3a79830 {Type=0 CreatorBackTraceIndex=0 CriticalSection=mysqld.exe!0x00007ff7d26faa58 {...} ...} ...} ...}
    log_flush_cond: {m_cond={native_cond={Ptr=0x00000055a7cb9270 } waiting=2815136368 lock_waiting={DebugInfo=0x0000000000000000 <NULL> ...} ...} ...}
    new_goal_cond: {m_cond={native_cond={Ptr=0x0000000000000000 } waiting=0 lock_waiting={DebugInfo=0x0000000000000000 <NULL> ...} ...} ...}
    file_header_lock: {m_mutex={DebugInfo=0xffffffffffffffff {Type=??? CreatorBackTraceIndex=??? CriticalSection=??? ...} ...} ...}
    unfinished_files_lock: {m_mutex={DebugInfo=0xffffffffffffffff {Type=??? CreatorBackTraceIndex=??? CriticalSection=??? ...} ...} ...}
    unfinished_files: {buffer=0x0000005454596b90 "" elements=0 max_element=10 ...}
    min_need_file: 7
    min_file_number: 7
    purger_lock: {m_mutex={DebugInfo=0xffffffffffffffff {Type=??? CreatorBackTraceIndex=??? CriticalSection=??? ...} ...} ...}
    last_lsn_checked: 0
    is_everything_flushed: 1 '\x1'
    flush_in_progress: 0 '\0'
    flush_no: 36608
    next_pass_max_lsn: 0
    max_lsn_requester: 9520

Comment by Vladislav Vaintroub [ 2016-05-20 ]

Possibly related MDEV-689

Comment by Sergei Golubchik [ 2017-06-26 ]

Still an issue?

Comment by Vladislav Vaintroub [ 2017-08-24 ]

Yes, MDEV-13635

Generated at Thu Feb 08 07:36:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.