[MDEV-13025] Server hang at NO Errors in error logs Created: 2017-06-07 Updated: 2017-07-05 Resolved: 2017-07-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Platform Windows, Server |
| Affects Version/s: | 10.1.18 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Su, Jun-Ming | Assignee: | Vladislav Vaintroub |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows Server 2012 R2 x 2 with Microsoft Cluster Services (Active-Standby Mode) |
||
| Attachments: |
|
| Description |
|
I have a strange problem: my server sometimes hangs (about once every 2-3 weeks). The symptoms are: 1. Cannot connect remotely. I then reboot the active node in Microsoft Cluster Services, which switches the active node to the other Windows Server, and the service comes back. I looked for error messages in the Windows Event service and in the error log, but there were NO error entries for this problem (only the normal shutdown and startup of mysqld). How can I solve this problem or find the root cause? |
| Comments |
| Comment by Vladislav Vaintroub [ 2017-06-07 ] |
|
If there is a hang, you can examine what the server is doing. I suggest starting with a). I'm not sure what significance it has that you run in an MS cluster environment. We did not do any specific testing for that here. |
| Comment by Su, Jun-Ming [ 2017-06-07 ] |
|
Thanks. How do I take a minidump on Windows? I think b) will not work; the problem is that I cannot log in on the DB server itself. |
| Comment by Vladislav Vaintroub [ 2017-06-07 ] |
|
You can take a minidump with procdump: https://technet.microsoft.com/en-us/sysinternals/dd996900.aspx |
| Comment by Su, Jun-Ming [ 2017-06-08 ] |
|
OK, I will take a minidump of mysqld the next time the server hangs, thanks. |
| Comment by Su, Jun-Ming [ 2017-06-28 ] |
|
Hi, it happened again this morning, and I got the minidump for mysqld. Please help me find the cause, thanks. (There are 2 files because I ran procdump twice at that moment.) |
| Comment by Elena Stepanova [ 2017-07-03 ] |
|
wlad, could you please take a look to see if you can work some magic on it? I am not getting much from the dumps, everything looks like this:
There are lots of threads in server_audit.dll, so it could be one of the suspects. |
| Comment by Vladislav Vaintroub [ 2017-07-03 ] |
|
elenst, the secret is putting the correct .pdb, .exe, and .dll files next to the .dmp; otherwise the VS debugger, at least, does not want to work. Let me check what's here. |
| Comment by Su, Jun-Ming [ 2017-07-03 ] |
|
Hi, is there something wrong with my server_audit.dll? To meet our company's audit policy requirements, I modified the original server_audit.dll. I have uploaded the modified source code, the dll file, and the pdb file. My modifications (marked in the code between /* modify start */ and /* modify end */) send out more QUERY_XXX tags for operations.
I have used this modified plugin on 4 single servers (2 replication slaves, 2 test databases), and there has been no problem like this before. |
| Comment by Vladislav Vaintroub [ 2017-07-03 ] |
|
I know you modified the audit plugin, because we do not have matching symbols for it. Without matching symbols it is hard to debug. It would be nice if you could provide the server_audit.dll and server_audit.pdb that you compiled. But yes, it looks like it is the audit plugin that is causing problems. It looks like one thread in the audit plugin is writing something, and the others are waiting. From https://jira.mariadb.org/secure/attachment/43802/unique_stacks%2Ctxt.txt:
"writing" thread
.161 Id: 770.1970 Suspend: 0 Teb: 00007ff7`c014e000 Unfrozen
"waiting" threads
.165 Id: 770.c98 Suspend: 0 Teb: 00007ff7`c0146000 Unfrozen |
| Comment by Su, Jun-Ming [ 2017-07-03 ] |
|
Thank you for helping me. Yes, it is hard to debug without the dll and pdb. I am sorry; I have uploaded them now. Please help me figure this out. I have no idea why this modified plugin works fine on the other servers. |
| Comment by Vladislav Vaintroub [ 2017-07-03 ] |
|
OK, better resolved stacks are now in https://jira.mariadb.org/secure/attachment/43803/unique_callstacks_2.txt |
| Comment by Vladislav Vaintroub [ 2017-07-03 ] |
|
Added callstacks for another .dmp |
| Comment by Vladislav Vaintroub [ 2017-07-03 ] |
|
I'm not familiar with the audit plugin, but what is suspicious here is that in both cases there is one thread that is stuck in WriteFile
And every other thread stuck in the plugin is waiting for some critical section, which is probably held by the file-writing thread, like this
Now, WriteFile() would not block for a long time when writing to a regular file. But if the file is a pipe, and the pipe is full, then blocking occurs. Can this be the case here? |
| Comment by Su, Jun-Ming [ 2017-07-05 ] |
|
Hi, the audit files are written to a network share drive on a different server. I think that even if something is wrong with the server audit, it still should not be able to block the database server itself. |
| Comment by Vladislav Vaintroub [ 2017-07-05 ] |
|
The audit plugin blocks for the entire duration of "auditing()". It waits on an internal lock. I cannot comment on whether this decision is wise, but as it stands, what I would do is try to avoid long-running operations inside auditing(), since only one thread can run it at a time. It looks like the writes to the network drive take a long time in your environment. |
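To illustrate why one slow write stalls every other connection, here is a minimal sketch in Python (not the plugin's actual C code). The names `audit_lock`, `slow_write`, `auditing` and `run_demo` are hypothetical stand-ins: `slow_write` only sleeps to imitate a WriteFile() call against a slow network share, and the lock imitates the plugin's internal lock.

```python
import threading
import time

# Stand-in for the plugin's internal lock (an assumption for illustration).
audit_lock = threading.Lock()


def slow_write(record, delay):
    """Hypothetical stand-in for WriteFile() to a slow network share."""
    time.sleep(delay)


def auditing(record, write_delay, waits):
    """Log one record, holding the lock for the whole duration of the write."""
    requested = time.monotonic()
    with audit_lock:
        # Record how long this thread waited before it got the lock.
        waits.append(time.monotonic() - requested)
        slow_write(record, write_delay)


def run_demo(n_threads=4, write_delay=0.2):
    """Start n_threads 'connections' at once and return their lock wait times."""
    waits = []
    threads = [
        threading.Thread(target=auditing, args=(f"query {i}", write_delay, waits))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(waits)


if __name__ == "__main__":
    for wait in run_demo():
        print(f"waited {wait:.2f}s for the audit lock")
```

Because the writes are serialized, each additional thread waits roughly one more write delay than the one before it. If a single write never returns, every later thread waits forever, which matches the picture in the dumps.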
| Comment by Su, Jun-Ming [ 2017-07-05 ] |
|
OK, maybe it was designed like this. That is unlike the MSSQL trace file method, where another thread does the writing. I have another question: the database server runs 24x7, so why does this not happen every day or periodically? Does it happen randomly? And is there any way to make the WriteFile() function non-blocking or async? |
| Comment by Vladislav Vaintroub [ 2017-07-05 ] |
|
It was definitely not designed with network access in mind. If I were doing it, I would implement it to write the file locally, and maybe have a batch job scheduled periodically to transfer new records to some network location. |
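A rough sketch of such a batch job, in Python, might look like the following. The function names, the offset file and all paths are assumptions for illustration; nothing here is part of the audit plugin. The idea is that the server only ever appends to a local file, while a separately scheduled job copies whatever has been appended since its last run to the network location.

```python
import os
import tempfile


def append_audit_record(local_log, record):
    """Stand-in for the server appending one audit record to a local file."""
    with open(local_log, "a", encoding="utf-8", newline="\n") as f:
        f.write(record + "\n")


def transfer_new_records(local_log, remote_log, offset_file):
    """Copy bytes appended since the last run; return how many were copied."""
    offset = 0
    if os.path.exists(offset_file):
        with open(offset_file, encoding="utf-8") as f:
            offset = int(f.read().strip() or 0)

    with open(local_log, "rb") as src:
        src.seek(offset)
        new_data = src.read()

    if new_data:
        # Only this job touches the (possibly slow) remote location.
        with open(remote_log, "ab") as dst:
            dst.write(new_data)
        # Remember how far we got, so the next run skips what was copied.
        with open(offset_file, "w", encoding="utf-8") as f:
            f.write(str(offset + len(new_data)))
    return len(new_data)


if __name__ == "__main__":
    # Demo with temporary paths; a real job would point at the share instead.
    workdir = tempfile.mkdtemp()
    local = os.path.join(workdir, "audit.log")
    remote = os.path.join(workdir, "remote_audit.log")
    offsets = os.path.join(workdir, "audit.offset")
    append_audit_record(local, "example record")
    print(transfer_new_records(local, remote, offsets), "bytes transferred")
```

This sketch does not handle log rotation or a transfer that fails halfway through; a slow or unavailable share then delays only the batch job, not the database threads.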
| Comment by Su, Jun-Ming [ 2017-07-05 ] |
|
OK, thank you all for helping with this issue. I will modify the design at my site. |