[MDEV-9387] mariadb server automatic restart at fusion-io Created: 2016-01-10  Updated: 2016-05-25  Resolved: 2016-05-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.22
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: sysdljr
Assignee: Jan Lindström (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: None
Environment:

CentOS 6.6, MariaDB 10.0.22


Attachments: error.log (text file), log.err.160117, pstack0313.log (text file), server.cnf

 Description   

We recently migrated our databases to a Fusion-IO device.
Besides the parallel slave problems in MDEV-9230, we have encountered a new problem. After 2 days, the slave restarted automatically; after 3 days, the master restarted automatically too. We then switched the master to an HDD device while the slave stayed on Fusion-IO. After 20 days the master is still running normally, but the slave has restarted automatically again.

We found some related bug reports:
1. https://bugs.mysql.com/bug.php?id=73890
Suggested answer: set innodb_adaptive_hash_index=0?

2. https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1382744
Suggested answer: this is a filesystem/storage-level problem?

Could you help me? Thank you.

Attached are the error log from the automatic restart and server.cnf.



 Comments   
Comment by Elena Stepanova [ 2016-01-11 ]

jplindst, could you please take a look?

Comment by sysdljr [ 2016-01-17 ]

Hi Jan Lindström,
On January 4th, the slave on the Fusion-IO device restarted automatically.
At 20:46 today, the slave restarted automatically again.
Before this restart, we had added "innodb_use_stacktrace = 1" to the my.cnf file.
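
A minimal sketch of how that setting might sit in my.cnf. The [mysqld] section placement is an assumption; the comment above only says the line was added to my.cnf.

```ini
[mysqld]
# Option quoted in the comment above; [mysqld] placement is assumed
innodb_use_stacktrace = 1
```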

Attached error log file: log.err.160117

Comment by sysdljr [ 2016-03-14 ]

After we set innodb_adaptive_hash_index = 0 and innodb_change_buffering = none,
the mysql service on the master and the slave no longer restarted automatically.
But yesterday, March 13th, the mysql service on the master server hung and stopped responding.
After we executed the pstack command below, mysql resumed normal work.
pstack `pidof mysqld` > /tmp/pstack.log

Attached pstack log file: pstack0313.log

The log file contains the __tz_convert function:
0x00000038fb29da67 in __tz_convert () from /lib64/libc.so.6

Someone said that __tz_convert has a performance problem and that time_zone must be changed.
For example, since we are in the UTC+8 zone, we would run set global time_zone = '+8:00' to replace the default time_zone='SYSTEM'. Is this correct? Please help me, thanks.
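
For reference, the change being asked about could be sketched as the my.cnf fragment below, so that it would also apply after a restart. This is only an illustration of the question: the [mysqld] placement and the use of the default_time_zone option in place of the SET GLOBAL statement quoted above are assumptions, and nothing in this report establishes that it addresses the hang.

```ini
[mysqld]
# Fixed UTC+8 offset instead of the default SYSTEM time zone (assumed placement)
default_time_zone = '+8:00'
```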

Comment by Jan Lindström (Inactive) [ 2016-03-14 ]

I do not think this is related to time zones. Both error logs contain long semaphore wait error messages; I am not sure why. Did you run any long-running transactions?

Comment by sysdljr [ 2016-03-14 ]

I am sure there are no long-running transactions. We mainly run two types of statements:
(1) single-value statement: INSERT INTO table01 VALUES ON DUPLICATE KEY UPDATE ...
(2) single-value procedure call: call update_log(v1,v2,v3);

Besides, must a Fusion-IO device be formatted with 4K blocks?
This link says: "Low level formatting device with 4k page size will reduce memory usage of driver and improve performance"
http://www.voleg.info/fusion_io-mezzonine-card-UCS-redhat6.html

Comment by Jan Lindström (Inactive) [ 2016-03-14 ]

Hi, yes, it could be that a Fusion-IO device provides better performance and reduced memory usage when formatted with 4K blocks, but your problem is not about that. Do you use XtraDB (the default) or innodb_plugin? I recommend using innodb_plugin when Fusion-IO is used; it gives you better performance.

Comment by sysdljr [ 2016-03-14 ]

Thank you. We are using the default XtraDB engine.
To switch to innodb_plugin, do we add these two lines to the [mysqld] section of my.cnf?
ignore_builtin_innodb
plugin_load=innodb=ha_innodb.so

Last month, after we set innodb_adaptive_hash_index = 0 and innodb_change_buffering = none,
the mysql service on the master and the slave stopped restarting automatically, and the error logs no longer contain long semaphore waits.
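
The two settings named above could be written in my.cnf roughly as follows. This is a sketch, assuming they belong in the [mysqld] section; the attached server.cnf is not reproduced here, so its other contents are unknown.

```ini
[mysqld]
# Disable the InnoDB adaptive hash index
innodb_adaptive_hash_index = 0
# Disable change buffering
innodb_change_buffering = none
```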

Now, could you please help analyze the attached file pstack0313.log? Thank you again.

Comment by Jan Lindström (Inactive) [ 2016-03-15 ]

Hi, yes, correct. pstack0313.log seems to contain a lot of threads either waiting for new connections or doing something in audit_plugin. I do not know much about audit_plugin, so I am not sure what it is waiting for there.

Comment by sysdljr [ 2016-05-25 ]

Thank you.
Recently we tested MariaDB 10.1 and Percona XtraDB Cluster 5.6 and ran into a problem where writes stop. Google Groups link:
https://groups.google.com/forum/#!topic/codership-team/Ne6WsTWixH8

Finally, we contacted another company's DBA and found that it is a CentOS 6.6 bug.
Reference links:
https://www.infoq.com/news/2015/05/redhat-futex
https://groups.google.com/forum/#!msg/mechanical-sympathy/QbmpZxp6C64/0M4_EbzSLj4J

After we switched to CentOS 6.5, MariaDB 10.0 and MariaDB 10.1 (Galera cluster) run normally.

Both this bug and MDEV-8098 can be closed. Thank you again.
