[MDEV-2516] LP:902654 - MariaDB consistently crashes in collect_tables on Aria checkpoint execution Created: 2011-12-11  Updated: 2015-02-02  Resolved: 2012-10-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Scott Feldstein (Inactive) Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug902654.xml     File LPexportBug902654_bug.tgz     File LPexportBug902654_bug902654.test     File LPexportBug902654_mariadb.crash.info.tgz     File LPexportBug902654_mysql.bug.tgz     File LPexportBug902654_mysql_general.log.bz2    

 Description   

Hi,
This issue occurs on MariaDB 5.2.10 and 5.3.2-beta

OS info:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

Memory (with MariaDB running):
$ free -m
total used free shared buffers cached
Mem: 16050 14153 1897 0 242 12572
-/+ buffers/cache: 1338 14712
Swap: 4095 89 4006

This occurs just about every 5 minutes on the dot, but sometimes it crashes out of band. mysqld_safe continuously restarts it.

Attached info includes:
cpuinfo (from /proc)
mysql table creation script
my.cnf
mysqld.log

To repro, simply create a db, run the data_tables.sql script to create all the tables and it will start to crash.



 Comments   
Comment by Scott Feldstein (Inactive) [ 2011-12-11 ]

Re: MariaDB consistently crashes

Comment by Scott Feldstein (Inactive) [ 2011-12-11 ]

mariadb.crash.info.tgz
LPexportBug902654_mariadb.crash.info.tgz

Comment by Elena Stepanova [ 2011-12-12 ]

Re: MariaDB consistently crashes
Hi,

Just creating the tables doesn't seem to cause the crash, at least it isn't crashing for me so far. I used the same settings as you.

Your log suggests that your server is not idle, you have some scheduled activity, in a cronjob or such – something that connects to the server and apparently executes something. Could you find out what is executed? The connection looks local, that is whatever is running, is probably on the same box.

If you could turn on the general log temporarily, its output could help to understand what flow causes the problem. To turn on logging, set general_log=1 and global general_log_file=<file location>. If you do it on a running server, please make sure you are changing global variables (set global); but it would be more useful if you could set them in the cnf file, so we see the whole activity from the server restart and till the crash.

One more thing, in the log file, when the problem started, the server complains about mysql.user and mysql.db tables being crashed. Could you please check if they are OK now (run check table on them), just so we know we are not dealing with an underlying condition?

Thank you

Comment by Scott Feldstein (Inactive) [ 2011-12-15 ]

Re: MariaDB consistently crashes
attaching mysql general log

Comment by Scott Feldstein (Inactive) [ 2011-12-15 ]

attaching mysql general log
mysql general log
LPexportBug902654_mysql_general.log.bz2

Comment by Elena Stepanova [ 2011-12-19 ]

Re: MariaDB consistently crashes
Hi,

Thank you for the provided log.

I have not succeeded so far at reproducing the problem, even by imitating the flow similar to yours, and using the identical data structures; ran tests both on RHEL 5.6 (although with less memory), and OpenSUSE with the same memory-related settings. I'll try to find configuration resembling yours more closely.

Is it correct that you only had the problem on RHEL 5.6, and only with aria-pagecache-buffer-size=12G?
Do you have installations on RHEL 5.6 that use a different memory configuration, and/or installations on different OSes with the same memory configuration? If you do, have you observed any problems with any of those?

Also, the general log that you provided has a bit different crashing pattern comparing to the previous error log. the first log shows that the server crashed about every 5 min for the first hour, then in 10 min, then in 50 min; and further with the same 2-hour cycle. If you happen to have a general log from any of such servers (where the crashes follow an uneven but stable pattern), it could also help.

Comment by Scott Feldstein (Inactive) [ 2011-12-20 ]

Re: MariaDB consistently crashes
Hi,
Attaching the log files that correspond to each other. I also generated a core dump using the mysql --core-file option. I hope you'll be able to diagnose the direct cause of the bug from that.

thanks.

Comment by Scott Feldstein (Inactive) [ 2011-12-20 ]

Hi,
Attaching the log files that correspond to each other. I also generated a core dump using the mysql --core-file option. I hope you'll be able to diagnose the direct cause of the bug from that.

thanks.
bug.tgz
LPexportBug902654_bug.tgz

Comment by Scott Feldstein (Inactive) [ 2012-02-04 ]

Re: MariaDB consistently crashes
Hi,
I am now able to reproduce this bug 100% of the time. I am now using 5.3.3 rc on linux.

I have attached a bundle which will allow you to do the same.

follow these instructions:

  1. create a db in mysql called "mydb"
  2. make sure that your ulimit -n is >= 8192 on your mysqld server due to the number of tables that are in this implementation
  3. unbundle mysql.bug.tgz
  4. use this script to populate the tables
    $ cat aria_metric_data_tables0.sql | mysql -u <user> -p<pass> -D mydb
    $ bin/mysqlreplay.sh <jdbcUrl> <user> <pass>
  5. e.g. bin/mysqlreplay.sh jdbc:mysql://localhost:3306/mydb myadmin mypasswd
  6. mariadb will crash

It turns out that the monitoring solution that we use calls

"show /*!50002 global */ status"
AND
SELECT * FROM information_schema.tables WHERE lower(table_schema) = 'mydb' AND engine is not null

by default this runs every 5 mins (same interval as the crash). When i change the interval on the monitoring solution to 20 mins, then mariadb crashes every 20 mins. When i turn it off the agent then mariadb is stable.

Please let me know if you are having issues with the script or are not able to reproduce. It works every time for me and the version of centos / rhel has nothing to do with it as i previously thought.

thanks.

Comment by Scott Feldstein (Inactive) [ 2012-02-04 ]

Hi,
I am now able to reproduce this bug 100% of the time. I am now using 5.3.3 rc on linux.

I have attached a bundle which will allow you to do the same.

follow these instructions:

  1. create a db in mysql called "mydb"
  2. make sure that your ulimit -n is >= 8192 on your mysqld server due to the number of tables that are in this implementation
  3. unbundle mysql.bug.tgz
  4. use this script to populate the tables
    $ cat aria_metric_data_tables0.sql | mysql -u <user> -p<pass> -D mydb
    $ bin/mysqlreplay.sh <jdbcUrl> <user> <pass>
  5. e.g. bin/mysqlreplay.sh jdbc:mysql://localhost:3306/mydb myadmin mypasswd
  6. mariadb will crash

It turns out that the monitoring solution that we use calls

"show /*!50002 global */ status"
AND
SELECT * FROM information_schema.tables WHERE lower(table_schema) = 'mydb' AND engine is not null

by default this runs every 5 mins (same interval as the crash). When i change the interval on the monitoring solution to 20 mins, then mariadb crashes every 20 mins. When i turn it off the agent then mariadb is stable.

Please let me know if you are having issues with the script or are not able to reproduce. It works every time for me and the version of centos / rhel has nothing to do with it as i previously thought.

thanks.
mysql.bug.tgz
LPexportBug902654_mysql.bug.tgz

Comment by Elena Stepanova [ 2012-02-07 ]

Re: MariaDB consistently crashes
Thank you, now I can reproduce the problem.

I am attaching an MTR testcase, it is based on the flow that Scott provided, only converted in mysqltest format and reduced. The test fails for me reliably on 5.2 and 5.3 debug server. If it doesn't cause the crash, try the algorithm suggested by Scott, it worked for me too.

Debug assertion:

mysqld: ma_checkpoint.c:949: collect_tables: Assertion `state_copy < state_copies_end' failed

#8 0xb7571014 in __assert_fail () from /lib/libc.so.6
#9 0x0860aba7 in collect_tables (str=0xb754a23c, checkpoint_start_log_horizon=4300787999)
at ma_checkpoint.c:949
#10 0x086097a8 in really_execute_checkpoint () at ma_checkpoint.c:198
#11 0x086095c1 in ma_checkpoint_execute (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:132
#12 0x0860a310 in ma_checkpoint_background (arg=0x1e) at ma_checkpoint.c:621
#13 0xb77ffb25 in start_thread () from /lib/libpthread.so.0

Release crash:

#3 0x0825ab96 in handle_segfault (sig=11) at mysqld.cc:2838
#4 <signal handler called>
#5 0x0847fb3a in collect_tables (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:948
#6 really_execute_checkpoint (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:198
#7 ma_checkpoint_execute (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:132
#8 0x08480716 in ma_checkpoint_background (arg=0x1e) at ma_checkpoint.c:621
#9 0xb77c9b25 in start_thread () from /lib/libpthread.so.0

Comment by Elena Stepanova [ 2012-02-07 ]

Thank you, now I can reproduce the problem.

I am attaching an MTR testcase, it is based on the flow that Scott provided, only converted in mysqltest format and reduced. The test fails for me reliably on 5.2 and 5.3 debug server. If it doesn't cause the crash, try the algorithm suggested by Scott, it worked for me too.

Debug assertion:

mysqld: ma_checkpoint.c:949: collect_tables: Assertion `state_copy < state_copies_end' failed

#8 0xb7571014 in __assert_fail () from /lib/libc.so.6
#9 0x0860aba7 in collect_tables (str=0xb754a23c, checkpoint_start_log_horizon=4300787999)
at ma_checkpoint.c:949
#10 0x086097a8 in really_execute_checkpoint () at ma_checkpoint.c:198
#11 0x086095c1 in ma_checkpoint_execute (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:132
#12 0x0860a310 in ma_checkpoint_background (arg=0x1e) at ma_checkpoint.c:621
#13 0xb77ffb25 in start_thread () from /lib/libpthread.so.0

Release crash:

#3 0x0825ab96 in handle_segfault (sig=11) at mysqld.cc:2838
#4 <signal handler called>
#5 0x0847fb3a in collect_tables (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:948
#6 really_execute_checkpoint (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:198
#7 ma_checkpoint_execute (level=CHECKPOINT_MEDIUM, no_wait=1 '\001')
at ma_checkpoint.c:132
#8 0x08480716 in ma_checkpoint_background (arg=0x1e) at ma_checkpoint.c:621
#9 0xb77c9b25 in start_thread () from /lib/libpthread.so.0

bug902654.test
LPexportBug902654_bug902654.test

Comment by Michael Widenius [ 2012-02-20 ]

Re: MariaDB consistently crashes in collect_tables on Aria checkpoint execution
The bug happens when you have more than 1024 open Aria tables during checkpoint.
This is now fixed in 5.2. Fix will be in 5.2.11.
Will push to 5.3 today.

Comment by Rasmus Johansson (Inactive) [ 2012-03-05 ]

Launchpad bug id: 902654

Comment by Scott Feldstein (Inactive) [ 2012-03-05 ]

Re: MariaDB consistently crashes in collect_tables on Aria checkpoint execution
Just fyi, been running maria 5.3.5 GA for several days now and it is stable.

I'm very impressed with the quick turnover. Thanks!

Generated at Thu Feb 08 06:42:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.