MariaDB Server: MDEV-12972

Random and Frequent Segfault (SIG 11) During Runtime

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.23, 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • Fix Version/s: 10.1.26, 10.0.32, 10.2.8
    • Component/s: Optimizer
    • Labels: None
    • Environment: CentOS7, Ubuntu 16.04

    Description

      Over the past few weeks I've had three different servers with MariaDB 10.1.20 through 10.1.23 inclusive all crash seemingly at random with a Signal 11.

      No common time of day, no common operation other than operating on data stored in InnoDB tables.

      Attached is the GDB trace from the core dump I managed to obtain as well as configs.

      I can make the core dump available if need be; however, it is over 200GB uncompressed, approx. 50GB with gzip.

      The overall configuration is a single read/write master with slaves attached. One of the slaves is used for SELECT statements by the same application, while the other is a dormant standby.

      The crash happens on three separate servers of identical hardware. Both CentOS7 and Ubuntu 16.04 have been the running OS when the crash occurred. The crash has only occurred while the server was operating as the master with slaves attached. As yet, we've not seen an active read slave or a dormant slave exhibit the same crash. Potential concurrency issue?

      We DO have a vast number of tables in the DB: close to 700k, with approx. 400k of those active during a normal business day. table_open_cache is currently set to 524288 because we've found that if we leave it unset or set it low, we get locked up waiting for mysql to look through the table cache for an evictable table before it gives up and adds a new table_cache_entry anyway.
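
To see how the table cache setting described above compares with actual usage, the server's own counters can be queried. This is a minimal sketch using standard variable and status names; how to read the numbers depends on the workload, and nothing here is taken from the reporter's attachments.

```sql
-- Configured size of the table cache (524288 in the setup described above).
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';

-- Number of tables currently held open.
SHOW GLOBAL STATUS LIKE 'Open_tables';

-- Number of table opens since startup; fast growth on a warm server
-- suggests tables are being evicted and reopened.
SHOW GLOBAL STATUS LIKE 'Opened_tables';
```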

      Also attached are the my.cnf and the query/query plan for the query in the stack trace of the thread that segfaulted, though we've also seen it happen on OPTIMIZE TABLE statements before now.

      On the current active write master I've set optimizer_switch to default, as I see MRR involved in the thread that died. MRR is something we turned on at some point, and I note that it is switched off in the MariaDB defaults.
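
For reference, the change described above could look roughly like the following. This is a sketch only: it assumes that "default" here means the three MRR-related flags being off, and it does not reproduce the reporter's actual statements.

```sql
-- Show the current flags; look for mrr, mrr_cost_based and mrr_sort_keys.
SELECT @@global.optimizer_switch;

-- Turn the MRR-related flags off for new connections,
-- leaving every other optimizer_switch flag as it is.
SET GLOBAL optimizer_switch = 'mrr=off,mrr_cost_based=off,mrr_sort_keys=off';
```

Existing sessions keep their own session-level value until they reconnect or set it themselves.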

      Attachments

        1. _usr_sbin_mysqld.112.crash
          55 kB
          Alex Boag-Munroe
        2. example-query.sql
          3 kB
          Alex Boag-Munroe
        3. mariadb-debug.out
          329 kB
          Alex Boag-Munroe
        4. my.cnf
          5 kB
          Alex Boag-Munroe
        5. show-create-index.txt
          17 kB
          Alex Boag-Munroe
        6. stack1
          111 kB
          Jan Lindström

          Activity

            jplindst Jan Lindström (Inactive) added a comment:

            Added one of the latest full stack traces from the core file.

            Ninpo Alex Boag-Munroe added a comment:

            Worth noting: apparently the crash occurs in this code whether or not any of the mrr optimizer switches are enabled.

            igor Igor Babaev (Inactive) added a comment:

            Alex,
            The comment just before the class Mrr_simple_index_reader (multi_range_read.h) says:

            /*
              A "bypass" index reader that just does and index scan. The index scan is done 
              by calling default MRR implementation (i.e.  handler::multi_range_read_XXX())
              functions.
            */

            So the server performs just a regular index scan and crashes there.

            monty Michael Widenius added a comment:

            The problem was a memory overflow in DsMrr_impl::setup_buffer_sharing() when used with a very small buffer.
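
The comment above does not say which buffer was too small. As an assumption, the two size settings most likely to be worth checking on an affected server are the MRR buffer and the join buffer; the sketch below only reads their current values.

```sql
-- Buffer size used for Multi-Range Read scans (assumed relevant here).
SHOW GLOBAL VARIABLES LIKE 'mrr_buffer_size';

-- Join buffer size (also an assumption; the comment names neither variable).
SHOW GLOBAL VARIABLES LIKE 'join_buffer_size';
```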
            varun Varun Gupta (Inactive) added a comment (edited):

            SET join_cache_level = 0;
            SET optimizer_switch ='mrr=on,mrr_sort_keys=on,optimize_join_buffer_size=on';
            ANALYZE 
            SELECT * FROM t1 AS t1_outer WHERE EXISTS ( SELECT * FROM t2 WHERE i2 IN ( SELECT i3 FROM t3 INNER JOIN t1 AS t1_inner ON (t1_inner.c1 = c3 ) WHERE t1_inner.i1 < t1_outer.i1 ) );
            id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
            1	PRIMARY	t1_outer	ALL	NULL	NULL	NULL	NULL	5	5.00	100.00	80.00	Using where
            2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	2	1.20	100.00	100.00	Using where
            2	DEPENDENT SUBQUERY	t3	ref	i3	i3	5	test.t2.i2	1	4.00	100.00	100.00	Using where
            2	DEPENDENT SUBQUERY	t1_inner	ref	c1	c1	4	test.t3.c3	1	0.38	100.00	44.44	Using where; FirstMatch(t2)
            
            


            People

              Assignee: monty Michael Widenius
              Reporter: Ninpo Alex Boag-Munroe
              Votes: 2
              Watchers: 9
