Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33533

Crash at execution of DELETE when trying to use rowid filter

Details

    Description

      WordPress plugin called all-in-one-seo-pack uses following query to delete some completed actionscheduler jobs:

      DELETE aa FROM wp_actionscheduler_actions as aa JOIN wp_actionscheduler_groups as ag on `ag`.`group_id` = `aa`.`group_id` WHERE `ag`.`slug` = 'aioseo' AND `aa`.`status` IN ('complete', 'failed', 'canceled');

      MariaDB server crashes immediately when processing this query. This happens randomly several times a day, but not every time. It depends on the data in affected tables.

      I have attached dump from these two tables with data that causes the crash every time.

      I tried with the following versions, with clean install from RPM repo and default server configuration (my.cnf). Server was a VPS with 16GB RAM and 8 EPYC cores.

      10.11 -> ok
      11.0.5 -> ok
      11.1.4 -> crash
      11.2.3 -> crash
      11.3.2 -> crash
      11.4.1 -> crash

      Log report after crash.

      Feb 23 16:09:02 crashtest mariadbd[71997]: 240223 16:09:02 [ERROR] mysqld got signal 11 ;
      Feb 23 16:09:02 crashtest mariadbd[71997]: Sorry, we probably made a mistake, and this is a bug.
      Feb 23 16:09:02 crashtest mariadbd[71997]: Your assistance in bug reporting will enable us to fix this for the next release.
      Feb 23 16:09:02 crashtest mariadbd[71997]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Feb 23 16:09:02 crashtest mariadbd[71997]: We will try our best to scrape up some info that will hopefully help
      Feb 23 16:09:02 crashtest mariadbd[71997]: diagnose the problem, but since we have already crashed,
      Feb 23 16:09:02 crashtest mariadbd[71997]: something is definitely wrong and this may fail.
      Feb 23 16:09:02 crashtest mariadbd[71997]: Server version: 11.3.2-MariaDB-log source revision: 068a6819eb63bcb01fdfa037c9bf3bf63c33ee42
      Feb 23 16:09:02 crashtest mariadbd[71997]: key_buffer_size=134217728
      Feb 23 16:09:02 crashtest mariadbd[71997]: read_buffer_size=131072
      Feb 23 16:09:02 crashtest mariadbd[71997]: max_used_connections=26
      Feb 23 16:09:02 crashtest mariadbd[71997]: max_threads=2002
      Feb 23 16:09:02 crashtest mariadbd[71997]: thread_count=26
      Feb 23 16:09:02 crashtest mariadbd[71997]: It is possible that mysqld could use up to
      Feb 23 16:09:02 crashtest mariadbd[71997]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4540935 K  bytes of memory
      Feb 23 16:09:02 crashtest mariadbd[71997]: Hope that's ok; if not, decrease some variables in the equation.
      Feb 23 16:09:02 crashtest mariadbd[71997]: Thread pointer: 0x55620ccf6018
      Feb 23 16:09:02 crashtest mariadbd[71997]: Attempting backtrace. You can use the following information to find out
      Feb 23 16:09:02 crashtest mariadbd[71997]: where mysqld died. If you see no messages after this, something went
      Feb 23 16:09:02 crashtest mariadbd[71997]: terribly wrong...
      Feb 23 16:09:02 crashtest mariadbd[71997]: stack_bottom = 0x7f6610c4bbb8 thread_stack 0x49000
      Feb 23 16:09:02 crashtest mariadbd[71997]: /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5561f4295a5e]
      Feb 23 16:09:02 crashtest mariadbd[71997]: /usr/sbin/mariadbd(handle_fatal_signal+0x468)[0x5561f3d76308]
      Feb 23 16:09:02 crashtest mariadbd[71997]: /lib64/libc.so.6(+0x54db0)[0x7f6824a54db0]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(handler_rowid_filter_check+0x54)[0x5561f3d84244]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(+0xe69b24)[0x5561f4120b24]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(+0xe6d357)[0x5561f4124357]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(+0xda6858)[0x5561f405d858]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN7handler10ha_rnd_posEPhS0_+0x2ca)[0x5561f3d7dfba]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z16rr_from_pointersP11READ_RECORD+0x38)[0x5561f3a31648]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN12multi_delete16do_table_deletesEP5TABLEP9SORT_INFOb+0x94)[0x5561f3ab9234]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN12multi_delete10do_deletesEv+0x84)[0x5561f3ab94f4]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN12multi_delete8send_eofEv+0x71)[0x5561f3ab95d1]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN4JOIN10exec_innerEv+0xf65)[0x5561f3b82135]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN4JOIN4execEv+0x3b)[0x5561f3b825db]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN11Sql_cmd_dml13execute_innerEP3THD+0x76)[0x5561f3b82706]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN14Sql_cmd_delete13execute_innerEP3THD+0x26)[0x5561f3ab8ec6]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_ZN11Sql_cmd_dml7executeEP3THD+0xdb)[0x5561f3b3d38b]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z21mysql_execute_commandP3THDb+0x4920)[0x5561f3b06450]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x211)[0x5561f3b074c1]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x1695)[0x5561f3b09ec5]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z10do_commandP3THDb+0x134)[0x5561f3b0bc54]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3bf)[0x5561f3c30e4f]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(handle_one_connection+0x5d)[0x5561f3c3119d]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /usr/sbin/mariadbd(+0xce5402)[0x5561f3f9c402]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /lib64/libc.so.6(+0x9f802)[0x7f6824a9f802]
      Feb 23 16:09:03 crashtest mariadbd[71997]: /lib64/libc.so.6(+0x3f450)[0x7f6824a3f450]
      Feb 23 16:09:03 crashtest mariadbd[71997]: Trying to get some variables.
      Feb 23 16:09:03 crashtest mariadbd[71997]: Some pointers may be invalid and cause the dump to abort.
      Feb 23 16:09:03 crashtest mariadbd[71997]: Query (0x55620a221030): DELETE aa FROM wp_actionscheduler_actions as aa
      Feb 23 16:09:03 crashtest mariadbd[71997]:                         JOIN wp_actionscheduler_groups as ag on `ag`.`group_id` = `aa`.`group_id`
      Feb 23 16:09:03 crashtest mariadbd[71997]:                         WHERE `ag`.`slug` = 'aioseo'
      Feb 23 16:09:03 crashtest mariadbd[71997]:                         AND `aa`.`status` IN ('complete', 'failed', 'canceled')
      Feb 23 16:09:03 crashtest mariadbd[71997]: Connection ID (thread ID): 8000
      Feb 23 16:09:03 crashtest mariadbd[71997]: Status: NOT_KILLED
      Feb 23 16:09:03 crashtest mariadbd[71997]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off,sargable_casefold=on
      Feb 23 16:09:03 crashtest mariadbd[71997]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      Feb 23 16:09:03 crashtest mariadbd[71997]: information that should help you find out what is causing the crash.
      Feb 23 16:09:03 crashtest mariadbd[71997]: Writing a core file...
      Feb 23 16:09:03 crashtest mariadbd[71997]: Working directory at /var/lib/mysql
      Feb 23 16:09:03 crashtest mariadbd[71997]: Resource Limits:
      Feb 23 16:09:03 crashtest mariadbd[71997]: Limit                     Soft Limit           Hard Limit           Units
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max cpu time              unlimited            unlimited            seconds
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max file size             unlimited            unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max data size             unlimited            unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max stack size            8388608              unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max core file size        0                    unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max resident set          unlimited            unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max processes             254837               254837               processes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max open files            32768                32768                files
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max locked memory         8388608              8388608              bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max address space         unlimited            unlimited            bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max file locks            unlimited            unlimited            locks
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max pending signals       254837               254837               signals
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max msgqueue size         819200               819200               bytes
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max nice priority         0                    0
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max realtime priority     0                    0
      Feb 23 16:09:03 crashtest mariadbd[71997]: Max realtime timeout      unlimited            unlimited            us
      Feb 23 16:09:03 crashtest mariadbd[71997]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
      Feb 23 16:09:03 crashtest mariadbd[71997]: Kernel version: Linux version 5.14.0-362.18.1.el9_3.x86_64 (mockbuild@x64-builder02.almalinux.org) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU ld version 2.35.2-42.el9) #1 SMP PREEMPT_DYNAMIC Mon Jan 29 07:05:48 EST 2024
      Feb 23 16:09:03 crashtest systemd-coredump[73702]: Process 71997 (mariadbd) of user 987 dumped core.
      Feb 23 16:09:03 crashtest systemd[1]: mariadb.service: Main process exited, code=dumped, status=11/SEGV
      Feb 23 16:09:03 crashtest systemd[1]: mariadb.service: Failed with result 'core-dump'.
      Feb 23 16:09:03 crashtest systemd[1]: mariadb.service: Consumed 2min 38.737s CPU time.
      Feb 23 16:09:08 crashtest systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 3.
      Feb 23 16:09:08 crashtest systemd[1]: Stopped MariaDB 11.3.2 database server.

      Attachments

        Issue Links

          Activity

            sanja I wanted to check the issue and had a nice discussion in slack with psergei and serg. What is obvious to one person is not always obvious to another. It is about POV and we should go for the best.

            IMHO, high level, a filter pushed down to SE should be applied to any record that is returned from SE to optimizer using the handler. If we are adding any more assumption to that then it is likely originating from the optimizer design and it is probably better to abstract such things from SE. In current case I observed multiple arguments like ...

            1. The "Row ID" filter is for index scan and should not be applied for table scan (non-index scan). Now this is certainly not obvious from SE perspective as a filter could be applied to any scan.

            2. rnd_init(scan=false) followed by rnd_pos() is positioning the cursor and getting the record and "Row ID" filter is not relevant in such case. However, from SE perspective, a filter is applicable to any record that is returned back to optimizer. So, this rule does look like an exception and certainly not obvious to me from "SE perspective". Please don't get me wrong. I do understand and appreciate the same rule when thought from optimizer perspective.

            The current patch was based rule [1] above. However, after discussion we figured out that it is actually [2] that we would like to impose. In short the expectation is SE to ignore "Row ID Filter" when rnd_init(scan=false).

            I think the patch solves the issue and is functionally correct. So, I agree that we can go ahead with the patch with the above correction and suggest to have some comments stating the expectation we have from SE for rnd_init interface.

            Yet another perspective from the issue that we discussed was about the handler state set with "row ID filter" for a scan with rnd_init(scan=false) when it is not relevant. It seems to be a stale state from previous scan with the same handler that used row ID optimization. I think it is better to reset the row Id filtering state for every scan and set appropriately. This suggestion is independent and is not necessarily to be included in current patch.

            debarun Debarun Banerjee added a comment - sanja I wanted to check the issue and had a nice discussion in slack with psergei and serg . What is obvious to one person is not always obvious to another. It is about POV and we should go for the best. IMHO, high level, a filter pushed down to SE should be applied to any record that is returned from SE to optimizer using the handler. If we are adding any more assumption to that then it is likely originating from the optimizer design and it is probably better to abstract such things from SE. In current case I observed multiple arguments like ... 1. The "Row ID" filter is for index scan and should not be applied for table scan (non-index scan). Now this is certainly not obvious from SE perspective as a filter could be applied to any scan. 2. rnd_init(scan=false) followed by rnd_pos() is positioning the cursor and getting the record and "Row ID" filter is not relevant in such case. However, from SE perspective, a filter is applicable to any record that is returned back to optimizer. So, this rule does look like an exception and certainly not obvious to me from "SE perspective". Please don't get me wrong. I do understand and appreciate the same rule when thought from optimizer perspective. The current patch was based rule [1] above. However, after discussion we figured out that it is actually [2] that we would like to impose. In short the expectation is SE to ignore "Row ID Filter" when rnd_init(scan=false). I think the patch solves the issue and is functionally correct. So, I agree that we can go ahead with the patch with the above correction and suggest to have some comments stating the expectation we have from SE for rnd_init interface. Yet another perspective from the issue that we discussed was about the handler state set with "row ID filter" for a scan with rnd_init(scan=false) when it is not relevant. It seems to be a stale state from previous scan with the same handler that used row ID optimization. I think it is better to reset the row Id filtering state for every scan and set appropriately. This suggestion is independent and is not necessarily to be included in current patch.

            (CC: serg): patch against 11.1 addressing latest input from discussion with Debarun:

            commit a6c516ff587d7bd44b6c3fe97c4f35dabe849ef2 (origin/bb-11.1-MDEV-33533-v2, bb-11.1-MDEV-33533-v2)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Fri May 3 13:40:06 2024 +0300
             
                MDEV-33533: Crash at execution of DELETE when trying to use rowid filter
            

            I'm still in progress with applying the fix to 10.5. In any case, the patch for 10.5 will be different.

            psergei Sergei Petrunia added a comment - (CC: serg ): patch against 11.1 addressing latest input from discussion with Debarun: commit a6c516ff587d7bd44b6c3fe97c4f35dabe849ef2 (origin/bb-11.1-MDEV-33533-v2, bb-11.1-MDEV-33533-v2) Author: Sergei Petrunia <sergey@mariadb.com> Date: Fri May 3 13:40:06 2024 +0300   MDEV-33533: Crash at execution of DELETE when trying to use rowid filter I'm still in progress with applying the fix to 10.5. In any case, the patch for 10.5 will be different.

            pushed into 11.1+

            serg Sergei Golubchik added a comment - pushed into 11.1+

            Note: on the question of why versions before 11.1 are not affected: because before 11.1, make_join_select() has this logic:

                        if (join->thd->lex->sql_command == SQLCOM_SELECT &&
                            optimizer_flag(join->thd, OPTIMIZER_SWITCH_USE_ROWID_FILTER))
            

            so earlier versions do not hit the issue.

            psergei Sergei Petrunia added a comment - Note: on the question of why versions before 11.1 are not affected: because before 11.1, make_join_select() has this logic: if (join->thd->lex->sql_command == SQLCOM_SELECT && optimizer_flag(join->thd, OPTIMIZER_SWITCH_USE_ROWID_FILTER)) so earlier versions do not hit the issue.

            Closing as this was pushed:

            commit fe41171c96a0eee9c026f96d5971ed281e76d488 (tag: mariadb-11.1.5, origin/bb-11.1-release)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Fri May 3 13:40:06 2024 +0300
             
                MDEV-33533: Crash at execution of DELETE when trying to use rowid filter
                
                (Based on original patch by Oleksandr Byelkin)
                
                Multi-table DELETE can execute via "buffered" mode: at phase #1 it collects
                rowids of rows to be deleted, then at phase #2 in multi_delete::do_deletes()
                it calls handler->rnd_pos() to read rows to be deleted and deletes them.
                
                The problem occurred when phase #1 used Rowid Filter on the table that
                phase #2 would be deleting from.
                In InnoDB, h->rnd_init(scan=false) and h->rnd_pos() is an index scan over PK
                under the hood. So, at phase #2 ha_innobase::rnd_init() would try to use the
                Rowid Filter and hit an assertion inside ha_innobase::rnd_init().
                
                Note that multi-table UPDATE works similarly but was not affected, because
                patch for MDEV-7487 added code to disable rowid filter for phase #2 in
                multi_update::do_updates().
                
                This patch changes the approach:
                - It makes InnoDB not use Rowid Filter in rnd_pos() scans: it is disabled in
                  ha_innobase::rnd_init() and enabled back in ha_innobase::rnd_end().
                - multi_update::do_updates() no longer disables Rowid Filter for phase#2 as
                  it is no longer necessary.
            

            If we need to change the way things work, we'll need to open another MDEV.

            psergei Sergei Petrunia added a comment - Closing as this was pushed: commit fe41171c96a0eee9c026f96d5971ed281e76d488 (tag: mariadb-11.1.5, origin/bb-11.1-release) Author: Sergei Petrunia <sergey@mariadb.com> Date: Fri May 3 13:40:06 2024 +0300   MDEV-33533: Crash at execution of DELETE when trying to use rowid filter (Based on original patch by Oleksandr Byelkin) Multi-table DELETE can execute via "buffered" mode: at phase #1 it collects rowids of rows to be deleted, then at phase #2 in multi_delete::do_deletes() it calls handler->rnd_pos() to read rows to be deleted and deletes them. The problem occurred when phase #1 used Rowid Filter on the table that phase #2 would be deleting from. In InnoDB, h->rnd_init(scan=false) and h->rnd_pos() is an index scan over PK under the hood. So, at phase #2 ha_innobase::rnd_init() would try to use the Rowid Filter and hit an assertion inside ha_innobase::rnd_init(). Note that multi-table UPDATE works similarly but was not affected, because patch for MDEV-7487 added code to disable rowid filter for phase #2 in multi_update::do_updates(). This patch changes the approach: - It makes InnoDB not use Rowid Filter in rnd_pos() scans: it is disabled in ha_innobase::rnd_init() and enabled back in ha_innobase::rnd_end(). - multi_update::do_updates() no longer disables Rowid Filter for phase#2 as it is no longer necessary. If we need to change the way things work, we'll need to open another MDEV.

            People

              psergei Sergei Petrunia
              dlt rauli arjatsalo
              Votes:
              2 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.