Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25163

Rowid filter does not process storage engine error correctly.

Details

    Description

      We had this crash on production database, but we cant see reason for this.
      It was not possible to restart the server again and got error message (see picture below).

      We managed to run it again by setting innodb_force_recovery=1. The server is running fine, but not knowing what caused this makes me worry

      Mar 15 15:59:54 mysql02-server mariadbd: 2021-03-15 15:59:54 3535477 [Warning] Sort aborted, host: web06, user: dbuser, thread: 3535477, 
      query: SELECT * FROM `******` WHERE id = 123 ORDER BY id ASC, id ASC LIMIT 1 FOR UPDATE
      Mar 15 15:59:54 mysql02-server mariadbd: 2021-03-15 15:59:54 0x7fc49e404700 
       InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.8/storage/innobase/trx/trx0trx.cc line 1336
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: We intentionally generate a memory trap.
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: If you get repeated assertion failures or crashes, even
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: immediately after the mysqld startup, there may be
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: corruption in the InnoDB tablespace. Please refer to
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: about forcing recovery.
      Mar 15 15:59:54 mysql02-server mariadbd: 210315 15:59:54 [ERROR] mysqld got signal 6 ;
      Mar 15 15:59:54 mysql02-server mariadbd: This could be because you hit a bug. It is also possible that this binary
      Mar 15 15:59:54 mysql02-server mariadbd: or one of the libraries it was linked against is corrupt, improperly built,
      Mar 15 15:59:54 mysql02-server mariadbd: or misconfigured. This error can also be caused by malfunctioning hardware.
      Mar 15 15:59:54 mysql02-server mariadbd: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Mar 15 15:59:54 mysql02-server mariadbd: We will try our best to scrape up some info that will hopefully help
      Mar 15 15:59:54 mysql02-server mariadbd: diagnose the problem, but since we have already crashed,
      Mar 15 15:59:54 mysql02-server mariadbd: something is definitely wrong and this may fail.
      

      Attachments

        1. gdb_output.txt
          135 kB
        2. image-2021-03-16-13-47-55-914.png
          image-2021-03-16-13-47-55-914.png
          278 kB
        3. mariadb_log.txt
          20 kB
        4. screenshot-1.png
          screenshot-1.png
          6 kB

        Issue Links

          Activity

            Looking at the patch...

            This part:

            +        /* It's supposed that thd error state will be set(with
            +        file->print_error(res, MYF(0)) call) inside of quick->get_next() call,
            +        that's why there is no need to do this here. See
            +        Mrr_simple_index_reader::get_next() code for example. */
            

            This is true... But then the code in rr_quick is [sometimes] redundant ...
            Debugging a query:

            select * from t1 where a between 10 and 20 for update;
            

              Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0)
            (gdb) wher
              #0  my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l
              #1  0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125
              #2  0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093
              #3  0x0000555555f828e4 in Mrr_simple_index_reader::get_next (this=0x7fff70038030, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/m
              #4  0x0000555555f85734 in DsMrr_impl::dsmrr_next (this=0x7fff70037ee0, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/multi_range_
              #5  0x00005555564db79a in ha_innobase::multi_range_read_next (this=0x7fff70037a48, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/stor
              #6  0x000055555627e322 in QUICK_RANGE_SELECT::get_next (this=0x7fff70082f90) at /home/psergey/dev-git2/10.4-look10/sql/opt_range.cc:12345
              #7  0x00005555562920e0 in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:369
              #8  0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70
              #9  0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_
              #10 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425
              #11 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605
              #12 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387
            

            Then, again:

              Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0)
            (gdb) wher
              #0  my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l
              #1  0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125
              #2  0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093
              #3  0x000055555629209d in rr_handle_error (info=0x7fff70019250, error=146) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:356
              #4  0x00005555562920ff in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:371
              #5  0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70
              #6  0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_
              #7  0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425
              #8  0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605
              #9  0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387
            

            So, the error is reported twice... but this is outside of the scope of this patch...

            psergei Sergei Petrunia added a comment - Looking at the patch... This part: + /* It's supposed that thd error state will be set(with + file->print_error(res, MYF(0)) call) inside of quick->get_next() call, + that's why there is no need to do this here. See + Mrr_simple_index_reader::get_next() code for example. */ This is true... But then the code in rr_quick is [sometimes] redundant ... Debugging a query: select * from t1 where a between 10 and 20 for update; Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) (gdb) wher #0 my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l #1 0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125 #2 0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093 #3 0x0000555555f828e4 in Mrr_simple_index_reader::get_next (this=0x7fff70038030, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/m #4 0x0000555555f85734 in DsMrr_impl::dsmrr_next (this=0x7fff70037ee0, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/multi_range_ #5 0x00005555564db79a in ha_innobase::multi_range_read_next (this=0x7fff70037a48, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/stor #6 0x000055555627e322 in QUICK_RANGE_SELECT::get_next (this=0x7fff70082f90) at /home/psergey/dev-git2/10.4-look10/sql/opt_range.cc:12345 #7 0x00005555562920e0 in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:369 #8 0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70 #9 0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_ #10 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425 #11 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605 #12 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387 Then, again: Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) (gdb) wher #0 my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l #1 0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125 #2 0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093 #3 0x000055555629209d in rr_handle_error (info=0x7fff70019250, error=146) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:356 #4 0x00005555562920ff in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:371 #5 0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70 #6 0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_ #7 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425 #8 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605 #9 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387 So, the error is reported twice... but this is outside of the scope of this patch...

            +        /* It's supposed that thd error state will be set(with
            +        file->print_error(res, MYF(0)) call) inside of quick->get_next() call,
            +        that's why there is no need to do this here. See
            +        Mrr_simple_index_reader::get_next() code for example. */
            

            It's not "will be set" it is "has been set" at this point. get_next() call is dozen lines above. Please change the wording.

            Please also add

              DBUG_ASSERT(thd->is_error());
            

            to make sure the error state has been set.

            Also this gives a clue how to add test coverage for the changed code in sql_join_cache.cc :

            https://gist.github.com/spetrunia/d2e78780184e09842f4299a87531d848

            psergei Sergei Petrunia added a comment - + /* It's supposed that thd error state will be set(with + file->print_error(res, MYF(0)) call) inside of quick->get_next() call, + that's why there is no need to do this here. See + Mrr_simple_index_reader::get_next() code for example. */ It's not "will be set" it is "has been set" at this point. get_next() call is dozen lines above. Please change the wording. Please also add DBUG_ASSERT(thd->is_error()); to make sure the error state has been set. Also this gives a clue how to add test coverage for the changed code in sql_join_cache.cc : https://gist.github.com/spetrunia/d2e78780184e09842f4299a87531d848
            psergei Sergei Petrunia added a comment - Also please apply this coding style fix: https://gist.github.com/spetrunia/7edad4c40927c1d804aa13ec96aecd60

            More review input provided on Slack. Ok to push when addressed.

            psergei Sergei Petrunia added a comment - More review input provided on Slack. Ok to push when addressed.

            Summary: a query that's doing a locking read and using Rowid Filter could cause a server crash if it has encountered a Lock Wait Timeout or Deadlock opr a similar error when building the Rowid Filter.

            psergei Sergei Petrunia added a comment - Summary: a query that's doing a locking read and using Rowid Filter could cause a server crash if it has encountered a Lock Wait Timeout or Deadlock opr a similar error when building the Rowid Filter.

            People

              vlad.lesin Vladislav Lesin
              ingig Ingi Gauti Ragnarsson
              Votes:
              3 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.