[MDEV-19555] sql/sql_error.h:1036: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed for CHECK TABLE Created: 2019-05-22  Updated: 2022-01-26  Resolved: 2022-01-26

Status: Closed
Project: MariaDB Server
Component/s: Admin statements, Locking
Affects Version/s: 10.4.6, 10.5.1, 10.4, 10.5, 10.6
Fix Version/s: 10.4.23, 10.5.14, 10.6.6, 10.7.2, 10.8.1

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Rucha Deodhar
Resolution: Fixed Votes: 0
Labels: affects-tests

Attachments: File 19555.log     File MDEV-19555.test     File MDEV-19555.yy    
Issue Links:
Relates
relates to MDEV-19840 Assertion `m_status == DA_ERROR' fail... Open

 Description   

Upd: See MTR test case in the comments


Problem found during RQG testing on
origin/10.4 592fe954ef82be1bc08b29a8e54f7729eb1e1343 2019-05-22T13:48:26+02:00
 
mysqld: sql/sql_error.h:1036: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.
190522 15:58:16 [ERROR] mysqld got signal 6 ;
...
Query (0x7effa8011180): CHECK TABLE t3  /* E_R Thread5 QNO 594 CON_ID 21 */
Connection ID (thread ID): 33
Status: NOT_KILLED
...
Thread 1 (Thread 0x7f00603d2700 (LWP 27998)):
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x00005615f4364aab in my_write_core (sig=6) at mysys/stacktrace.c:481
#2  0x00005615f3b98047 in handle_fatal_signal (sig=6) at sql/signal_handler.cc:343
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00007f0078a2637a in __GI_abort () at abort.c:89
#6  0x00007f0078a1cb47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5615f44c180c "m_status == DA_ERROR", file=file@entry=0x5615f44c1770 "sql/sql_error.h", line=line@entry=1036, function=function@entry=0x5615f44c2360 <Diagnostics_area::sql_errno() const::__PRETTY_FUNCTION__> "uint Diagnostics_area::sql_errno() const") at assert.c:92
#7  0x00007f0078a1cbf2 in __GI___assert_fail (assertion=0x5615f44c180c "m_status == DA_ERROR", file=0x5615f44c1770 "sql/sql_error.h", line=1036, function=0x5615f44c2360 <Diagnostics_area::sql_errno() const::__PRETTY_FUNCTION__> "uint Diagnostics_area::sql_errno() const") at assert.c:101
#8  0x00005615f3749dac in Diagnostics_area::sql_errno (this=0x7effa8006450) at sql/sql_error.h:1036
#9  0x00005615f3b9e7f6 in ha_rollback_trans (thd=0x7effa8000a98, all=true) at sql/handler.cc:1893
#10 0x00005615f39ebf5b in trans_rollback_implicit (thd=0x7effa8000a98) at sql/transaction.cc:387
#11 0x00005615f39e9e3e in mysql_admin_table(THD *, TABLE_LIST *, HA_CHECK_OPT *, const char *, thr_lock_type, bool, bool, uint, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *), struct {...}, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *)) (thd=0x7effa8000a98, tables=0x7effa8011288, check_opt=0x7effa8005c58, operator_name=0x5615f4532f79 "check", lock_type=TL_READ_NO_INSERT, org_open_for_modify=false, repair_table_use_frm=false, extra_open_options=32, prepare_func=0x0, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x5615f3ba5cf2 <handler::ha_check(THD*, st_ha_check_opt*)>, view_operator_func=0x5615f397559e <view_check(THD*, TABLE_LIST*, st_ha_check_opt*)>) at sql/sql_admin.cc:1173
#12 0x00005615f39ea74c in Sql_cmd_check_table::execute (this=0x7effa8011938, thd=0x7effa8000a98) at sql/sql_admin.cc:1348
#13 0x00005615f386cc81 in mysql_execute_command (thd=0x7effa8000a98) at sql/sql_parse.cc:6347
#14 0x00005615f3871f31 in mysql_parse (thd=0x7effa8000a98, rawbuf=0x7effa8011180 "CHECK TABLE t3  /* E_R Thread5 QNO 594 CON_ID 21 */", length=51, parser_state=0x7f00603d11d0, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:8157
#15 0x00005615f385d60d in dispatch_command (command=COM_QUERY, thd=0x7effa8000a98, packet=0x7effa8007d49 "", packet_length=53, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1829
#16 0x00005615f385bdcb in do_command (thd=0x7effa8000a98) at sql/sql_parse.cc:1362
#17 0x00005615f39d3d9b in do_handle_one_connection (connect=0x5615f69c79d8) at sql/sql_connect.cc:1403
#18 0x00005615f39d3aff in handle_one_connection (arg=0x5615f69c79d8) at sql/sql_connect.cc:1306
#19 0x00007f00798696da in start_thread (arg=0x7f00603d2700) at pthread_create.c:456
#20 0x00007f0078af7d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
 
Current state of information
1. The problem applies to MariaDB version >= 10.4 only.
2. This seems to be a concurrency problem.
3. I have not seen that problem when testing 10.4 a few weeks ago.
 
Some bisecting
10.4.5  commit d18ef804bb5d9d473055a2fdc04f74e175a8e9cd 2019-05-03   replay
10.4.5  commit ca098107a26872b818a96bd753ab2aeb2cd1ccdf 2019-05-03   no replay



 Comments   
Comment by Matthias Leich [ 2019-05-22 ]

19555.log – Protocol of my RQG run including backtrace
MDEV-19555.yy - Simplified RQG grammar
I am trying to develop some MTR based replay test.

Comment by Elena Stepanova [ 2019-08-30 ]

Note: The test case is non-deterministic, run with --repeat=N. It always fails for me within 5 attempts, but it can vary on different machines and builds.

--source include/have_innodb.inc
 
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,test)
FLUSH TABLES WITH READ LOCK;
 
--connect (con2,localhost,root,,test)
--send
    UPDATE t1 SET a = 1;
 
--connection default
--send
    CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB;
 
--connection con1
CHECK TABLE t1;
 
# Cleanup
UNLOCK TABLES;
--disconnect con1
--connection con2
--error 0,ER_BAD_FIELD_ERROR
--reap
--disconnect con2
--connection default
--reap
DROP TABLE t1;

10.4 77e44282

mysqld: /data/src/10.4/sql/sql_error.h:1036: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.
190830 19:38:21 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f4b0fc4df12 in __GI___assert_fail (assertion=0x5621b78c3b94 "m_status == DA_ERROR", file=0x5621b78c3b00 "/data/src/10.4/sql/sql_error.h", line=1036, function=0x5621b78c46c0 <Diagnostics_area::sql_errno() const::__PRETTY_FUNCTION__> "uint Diagnostics_area::sql_errno() const") at assert.c:101
#8  0x00005621b6bd0c02 in Diagnostics_area::sql_errno (this=0x7f4ab40064c8) at /data/src/10.4/sql/sql_error.h:1036
#9  0x00005621b702af7e in ha_rollback_trans (thd=0x7f4ab4000b00, all=true) at /data/src/10.4/sql/handler.cc:1919
#10 0x00005621b6e759f5 in trans_rollback_implicit (thd=0x7f4ab4000b00) at /data/src/10.4/sql/transaction.cc:387
#11 0x00005621b6e738d8 in mysql_admin_table(THD *, TABLE_LIST *, HA_CHECK_OPT *, const char *, thr_lock_type, bool, bool, uint, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *), struct {...}, int (*)(THD *, TABLE_LIST *, HA_CHECK_OPT *)) (thd=0x7f4ab4000b00, tables=0x7f4ab4011e20, check_opt=0x7f4ab4005cd0, operator_name=0x5621b7934851 "check", lock_type=TL_READ_NO_INSERT, org_open_for_modify=false, repair_table_use_frm=false, extra_open_options=32, prepare_func=0x0, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x5621b7032488 <handler::ha_check(THD*, st_ha_check_opt*)>, view_operator_func=0x5621b6dfed79 <view_check(THD*, TABLE_LIST*, st_ha_check_opt*)>) at /data/src/10.4/sql/sql_admin.cc:1177
#12 0x00005621b6e741e6 in Sql_cmd_check_table::execute (this=0x7f4ab40124d8, thd=0x7f4ab4000b00) at /data/src/10.4/sql/sql_admin.cc:1352
#13 0x00005621b6cf3ae9 in mysql_execute_command (thd=0x7f4ab4000b00) at /data/src/10.4/sql/sql_parse.cc:6098
#14 0x00005621b6cf8d59 in mysql_parse (thd=0x7f4ab4000b00, rawbuf=0x7f4ab4011d68 "CHECK TABLE t1", length=14, parser_state=0x7f4b08fd3170, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7908
#15 0x00005621b6ce5002 in dispatch_command (command=COM_QUERY, thd=0x7f4ab4000b00, packet=0x7f4ab4008331 "", packet_length=14, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1843
#16 0x00005621b6ce3748 in do_command (thd=0x7f4ab4000b00) at /data/src/10.4/sql/sql_parse.cc:1360
#17 0x00005621b6e5d74e in do_handle_one_connection (connect=0x5621b9a24640) at /data/src/10.4/sql/sql_connect.cc:1404
#18 0x00005621b6e5d49d in handle_one_connection (arg=0x5621b9a24640) at /data/src/10.4/sql/sql_connect.cc:1306
#19 0x00005621b778b0ab in pfs_spawn_thread (arg=0x5621b99a3350) at /data/src/10.4/storage/perfschema/pfs.cc:1862
#20 0x00007f4b117c24a4 in start_thread (arg=0x7f4b08fd4700) at pthread_create.c:456
#21 0x00007f4b0fd0ad0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Not reproducible on 10.3.
No obvious effect on a non-debug build.
May or may not be related to MDEV-19840.

Comment by Matthias Leich [ 2020-02-07 ]

10.5.1 commit 42e825dd0a8d25c1d6fa93f5a07115ceed3ee0ff 2020-02-04 compiled with debug – replay
10.5.1 older commit compiled without debug – no replay
10.4.13 commit c1eaa385ffb44bdf6264d2cc4b4cc0e10284c88a 2020-02-03 compiled with debug – replay
10.3.23 commit b0fa30808622fe12d474a70af1838906e60b9897 2020-02-02 compiled with debug – no replay

Comment by Roel Van de Paar [ 2020-06-02 ]

I see this one all over the runs too. A fix would be appreciated. Thank you!

Comment by Andrei Elkin [ 2022-01-18 ]

rucha174, please consider this patch

diff --git a/sql/handler.cc b/sql/handler.cc
index 57992d98c04..c08c480e9b9 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -2203,7 +2203,8 @@ int ha_rollback_trans(THD *thd, bool all)
       Thanks to possibility of MDL deadlock rollback request can come even if
       transaction hasn't been started in any transactional storage engine.
     */
-    if (thd->transaction_rollback_request)
+    if (thd->transaction_rollback_request &&
+        thd->transaction->xid_state.is_explicit_XA())
       thd->transaction->xid_state.set_error(thd->get_stmt_da()->sql_errno());
 
     thd->has_waiter= false;

According to git-annotate f189f34ed4b8 refactored thd->transaction.xid_state.rm_error= into the current form, but the assignment pre-condition was not preserved. Originally that was to the explicit XA trx, that's what my patch makes.

Comment by Matthias Leich [ 2022-01-24 ]

Hi Andrei,
many thanks for your work on MDEV-19555.
 
I am just running automatic test simplification for some Fulltext issue on the development tree
    origin/bb-10.6-MDEV-20605-cur-pos-fix 714b3adc82eac16b46a47c4953cbeedaf14c77cb 2022-01-20T18:41:45+03:00.
The FTS issue I am looking for occurs on actual main trees too.
The corresponding RQG test does neither invoke explicite XA statements nor replication.
DB servers get startet with "--sync-binlog=1".
 
Before applying your patch to the development tree I got 
    50% of all tests fail because of MDEV-19555    
      1%  fail with the issue I am searching for.
After losing the patience and applying your patch it looks far way better.
It also does not look like the bad effect "Assertion `m_status == DA_ERROR' failed for CHECK TABLE"
is now somehow replaced by some other new bad effect.

Comment by Matthias Leich [ 2022-01-25 ]

The tree
origin/10.8, 10.8 5595ed9d9f20aeb463ac26e075c9cba41927e85e 2022-01-24T19:26:09+09:00
+ MDEV-19555 patch + some preliminary patch for MDEV-14481 performed well in RQG testing
with the test battery for broad range coverage of functionality.
No new bad effects.

Comment by Andrei Elkin [ 2022-01-26 ]

Rucha, howdy.

The patch merely restores logical flaw in the problematic sources code spot.
The setting
thd->transaction->xid_state.set_error()
is meaningful only for the user (aka explicit) XA.
To clear your mind about that I already suggested to time travel with git-annotate.

Thank you,

Andrei.

Comment by Andrei Elkin [ 2022-01-26 ]

Approved by Sanja.

For merging, there's a usual conflict dealing with THD::transaction that is the pointer in higher branches.

Generated at Thu Feb 08 08:52:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.