[MDEV-31084] Assertion `waiting' failed in TP_connection_generic::wait_end Created: 2023-04-19  Updated: 2023-04-23  Resolved: 2023-04-21

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: performance, regression-10.6, rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-26827 Make page flushing even faster Closed
Relates
relates to MDEV-31114 Assertion `!tls_worker_data->is_waiti... Closed

 Description   

10.6 1892f5d8fcb7

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00002078557a6859 in __GI_abort () at abort.c:79
#2  0x00002078557a6729 in __assert_fail_base (fmt=0x20785593c588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x561d33059b8c "waiting", file=0x561d330599b0 "/data/MDEV-31084/10.6b/sql/threadpool_generic.cc", line=1363, 
    function=<optimized out>) at assert.c:92
#3  0x00002078557b7fd6 in __GI___assert_fail (assertion=0x561d33059b8c "waiting", 
    file=0x561d330599b0 "/data/MDEV-31084/10.6b/sql/threadpool_generic.cc", line=1363, 
    function=0x561d33059b98 "virtual void TP_connection_generic::wait_end()") at assert.c:101
#4  0x0000561d32645dce in TP_connection_generic::wait_end (this=0x561d34258e00)
    at /data/MDEV-31084/10.6b/sql/threadpool_generic.cc:1363
#5  0x0000561d3235d3c7 in tp_wait_end (thd=0x3d220001158) at /data/MDEV-31084/10.6b/sql/threadpool_common.cc:581
#6  0x0000561d3203bd29 in thd_wait_end (thd=0x3d220001158) at /data/MDEV-31084/10.6b/sql/sql_class.cc:5635
#7  0x0000561d32a82469 in buf_flush_wait_flushed (sync_lsn=1110731) at /data/MDEV-31084/10.6b/storage/innobase/buf/buf0flu.cc:1900
#8  0x0000561d32858eb5 in log_checkpoint_margin () at /data/MDEV-31084/10.6b/storage/innobase/log/log0log.cc:1046
#9  0x0000561d32858ef4 in log_check_margins () at /data/MDEV-31084/10.6b/storage/innobase/log/log0log.cc:1062
#10 0x0000561d32a44b03 in log_free_check () at /data/MDEV-31084/10.6b/storage/innobase/include/log0log.inl:309
#11 0x0000561d32a45e66 in btr_blob_log_check_t::check (this=0x2cad3c6dd350)
    at /data/MDEV-31084/10.6b/storage/innobase/btr/btr0cur.cc:5885
#12 0x0000561d32a41577 in btr_store_big_rec_extern_fields (pcur=0x2cad3c6dd7e0, offsets=0x2cad3c6dde00, big_rec_vec=0x3d223049100, 
    btr_mtr=0x2cad3c6dd8c0, op=BTR_STORE_INSERT) at /data/MDEV-31084/10.6b/storage/innobase/btr/btr0cur.cc:6068
#13 0x0000561d32920bbf in row_ins_index_entry_big_rec (entry=0x14fd2bd4f9e0, big_rec=0x3d223049100, offsets=0x2cad3c6dde00, 
    heap=0x2cad3c6ddcc0, index=0x3d2200e3fd0, thd=0x3d220001158) at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:2555
#14 0x0000561d3292227a in row_ins_clust_index_entry_low (flags=0, mode=BTR_MODIFY_LEAF, index=0x3d2200e3fd0, n_uniq=1, 
    entry=0x14fd2bd4f9e0, n_ext=0, thr=0x14fd10026610) at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:2885
#15 0x0000561d329234b4 in row_ins_clust_index_entry (index=0x3d2200e3fd0, entry=0x14fd2bd4f9e0, thr=0x14fd10026610, n_ext=0)
    at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:3239
#16 0x0000561d329239da in row_ins_index_entry (index=0x3d2200e3fd0, entry=0x14fd2bd4f9e0, thr=0x14fd10026610)
    at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:3365
#17 0x0000561d3292428f in row_ins_index_entry_step (node=0x14fd10025c80, thr=0x14fd10026610)
    at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:3533
#18 0x0000561d329246f3 in row_ins (node=0x14fd10025c80, thr=0x14fd10026610)
    at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:3658
#19 0x0000561d32924e20 in row_ins_step (thr=0x14fd10026610) at /data/MDEV-31084/10.6b/storage/innobase/row/row0ins.cc:3787
#20 0x0000561d32943d7e in row_insert_for_mysql (mysql_rec=0x3d22006b300 "\030\200@\002\002\374\377\377H/\312\022\375\024", 
    prebuilt=0x14fd10024be0, ins_mode=ROW_INS_NORMAL) at /data/MDEV-31084/10.6b/storage/innobase/row/row0mysql.cc:1308
#21 0x0000561d3278b93b in ha_innobase::write_row (this=0x14fd10002800, 
    record=0x3d22006b300 "\030\200@\002\002\374\377\377H/\312\022\375\024")
    at /data/MDEV-31084/10.6b/storage/innobase/handler/ha_innodb.cc:7906
#22 0x0000561d32437eba in handler::ha_write_row (this=0x14fd10002800, 
    buf=0x3d22006b300 "\030\200@\002\002\374\377\377H/\312\022\375\024") at /data/MDEV-31084/10.6b/sql/handler.cc:7603
#23 0x0000561d3205fa0d in write_record (thd=0x3d220001158, table=0x3d22006b7f8, info=0x2cad3c6dea90, sink=0x0)
    at /data/MDEV-31084/10.6b/sql/sql_insert.cc:2157
#24 0x0000561d3205cc2e in mysql_insert (thd=0x3d220001158, table_list=0x3d2200158e8, fields=..., values_list=..., update_fields=..., 
    update_values=..., duplic=DUP_ERROR, ignore=true, result=0x0) at /data/MDEV-31084/10.6b/sql/sql_insert.cc:1129
#25 0x0000561d320ae21c in mysql_execute_command (thd=0x3d220001158, is_called_from_prepared_stmt=false)
    at /data/MDEV-31084/10.6b/sql/sql_parse.cc:4570
#26 0x0000561d320b9592 in mysql_parse (thd=0x3d220001158, 
    rawbuf=0x3d2203941b0 "INSERT /* QNO 41 CON_ID 4 */ /*! IGNORE */ INTO table150_innodb_dynamic_int VALUES  (LOAD_FILE('/data/MDEV-31084/rqg/data/blobs/earth5kb.jpg'), -1581383680, 1697513472, 't', 1789919232, null, '\\'canva"..., length=287932, 
    parser_state=0x2cad3c6df2b0) at /data/MDEV-31084/10.6b/sql/sql_parse.cc:8036
#27 0x0000561d320a6482 in dispatch_command (command=COM_QUERY, thd=0x3d220001158, 
    packet=0x3d220080649 "INSERT /* QNO 41 CON_ID 4 */ /*! IGNORE */ INTO table150_innodb_dynamic_int VALUES  (LOAD_FILE('/data/MDEV-31084/rqg/data/blobs/earth5kb.jpg'), -1581383680, 1697513472, 't', 1789919232, null, '\\'canva"..., packet_length=287933, 
    blocking=false) at /data/MDEV-31084/10.6b/sql/sql_parse.cc:1896
#28 0x0000561d320a507f in do_command (thd=0x3d220001158, blocking=false) at /data/MDEV-31084/10.6b/sql/sql_parse.cc:1409
#29 0x0000561d3235cdf6 in threadpool_process_request (thd=0x3d220001158) at /data/MDEV-31084/10.6b/sql/threadpool_common.cc:432
#30 0x0000561d3235c915 in tp_callback (c=0x561d34258e00) at /data/MDEV-31084/10.6b/sql/threadpool_common.cc:245
#31 0x0000561d32646596 in worker_main (param=0x696e0bee0b58) at /data/MDEV-31084/10.6b/sql/threadpool_generic.cc:1556
#32 0x00007ffe75daa609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#33 0x00002078558a3133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Marko Mäkelä [ 2023-04-21 ]

The stack trace is actually indicating the culrpit thread: we are missing tpool::tpool_wait_begin() before tpool::tpool_wait_end(). I suspect that this recent change of mine related to MDEV-26827 broke it.

Comment by Marko Mäkelä [ 2023-04-21 ]

I think that we must restore the wait bookkeeping in the normal case that this function invokes buf_flush_wait(), something similar to how buf_flush_sync() works:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index f8374a404aa..07c60848ad0 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -1910,9 +1910,13 @@ ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn)
     }
     else
 #endif
+    {
+      thd_wait_begin(nullptr, THD_WAIT_DISKIO);
+      tpool::tpool_wait_begin();
       buf_flush_wait(sync_lsn);
-
-    thd_wait_end(nullptr);
+      tpool::tpool_wait_end();
+      thd_wait_end(nullptr);
+    }
   }
 
   mysql_mutex_unlock(&buf_pool.flush_list_mutex);

Comment by Elena Stepanova [ 2023-04-21 ]

I ran the test on 10.6 27ff972be + the patch above 50 times, it didn't fail.
Without the patch, on otherwise the same build type and the same machine, it usually fails within ~10 attempts.

Generated at Thu Feb 08 10:21:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.