[MDEV-23955] main.multi_update_big random failure on arm64, server timeouts Created: 2020-10-13  Updated: 2020-12-26  Resolved: 2020-10-14

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.5.5
Fix Version/s: 10.2.35, 10.3.26, 10.4.16, 10.5.7

Type: Bug Priority: Minor
Reporter: Otto Kekäläinen Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None


 Description   

Came across this failure on Travis-CI:

main.multi_update_big                    w3 [ fail ]  timeout after 120 seconds
        Test ended at 2020-10-13 20:03:58
Test case timeout after 120 seconds
== /home/travis/build/ottok/mariadb/mysql-test/var/3/log/multi_update_big.log == 
DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1 ( a INT NOT NULL, b INT NOT NULL) ;
# The protocolling of many inserts into t1 is suppressed.
ALTER TABLE t1 ADD INDEX i1(a);
DELETE FROM t1 WHERE a > 2000000;
CREATE TABLE t2 LIKE t1;
INSERT INTO t2 SELECT * FROM t1;
SELECT 't2 rows before small delete', COUNT(*) FROM t1;
t2 rows before small delete	COUNT(*)
t2 rows before small delete	2000000
DELETE t1,t2 FROM t1,t2 WHERE t1.b=t2.a AND t1.a < 2;
SELECT 't2 rows after small delete', COUNT(*) FROM t2;
t2 rows after small delete	COUNT(*)
t2 rows after small delete	1999999
SELECT 't1 rows after small delete', COUNT(*) FROM t1;
t1 rows after small delete	COUNT(*)
t1 rows after small delete	1999999
DELETE t1,t2 FROM t1,t2 WHERE t1.b=t2.a AND t1.a < 100*1000;
 == /home/travis/build/ottok/mariadb/mysql-test/var/3/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/home/travis/build/ottok/mariadb/mysql-test/var/tmp/3/mysqld.1.s' (111)
 - found 'core' (0/5)
Trying 'dbx' to get a backtrace
Trying 'gdb' to get a backtrace from coredump /home/travis/build/ottok/mariadb/mysql-test/var/3/log/main.multi_update_big/mysqld.1/data/core
Core generated by '/home/travis/build/ottok/mariadb/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 29524]
[New LWP 29544]
[New LWP 29532]
[New LWP 29535]
[New LWP 29536]
[New LWP 29537]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/travis/build/ottok/mariadb/sql/mariadbd --defaults-group-suffix=.1 --defa'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000ffff981c97f4 in pthread_kill () from /lib/aarch64-linux-gnu/libpthread.so.0
[Current thread is 1 (Thread 0xffff97c67010 (LWP 29524))]
#0  0x0000ffff981c97f4 in pthread_kill () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab1d6a0ac in my_write_core (sig=<optimized out>) at /home/travis/build/ottok/mariadb/mysys/stacktrace.c:424
#2  0x0000aaaab1909fc4 in handle_fatal_signal (sig=6) at /home/travis/build/ottok/mariadb/sql/signal_handler.cc:330
#3  <signal handler called>
#4  0x0000ffff97e49aa0 in poll () from /lib/aarch64-linux-gnu/libc.so.6
#5  0x0000aaaab16c11fc in poll (__timeout=-1, __nfds=2, __fds=0xfffff13ec410) at /usr/include/aarch64-linux-gnu/bits/poll2.h:41
#6  handle_connections_sockets () at /home/travis/build/ottok/mariadb/sql/mysqld.cc:6219
#7  0x0000aaaab16c3c98 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/travis/build/ottok/mariadb/sql/mysqld.cc:5668
#8  0x0000ffff97da3090 in __libc_start_main () from /lib/aarch64-linux-gnu/libc.so.6
#9  0x0000aaaab16bad38 in _start ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 6 (Thread 0xffff95fb30f0 (LWP 29537)):
#0  0x0000ffff981c801c in pthread_cond_wait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab16d4d6c in inline_mysql_cond_wait (mutex=<optimized out>, that=<optimized out>) at /home/travis/build/ottok/mariadb/include/mysql/psi/mysql_thread.h:1231
#2  handle_slave_background (arg=<optimized out>) at /home/travis/build/ottok/mariadb/sql/slave.cc:525
#3  0x0000ffff981c14fc in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#4  0x0000ffff97e52f2c in ?? () from /lib/aarch64-linux-gnu/libc.so.6
Thread 5 (Thread 0xffff95ffe0f0 (LWP 29536)):
#0  0x0000ffff97db6e64 in sigtimedwait () from /lib/aarch64-linux-gnu/libc.so.6
#1  0x0000aaaab16bb4bc in my_sigwait (code=<synthetic pointer>, sig=0xffff95ffd7bc, set=0xffff95ffd7c0) at /home/travis/build/ottok/mariadb/include/my_pthread.h:196
#2  signal_hand (arg=<optimized out>) at /home/travis/build/ottok/mariadb/sql/mysqld.cc:2988
#3  0x0000ffff981c14fc in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#4  0x0000ffff97e52f2c in ?? () from /lib/aarch64-linux-gnu/libc.so.6
Thread 4 (Thread 0xffff967ff0f0 (LWP 29535)):
#0  0x0000ffff981c8384 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab1a8b8b0 in inline_mysql_cond_timedwait (abstime=0xffff967fe830, mutex=<optimized out>, that=<optimized out>) at /home/travis/build/ottok/mariadb/include/mysql/psi/mysql_thread.h:1268
#2  my_service_thread_sleep (control=control@entry=0xaaaab250ef80 <checkpoint_control>, sleep_time=30000000000) at /home/travis/build/ottok/mariadb/storage/maria/ma_servicethread.c:115
#3  0x0000aaaab1a83c34 in ma_checkpoint_background (arg=0x1e) at /home/travis/build/ottok/mariadb/storage/maria/ma_checkpoint.c:707
#4  0x0000ffff981c14fc in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#5  0x0000ffff97e52f2c in ?? () from /lib/aarch64-linux-gnu/libc.so.6
Thread 3 (Thread 0xffff97b680f0 (LWP 29532)):
#0  0x0000ffff981c8384 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab1d6d378 in inline_mysql_cond_timedwait (abstime=0xfffff13eb01f, mutex=0xaaaab2e485a8 <LOCK_timer>, that=0xaaaab2e48618 <COND_timer>) at /home/travis/build/ottok/mariadb/include/mysql/psi/mysql_thread.h:1268
#2  timer_handler (arg=<optimized out>) at /home/travis/build/ottok/mariadb/mysys/thr_timer.c:321
#3  0x0000ffff981c14fc in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#4  0x0000ffff97e52f2c in ?? () from /lib/aarch64-linux-gnu/libc.so.6
Thread 2 (Thread 0xffff95f680f0 (LWP 29544)):
#0  0x0000ffff981cc990 in pwrite64 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab1d6776c in my_pwrite (Filedes=61, Buffer=0xffff95f66558 "", Count=7, offset=639369, MyFlags=4) at /home/travis/build/ottok/mariadb/mysys/my_pread.c:159
#2  0x0000aaaab1d4267c in _mi_delete_static_record (info=0xffff8402d008) at /home/travis/build/ottok/mariadb/storage/myisam/mi_statrec.c:108
#3  0x0000aaaab1d31154 in mi_delete (info=0xffff8402d008, record=0xffff8411dbf8 "\377\202\025\001") at /home/travis/build/ottok/mariadb/storage/myisam/mi_delete.c:95
#4  0x0000aaaab19145f0 in handler::ha_delete_row (this=0xffff8411ab90, buf=0xffff8411dbf8 "\377\202\025\001") at /home/travis/build/ottok/mariadb/sql/handler.cc:7255
#5  0x0000aaaab1a2ebf4 in TABLE::delete_row (this=0xffff8411b848) at /home/travis/build/ottok/mariadb/sql/sql_delete.cc:277
#6  multi_delete::send_data (this=0xffff84012e38, values=...) at /home/travis/build/ottok/mariadb/sql/sql_delete.cc:1330
#7  0x0000aaaab179e770 in select_result_sink::send_data_with_check (sent=<optimized out>, u=<optimized out>, items=..., this=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_class.h:5317
#8  end_send (join=0xffff84012ea8, join_tab=0xffff84015c20, end_of_records=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:21819
#9  0x0000aaaab178718c in evaluate_join_record (join=join@entry=0xffff84012ea8, join_tab=join_tab@entry=0xffff84015878, error=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:20848
#10 0x0000aaaab179180c in sub_select (join=0xffff84012ea8, join_tab=0xffff84015878, end_of_records=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:20625
#11 0x0000aaaab178718c in evaluate_join_record (join=join@entry=0xffff84012ea8, join_tab=join_tab@entry=0xffff840154d0, error=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:20848
#12 0x0000aaaab179187c in sub_select (join=0xffff84012ea8, join_tab=0xffff840154d0, end_of_records=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:20664
#13 0x0000aaaab17ada50 in do_select (procedure=<optimized out>, join=0xffff84012ea8) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:20159
#14 JOIN::exec_inner (this=this@entry=0xffff84012ea8) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:4455
#15 0x0000aaaab17ade00 in JOIN::exec (this=this@entry=0xffff84012ea8) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:4236
#16 0x0000aaaab17ac7a8 in mysql_select (thd=thd@entry=0xffff84000c68, tables=<optimized out>, fields=..., conds=<optimized out>, og_num=og_num@entry=0, order=order@entry=0x0, group=group@entry=0x0, having=having@entry=0x0, proc_param=proc_param@entry=0x0, select_options=2202244746112, result=result@entry=0xffff84012e38, unit=0xffff84004a68, unit@entry=0xffff84000c68, select_lex=select_lex@entry=0xffff84005268) at /home/travis/build/ottok/mariadb/sql/sql_select.cc:4651
#17 0x0000aaaab1762670 in mysql_execute_command (thd=thd@entry=0xffff84000c68) at /home/travis/build/ottok/mariadb/sql/sql_parse.cc:4837
#18 0x0000aaaab1766864 in mysql_parse (thd=thd@entry=0xffff84000c68, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /home/travis/build/ottok/mariadb/sql/sql_parse.cc:7994
#19 0x0000aaaab175cf3c in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0xffff84000c68, packet=packet@entry=0xffff84076309 "DELETE t1,t2 FROM t1,t2 WHERE t1.b=t2.a AND t1.a < 100*1000", packet_length=packet_length@entry=59, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/travis/build/ottok/mariadb/sql/sql_parse.cc:1867
#20 0x0000aaaab175bbc4 in do_command (thd=0xffff84000c68) at /home/travis/build/ottok/mariadb/sql/sql_parse.cc:1348
#21 0x0000aaaab1825e3c in do_handle_one_connection (connect=<optimized out>, connect@entry=0xaaaadc0eaeb8, put_in_cache=put_in_cache@entry=true) at /home/travis/build/ottok/mariadb/sql/sql_connect.cc:1410
#22 0x0000aaaab1826148 in handle_one_connection (arg=0xaaaadc0eaeb8) at /home/travis/build/ottok/mariadb/sql/sql_connect.cc:1312
#23 0x0000ffff981c14fc in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#24 0x0000ffff97e52f2c in ?? () from /lib/aarch64-linux-gnu/libc.so.6
Thread 1 (Thread 0xffff97c67010 (LWP 29524)):
#0  0x0000ffff981c97f4 in pthread_kill () from /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000aaaab1d6a0ac in my_write_core (sig=<optimized out>) at /home/travis/build/ottok/mariadb/mysys/stacktrace.c:424
#2  0x0000aaaab1909fc4 in handle_fatal_signal (sig=6) at /home/travis/build/ottok/mariadb/sql/signal_handler.cc:330
#3  <signal handler called>
#4  0x0000ffff97e49aa0 in poll () from /lib/aarch64-linux-gnu/libc.so.6
#5  0x0000aaaab16c11fc in poll (__timeout=-1, __nfds=2, __fds=0xfffff13ec410) at /usr/include/aarch64-linux-gnu/bits/poll2.h:41
#6  handle_connections_sockets () at /home/travis/build/ottok/mariadb/sql/mysqld.cc:6219
#7  0x0000aaaab16c3c98 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /home/travis/build/ottok/mariadb/sql/mysqld.cc:5668
#8  0x0000ffff97da3090 in __libc_start_main () from /lib/aarch64-linux-gnu/libc.so.6
#9  0x0000aaaab16bad38 in _start ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Full log: https://travis-ci.org/github/ottok/mariadb/jobs/735501697



 Comments   
Comment by Marko Mäkelä [ 2020-10-14 ]

It seems to me that the server is not hung, but busy executing

DELETE t1,t2 FROM t1,t2 WHERE t1.b=t2.a AND t1.a < 100*1000;

in a MyISAM table.

The preceding part of the test looks unnecessarily complicated:

CREATE TABLE t1 ( a INT NOT NULL, b INT NOT NULL) ;
--echo # The protocolling of many inserts into t1 is suppressed.
--disable_query_log
INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(4,4);
let $1=19;
set @d=4;
begin;
while ($1)
{
  eval INSERT INTO t1 SELECT a+@d,b+@d FROM t1;
  eval SET @d=@d*2;
  dec $1;
}
commit;
--enable_query_log
ALTER TABLE t1 ADD INDEX i1(a);
DELETE FROM t1 WHERE a > 2000000;
CREATE TABLE t2 LIKE t1;
INSERT INTO t2 SELECT * FROM t1;
 
SELECT 't2 rows before small delete', COUNT(*) FROM t1;
DELETE t1,t2 FROM t1,t2 WHERE t1.b=t2.a AND t1.a < 2;

It could be significantly faster if we used have_sequence.inc and performed an INSERT…SELECT with a join of the generated tables seq_1_to_19 and seq_1_to_4.
For many other tests that used INSERT INTO t1 SELECT … FROM t1 I already did this. For the last part of main.sum_distinct-big it made a huge difference to the execution time, because the table is InnoDB, and because explicit record locks would be allocated for every row of the source table.

Comment by Marko Mäkelä [ 2020-10-14 ]

Given that INSERT…SELECT from MyISAM to MyISAM was demonstrated to be slow, I cleaned up the test main.sum_distinct-big further.

diff --git a/mysql-test/main/sum_distinct-big.test b/mysql-test/main/sum_distinct-big.test
index 42eeae743b1..41aa5a44c19 100644
--- a/mysql-test/main/sum_distinct-big.test
+++ b/mysql-test/main/sum_distinct-big.test
@@ -21,32 +21,15 @@ set default_storage_engine=MYISAM;
 CREATE TABLE t1 (id INTEGER);
 CREATE TABLE t2 (id INTEGER);
 
-INSERT INTO t1 (id) VALUES (1), (1), (1),(1);
-INSERT INTO t1 (id) SELECT id FROM t1; /* 8 */
-INSERT INTO t1 (id) SELECT id FROM t1; /* 12 */
-INSERT INTO t1 (id) SELECT id FROM t1; /* 16 */
-INSERT INTO t1 (id) SELECT id FROM t1; /* 20 */
-INSERT INTO t1 (id) SELECT id FROM t1; /* 24 */
-INSERT INTO t1 SELECT id+1 FROM t1;
-INSERT INTO t1 SELECT id+2 FROM t1;
-INSERT INTO t1 SELECT id+4 FROM t1;
-INSERT INTO t1 SELECT id+8 FROM t1;
-INSERT INTO t1 SELECT id+16 FROM t1;
-INSERT INTO t1 SELECT id+32 FROM t1;
-INSERT INTO t1 SELECT id+64 FROM t1;
-INSERT INTO t1 SELECT id+128 FROM t1;
-INSERT INTO t1 SELECT id+256 FROM t1;
-INSERT INTO t1 SELECT id+512 FROM t1;
+INSERT INTO t1 SELECT b.seq FROM seq_1_to_128,seq_1_to_1024 b;
 
 # Just test that AVG(DISTINCT) is there
 SELECT AVG(DISTINCT id) FROM t1 GROUP BY id % 13;
 SELECT SUM(DISTINCT id)/COUNT(DISTINCT id) FROM t1 GROUP BY id % 13;
 
-INSERT INTO t1 SELECT id+1024 FROM t1;
-INSERT INTO t1 SELECT id+2048 FROM t1;
-INSERT INTO t1 SELECT id+4096 FROM t1;
-INSERT INTO t1 SELECT id+8192 FROM t1;
-INSERT INTO t2 SELECT id FROM t1 ORDER BY id*rand();
+INSERT INTO t1 SELECT b.seq FROM seq_1_to_128,seq_1025_to_16384 b;
+INSERT INTO t2 SELECT b.seq FROM seq_1_to_128 a,seq_1_to_16384 b
+ORDER by (a.seq*0+b.seq)*rand();
 
 # SELECT '++++++++++++++++++++++++++++++++++++++++++++++++++';
 
@@ -76,9 +59,10 @@ SELECT SUM(DISTINCT id) sm FROM t1 GROUP BY id % 13;
 
 SET max_heap_table_size=default;
 
-INSERT INTO t1 SELECT id+16384 FROM t1;
-DELETE FROM t2;
-INSERT INTO t2 SELECT id FROM t1 ORDER BY id*rand();
+INSERT INTO t1 SELECT b.seq FROM seq_1_to_128,seq_16385_to_32768 b;
+TRUNCATE t2;
+INSERT INTO t2 SELECT b.seq FROM seq_1_to_128 a,seq_1_to_32768 b
+ORDER BY (a.seq*0+b.seq)*rand();
 
 SELECT SUM(DISTINCT id) sm FROM t2;
 

I will try to apply the same technique to main.multi_update_big.

Comment by Marko Mäkelä [ 2020-10-14 ]

The slow code turned out to be inserting 2Mi rows in ascending order:

diff --git a/mysql-test/main/multi_update_big.test b/mysql-test/main/multi_update_big.test
index a0f17fabec4..d6fec704822 100644
--- a/mysql-test/main/multi_update_big.test
+++ b/mysql-test/main/multi_update_big.test
@@ -1,4 +1,5 @@
 --source include/long_test.inc
+--source include/have_sequence.inc
 
 #
 # Test of update statement that uses many tables.
@@ -40,25 +41,9 @@ if (`SELECT '$BIG_TEST' = '' AND $need_big = 1`)
 # Bug#1820 Rows not deleted from second table on multi-table delete
 #
 
---disable_warnings
-DROP TABLE IF EXISTS t1,t2;
---enable_warnings
-
 CREATE TABLE t1 ( a INT NOT NULL, b INT NOT NULL) ;
---echo # The protocolling of many inserts into t1 is suppressed.
---disable_query_log
-INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(4,4);
-let $1=19;
-set @d=4;
-begin;
-while ($1)
-{
-  eval INSERT INTO t1 SELECT a+@d,b+@d FROM t1;
-  eval SET @d=@d*2;
-  dec $1;
-}
-commit;
---enable_query_log
+INSERT INTO t1 SELECT seq,seq FROM seq_1_to_2097152;
+
 ALTER TABLE t1 ADD INDEX i1(a);
 DELETE FROM t1 WHERE a > 2000000;
 CREATE TABLE t2 LIKE t1;

Comment by Otto Kekäläinen [ 2020-10-14 ]

Note that I saw the Travis-CI job failure when working on PR https://github.com/MariaDB/server/pull/1507, not the current Travis config, just in case it matters (probably not).

Comment by Marko Mäkelä [ 2020-10-14 ]

At least https://travis-ci.org/github/MariaDB/server/jobs/735629106 for my test push looks OK. There were a few failures on other platforms of https://travis-ci.org/github/MariaDB/server/builds/735629089:

Comment by Otto Kekäläinen [ 2020-10-14 ]

I confirm this is now fixed. I applied the commit https://github.com/mariadb/server/commit/75d8f60017fb47293190f48d5ae172fb54c3fdb3 on top of my https://github.com/MariaDB/server/pull/1507 (as the 10.2 version did not apply cleanly) and the test passed:

main.multi_update_big                    w4 [ pass ]  23608

Full log: https://travis-ci.org/github/ottok/mariadb/jobs/735792473
Overview: https://travis-ci.org/github/ottok/mariadb/builds/735792456

Thanks Marko!

Generated at Thu Feb 08 09:26:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.