[MDEV-19200] innodb_fast_shutdown=0 triggers OOM killer on shutdown was: innodb.undo_truncate_recover fails in buildbot not being able to shut down the server Created: 2019-04-06  Updated: 2019-08-30  Resolved: 2019-08-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.4.4
Fix Version/s: 10.4.8

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: regression

Attachments: Text File bt-thread-apply-all.txt    
Issue Links:
Relates
relates to MDEV-18694 Assertion `mysql_socket.fd != -1' fai... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-asan/builds/807/steps/mtr_nm/logs/stdio

10.4 c84dde14

innodb.undo_truncate_recover '16k,2,innodb,strict_crc32' w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2019-04-05 18:18:29
line
2019-04-05 18:18:24 0 [Warning] /home/buildbot/buildbot/build/mariadb-10.4.4/sql/mysqld: Thread 5 (user : '') did not exit
2019-04-05 18:18:24 0 [Warning] /home/buildbot/buildbot/build/mariadb-10.4.4/sql/mysqld: Thread 4 (user : '') did not exit
2019-04-05 18:18:24 0 [Warning] /home/buildbot/buildbot/build/mariadb-10.4.4/sql/mysqld: Thread 3 (user : '') did not exit
2019-04-05 18:18:24 0 [Warning] /home/buildbot/buildbot/build/mariadb-10.4.4/sql/mysqld: Thread 2 (user : '') did not exit
2019-04-05 18:18:24 0 [Warning] /home/buildbot/buildbot/build/mariadb-10.4.4/sql/mysqld: Thread 1 (user : '') did not exit
^ Found warnings in /dev/shm/var/3/log/mysqld.1.err
ok

Not reproducible locally on my machine, but reproducible on buildbot's ASAN VMs.



 Comments   
Comment by Daniel Black [ 2019-07-08 ]

I managed to hit it:

ubuntu-18.04 with distro gcc/g++ (7.4.0) - no special CFLAGS.
kernel 5.1.0
POWER8 (8335-GCA / S822LC)

$ client/mysql -S /tmp/s.sock 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8418
Server version: 10.4.6-MariaDB Source distribution
 
MariaDB [(none)]> set global innodb_flush_log_at_trx_commit=1;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> show global variables like '%shutdown%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| innodb_buffer_pool_dump_at_shutdown | ON    |
| innodb_fast_shutdown                | 1     |
+-------------------------------------+-------+
2 rows in set (0.005 sec)
 
MariaDB [(none)]> set global innodb_fast_shutdown=0;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> shutdown;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [(none)]> Bye
{noformat}
 
{code:perl}
 sql/mysqld --no-defaults --datadir=$HOME/mysqldata --socket /tmp/s.sock --lc-messages-dir=${PWD}/sql/share --innodb-buffer-pool-size=200G --innodb-log-file-size=1G --innodb_log_files_in_group=4  --verbose
mysql.user table already exists!
Run mysql_upgrade, not mysql_install_db
2019-07-08 11:50:24 0 [Note] sql/mysqld (mysqld 10.4.6-MariaDB) starting as process 7807 ...
2019-07-08 11:50:25 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-07-08 11:50:25 0 [Note] InnoDB: Uses event mutexes
2019-07-08 11:50:25 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2019-07-08 11:50:25 0 [Note] InnoDB: Number of pools: 1
2019-07-08 11:50:25 0 [Note] InnoDB: Using POWER8 crc32 instructions
2019-07-08 11:50:25 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2019-07-08 11:50:25 0 [Note] InnoDB: Initializing buffer pool, total size = 200G, instances = 8, chunk size = 128M
2019-07-08 11:50:58 0 [Note] InnoDB: Completed initialization of buffer pool
2019-07-08 11:50:58 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-07-08 11:51:03 0 [Note] InnoDB: Resizing redo log from 2*1073741824 to 4*1073741824 bytes; LSN=140318
2019-07-08 11:51:03 0 [Note] InnoDB: Starting to delete and rewrite log files.
2019-07-08 11:51:03 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
2019-07-08 11:51:03 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes
2019-07-08 11:51:03 0 [Note] InnoDB: Setting log file ./ib_logfile2 size to 1073741824 bytes
2019-07-08 11:51:03 0 [Note] InnoDB: Setting log file ./ib_logfile3 size to 1073741824 bytes
2019-07-08 11:51:03 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2019-07-08 11:51:03 0 [Note] InnoDB: New log files created, LSN=140318
2019-07-08 11:51:03 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-07-08 11:51:03 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-07-08 11:51:03 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-07-08 11:51:03 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-07-08 11:51:03 0 [Note] InnoDB: Waiting for purge to start
2019-07-08 11:51:03 0 [Note] InnoDB: 10.4.6 started; log sequence number 140318; transaction id 21
2019-07-08 11:51:03 0 [Note] InnoDB: Loading buffer pool(s) from /home/dan/mysqldata/ib_buffer_pool
2019-07-08 11:51:03 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-07-08 11:51:03 0 [Note] InnoDB: Buffer pool(s) load completed at 190708 11:51:03
2019-07-08 11:51:03 0 [Note] Server socket created on IP: '::'.
2019-07-08 11:51:03 0 [Note] Reading of all Master_info entries succeeded
2019-07-08 11:51:03 0 [Note] Added new Master_info '' to hash table
2019-07-08 11:51:03 0 [Note] sql/mysqld: ready for connections.
Version: '10.4.6-MariaDB'  socket: '/tmp/s.sock'  port: 3306  Source distribution
2019-07-08 11:56:20 14 [Warning] IP address '192.168.10.8' could not be resolved: Name or service not known
2019-07-08 11:56:20 12 [Warning] IP address '192.168.10.8' could not be resolved: Name or service not known
...
...
...
2019-07-08 14:26:27 2102 [Warning] Aborted connection 2102 to db: 'sbtest' user: 'sbtest' host: '192.168.10.8' (Got an error reading communication packets)
2019-07-08 14:26:27 2110 [Warning] Aborted connection 2110 to db: 'sbtest' user: 'sbtest' host: '192.168.10.8' (Got an error reading communication packets)
q2019-07-08 14:30:03 2935 [Note] Detected table cache mutex contention at instance 2: 20% waits. Additional table cache instance activated. Number of instances after activation: 3.
2019-07-08 15:13:26 8212 [Note] Detected table cache mutex contention at instance 2: 22% waits. Additional table cache instance activated. Number of instances after activation: 4.
2019-07-08 15:55:04 0 [Note] sql/mysqld (initiated by: dan[dan] @ localhost []): Normal shutdown
2019-07-08 15:55:04 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-07-08 15:55:04 0 [Note] InnoDB: FTS optimize thread exiting.
2019-07-08 15:55:25 0 [Warning] sql/mysqld: Thread 5 (user : '') did not exit
 
2019-07-08 15:55:25 0 [Warning] sql/mysqld: Thread 4 (user : '') did not exit
 
2019-07-08 15:55:25 0 [Warning] sql/mysqld: Thread 3 (user : '') did not exit
 
2019-07-08 15:55:25 0 [Warning] sql/mysqld: Thread 1 (user : '') did not exit
 
2019-07-08 15:55:25 0 [Warning] sql/mysqld: Thread 2 (user : '') did not exit
 
Killed
$ date
16:05:55

`date` was run pretty much I saw the killed come up.

While running the following globals where changed:

innodb_flush_log_at_trx_commit=0
 
max_prepared_stmt_count=65535
max_connections=800
innodb_io_capacity_max=500000
innodb_io_capacity=500000
innodb_lru_scan_depth=10240

The first 5 user threads are innodb as expected:

| Id  | User        | Host               | db     | Command | Time | State                    | Info                                                                             | Progress |
+-----+-------------+--------------------+--------+---------+------+--------------------------+----------------------------------------------------------------------------------+----------+
|   2 | system user |                    | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL                                                                             |    0.000 |
|   1 | system user |                    | NULL   | Daemon  | NULL | InnoDB purge coordinator | NULL                                                                             |    0.000 |
|   3 | system user |                    | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL                                                                             |    0.000 |
|   4 | system user |                    | NULL   | Daemon  | NULL | InnoDB purge worker      | NULL                                                                             |    0.000 |
|   5 | system user |                    | NULL   | Daemon  | NULL | InnoDB shutdown handler  | NULL                                                                             |    0.000 |

On restart:

2019-07-08 16:09:09 0 [Note] sql/mysqld (mysqld 10.4.6-MariaDB) starting as process 45002 ...
2019-07-08 16:09:09 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 28% 38% 70% 80% 91% 100% (0.0 seconds); tables to flush: 2 1 0 (0.0 seconds); 
2019-07-08 16:09:09 0 [Note] mysqld: Aria engine: recovery done
2019-07-08 16:09:09 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-07-08 16:09:09 0 [Note] InnoDB: Uses event mutexes
2019-07-08 16:09:09 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2019-07-08 16:09:09 0 [Note] InnoDB: Number of pools: 1
2019-07-08 16:09:09 0 [Note] InnoDB: Using POWER8 crc32 instructions
2019-07-08 16:09:09 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2019-07-08 16:09:09 0 [Note] InnoDB: Initializing buffer pool, total size = 200G, instances = 8, chunk size = 128M
^C2019-07-08 16:09:42 0 [Note] InnoDB: Completed initialization of buffer pool
2019-07-08 16:09:42 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-07-08 16:09:42 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=443556641154
2019-07-08 16:09:42 0 [Note] InnoDB: Starting final batch to recover 2530 pages from redo log.
2019-07-08 16:09:51 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-07-08 16:09:51 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-07-08 16:09:51 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-07-08 16:09:51 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-07-08 16:09:51 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-07-08 16:09:51 0 [Note] InnoDB: Waiting for purge to start
2019-07-08 16:09:51 0 [Note] InnoDB: 10.4.6 started; log sequence number 443558379695; transaction id 174681
2019-07-08 16:09:51 0 [Note] InnoDB: Loading buffer pool(s) from /home/dan/mysqldata/ib_buffer_pool
2019-07-08 16:09:51 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-07-08 16:09:51 0 [Note] InnoDB: Buffer pool(s) load completed at 190708 16:09:51
2019-07-08 16:09:51 0 [Note] Server socket created on IP: '::'.
2019-07-08 16:09:51 0 [Note] Reading of all Master_info entries succeeded
2019-07-08 16:09:51 0 [Note] Added new Master_info '' to hash table
2019-07-08 16:09:51 0 [Note] sql/mysqld: ready for connections.
Version: '10.4.6-MariaDB'  socket: '/tmp/s.sock'  port: 3306  Source distribution

Comment by Daniel Black [ 2019-07-08 ]

Reliably reproduced by using innodb_fast_shutdown=0 - gdb bt thread apply all attach caught before 10 minute kill occured.

 
MariaDB [(none)]>  set global innodb_fast_shutdown=0;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> shutdown;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> quit

$ scripts/mysql_install_db --no-defaults --srcdir=${OLDPWD} --builddir=${PWD} --datadir=$HOME/mysqldata --verbose; sql/mysqld --no-defaults --datadir=$HOME/mysqldata --socket /tmp/s.sock --lc-messages-dir=${PWD}/sql/share --innodb-buffer-pool-size=200G --innodb-log-file-size=1G --innodb_log_files_in_group=4  --max_prepared_stmt_count=65535  --max_connections=800 --innodb_io_capacity_max=500000 --innodb_io_capacity=500000 --innodb_lru_scan_depth=10240 --verbose
mysql.user table already exists!
Run mysql_upgrade, not mysql_install_db
2019-07-08 16:15:24 0 [Note] sql/mysqld (mysqld 10.4.6-MariaDB) starting as process 45096 ...
2019-07-08 16:15:24 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-07-08 16:15:24 0 [Note] InnoDB: Uses event mutexes
2019-07-08 16:15:24 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2019-07-08 16:15:24 0 [Note] InnoDB: Number of pools: 1
2019-07-08 16:15:24 0 [Note] InnoDB: Using POWER8 crc32 instructions
2019-07-08 16:15:24 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2019-07-08 16:15:24 0 [Note] InnoDB: Initializing buffer pool, total size = 200G, instances = 8, chunk size = 128M
2019-07-08 16:15:56 0 [Note] InnoDB: Completed initialization of buffer pool
2019-07-08 16:15:56 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-07-08 16:16:01 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-07-08 16:16:01 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-07-08 16:16:01 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-07-08 16:16:01 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-07-08 16:16:01 0 [Note] InnoDB: 10.4.6 started; log sequence number 444427247572; transaction id 174681
2019-07-08 16:16:01 0 [Note] InnoDB: Loading buffer pool(s) from /home/dan/mysqldata/ib_buffer_pool
2019-07-08 16:16:01 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-07-08 16:16:01 0 [Note] Server socket created on IP: '::'.
2019-07-08 16:16:01 0 [Note] Reading of all Master_info entries succeeded
2019-07-08 16:16:01 0 [Note] Added new Master_info '' to hash table
2019-07-08 16:16:01 0 [Note] sql/mysqld: ready for connections.
Version: '10.4.6-MariaDB'  socket: '/tmp/s.sock'  port: 3306  Source distribution
2019-07-08 16:16:06 0 [Note] InnoDB: Buffer pool(s) load completed at 190708 16:16:06
2019-07-08 16:16:16 0 [Note] sql/mysqld (initiated by: dan[dan] @ localhost []): Normal shutdown
2019-07-08 16:16:16 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-07-08 16:16:16 0 [Note] InnoDB: FTS optimize thread exiting.
2019-07-08 16:16:36 0 [Warning] sql/mysqld: Thread 5 (user : '') did not exit
 
2019-07-08 16:16:36 0 [Warning] sql/mysqld: Thread 4 (user : '') did not exit
 
2019-07-08 16:16:36 0 [Warning] sql/mysqld: Thread 3 (user : '') did not exit
 
2019-07-08 16:16:36 0 [Warning] sql/mysqld: Thread 2 (user : '') did not exit
 
2019-07-08 16:16:36 0 [Warning] sql/mysqld: Thread 1 (user : '') did not exit

code base does have my github pr #1221 rebased on top of the 10.4.6 release tag however this is occurring without --large-pages enabled.

Comment by Daniel Black [ 2019-07-08 ]

Turns out the killer is oom_reaper:

[1650311.804676] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,8,global_oom,task_memcg=/,task=mysqld,pid=7807,uid=1003
[1650311.804871] Out of memory: Killed process 7807 (mysqld) total-vm:249372608kB, anon-rss:216923456kB, file-rss:384kB, shmem-rss:0kB
[1650315.267321] oom_reaper: reaped process 7807 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

$ ps -u dan au
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       4250  0.0  0.0   7424  1536 tty1     Ss+  Jun19   0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
dan        6620  0.0  0.0  10944  5824 pts/1    Ss   11:13   0:01 -bash
dan        7905  0.0  0.0  10944  4544 pts/3    Ss+  11:56   0:01 -bash
root      16692  0.0  0.0  15040  6208 hvc0     Ss   Jun19   0:00 /bin/login -p --
dan       19015  0.0  0.0  10944  5312 pts/4    Ss+  15:37   0:00 -bash
dan       45096  172 12.6 223207104 33805632 pts/0 Sl+ 16:15  47:08 sql/mysqld --no-defaults --datadir=/home/dan/mysqldata --socket /tmp/s.sock --lc-messages-dir=/home/dan/buil

The RSS value was going up by about 90k every 2 seconds.

Comment by Marko Mäkelä [ 2019-07-19 ]

danblack, thanks for the analysis!
Could you please get some perf top or similar output, to show where InnoDB is spending the CPU time and memory?

Comment by Daniel Black [ 2019-07-23 ]

I had an attempt at reproduction and its appeared to have gone way. Was using a table full of INT PK with a [LONG?]BLOB column filled to the last character.

Probably times is the threads not in lock from backtrace:

  • Thread 28 - btr_defragment_thread (sleep)
  • Thread 25 - srv_worker_thread (ut_delay)
  • Thread 24 - srv_worker_thread (ut_delay)
  • Thread 23 - srv_worker_thread (ut_delay)
  • Thread 22 - srv_monitor_thread (sleep)

Could it just be an OOM? The VSZ is 40G away from the total-vm size but if there was some expansion of LONGBLOBs across a few threads maybe that would be enough. I'll keep an eye out for it.

Comment by Marko Mäkelä [ 2019-08-16 ]

I can reliably repeat the messages with the following invocation:

./mtr --mysqld=--innodb-sync-debug innodb.undo_truncate_recover

Based on perf top it appears that the purge threads were still busy at this time. I verified this by setting a breakpoint on warn_threads_active_after_phase_1() and checking the active threads.

The test is generating quite a bit of workload. The shutdown warnings do look bogus to me, because the InnoDB shutdown clearly is in progress:

10.4 e6b505fd3cfb03567f0c0f5664f30de1b1c0f9da

Thread 25 (Thread 0x7fffb2cdd700 (LWP 12015)):
#0  MutexDebug<TTASEventMutex<GenericPolicy> >::assert_clean_context (this=0xffffffffffffffff) at /mariadb/10.4/storage/innobase/include/sync0policy.h:80
#1  0x00005555562cd3e8 in MutexDebug<TTASEventMutex<GenericPolicy> >::destroy (this=0x7fffb2cdcb80) at /mariadb/10.4/storage/innobase/include/sync0policy.h:107
#2  0x00005555562cc46c in MutexDebug<TTASEventMutex<GenericPolicy> >::enter (this=0x55555715fa88 <srv_sys+200>, mutex=..., filename=0x555556b7db60 "/mariadb/10.4/storage/innobase/srv/srv0srv.cc", line=932) at /mariadb/10.4/storage/innobase/include/sync0policy.h:127
#3  0x00005555562ca9ad in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x55555715fa78 <srv_sys+184>, n_spins=30, n_delay=4, name=0x555556b7db60 "/mariadb/10.4/storage/innobase/srv/srv0srv.cc", line=932) at /mariadb/10.4/storage/innobase/include/ib0mutex.h:593
#4  0x000055555646a9cd in srv_release_threads (type=SRV_PURGE, n=1) at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:932
#5  0x000055555646ee57 in srv_purge_wakeup () at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2816
#6  0x000055555646ef5c in srv_purge_shutdown () at /mariadb/10.4/storage/innobase/srv/srv0srv.cc:2833
#7  0x000055555629c797 in thd_destructor_proxy () at /mariadb/10.4/storage/innobase/handler/ha_innodb.cc:302
#8  0x00005555567c8ae2 in pfs_spawn_thread (arg=0x55555806aea8) at /mariadb/10.4/storage/perfschema/pfs.cc:1862
#9  0x00007ffff7615fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007ffff71cc4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

We can speed up the test by dropping the table before shutdown:

diff --git a/mysql-test/suite/innodb/t/undo_truncate_recover.test b/mysql-test/suite/innodb/t/undo_truncate_recover.test
index a143c25eaf0..64b1f6f23a1 100644
--- a/mysql-test/suite/innodb/t/undo_truncate_recover.test
+++ b/mysql-test/suite/innodb/t/undo_truncate_recover.test
@@ -35,6 +35,7 @@ update t1 set c = 'MariaDB';
 update t1 set c = 'InnoDB';
 eval set global debug_dbug = '+d,$SEARCH_PATTERN';
 commit;
+drop table t1;
 call mtr.add_suppression("InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces");
 SET GLOBAL innodb_fast_shutdown=0;
 --source include/shutdown_mysqld.inc
@@ -51,5 +52,3 @@ if ($checksum_algorithm == "strict_crc32")
   let $restart_parameters= $restart_parameters --innodb_checksum_algorithm=strict_full_crc32;
 }
 --source include/start_mysqld.inc
-
-drop table t1;

With the above tweak, the test passes a little easier for me despite the --innodb-debug-sync slowdown. (The purpose of the test is to exercise undo log truncation, and it does not matter if we speed up the undo log record processing by dropping the table upfront.)

innodb.undo_truncate_recover '16k,2,innodb,strict_crc32' [ pass ]  46517
innodb.undo_truncate_recover '2,4k,innodb,strict_crc32' [ fail ]  Found warnings/errors in server log file!

Alas, it still fails with innodb_page_size=4k. So, something is indeed wrong, and I will keep investigating. Side note: This is the only InnoDB test that fails suspiciously when using --innodb-debug-sync, and it does not fail like that on 10.3.

For the case

./mtr --manual-gdb --mysqld=--innodb-sync-debug innodb.undo_truncate_recover,4k

I am seeing some contention on srv_sys.mutex. At the time of warn_threads_active_after_phase_1() invocation, two srv_worker_thread are waiting for it inside srv_resume_thread(). Setting innodb_purge_threads=1 avoids that:

./mtr --parallel=auto --force --retry=0 --mysqld=--innodb-sync-debug innodb.undo_truncate_recover
innodb.undo_truncate_recover '16k,2,innodb,strict_full_crc32' w1 [ pass ]  36457
innodb.undo_truncate_recover '16k,2,innodb,strict_crc32' w2 [ pass ]  39821
innodb.undo_truncate_recover '2,8k,innodb,strict_crc32' w4 [ pass ]  43680
innodb.undo_truncate_recover '2,8k,innodb,strict_full_crc32' w3 [ pass ]  43010
innodb.undo_truncate_recover '2,4k,innodb,strict_crc32' w5 [ pass ]  52457
innodb.undo_truncate_recover '2,4k,innodb,strict_full_crc32' w6 [ pass ]  55820

I am not seeing any evidence of a memory leak, but it seems to be "on the egde" again if I run my tweaked test with both AddressSanitizer and innodb_debug_sync:

 
innodb.undo_truncate_recover '16k,2,innodb,strict_crc32' w3 [ pass ]  46366
innodb.undo_truncate_recover '16k,2,innodb,strict_full_crc32' w1 [ pass ]  48442
innodb.undo_truncate_recover '2,8k,innodb,strict_crc32' w4 [ pass ]  52617
innodb.undo_truncate_recover '2,4k,innodb,strict_crc32' w6 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2019-08-16 09:44:25
line
2019-08-16  9:44:21 0 [Warning] /dev/shm/10.4/sql/mysqld: Thread 2 (user : '') did not exit
2019-08-16  9:44:21 0 [Warning] /dev/shm/10.4/sql/mysqld: Thread 1 (user : '') did not exit
^ Found warnings in /dev/shm/10.4/mysql-test/var/6/log/mysqld.1.err
ok
 
 - saving '/dev/shm/10.4/mysql-test/var/6/log/innodb.undo_truncate_recover-2,4k,innodb,strict_crc32/' to '/dev/shm/10.4/mysql-test/var/log/innodb.undo_truncate_recover-2,4k,innodb,strict_crc32/'
innodb.undo_truncate_recover '2,8k,innodb,strict_full_crc32' w5 [ pass ]  51675
innodb.undo_truncate_recover '2,4k,innodb,strict_full_crc32' w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2019-08-16 09:44:38
line
2019-08-16  9:44:35 0 [Warning] /dev/shm/10.4/sql/mysqld: Thread 2 (user : '') did not exit
2019-08-16  9:44:35 0 [Warning] /dev/shm/10.4/sql/mysqld: Thread 1 (user : '') did not exit
^ Found warnings in /dev/shm/10.4/mysql-test/var/2/log/mysqld.1.err
ok
 
 - saving '/dev/shm/10.4/mysql-test/var/2/log/innodb.undo_truncate_recover-2,4k,innodb,strict_full_crc32/' to '/dev/shm/10.4/mysql-test/var/log/innodb.undo_truncate_recover-2,4k,innodb,strict_full_crc32/'
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 199.100 of 97 seconds executing testcases
 
Completed: Failed 2/6 tests, 66.67% were successful.

Without innodb_debug_sync, it is again fine, and no leaks are reported by

ASAN_OPTIONS=abort_on_error=1,disable_coredump=0 ./mtr --parallel=auto --force --retry=0 innodb.undo_truncate_recover

I will push my tweaks to the test, but will keep this ticket open for a while.
danblack, perhaps you were seeing the impact of memory fragmentation?

Comment by Marko Mäkelä [ 2019-08-16 ]

My 10.2 test tweak is now merged up to 10.4.

Comment by Marko Mäkelä [ 2019-08-16 ]

The warning messages were introduced by wlad in MDEV-18694 in MariaDB Server 10.4.4.

I believe that such warnings on shutdown are bogus and should not be issued except in debug-instrumented builds.

A shutdown with innodb_fast_shutdown=0 can last minutes or even hours. At the very least, the messages (and any abrupt killing of the server on shutdown) should be avoided when this setting is enabled.

Generated at Thu Feb 08 08:49:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.