[MDEV-29508] perfschema.short_option_1 fails with MSAN - Error in accept: Invalid argument Created: 2022-09-09  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-29505 LeakSanitizer errors in TABLE_LIST::f... Open

 Description   

LD_LIBRARY_PATH=<path to MSAN libs> perl mysql-test-run.pl perfschema.short_option_1

10.5 1ac0bce36e

***Warnings generated in error logs during shutdown after running tests: perfschema.short_option_1

Version: '10.5.17-MariaDB-debug-log'  socket: '/home/jenkins/10.5/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2022-09-11  1:02:33 0 [ERROR] Error in accept: Invalid argument
2022-09-11  1:02:33 0 [Note] /home/jenkins/10.5/sql/mariadbd (initiated by: unknown): Normal shutdown
2022-09-11  1:02:33 0 [Note] Event Scheduler: Purging the queue. 0 events
2022-09-11  1:02:33 0 [Note] Debug sync points hit:                   1941
2022-09-11  1:02:33 0 [Note] Debug sync points executed:              0
2022-09-11  1:02:33 0 [Note] Debug sync points max active per thread: 0
2022-09-11  1:02:33 0 [Note] /home/jenkins/10.5/sql/mariadbd: Shutdown complete
 
 
User time 0.26, System time 0.18
Maximum resident set size 351552, Integral resident set size 0
Non-physical pagefaults 112789, Physical pagefaults 0, Swaps 0
Blocks in 0 out 912, Messages in 0 out 0, Signals 0
Voluntary context switches 64, Involuntary context switches 2

My build was built with

cmake -DPLUGIN_ROCKSDB=NO -DPLUGIN_MROONGA=NO -DPLUGIN_SPHINX=NO -DPLUGIN_OQGRAPH=NO -DPLUGIN_FEDERATED=NO -DPLUGIN_FEDERATEDX=NO -DPLUGIN_SPIDER=NO -DWITH_MARIBACKUP=OFF -DPLUGIN_CONNECT=NO -DPLUGIN_EXAMPLE=NO -DPLUGIN_TOKUDB=NO -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=OFF -DWITH_EMBEDDED_SERVER=OFF -DWITH_SAFEMALLOC=OFF -DWITH_ZLIB=bundled -DWITH_SSL=bundled -DWITH_PCRE=bundled -DCMAKE_DISABLE_FIND_PACKAGE_URING=1 -DCMAKE_DISABLE_FIND_PACKAGE_LIBAIO=1 -DHAVE_LIBAIO_H=0 '-DCMAKE_C_FLAGS=-O2 -Wno-unused-command-line-argument -fdebug-macro' '-DCMAKE_CXX_FLAGS=-stdlib=libc++ -O2 -Wno-unused-command-line-argument -fdebug-macro' -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -DPLUGIN_SPIDER=NO -DWITH_MSAN=ON -DMAX_INDEXES=128 -G Ninja .

but it's also happening in the Foundation buildbot.
I only observed it on 10.5, not sure about the buildbot (since it's a post-test failure, it's not stored in the cross-reference).

The effect is deterministic at least on 10.5. It started happening after this patch

commit 122742897b47a19c85b1a5e9932ab3a8c2a4134e
Author: Sergei Golubchik
Date:   Thu Jun 23 12:59:31 2022 +0200
 
    my_safe_process: try to kill the process softly first
    
    first SIGTERM and if the process didn't die in 10 seconds, SIGKILL it.
    
    This allows various tools like `rr`, `gcov`, `gprof`, etc to flush
    their data to disk properly

I can't tell whether the change introduced a bogus failure or revealed a previously hidden problem.



 Comments   
Comment by Marko Mäkelä [ 2022-10-06 ]

This test also hangs (causes a 2-hour timeout) on Valgrind starting with 10.5. I manually sent SIGQUIT to the /usr/bin/valgrind.bin. Here is the 10.5 run:

10.5 8d6421aa48de5f1581e0706cc65c3515954bcead

innodb_fts.fulltext2 'innodb'            w33 [ pass ]  192229
***Warnings generated in error logs during shutdown after running tests: perfschema.short_option_1
2022-10-06  8:20:19 0 [ERROR] Error in accept: Invalid argument
main.mdl_sync 'innodb'                   w27 [ pass ]  10581
--------------------------------------------------------------------------
The servers were restarted 1737 times
Spent 38840.958 of 8620 seconds executing testcases

More than 1 hour had elapsed (with no output at all in the log) before faust gave me remote access and I killed the Valgrind process.

I will disable the test also on Valgrind.

Comment by Marko Mäkelä [ 2022-10-06 ]

It is interesting to note that the test does not fail on the 10.6 MSAN, but only on 10.5 MSAN.

Comment by Marko Mäkelä [ 2022-10-12 ]

It turns out that adding

--source include/not_valgrind.inc

will not cause this test to be skipped on Valgrind. It would still hang. This can be worked around by specifying --skip-test="encryption\.*|^perfschema\.short_option_1$" in the Valgrind mtr invocation.

Comment by Michael Widenius [ 2022-10-18 ]

I don't think that short_option_1 has anything to do with the problem.
It is a very short simple test and should always work, with or without valgrind.
The only special thing with this test is that it is VERY short and will require a restart of the server before and after the test, just like short_option_2.
I assume that the bug is that after innodb_fts.fulltext2, the server was not able to shut down without a timeout.

I could not repeat the above build as I don't have either clang or Ninja installed.
However with gcc 7.5.0 I cannot repeat this issue on my machine

Marko did provide the following additional information:
test/var/tmp/21/mysqld.1.sock' port: 16380 Source distribution
2022-10-12 10:29:43 0 [ERROR] Error in accept: Invalid argument
2022-10-12 10:29:43 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd (initiated by: unknown): Normal shutdown

I don't understand how the 'accept' call could give the above error.
Maybe this is the reason for the problem?

Comment by Marko Mäkelä [ 2022-10-18 ]

Here is an excerpt from a server error log where Valgrind’s memcheck tool was sent SIGQUIT to resolve a hang during the test:

10.6 8872d2ee71cc65826909f6a64413ee46d1c5218d

2022-10-12  8:54:29 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: Shutdown complete
 
CURRENT_TEST: perfschema.short_option_1
$ valgrind --tool=memcheck --show-reachable=yes --leak-check=yes --num-callers=16 --quiet --suppressions=/buildbot/amd64-ubuntu-1804-valgrind/build/mysql-test/valgrind.supp --leak-check=summary --gen-suppressions=yes --num-callers=10 /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/buildbot/amd64-ubuntu-1804-valgrind/build/mysql-test/var/21/my.cnf --log-output=file -a -Cutf8 --collation-server=utf8_bin -T12 -W2 --core-file --loose-debug-sync-timeout=300 --loose-debug-gdb --loose-skip-stack-trace --loose-wait-for-pos-timeout=1500
2022-10-12  8:54:35 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd (mysqld 10.5.18-MariaDB-valgrind-max-debug-log) starting as process 99305 ...
2022-10-12  8:54:35 0 [Warning] Could not increase number of max_open_files to more than 1012 (request: 32246)
2022-10-12  8:54:35 0 [Warning] Changed limits: max_open_files: 1012  max_connections: 151 (was 151)  table_cache: 415 (was 2000)
2022-10-12  8:54:36 0 [Note] Plugin 'BLACKHOLE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'partition' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'ARCHIVE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'FEDERATED' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'S3' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'InnoDB' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'SPHINX' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'CONNECT' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'SEQUENCE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_DATAFILES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_BUFFER_PAGE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_TRX' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMP_PER_INDEX' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_METRICS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_FOREIGN_COLS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMP' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'THREAD_POOL_WAITS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'THREAD_POOL_QUEUES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_FIELDS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_BUFFER_PAGE_LRU' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'QUERY_CACHE_INFO' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_LOCKS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMPMEM' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'THREAD_POOL_GROUPS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'QUERY_RESPONSE_TIME' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_BUFFER_POOL_STATS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_FOREIGN' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_TABLES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_COLUMNS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_VIRTUAL' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_INDEXES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'METADATA_LOCK_INFO' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'user_variables' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_SYS_SEMAPHORE_WAITS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_FT_DELETED' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_MUTEXES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'THREAD_POOL_STATS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'LOCALES' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'INNODB_LOCK_WAITS' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'unix_socket' is disabled.
2022-10-12  8:54:36 0 [Note] Plugin 'QUERY_RESPONSE_TIME_AUDIT' is disabled.
2022-10-12  8:54:36 0 [Warning] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: unknown option '--loose-pam-debug'
2022-10-12  8:54:36 0 [Warning] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: unknown option '--loose-aria'
2022-10-12  8:54:36 0 [Warning] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: unknown variable 'loose-wait-for-pos-timeout=1500'
2022-10-12  8:54:37 0 [Note] Server socket created on IP: '127.0.0.1'.
2022-10-12  8:54:37 0 [Note] Reading of all Master_info entries succeeded
2022-10-12  8:54:38 0 [Note] Added new Master_info '' to hash table
2022-10-12  8:54:38 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: ready for connections.
Version: '10.5.18-MariaDB-valgrind-max-debug-log'  socket: '/buildbot/amd64-ubuntu-1804-valgrind/build/mysql-test/var/tmp/21/mysqld.1.sock'  port: 16380  Source distribution
2022-10-12 10:29:43 0 [ERROR] Error in accept: Invalid argument
2022-10-12 10:29:43 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd (initiated by: unknown): Normal shutdown
2022-10-12 10:29:43 0 [Note] Event Scheduler: Purging the queue. 0 events
2022-10-12 10:29:43 0 [Note] Debug sync points hit:                   1178
2022-10-12 10:29:43 0 [Note] Debug sync points executed:              0
2022-10-12 10:29:43 0 [Note] Debug sync points max active per thread: 0
2022-10-12 10:29:43 0 [Note] /buildbot/amd64-ubuntu-1804-valgrind/build/sql/mariadbd: Shutdown complete
 
 
User time 12.09, System time 0.18
Maximum resident set size 331264, Integral resident set size 0
Non-physical pagefaults 40628, Physical pagefaults 0, Swaps 0
Blocks in 0 out 520, Messages in 0 out 0, Signals 0
Voluntary context switches 295, Involuntary context switches 250
CURRENT_TEST: versioning.online

Test driver log for the same:

10.6 8872d2ee71cc65826909f6a64413ee46d1c5218d

innodb_fts.fulltext 'innodb'             w34 [ pass ]  124276
innodb_fts.fulltext2 'innodb'            w12 [ pass ]  175714
***Warnings generated in error logs during shutdown after running tests: perfschema.short_option_1
2022-10-12 10:29:43 0 [ERROR] Error in accept: Invalid argument
versioning.online 'debug,innodb,timestamp' w21 [ pass ]  24807
--------------------------------------------------------------------------
The servers were restarted 1725 times
Spent 46687.321 of 6493 seconds executing testcases
Check of testcase failed for: spider/regression/e1121.direct_join_by_pkey_pkey spider.direct_update_part spider/regression/e1121.load_data_local_ddi1 spider/regression/e112122.load_data_part_replace_ddi0 spider/bugfix.spider_table_sts
Completed: All 4860 tests were successful.
Errors/warnings were found in logfiles during server shutdown after running the
following sequence(s) of tests:
    perfschema.short_option_1
1829 tests were skipped, 1515 by the test itself.
mysql-test-run: *** ERROR: There where errors/warnings in server logs after running test cases.
program finished with exit code 1
elapsedTime=6493.100494

If the memcheck process would not have been killed, a 7200-second timeout would have been reported at the end (without any test name), like this:

command timed out: 7200 seconds without output

Comment by Michael Widenius [ 2022-10-18 ]

Quick update:

  • I was able to repeat it on my home system with --repeat=10 and WITHOUT valgrind!
    It will still hang !
  • In other words, this test has nothing to do with valgrind and we can remove the disabling of valgrind here.
    Still need to fix the bug that was caused by Serg's patch.
    We should probably disable the error in 'accept' if we are under shutdown.
Comment by Michael Widenius [ 2022-10-18 ]

This issue was caused by the short_option_1-master.opt file that had the option -T12, which means (among other things) to use blocking for sockets. This was supported up to MariaDB 10.4, but not in 10.5 where we have removed the code that changes blocking sockets to non blocking in case of errors.
Fix for 10.5 will be pushed shortly

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Comment by JiraAutomate [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 10:09:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.