[MDEV-29352] SIGSEGV's in strlen and unknown location on optimized builds at SHUTDOWN Created: 2022-08-23  Updated: 2022-10-25  Resolved: 2022-09-21

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.0

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Nayuta Yanagisawa (Inactive)
Resolution: Fixed Votes: 0
Labels: affects-tests, memory_corruption, stack-smashing

Issue Links:
Relates
relates to MDEV-28625 Spider: SIGSEGV in _dl_lookup_symbol_... Open

 Description   

This issue may potentially be at the core of, or close to, the many 'Cannot access memory at address' stacks I have observed in the last few months.

SHOW CREATE TABLE t;
FLUSH TABLES WITH READ LOCK;
CREATE FUNCTION spider_bg_direct_sql RETURNS INT SONAME 'ha_spider.so';
SELECT * FROM t;
SHUTDOWN;

Leads to:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)

Core was generated by `/test/MD190822-mariadb-10.11.0-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000014f6a8f94e4a in ?? ()
[Current thread is 1 (LWP 710596)]
(gdb) bt
#0  0x000014f6a8f94e4a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffcf60b68b8

10.10.2 87e8463e0454a04c2bbaa38d44227c491fb07dc1 (Optimized)

Core was generated by `/test/MD200822-mariadb-10.10.2-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/multiarch/../strlen.S:120
[Current thread is 1 (Thread 0x14f409a0c800 (LWP 710789))]
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1  0x000014f40a22cce5 in __GI__dl_exception_create (exception=0x7ffe61c688a0, objname=0x4c8 <error: Cannot access memory at address 0x4c8>, errstring=errstring@entry=0x14f40a239b69 "shared object not open") at dl-exception.c:74
#2  0x000014f409d23878 in __GI__dl_signal_error (errcode=errcode@entry=0, objname=<optimized out>, occation=occation@entry=0x0, errstring=errstring@entry=0x14f40a239b69 "shared object not open") at dl-error-skeleton.c:117
#3  0x000014f40a22bf61 in _dl_close (_map=0x14f39c023590) at dl-close.c:856
#4  0x000014f409d23928 in __GI__dl_catch_exception (exception=exception@entry=0x7ffe61c688a0, operate=operate@entry=0x14f40a0e9420 <dlclose_doit>, args=args@entry=0x14f39c023590) at dl-error-skeleton.c:208
#5  0x000014f409d239f3 in __GI__dl_catch_error (objname=objname@entry=0x55753a399c50, errstring=errstring@entry=0x55753a399c58, mallocedp=mallocedp@entry=0x55753a399c48, operate=operate@entry=0x14f40a0e9420 <dlclose_doit>, args=args@entry=0x14f39c023590) at dl-error-skeleton.c:227
#6  0x000014f40a0e9b59 in _dlerror_run (operate=operate@entry=0x14f40a0e9420 <dlclose_doit>, args=0x14f39c023590) at dlerror.c:170
#7  0x000014f40a0e9468 in __dlclose (handle=<optimized out>) at dlclose.c:46
#8  0x00005575388f11e7 in udf_free () at /test/10.10_opt/sql/sql_udf.cc:302
#9  0x000055753872913a in clean_up (print_message=print_message@entry=true) at /test/10.10_opt/sql/mysqld.cc:1989
#10 0x00005575387313da in clean_up (print_message=true) at /test/10.10_opt/sql/mysqld.cc:5940
#11 mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.10_opt/sql/mysqld.cc:5940
#12 0x000014f409be7083 in __libc_start_main (main=0x5575386e55c0 <main(int, char**)>, argc=10, argv=0x7ffe61c68fc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe61c68fb8) at ../csu/libc-start.c:308
#13 0x00005575387274be in _start () at /test/10.10_opt/sql/mysqld.cc:4558

10.6.10 75c416d3627650a5b43c70a8150292990206e3e0 (Optimized)

Core was generated by `/test/MD200822-mariadb-10.6.10-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/multiarch/../strlen.S:120
[Current thread is 1 (Thread 0x147534425800 (LWP 711485))]
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1  0x0000147534c45ce5 in __GI__dl_exception_create (exception=0x7ffcd97bf6d0, objname=0x4c8 <error: Cannot access memory at address 0x4c8>, errstring=errstring@entry=0x147534c52b69 "shared object not open") at dl-exception.c:74
#2  0x000014753473c878 in __GI__dl_signal_error (errcode=errcode@entry=0, objname=<optimized out>, occation=occation@entry=0x0, errstring=errstring@entry=0x147534c52b69 "shared object not open") at dl-error-skeleton.c:117
#3  0x0000147534c44f61 in _dl_close (_map=0x1474580232e0) at dl-close.c:856
#4  0x000014753473c928 in __GI__dl_catch_exception (exception=exception@entry=0x7ffcd97bf6d0, operate=operate@entry=0x147534b02420 <dlclose_doit>, args=args@entry=0x1474580232e0) at dl-error-skeleton.c:208
#5  0x000014753473c9f3 in __GI__dl_catch_error (objname=objname@entry=0x562f1c459c50, errstring=errstring@entry=0x562f1c459c58, mallocedp=mallocedp@entry=0x562f1c459c48, operate=operate@entry=0x147534b02420 <dlclose_doit>, args=args@entry=0x1474580232e0) at dl-error-skeleton.c:227
#6  0x0000147534b02b59 in _dlerror_run (operate=operate@entry=0x147534b02420 <dlclose_doit>, args=0x1474580232e0) at dlerror.c:170
#7  0x0000147534b02468 in __dlclose (handle=<optimized out>) at dlclose.c:46
#8  0x0000562f19eac487 in udf_free () at /test/10.6_opt/sql/sql_udf.cc:302
#9  0x0000562f19d2418a in clean_up (print_message=print_message@entry=true) at /test/10.6_opt/sql/mysqld.cc:1958
#10 0x0000562f19d2c42a in clean_up (print_message=true) at /test/10.6_opt/sql/mysqld.cc:5895
#11 mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.6_opt/sql/mysqld.cc:5895
#12 0x0000147534600083 in __libc_start_main (main=0x562f19ce1a80 <main(int, char**)>, argc=10, argv=0x7ffcd97bfdf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcd97bfde8) at ../csu/libc-start.c:308
#13 0x0000562f19d2263e in _start () at /test/10.6_opt/sql/mysqld.cc:4521

Bug confirmed present in:
MariaDB: 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.2 (opt), 10.10.2 (opt), 10.11.0 (opt)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.3.37 (dbg), 10.3.37 (opt), 10.4.27 (dbg), 10.4.27 (opt), 10.5.18 (dbg), 10.5.18 (opt), 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.2 (dbg), 10.10.2 (dbg), 10.11.0 (dbg)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)

UniqueID's seen in connection with this testcase:

SIGSEGV|Backtrace stopped: Cannot access memory at address
SIGSEGV|strlen|Backtrace stopped: Cannot access memory at address
SIGSEGV|strlen|__GI__dl_exception_create|__GI__dl_signal_error|_dl_close



 Comments   
Comment by Nayuta Yanagisawa (Inactive) [ 2022-08-23 ]

After a few attempts, I've not yet succeeded in reproducing the bug. Checking...

Comment by Roel Van de Paar [ 2022-08-23 ]

nayuta-yanagisawa Ack. Both myself and Ramesh could reproduce the bug readily at the CLI command prompt. Maybe elenst has some bright MTR ideas.

Comment by Elena Stepanova [ 2022-08-23 ]

nayuta-yanagisawa the scenario in the description is apparently missing the actual CREATE TABLE t statement. I would try to create the table in order to reproduce.

Comment by Roel Van de Paar [ 2022-08-24 ]

The issue does reproduce at the CLI when a table is created, though it is not required.

CREATE TABLE t (c INT) ENGINE=InnoDB;  # Optional
SHOW CREATE TABLE t;
FLUSH TABLES WITH READ LOCK;
CREATE FUNCTION spider_bg_direct_sql RETURNS INT SONAME 'ha_spider.so';
SELECT * FROM t;
SHUTDOWN;

Comment by Nayuta Yanagisawa (Inactive) [ 2022-08-24 ]

Thank you both for your comments. I now can reproduce the bug with the following MTR test case:

CREATE TABLE t (c INT);
SHOW CREATE TABLE t;
FLUSH TABLES WITH READ LOCK;
 
--error ER_CANT_UPDATE_WITH_READLOCK
CREATE FUNCTION spider_bg_direct_sql RETURNS INT SONAME 'ha_spider.so';
SELECT * FROM t;
 
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.1.expect
--shutdown_server
--source include/wait_until_disconnected.inc
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.1.expect
--enable_reconnect
--source include/wait_until_connected_again.inc
 
DROP TABLE t;

Comment by Roel Van de Paar [ 2022-08-24 ]

Impressive testcase! Thank you

Comment by Nayuta Yanagisawa (Inactive) [ 2022-08-24 ]

I ran git-bisect with the above test case and got the following. I am still not yet sure that this is the true cause of the problem. Still checking ...

ee620a7416ac1719de322f16593fe403f5c43833 is the first bad commit
commit ee620a7416ac1719de322f16593fe403f5c43833
Merge: 558f1eff64e 3b071bad198
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date:   Thu Aug 4 16:58:42 2022 +0200
 
    Merge branch '10.5' into 10.6
 
 client/mysqldump.c                                 |  65 +++++-------
 client/mysqlimport.c                               |   8 +-
 cmake/cpack_rpm.cmake                              |   2 +-
 man/mysqldump.1                                    |  30 ++++++
 mysql-test/include/default_my.cnf                  |   1 +
 mysql-test/main/func_group.result                  |  25 +++++
 mysql-test/main/func_group.test                    |  19 ++++
 mysql-test/main/insert_select.result               |  89 ++++++++++++++++
 mysql-test/main/insert_select.test                 |  82 +++++++++++++++
 mysql-test/main/lock_view.result                   |  49 ++++-----
 mysql-test/main/mysqldump-nl.result                |  11 +-
 mysql-test/main/mysqldump-timing.result            |  71 +++++++++++++
 mysql-test/main/mysqldump-timing.test              |  26 +++++
 mysql-test/main/mysqldump.result                   | 117 ++++++++-------------
 mysql-test/main/range_notembedded.result           |   9 ++
 mysql-test/main/range_notembedded.test             |   9 ++
 .../innodb_fts/r/ft_result_cache_limit.result      |  39 +++++++
 .../suite/innodb_fts/t/ft_result_cache_limit.test  |  57 ++++++++++
 mysql-test/suite/roles/definer.result              |  47 ++++-----
 sql/item_cmpfunc.cc                                |  13 ++-
 sql/item_func.cc                                   |   2 +
 sql/opt_range.cc                                   |   5 +-
 sql/sql_delete.cc                                  |   4 +-
 sql/sql_lex.cc                                     |   7 +-
 sql/sql_select.cc                                  |  23 ++++
 sql/sql_select.h                                   |   1 +
 storage/columnstore/columnstore                    |   2 +-
 storage/innobase/fts/fts0que.cc                    |  15 +++
 28 files changed, 645 insertions(+), 183 deletions(-)
 create mode 100644 mysql-test/main/mysqldump-timing.result
 create mode 100644 mysql-test/main/mysqldump-timing.test
 create mode 100644 mysql-test/suite/innodb_fts/r/ft_result_cache_limit.result
 create mode 100644 mysql-test/suite/innodb_fts/t/ft_result_cache_limit.test

Comment by Nayuta Yanagisawa (Inactive) [ 2022-08-30 ]

The test doesn't fail on both parents of the above "first bad commit." In other words, it was only when the merge was performed that the bug became apparent.

This is only speculation, but I suspect that this is not a very recently introduced problem but rather just an accidental exposure of an old problem. I cannot say for sure. I will continue to investigate further.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-13 ]

holyfoot please review: https://github.com/MariaDB/server/commit/6b6bee9b7b8a315c700ca89f5071c22855d65abd

Comment by Alexey Botchkov [ 2022-09-15 ]

ok to push.

Generated at Thu Feb 08 10:07:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.