[MDEV-21229] SIGABRT on most simple commands when "wsrep_on=1" AND eating up *all* available memory

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.4.10
Fix Version/s: 10.4.11
Component/s: Galera, Replication, Server
Labels:
- crash
- replication
Environment:
Fedora 31; latest updates & kernel applied
X86_64

Description

Hello,
I'm trying to package MariaDB 10.4.10 for Fedora, but I'm stuck on the following issue when testing the built packages.

—

TL;DR:
When I run even the simplest SQL command that writes (e.g. "CREATE DATABASE A;", but not "SHOW DATABASES;") while replication is on ("wsrep_on=1"),
the server receives SIGABRT
AND
then consumes all available memory until it is killed by the Linux OOM (out-of-memory) killer.

Reproducible every time on Fedora 31 with the MariaDB 10.4.10 and Galera 26.4.3 Fedora packages.

Disabling neither the firewall nor SELinux helps.

—

Installed packages:

# dnf list installed | grep -i -e maria -e mysql -e galer
galera.x86_64                                26.4.3-1.fc31                    @@commandline
galera-debuginfo.x86_64                      26.4.3-1.fc31                    @@commandline
galera-debugsource.x86_64                    26.4.3-1.fc31                    @@commandline
mariadb.x86_64                               3:10.4.10-2.fc31                 @@commandline
mariadb-common.x86_64                        3:10.4.10-2.fc31                 @@commandline
mariadb-connector-c-config.noarch            3.1.5-1.fc31                     @updates
mariadb-debuginfo.x86_64                     3:10.4.10-2.fc31                 @@commandline
mariadb-debugsource.x86_64                   3:10.4.10-2.fc31                 @@commandline
mariadb-errmsg.x86_64                        3:10.4.10-2.fc31                 @@commandline
mariadb-libs.x86_64                          3:10.4.10-2.fc31                 @@commandline
mariadb-server.x86_64                        3:10.4.10-2.fc31                 @@commandline
mariadb-server-debuginfo.x86_64              3:10.4.10-2.fc31                 @@commandline
mariadb-server-galera.x86_64                 3:10.4.10-2.fc31                 @@commandline
mysql-selinux.noarch                         1.0.0-8.fc30                     @fedora

So basically the server, the client, server-galera, and galera packages.

—

Configuration:

# /usr/libexec/mysqld --print-defaults
/usr/libexec/mysqld would have been started with the following arguments:
--binlog_format=ROW
--default-storage-engine=innodb
--innodb_autoinc_lock_mode=2
--bind-address=0.0.0.0
--wsrep_on=1
--wsrep_provider=/usr/lib64/galera/libgalera_smm.so
--wsrep_cluster_name=my_wsrep_cluster
--wsrep_cluster_address=gcomm://
--wsrep_slave_threads=1
--wsrep_certify_nonPK=1
--wsrep_max_ws_rows=0
--wsrep_max_ws_size=2147483647
--wsrep_debug=0
--wsrep_convert_LOCK_to_trx=0
--wsrep_retry_autocommit=1
--wsrep_auto_increment_control=1
--wsrep_drupal_282555_workaround=0
--wsrep_causal_reads=0
--wsrep_notify_cmd=
--wsrep_sst_method=rsync
--wsrep_sst_auth=root:
--datadir=/var/lib/mysql
--socket=/var/lib/mysql/mysql.sock
--log-error=/var/log/mariadb/mariadb.log
--pid-file=/run/mariadb/mariadb.pid

No additional machines are needed in the cluster: the issue is reproducible on a single node started with "galera_new_cluster".
In a multi-node cluster, however, all of the nodes fail and die.

The issue is not reproducible when the MariaDB packages are built in debug mode without optimization (-O0).

–

The issue is reproducible on every run, no matter how many times the server was restarted before or whether it previously ran with a different configuration.

However, I always start the server with:

rm -rf /var/lib/mysql/* /var/log/mariadb/mariadb.log \
 && galera_new_cluster

So every run starts from a clean setup. There is no data other than what the server itself creates during its first start.
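
For convenience, the reproduction steps described above can be collected into one script. This is only a sketch: the DRY_RUN switch and the function names are my own additions, and it assumes the mysql client can connect as root over the default local socket. By default it only prints the commands, because the first step wipes the data directory.

```shell
# Sketch of the reproduction steps from this report.
# DRY_RUN is a hypothetical safety switch (default 1): commands are only
# printed. Set DRY_RUN=0 on a disposable test machine to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reproduce() {
    # Wipe the data directory and the error log (destructive!).
    run sh -c 'rm -rf /var/lib/mysql/* /var/log/mariadb/mariadb.log'
    # Bootstrap a single-node cluster.
    run galera_new_cluster
    # Any statement that writes is enough; SHOW DATABASES is not.
    run mysql -e 'CREATE DATABASE A;'
}

reproduce
```

With DRY_RUN=0 the last command is the one expected to make the server abort.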

—

Once the server has started, I can attach to it with e.g. gdb.
Meanwhile, I start the mysql client and run the "CREATE DATABASE A;" SQL command.

The last breakpoint I was able to find is "sql_parse.cc:5061".
I haven't had much success investigating past this line.

An unknown number of instructions later, the server receives SIGABRT.

As part of its SIGABRT handling, the server tries to obtain a stack trace.
While doing so, it consumes all available memory and gets killed by the Linux OOM (out-of-memory) killer.

The machine has 2 GB of RAM: <100 MB is used when the DB is not running and ~500 MB when it is running, leaving ~1.4 GB free.
That 1.4 GB gets consumed in the blink of an eye.
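
To observe how quickly the memory is consumed, the resident set size of the server process can be sampled from /proc. The helper below is a rough sketch (Linux only); the function names and the 0.1 s interval are arbitrary choices of mine, not something used for the numbers above.

```shell
# Print the resident set size (in kB) of a process, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Sample the RSS of a process every 0.1 s until the process is gone.
watch_rss() {
    local pid="$1"
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s kB\n' "$(date +%T)" "$(rss_kb "$pid")"
        sleep 0.1
    done
}

# Demo: RSS of the current shell.
rss_kb "$$"
```

Running something like `watch_rss "$(pidof mysqld)"` in a second terminal (assuming a single mysqld process) should show the growth until the process disappears.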

The last safe breakpoint I managed to find before that is "stacktrace.c:273".
After that, I was unable to find the exact place where the memory gets consumed.
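
For anyone who wants to stop at the same two places, the gdb invocation can be assembled as below. It is a sketch, not the exact session I used: the helper name is made up, it only prints the command line, and it relies on the debuginfo packages listed above being installed so that the file:line breakpoints resolve.

```shell
# Print a gdb command line that attaches to a running server and sets
# the two breakpoints named in this report. Nothing is executed here.
gdb_attach_cmd() {
    local pid="$1"
    printf 'gdb -p %s' "$pid"
    # printf reuses the format once per extra argument.
    printf " -ex 'break %s'" sql_parse.cc:5061 stacktrace.c:273
    printf " -ex 'continue'\n"
}

# Demo with a placeholder PID.
gdb_attach_cmd 12345
```

To actually attach, something like `eval "$(gdb_attach_cmd "$(pidof mysqld)")"` run as root should work, assuming there is exactly one mysqld process.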

—

Let me know what additional information you would find helpful and I'll try to get it to you.
Since this is x86_64 and the issue is always reproducible, I should have no problem getting you anything you'd like to know.

Attachments

- coredumpctl (23 kB, 2019-12-05 08:53)
- error.log (13 kB, 2019-12-05 08:53)
- gdb_output (31 kB, 2019-12-05 08:53)

People

Assignee: Jan Lindström (Inactive)

Reporter: Michal Schorm

Votes: 0

Watchers: 4

Dates

Created: 2019-12-05 08:54

Updated: 2019-12-09 13:39

Resolved: 2019-12-06 10:42
