  1. MariaDB Server
  2. MDEV-21229

SIGABRT on most simple commands when "wsrep_on=1" AND eating up *all* available memory


    Details

      Description

      Hello,
      I'm trying to package MariaDB 10.4.10 for Fedora, but I'm stuck on the following issue(s) when I test the built packages.

      TL;DR:
      When I run even the simplest SQL commands that write (e.g. "CREATE DATABASE A;", but not "SHOW DATABASES;") while replication is ON ( "wsrep_on=1" ),
      the server receives SIGABRT,
      AND
      then it consumes all available memory AND gets killed by the OOM killer (the Linux out-of-memory killer).

      Reproducible every time on Fedora 31 with the MariaDB 10.4.10 and Galera 26.4.3 Fedora packages.

      Disabling neither the firewall nor SELinux helps.

      Installed packages:

      # dnf list installed | grep -i -e maria -e mysql -e galer
      galera.x86_64                                26.4.3-1.fc31                    @@commandline             
      galera-debuginfo.x86_64                      26.4.3-1.fc31                    @@commandline             
      galera-debugsource.x86_64                    26.4.3-1.fc31                    @@commandline             
      mariadb.x86_64                               3:10.4.10-2.fc31                 @@commandline             
      mariadb-common.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-connector-c-config.noarch            3.1.5-1.fc31                     @updates                  
      mariadb-debuginfo.x86_64                     3:10.4.10-2.fc31                 @@commandline             
      mariadb-debugsource.x86_64                   3:10.4.10-2.fc31                 @@commandline             
      mariadb-errmsg.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-libs.x86_64                          3:10.4.10-2.fc31                 @@commandline             
      mariadb-server.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-server-debuginfo.x86_64              3:10.4.10-2.fc31                 @@commandline             
      mariadb-server-galera.x86_64                 3:10.4.10-2.fc31                 @@commandline             
      mysql-selinux.noarch                         1.0.0-8.fc30                     @fedora            
      

      So basically the server, client, server-galera and galera.

      Configuration:

      # /usr/libexec/mysqld --print-defaults
      /usr/libexec/mysqld would have been started with the following arguments:
      --binlog_format=ROW
      --default-storage-engine=innodb
      --innodb_autoinc_lock_mode=2
      --bind-address=0.0.0.0
      --wsrep_on=1
      --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      --wsrep_cluster_name=my_wsrep_cluster
      --wsrep_cluster_address=gcomm://
      --wsrep_slave_threads=1
      --wsrep_certify_nonPK=1
      --wsrep_max_ws_rows=0
      --wsrep_max_ws_size=2147483647
      --wsrep_debug=0
      --wsrep_convert_LOCK_to_trx=0
      --wsrep_retry_autocommit=1
      --wsrep_auto_increment_control=1
      --wsrep_drupal_282555_workaround=0
      --wsrep_causal_reads=0
      --wsrep_notify_cmd=
      --wsrep_sst_method=rsync
      --wsrep_sst_auth=root:
      --datadir=/var/lib/mysql
      --socket=/var/lib/mysql/mysql.sock
      --log-error=/var/log/mariadb/mariadb.log
      --pid-file=/run/mariadb/mariadb.pid
      

      No additional machines are needed in the cluster. The issue is reproducible on a single machine started by "galera_new_cluster".
      When the machine is part of a cluster, however, all of the nodes fail and die.

      The issue is not reproducible when the MariaDB packages are built in debug mode without optimization (-O0).

      The issue is reproducible on every run, no matter how many times the server was restarted before or if it previously ran with different configuration.

      I start the server, however, with:

      rm -rf /var/lib/mysql/* /var/log/mariadb/mariadb.log \
       && galera_new_cluster
      

      So every time I run with a clean setup. There is no data other than what the server creates during its first run.

      After the server has started, I can attach to it with a debugger, e.g. gdb.
      In the meantime, I start the mysql client and run the "CREATE DATABASE A;" SQL command.
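A minimal sketch of the reproduction steps described above, collected into one shell function. The function name `reproduce` is hypothetical, and the sketch assumes it is run as root so that the `mysql` client can log in over the socket from the configuration above without a password. It is destructive: it wipes the data directory exactly as the start command above does.

```shell
# Hypothetical helper: wipe the data directory, bootstrap a one-node
# cluster, then run one read-only and one writing statement.
reproduce() {
    rm -rf /var/lib/mysql/* /var/log/mariadb/mariadb.log \
        && galera_new_cluster \
        || return 1

    # A read-only statement; reported to work.
    mysql --socket=/var/lib/mysql/mysql.sock -e 'SHOW DATABASES;'

    # A writing statement; reported to end in SIGABRT on the server.
    mysql --socket=/var/lib/mysql/mysql.sock -e 'CREATE DATABASE A;'
}

# Usage (as root, on a disposable machine): reproduce
```

The function is only defined here, not called, so that a debugger can be attached between the bootstrap and the writing statement if the steps are run by hand instead.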

      The last breakpoint I was able to reach is "sql_parse.cc:5061".
      I haven't had much success investigating past this line.

      An unknown number of instructions later, the server receives SIGABRT.

      As part of the SIGABRT handling, the server tries to obtain a stack trace.
      While doing so, it consumes all available memory and gets killed by the OOM killer (the Linux out-of-memory killer).

      The server has 2 GB of RAM: less than 100 MB is used when the DB is not running and about 500 MB when it is running, which leaves about 1.4 GB free.
      That 1.4 GB gets consumed in the blink of an eye.
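Since the memory is consumed this quickly, sampling the server's resident set size at a short interval might help to show when the growth starts. The following is only a sketch: the helper names `rss_kb` and `watch_rss` are hypothetical, it reads the `VmRSS` field from `/proc`, and it assumes GNU `date` and `sleep` (for the nanosecond timestamp and the fractional interval).

```shell
# Print the resident set size (in kB) of the process with the given PID,
# taken from the VmRSS line of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print a timestamped RSS sample every 0.1 s until the process is gone.
watch_rss() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s kB\n' "$(date +%T.%N)" "$(rss_kb "$pid")"
        sleep 0.1
    done
}

# Usage, with the pid-file path from the configuration above:
#   watch_rss "$(cat /run/mariadb/mariadb.pid)"
```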

      The last safe breakpoint I managed to find before that is "stacktrace.c:273".
      After that, I wasn't able to find the exact place where the memory gets consumed.
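The two breakpoints mentioned above can be put into a gdb command file so that the session is easy to repeat. This is a sketch under assumptions: the file name `mdev21229.gdb` is made up, the debuginfo packages listed earlier must be installed for the source locations to resolve, and the pid-file path is the one from the configuration above.

```shell
# Write a gdb command file with the two breakpoints from the report.
cmds=./mdev21229.gdb
cat > "$cmds" <<'EOF'
set pagination off
handle SIGABRT stop print
break sql_parse.cc:5061
break stacktrace.c:273
continue
EOF

# Attach to the running server and execute the command file (as root):
#   gdb -p "$(cat /run/mariadb/mariadb.pid)" -x "$cmds"
```

With the server stopped at the second breakpoint, stepping from there while watching the memory usage in another terminal may narrow down where the allocation happens.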

      Let me know what additional information you would find helpful and I'll try to get it to you.
      Since this is the x86_64 architecture and the issue is always reproducible, it shouldn't be a problem for me to get you anything you'd like to know.

        Attachments

        1. coredumpctl
          23 kB
        2. error.log
          13 kB
        3. gdb_output
          31 kB


            People

            Assignee:
            jplindst Jan Lindström
            Reporter:
            mschorm Michal Schorm
