  1. MariaDB Server
  2. MDEV-21229

SIGABRT on most simple commands when "wsrep_on=1" AND eating up *all* available memory

Details

    Description

      Hello,
      I'm trying to package MariaDB 10.4.10 for Fedora, but I'm stuck on the following issue when testing the built packages.

      TL;DR:
      When I run even the simplest SQL command that writes (e.g. "CREATE DATABASE A;", but not "SHOW DATABASES;") while replication is on ("wsrep_on=1"),
      the server receives SIGABRT,
      AND
      then it consumes all available memory AND gets killed by the Linux OOM (out-of-memory) killer.

      Reproducible every time on Fedora 31 with the MariaDB 10.4.10 and Galera 26.4.3 Fedora packages.

      Disabling the firewall or SELinux does not help.

      Installed packages:

      # dnf list installed | grep -i -e maria -e mysql -e galer
      galera.x86_64                                26.4.3-1.fc31                    @@commandline             
      galera-debuginfo.x86_64                      26.4.3-1.fc31                    @@commandline             
      galera-debugsource.x86_64                    26.4.3-1.fc31                    @@commandline             
      mariadb.x86_64                               3:10.4.10-2.fc31                 @@commandline             
      mariadb-common.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-connector-c-config.noarch            3.1.5-1.fc31                     @updates                  
      mariadb-debuginfo.x86_64                     3:10.4.10-2.fc31                 @@commandline             
      mariadb-debugsource.x86_64                   3:10.4.10-2.fc31                 @@commandline             
      mariadb-errmsg.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-libs.x86_64                          3:10.4.10-2.fc31                 @@commandline             
      mariadb-server.x86_64                        3:10.4.10-2.fc31                 @@commandline             
      mariadb-server-debuginfo.x86_64              3:10.4.10-2.fc31                 @@commandline             
      mariadb-server-galera.x86_64                 3:10.4.10-2.fc31                 @@commandline             
      mysql-selinux.noarch                         1.0.0-8.fc30                     @fedora            
      

      So basically the server, client, server-galera and galera.

      Configuration:

      # /usr/libexec/mysqld --print-defaults
      /usr/libexec/mysqld would have been started with the following arguments:
      --binlog_format=ROW
      --default-storage-engine=innodb
      --innodb_autoinc_lock_mode=2
      --bind-address=0.0.0.0
      --wsrep_on=1
      --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      --wsrep_cluster_name=my_wsrep_cluster
      --wsrep_cluster_address=gcomm://
      --wsrep_slave_threads=1
      --wsrep_certify_nonPK=1
      --wsrep_max_ws_rows=0
      --wsrep_max_ws_size=2147483647
      --wsrep_debug=0
      --wsrep_convert_LOCK_to_trx=0
      --wsrep_retry_autocommit=1
      --wsrep_auto_increment_control=1
      --wsrep_drupal_282555_workaround=0
      --wsrep_causal_reads=0
      --wsrep_notify_cmd=
      --wsrep_sst_method=rsync
      --wsrep_sst_auth=root:
      --datadir=/var/lib/mysql
      --socket=/var/lib/mysql/mysql.sock
      --log-error=/var/log/mariadb/mariadb.log
      --pid-file=/run/mariadb/mariadb.pid
      

      No additional machines are needed in the cluster. The issue is reproducible on a single machine started with "galera_new_cluster".
      In a multi-node cluster, however, all of the nodes fail and die.

      The issue is not reproducible when the MariaDB packages are built in debug mode without optimization (-O0).

      The issue is reproducible on every run, regardless of how many times the server was restarted before or whether it previously ran with a different configuration.

      However, I always start the server with:

      rm -rf /var/lib/mysql/* /var/log/mariadb/mariadb.log \
       && galera_new_cluster
      

      So every run starts from a clean setup. There is no data other than what the server creates during its first run.

      After the server has started, I can attach to it with e.g. gdb.
      In the meantime, I start the mysql client and run the "CREATE DATABASE A;" SQL command.

      The last breakpoint I was able to find is "sql_parse.cc:5061".
      I haven't had much success investigating past this line.

      An unknown number of instructions later, the server receives SIGABRT.

      As part of SIGABRT handling, the server tries to produce a stack trace.
      While doing so, it consumes all available memory and gets killed by the Linux OOM killer.

      The server has 2 GB of RAM: <100 MB used when the DB is not running and ~500 MB used when it is running, leaving ~1.4 GB free.
      That 1.4 GB gets consumed in the blink of an eye.

      The last safe breakpoint I managed to find before that is "stacktrace.c:273".
      After that, I wasn't able to find the exact place where the memory gets consumed.

      Let me know what additional information you would find helpful and I'll try to get it to you.
      Since this is x86_64 and always reproducible, it shouldn't be a problem for me to get you anything you'd like to know.

      Attachments

        1. coredumpctl
          23 kB
        2. error.log
          13 kB
        3. gdb_output
          31 kB

        Activity

          mschorm Michal Schorm added a comment -

          The issue doesn't seem to affect packages you released.
          However, it seems you haven't released half of the packages for Fedora 31:
          http://mirror.lstn.net/mariadb/mariadb-10.4.10/yum/fedora31-amd64/rpms/
          e.g. server, client, ...

          mschorm Michal Schorm added a comment - - edited

          It seems the issue is not present when I use the latest git source (branch 10.4: aab6cefe8) to build the packages.
          I'll test it more thoroughly and confirm.


          dbart Daniel Bartholomew added a comment -

          There was a build issue we discovered after the release where some of the Fedora 31 packages were not getting built. We discovered it before announcing the release and so we never announced support for Fedora 31 in the release notes for 10.4.10 or 10.3.20. However, the partial set of packages for Fedora 31 were mistakenly uploaded to the mirrors. I've now removed them from the primary mirror and the rest of the mirrors will update when they next pull from the primary mirror.

          My understanding is that the build issue has now been resolved, at least from looking at the most recent builds in buildbot. For example, this log from the most recent 10.4 build shows our basic install test succeeding on our fedora-31 builder: http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-fedora31-amd64/builds/158/steps/install/logs/stdio So for the next releases of 10.4 and 10.3 we will have a working set of Fedora 31 packages.
          mschorm Michal Schorm added a comment -

          I gave it more testing and it looks like it really is resolved.

          IMHO you can mark it as solved in 10.4.11

          teemu.ollakka Teemu Ollakka added a comment -

          We found the actual issue: a `std::vector` usage in wsrep-lib that triggers an assertion in the standard library when _GLIBCXX_ASSERTIONS is defined. The fix has been merged to wsrep-lib master and I opened a PR against MariaDB 10.4 to update wsrep-lib: https://github.com/MariaDB/server/pull/1423.
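
The thread does not show the wsrep-lib code involved, so the following is only a generic sketch of this class of problem, not the actual fix. It assumes libstdc++ and uses hypothetical helper names (`buffer_start_indexed`, `buffer_start`, `first_byte`). With _GLIBCXX_ASSERTIONS defined, `std::vector::operator[]` checks its index, so an idiom such as `&buf[0]` on an empty vector aborts the process instead of silently yielding a pointer.

```cpp
// Normally supplied by the build flags (e.g. -D_GLIBCXX_ASSERTIONS); defined
// here only so the sketch is self-contained. It has to be defined before the
// first standard library header is included to take effect.
#ifndef _GLIBCXX_ASSERTIONS
#define _GLIBCXX_ASSERTIONS 1
#endif

#include <cassert>
#include <vector>

// Hypothetical helper, not taken from wsrep-lib.
// Returns a pointer to the start of the buffer via operator[].
// With _GLIBCXX_ASSERTIONS, libstdc++ verifies the index inside operator[],
// so calling this with an empty vector aborts (SIGABRT).
inline const unsigned char*
buffer_start_indexed(const std::vector<unsigned char>& buf)
{
    return &buf[0];
}

// Same intent expressed with data(), which is valid on an empty vector.
// The returned pointer must not be dereferenced when buf is empty.
inline const unsigned char*
buffer_start(const std::vector<unsigned char>& buf)
{
    return buf.data();
}

// Reads the first byte with the precondition stated explicitly instead of
// relying on the library-level check.
inline unsigned char first_byte(const std::vector<unsigned char>& buf)
{
    assert(!buf.empty());
    return buf.front();
}
```

Calling `buffer_start` on an empty vector is fine, while `buffer_start_indexed` on the same vector would abort in a build with the assertions enabled. A build without the macro performs no such check.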

          mschorm Michal Schorm added a comment -

          Just FYI:

          I found that the commit I was testing did NOT solve the issue (branch 10.4: aab6cefe8).
          (Unfortunately, I made a packaging error which happened to hide the issue. I found it only in the very last test I wanted to run before pushing to Fedora.)

          However, I can also confirm that the commit mentioned by Teemu Ollakka DOES solve the issue (9a621200899).

          Thanks for fixing.


          People

            jplindst Jan Lindström (Inactive)
            mschorm Michal Schorm