Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.4.13
    • Fix Version/s: N/A
    • Component/s: Galera
    • Labels: None
    • Environment: Linux 4.19.124-gentoo x86_64 AMD EPYC 7451, Intel I350 Gigabit Ethernet

    Description

      Note: the most recent affected Galera version is 1.0.4/26.4.4, but the same applies to previous versions, at least 26.4.3.

      We have a cluster of 3 nodes working with a large (above 1 TB) data volume. All the nodes have the same hardware and software. Sometimes nodes run SST and IST to transfer data.
      It was noticed that SST frequently fails and has to be restarted due to this error:

      [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
      

      We tried setting a bigger Galera gcache size, but in some cases the error happened again and in others it didn't. Moreover, sometimes a simple restart of mysqld on the receiving node (and thus a restart of SST once the donor node had returned to the synced state) led to a successful SST and the joiner managed to join the cluster, but sometimes it failed.

      • It was noticed that the gcache size and the number of transactions happening on cluster nodes have no effect on the issue.
      • Disabling or enabling compression of state transfer data, and attempts to flush logs, had no effect either.
      • It was also noticed that whenever IST failed, the same error message was always logged 20 (+/- 1) minutes after starting mysqld on the joining node (thus, 20 minutes after the state transfer request). This error was:

      2020-06-01 21:50:42 0 [Note] WSREP: IST sender 232217729 -> 232234231
      ...
      WSREP_SST: [INFO] Evaluating /usr/bin/mariabackup --innobackupex --defaults-file=/etc/mysql/my.cnf     $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> /var/lib/mysql//mariabackup.backup.log | /usr/bin/zstd --fast=3 | socat -u stdio TCP:***.*.***.*:4444; RC=( ${PIPESTATUS[@]} ) (20200601 21:50:53.977)
      2020-06-01 22:10:59 0 [ERROR] WSREP: async IST sender failed to serve tcp://***.*.***.*:4568: ist send failed: asio.system:110', asio error 'write: Connection timed out': 110 (Connection timed out)
           at galera/src/ist.cpp:send():887
      2020-06-01 22:10:59 0 [Note] WSREP: async IST sender served
      

      Whenever these last two lines (error + note) appeared in the mysqld log file, the state transfer always ended in failure, with the following errors logged:

      2020-06-02  1:49:58 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:232217728, protocol version: 5
      2020-06-02  1:49:58 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
      2020-06-02  1:49:58 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 232234231 last received: 232221423
      2020-06-02  1:49:58 2 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
           at galera/src/replicator_smm.hpp:pop_front():314. Null event.
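
The sequence numbers in the two log excerpts above show how far the IST got before it broke. The following is a minimal sketch that extracts them and counts the write sets; it assumes the `IST sender A -> B` range is inclusive and that write sets arrive in order starting from `A`. The function name `ist_progress` is hypothetical and not part of Galera.

```python
import re

# Log lines copied from the report above.
SENDER_LINE = "2020-06-01 21:50:42 0 [Note] WSREP: IST sender 232217729 -> 232234231"
RECEIVER_LINE = (
    "2020-06-02  1:49:58 0 [ERROR] WSREP: IST didn't contain all write sets, "
    "expected last: 232234231 last received: 232221423"
)


def ist_progress(sender_line, receiver_line):
    """Return (total, received, missing) write-set counts for one IST attempt.

    Assumption: the sender range is inclusive and delivery is in order,
    so everything from the first seqno up to 'last received' has arrived.
    """
    sender = re.search(r"IST sender (\d+) -> (\d+)", sender_line)
    receiver = re.search(r"expected last: (\d+) last received: (\d+)", receiver_line)
    if sender is None or receiver is None:
        raise ValueError("log lines do not match the expected patterns")

    first, last = int(sender.group(1)), int(sender.group(2))
    expected_last, last_received = int(receiver.group(1)), int(receiver.group(2))
    if expected_last != last:
        raise ValueError("sender and receiver lines describe different IST ranges")

    total = last - first + 1
    received = last_received - first + 1
    return total, received, total - received


if __name__ == "__main__":
    total, received, missing = ist_progress(SENDER_LINE, RECEIVER_LINE)
    print(f"received {received} of {total} write sets, {missing} missing")
```

Under these assumptions, only a fraction of the requested range had reached the joiner when the stream was reset.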
      

      So, the open questions are:

      1. How can such situations be avoided? Nodes require a manual restart after failed transfers!
      2. Why is this asio error always logged 20 minutes after the state transfer starts?
      3. The reported failure is 'Connection timed out', while the connection is stable and no service or monitoring tool reports connection issues.
      4. The issue is intermittent: on some restarts it appears and on others it doesn't. This was also the case with the previous version of the galera library. No configuration change seems to cause or solve it.
      5. It was also noted that the asio library used by galera is 1.10.8 and this version can't be changed, although version 1.18 is already out.
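
Regarding item 2, the interval can be checked directly from the donor log timestamps quoted earlier in this description. This is a small sketch, not a diagnosis: it only measures the time between the `IST sender` note and the `async IST sender failed` error, and the helper name `elapsed_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout used by the mysqld error log lines in this report.
FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the donor log excerpt in the description.
IST_SENDER_START = "2020-06-01 21:50:42"
IST_SENDER_FAILED = "2020-06-01 22:10:59"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    seconds = elapsed_seconds(IST_SENDER_START, IST_SENDER_FAILED)
    minutes, rest = divmod(int(seconds), 60)
    print(f"IST sender failed after {minutes} min {rest} s")
```

For the excerpt above this gives 20 minutes 17 seconds, consistent with the "20 (+/- 1) minutes" observation.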

          Activity

            janlindstrom Jan Lindström added a comment -

            In Galera library version 26.4.15, asio was updated to 1.14.1; maybe this can be tested with a more recent version of MariaDB server. Does the issue still reproduce?
            euglorg Eugene added a comment -

            Just in case: since we upgraded to mariadb-10.6 (and thus asio-1.26), we haven't seen this issue for at least half a year.

            jumpojoy Vasyl Saienko added a comment -

            Hello Eugene,

            We tried a more recent MariaDB version and still see the same issue when a node is shut down abnormally with kill -9.

            mysql@mariadb-server-0:/$ dpkg -l |grep galera
            ii  galera-4                   26.4.14-ubu2004                   amd64        Replication framework for transactional applications
            mysql@mariadb-server-0:/$ dpkg -l |grep maria
            ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
            ii  libmariadb3:amd64          1:10.6.14+maria~ubu2004           amd64        MariaDB database client library
            ii  mariadb-backup             1:10.6.14+maria~ubu2004           amd64        Backup tool for MariaDB server
            ii  mariadb-client-10.6        1:10.6.14+maria~ubu2004           amd64        MariaDB database client binaries
            ii  mariadb-client-core-10.6   1:10.6.14+maria~ubu2004           amd64        MariaDB database core client binaries
            ii  mariadb-common             1:10.6.14+maria~ubu2004           all          MariaDB common configuration files
            ii  mariadb-server             1:10.6.14+maria~ubu2004           all          MariaDB database server (metapackage depending on the latest version)
            ii  mariadb-server-10.6        1:10.6.14+maria~ubu2004           amd64        MariaDB database server binaries
            ii  mariadb-server-core-10.6   1:10.6.14+maria~ubu2004           amd64        MariaDB database core server files
            ii  mysql-common               1:10.6.14+maria~ubu2004           all          MariaDB database common files (e.g. /etc/mysql/my.cnf)
            

            jumpojoy Vasyl Saienko added a comment -

            Oh, there is 26.4.15. Let me try it and get back with feedback.

            jumpojoy Vasyl Saienko added a comment -

            I confirm that the issue is fixed with galera 26.4.15. Do you know when MariaDB with this galera version will be released officially?
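
For anyone checking their own nodes, here is a rough sketch that reads the galera-4 version out of a `dpkg -l` line like the one posted earlier in this thread and compares it with 26.4.15, the version reported in this thread as working. The helper `galera_version` and the constant `REPORTED_FIXED_IN` are hypothetical names, and the parsing assumes the standard `dpkg -l` column order (status, name, version, architecture, description).

```python
import re

# Version reported in this thread as no longer showing the issue.
REPORTED_FIXED_IN = (26, 4, 15)

# Line copied from the `dpkg -l | grep galera` output posted above.
DPKG_LINE = (
    "ii  galera-4                   26.4.14-ubu2004                   "
    "amd64        Replication framework for transactional applications"
)


def galera_version(dpkg_line):
    """Extract (major, minor, patch) from the version column of a dpkg -l line."""
    fields = dpkg_line.split()
    if len(fields) < 3:
        raise ValueError("not a dpkg -l package line")
    # The version column may carry an epoch prefix ("1:") and a distro suffix.
    match = re.match(r"(?:\d+:)?(\d+)\.(\d+)\.(\d+)", fields[2])
    if match is None:
        raise ValueError(f"unrecognised version string: {fields[2]}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    version = galera_version(DPKG_LINE)
    if version < REPORTED_FIXED_IN:
        print("galera-4", version, "is older than 26.4.15")
    else:
        print("galera-4", version, "is 26.4.15 or newer")
```

Tuples compare element by element, so `(26, 4, 14) < (26, 4, 15)` holds without any string handling.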


            People

              Assignee: teemu.ollakka Teemu Ollakka
              Reporter: euglorg Eugene
              Votes: 1
              Watchers: 9
