MariaDB Server: MDEV-10314

wsrep_sync_wait does not seem to be working

Details

    Description

      We have been trying to test the critical reads enabled by the 'wsrep_sync_wait' parameter, but the results do not seem to be deterministic.

      Here is the test script:
      https://github.com/sjangra-git/galera-tests/blob/master/scripts/test.sh

      This test script is based on the following test case:
      https://github.com/MariaDB/server/blob/2783fc7d14bc8ad16acfeb509d3b19615023f47a/mysql-test/suite/galera/t/mysql-wsrep%23201.test#L5
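The check the script performs can be sketched in Python against any PEP 249 (DB-API 2.0) driver. This is a minimal sketch under stated assumptions, not the actual test.sh logic: the function name, the single-row table `t1` with integer counter column `f1`, and the choice of wsrep_sync_wait = 1 are all hypothetical. Because the servers run with autocommit OFF (as noted further down), both the write and the read transaction are ended explicitly.

```python
def critical_read_check(writer, reader, iterations=1000):
    """Hypothetical critical-read check between two Galera nodes.

    `writer` and `reader` are PEP 249 connections to node 1 and node 2.
    Table `t1` with integer column `f1` is an assumed single-row counter.
    Returns the first (val1, val2) mismatch, or None if every read on
    node 2 saw the value just committed on node 1.
    """
    wcur = writer.cursor()
    rcur = reader.cursor()
    # Value 1 enables the causality check for reads: before running the
    # SELECT, node 2 waits until it has applied every write-set the
    # cluster had committed when the statement was issued.
    rcur.execute("SET SESSION wsrep_sync_wait = 1")
    for _ in range(iterations):
        wcur.execute("UPDATE t1 SET f1 = f1 + 1")
        wcur.execute("SELECT f1 FROM t1")  # sees this transaction's own update
        (val1,) = wcur.fetchone()
        # autocommit is OFF, so nothing is committed (or replicated) until COMMIT.
        writer.commit()
        rcur.execute("SELECT f1 FROM t1")
        (val2,) = rcur.fetchone()
        # End the read transaction as well: with autocommit OFF and InnoDB's
        # default REPEATABLE READ, later SELECTs in the same transaction would
        # keep reusing its first snapshot.
        reader.commit()
        if val1 != val2:
            return val1, val2
    return None
```

Called as `critical_read_check(conn_node1, conn_node2)`, a non-None result corresponds to a "syn_wait FAILED" run below.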

      Here is the version info:

      MariaDB [(none)]> show variables like 'version%';
      +-------------------------+---------------------------------+
      | Variable_name           | Value                           |
      +-------------------------+---------------------------------+
      | version                 | 10.1.12-MariaDB                 |
      | version_comment         | MariaDB Server                  |
      | version_compile_machine | x86_64                          |
      | version_compile_os      | Linux                           |
      | version_malloc_library  | system jemalloc                 |
      | version_ssl_library     | OpenSSL 1.0.1e-fips 11 Feb 2013 |
      +-------------------------+---------------------------------+
      

      The errors are more frequent when we run the following Java test, which keeps the connection open, so the queries are issued faster:
      https://github.com/sjangra-git/galera-tests

      Auto-commit is OFF on the servers in the cluster:

      MariaDB [(none)]> show global variables like 'autocommit';
      +---------------+-------+
      | Variable_name | Value |
      +---------------+-------+
      | autocommit    | OFF   |
      +---------------+-------+
      

      Some runs of the same test failed and some passed:

      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
      val1=5735 val2=5732
      syn_wait FAILED
      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
      -bash-4.1$ ./test.sh 10.226.76.38 10.226.76.32
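Assuming val1 is the counter value read on the first node and val2 the value read on the second (the output does not say so explicitly), the size of each gap can be pulled out of the script's output with a small hypothetical helper. For the failing run above the gap is 5735 - 5732 = 3 increments.

```python
import re

# Matches the "val1=<n> val2=<n>" lines that test.sh prints.
_RESULT = re.compile(r"val1=(\d+)\s+val2=(\d+)")


def read_gaps(output):
    """Return (val1, val2, val1 - val2) for every result line in `output`."""
    gaps = []
    for m in _RESULT.finditer(output):
        val1, val2 = int(m.group(1)), int(m.group(2))
        gaps.append((val1, val2, val1 - val2))
    return gaps
```

For example, `read_gaps("val1=5735 val2=5732\nsyn_wait FAILED")` returns `[(5735, 5732, 3)]`.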
      


          Activity

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - Hi sandeep, please share your node configurations. What is the server version?
            sandeep Sandeep Jangra added a comment (edited)

            Here is my node config for my local VMs:

            MariaDB [(none)]> select @@version;
            +-----------------+
            | @@version       |
            +-----------------+
            | 10.1.15-MariaDB |
            +-----------------+

            I have attached my my.cnf file to this ticket.

            I am also running 10.1.12 with the same configuration in a different cluster.


            nirbhay_c Nirbhay Choubey (Inactive) added a comment - sandeep, I tried 50K iterations with no failures. Can you try again with the query cache (QC) turned off, just to be sure you are not hitting https://github.com/codership/mysql-wsrep/issues/201?
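Ruling out the query cache for a single test session could look like the following hypothetical helper, a sketch assuming a PEP 249 connection. `SET SESSION query_cache_type = OFF` and the `@@session`/`@@global` variable syntax are standard MariaDB; the function name is illustrative. The global value is returned too, so the caller can confirm the server-wide setting as well.

```python
def disable_query_cache(conn):
    """Turn the query cache off for this session and report the settings.

    Returns (session_value, global_value) as strings, e.g. ("OFF", "OFF").
    """
    cur = conn.cursor()
    # Session scope affects only this connection; SET GLOBAL would change
    # it server-wide.
    cur.execute("SET SESSION query_cache_type = OFF")
    cur.execute("SELECT @@session.query_cache_type, @@global.query_cache_type")
    row = cur.fetchone()
    # Some drivers return system variables as bytes; normalise to str.
    return tuple(v.decode() if isinstance(v, (bytes, bytearray)) else v for v in row)
```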

            sandeep Sandeep Jangra added a comment

            Nirbhay,

            I tried with the query cache turned OFF on both nodes in my 2-node cluster.

            MariaDB [(none)]> select @@global.query_cache_type;
            +---------------------------+
            | @@global.query_cache_type |
            +---------------------------+
            | OFF                       |
            +---------------------------+
            1 row in set (0.00 sec)

            I updated test.sh to disable the query cache on each run: https://github.com/sjangra-git/galera-tests/blob/master/scripts/test.sh#L21

            I still see errors at random values of the counter:
            [app@cb-node1 ~]$ ./test.sh 192.168.42.101 192.168.42.102
            val1=8 val2=6
            syn_wait FAILED
            [app@cb-node1 ~]$ ./test.sh 192.168.42.101 192.168.42.102
            val1=3234 val2=3232
            syn_wait FAILED
            [app@cb-node1 ~]$ ./test.sh 192.168.42.101 192.168.42.102
            val1=688 val2=686
            syn_wait FAILED
            [app@cb-node1 ~]$ ./test.sh 192.168.42.101 192.168.42.102
            val1=9852 val2=9850

            I will see if I can create a VM image of the environment and attach it to this JIRA issue, so that we can both look at the same environment.

            By the way, I saw the issue move to 'confirmed'; just curious whether it failed for you too.


            nirbhay_c Nirbhay Choubey (Inactive) added a comment - It has been confirmed. It is due to the thread pool (thread_handling=pool-of-threads).
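Since the cause is the thread pool, a test harness can check for it before running. A minimal sketch, assuming a PEP 249 connection; the helper name is hypothetical. Note that thread_handling is not a dynamic variable: moving a node back to the default, one-thread-per-connection, means changing my.cnf and restarting that node, which presumably avoids this problem.

```python
def uses_thread_pool(conn):
    """Return True if the server behind `conn` runs with the thread pool."""
    cur = conn.cursor()
    cur.execute("SELECT @@global.thread_handling")
    (value,) = cur.fetchone()
    # Some drivers return system variables as bytes; normalise to str.
    if isinstance(value, (bytes, bytearray)):
        value = value.decode()
    return value == "pool-of-threads"
```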

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              sandeep Sandeep Jangra
              Votes: 1
              Watchers: 7
