Details

    Description

      Running 10 threads slightly modified OLTP RW on Power8 hangs:

      sysbench 1.0:  multi-threaded system evaluation benchmark
       
      Running the test with following options:
      Number of threads: 10
      Report intermediate results every 10 second(s)
      Initializing random number generator from seed (42).
       
       
      Initializing worker threads...
       
      Threads started!
       
      [  10s] threads: 10, tps: 1514.48, reads: 7665.91, writes: 54882.19, response time: 12.08ms (95%), errors: 17.70, reconnects:  0.00
      [  20s] threads: 10, tps: 1595.89, reads: 8061.07, writes: 57769.51, response time: 11.35ms (95%), errors: 16.40, reconnects:  0.00
      [  30s] threads: 10, tps: 1623.60, reads: 8194.80, writes: 58757.63, response time: 11.28ms (95%), errors: 15.30, reconnects:  0.00
      [  40s] threads: 10, tps: 1612.90, reads: 8150.98, writes: 58388.66, response time: 11.23ms (95%), errors: 17.30, reconnects:  0.00
      [  50s] threads: 10, tps: 1630.11, reads: 8225.34, writes: 58968.35, response time: 11.13ms (95%), errors: 15.10, reconnects:  0.00
      [  60s] threads: 10, tps: 1618.00, reads: 8182.58, writes: 58589.23, response time: 11.28ms (95%), errors: 18.50, reconnects:  0.00
      [  80s] threads: 10, tps: 1433.49, reads: 7240.15, writes: 51899.38, response time: 11.13ms (95%), errors: 14.40, reconnects:  0.00
      [  90s] threads: 10, tps: 1620.61, reads: 8171.75, writes: 58636.19, response time: 11.13ms (95%), errors: 13.70, reconnects:  0.00
      [ 100s] threads: 10, tps: 1309.59, reads: 6610.07, writes: 47377.28, response time: 11.13ms (95%), errors: 12.60, reconnects:  0.00
      [ 110s] threads: 10, tps: 1370.81, reads: 6926.83, writes: 49623.40, response time: 11.11ms (95%), errors: 14.60, reconnects:  0.00
      [ 120s] threads: 10, tps: 1634.79, reads: 8248.27, writes: 59145.77, response time: 10.98ms (95%), errors: 14.70, reconnects:  0.00
      [ 130s] threads: 10, tps: 1628.70, reads: 8224.02, writes: 58949.45, response time: 11.13ms (95%), errors: 16.20, reconnects:  0.00
      [ 140s] threads: 10, tps: 1636.70, reads: 8266.78, writes: 59239.05, response time: 10.98ms (95%), errors: 16.50, reconnects:  0.00
      [ 150s] threads: 10, tps: 1637.31, reads: 8270.93, writes: 59281.33, response time: 11.03ms (95%), errors: 16.90, reconnects:  0.00
      [ 160s] threads: 10, tps: 1631.79, reads: 8239.67, writes: 59051.96, response time: 11.13ms (95%), errors: 16.30, reconnects:  0.00
      [ 170s] threads: 10, tps: 1629.71, reads: 8231.64, writes: 58985.49, response time: 11.09ms (95%), errors: 16.50, reconnects:  0.00
      [ 180s] threads: 10, tps: 1641.50, reads: 8285.48, writes: 59382.38, response time: 11.09ms (95%), errors: 15.60, reconnects:  0.00
      [ 190s] threads: 10, tps: 1629.10, reads: 8234.81, writes: 58995.58, response time: 11.15ms (95%), errors: 17.90, reconnects:  0.00
      [ 200s] threads: 10, tps: 1628.10, reads: 8213.18, writes: 58892.87, response time: 11.17ms (95%), errors: 14.60, reconnects:  0.00
      [ 210s] threads: 10, tps: 1637.01, reads: 8263.33, writes: 59227.40, response time: 11.03ms (95%), errors: 15.60, reconnects:  0.00
      [ 220s] threads: 10, tps: 1610.09, reads: 8132.07, writes: 58272.12, response time: 11.05ms (95%), errors: 16.50, reconnects:  0.00
      [ 230s] threads: 10, tps: 1492.80, reads: 7534.02, writes: 54017.07, response time: 11.07ms (95%), errors: 14.10, reconnects:  0.00
      [ 240s] threads: 10, tps: 1628.69, reads: 8220.87, writes: 58926.31, response time: 11.12ms (95%), errors: 15.30, reconnects:  0.00
      [ 250s] threads: 10, tps: 1632.60, reads: 8253.01, writes: 59127.11, response time: 11.14ms (95%), errors: 17.90, reconnects:  0.00
      [ 260s] threads: 10, tps: 1639.50, reads: 8268.29, writes: 59305.79, response time: 10.98ms (95%), errors: 14.10, reconnects:  0.00
      [ 270s] threads: 10, tps: 1617.61, reads: 8171.73, writes: 58562.89, response time: 11.19ms (95%), errors: 16.70, reconnects:  0.00
      [ 280s] threads: 10, tps: 1628.29, reads: 8238.47, writes: 58998.69, response time: 11.12ms (95%), errors: 19.60, reconnects:  0.00
      [ 290s] threads: 10, tps: 306.50, reads: 1543.90, writes: 11068.33, response time: 11.03ms (95%), errors: 2.50, reconnects:  0.00
      [ 300s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 310s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 320s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 330s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 340s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 350s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 360s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 370s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 380s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 390s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 400s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 410s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 420s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 430s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 440s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 450s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 460s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 470s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 480s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 490s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 500s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 510s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 520s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 530s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 540s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 550s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      [ 560s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
      

      Attached please see error log with monitor dump, gdb traces and script that was used to run sysbench and mysqld.

      Attachments

        1. mysqld10.err
          53 kB
          Sergey Vojtovich
        2. mysqld10.txt
          88 kB
          Sergey Vojtovich
        3. run.mysqld.p8
          4 kB
          Sergey Vojtovich

        Activity

          Assigned to Jan for now, but I'll keep analysing this issue meanwhile.

          svoj Sergey Vojtovich added a comment - Assigned to Jan for now, but I'll keep analysing this issue meanwhile.

          From mysqld10.err:

          --Thread 70334649302368 has waited at trx0undo.ic line 191 for 241.00 seconds the semaphore:
          S-lock on RW-latch at 0x3ffde1a37148 created in file buf0buf.cc line 1504
          number of readers 0, waiters flag 1, lock_word: 20000000
          Last time read locked in file trx0undo.ic line 191
          Last time write locked in file /home/mariadb/svoj/mariadb/storage/innobase/include/trx0undo.ic line 171
          

          Thread is waiting for S-lock on 0x3ffde1a37148, which is unlocked (lock_word= 20000000). There're no concurrent X-locks waiting, which likely means there is again this mess around waiters flag. Pretty much like we had with mutexes.

          Versions before 10.2 are not that badly affected, because we had this periodic wakeup.

          svoj Sergey Vojtovich added a comment - From mysqld10.err: --Thread 70334649302368 has waited at trx0undo.ic line 191 for 241.00 seconds the semaphore: S-lock on RW-latch at 0x3ffde1a37148 created in file buf0buf.cc line 1504 number of readers 0, waiters flag 1, lock_word: 20000000 Last time read locked in file trx0undo.ic line 191 Last time write locked in file /home/mariadb/svoj/mariadb/storage/innobase/include/trx0undo.ic line 171 Thread is waiting for S-lock on 0x3ffde1a37148, which is unlocked (lock_word= 20000000). There're no concurrent X-locks waiting, which likely means there is again this mess around waiters flag. Pretty much like we had with mutexes. Versions before 10.2 are not that badly affected, because we had this periodic wakeup.

          Another interesting observation:

          --Thread 70334619417952 has waited at btr0sea.ic line 128 for 235.00 seconds the semaphore:
          S-lock on RW-latch at 0x1009bdc5de0 created in file btr0sea.cc line 195
          a writer (thread id 70334619417952) has reserved it in mode  exclusive
          number of readers 0, waiters flag 1, lock_word: 0
          Last time read locked in file btr0sea.ic line 128
          Last time write locked in file /home/mariadb/source/server/storage/innobase/include/btr0sea.ic line 90
          

          Thread is attempting to acquire S-lock, while it is holding X-lock. This doesn't look like acceptable scenario.

          svoj Sergey Vojtovich added a comment - Another interesting observation: --Thread 70334619417952 has waited at btr0sea.ic line 128 for 235.00 seconds the semaphore: S-lock on RW-latch at 0x1009bdc5de0 created in file btr0sea.cc line 195 a writer (thread id 70334619417952) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0sea.ic line 128 Last time write locked in file /home/mariadb/source/server/storage/innobase/include/btr0sea.ic line 90 Thread is attempting to acquire S-lock, while it is holding X-lock. This doesn't look like acceptable scenario.

          Fixes are in bb-10.2-svoj. Will assign to Jan for review after confirmation from IBM side that the fixes work.

          svoj Sergey Vojtovich added a comment - Fixes are in bb-10.2-svoj. Will assign to Jan for review after confirmation from IBM side that the fixes work.

          jplindst, please review 5 patches for this bug at https://github.com/MariaDB/server/commits/bb-10.2-svoj

          First 4 patches are just clean-ups. Last patch actually fixes this problem.

          svoj Sergey Vojtovich added a comment - jplindst , please review 5 patches for this bug at https://github.com/MariaDB/server/commits/bb-10.2-svoj First 4 patches are just clean-ups. Last patch actually fixes this problem.

          ok to push.

          jplindst Jan Lindström (Inactive) added a comment - ok to push.

          People

            svoj Sergey Vojtovich
            svoj Sergey Vojtovich
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.