  1. MariaDB Server
  2. MDEV-14220

(draft) set global rocksdb_pause_background_work=1 freezes

Details

    Description

      Setting rocksdb_pause_background_work=1 freezes the server.

      Observed during a load test for MDEV-14047, using the script below:

      set -e
       
      ulimit -n 1024 || { echo "error: could not set open files limit ($?)" ; exit 1; }
       
      env1=${ENVIRON:-m8-system2}
      EXTRA_OPT=${EXTRA_OPT}
       
      set -u
       
      [ ! -d  mariadb-environs ] || cd mariadb-environs
       
       
      if [ ! -e common.sh ] ; then
        git clone http://github.com/AndriiNikitin/mariadb-environs
        cd mariadb-environs
      fi
       
      ./replant.sh $env1
      ./build_or_download.sh $env1
       
      trap "exit" INT TERM
      trap "read -n1 -r -p 'Test finished. Press any key to clean up processes...' key ; kill 0" EXIT
       
      if ls $env1/configure_rocksdb_plugin.sh 2>/dev/null ; then
        EXTRA_OPT="configure_rocksdb_plugin=1 $EXTRA_OPT"
      else
        EXTRA_OPT="plugin_load_add=ha_rocksdb $EXTRA_OPT"
      fi
       
      [[ $EXTRA_OPT =~ max_connections ]] || EXTRA_OPT="max_connections=250 $EXTRA_OPT"
       
      $env1/gen_cnf.sh rocksdb_flush_log_at_trx_commit=2 $EXTRA_OPT
      $env1/install_db.sh
      $env1/startup.sh
       
      $env1/sql.sh "create table tx (a int, b int, c varbinary(40), primary key(a,b,c) ) engine=RocksDB"
      $env1/sql.sh "insert into tx select 1,1,uuid()"
      $env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
      $env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
      $env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
      # should be ~1M rows without LIMIT
      $env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b LIMIT 100000"
       
       
      # just try to load plenty of data from 101 connections (one per table) in big chunks
      for i in {0..100} ; do
        $env1/sql.sh "create table t$i (a int, b int, c varbinary(40), primary key(a,b,c) ) engine=RocksDB"
      done
       
      # insert loops
      for i in {0..100} ; do
        $env1/sql_loop.sh "begin; set @N=0; insert into t$i select @N,@N:=@N+1,uuid() from tx limit 10000; commit" &
      done
       
      while :; do
        # $env1/sql.sh 'show processlist; show global status like "Com_insert%"; show global status like "Handler_commit"'
        $env1/status.sh
       
        echo .rocksdb size:
        du -sh $env1/dt/.rocksdb
        echo log file count : $(ls -la $env1/dt/.rocksdb/*.log | wc -l)
        echo sst file count : $(ls -la $env1/dt/.rocksdb/*.sst | wc -l)
       
        sleep 10
      done
      

      I executed `set global rocksdb_pause_background_work=1` in a parallel connection, and the statement hung for minutes. I then pressed Ctrl+C and ran `set global rocksdb_pause_background_work=0` in another connection. That statement also hung for some time, after which the server resumed.
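
To tell a hang from a slow statement when repeating this step, the statement can be run under a time limit. The following is only a sketch: `run_with_deadline` is a hypothetical helper name, and it relies on coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
# Hypothetical helper: run a command under a time limit and report
# whether it finished or had to be killed by `timeout` (exit status 124).
run_with_deadline() {
  local limit=$1; shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "HUNG after ${limit}s: $*"
  else
    echo "finished (exit $rc): $*"
  fi
  return "$rc"
}
```

Possible usage with the wrapper from the script above: `run_with_deadline 60 $env1/sql.sh "set global rocksdb_pause_background_work=1" || true`. Killing the client does not necessarily release the statement on the server side, so this only detects the hang; it does not recover from it.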

      The stats below, taken from the script output, show the hang. It started after "Uptime: 2413": the next sample is at "Uptime: 2553", a gap of 140 seconds instead of the usual 10 to 11:

      Uptime: 2351  Threads: 109  Questions: 36316  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.447
      .rocksdb size:
      4.4G	m8-system2/dt/.rocksdb
      log file count : 30
      sst file count : 79
      Uptime: 2361  Threads: 109  Questions: 36423  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.426
      .rocksdb size:
      4.5G	m8-system2/dt/.rocksdb
      log file count : 30
      sst file count : 80
      Uptime: 2372  Threads: 109  Questions: 36531  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.400
      .rocksdb size:
      4.5G	m8-system2/dt/.rocksdb
      log file count : 30
      sst file count : 80
      Uptime: 2382  Threads: 109  Questions: 36633  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.379
      .rocksdb size:
      4.5G	m8-system2/dt/.rocksdb
      log file count : 30
      sst file count : 80
      Uptime: 2392  Threads: 109  Questions: 36800  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.384
      .rocksdb size:
      4.6G	m8-system2/dt/.rocksdb
      log file count : 31
      sst file count : 82
      Uptime: 2402  Threads: 109  Questions: 36952  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.383
      .rocksdb size:
      4.6G	m8-system2/dt/.rocksdb
      log file count : 31
      sst file count : 82
      Uptime: 2413  Threads: 109  Questions: 37132  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.388
      .rocksdb size:
      4.6G	m8-system2/dt/.rocksdb
      log file count : 31
      sst file count : 82
      Uptime: 2553  Threads: 109  Questions: 37521  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.696
      .rocksdb size:
      4.6G	m8-system2/dt/.rocksdb
      log file count : 33
      sst file count : 78
      Uptime: 2563  Threads: 112  Questions: 37888  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.782
      .rocksdb size:
      4.7G	m8-system2/dt/.rocksdb
      log file count : 33
      sst file count : 78
      Uptime: 2574  Threads: 112  Questions: 37987  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.757
      .rocksdb size:
      4.7G	m8-system2/dt/.rocksdb
      log file count : 33
      sst file count : 78
      Uptime: 2584  Threads: 112  Questions: 38075  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.734
      .rocksdb size:
      4.7G	m8-system2/dt/.rocksdb
      log file count : 33
      sst file count : 78
      Uptime: 2594  Threads: 112  Questions: 38200  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.726
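
The hang can also be located mechanically by looking for unusually large steps between consecutive "Uptime:" samples. A minimal sketch, assuming the status output is saved to a file or piped in; `find_uptime_gaps` and its default threshold of 30 seconds are illustrative choices, not part of the original script.

```shell
# Read status output on stdin and print every pair of consecutive
# "Uptime:" samples that are more than max_gap seconds apart.
find_uptime_gaps() {
  local max_gap=${1:-30}
  awk -v max="$max_gap" '
    $1 == "Uptime:" {
      if (seen && $2 - prev > max)
        printf "gap of %d s between Uptime %d and %d\n", $2 - prev, prev, $2
      prev = $2; seen = 1
    }'
}
```

Fed the samples above with a threshold of 30, this reports a single gap of 140 s between Uptime 2413 and 2553; all other samples are 10 or 11 seconds apart.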
      

          Activity

            elenst Elena Stepanova added a comment (edited)

            Tried to reproduce, couldn't so far.
            I guess when/if it does happen, the variable setting clashes with some specific background job which it is supposed to pause. psergey, do you have any ideas?
            If not, please feel free to close as 'Incomplete'.

            psergei Sergei Petrunia added a comment

            Perhaps rocksdb_pause_background_work=1 also pauses compaction... if it does, RocksDB will eventually slow down and then freeze the write operations, because it will not let compaction fall too far behind. This doesn't explain why you were not able to reproduce, though...

            As far as I know, rocksdb_pause_background_work is not practically useful for production scenarios. Its use case is debugging/testing where one does not want race conditions with background jobs.
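
For the debugging/testing use case mentioned above, a test step can be wrapped so that background work is always resumed afterwards, even if the step fails. This is only a sketch under assumptions: `with_background_work_paused` and `SQL_RUNNER` are hypothetical names, and the runner defaults to the `sql.sh` wrapper used in the description.

```shell
# Assumption: SQL_RUNNER is a command that takes one SQL statement as its
# argument, like $env1/sql.sh in the reproduction script.
SQL_RUNNER=${SQL_RUNNER:-"${env1:-m8-system2}/sql.sh"}

# Pause RocksDB background work, run the given command, then always
# resume background work and return the command's exit status.
with_background_work_paused() {
  "$SQL_RUNNER" "set global rocksdb_pause_background_work=1"
  local rc=0
  "$@" || rc=$?
  "$SQL_RUNNER" "set global rocksdb_pause_background_work=0"
  return "$rc"
}
```

If the hypothesis above holds, the wrapped step should be short and should not run under heavy write load, since writes may stall while compaction is paused.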

            People

              psergei Sergei Petrunia
              anikitin Andrii Nikitin (Inactive)