[MDEV-14220] (draft) set global rocksdb_pause_background_work=1 freezes Created: 2017-10-30  Updated: 2018-04-12  Resolved: 2018-04-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Affects Version/s: 10.2.9
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Andrii Nikitin (Inactive) Assignee: Sergei Petrunia
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-14047 "Too many open files" Confirmed

 Description   

Setting rocksdb_pause_background_work=1 freezes the server.

During a load test for MDEV-14047 with the script below:

set -e
 
ulimit -n 1024 || { echo "error: could not set open files limit ($?)" ; exit 1; }
 
env1=${ENVIRON:-m8-system2}
EXTRA_OPT=${EXTRA_OPT}
 
set -u
 
[ ! -d  mariadb-environs ] || cd mariadb-environs
 
 
if [ ! -e common.sh ] ; then
  git clone http://github.com/AndriiNikitin/mariadb-environs
  cd mariadb-environs
fi
 
./replant.sh $env1
./build_or_download.sh $env1
 
trap "exit" INT TERM
trap "read -n1 -r -p 'Test finished. Press any key to clean up processes...' key ; kill 0" EXIT
 
if ls $env1/configure_rocksdb_plugin.sh 2>/dev/null ; then
  EXTRA_OPT="configure_rocksdb_plugin=1 $EXTRA_OPT"
else
  EXTRA_OPT="plugin_load_add=ha_rocksdb $EXTRA_OPT"
fi
 
[[ $EXTRA_OPT =~ max_connections ]] || EXTRA_OPT="max_connections=250 $EXTRA_OPT"
 
$env1/gen_cnf.sh rocksdb_flush_log_at_trx_commit=2 $EXTRA_OPT
$env1/install_db.sh
$env1/startup.sh
 
$env1/sql.sh "create table tx (a int, b int, c varbinary(40), primary key(a,b,c) ) engine=RocksDB"
$env1/sql.sh "insert into tx select 1,1,uuid()"
$env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
$env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
$env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b, tx c"
# should be ~1M rows without LIMIT
$env1/sql.sh "insert into tx select 1,1,uuid() from tx a, tx b LIMIT 100000"
 
 
# just try to load plenty of data in big chunks from 101 parallel connections (tables t0..t100)
for i in {0..100} ; do
  $env1/sql.sh "create table t$i (a int, b int, c varbinary(40), primary key(a,b,c) ) engine=RocksDB"
done
 
# insert loops
for i in {0..100} ; do
  $env1/sql_loop.sh "begin; set @N=0; insert into t$i select @N,@N:=@N+1,uuid() from tx limit 10000; commit" &
done
 
while :; do
# $env1/sql.sh 'show processlist; show global status like "Com_insert%"; show global status like "Handler_commit"'
$env1/status.sh
 
echo .rocksdb size:
du -sh $env1/dt/.rocksdb
echo log file count : $(ls -la $env1/dt/.rocksdb/*.log | wc -l)
echo sst file count : $(ls -la $env1/dt/.rocksdb/*.sst | wc -l)
 
sleep 10
done
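This is a worked check of the `# should be ~1M rows without LIMIT` comment. When an INSERT ... SELECT reads from the table it inserts into, MariaDB first buffers the selected rows in a temporary table, so the SELECT only sees rows that existed before the statement started. Each three-way cross join therefore turns n rows into n + n^3. The shell arithmetic below steps through the seeding statements. It is only a sketch of the counting and not part of the original test.

```shell
# Row counts for table tx as the seeding statements run.
# A three-way cross join of n rows inserts n^3 rows: n -> n + n^3.
rows=1                      # after: insert into tx select 1,1,uuid()
for step in 1 2 3; do       # the three "from tx a, tx b, tx c" inserts
  rows=$(( rows + rows * rows * rows ))
  echo "after cross join $step: $rows rows"
done
# The final two-way join ("from tx a, tx b") would add rows^2 rows without LIMIT.
echo "two-way join without LIMIT would add: $(( rows * rows )) rows"
rows=$(( rows + 100000 ))   # LIMIT 100000 caps what is actually inserted
echo "final size of tx: $rows rows"
```

The output shows 2, 10 and 1010 rows after the three cross joins. The unlimited two-way join would add 1020100 rows, which explains the "~1M" comment. With LIMIT 100000, tx ends up with 101010 rows, and each t$i load then takes 10000 of them.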

I tried to execute `set global rocksdb_pause_background_work=1` in a parallel connection, and the statement just hung for minutes. I pressed Ctrl+C and ran `set global rocksdb_pause_background_work=0` in another connection. That statement also hung for a while longer, and then the server resumed.
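To detect the hang from a script instead of an interactive session, the statement can be run with a deadline. The sketch below assumes GNU coreutils `timeout`, which exits with status 124 when it had to stop the command. The helper name `run_with_deadline` is hypothetical.

```shell
# Run a command with a time limit and report when the limit was hit.
# Relies on GNU coreutils `timeout` (exit status 124 = command was stopped).
run_with_deadline() {
  local seconds=$1; shift
  local rc=0
  timeout "$seconds" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "still running after ${seconds}s, stopped: $*" >&2
  fi
  return "$rc"
}
```

Example usage: `run_with_deadline 60 $env1/sql.sh "set global rocksdb_pause_background_work=1"`. Killing the client this way is not guaranteed to abort the statement on the server. Check `show processlist` from another connection to confirm.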

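The stats below, taken from the script output, show the period of hanging, which started after "Uptime: 2413". The next sample is at "Uptime: 2553", which is 140 seconds later instead of the usual ~10.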

Uptime: 2351  Threads: 109  Questions: 36316  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.447
.rocksdb size:
4.4G	m8-system2/dt/.rocksdb
log file count : 30
sst file count : 79
Uptime: 2361  Threads: 109  Questions: 36423  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.426
.rocksdb size:
4.5G	m8-system2/dt/.rocksdb
log file count : 30
sst file count : 80
Uptime: 2372  Threads: 109  Questions: 36531  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.400
.rocksdb size:
4.5G	m8-system2/dt/.rocksdb
log file count : 30
sst file count : 80
Uptime: 2382  Threads: 109  Questions: 36633  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.379
.rocksdb size:
4.5G	m8-system2/dt/.rocksdb
log file count : 30
sst file count : 80
Uptime: 2392  Threads: 109  Questions: 36800  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.384
.rocksdb size:
4.6G	m8-system2/dt/.rocksdb
log file count : 31
sst file count : 82
Uptime: 2402  Threads: 109  Questions: 36952  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.383
.rocksdb size:
4.6G	m8-system2/dt/.rocksdb
log file count : 31
sst file count : 82
Uptime: 2413  Threads: 109  Questions: 37132  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 15.388
.rocksdb size:
4.6G	m8-system2/dt/.rocksdb
log file count : 31
sst file count : 82
Uptime: 2553  Threads: 109  Questions: 37521  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.696
.rocksdb size:
4.6G	m8-system2/dt/.rocksdb
log file count : 33
sst file count : 78
Uptime: 2563  Threads: 112  Questions: 37888  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.782
.rocksdb size:
4.7G	m8-system2/dt/.rocksdb
log file count : 33
sst file count : 78
Uptime: 2574  Threads: 112  Questions: 37987  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.757
.rocksdb size:
4.7G	m8-system2/dt/.rocksdb
log file count : 33
sst file count : 78
Uptime: 2584  Threads: 112  Questions: 38075  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.734
.rocksdb size:
4.7G	m8-system2/dt/.rocksdb
log file count : 33
sst file count : 78
Uptime: 2594  Threads: 112  Questions: 38200  Slow queries: 0  Opens: 392  Flush tables: 1  Open tables: 386  Queries per second avg: 14.726
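The hang can also be located mechanically. The monitoring loop samples roughly every 10 seconds, so a much larger jump between consecutive `Uptime:` values marks a period when the loop itself was blocked. The awk-based sketch below flags such jumps. The function name and the 30-second default threshold are arbitrary choices.

```shell
# Flag gaps between consecutive "Uptime: N" status lines that exceed a
# threshold (default 30 s), i.e. much longer than the loop's "sleep 10".
find_uptime_gaps() {
  awk -v max_gap="${1:-30}" '
    /^Uptime:/ {
      up = $2 + 0                      # seconds since server start
      if (seen && up - prev > max_gap)
        printf "gap of %d s between Uptime %d and %d\n", up - prev, prev, up
      prev = up; seen = 1
    }'
}
```

Suppose the script output was saved to a file (`run.log` here is a hypothetical name). Running `find_uptime_gaps 30 < run.log` on the stats above would report exactly one gap: 140 s between Uptime 2413 and 2553.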



 Comments   
Comment by Elena Stepanova [ 2018-01-27 ]

I tried to reproduce it but haven't managed to so far.
My guess is that when (or if) it does happen, setting the variable clashes with some specific background job that it is supposed to pause. psergey, do you have any ideas?
If not, please feel free to close as 'Incomplete'.

Comment by Sergei Petrunia [ 2018-04-12 ]

Perhaps rocksdb_pause_background_work=1 also pauses compaction. If it does, RocksDB will eventually slow down and then stop write operations, because it does not let compaction fall too far behind. This doesn't explain why you were not able to reproduce it, though.
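The throttling described above can be illustrated with a deliberately simplified model of one RocksDB stall condition. The model compares the number of level-0 SST files waiting for compaction against the column-family options `level0_slowdown_writes_trigger` and `level0_stop_writes_trigger`. The function below is not RocksDB code. Real RocksDB also stalls for other reasons, such as too many unflushed memtables or too many pending compaction bytes. The threshold values in the usage line are only examples.

```shell
# Simplified model: classify write behavior from the level-0 file count.
# Arguments: <l0_files> <slowdown_trigger> <stop_trigger>
l0_write_state() {
  local l0_files=$1 slowdown_trigger=$2 stop_trigger=$3
  if [ "$l0_files" -ge "$stop_trigger" ]; then
    echo stop          # writes wait until compaction reduces the L0 count
  elif [ "$l0_files" -ge "$slowdown_trigger" ]; then
    echo slowdown      # writes are deliberately delayed
  else
    echo normal
  fi
}
```

For example, `for n in 10 20 36; do l0_write_state "$n" 20 36; done` prints normal, slowdown and stop. If compaction is not running while flushes keep adding level-0 files, the count can only grow, so writes end up in the stop state.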

As far as I know, rocksdb_pause_background_work is not useful in production. Its use case is debugging and testing, where one wants to avoid race conditions with background jobs.
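For the debugging/testing use case, one way to keep the paused window short is to pause only around a single step and always resume afterwards. The helper below is a hypothetical sketch. It assumes a `mysql` client that can connect with its default options. Given the stall behavior discussed above, the wrapped step should avoid heavy writes.

```shell
# Hypothetical test helper: pause RocksDB background work for one command only,
# then resume it even if the command fails. Returns the command's exit status.
with_rocksdb_paused() {
  mysql -e "SET GLOBAL rocksdb_pause_background_work=1" || return 1
  local rc=0
  "$@" || rc=$?
  if ! mysql -e "SET GLOBAL rocksdb_pause_background_work=0"; then
    echo "warning: failed to resume RocksDB background work" >&2
    [ "$rc" -ne 0 ] || rc=1   # report failure even if the command succeeded
  fi
  return "$rc"
}
```

Usage: `with_rocksdb_paused ./check_step.sh`, where `check_step.sh` is a placeholder name. Resuming inside the function handles commands that fail. An `EXIT` trap would also cover the script being interrupted, at the cost of more bookkeeping.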

Generated at Thu Feb 08 08:11:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.