[MDEV-15173] MyRocks: basic performance checks Created: 2018-02-01  Updated: 2018-03-15  Resolved: 2018-03-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Fix Version/s: 10.2.13, 10.3.5

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Attachments: File run1-results.ods     File run2-results.ods     PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png     PNG File screenshot-4.png     PNG File screenshot-5.png     File sysbench-oltp-rw-simple-run1.tgz    
Issue Links:
Relates
relates to MDEV-15372 Parallel slave speedup very limited w... Closed

 Description   

Run some pre-release performance checks to make sure the MyRocks build is adequate.



 Comments   
Comment by Sergei Petrunia [ 2018-02-01 ]

Run1: default configuration, binlog=ON, 1M row table, oltp_simple_rw, different # of client threads:

Performance numbers look more-or-less similar (that is, there is no indication that something is broken)

Comment by Sergei Petrunia [ 2018-02-05 ]

Results from a replication test
(the test setup looks odd, and it was; this was only a first attempt)

  • MariaDB 10.2 ubuntu-xenial package
  • single-threaded slave
  • the master is run for two minutes with --threads=1,10,20,40
  • then we wait for the slave to catch up.

Columns

threads, master_pos_wait time (sec)

InnoDB, gtid_slave_pos is InnoDB
1,  0.074385
10, 359.942474
20, 613.465163
40, 757.761746

InnoDB, gtid_slave_pos is MyRocks
1, 305.385057

MyRocks, gtid_slave_pos is InnoDB
1, 320.517346

MyRocks, gtid_slave_pos is MyRocks
1, 0.011072
10, 531.862626
20, 938.513108
40, 1197.352052

Comment by Sergei Petrunia [ 2018-02-05 ]


Take-aways:

  • single-threaded slave performance depends a lot on whether mysql.gtid_slave_pos uses the same storage engine as the data
  • Taking that into account, slave performance is within the same order of magnitude
    (NOTE: the difference is greater than the chart suggests, because the master ran for a fixed 2 minutes and, according to the previous test, InnoDB is faster, so it generated more events)
Comment by Sergei Petrunia [ 2018-02-11 ]

Parallel slave test.

  • On a master (t2.2xlarge) run oltp_write_only workload with --threads=32
  • Then, measure the time it will take the slave (t2.xlarge) to catch up.

Slave settings:

  • slave_parallel_workers=N
  • slave_parallel_mode=conservative

MariaDB 10.2, RocksDB

Sysbench run as workload generator ( oltp_write_only.lua --time=30 --table-size=1000000 --threads=32)

    transactions:                        220 570 (7350.21 per sec.)
    queries:                             1 330 661 (44342.59 per sec.)

slave_parallel_workers, catch_up_time 
1 ,543.821963
10,492.044147
20,480.922534
40,479.111308

MariaDB 10.2, InnoDB

Sysbench run as workload generator ( oltp_write_only.lua --time=30 --table-size=1000000 --threads=32)

    transactions:                        152 287 (5073.76 per sec.)
    queries:                             913 907 (30448.74 per sec.)

slave_parallel_workers, catch_up_time 
1,  276.695094
10, 100.695637
20, 68.44454
40, 57.729703

Comment by Sergei Petrunia [ 2018-02-12 ]

So, with InnoDB, the slave can get to within about 2x of the master's speed. This is actually good, because the master has more CPU power (t2.2xlarge; /proc/cpuinfo reports 8 CPUs) than the slave (t2.large; cpuinfo shows 2 CPUs).

With MyRocks, the catch-up time is much larger. The difference is about 2x for the single-threaded slave (note: this is partially explained by the fact that the MyRocks master did more transactions, 220K/152K = 1.44x).

The issue is that for MyRocks, the speedup of the parallel slave relative to the single-threaded slave is much lower than for InnoDB.
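This can be quantified with a small calculation over the catch-up times measured above (conservative mode); the numbers are copied from the tables in the earlier comment:

```python
# Parallel-slave speedup vs. the single-threaded slave, computed from
# the catch-up times measured above (slave_parallel_mode=conservative).
myrocks = {1: 543.821963, 10: 492.044147, 20: 480.922534, 40: 479.111308}
innodb  = {1: 276.695094, 10: 100.695637, 20: 68.44454,  40: 57.729703}

for name, times in (("MyRocks", myrocks), ("InnoDB", innodb)):
    for workers, t in times.items():
        # speedup = catch-up time with 1 worker / catch-up time with N workers
        print(f"{name:7s} workers={workers:2d} "
              f"catch-up={t:7.1f}s speedup={times[1] / t:.2f}x")
```

At 40 workers, InnoDB applies the load roughly 4.8x faster than the single-threaded slave, while MyRocks gains only about 1.1x.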

Comment by Sergei Petrunia [ 2018-02-12 ]

Didn't note the binlog group commit counters on the previous run. Re-ran the load step and got this:

InnoDB

sysbench: transactions: 151869 (5059.10 per sec.)

Binlog_commits 151869
Binlog_group_commits 98083  (1.54 commits per group)

MyRocks

sysbench: transactions: 214206 (7138.58 per sec.)

Binlog_commits 216,944
Binlog_group_commits 44,895  (4.8 commits per group)

So the master part seems to be ok.
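As a sanity check, the per-group ratios follow directly from the counters quoted above (a quick sketch over the same numbers):

```python
# Commits-per-group = Binlog_commits / Binlog_group_commits,
# using the counters from the re-run above.
innodb_commits, innodb_groups   = 151869, 98083
myrocks_commits, myrocks_groups = 216944, 44895

innodb_ratio  = innodb_commits / innodb_groups
myrocks_ratio = myrocks_commits / myrocks_groups
print(f"InnoDB:  {innodb_ratio:.2f} commits/group")   # ≈1.55
print(f"MyRocks: {myrocks_ratio:.2f} commits/group")  # ≈4.83
```

So on the master, MyRocks groups roughly three times as many commits per binlog group commit as InnoDB does.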

Comment by Sergei Petrunia [ 2018-02-12 ]

MariaDB 10.2, slave_parallel_mode=aggressive

RocksDB

slave_parallel_workers, catch_up_time 
1 3816.330531
10 6605.029636
20 6602.966884
40 1227.60888

InnoDB

slave_parallel_workers, catch_up_time 
1  284.314864
10 106.254725
20   79.08769
40   59.92061

Comment by Sergei Petrunia [ 2018-02-12 ]

Percona 5.7 bintar

slave_parallel_type=logical_clock

InnoDB

sysbench ... --time=30 /usr/share/sysbench/oltp_write_only.lua  --table-size=1000000 --mysql_storage_engine=innodb --threads=32 run
    transactions:                        114771 (3822.82 per sec.)

slave_parallel_workers time_to_catchup
1  205.707733
10 48.771367
20 35.452498
40 33.482409

MyRocks

sysbench ... --time=30 /usr/share/sysbench/oltp_write_only.lua --table-size=1000000 --mysql_storage_engine=RocksDB --threads=32 run
    transactions:                        144864 (4826.62 per sec.)

slave_parallel_workers time_to_catchup
1 328.305218
10 70.682672
20 40.449479
40 34.089818
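Unlike the MariaDB conservative-mode run, here both engines scale with more workers; applying the same speedup arithmetic to the time_to_catchup tables above:

```python
# Parallel-applier scaling on Percona 5.7 (slave_parallel_type=logical_clock),
# from the time_to_catchup tables above.
innodb  = {1: 205.707733, 10: 48.771367, 20: 35.452498, 40: 33.482409}
myrocks = {1: 328.305218, 10: 70.682672, 20: 40.449479, 40: 34.089818}

for name, t in (("InnoDB", innodb), ("MyRocks", myrocks)):
    print(f"{name}: {t[1] / t[40]:.1f}x speedup at 40 workers")
```

MyRocks actually gains more from parallelism here (~9.6x vs ~6.1x), ending up at roughly the same absolute catch-up time as InnoDB.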

Comment by Sergei Petrunia [ 2018-02-12 ]

Attempt to debug the slave while it is catching up with
slave_parallel_workers=20,slave_parallel_mode=conservative:
https://gist.github.com/spetrunia/3f77615a9e33df51ffe54a1d07fc1fef
This is current MariaDB 10.2, debug build.

MariaDB binary from the original 10.2 package:
https://gist.github.com/spetrunia/acbc36c50e8c198fbc0b87e6ecf82fa1

Comment by Sergei Petrunia [ 2018-02-12 ]

Both results have this as the second most common trace (the first being InnoDB's AIO sleep):

207 at,safe_cond_wait,inline_mysql_cond_wait,wait_for_commit::wait_for_prior_commit2,
        wait_for_commit::wait_for_prior_commit,THD::wait_for_prior_commit,ha_commit_one_phase,
        ha_commit_trans,trans_commit,Xid_log_event::do_apply_event,Log_event::apply_event,
        apply_event_and_update_pos_apply,apply_event_and_update_pos_for_parallel,
        rpt_handle_event,handle_rpl_parallel_thread,start_thread,clone

Comment by Sergei Petrunia [ 2018-02-12 ]

Have been playing with gdb attached to the slave and caught this:

[Switching to thread 45 (Thread 0x7f7088178700 (LWP 16939))]
#0  0x00007f709700681d in fdatasync () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) where
#0  0x00007f709700681d in fdatasync () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f708915c1c2 in rocksdb::PosixWritableFile::Sync (this=0x7f706c15be90) at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/env/io_posix.cc:844
#2  0x00007f70890beb55 in rocksdb::WritableFileWriter::SyncInternal (this=0x7f706c15c070, use_fsync=false)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/util/file_reader_writer.cc:350
#3  0x00007f70890be924 in rocksdb::WritableFileWriter::SyncWithoutFlush (this=0x7f706c15c070, use_fsync=false)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/util/file_reader_writer.cc:338
#4  0x00007f7088e6afde in rocksdb::DBImpl::SyncWAL (this=0x557f60ae59f0) at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/db/db_impl.cc:683
#5  0x00007f7088e6aa8d in rocksdb::DBImpl::FlushWAL (this=0x557f60ae59f0, sync=true) at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/db/db_impl.cc:638
#6  0x00007f7088ef7a08 in rocksdb::DBImpl::WriteImpl (this=0x557f60ae59f0, write_options=..., my_batch=0x7f703c01a070, callback=0x0, log_used=0x0, log_ref=0, 
    disable_memtable=false, seq_used=0x0) at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/db/db_impl_write.cc:304
#7  0x00007f7088ef6718 in rocksdb::DBImpl::Write (this=0x557f60ae59f0, write_options=..., my_batch=0x7f703c01a070)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/db/db_impl_write.cc:49
#8  0x00007f7089213ccc in rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal (this=0x7f703c019e50)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:304
#9  0x00007f708921382a in rocksdb::PessimisticTransaction::Commit (this=0x7f703c019e50)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:264
#10 0x00007f7088dc08fc in myrocks::Rdb_transaction_impl::commit_no_binlog (this=0x7f703c019cb0) at /home/ubuntu/mariadb-10.2/storage/rocksdb/ha_rocksdb.cc:2283
#11 0x00007f7088dbfc24 in myrocks::Rdb_transaction::commit (this=0x7f703c019cb0) at /home/ubuntu/mariadb-10.2/storage/rocksdb/ha_rocksdb.cc:2014
#12 0x00007f7088d93d03 in myrocks::rocksdb_commit (hton=0x557f608ac920, thd=0x7f703c008eb0, commit_tx=true)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/ha_rocksdb.cc:3157
#13 0x0000557f5d4e207e in commit_one_phase_2 (thd=0x7f703c008eb0, all=true, trans=0x7f703c00c160, is_real_trans=true)
    at /home/ubuntu/mariadb-10.2/sql/handler.cc:1567
#14 0x0000557f5d4e1f81 in ha_commit_one_phase (thd=0x7f703c008eb0, all=true) at /home/ubuntu/mariadb-10.2/sql/handler.cc:1548
#15 0x0000557f5d4e184b in ha_commit_trans (thd=0x7f703c008eb0, all=true) at /home/ubuntu/mariadb-10.2/sql/handler.cc:1415
#16 0x0000557f5d3cbf5f in trans_commit (thd=0x7f703c008eb0) at /home/ubuntu/mariadb-10.2/sql/transaction.cc:307
#17 0x0000557f5d60b059 in Xid_log_event::do_apply_event (this=0x7f7092035860, rgi=0x7f7092081750) at /home/ubuntu/mariadb-10.2/sql/log_event.cc:8447
#18 0x0000557f5d1cbc23 in Log_event::apply_event (this=0x7f7092035860, rgi=0x7f7092081750) at /home/ubuntu/mariadb-10.2/sql/log_event.h:1452
#19 0x0000557f5d1c0bad in apply_event_and_update_pos_apply (ev=0x7f7092035860, thd=0x7f703c008eb0, rgi=0x7f7092081750, reason=0)
    at /home/ubuntu/mariadb-10.2/sql/slave.cc:3570
#20 0x0000557f5d1c11b0 in apply_event_and_update_pos_for_parallel (ev=0x7f7092035860, thd=0x7f703c008eb0, rgi=0x7f7092081750)
    at /home/ubuntu/mariadb-10.2/sql/slave.cc:3745
#21 0x0000557f5d425a24 in rpt_handle_event (qev=0x7f70905a57f0, rpt=0x7f709000b110) at /home/ubuntu/mariadb-10.2/sql/rpl_parallel.cc:50
#22 0x0000557f5d42886d in handle_rpl_parallel_thread (arg=0x7f709000b110) at /home/ubuntu/mariadb-10.2/sql/rpl_parallel.cc:1265
#23 0x00007f7097b7b6ba in start_thread (arg=0x7f7088178700) at pthread_create.c:333
#24 0x00007f709701041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

A lot of other threads are in wait_for_commit::wait_for_prior_commit call.

As for this thread:

#9  0x00007f708921382a in rocksdb::PessimisticTransaction::Commit (this=0x7f703c019e50)
    at /home/ubuntu/mariadb-10.2/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:264
264           s = CommitWithoutPrepareInternal();
(gdb) p commit_without_prepare
$6 = true
(gdb) p commit_prepared
$7 = false
 
(gdb) up
#10 0x00007f7088dc08fc in myrocks::Rdb_transaction_impl::commit_no_binlog (this=0x7f703c019cb0) at /home/ubuntu/mariadb-10.2/storage/rocksdb/ha_rocksdb.cc:2283
2283        const rocksdb::Status s = m_rocksdb_tx->Commit();
(gdb) p m_is_two_phase
$8 = true

  • Is it intentional that we have not done a prepare call?
  • If we are doing a commit for a replication slave transaction, why are we doing a disk sync?
Comment by Sergei Petrunia [ 2018-02-12 ]

Attempted to check how the upstream resolves this, but found that they are based on MySQL 5.6 and so do not support slave_parallel_type=logical_clock yet.

Stack traces from Percona Server:
https://gist.github.com/spetrunia/ac453401977fa620a423568e7ab5f619

There are multiple threads with a stack trace like this:

Thread 62 (Thread 0x7f35207c6700 (LWP 3540)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f35667f691c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f353130d72b in std::condition_variable::wait<rocksdb::WriteThread::BlockingAwaitState(rocksdb::WriteThread::Writer*, uint8_t)::<lambda()> > (__p=..., __lock=..., this=0x7f35207c49c0) at /usr/include/c++/4.9/condition_variable:98
#3  rocksdb::WriteThread::BlockingAwaitState (this=<optimized out>, w=0x7f35207c4930, goal_mask=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/write_thread.cc:44
#4  0x00007f353130d7ff in rocksdb::WriteThread::AwaitState (this=this@entry=0x34990b8, w=w@entry=0x7f35207c4930, goal_mask=goal_mask@entry=30 '\036', ctx=ctx@entry=0x7f35317f37f0 <rocksdb::jbg_ctx>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/write_thread.cc:176
#5  0x00007f353130dc65 in rocksdb::WriteThread::JoinBatchGroup (this=this@entry=0x34990b8, w=w@entry=0x7f35207c4930) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/write_thread.cc:318
#6  0x00007f353125f0d2 in rocksdb::DBImpl::WriteImpl (this=0x34989d0, write_options=..., my_batch=<optimized out>, callback=callback@entry=0x0, log_used=log_used@entry=0x0, log_ref=0, disable_memtable=false, seq_used=0x0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/db_impl_write.cc:104
#7  0x00007f35312609cd in rocksdb::DBImpl::Write (this=<optimized out>, write_options=..., my_batch=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/db_impl_write.cc:49
#8  0x00007f353149061a in rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal (this=0x7f34e1cefc30) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:304
#9  0x00007f3531493c13 in rocksdb::PessimisticTransaction::Commit (this=0x7f34e1cefc30) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:264
#10 0x00007f35311b71cb in myrocks::Rdb_transaction_impl::commit_no_binlog (this=0x7f34e270eb30) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/ha_rocksdb.cc:2147
#11 0x00007f3531199f76 in myrocks::Rdb_transaction::commit (this=0x7f34e270eb30) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/ha_rocksdb.cc:1880
#12 myrocks::rocksdb_commit (hton=<optimized out>, thd=0x7f34e0a54c80, commit_tx=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/ha_rocksdb.cc:2879
#13 0x00000000007f35f7 in ha_commit_low (thd=thd@entry=0x7f34e0a54c80, all=all@entry=true, run_after_commit=run_after_commit@entry=true) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:1933
#14 0x0000000000e26f99 in MYSQL_BIN_LOG::commit (this=0x1d42780 <mysql_bin_log>, thd=0x7f34e0a54c80, all=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:8903
#15 0x00000000007f4032 in ha_commit_trans (thd=thd@entry=0x7f34e0a54c80, all=all@entry=true, ignore_global_read_lock=ignore_global_read_lock@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:1829
#16 0x0000000000d0ce19 in trans_commit (thd=thd@entry=0x7f34e0a54c80) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/transaction.cc:239
#17 0x0000000000df28eb in Xid_log_event::do_commit (this=0x7f35288e97e0, thd_arg=0x7f34e0a54c80) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7164
#18 0x0000000000df29f2 in Xid_apply_log_event::do_apply_event_worker (this=0x7f35288e9828, w=0x7f3529d1c400) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7259
#19 0x0000000000e5edb5 in slave_worker_exec_job_group (worker=worker@entry=0x7f3529d1c400, rli=rli@entry=0x35ae780) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_rli_pdb.cc:2671
#20 0x0000000000e40cc3 in handle_slave_worker (arg=arg@entry=0x7f3529d1c400) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_slave.cc:6231
#21 0x0000000000ead034 in pfs_spawn_thread (arg=0x7f352825c7e0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/perfschema/pfs.cc:2190
#22 0x00007f3567de46ba in start_thread (arg=0x7f35207c6700) at pthread_create.c:333
#23 0x00007f3565f6141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

  • They are relying on MyRocks' internal group commit.
    • It is then unclear if/how they make sure the commit order on the slave is the same as on the master (because of GTIDs? A: no, because in my tests gtid_mode=OFF).
  • Note frame #8: rocksdb::WriteCommittedTxn::CommitWithoutPrepareInternal
Comment by Sergei Petrunia [ 2018-02-12 ]

If one runs Percona Server with log-slave-updates, the stack traces look different:

https://gist.github.com/spetrunia/634e67b743c3783aa4d102b1ebb870ae

There are a lot of threads waiting to commit:

Thread 45 (Thread 0x7fcd3c336700 (LWP 3743)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000e1610a in native_cond_wait (mutex=0x1d438f8 <mysql_bin_log+4472>, cond=0x1d438c0 <mysql_bin_log+4416>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/include/thr_cond.h:140
#2  my_cond_wait (mp=0x1d438f8 <mysql_bin_log+4472>, cond=0x1d438c0 <mysql_bin_log+4416>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/include/thr_cond.h:195
#3  inline_mysql_cond_wait (src_file=0x14a4140 "/mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc", src_line=2191, mutex=<optimized out>, that=0x1d438c0 <mysql_bin_log+4416>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/include/mysql/psi/mysql_thread.h:1184
#4  Stage_manager::enroll_for (this=this@entry=0x1d43740 <mysql_bin_log+4032>, stage=<optimized out>, thd=0x7fcd4c00a310, stage_mutex=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:2191
#5  0x0000000000e1b6a5 in MYSQL_BIN_LOG::change_stage (this=0x1d42780 <mysql_bin_log>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x1d42788 <mysql_bin_log+8>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:9216
#6  0x0000000000e25a22 in MYSQL_BIN_LOG::ordered_commit (this=this@entry=0x1d42780 <mysql_bin_log>, thd=thd@entry=0x7fcd4c00a310, all=all@entry=true, skip_commit=skip_commit@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:9630
#7  0x0000000000e27460 in MYSQL_BIN_LOG::commit (this=0x1d42780 <mysql_bin_log>, thd=0x7fcd4c00a310, all=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:8883
#8  0x00000000007f4032 in ha_commit_trans (thd=thd@entry=0x7fcd4c00a310, all=all@entry=true, ignore_global_read_lock=ignore_global_read_lock@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:1829
#9  0x0000000000d0ce19 in trans_commit (thd=thd@entry=0x7fcd4c00a310) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/transaction.cc:239
#10 0x0000000000df28eb in Xid_log_event::do_commit (this=0x7fcd2403d910, thd_arg=0x7fcd4c00a310) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7164
#11 0x0000000000df29f2 in Xid_apply_log_event::do_apply_event_worker (this=0x7fcd2403d958, w=0x7fcd240a6c80) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7259
#12 0x0000000000e5edb5 in slave_worker_exec_job_group (worker=worker@entry=0x7fcd240a6c80, rli=rli@entry=0x32c79c0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_rli_pdb.cc:2671
#13 0x0000000000e40cc3 in handle_slave_worker (arg=arg@entry=0x7fcd240a6c80) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_slave.cc:6231
#14 0x0000000000ead034 in pfs_spawn_thread (arg=0x7fcd2403c9e0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/perfschema/pfs.cc:2190
#15 0x00007fcd76cf96ba in start_thread (arg=0x7fcd3c336700) at pthread_create.c:333
#16 0x00007fcd74e7641d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

And there is one thread doing the sync, via myrocks::rocksdb_flush_wal:

Thread 41 (Thread 0x7fcd3c43a700 (LWP 3739)):
#0  0x00007fcd74e6c81d in fdatasync () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fcd3fb36d3d in rocksdb::PosixWritableFile::Sync (this=0x7fcd34027420) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/env/io_posix.cc:844
#2  0x00007fcd3fbd6289 in rocksdb::WritableFileWriter::SyncInternal (this=this@entry=0x7fcd34027720, use_fsync=use_fsync@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/util/file_reader_writer.cc:350
#3  0x00007fcd3fbd63b8 in rocksdb::WritableFileWriter::SyncWithoutFlush (this=0x7fcd34027720, use_fsync=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/util/file_reader_writer.cc:338
#4  0x00007fcd3fa51b1e in rocksdb::DBImpl::SyncWAL (this=0x316a990) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/db_impl.cc:701
#5  0x00007fcd3fa46a88 in rocksdb::DBImpl::FlushWAL (this=0x316a990, sync=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/db/db_impl.cc:656
#6  0x00007fcd3f99d4d8 in rocksdb::StackableDB::FlushWAL (sync=<optimized out>, this=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/rocksdb/include/rocksdb/utilities/stackable_db.h:273
#7  myrocks::rocksdb_flush_wal (hton=<optimized out>, binlog_group_flush=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/rocksdb/ha_rocksdb.cc:2699
#8  0x00000000007ee0af in flush_handlerton (thd=<optimized out>, plugin=<optimized out>, arg=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:2640
#9  0x0000000000c8578e in plugin_foreach_with_mask (thd=0x0, funcs=funcs@entry=0x7fcd3c438b30, type=type@entry=1, state_mask=4294967287, state_mask@entry=8, arg=arg@entry=0x7fcd3c438b5c) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/sql_plugin.cc:2377
#10 0x0000000000c8591d in plugin_foreach_with_mask (thd=<optimized out>, func=func@entry=0x7ee090 <flush_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x7fcd3c438b5c) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/sql_plugin.cc:2392
#11 0x00000000007f4d4a in ha_flush_logs (db_type=db_type@entry=0x0, binlog_group_flush=binlog_group_flush@entry=true) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:2650
#12 0x0000000000e258ab in MYSQL_BIN_LOG::process_flush_stage_queue (this=this@entry=0x1d42780 <mysql_bin_log>, total_bytes_var=total_bytes_var@entry=0x7fcd3c438bf8, rotate_var=rotate_var@entry=0x7fcd3c438bf7, out_queue_var=out_queue_var@entry=0x7fcd3c438c00) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:8991
#13 0x0000000000e25a69 in MYSQL_BIN_LOG::ordered_commit (this=this@entry=0x1d42780 <mysql_bin_log>, thd=thd@entry=0x7fcd380008c0, all=all@entry=true, skip_commit=skip_commit@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:9655
#14 0x0000000000e27460 in MYSQL_BIN_LOG::commit (this=0x1d42780 <mysql_bin_log>, thd=0x7fcd380008c0, all=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/binlog.cc:8883
#15 0x00000000007f4032 in ha_commit_trans (thd=thd@entry=0x7fcd380008c0, all=all@entry=true, ignore_global_read_lock=ignore_global_read_lock@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/handler.cc:1829
#16 0x0000000000d0ce19 in trans_commit (thd=thd@entry=0x7fcd380008c0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/transaction.cc:239
#17 0x0000000000df28eb in Xid_log_event::do_commit (this=0x7fcd240bb4d0, thd_arg=0x7fcd380008c0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7164
#18 0x0000000000df29f2 in Xid_apply_log_event::do_apply_event_worker (this=0x7fcd240bb518, w=0x7fcd24021f40) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/log_event.cc:7259
#19 0x0000000000e5edb5 in slave_worker_exec_job_group (worker=worker@entry=0x7fcd24021f40, rli=rli@entry=0x32c79c0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_rli_pdb.cc:2671
#20 0x0000000000e40cc3 in handle_slave_worker (arg=arg@entry=0x7fcd24021f40) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/sql/rpl_slave.cc:6231
#21 0x0000000000ead034 in pfs_spawn_thread (arg=0x7fcd240283c0) at /mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/debian-jessie-64bit/percona-server-5.7.20-19/storage/perfschema/pfs.cc:2190
#22 0x00007fcd76cf96ba in start_thread (arg=0x7fcd3c43a700) at pthread_create.c:333
#23 0x00007fcd74e7641d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment by Sergei Petrunia [ 2018-02-12 ]

The problem seems to be similar to the one we've had with group commit:

  • The SQL layer makes commits in order (the call to commit trx$N is only made after trx${N-1} has committed)
  • MyRocks is instructed to flush on each commit()
  • hence:
    • only one thread is committing at a time
    • each commit does its own flush.

Need to do:

  • Don't flush when committing from the slave SQL thread (do we ever need to flush on the slave then?)
  • There's no way to do commit() calls in parallel, is there?
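The effect of ordered commits each paying a WAL sync can be illustrated with a toy model (the per-sync latency and transaction count below are illustrative assumptions, not measurements from this test):

```python
# Toy model of ordered commits on a parallel slave. Because commits must
# happen in binlog order, sync latency is serialized: one WAL sync per
# commit group, with groups running one after another.
def apply_time_ms(n_commits, t_sync_ms, group_size):
    n_groups = -(-n_commits // group_size)  # ceiling division
    return n_groups * t_sync_ms

N, T_SYNC_MS = 150_000, 2  # hypothetical: 150k trx, 2 ms per fdatasync

# No group commit: every commit pays a full sync.
print(apply_time_ms(N, T_SYNC_MS, group_size=1) / 1000, "s")  # 300.0 s
# Grouping ~5 commits per sync cuts the sync cost five-fold.
print(apply_time_ms(N, T_SYNC_MS, group_size=5) / 1000, "s")  # 60.0 s
```

In this model, adding workers cannot help at all while group_size stays at 1, which matches the flat catch-up times seen for MyRocks above.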
Comment by Sergei Petrunia [ 2018-02-13 ]

Run with log_slave_updates=ON, slave_parallel_mode=aggressive

Still MariaDB 10.2, official package

MyRocks

Sysbench workload generator (30 sec):

transactions:                        207851 (6926.66 per sec.)
Binlog_commits += 210,508
Binlog_group_commits += 42,878 (4.9 commits/group)

threads, catchup_time
1  529.671741
10 114.643005
20 51.314241
40 19.150348

InnoDB

Sysbench workload generator (30 sec):

transactions:                        148075 (4932.10 per sec.)
Binlog_commits += 148,075
Binlog_group_commits += 94,289 (1.57 commits/group)

threads, catchup_time
1 315.305727
10 90.130341
20 69.594089 
40 59.796552
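With log_slave_updates=ON the picture inverts; computing the speedups from the catchup_time tables above:

```python
# Scaling with log_slave_updates=ON, slave_parallel_mode=aggressive,
# from the catchup_time tables above.
myrocks = {1: 529.671741, 10: 114.643005, 20: 51.314241, 40: 19.150348}
innodb  = {1: 315.305727, 10: 90.130341, 20: 69.594089, 40: 59.796552}

print(f"MyRocks: {myrocks[1] / myrocks[40]:.1f}x at 40 workers")
print(f"InnoDB:  {innodb[1] / innodb[40]:.1f}x at 40 workers")
```

MyRocks now scales to roughly 27.7x at 40 workers (vs ~5.3x for InnoDB), presumably because with log_slave_updates=ON the slave goes through binlog group commit rather than syncing per transaction.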

Comment by Sergei Petrunia [ 2018-02-13 ]

Run with log_slave_updates=ON, slave_parallel_mode=conservative

MyRocks

Sysbench workload generator (30 sec):

Binlog_commits += 215594
Binlog_group_commits += 43450 (≈5.0 commits/group)
    transactions:                        212861 (7093.72 per sec.)

threads, catchup_time
1  542.23636 
10 111.546783
20 75.555415 
40 61.089441 

InnoDB

Sysbench workload generator (30 sec):

Binlog_commits += 150773
Binlog_group_commits += 96759 (1.55 commits/group)
    transactions:                        150773 (5024.59 per sec.)

threads, catchup_time
1 322.434903
10 88.847347
20 66.544515
40 58.500228

Comment by Sergei Petrunia [ 2018-02-13 ]

Summarizing two previous tests:

Comment by Sergei Petrunia [ 2018-02-14 ]

Running MariaDB parallel slave with log_slave_updates=ON, slave_parallel_mode=conservative. Stack traces https://gist.github.com/spetrunia/0fb73ad83308bd51ab8ba98c8c676a44

Typical stack looks like this:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f02fe1be91c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from target:/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f02f5e93e2b in std::condition_variable::wait<rocksdb::WriteThread::BlockingAwaitState(rocksdb::WriteThread::Writer*, uint8_t)::<lambda()> > (__p=..., __lock=..., this=0x7f02f40fddd0) at /usr/include/c++/5/condition_variable:98
#3  rocksdb::WriteThread::BlockingAwaitState (this=this@entry=0x55bfd4fbc338, w=0x7f02f40fdd40, goal_mask=goal_mask@entry=30 '\036') at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/db/write_thread.cc:44
#4  0x00007f02f5e93f99 in rocksdb::WriteThread::AwaitState (this=this@entry=0x55bfd4fbc338, w=w@entry=0x7f02f40fdd40, goal_mask=goal_mask@entry=30 '\036', ctx=ctx@entry=0x7f02f62d03c0 <rocksdb::jbg_ctx>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/db/write_thread.cc:176
#5  0x00007f02f5e94777 in rocksdb::WriteThread::JoinBatchGroup (this=this@entry=0x55bfd4fbc338, w=w@entry=0x7f02f40fdd40) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/db/write_thread.cc:318
#6  0x00007f02f5e15ea9 in rocksdb::DBImpl::WriteImplWALOnly (this=this@entry=0x55bfd4fbbb40, write_options=..., my_batch=my_batch@entry=0x7f02e079d800, callback=callback@entry=0x0, log_used=log_used@entry=0x7f02e0a6f678, log_ref=0, seq_used=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/db/db_impl_write.cc:472
#7  0x00007f02f5e16c76 in rocksdb::DBImpl::WriteImpl (this=0x55bfd4fbbb40, write_options=..., my_batch=0x7f02e079d800, callback=callback@entry=0x0, log_used=log_used@entry=0x7f02e0a6f678, log_ref=0, disable_memtable=true, seq_used=0x0) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/db/db_impl_write.cc:82
#8  0x00007f02f5fca477 in rocksdb::WriteCommittedTxn::PrepareInternal (this=0x7f02e0a6f670) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:223
#9  0x00007f02f5fcabc1 in rocksdb::PessimisticTransaction::Prepare (this=0x7f02e0a6f670) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:195
#10 0x00007f02f5d86342 in myrocks::Rdb_transaction_impl::prepare (this=0x7f02e1342550, name=...) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/ha_rocksdb.cc:2272
#11 0x00007f02f5d67edf in myrocks::rocksdb_prepare (hton=<optimized out>, thd=<optimized out>, prepare_tx=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/rocksdb/ha_rocksdb.cc:2894
#12 0x000055bfd19ab64a in prepare_or_error (all=true, thd=0x7f02e1d6f1a8, ht=0x55bfd4d8f348) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1158
#13 ha_commit_trans (thd=thd@entry=0x7f02e1d6f1a8, all=all@entry=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1437
#14 0x000055bfd18ecf9b in trans_commit (thd=0x7f02e1d6f1a8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/transaction.cc:307
#15 0x000055bfd1a80caf in Xid_log_event::do_apply_event (this=0x7f02cfa93078, rgi=0x7f02cfc8b960) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.cc:8441
#16 0x000055bfd179271b in Log_event::apply_event (rgi=0x7f02cfc8b960, this=0x7f02cfa93078) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.h:1452
#17 apply_event_and_update_pos_apply (ev=ev@entry=0x7f02cfa93078, thd=thd@entry=0x7f02e1d6f1a8, rgi=rgi@entry=0x7f02cfc8b960, reason=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3570
#18 0x000055bfd1798a03 in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x7f02cfa93078, thd=thd@entry=0x7f02e1d6f1a8, rgi=rgi@entry=0x7f02cfc8b960) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3745
#19 0x000055bfd19253a5 in rpt_handle_event (qev=qev@entry=0x7f02ce1a7858, rpt=0x7f02cd44c900) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:50
#20 0x000055bfd192a97c in handle_rpl_parallel_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:1265
#21 0x00007f02fe4946ba in start_thread (arg=0x7f02f40ff700) at pthread_create.c:333
#22 0x00007f02fdb3f41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

That is, the SQL layer is using two-phase commit, and the Prepare() calls are done in parallel (and they do a sync, right?)

Comment by Sergei Petrunia [ 2018-02-14 ]

Turned log_slave_updates back off and ran the test with InnoDB. Stacks: https://gist.github.com/spetrunia/39f22a276ad7e4df95884ef1a2b996f2

The commits are done in order:

Thread 46 (Thread 0x7f6311b78700 (LWP 11626)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055c5c598c23b in inline_mysql_cond_wait (src_file=0x55c5c6141978 "/home/buildbot/buildbot/build/mariadb-10.2.12/sql/sql_class.cc", src_line=7212, mutex=0x7f631c4fa3a8, that=0x7f631c4fa3d8) at /home/buildbot/buildbot/build/mariadb-10.2.12/include/mysql/psi/mysql_thread.h:1149
#2  wait_for_commit::wait_for_prior_commit2 (this=this@entry=0x7f631c4fa3a8, thd=thd@entry=0x7f632c000ad8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/sql_class.cc:7212
#3  0x000055c5c5b57f97 in wait_for_commit::wait_for_prior_commit (thd=0x7f632c000ad8, this=0x7f631c4fa3a8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/sql_class.h:1965
#4  THD::wait_for_prior_commit (this=0x7f632c000ad8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/sql_class.h:4122
#5  ha_commit_one_phase (thd=thd@entry=0x7f632c000ad8, all=all@entry=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1545
#6  0x000055c5c5b5880e in ha_commit_trans (thd=thd@entry=0x7f632c000ad8, all=all@entry=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1415
#7  0x000055c5c5a99f9b in trans_commit (thd=0x7f632c000ad8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/transaction.cc:307
#8  0x000055c5c5c2dcaf in Xid_log_event::do_apply_event (this=0x7f631c128778, rgi=0x7f631c4fa370) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.cc:8441
#9  0x000055c5c593f71b in Log_event::apply_event (rgi=0x7f631c4fa370, this=0x7f631c128778) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.h:1452
#10 apply_event_and_update_pos_apply (ev=ev@entry=0x7f631c128778, thd=thd@entry=0x7f632c000ad8, rgi=rgi@entry=0x7f631c4fa370, reason=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3570
#11 0x000055c5c5945a03 in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x7f631c128778, thd=thd@entry=0x7f632c000ad8, rgi=rgi@entry=0x7f631c4fa370) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3745
#12 0x000055c5c5ad23a5 in rpt_handle_event (qev=qev@entry=0x7f631cbda6b8, rpt=0x7f631c008318) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:50
#13 0x000055c5c5ad797c in handle_rpl_parallel_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:1265
#14 0x00007f6336b7d6ba in start_thread (arg=0x7f6311b78700) at pthread_create.c:333
#15 0x00007f633622841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is one committing thread:

Thread 43 (Thread 0x7f6311b2e700 (LWP 11623)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055c5c5e1b826 in os_event::wait (this=0x55c5c93f6660) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/os/os0event.cc:166
#2  os_event::wait_low (reset_sig_count=8536, this=0x55c5c93f6660) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/os/os0event.cc:336
#3  os_event_wait_low (event=0x55c5c93f6660, reset_sig_count=reset_sig_count@entry=0) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/os/os0event.cc:535
#4  0x000055c5c5dee708 in log_write_up_to (lsn=<optimized out>, flush_to_disk=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/log/log0log.cc:1175
#5  0x000055c5c5df400a in log_write_up_to (lsn=<optimized out>, flush_to_disk=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/log/log0log.cc:1299
#6  0x000055c5c5eebc57 in trx_flush_log_if_needed_low (lsn=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/trx/trx0trx.cc:1585
#7  trx_flush_log_if_needed (trx=0x7f63213856f0, lsn=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/trx/trx0trx.cc:1607
#8  trx_commit_complete_for_mysql (trx=trx@entry=0x7f63213856f0) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/trx/trx0trx.cc:2259
#9  0x000055c5c5db2305 in innobase_commit (hton=<optimized out>, thd=0x7f62d4006f58, commit_trx=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/handler/ha_innodb.cc:4792
#10 0x000055c5c5b5669e in commit_one_phase_2 (thd=thd@entry=0x7f62d4006f58, all=<optimized out>, trans=0x7f62d400a030, is_real_trans=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1567
#11 0x000055c5c5b57f67 in ha_commit_one_phase (thd=thd@entry=0x7f62d4006f58, all=all@entry=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1548
#12 0x000055c5c5b5880e in ha_commit_trans (thd=thd@entry=0x7f62d4006f58, all=all@entry=true) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/handler.cc:1415
#13 0x000055c5c5a99f9b in trans_commit (thd=0x7f62d4006f58) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/transaction.cc:307
#14 0x000055c5c5c2dcaf in Xid_log_event::do_apply_event (this=0x7f631c7fd0b8, rgi=0x7f631c7d0520) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.cc:8441
#15 0x000055c5c593f71b in Log_event::apply_event (rgi=0x7f631c7d0520, this=0x7f631c7fd0b8) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/log_event.h:1452
#16 apply_event_and_update_pos_apply (ev=ev@entry=0x7f631c7fd0b8, thd=thd@entry=0x7f62d4006f58, rgi=rgi@entry=0x7f631c7d0520, reason=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3570
#17 0x000055c5c5945a03 in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x7f631c7fd0b8, thd=thd@entry=0x7f62d4006f58, rgi=rgi@entry=0x7f631c7d0520) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/slave.cc:3745
#18 0x000055c5c5ad23a5 in rpt_handle_event (qev=qev@entry=0x7f631c7d4a68, rpt=0x7f631c007e80) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:50
#19 0x000055c5c5ad797c in handle_rpl_parallel_thread (arg=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.2.12/sql/rpl_parallel.cc:1265
#20 0x00007f6336b7d6ba in start_thread (arg=0x7f6311b2e700) at pthread_create.c:333
#21 0x00007f633622841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

... and it is doing a sync. One would expect this to slow the parallel slave down to the speed of a single-threaded one, but that is apparently not happening.

Comment by Sergei Petrunia [ 2018-02-19 ]

Debugging InnoDB on 10.2

Master

sysbench: 151,676 transactions
Binlog_commits += 151,676
Binlog_commit_groups += 95,717

Slave (debug build):

threads=20
log_write_up_to calls = 375 744
catchup time: 624.609522 sec.

Slave without the tracing code (for comparison):

sysbench on master: 145,089 transactions
catchup time: 592.940083 sec.

There are many more log_write_up_to calls (375,744) than transactions (151,676), roughly 2.5 calls per transaction.

Comment by Sergei Petrunia [ 2018-02-19 ]

Now slave catch-up takes ~595 sec, the number of commits matches the expected one, and we can see that commits do happen
in parallel:

counter_value=145,158
n_active histogram= {
  0, 0
  1, 36464
  2, 29865
  3, 21621
  4, 15590
  5, 11351
  6, 8336
  7, 6121
  8, 4456
  9, 3220
  10, 2371
  11, 1685
  12, 1231
  13, 852
  14, 624
  15, 436
  16, 308
  17, 231
  18, 177
  19, 143
  20, 76
  21, 0
  22, 0
  23, 0
  24, 0
  25, 0
  26, 0
  27, 0
  28, 0
  29, 0
  30, 0
  31, 0
}

Comment by Sergei Petrunia [ 2018-02-19 ]

Using the same test setup as before (145K transactions, slave with slave_parallel_workers=20, slave_parallel_mode=conservative, NO log-slave-updates) to look at which code in innobase_commit executes in parallel:

This piece of code is executed by one thread at a time:

    if (commit_trx
            || (!thd_test_options(thd, OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))) {
 
                DBUG_EXECUTE_IF("crash_innodb_before_commit",
                                DBUG_SUICIDE(););
 
 
                /* Run the fast part of commit if we did not already. */
                if (!trx_is_active_commit_ordered(trx)) {
                        innobase_commit_ordered_2(trx, thd);
 
                }
 
                /* We were instructed to commit the whole transaction, or
                this is an SQL statement end and autocommit is on */
 
                /* At this point commit order is fixed and transaction is
                visible to others. So we can wakeup other commits waiting for
                this one, to allow then to group commit with us. */

histogram:

 n_active histogram= {
   0, 0
   1, 145158
   2, 0
   3, 0

...
while this part of the code runs in parallel, most likely due to the first call it makes (thd_wakeup_subsequent_commits releases the transactions waiting on this one before the flush):

                thd_wakeup_subsequent_commits(thd, 0);
 
 
                /* Now do a write + flush of logs. */
                trx_commit_complete_for_mysql(trx);
 
                trx_deregister_from_2pc(trx);
        } else {

histogram:

n_active histogram= {
  0, 0
  1, 39488
  2, 29460
  3, 20922
  4, 14941
  5, 10823
  6, 7922
  7, 5884
  8, 4360
  9, 3136
  10, 2266
  11, 1659
  12, 1234
  13, 880
  14, 632
  15, 479
  16, 360
  17, 275
  18, 212
  19, 150
  20, 75
  21, 0
  22, 0
  23, 0
...
}

Comment by Sergei Petrunia [ 2018-02-21 ]

Filed MDEV-15372: Parallel slave speedup very limited when log_slave_updates=OFF; now testing a fix for it.

Comment by Sergei Petrunia [ 2018-03-15 ]

The RC release was made, so closing this task.
