[MDEV-7200] OLTP RO 2% performance degradation in 10.0.15 Created: 2014-11-25  Updated: 2015-02-11  Resolved: 2015-02-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - XtraDB
Affects Version/s: 10.0.15
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-6478 MariaDB on Power8 Closed

 Description   

I can see a 2% performance drop in 10.0.15 compared to 10.0.14. This seems to be caused mainly by this revision:

    revno: 4500.1.26 [merge]
    committer: Sergei Golubchik <sergii@pisem.net>
    branch nick: 10.0
    timestamp: Thu 2014-11-20 17:05:13 +0100
    message:
      XtraDB 5.6.21-70.0

At least reverting it helps.

Tracking it further down, I can see that this regression comes from the following upstream revision:

revno: 6119
revision-id: bin.x.su@oracle.com-20140819071006-qq5nsq3nkqytml4w
parent: sujatha.sivakumar@oracle.com-20140819042708-rx1r6q8ng1pgf9dq
committer: bin.x.su@oracle.com
branch nick: mysql-5.6
timestamp: Tue 2014-08-19 15:10:06 +0800
message:
  Bug#18477009 - INACCURATE HANDLING OF SRV_ACTIVITY_COUNT
 
  We call srv_active_wake_master_thread() directly and one of the places is
  innobase_commit (and prepare as well). This call not only wakes up the
  master thread but also increments the srv_activity_count which tells
  the page_cleaner that server is not idle. That's no what we expect.
 
  We should call srv_active_wake_master_thread() only after the commitment
  of a write trx, but not read-only trx, or after a rollback. This patch also
  changes some call of srv_active_wake_master_thread() to
  ib_wake_master_thread().
 
  Original patch is provided by Inaam.
 
  rb#5909, approved by Jimmy.

Specifically this hunk:

=== modified file 'storage/innobase/handler/ha_innodb.cc'
--- storage/innobase/handler/ha_innodb.cc       2014-08-02 07:51:08 +0000
+++ storage/innobase/handler/ha_innodb.cc       2014-08-19 07:10:06 +0000
@@ -3584,10 +3584,6 @@ innobase_commit(
 
        innobase_srv_conc_force_exit_innodb(trx);
 
-       /* Tell the InnoDB server that there might be work for utility
-       threads: */
-       srv_active_wake_master_thread();
-
        DBUG_RETURN(0);
 }
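
To make the upstream intent concrete, here is a toy model of the behaviour the commit message describes: one call that only wakes the master thread, and one that also bumps the activity counter. It is a sketch under stated assumptions, not InnoDB code. `ToyServer`, `commit_before_patch` and `commit_after_patch` are hypothetical names, and the real server signals threads through events rather than counters.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Toy stand-in for server state; not InnoDB's real data structures."""
    activity_count: int = 0   # plays the role of srv_activity_count
    master_wakeups: int = 0   # how often the master thread was signalled

    def wake_master_thread(self) -> None:
        # Wake only: the server still looks idle to the page cleaner.
        self.master_wakeups += 1

    def active_wake_master_thread(self) -> None:
        # Wake and also record activity, as the commit message describes.
        self.activity_count += 1
        self.master_wakeups += 1


def commit_before_patch(server: ToyServer, wrote_data: bool) -> None:
    # Hypothetical model of the old commit path: every commit counts as activity.
    server.active_wake_master_thread()


def commit_after_patch(server: ToyServer, wrote_data: bool) -> None:
    # Hypothetical model of the patched path: only write transactions count.
    if wrote_data:
        server.active_wake_master_thread()


old_server, new_server = ToyServer(), ToyServer()
for _ in range(1000):  # a purely read-only workload
    commit_before_patch(old_server, wrote_data=False)
    commit_after_patch(new_server, wrote_data=False)

print(old_server.activity_count, new_server.activity_count)
```

In this model a read-only workload no longer registers as activity after the patch, which is the stated goal upstream. The model does not explain why the benchmark got slower.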



 Comments   
Comment by Sergey Vojtovich [ 2014-11-25 ]

Jan, please check what we can do about it.

Comment by Jan Lindström (Inactive) [ 2014-11-25 ]

Interesting. Is that hunk really the reason for the slowdown? Removing something makes it slower? That we can fix.

Comment by Sergey Vojtovich [ 2014-11-25 ]

With this hunk:

[  90s] threads: 160, tps: 32305.67, reads/s: 452285.98, writes/s: 0.00, response time: 5.21ms (95%)
[ 100s] threads: 160, tps: 32584.20, reads/s: 456166.93, writes/s: 0.00, response time: 5.20ms (95%)
[ 110s] threads: 160, tps: 31983.60, reads/s: 447774.85, writes/s: 0.00, response time: 5.22ms (95%)

If I revert just this hunk:

[ 100s] threads: 160, tps: 33627.49, reads/s: 470789.30, writes/s: 0.00, response time: 5.09ms (95%)
[ 110s] threads: 160, tps: 33585.80, reads/s: 470202.75, writes/s: 0.00, response time: 5.10ms (95%)
[ 120s] threads: 160, tps: 33635.39, reads/s: 470883.80, writes/s: 0.00, response time: 5.09ms (95%)
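
The size of the gap between the two runs above can be checked with a short script. This is a minimal sketch that assumes the sysbench interval lines are pasted in verbatim. `mean_tps` and `relative_drop_pct` are helper names introduced here for illustration, and three intervals per run are too few to say anything about variance.

```python
import re
from statistics import mean

# Interval lines copied from the two runs quoted above.
WITH_HUNK = """\
[  90s] threads: 160, tps: 32305.67, reads/s: 452285.98, writes/s: 0.00, response time: 5.21ms (95%)
[ 100s] threads: 160, tps: 32584.20, reads/s: 456166.93, writes/s: 0.00, response time: 5.20ms (95%)
[ 110s] threads: 160, tps: 31983.60, reads/s: 447774.85, writes/s: 0.00, response time: 5.22ms (95%)
"""

HUNK_REVERTED = """\
[ 100s] threads: 160, tps: 33627.49, reads/s: 470789.30, writes/s: 0.00, response time: 5.09ms (95%)
[ 110s] threads: 160, tps: 33585.80, reads/s: 470202.75, writes/s: 0.00, response time: 5.10ms (95%)
[ 120s] threads: 160, tps: 33635.39, reads/s: 470883.80, writes/s: 0.00, response time: 5.09ms (95%)
"""

TPS_PATTERN = re.compile(r"tps:\s*([0-9]+(?:\.[0-9]+)?)")


def mean_tps(report: str) -> float:
    """Average the tps value over every interval line in the report."""
    return mean(float(value) for value in TPS_PATTERN.findall(report))


def relative_drop_pct(baseline: float, candidate: float) -> float:
    """How far the candidate falls below the baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


baseline = mean_tps(HUNK_REVERTED)
candidate = mean_tps(WITH_HUNK)
drop = relative_drop_pct(baseline, candidate)
print(f"reverted: {baseline:.2f} tps, with hunk: {candidate:.2f} tps, drop: {drop:.2f}%")
```

On these particular intervals the drop works out to roughly 3.9%, somewhat above the 2% quoted for the release as a whole.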

Comment by Jan Lindström (Inactive) [ 2014-11-25 ]

If there is not a big regression for read-write performance, you have my permission to add those lines back.

Comment by Sergey Vojtovich [ 2014-11-26 ]

Reverting just this hunk will most probably make DML statements wake waiters twice. I'm not sure whether that will affect performance, but it sounds excessive at least.

Questions to answer are:

  • what's wrong with calling srv_active_wake_master_thread() for read-only queries?
  • why did not calling it introduce a performance regression?

Comment by Jan Lindström (Inactive) [ 2014-11-26 ]

Read-only workloads are not that important compared to read-write workloads. You may change that, but please test read-write performance as well.

Comment by Sergey Vojtovich [ 2015-02-11 ]

Couldn't reproduce it again. The only notable change I can think of is that Advance Toolchain was upgraded on the build host. Couldn't find this problem by code analysis either.

Last run numbers:

10.0.15-power (+7200): 34475
10.0.15-power (-7200): 34487
10.0.16-power (+7200): 34570
10.0.16-power (-7200): 34517
10.1 (+7200): 33359
10.1 (-7200): 33314
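
The spread between the paired runs above can be expressed as a percentage per build. This sketch assumes the figures are throughput in tps (the ticket does not state a unit) and treats "+7200" and "-7200" purely as labels for the two variants of each build, since the ticket does not define them. `RUNS` and `delta_pct` are names introduced here.

```python
# Last-run figures from the comment above, keyed by build and variant label.
RUNS = {
    "10.0.15-power": {"+7200": 34475, "-7200": 34487},
    "10.0.16-power": {"+7200": 34570, "-7200": 34517},
    "10.1": {"+7200": 33359, "-7200": 33314},
}


def delta_pct(plus: float, minus: float) -> float:
    """Difference of the '+7200' run relative to the '-7200' run, in percent."""
    return 100.0 * (plus - minus) / minus


deltas = {
    build: delta_pct(variants["+7200"], variants["-7200"])
    for build, variants in RUNS.items()
}

for build, delta in deltas.items():
    print(f"{build}: {delta:+.3f}%")
```

Every pair differs by less than 0.2%, and the sign is not consistent across builds, which fits the conclusion that the regression could no longer be reproduced.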
