[MDEV-25598] INNODB_ROWS_UPDATED / n_rows_updated can become imprecise under heavy load Created: 2021-05-05  Updated: 2021-05-21

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.9
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Lau Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 1
Labels: innodb
Environment:

Any



 Description   

(We are currently in the process of adding Pomelo.EntityFramework.MySql and related benchmarks to the TechEmpower Framework Benchmarks. As part of that, I have also prepared a PR that will add MariaDB as a database to the benchmark. Preliminary locally run tests show about a 4 % increase in speed over MySQL. Feel free to join the conversation.)

Affects all versions of MariaDB and MySQL.
I posted a related bug report for MySQL.

Issue:

The INNODB_ROWS_UPDATED status variable is used by the TechEmpower Framework Benchmarks for MariaDB (and MySQL) to verify the expected UPDATE count after a benchmark run.

The framework tests different scenarios, from light to heavy loads (high concurrency).

When running 10240 updates, divided into 512 concurrent calls (so 20 updates per connection), MariaDB (and MySQL) regularly reports only between 10237 and 10239 updates, even though all 10240 updates were successfully executed.

It seems that the srv_stats.n_rows_updated.inc(); calls increment the stats in a concurrency-unsafe way:

  /** If you can't use a good index id. Increment by 1. */
  void inc() UNIV_NOTHROW { add(1); }
 
  /** If you can't use a good index id.
  @param n is the amount to increment */
  void add(Type n) UNIV_NOTHROW {
    size_t i = m_policy.offset(m_policy.get_rnd_index());
 
    ut_ad(i < UT_ARR_SIZE(m_counter));
 
    m_counter[i] += n;
  }

The current workaround is to heuristically ignore the difference if it is within an expected margin, which defeats the purpose of verifying the accuracy/health of the benchmark before running it.

How to repeat:

Run 10240 updates, divided into 512 (truly) concurrent connections, against any MariaDB (or MySQL) InnoDB table and compare the expected INNODB_ROWS_UPDATED (and INNODB_ROWS_READ) count against the actual one. They should differ in about 4 out of 5 runs.

Suggested fix:

I would propose using interlocked increment operations for inc() and interlocked add operations for add(), achieving maximum performance and counting accuracy at the same time.



 Comments   
Comment by Lau [ 2021-05-08 ]

The following post was added to the bug report on the MySQL issue tracker in response to the bug that I reported there:

Atomics are expensive. n_rows_updates is not meant to be 100% accurate.

SELECT ROW_COUNT(); is the way to get the total number of rows updated. See https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_row-count

I posted the following response to it:

Atomic operations are implemented to be as performant as possible and are the most inexpensive way to atomically change the value of a variable.

The x86 standard implements CPU instructions for these operations (see https://en.wikipedia.org/wiki/Compare-and-swap#Implementations for a quick overview).

Since UPDATE operations on InnoDB are *very* expensive (even in comparison to other database systems like PostgreSQL, which are very expensive themselves), it does not matter in practice whether the operation takes 3 nanoseconds for an interlocked increment instead of 1 nanosecond for a non-interlocked increment. (In addition to that, the current implementation in MySQL isn't even optimized, and is already more expensive than it needs to be.)

Therefore, this bug should still be fixed, because it has no practical performance downsides and only operational upsides.

Using ROW_COUNT() is not applicable in the TechEmpower Framework Benchmark scenarios, because issuing additional statements would falsify the benchmark itself. Instead, the status/instrumentation variables are being read before and after the benchmark, to ensure that the benchmark performed correctly.

Therefore, an accurate status variable count is necessary here and since the current counts are inaccurate, this bug needs to be fixed.

So to summarize, the risk is low, the gain is high, the performance impact negligible and the implementation very simple.

Comment by Sergei Golubchik [ 2021-05-13 ]

marko, what do you think?

Comment by Lau [ 2021-05-21 ]

Here is a locally run benchmark (using a proven benchmark library) that measures the time of 100 simple increments against 100 interlocked increments:

|               Method |     Mean |   Error |  StdDev |
|--------------------- |---------:|--------:|--------:|
|      SimpleIncrement | 116.7 ns | 0.12 ns | 0.11 ns |
| InterlockedIncrement | 525.9 ns | 0.68 ns | 0.57 ns |

So a simple increment instruction takes about 1.167 ns on my local machine (single CPU, 8 cores), while an interlocked increment takes about 4.5 times that amount at 5.259 ns.

Assuming around 1,000 UPDATE statements within one second (which already requires beefy hardware for MySQL, according to the TechEmpower Framework Benchmark), the 5.259 ns would take up about 0.0005259 % of the time spent on a request.

In total, it would add 5 additional microseconds for 1,000 UPDATE statements per second (1.000005 seconds instead of 1 second).

Generated at Thu Feb 08 09:38:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.