[MDEV-34515] Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size - Jira

Vladislav Vaintroub created issue - 2024-07-02 22:12

Vladislav Vaintroub made changes - 2024-07-02 22:13

Field	Original Value	New Value
Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench --table-size=1 --tables=1 --threads=1 --time=120 oltp_update_index --rand-type=uniform --report-interval=1 --histogram --mysql-user=root --mysql-db=mysql --mysql-host=. --skip-trx run
{code}

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench --table-size=1 --tables=1 --threads=1 --time=120 oltp_update_index --rand-type=uniform --report-interval=1 --histogram --mysql-user=root --mysql-db=mysql --mysql-host=. --skip-trx run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

Vladislav Vaintroub made changes - 2024-07-02 22:13

Assignee

Marko Mäkelä [ marko ]

Vladislav Vaintroub made changes - 2024-07-02 22:20

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench --table-size=1 --tables=1 --threads=1 --time=120 oltp_update_index --rand-type=uniform --report-interval=1 --histogram --mysql-user=root --mysql-db=mysql --mysql-host=. --skip-trx run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

Vladislav Vaintroub made changes - 2024-07-02 22:20

Attachment

10_6_flame.svg [ 73727 ]

Vladislav Vaintroub made changes - 2024-07-02 22:21

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg]

Vladislav Vaintroub made changes - 2024-07-02 22:26

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg]

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

Vladislav Vaintroub made changes - 2024-07-02 22:30

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

Vladislav Vaintroub made changes - 2024-07-02 22:31

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)
{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

Vladislav Vaintroub made changes - 2024-07-02 22:32

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

||version||tps||latency ms 95%||max history length||
|10.5||
When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

Vladislav Vaintroub made changes - 2024-07-02 22:33

Description

||version||tps||latency ms 95%||max history length||
|10.5||
When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

||version||tps||latency ms 95%||max history length||
|10.5||16501.81||0.07||
When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

Vladislav Vaintroub made changes - 2024-07-02 22:44

Description

||version||tps||latency ms 95%||max history length||
|10.5||16501.81||0.07||
When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.6 (current, f6fcfc1a6a058fd7cac6bf53216ea73f3a04b22d)

{noformat}
SQL statistics:
    queries performed:
        read: 0
        write: 71065
        other: 0
        total: 71065
    transactions: 71065 (592.19 per sec.)
    queries: 71065 (592.19 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)

General statistics:
    total time: 120.0030s
    total number of events: 71065
{noformat}

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy. Still, it does not achieve anything, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes.

Results 10.5 (current,

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.5|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes. I think no record is actually purged.

In contrast,

Vladislav Vaintroub made changes - 2024-07-02 22:44

Attachment

10_5_flame.svg [ 73728 ]

Vladislav Vaintroub made changes - 2024-07-02 22:49

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.5|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes. I think no record is actually purged.

In contrast,

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.5|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes. I think no record is actually purged.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is split roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead to wake up and wait for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:50

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.5|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes. I think no record is actually purged.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is split roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead to wake up and wait for workers doing their tiny tasks, but that is for another day).

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|
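
To put the two table rows in proportion, here is a minimal sketch that computes the throughput and latency ratios from the figures in the table. The helper name `ratio` is made up for illustration and is not part of sysbench or the server.

```shell
#!/bin/sh
# Hypothetical helper: print a/b rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Figures copied from the results table.
ratio 16501.81 592.19   # tps, 10.5 relative to 10.6
ratio 5.28 0.07         # 95% latency in ms, 10.6 relative to 10.5
```

With these inputs the throughput ratio comes out at roughly 28x, in the same range as the rounded figure quoted in the description.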

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously up to about 70K within 2 minutes. I think no record is actually purged.
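
One possible way to watch the history list length while the benchmark runs is to poll the Innodb_history_list_length status variable. The sketch below assumes the `mysql` command-line client plus the root user and the /tmp/mysql.sock socket from the sysbench commands; the function name `history_length` is only illustrative. The function parses the tab-separated output of `mysql -N -B`, and the polling loop is left as a comment so the snippet runs without a server.

```shell
#!/bin/sh
# Illustrative helper: read "name<TAB>value" lines on stdin and print the
# value of Innodb_history_list_length.
history_length() {
  awk -F '\t' '$1 == "Innodb_history_list_length" { print $2 }'
}

# Sample input in the format produced by `mysql -N -B`.
printf 'Innodb_history_list_length\t70332\n' | history_length

# Polling once per second against a running server (assumed socket and user):
# while sleep 1; do
#   mysql -N -B -uroot -S /tmp/mysql.sock \
#     -e "SHOW GLOBAL STATUS LIKE 'Innodb_history_list_length'" | history_length
# done
```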

In contrast, as one can see from [^10_5_flame.svg], the CPU time is split roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead to wake up and wait for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:56

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with `ssux_lock_impl<true>::wr_wait` from `buf_page_get_low` being the most expensive call and `ssux_lock_impl<true>::wr_wait` the second most expensive.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is spent roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:57

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with ``ssux_lock_impl<true>::wr_wait`` from `buf_page_get_low` being the most expensive call and `ssux_lock_impl<true>::wr_wait` the second most expensive.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is spent roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:59

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is spent roughly evenly between purge and foreground processing in `do_command`, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:59

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is spent roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, which results in 25x better performance (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 22:59

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], the CPU time is spent roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 23:00

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is spent roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

Vladislav Vaintroub made changes - 2024-07-02 23:36

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is spent roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

h2. What causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c]. One might want to run the benchmark twice for this commit: on the first run it inexplicably started slow but showed better performance from about the middle of the benchmark, while on the second run it was slow throughout.

So, the results for that commit vs the previous one, 88733282fb15c80f0bd722df0041d06ad90c26b0, are these:

||commit||tps||latency ms 95%||history length||
|aa719b5010c (bad)|592.96|5.37| |

Vladislav Vaintroub made changes - 2024-07-02 23:37

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|
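
The ratios between the two rows can be recomputed from the table. The following is only a small sketch; the helper name {{ratio}} is made up for illustration, and the numbers are copied from the table above.

```shell
# Print the ratio of two measurements with one decimal place.
# "ratio" is a hypothetical helper, not part of sysbench or MariaDB.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 16501.81 592.19   # tps, 10.5 over 10.6
ratio 5.28 0.07         # 95th percentile latency, 10.6 over 10.5
ratio 70332 95          # max history length, 10.6 over 10.5
```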

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal, as Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is spent roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall (although purge seems to have relatively high overhead from waking up and waiting for workers doing their tiny tasks, but that is for another day).

h2. What causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c]. One might want to run the benchmark twice for this commit: on the first run it inexplicably started slow but showed better performance from about the middle of the benchmark; on the second run it was slow throughout.

So, the results for that commit vs previous 88733282fb15c80f0bd722df0041d06ad90c26b0 are these

||commit|| tps || latency ms 95% || history length ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |

Vladislav Vaintroub made changes - 2024-07-02 23:41

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with roughly 28x better performance overall (16501.81 vs 592.19 tps). Purge does seem to have relatively high overhead for waking and waiting for workers that do tiny tasks, but that is for another day.

h2. What causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c]. One might want to run the benchmark twice for this commit: on the first run it inexplicably started slow but showed better performance from about the middle of the benchmark; on the second run it was slow throughout.

So, the results for that commit vs previous [https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0 | 88733282fb15c80f0bd722df0041d06ad90c26b0] are below

||commit|| tps || latency ms 95% || history length max ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |
|88733282fb15(good) | 19003.11|0.07| 217 |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps).

Vladislav Vaintroub made changes - 2024-07-02 23:41

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with roughly 28x better performance overall (16501.81 vs 592.19 tps). Purge does seem to have relatively high overhead for waking and waiting for workers that do tiny tasks, but that is for another day.

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c]. One might want to run the benchmark twice for this commit: on the first run it inexplicably started slow but showed better performance from about the middle of the benchmark; on the second run it was slow throughout.

So, the results for that commit vs previous [https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0 | 88733282fb15c80f0bd722df0041d06ad90c26b0] are below

||commit|| tps || latency ms 95% || history length max ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |
|88733282fb15(good) | 19003.11|0.07| 217 |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps).

Vladislav Vaintroub made changes - 2024-07-02 23:46

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||
|10.5|16501.81|0.07|95|
|10.6|592.19| 5.28|70332|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with roughly 28x better performance overall (16501.81 vs 592.19 tps). Purge does seem to have relatively high overhead for waking and waiting for workers that do tiny tasks, but that is for another day.

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit: on the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs previous [https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0 | 88733282fb15c80f0bd722df0041d06ad90c26b0] are below

||commit|| tps || latency ms 95% || history length max ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |
|88733282fb15(good) | 19003.11|0.07| 217 |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps).
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:04

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||flamegraph||
|10.5|16501.81|0.07|95| |
|10.6|592.19|5.28|70332| |

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with roughly 28x better performance overall (16501.81 vs 592.19 tps). Purge does seem to have relatively high overhead for waking and waiting for workers that do tiny tasks, but that is for another day.

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit: on the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs previous [https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0 | 88733282fb15c80f0bd722df0041d06ad90c26b0] are below

||commit|| tps || latency ms 95% || history length max ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |
|88733282fb15(good) | 19003.11|0.07| 217 |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps).
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:05

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length||flamegraph||
|10.5|16501.81|0.07|95| |
|10.6|592.19|5.28|70332| |

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with roughly 28x better performance overall (16501.81 vs 592.19 tps). Purge does seem to have relatively high overhead for waking and waiting for workers that do tiny tasks, but that is for another day.

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c MDEV-32050: Do not copy undo records in purge|https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit: on the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs previous [https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0 | 88733282fb15c80f0bd722df0041d06ad90c26b0] are below

||commit|| tps || latency ms 95% || history length max ||
|aa719b5010c (bad) | 592.96 | 5.37 | 69069 |
|88733282fb15(good) | 19003.11|0.07| 217 |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps).
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:07

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs. the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit||tps||latency ms 95%||history length max||
|aa719b5010c (bad)|592.96|5.37|69069|
|88733282fb15 (good)|19003.11|0.07|217|

In fact, the "good" commit before regression was 20% better than 10.5
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:07

Attachment

aa719b5010c9_flame2.svg [ 73729 ]

Vladislav Vaintroub made changes - 2024-07-03 00:07

Attachment

88733282fb15_flame.svg [ 73730 ]

Vladislav Vaintroub made changes - 2024-07-03 00:09

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs. the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit||tps||latency ms 95%||history length max||flamegraph||
|aa719b5010c (bad)|592.96|5.37|69069| |
|88733282fb15 (good)|19003.11|0.07|217| |

In fact, the "good" commit before regression was 20% better than 10.5
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:10

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs. the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit||tps||latency ms 95%||history length max||flamegraph||
|aa719b5010c (bad)|592.96|5.37|69069| |
|88733282fb15 (good)|19003.11|0.07|217|[^88733282fb15_flame.svg]|

In fact, the "good" commit before regression was 20% better than 10.5
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:12

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark twice for this commit. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs. the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit||tps||latency ms 95%||history length max||flamegraph||
|aa719b5010c (bad)|592.96|5.37|69069| |
|88733282fb15 (good)|19003.11|0.07|217|[^88733282fb15_flame.svg]|

In fact, the "good" commit before regression was 20% better than 10.5
The "bad" commit also shows that

Vladislav Vaintroub made changes - 2024-07-03 00:17

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

So, the setup:
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}
{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)

h2. What probably causes it

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark 2 or 3 times for this commit, from a cold start. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs. the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit||tps||latency ms 95%||history length max||flamegraph||
|aa719b5010c (*bad*)|592.96|5.37|69069|[^aa719b5010c9_flame2.svg]|
|88733282fb15 (*good*)|19003.11|0.07|217|[^88733282fb15_flame.svg]|

In fact, the "good" commit before regression was 20% better than 10.5, and purging uses only 7% CPU while managing to keep history length low.

The "bad" flamegraph is pretty much the same as current 10.6, purge using up most of the CPU.

Vladislav Vaintroub made changes - 2024-07-03 00:17

Priority

Major [ 3 ]

Critical [ 2 ]

Vladislav Vaintroub made changes - 2024-07-03 00:17

Assignee

Marko Mäkelä [ marko ]

Vladislav Vaintroub made changes - 2024-07-03 00:18

Fix Version/s

10.6 [ 24028 ]

Vladislav Vaintroub made changes - 2024-07-03 00:18

Labels

performance

Vladislav Vaintroub made changes - 2024-07-03 00:25

Description

When I run sysbench oltp_update_index on current 10.6-11.6 using a very small amount of data (1 row in 1 table), performance regresses compared to 10.5.

The easiest way for me to reproduce it is to run the server with --innodb-flush-log-at-trx-commit=2 or 0 and a single benchmark user. I also see the same effect with --innodb-flush-log-at-trx-commit=1 and multiple users.

h2. Benchmark setup
server:

{code:bash}
mysqld --innodb-flush-log-at-trx-commit=2 #all defaults otherwise
{code}

sysbench prepare

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram prepare
{code}

sysbench run

{code:bash}
sysbench oltp_update_index --table-size=1 --tables=1 --threads=1 --mysql-user=root --mysql-db=mysql --report-interval=1 --time=120 --mysql-socket=/tmp/mysql.sock --histogram run
{code}
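
The report does not say how the history length was sampled. One possible way, sketched here under the assumption that the {{mysql}} client is on PATH and that the user and socket match the sysbench commands, is to poll {{Innodb_history_list_length}} once per second while the benchmark runs and keep the maximum. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sample Innodb_history_list_length and report the maximum seen.
# Assumptions: mysql client on PATH; user/socket as in the sysbench commands.

# Print the current history list length as a single integer.
history_len() {
  mysql --user=root --socket=/tmp/mysql.sock --batch --skip-column-names \
    -e "SHOW GLOBAL STATUS LIKE 'Innodb_history_list_length'" | awk '{ print $2 }'
}

# Read integers from stdin, one per line, and print the largest one.
max_of() {
  awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR > 0) print max }'
}

# Sample once per second for N seconds (default 120), then print the maximum.
sample_max_history() {
  local seconds="${1:-120}" i
  for ((i = 0; i < seconds; i++)); do
    history_len
    sleep 1
  done | max_of
}
```

Running {{sample_max_history 120}} in a second terminal alongside the sysbench run would print the largest value observed during the two minutes.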

Results 10.5 vs 10.6
||version||tps||latency ms 95%||max history length|| flamegraph||
|10.5|16501.81|0.07|95|[^10_5_flame.svg]|
|10.6|592.19| 5.28|70332|[^10_6_flame.svg]|

As one can see from the attached [^10_6_flame.svg], in 10.6 purge is abnormally busy, with 85% of all CPU samples in purge. However, no matter how busy it is, purge does not achieve its goal: Innodb_history_list_length grows continuously, up to about 70K within 2 minutes. I think no record is actually purged. Additionally, foreground threads run into some sort of buffer pool contention, with {{ssux_lock_impl<true>::wr_wait}} from {{buf_page_get_low}} being the most prominent place and {{ssux_lock_impl<true>::wr_wait}} the second most prominent.

In contrast, as one can see from [^10_5_flame.svg], in 10.5 the CPU time is split roughly evenly between purge and foreground processing in {{do_command}}, and purge works, keeping the history length tiny during the benchmark, with 25x better performance overall. (Purge seems to have relatively high overhead for waking and waiting for workers doing their tiny tasks, but this is for another day.)
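
The size of the gap between 10.5 and 10.6 can be recomputed from the figures in the results table. This is only an arithmetic sketch using awk; the {{ratio}} helper is a hypothetical name, not part of sysbench or the server.

```shell
#!/usr/bin/env bash
# ratio A B: print A divided by B, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 16501.81 592.19   # tps, 10.5 over 10.6             -> 27.87
ratio 5.28 0.07         # 95% latency, 10.6 over 10.5     -> 75.43
ratio 70332 95          # max history length, 10.6 / 10.5 -> 740.34
```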

h2. What causes it, according to "git bisect"

I ran a manual bisect, and it points to [aa719b5010c ~~MDEV-32050~~: Do not copy undo records in purge |https://github.com/mariadb/server/commit/aa719b5010c] from 10.6.16.
I had to run the benchmark 2 or 3 times for this commit, from a cold start. On the first run it started slow but inexplicably showed better performance from about the middle of the benchmark; on the second run it was slow all the way.

So, the results for that commit vs the previous commit [88733282fb15c80f0bd722df0041d06ad90c26b0|https://github.com/mariadb/server/commit/88733282fb15c80f0bd722df0041d06ad90c26b0] are below:

||commit|| tps || latency ms 95% || history length max || flamegraph||
|aa719b5010c (*bad*) | 592.96 | 5.37 | 69069 | [^aa719b5010c9_flame2.svg]|
|88733282fb15(*good*) | 19003.11|0.07| 217 | [^88733282fb15_flame.svg] |

In fact, the "good" commit before the regression was about 15% better than 10.5 (19003.11 vs 16501.81 tps), and purge uses only 7% CPU while managing to keep the history length low.

The "bad" flamegraph is pretty much the same as current 10.6, purge using up most of the CPU.
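
For reference, the ratios between the tps figures in the two tables can be checked with a one-line helper. The {{ratio}} function is just an illustration; the inputs are the numbers from the tables above.

```shell
#!/usr/bin/env bash
# ratio A B: print A / B rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 16501.81 592.19     # 10.5 vs 10.6 tps
ratio 19003.11 16501.81   # "good" commit vs 10.5 tps
ratio 19003.11 592.96     # "good" commit vs "bad" commit tps
```

This gives roughly 28x, 1.15x and 32x respectively.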

h2. Can it be reproduced differently?

I could also reproduce it by setting innodb_flush_log_at_trx_commit=0.
I can also reproduce it by increasing concurrency (--threads=10 in sysbench) and leaving innodb_flush_log_at_trx_commit at its default (1).

The bad effect seems to vanish once more rows are updated, but maybe I just did not find a way to reproduce it in that case.

Vladislav Vaintroub made changes - 2024-07-03 00:26

Link

This issue is caused by ~~MDEV-32050~~ [ ~~MDEV-32050~~ ]

Marko Mäkelä made changes - 2024-08-02 12:02

Link

This issue relates to MDEV-17598 [ MDEV-17598 ]

Marko Mäkelä made changes - 2024-08-02 14:36

Link

This issue blocks MDEV-33966 [ MDEV-33966 ]

Marko Mäkelä made changes - 2024-08-06 10:58

Status

Open [ 1 ]

In Progress [ 3 ]

Marko Mäkelä made changes - 2024-08-06 12:57

Link

This issue relates to ~~MDEV-34520~~ [ ~~MDEV-34520~~ ]

Marko Mäkelä made changes - 2024-08-09 06:16

Attachment

10.5-libstdc++.svg [ 73925 ]

Marko Mäkelä made changes - 2024-08-09 12:51

Attachment

10.6-good-88733282fb15c80f0bd722df0041d06ad90c26b0.svg [ 73926 ]

Marko Mäkelä made changes - 2024-08-09 12:51

Attachment

10.6-bad-aa719b5010c929132b4460b78113fbd07497d9c8.svg [ 73927 ]

Marko Mäkelä made changes - 2024-08-09 13:02

Attachment

10.6-good-88733282fb15c80f0bd722df0041d06ad90c26b0-large.svg [ 73928 ]

Marko Mäkelä made changes - 2024-08-09 13:02

Attachment

10.6-bad-aa719b5010c929132b4460b78113fbd07497d9c8-large.svg [ 73929 ]

Marko Mäkelä made changes - 2024-08-09 13:15

Attachment		10.6-innodb-purge-batch-size-10.svg [ 73930 ]
Attachment		10.6-innodb-purge-batch-size-30.svg [ 73931 ]

Marko Mäkelä made changes - 2024-08-12 13:49

Summary

Performance regression oltp_update_index on small table, purge

Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size

Marko Mäkelä made changes - 2024-08-12 14:04

Assignee	Marko Mäkelä [ marko ]	Debarun Banerjee [ JIRAUSER54513 ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Jira Automation (IT) made changes - 2024-08-19 09:11

Zendesk Related Tickets		204001
Zendesk active tickets		204001

Debarun Banerjee made changes - 2024-08-22 13:29

Assignee	Debarun Banerjee [ JIRAUSER54513 ]	Marko Mäkelä [ marko ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Axel Schwenke made changes - 2024-08-23 09:02

Link

This issue relates to TODO-4832 [ TODO-4832 ]

Marko Mäkelä made changes - 2024-08-23 14:39

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Marko Mäkelä made changes - 2024-08-23 14:39

Assignee	Marko Mäkelä [ marko ]	Debarun Banerjee [ JIRAUSER54513 ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Debarun Banerjee made changes - 2024-08-26 07:21

Assignee	Debarun Banerjee [ JIRAUSER54513 ]	Marko Mäkelä [ marko ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Marko Mäkelä made changes - 2024-08-26 10:53

issue.field.resolutiondate

2024-08-26 10:53:11.0

2024-08-26 10:53:11.052

Marko Mäkelä made changes - 2024-08-26 10:53

Fix Version/s		10.6.20 [ 29903 ]
Fix Version/s		10.11.10 [ 29904 ]
Fix Version/s		11.2.6 [ 29906 ]
Fix Version/s		11.4.4 [ 29907 ]
Fix Version/s		11.6.2 [ 29908 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Marko Mäkelä made changes - 2024-08-26 13:25

Link

This issue blocks MDEV-34431 [ MDEV-34431 ]

Marko Mäkelä made changes - 2024-08-27 07:30

Link

This issue relates to MDEV-34821 [ MDEV-34821 ]

Jira Automation (IT) made changes - 2024-08-29 18:03

Zendesk active tickets

204001

Marko Mäkelä made changes - 2024-10-01 11:18

Link

This issue relates to ~~MDEV-35053~~ [ ~~MDEV-35053~~ ]

Marko Mäkelä made changes - 2024-10-16 11:32

Link

This issue causes ~~MDEV-35174~~ [ ~~MDEV-35174~~ ]

Marko Mäkelä made changes - 2024-10-22 04:19

Link

This issue blocks MENT-2148 [ MENT-2148 ]

Ralf Gebhardt made changes - 2024-10-31 10:34

Link

This issue is part of TODO-4980 [ TODO-4980 ]

Marko Mäkelä made changes - 2024-11-26 15:09

Link

This issue causes ~~MDEV-35508~~ [ ~~MDEV-35508~~ ]

Marko Mäkelä made changes - 2024-12-12 10:51

Link

This issue causes ~~MDEV-35619~~ [ ~~MDEV-35619~~ ]
