[MDEV-34462] SemiSync replication underperforming and stalling throughput

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.6.17
Fix Version/s: N/A
Component/s: Replication
Labels: None
Environment: Debian Bullseye

Description

We upgraded some MariaDB servers from 10.1.48 to 10.6.17 and noticed that, with semi-sync replication enabled, throughput stalled and context switching was heavily affected, leading to poor performance across the board.

By looking at `information_schema.processlist`, we found that most of the queries were stuck for several seconds in the `Waiting for semi-sync ACK from slave` state.
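
A minimal sketch of the kind of check described above, assuming it is run on the primary with access to `information_schema.processlist`. The exact query used is not given in this report, and the column aliases are illustrative:

```sql
-- Count sessions currently waiting on a semi-sync ACK and show the longest wait.
-- TIME is the number of seconds the session has spent in its current state.
SELECT state,
       COUNT(*)  AS waiting_sessions,      -- illustrative alias
       MAX(time) AS longest_wait_seconds   -- illustrative alias
FROM information_schema.processlist
WHERE state LIKE 'Waiting for semi-sync ACK%'
GROUP BY state;
```

Sampling this a few times during peak load gives a rough picture of how many sessions are blocked on the ACK at once.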

After disabling semisync replication on the primary server, everything came back to life.
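
For reference, semi-sync can be switched off at runtime on the primary through the `rpl_semi_sync_master_enabled` system variable. The statements below are a sketch of that step under that assumption. The exact commands run are not stated in this report:

```sql
-- Inspect the current semi-sync settings on the primary (wait point, timeout, etc.).
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master%';

-- Disable semi-sync on the primary; replication falls back to asynchronous.
SET GLOBAL rpl_semi_sync_master_enabled = OFF;

-- Confirm the change took effect.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status';
```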

This is a high-throughput, high-QPS environment (80K QPS on average, 140K at peak, more than 2,500 sessions at any given time).

We've found a MySQL bug, reported several years ago, that looks related. Although the relevant patch can't be found at the equivalent position in the MariaDB code, I'm not confident that it hasn't been ported in some other way over the years.

The only relevant change I've found in the upgrade docs between 10.1 and 10.6 is this one, but it doesn't look guilty by itself.

Given that this caused serious performance degradation in our case after upgrading to 10.6, please let us know if there is anything more we can do to help identify the root cause of the issue.

Issue Links

is duplicated by: MDEV-33551 Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency (Closed)

People

Assignee: Kristian Nielsen
Reporter: Kostis Fardelas
Votes: 0
Watchers: 5

Dates

Created: 2024-06-26 12:17
Updated: 2024-08-13 07:20
Resolved: 2024-08-13 07:20
