[MDEV-24756] 100% cpu load when idle on arm v7 - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.5.8
Fix Version/s: N/A
Component/s: Server
Labels:
- regression
Environment:
Fresh alpine linux 3.13 container running on an arm v7 debian 10.7 host

Description

Issue has also been filed in alpine packages

The following issue appeared with mariadb 10.5.8 (there is no issue with alpine 3.12 and mariadb 10.4.x):

On arm v7 (32-bit), mysqld consumes about 100% CPU when idle, but still works.
On arm v8 (64-bit) or amd64, it consumes near 0% CPU when idle, as expected.

I see only one difference in start logs (but that may be irrelevant):

arm v7 : "Using generic crc32 instructions"
arm v8 : "Using ARMv8 crc32 instructions"

Any idea why this happens and if there is an option in mysqld to circumvent the issue?

Steps to reproduce:

~ # apk --update --upgrade add mariadb sudo

~ # mkdir /run/mysqld

~ # chown mysql:mysql /run/mysqld

~ # sudo -su mysql

~ $ mysql_install_db

~ $ mysqld --datadir=./data
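
To put a number on "about 100% cpu when idle", the server's CPU time can be sampled from /proc before and after a quiet interval. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes a Linux /proc filesystem and that `getconf` is available in the container.

```shell
#!/bin/sh
# Hypothetical helpers for measuring how busy an idle process is.

# cpu_ticks PID: print utime + stime of the process in clock ticks.
cpu_ticks() {
    # Drop everything up to the last ") " so a process name containing
    # spaces cannot shift the fields. After that, utime and stime
    # (fields 14 and 15 of /proc/PID/stat) are fields 12 and 13.
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# cpu_percent BEFORE AFTER SECONDS HZ: percentage of one CPU used.
cpu_percent() {
    echo $(( ($2 - $1) * 100 / ($3 * $4) ))
}

# sample_idle_cpu PID [SECONDS]: sample the process over an interval
# (default 10 seconds) and print its CPU usage in percent.
sample_idle_cpu() {
    pid=$1
    secs=${2:-10}
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks "$pid")
    sleep "$secs"
    after=$(cpu_ticks "$pid")
    cpu_percent "$before" "$after" "$secs" "$hz"
}
```

With the server started as above and no clients connected, `sample_idle_cpu "$(pidof mysqld)"` should print a value near 0 on an unaffected system and near 100 where the problem occurs.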

Attachments

- mariadb_sql_commands.txt (76 kB, 2021-02-03 10:33)
- perf.data (634 kB, 2021-03-02 18:11)
- perf.data (46 kB, 2021-02-03 13:03)
- perf.report (7 kB, 2021-03-02 18:11)

Activity

(4 older comments not shown)

Daniel Black added a comment - 2021-02-03 21:29

perf.data is rather raw; it needs the ELF binary to resolve symbols. Can you try `perf report -g --no-children -i ~/perf.data`? Alternatively, `objdump -d /usr/bin/mysqld` should give addresses that we can map to the perf data.

Thanks for trying strace; it's looking very much like an endless loop. A gdb backtrace of the running process would still be useful if you have it compiled with debug symbols (cflag=-g).
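
For anyone wanting to supply the gdb output, a non-interactive backtrace of every thread can be captured roughly as follows. This is a sketch only: the function name and the `GDB` override variable are hypothetical, while `-p`, `-batch`, `-ex` and `thread apply all bt` are standard gdb usage.

```shell
#!/bin/sh
# dump_backtraces PID: attach to a running process, print a backtrace of
# every thread, then detach. GDB can be overridden to use another binary.
dump_backtraces() {
    "${GDB:-gdb}" -p "$1" -batch -ex 'thread apply all bt'
}
```

Running `dump_backtraces "$(pidof mysqld)"` a few times, some seconds apart, and comparing the output may show which thread keeps returning to the same frames, which is the likely location of the loop.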

Josh Richards added a comment - 2021-03-02 18:11

Not the OP, but since it looks like further data is being waited on, I thought I'd try to provide it, as I ran across this issue last night as well.

perf.data and perf.report output from my setup are attached.

Andre R added a comment - 2021-03-02 19:27

(cross-posted in the alpine issue)
I just discovered that this issue seems to be fixed when using the edge repository in alpine 3.13.
A few packages are upgraded in edge, among which:

- mariadb (and co) from 10.5.8 to 10.5.9
- musl from 1.2.2-r0 to 1.2.2-r2

I suspect (but can't be sure) that musl was the culprit, as mariadb 10.5.8 used to run fine in a debian/buster container instead of alpine.

Daniel Black added a comment - 2021-06-26 05:38 - edited

Thanks for getting back to us, andre_r. It appears from your upstream bug that both musl and mariadb have been updated on alpine 3.13/armv7.

At the moment it's a bit hard to determine the cause.

We'd need musl (MDBF-244) and armv7/armhf/Arm Cortex support in CI, and maybe even Alpine-based testing (MDEV-18462), to find this kind of issue prerelease.

Suggestions are welcome on how to obtain armv7 hardware or cloud infrastructure that can be integrated into a CI infrastructure as a runner (our own container-based setup preferred, not a new dedicated CI framework as drone.io appears to be).
Edit: found https://www.worksonarm.com/

Marko Mäkelä added a comment - 2021-09-09 07:06

andre_r, MariaDB 10.5.9 also included a fix for MDEV-24270, where we replaced io_getevents() with a thinner wrapper of the system call to avoid 2 unnecessary wakeups per second. This basically works around a regression that was introduced by an ‘optimization’ in https://pagure.io/libaio/c/7cede5af5adf01ad26155061cc476aad0804d3fc several years ago. That ‘optimization’ would cause a race condition on shutdown in user space, leading to SIGSEGV or similar. But I would find it hard to believe that the unnecessary wakeups would keep one CPU 100% busy.

It would be interesting to try MariaDB 10.5.8 with the newer musl libc and 10.5.9 with the older libc, to narrow down the culprit.
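
The proposed experiment amounts to a 2x2 matrix, of which only two cells have a reported outcome in this thread. The sketch below just prints that matrix so the missing combinations are explicit. The function name is hypothetical and the script does not install or run anything.

```shell
#!/bin/sh
# Print the MariaDB/musl combinations discussed in this issue.
# Outcomes are taken from the comments above; the rest are "untested".
print_matrix() {
    for mariadb in 10.5.8 10.5.9; do
        for musl in 1.2.2-r0 1.2.2-r2; do
            case "$mariadb/$musl" in
                10.5.8/1.2.2-r0) result="busy (reported)" ;;
                10.5.9/1.2.2-r2) result="idle (reported)" ;;
                *)               result="untested" ;;
            esac
            printf 'mariadb=%s musl=%s idle-cpu=%s\n' "$mariadb" "$musl" "$result"
        done
    done
}
```

Calling `print_matrix` lists all four combinations. If 10.5.8 with musl 1.2.2-r2 turned out to be idle, that would point towards musl. If 10.5.9 with musl 1.2.2-r0 turned out to be idle, that would point towards a change in MariaDB.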
