[MDBF-358] buildbot: raise memlock limits so uring is actually tested in containers Created: 2022-03-09  Updated: 2022-09-26  Resolved: 2022-09-26

Status: Closed
Project: MariaDB Foundation Development
Component/s: Buildbot
Affects Version/s: N/A
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Daniel Black Assignee: Vlad Bogolin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0d
Time Spent: 3h
Original Estimate: 0d

Issue Links:
Relates
relates to MDEV-29610 uring tmpfs InnoDB: IO Error: 125 dur... Stalled

 Description   

From worker p9-db-bbw1-docker-debian-sid (https://ci.mariadb.org/23085/logs/ppc64le-debian-sid/mysqld.1.err.3):

2022-03-09  4:12:07 0 [Warning] mariadbd: io_uring_queue_init() failed with errno 1
2022-03-09  4:12:07 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF

So errno 1 is EPERM. Other workers report ENOSYS, indicating an unsupported kernel.
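For reference, a numeric errno from the log can be mapped back to its symbolic name with Python's standard library. This is only a minimal sketch; the helper name `describe_errno` is made up for illustration, and the numeric values are those of the platform the script runs on.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno (hypothetical helper)."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # errno 1 is what the p9 worker logged; ENOSYS is what other workers report.
    print(describe_errno(1))
    print(describe_errno(errno.ENOSYS))
```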

Without uring enabled on a variety of architectures, there isn't sufficient testing of InnoDB's default mode of writing.



 Comments   
Comment by Faustin Lammler [ 2022-03-22 ]

What would be needed to enable it? Just installing liburing?

Comment by Daniel Black [ 2022-03-22 ]

Nope, it needs a seccomp filter that allows the syscalls (https://github.com/moby/moby/issues/39396).

And maybe raising the MEMLOCK limit. If that is the problem, an InnoDB error message in the logs will say so, provided there's a newer kernel liburing version (2+, not 0.7 anyway). This is only needed on kernels < 5.12 (https://github.com/axboe/liburing/issues/246). I think a 1M limit is sufficient if the default is lower.
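The rule of thumb above (kernel older than 5.12 and a memlock limit below 1M) could be checked inside a container with something like the sketch below. The function names are hypothetical, and the check only looks at the soft RLIMIT_MEMLOCK of the current process; it does not prove that io_uring itself works.

```python
import platform
import re
import resource
from typing import Tuple

ONE_MIB = 1024 * 1024  # the 1M limit suggested in this comment


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.4.0-100-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def memlock_too_low(release: str, soft_limit: int, wanted: int = ONE_MIB) -> bool:
    """True if this kernel needs locked memory for uring and the limit is below `wanted`."""
    if kernel_major_minor(release) >= (5, 12):
        return False  # per the linked liburing issue, not needed on 5.12+
    if soft_limit == resource.RLIM_INFINITY:
        return False  # unlimited is always enough
    return soft_limit < wanted


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("memlock too low for uring:", memlock_too_low(platform.release(), soft))
```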

Comment by Faustin Lammler [ 2022-03-23 ]

Ok, I see. I am wondering whether we are hitting the limits of using containers in BB and whether this is a job for libvirt runners (vladbogo, can I have your opinion on this?).

Anyway, from what I understand of https://github.com/moby/moby/issues/39396, we could enable uring directly in the docker daemon by modifying https://gitlab.com/mariadb/sysadmin/-/blob/master/ansible/roles/bb_worker_docker/templates/docker_override.conf.j2

danblack, would you like to propose something directly there? I am really not sure how the default seccomp filter should be modified. I am then happy to deploy/test on some runners. I can also make sure this is deployed only on <5.12 kernels (should be doable with jinja2 and ansible facter).

Comment by Vlad Bogolin [ 2022-03-23 ]

Hi. Does this happen on other hosts? If yes, then it might indeed be a limitation. Otherwise, I would vote to apply the solution from the mentioned issue only for this host, but maybe allowing all syscalls (as proposed in the issue) is too extreme.

Comment by Daniel Black [ 2022-03-24 ]

From https://github.com/moby/moby/commit/f4d41f1dfa52caa8f12b070315e230e7eded5f4a it looks like docker has these enabled by default. Are we actually running a version that is covered by the tags at the top of that commit?

Comment by Faustin Lammler [ 2022-03-24 ]

I have made an upgrade round on our runners:

faust@serv ~/MariaDB/ansible main
.venv ❯ ansible bb_workers_docker -a "docker --version"
bg-bbw1-x64 | CHANGED | rc=0 >>
Docker version 20.10.12, build e91ed57
bg-bbw3-x64 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
bg-bbw4-x64 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
bg-bbw2-x64 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
ci-bbw1-arm | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
db-p9-bbw1 | CHANGED | rc=0 >>
Docker version v20.10.13, build a224086
ci-bbw4-arm | CHANGED | rc=0 >>
Docker version 20.10.12, build e91ed57
ci-bbw3-arm | CHANGED | rc=0 >>
Docker version 20.10.12, build e91ed57
ci-bbw2-arm | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
us-intel-bbw1-x64 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
fi-bbw-p9rhel7 | CHANGED | rc=0 >>
Docker version 18.03.1-ce, build ccde200
fi-bbw-p9rhel8 | CHANGED | rc=0 >>
Docker version v20.10.12, build e91ed57
hz-bbw2 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
hz-bbw1 | CHANGED | rc=0 >>
Docker version 20.10.14, build a224086
ibm-s390x-sles15 | CHANGED | rc=0 >>
Docker version 20.10.9-ce, build 79ea9d308018
ibm-s390x-ubuntu20.04 | CHANGED | rc=0 >>
Docker version 20.10.12, build e91ed57
ibm-s390x-rhel8 | CHANGED | rc=0 >>
Docker version 20.10.12, build e91ed57

danblack, if you can confirm whether uring is enabled on those runners, that would be good, because maybe something else is needed.
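To compare a listing like the one above against a minimum Docker release, the version strings can be parsed into tuples. This is a rough sketch: the function names are hypothetical, and the `(20, 10, 0)` threshold is only an assumed placeholder; the real minimum has to be read from the tags on the moby commit linked earlier.

```python
import re
from typing import Dict, List, Tuple

# Accepts both "20.10.14" and "v20.10.13"; suffixes such as "-ce" are ignored.
VERSION_RE = re.compile(r"Docker version v?(\d+)\.(\d+)\.(\d+)")


def parse_docker_version(line: str) -> Tuple[int, ...]:
    """Turn 'Docker version 20.10.14, build a224086' into (20, 10, 14)."""
    match = VERSION_RE.search(line)
    if match is None:
        raise ValueError(f"no Docker version in: {line!r}")
    return tuple(int(part) for part in match.groups())


def hosts_older_than(outputs: Dict[str, str], minimum: Tuple[int, ...]) -> List[str]:
    """Return the hosts whose Docker version is below `minimum`."""
    return [host for host, line in outputs.items()
            if parse_docker_version(line) < minimum]


if __name__ == "__main__":
    # Three entries copied from the listing above.
    sample = {
        "db-p9-bbw1": "Docker version v20.10.13, build a224086",
        "fi-bbw-p9rhel7": "Docker version 18.03.1-ce, build ccde200",
        "hz-bbw1": "Docker version 20.10.14, build a224086",
    }
    # (20, 10, 0) is an assumed threshold, not taken from the commit.
    print(hosts_older_than(sample, (20, 10, 0)))
```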

Comment by Daniel Black [ 2022-03-24 ]

bg-bbw2-docker-fedora-35 https://ci.mariadb.org/23511/logs/aarch64-fedora-35/mysqld.1.err.1

2022-03-23 11:17:57 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd
2022-03-23 11:17:57 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF

https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings (json file example on this page)
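Following the JSON example on that page, a daemon-level default for memlock could look roughly like the fragment generated below. This is a sketch under assumptions: the 1M value is the one suggested earlier in this ticket, it assumes Docker takes the memlock value in bytes, and the function name is made up. The fragment would still need merging into the existing daemon configuration on each host.

```python
import json


def memlock_default_ulimit(limit_bytes: int = 1024 * 1024) -> dict:
    """Build a 'default-ulimits' fragment for the Docker daemon JSON config.

    Assumption: the memlock value is expressed in bytes.
    """
    return {
        "default-ulimits": {
            "memlock": {
                "Name": "memlock",
                "Hard": limit_bytes,
                "Soft": limit_bytes,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(memlock_default_ulimit(), indent=2))
```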

Comment by Faustin Lammler [ 2022-04-11 ]

danblack, vladbogo, do we have a way to query the BB DB or API (or something else) in order to quickly and programmatically spot this kind of problem?

Comment by Daniel Black [ 2022-04-11 ]

Not really. It was quite painful to go through and find this one warning (I'm not sure I found any successes).

The best I can think of at the moment is to take a test from https://github.com/axboe/liburing/tree/master/test (I suggest read-write.c), compile it against the worker's liburing (so don't pull the latest feature tests) into the bb workers, and run it as a pre-step in worker jobs. Like MDBF-386, it can be a worker information-finding stage.

I don't think properties are a cross reference searchable item but it should be possible to extend it that way.
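As a cheaper stopgap than the compiled liburing test, the server error logs that are already collected could be scanned for the warning. This is only a sketch of that alternative, not the pre-step proposed above; the regular expression is based on the two warning forms quoted in this ticket, and the function name is hypothetical.

```python
import errno
import re
from typing import Iterable, List

# Matches both forms seen in this ticket:
#   "... failed with errno 1"  and  "... failed with ENOMEM: try larger ..."
FAIL_RE = re.compile(r"io_uring_queue_init\(\) failed with (?:errno (\d+)|([A-Z]+))")


def uring_failures(lines: Iterable[str]) -> List[str]:
    """Return the symbolic reason for every io_uring init failure in `lines`."""
    reasons = []
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue
        if match.group(1) is not None:
            code = int(match.group(1))
            reasons.append(errno.errorcode.get(code, f"errno {code}"))
        else:
            reasons.append(match.group(2))
    return reasons


if __name__ == "__main__":
    sample = [
        "2022-03-09  4:12:07 0 [Warning] mariadbd: io_uring_queue_init() failed with errno 1",
        "2022-03-09  4:12:07 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF",
        "2022-03-23 11:17:57 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit",
    ]
    print(uring_failures(sample))
```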

Comment by Faustin Lammler [ 2022-05-23 ]

FYI (and as already discussed) liburing-devel is not available in rhel9, see also:
https://github.com/MariaDB/mariadb.org-tools/pull/160

Comment by Daniel Black [ 2022-09-23 ]

Please update the memlock limits. Spent a long time working out why MDEV-29610 isn't affecting bb.org builders.

And we're delivering significant functionality (10.6+ liburing) that is complicated and untested, and shipping it packaged to users.

Just 1M of locked memory as the default memlock limit per container.

This is needed on all hosts running a kernel < 5.12.

Comment by Faustin Lammler [ 2022-09-23 ]

Assigned to vladbogo, since this is better managed at the BB level. Re-assign to me if we want to manage this at the docker daemon level.

Comment by Daniel Black [ 2022-09-23 ]

Thanks vladbogo, already showing useful results in MDEV-29610.

Generated at Thu Feb 08 03:37:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.