  1. MariaDB Foundation Development
  2. MDBF-358

buildbot: raise memlock limits so uring is actually tested in containers

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: N/A
    • Fix Version/s: N/A
    • Component/s: Buildbot
    • Labels: None

    Description

      from worker p9-db-bbw1-docker-debian-sid(https://ci.mariadb.org/23085/logs/ppc64le-debian-sid/mysqld.1.err.3)

      2022-03-09  4:12:07 0 [Warning] mariadbd: io_uring_queue_init() failed with errno 1
      2022-03-09  4:12:07 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
      

      errno 1 is EPERM. Other workers report ENOSYS, indicating an unsupported kernel.

      Without uring enabled on a variety of architectures, InnoDB's default mode of writing isn't sufficiently tested.
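
To make the errno reading above easy to reproduce, here is a minimal Python sketch that turns a numeric errno from a log line into its symbolic name. The helper name `describe_errno` is hypothetical and only for illustration; `errno.errorcode` and `os.strerror` are standard library.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno such as the 1 in the log above."""
    # errno.errorcode maps numbers to symbolic names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(1))             # the value seen on the ppc64le worker
    print(describe_errno(errno.ENOSYS))  # the value reported by other workers
```

The message text after the name depends on the platform's C library, so only the symbolic name should be relied on.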

          Activity

            danblack Daniel Black created issue -

            faust Faustin Lammler added a comment -

            What would be needed to enable it (just install liburing?)
            danblack Daniel Black added a comment -

            Nope, it needs a seccomp filter that allows the syscalls (https://github.com/moby/moby/issues/39396)

            And maybe raising the MEMLOCK limit. If that is needed, an InnoDB error message in the logs will say so, provided the liburing version is recent enough (2+, not 0.7 anyway). This is only needed on kernels < 5.12 (https://github.com/axboe/liburing/issues/246). I think a 1M limit is sufficient if the default is lower.

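
The rule of thumb in the comment above (kernel < 5.12 and a default limit below 1M) can be sketched as a small check that could run inside a container. This is an illustrative sketch only: the function names are hypothetical, and the 1 MiB threshold is simply the value suggested in the comment.

```python
import platform
import re
import resource
from typing import Tuple

ONE_MIB = 1024 * 1024  # the "1M" suggested in the comment, in bytes


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.10.0-12-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_memlock_raise(release: str, soft_limit: int, wanted: int = ONE_MIB) -> bool:
    """True if the kernel is older than 5.12 and the memlock soft limit is below `wanted`."""
    if kernel_version(release) >= (5, 12):
        return False  # per the comment, the limit only matters on kernels < 5.12
    if soft_limit == resource.RLIM_INFINITY:
        return False  # unlimited
    return soft_limit < wanted


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(needs_memlock_raise(platform.release(), soft))
```

`resource.getrlimit(resource.RLIMIT_MEMLOCK)` reports the limit in bytes, so the comparison against 1 MiB needs no unit conversion.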
            danblack Daniel Black logged work - 2022-03-22 22:54
            • Time Spent:
              0.5h
               
              research for guidance
            danblack Daniel Black made changes -
            Field Original Value New Value
            Worklog Id 96204 [ 96204 ]
            Remaining Estimate 0d [ 0 ]
            Time Spent 0.5h [ 1800 ]
            faust Faustin Lammler added a comment - - edited

            Ok, I see. I am wondering whether we are hitting the limits of using containers in BB and whether this is a job for libvirt runners (vladbogo, can I have your opinion on this?).

            Anyway, from what I understand of https://github.com/moby/moby/issues/39396, we could enable uring directly in the docker daemon by modifying https://gitlab.com/mariadb/sysadmin/-/blob/master/ansible/roles/bb_worker_docker/templates/docker_override.conf.j2

            danblack, do you feel like proposing something directly there? I am really not sure how the default seccomp filter should be modified. I am then happy to deploy/test on some runners. I can also make sure this is deployed only on <5.12 kernels (should be doable with jinja2 and Ansible facts).

            vladbogo Vlad Bogolin added a comment -

            Hi. Does this happen on other hosts? If yes, then it might indeed be a limitation. Otherwise, I would vote to apply the solution from the mentioned issue only for this host, but allowing all syscalls (as proposed in the issue) may be too extreme.

            danblack Daniel Black added a comment -

            From https://github.com/moby/moby/commit/f4d41f1dfa52caa8f12b070315e230e7eded5f4a it looks like docker has these enabled by default. Are we actually running a version that is covered by the tags at the top of that commit?

            danblack Daniel Black logged work - 2022-03-24 00:35
            • Time Spent:
              0.5h
               
              <No comment>
            danblack Daniel Black made changes -
            Worklog Id 96235 [ 96235 ]
            Time Spent 0.5h [ 1800 ] 1h [ 3600 ]

            faust Faustin Lammler added a comment -

            I have made an upgrade round on our runners:

            faust@serv ~/MariaDB/ansible main
            .venv ❯ ansible bb_workers_docker -a "docker --version"
            bg-bbw1-x64 | CHANGED | rc=0 >>
            Docker version 20.10.12, build e91ed57
            bg-bbw3-x64 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            bg-bbw4-x64 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            bg-bbw2-x64 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            ci-bbw1-arm | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            db-p9-bbw1 | CHANGED | rc=0 >>
            Docker version v20.10.13, build a224086
            ci-bbw4-arm | CHANGED | rc=0 >>
            Docker version 20.10.12, build e91ed57
            ci-bbw3-arm | CHANGED | rc=0 >>
            Docker version 20.10.12, build e91ed57
            ci-bbw2-arm | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            us-intel-bbw1-x64 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            fi-bbw-p9rhel7 | CHANGED | rc=0 >>
            Docker version 18.03.1-ce, build ccde200
            fi-bbw-p9rhel8 | CHANGED | rc=0 >>
            Docker version v20.10.12, build e91ed57
            hz-bbw2 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            hz-bbw1 | CHANGED | rc=0 >>
            Docker version 20.10.14, build a224086
            ibm-s390x-sles15 | CHANGED | rc=0 >>
            Docker version 20.10.9-ce, build 79ea9d308018
            ibm-s390x-ubuntu20.04 | CHANGED | rc=0 >>
            Docker version 20.10.12, build e91ed57
            ibm-s390x-rhel8 | CHANGED | rc=0 >>
            Docker version 20.10.12, build e91ed57

            danblack, it would be good if you could confirm whether uring is enabled on those runners, because something else may be needed.
            danblack Daniel Black added a comment -

            bg-bbw2-docker-fedora-35 https://ci.mariadb.org/23511/logs/aarch64-fedora-35/mysqld.1.err.1

            2022-03-23 11:17:57 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd
            2022-03-23 11:17:57 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
            

            https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings (json file example on this page)

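
As an illustration of the daemon-level option referenced above, the following Python sketch builds a `default-ulimits` fragment in the shape used by the Docker daemon configuration file. It is a sketch under assumptions, not the configuration that was deployed: the 1 MiB value comes from the earlier comment, the function name is hypothetical, and the limit is assumed to be expressed in bytes.

```python
import json


def daemon_config_with_memlock(limit_bytes: int = 1024 * 1024) -> dict:
    """Build a daemon.json fragment setting a default memlock ulimit for containers.

    Assumption: soft and hard limits are set to the same value, given in bytes.
    """
    return {
        "default-ulimits": {
            "memlock": {
                "Name": "memlock",
                "Soft": limit_bytes,
                "Hard": limit_bytes,
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment; merging it into an existing daemon.json is left to the operator.
    print(json.dumps(daemon_config_with_memlock(), indent=2))
```

Generating the fragment rather than hand-writing it keeps the soft and hard values in sync and makes it easy to template per host.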

            faust Faustin Lammler added a comment -

            danblack, vladbogo do we have a way to query the BB DB or API (or something else) in order to quickly and programmatically spot this kind of problem?
            danblack Daniel Black added a comment -

            Not really. It was quite painful to go through the logs and find this one warning (I'm not sure I found any successes).

            The best I can think of at the moment is to take a test from https://github.com/axboe/liburing/tree/master/test (I suggest read-write.c), compile it against the worker's liburing (so don't pull the latest feature tests), install it on the BB workers, and run it as a pre-step in worker jobs. Like MDBF-386, it can be a worker information-finding stage.

            I don't think properties are a cross-reference searchable item, but it should be possible to extend it that way.

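
Short of the pre-step test suggested above, the warnings quoted in this issue could at least be found mechanically once a log has been downloaded. The sketch below scans log lines for the two message shapes that appear in this ticket. The function name and regular expression are illustrative assumptions, and other server versions may word the warning differently.

```python
import errno
import re
from typing import Iterable, List, Tuple

# Matches both shapes seen in this issue: "failed with errno 1" and "failed with ENOMEM".
FAIL_RE = re.compile(r"io_uring_queue_init\(\) failed with (?:errno (\d+)|(E[A-Z]+))")


def find_uring_failures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, errno name) for every io_uring initialisation failure."""
    results = []
    for lineno, line in enumerate(lines, start=1):
        match = FAIL_RE.search(line)
        if not match:
            continue
        if match.group(1):
            code = int(match.group(1))
            name = errno.errorcode.get(code, f"errno {code}")
        else:
            name = match.group(2)
        results.append((lineno, name))
    return results


sample = [
    "2022-03-09  4:12:07 0 [Warning] mariadbd: io_uring_queue_init() failed with errno 1",
    "2022-03-09  4:12:07 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF",
    "2022-03-23 11:17:57 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd",
]

if __name__ == "__main__":
    print(find_uring_failures(sample))  # [(1, 'EPERM'), (3, 'ENOMEM')]
```

A scan like this only reports failures that were logged; it cannot confirm that uring was actually used, which is why a dedicated pre-step test remains the more reliable check.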

            faust Faustin Lammler added a comment -

            FYI (and as already discussed) liburing-devel is not available in rhel9, see also:
            https://github.com/MariaDB/mariadb.org-tools/pull/160
            danblack Daniel Black added a comment -

            Please update the memlock limits. I spent a long time working out why MDEV-29610 isn't affecting bb.org builders.

            We're also delivering significant, complicated functionality (liburing in 10.6+) to users in packages while it remains untested.

            Just 1M of locked memory as the default memlock limit per container.

            This is needed on all hosts running a kernel < 5.12.

            danblack Daniel Black made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            danblack Daniel Black made changes -
            Component/s Buildbot [ 18503 ]
            danblack Daniel Black made changes -
            Summary: "buildbot: enable uring in containers" changed to "buildbot: raise memlock limits so uring is actually tested in containers"
            danblack Daniel Black made changes -
            faust Faustin Lammler made changes -
            Assignee Faustin Lammler [ faust ] Vlad Bogolin [ vladbogo ]

            faust Faustin Lammler added a comment -

            Assigned to vladbogo since it's better to be managed at BB level. Re-assign to me if we want to manage this at docker daemon level.
            danblack Daniel Black added a comment -

            Thanks vladbogo, already showing useful results in MDEV-29610.

            vladbogo Vlad Bogolin made changes -
            Fix Version/s N/A [ 27305 ]
            Affects Version/s N/A [ 27305 ]
            Original Estimate 0d [ 0 ]
            vladbogo Vlad Bogolin made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            vladbogo Vlad Bogolin made changes -
            Worklog Id 103161 [ 103161 ]
            Time Spent 1h [ 3600 ] 3h [ 10800 ]
            vladbogo Vlad Bogolin made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            julien.fritsch Julien Fritsch made changes -
            Workflow MariaDB v4 [ 163668 ] MariaDB Foundation v1 [ 188586 ]

            People

              vladbogo Vlad Bogolin
              danblack Daniel Black

                Time Tracking

                  Estimated: 0d
                  Remaining: 0d
                  Logged: 3h