[MDEV-16548] Innodb fails to start on older kernels that don't support F_DUPFD_CLOEXEC Created: 2018-06-21  Updated: 2019-05-24  Resolved: 2018-11-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.34, 10.2.16, 10.3.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Nicholas Jackson Assignee: Marko Mäkelä
Resolution: Won't Fix Votes: 2
Labels: None
Environment:

Linux kernels older than 2.6.24


Issue Links:
Problem/Incident
is caused by MDEV-8743 galera in SST keeps innodb files open Closed

 Description   

After 10.1.34 was released, we have received several reports of systems using the community repositories that are unable to start InnoDB, failing with the error:

InnoDB: Error: unable to create temporary file; errno: 22

An strace shows:

13517 1529471977.932867 fcntl(6, F_DUPFD_CLOEXEC, 0) = -1 EINVAL (Invalid argument) <0.000004>
13517 1529471977.932897 write(2, "2018-06-20  2:19:37 47857084847680 [ERROR] mysqld: Out of resources when opening file 'ib*' (Errcode: 22 \"Invalid argument\")\n", 125) = 125 <0.000008>
13517 1529471977.932931 close(6)              = 0 <0.000010>
13517 1529471977.932955 write(2, "2018-06-20 02:19:37 2b86988ad640", 32) = 32 <0.000006>
13517 1529471977.932978 write(2, "  InnoDB: Error: unable to create temporary file; errno: 22\n", 60) = 60 <0.000006>

Seems that this was introduced in https://github.com/MariaDB/server/commit/bbee025370

Likely need a runtime check added to ensure F_DUPFD_CLOEXEC is supported by the kernel.

Haven't checked what other versions may be affected, but there are likely others.



 Comments   
Comment by Sergei Golubchik [ 2018-06-22 ]

What Linux distribution is it?

Comment by Jesse Asklund [ 2018-06-24 ]

This is being seen on CentOS distributions. We've primarily seen it on Virtuozzo environments where the host node is running an older kernel even though the tenants are using modern OS versions.

Comment by Maurice Bizzarri [ 2018-06-25 ]

This happened to me on my CENTOS 6.6 virtuozzo VPS I rent from GoDaddy. A rollback to 10.1.33 fixed it.
Yum downgrade MariaDB-server and yum downgrade MariaDB-client fixed the issue for me. cPanel did an automatic minor version upgrade so I awoke to multiple error messages on my phone.

Comment by Ryan Pendleton [ 2018-06-27 ]

This is also happening to me on a GoDaddy CentOS 6.6 VPS with cPanel. I'm seeing the issue with 10.1.34 and 10.2.16, both of which were automatically upgraded by cPanel.

Comment by Matthew Swift [ 2018-06-27 ]

Also killed a GoDaddy CentOS 6.9 VPS.

Problem started with cPanel auto update from 10.1.33 to 10.1.34. A manual update to MariaDB 10.2 (i.e., 10.2.15) via WHM interface resolved the problem. I didn't understand why. Two days later cPanel auto-updates to 10.2.16 and everything is down again.

workaround: turn off automatic operating system package updates in WHM (Server Configuration -> Update Preferences) and:

% yum downgrade MariaDB-server-10.2.15 MariaDB-client-10.2.15 MariaDB-common-10.2.15 MariaDB-compat-10.2.15 MariaDB-devel-10.2.15 MariaDB-shared-10.2.15

FYI

% rpm --query centos-release
centos-release-6-9.el6.12.3.x86_64

% uname -r
2.6.18-028stab118.1
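
A release string like the one above can be compared against 2.6.24, the first kernel that supports F_DUPFD_CLOEXEC according to fcntl(2). The following is only a sketch: the helper names release_at_least and running_kernel_at_least are hypothetical and not part of MariaDB, and a version comparison is a heuristic at best, because vendor kernels may backport features.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper (not MariaDB code): compare a kernel release string
   such as "2.6.18-028stab118.1" with major.minor.patch.
   Returns 1 if the release is at least that version, 0 if it is older,
   and -1 if the string cannot be parsed. */
int release_at_least(const char *release, int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    /* Accept "X.Y" as well as "X.Y.Z..."; a missing patch level counts as 0. */
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2) {
        return -1;
    }
    if (a != major) {
        return a > major;
    }
    if (b != minor) {
        return b > minor;
    }
    return c >= patch;
}

/* Hypothetical helper: apply the comparison to the running kernel,
   using the same release string that `uname -r` prints. */
int running_kernel_at_least(int major, int minor, int patch)
{
    struct utsname u;

    if (uname(&u) != 0) {
        return -1;
    }
    return release_at_least(u.release, major, minor, patch);
}
```

For the kernel reported above, release_at_least("2.6.18-028stab118.1", 2, 6, 24) returns 0, which matches the EINVAL seen in the strace. Probing the fcntl call itself is more reliable than comparing version numbers.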

Comment by Daniel Black [ 2018-08-07 ]

When I added it I didn't suspect anyone was running such an old kernel. F_DUPFD_CLOEXEC was added in Linux 2.6.24 (http://man7.org/linux/man-pages/man2/fcntl.2.html).

Alternate fix: https://github.com/MariaDB/server/pull/762

Comment by Jan Lindström (Inactive) [ 2018-08-07 ]

https://github.com/MariaDB/server/commit/517009ca0fa5f0e5b48b3d244a0b5bb0c44e90a8

Comment by Daniel Black [ 2018-08-08 ]

jplindst I don't think that helps. It's not a runtime check.

RHEL/CentOS 6 by default use Linux version 2.6.32-220.17.1.el6.x86_64 (from the output of the BB kernel log). GoDaddy in their silliness have kept a RHEL/CentOS 5 based 2.6.18 kernel even though the OS is CentOS 6. Even with your fix, the generated packages will have HAVE_IB_F_DUPFD_CLOEXEC=1 at build time, and will fail when the GoDaddy users install them.

Perhaps a runtime check like:

#if defined(F_DUPFD_CLOEXEC)
	static bool dupfd = true;
	if (dupfd) {
		fd2 = fcntl(fd, F_DUPFD_CLOEXEC, 0);
		if (fd2 < 0) {
			// some warning....
			dupfd = false;
			fd2 = dup(fd);
		}
	}
	else
#endif
		fd2 = dup(fd);

CPU branch prediction will minimise the performance impact, and the PR above can be left for a development branch to remove the dup/fcntl syscall entirely.

Note to users:

  • Kernel 2.6.18 isn't tested by MariaDB, and perhaps not by many upstream software communities either, because it's not the CentOS 6 kernel. Mileage on 2.6.18 may vary, as this bug indicates.
  • If you were sold a CentOS 6 service, you aren't getting it. Seek an improved service or service provider. On raw specification alone, Linode offers a cheaper/better service than GoDaddy VPS.
  • 2.6.18 is a kernel not supported by Red Hat or the Linux kernel community. It is susceptible to Spectre/Meltdown and perhaps a number of other vulnerabilities.

Product offering question:
Down to what kernel version is the userspace of a given offering supported? This is quite important for the Docker offering of MariaDB (https://hub.docker.com/_/mariadb/), because it ships Ubuntu LTS userspace packages of MariaDB that could be run on top of any distro's kernel version.

Comment by Daniel Black [ 2018-08-22 ]

jplindst, why was it merged? It doesn't help packages because they are built on a 2.6.32 kernel. You've only helped people who want to compile it themselves on an antiquated kernel.

Comment by Jan Lindström (Inactive) [ 2018-11-07 ]

In the above commit I added a CHECK_C_SOURCE check that uses fcntl(fd, F_DUPFD_CLOEXEC, 0); and we define HAVE_IB_F_DUPFD_CLOEXEC only if the call succeeds. In InnoDB we then use #if defined(F_DUPFD_CLOEXEC) && defined(HAVE_IB_F_DUPFD_CLOEXEC). If that is not enough, I do not know what to do.

Comment by Marko Mäkelä [ 2018-11-07 ]

danblack, as far as I can tell, nothing was merged yet. The compile-time check that you added in MDEV-8743 is still there:

#ifdef F_DUPFD_CLOEXEC
	int fd2 = fcntl(fd, F_DUPFD_CLOEXEC, 0);
#else
	int fd2 = dup(fd);
#endif

I agree with you that replacing the above compile-time check with a little more sophisticated compile-time check is not helping anyone who uses binary packages that were compiled on a newer kernel.

Theoretically, we could add a run-time check (an if before the #else), but then we would have to test that on an antiquated kernel, preferably in our continuous integration system.
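
As a rough illustration of such a run-time check, the sketch below tries F_DUPFD_CLOEXEC first and falls back to dup() only when the running kernel rejects the command with EINVAL, the errno seen in the strace in the description. The function name dup_cloexec is hypothetical and this is not the code in MariaDB. Unlike the snippets in this ticket, the fallback also sets FD_CLOEXEC afterwards, which is an added assumption and is not atomic with the dup().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not MariaDB code): duplicate fd with close-on-exec
   set, even if the binary was built with headers that define
   F_DUPFD_CLOEXEC but runs on a kernel that does not implement it.
   Returns the new descriptor, or -1 with errno set. */
int dup_cloexec(int fd)
{
    int fd2;
    int flags;
    int saved_errno;

#ifdef F_DUPFD_CLOEXEC
    fd2 = fcntl(fd, F_DUPFD_CLOEXEC, 0);
    /* Success, or a genuine failure such as EBADF or EMFILE: report it.
       Only EINVAL is treated as "command unknown to this kernel". */
    if (fd2 >= 0 || errno != EINVAL) {
        return fd2;
    }
#endif
    /* Fallback path: plain dup(), then set FD_CLOEXEC in a second step. */
    fd2 = dup(fd);
    if (fd2 < 0) {
        return -1;
    }
    flags = fcntl(fd2, F_GETFD);
    if (flags < 0 || fcntl(fd2, F_SETFD, flags | FD_CLOEXEC) < 0) {
        saved_errno = errno;
        close(fd2);
        errno = saved_errno;
        return -1;
    }
    return fd2;
}
```

On a kernel that supports the command, only the first fcntl() call is made. Exercising the fallback branch would still need a kernel older than 2.6.24, which is the testing concern raised above.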

In the end, it boils down to the MariaDB platform roadmap. https://mariadb.org/about/maintenance-policy/ does not say anything about underlying operating systems. I believe that we try to support all supported versions on all ‘relevant’ operating system platforms. According to https://wiki.centos.org/HowTos/EOL the support of CentOS 5 ended on March 31, 2017, more than 11 months before MDEV-8743 was pushed to the MariaDB code base. This would seem to confirm that nobody should be using a CentOS 5 kernel in production.

Comment by Daniel Black [ 2018-11-07 ]

I agree marko, adding a testing framework for unsupported kernels that happen to map to the policies of a single VPS provider seems a little extreme. The source compile could work (if they use 2.6.18 headers) for these users, but that may be beyond what I see as the normal demographic for GoDaddy users.

MDEV-16548 when/if it gets on the roadmap will incidentally fix this for those users by removing the code.

Comment by Marko Mäkelä [ 2018-11-08 ]

The oldest GNU/Linux distributions that MariaDB Corporation supports are as follows:

distribution       oldest Linux kernel version   newest version
RHEL 6/CentOS 6    2.6.32                        2.6.32
Ubuntu 12.04       3.8                           ?
Debian 8           3.16.0?                       3.16.0-4
SLES 11            2.6.27.19-5.1                 3.0.101-108.77.1

We also support ‘Generic Linux’, but I would expect that to refer to a reasonably recent kernel. Linux kernel 2.6.32 was already released on December 3, 2009.

The out-of-support Linux kernel 2.6.18 was released on September 20, 2006. Because it does not make sense to add a run-time check for an out-of-support Linux kernel that is more than 12 years old, I am closing this ticket as ‘Won’t Fix’.

Comment by Sergey Vojtovich [ 2019-05-24 ]

Fixed in 10.5.

Generated at Thu Feb 08 08:29:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.