[MDEV-26212] PAM authentication fails with ENOMEM Created: 2021-07-22  Updated: 2023-02-27  Resolved: 2022-04-26

Status: Closed
Project: MariaDB Server
Component/s: Plugin - pam
Affects Version/s: 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4.25, 10.5.16, 10.6.8, 10.7.4

Type: Bug Priority: Major
Reporter: Volker Klasen Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 0001-MDEV-26212-auth_pam-Replaced-fork-exec-by-posix_spaw.patch    
Issue Links:
Relates
relates to MDEV-30734 [ERROR] mariadbd: pam: cannot exec /u... Closed

 Description   

Authentication through the PAM plugin fails when the mariadb process uses more memory than is still free and memory overcommit is disabled.

The reason lies in fork() (https://github.com/MariaDB/server/blob/10.7/plugin/auth_pam/auth_pam.c#L62), which duplicates the process. As there is not enough free memory for a duplicate process, fork() fails (silently) and the authentication does not succeed.

As the forked process is only used to exec() the pam tool, there is no need for a fork of the whole server process.
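For illustration, the fork() + exec() pattern described above can be sketched as follows. This is a minimal sketch and not the actual auth_pam.c code: the helper name run_with_fork and the use of /bin/sh as the child program are assumptions made for the example. Unlike the silent failure described above, it reports errno when fork() fails, which is where ENOMEM would show up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from auth_pam.c): runs `sh -c cmd` via
   fork() + exec(). Returns the child's exit status, or -1 on failure
   with errno describing the error. */
static int run_with_fork(const char *cmd)
{
  int status;
  pid_t pid = fork();

  if (pid < 0)
  {
    /* With overcommit disabled, a large parent process can fail here
       with ENOMEM. Log it instead of failing silently. */
    int saved_errno = errno;
    fprintf(stderr, "fork() failed: %s\n", strerror(saved_errno));
    errno = saved_errno;
    return -1;
  }

  if (pid == 0)
  {
    /* Child: replace the process image right away. */
    execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
    _exit(127); /* only reached if exec itself failed */
  }

  /* Parent: wait for the child, retrying if interrupted by a signal. */
  while (waitpid(pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_fork("exit 3") should return 3 on a system where fork() succeeds. In a process that is large relative to free memory, with overcommit disabled, the same call is expected to return -1 with errno set to ENOMEM.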

It can be reproduced by running a MariaDB server with an InnoDB buffer pool larger than half of the machine's RAM (ideally on a VM with as little RAM as possible) and filling the buffer pool by selecting from a big table of test data.

This issue had already been reported in Percona server: https://jira.percona.com/browse/PS-3332

Cheers
Volker



 Comments   
Comment by Sergei Golubchik [ 2021-07-29 ]

https://jira.percona.com/browse/PS-3332 was closed with "Cannot Reproduce".

fork() does not duplicate the whole process memory; that would be too slow to be practically usable. It only copies changed pages, so if one of the processes rewrites the whole InnoDB buffer pool, it'll be duplicated, yes. But the child process calls exec() immediately after the fork(). The parent cannot do much in that short time frame.

Comment by Volker Klasen [ 2021-07-30 ]

fork() does not duplicate the memory, but it effectively reserves it. Without overcommit, this is too much for big mariadb processes. I found a post that describes the issue quite well:

https://mail.gnome.org/archives/gtk-devel-list/2018-April/msg00000.html

When gnome-shell is launching apps (via glib) it ultimately comes down
to fork() + exec(). In this case the fork() fails with ENOMEM, because
the Linux kernel worries that the process being forked may end up
duplicating all of the memory allocations of the shell. By default the
memory map is set up so that the new process has a view on the exact
same pages as the parent process, however they are set up as
copy-on-write, so if the child writes to such memory it'll silently
cause new memory to be allocated. Under that limited perspective, the
kernel is not totally out of line in worrying about this situation,
especially because gnome-shell is a RAM-heavy process.

In reality we only want to fork() to immediately exec() which replaces
the child process memory map with a blank slate, but this
misinterpretation of intentions is a limitation of the fork() API
combined with a conflict with Linux's memory overcommit model.
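Whether a given Linux machine is affected depends on its overcommit policy, which the kernel exposes in /proc/sys/vm/overcommit_memory. The following is a minimal diagnostic sketch; the function name read_overcommit_mode is hypothetical and not part of MariaDB. Mode 2 corresponds to the "overcommit disabled" setup from the description; the default heuristic mode 0 can also refuse very large requests.

```c
#include <stdio.h>

/* Hypothetical diagnostic helper: returns the Linux overcommit mode
   (0 = heuristic, 1 = always overcommit, 2 = strict accounting),
   or -1 if the value cannot be read (for example, not on Linux). */
static int read_overcommit_mode(void)
{
  int mode = -1;
  FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

  if (f == NULL)
    return -1;
  if (fscanf(f, "%d", &mode) != 1)
    mode = -1;
  fclose(f);
  return mode;
}
```

A return value of 2 means a fork() from a process that is larger than the remaining commit limit is likely to fail with ENOMEM, even if the child would call exec() immediately.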

Comment by Volker Klasen [ 2022-01-14 ]

I attached a patch that replaces fork() + exec() with posix_spawn(). With it, I could no longer reproduce the bug and could log in via PAM even when the server process used more than 50% of available RAM.
However, as I am not very familiar with C, memory management and such, I cannot rule out side effects.

Cheers
Volker

Comment by Sergei Golubchik [ 2022-04-26 ]

Thanks. I've replaced fork() + exec() with posix_spawn().
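As a rough illustration of this kind of change, the sketch below starts a child with posix_spawn() and reads its standard output through a pipe. It is not the committed fix or the attached patch: the helper name spawn_and_read, the pipe name c_to_p and the use of /bin/sh as the child program are assumptions made for the example. On Linux with glibc, posix_spawn() is, to my understanding, built on clone() with CLONE_VM | CLONE_VFORK, so the child borrows the parent's address space until it calls exec and no second copy of the parent's memory has to be accounted for.

```c
#include <errno.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not the actual auth_pam.c code): runs `sh -c cmd`
   via posix_spawn(), stores up to cap-1 bytes of its stdout in out
   (NUL-terminated), and returns the exit status, or -1 on failure. */
static int spawn_and_read(const char *cmd, char *out, size_t cap)
{
  int c_to_p[2];                 /* child-to-parent pipe */
  int status, rc;
  pid_t pid;
  size_t used = 0;
  ssize_t n;
  posix_spawn_file_actions_t fa;
  char *const argv[] = { "sh", "-c", (char *) cmd, NULL };

  if (cap == 0 || pipe(c_to_p) < 0)
    return -1;

  /* File actions take the place of the dup2()/close() calls that a
     fork()ed child would make before exec(). */
  posix_spawn_file_actions_init(&fa);
  posix_spawn_file_actions_adddup2(&fa, c_to_p[1], STDOUT_FILENO);
  posix_spawn_file_actions_addclose(&fa, c_to_p[0]);
  posix_spawn_file_actions_addclose(&fa, c_to_p[1]);

  /* posix_spawn() returns an error number instead of setting errno. */
  rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
  posix_spawn_file_actions_destroy(&fa);
  close(c_to_p[1]);
  if (rc != 0)
  {
    close(c_to_p[0]);
    errno = rc;
    return -1;
  }

  /* Read the child's output until EOF or until the buffer is full. */
  while (used < cap - 1 &&
         (n = read(c_to_p[0], out + used, cap - 1 - used)) > 0)
    used += (size_t) n;
  out[used] = '\0';
  close(c_to_p[0]);

  while (waitpid(pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, spawn_and_read("echo hello", buf, sizeof buf) should return 0 and leave "hello\n" in buf. A real plugin would also need a parent-to-child pipe for the conversation with the PAM helper; that part is omitted here to keep the sketch short.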

Generated at Thu Feb 08 09:43:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.