Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Incomplete
Description
Hi guys,
We have a big problem when the system runs out of memory.
On an Apache + MySQL server, I would prefer the kernel to kill the Apache server instead of mysqld.
When mysqld restarts it can repair all tables (MyISAM/InnoDB/Aria/TokuDB/etc.). Still, a crash of the database process caused by the OOM killer's kill signal (similar to killall -9 mysqld) is more critical than a crash of the application (similar to killall -9 httpd): repairing takes more time and important data can be lost, while the application only loses static, cached, or transaction data.
In this task we will not touch the oom_score_adj of other processes; we only consider the oom_score_adj values of mysqld.
—
The idea is to add an option for my.cnf / mysqld / mysqld_multi / mysqld_safe, plus a global variable (visible in SHOW VARIABLES), to allow configuring the server's oom_score_adj.
The /proc kernel interface may provide an oom_adj file and/or an oom_score_adj file. The second has a larger range (-1000 to 1000 instead of -17 to +15); the first is older and exists on more kernels (possibly even pre-2.6 kernels).
An article about someone with the same problem on an Oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
In my.cnf we could add:
linux-oom_adj=xxx (a number from -17 to +15; the user MUST know whether the kernel uses oom_adj or oom_score_adj)
linux-oom_score_adj=xxx (a number from -1000 to 1000; the user MUST know whether the kernel uses oom_adj or oom_score_adj)
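The range check implied by the two proposed options could look like the minimal C sketch below. The option names come from this proposal; the enum, its constants, and oom_option_value_ok() are hypothetical, and the limits are the values from include/uapi/linux/oom.h quoted later in this ticket (-17 is OOM_DISABLE, -16..15 is the regular legacy range).

```c
#include <stdbool.h>

/* Limits as defined in include/uapi/linux/oom.h. */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MIN  (-1000)
#define OOM_SCORE_ADJ_MAX  1000

/* Hypothetical identifiers for the two proposed my.cnf options. */
enum oom_option { OPT_LINUX_OOM_ADJ, OPT_LINUX_OOM_SCORE_ADJ };

/* Returns true if 'value' is acceptable for the given option. */
bool oom_option_value_ok(enum oom_option opt, int value)
{
    switch (opt) {
    case OPT_LINUX_OOM_ADJ:        /* legacy interface: -17 .. +15 */
        return value >= OOM_DISABLE && value <= OOM_ADJUST_MAX;
    case OPT_LINUX_OOM_SCORE_ADJ:  /* current interface: -1000 .. +1000 */
        return value >= OOM_SCORE_ADJ_MIN && value <= OOM_SCORE_ADJ_MAX;
    }
    return false;
}
```

Rejecting bad values at option-parsing time keeps the server from ever writing something the kernel would refuse with EINVAL.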
Tested with kernel 4.xxx; we should use oom_score_adj, because writing oom_adj logs:
[179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead.
https://www.kernel.org/doc/Documentation/filesystems/proc.txt
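Since the kernel logs oom_adj as deprecated, the server could probe for oom_score_adj first and fall back to oom_adj only when it is missing, similar to what Chromium's AdjustOOMScore() quoted later in this ticket does. A hedged sketch assuming a Linux /proc layout; pick_oom_file() is a hypothetical helper.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return the name of the OOM control file to use under 'proc_dir'
 * (e.g. "/proc/self"), preferring the newer oom_score_adj and falling
 * back to the deprecated oom_adj. Returns NULL if neither exists.
 */
const char *pick_oom_file(const char *proc_dir)
{
    static const char *const candidates[] = { "oom_score_adj", "oom_adj" };
    char path[4096];

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", proc_dir, candidates[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return NULL;                      /* path too long */
        if (access(path, F_OK) == 0)          /* file exists */
            return candidates[i];
    }
    return NULL;
}
```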
Important notes:
(from the 2.4 kernel docs, as quoted at http://linux-mm.org/OOM_Killer):
It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).

(from the 4.xxx kernel docs):
The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted.
The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill.
Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*.
This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it.
The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0.

Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible.
This avoids servers and important system daemons from being killed and loses the minimal amount of work.
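To make the documented arithmetic concrete, the sketch below models it roughly: memory use scaled to 0..1000, plus oom_score_adj, clamped to 0..1000, with -1000 meaning "never kill". This approximates the documentation text, not the kernel's actual oom_badness() code (whose details vary between versions), and approx_oom_score() is a hypothetical name.

```c
/*
 * Rough model of the documented badness heuristic (not the kernel's exact code).
 * used_pages / allowed_pages: estimated memory + swap use vs. allowed memory.
 */
int approx_oom_score(long used_pages, long allowed_pages, int oom_score_adj)
{
    if (oom_score_adj <= -1000)          /* OOM_SCORE_ADJ_MIN: never kill */
        return 0;

    long points = allowed_pages > 0 ? used_pages * 1000 / allowed_pages : 0;
    points += oom_score_adj;             /* adjustment is added to the score */

    if (points < 0)                      /* clamp to the 0..1000 range */
        points = 0;
    if (points > 1000)
        points = 1000;
    return (int)points;
}
```

With almost no memory in use, oom_score_adj=200 gives 200 and any negative value gives 0, which is consistent with the oom_score readings in the test notes below.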
--------------------------------------------------
When linux-oom_adj is set in my.cnf, the server MUST write this value to the /proc/<pid>/task/*/oom_adj files at mysqld startup.
When linux-oom_score_adj is set in my.cnf, the server MUST write this value to the /proc/<pid>/task/*/oom_score_adj files at mysqld startup.
--------------------------------------------------
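A minimal sketch of the startup step, assuming mysqld writes to its own /proc/self/task directory with plain stdio calls as suggested in the test notes; write_all_tasks() is a hypothetical helper. To our understanding, current kernels keep oom_score_adj in the signal_struct shared by all threads of a process, so one write to /proc/<pid>/oom_score_adj should already apply to every task; the loop follows the proposal literally and is harmless either way.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write 'value' to <task_dir>/<tid>/<file> for every thread listed in
 * task_dir (e.g. "/proc/self/task", file "oom_score_adj").
 * Returns 0 on success, otherwise the first errno encountered.
 */
int write_all_tasks(const char *task_dir, const char *file, int value)
{
    DIR *dir = opendir(task_dir);
    if (dir == NULL)
        return errno;

    int first_err = 0;
    struct dirent *ent;
    char path[4096];

    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')            /* skip "." and ".." */
            continue;
        snprintf(path, sizeof path, "%s/%s/%s", task_dir, ent->d_name, file);

        FILE *f = fopen(path, "w");
        if (f == NULL) {
            if (first_err == 0)
                first_err = errno;            /* e.g. ENOENT or EACCES (13) */
            continue;
        }
        /* /proc expects the value as text; the kernel validates it on write. */
        if (fprintf(f, "%d", value) < 0 && first_err == 0)
            first_err = errno;
        if (fclose(f) != 0 && first_err == 0) /* buffered write happens here */
            first_err = errno;
    }
    closedir(dir);
    return first_err;
}
```

At startup this would be called as write_all_tasks("/proc/self/task", "oom_score_adj", value) when linux-oom_score_adj is set, and with "oom_adj" when linux-oom_adj is set. Because stdio buffers the text, a rejected value usually surfaces as an error from fclose() rather than fprintf().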
GLOBAL VARIABLE
READING THE GLOBAL VARIABLE (SELECT @@global.linux_oom_adj)
We should report ALL DISTINCT values from ALL TASKS of the main mysqld process <pid> (I tested with a child task and it works too):
linux-oom_adj reports the DISTINCT values read (fread) from /proc/<pid>/task/*/oom_adj; if the files do not exist or cannot be read, it MUST report NULL.
linux-oom_score_adj reports the DISTINCT values read (fread) from /proc/<pid>/task/*/oom_score_adj; if the files do not exist or cannot be read, it MUST report NULL.
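Reading the variable could be sketched as follows: collect the value from every task, then report the sorted distinct values as one string such as "-100 100 200". read_all_tasks() and format_distinct() are hypothetical helpers, and the caller is assumed to map a result of -1 or 0 (nothing readable) to NULL. Unreadable files are skipped rather than failing the whole read, because a thread can exit between readdir() and fopen().

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Sort vals[0..n) in place and write the distinct values to 'out' as a
 * space-separated list. Returns the number of distinct values, or -1 if
 * 'out' is too small.
 */
int format_distinct(int *vals, size_t n, char *out, size_t outlen)
{
    size_t used = 0;
    int distinct = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    qsort(vals, n, sizeof vals[0], cmp_int);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && vals[i] == vals[i - 1])
            continue;                         /* duplicate after sorting */
        int w = snprintf(out + used, outlen - used, "%s%d",
                         distinct ? " " : "", vals[i]);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
        distinct++;
    }
    return distinct;
}

/*
 * Read <task_dir>/<tid>/<file> for up to 'max' tasks into vals.
 * Returns the number of values read, or -1 if task_dir cannot be opened.
 */
int read_all_tasks(const char *task_dir, const char *file, int *vals, size_t max)
{
    DIR *dir = opendir(task_dir);
    if (dir == NULL)
        return -1;

    size_t n = 0;
    struct dirent *ent;
    char path[4096];

    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;
        snprintf(path, sizeof path, "%s/%s/%s", task_dir, ent->d_name, file);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                         /* thread exited or no permission */
        if (fscanf(f, "%d", &vals[n]) == 1)
            n++;
        fclose(f);
    }
    closedir(dir);
    return (int)n;
}
```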
SETTING THE GLOBAL VARIABLE (SET @@global.linux_oom_adj=-17)
linux-oom_adj writes (fwrite) the value to every /proc/<pid>/task/*/oom_adj file; if a file does not exist or cannot be written, it MUST report a WARNING (e.g. no permission: error 13).
linux-oom_score_adj writes (fwrite) the value to every /proc/<pid>/task/*/oom_score_adj file; if a file does not exist or cannot be written, it MUST report a WARNING (e.g. no permission: error 13).
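For SET, the errno from a failed write could be turned into the WARNING text as sketched below; the message wording and oom_warning_text() are hypothetical. To our understanding, the kernel refuses to lower oom_score_adj below the process's permitted minimum without CAP_SYS_RESOURCE and returns EACCES, so an unprivileged mysqld trying to protect itself would most likely see error 13; ENOENT means the file does not exist and EINVAL means an out-of-range value.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the WARNING text for a failed SET @@global.linux_oom_score_adj
 * (or linux_oom_adj). 'err' is the errno from the write attempt.
 */
void oom_warning_text(char *buf, size_t len, const char *file, int err)
{
    const char *why;

    switch (err) {
    case ENOENT: why = "file does not exist"; break;
    case EACCES: why = "no permission"; break;
    case EINVAL: why = "value rejected by the kernel"; break;
    default:     why = strerror(err); break;
    }
    snprintf(buf, len, "Could not write %s: %s (errno %d)", file, why, err);
}
```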
–
TEST NOTES:
Writing oom_adj sometimes does not change oom_score_adj automatically as the docs describe: it only changed for values >1 or <-1, while 0, 1 and -1 all resulted in 0.
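To our understanding, recent kernels (fs/proc/base.c) store only oom_score_adj and scale oom_adj writes and reads with the integer formulas sketched below. If that holds for the tested kernel, it would be one possible explanation for the 0/1/-1 observation: writing 1 stores 58 in oom_score_adj, which reads back through oom_adj as 0 because of integer truncation. This has not been verified against the kernel used in these tests, and the function names are hypothetical.

```c
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* oom_score_adj value stored after writing 'adj' to oom_adj. */
int oom_adj_to_score_adj(int adj)
{
    if (adj == OOM_ADJUST_MAX)
        return OOM_SCORE_ADJ_MAX;                   /* keep the maximum reachable */
    return adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;  /* C truncates toward zero */
}

/* oom_adj value reported when reading back a given oom_score_adj. */
int oom_score_adj_to_adj(int score_adj)
{
    if (score_adj == OOM_SCORE_ADJ_MAX)
        return OOM_ADJUST_MAX;
    return score_adj * -OOM_DISABLE / OOM_SCORE_ADJ_MAX;
}
```

For example, writing 2 stores 117, which reads back as 1, so small oom_adj values do not round-trip exactly; -17 maps cleanly to -1000 and back.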
Changing the main process sometimes changes the child tasks too, which is a bit confusing; maybe changing every mysqld task is better. The problem is knowing which value to report in the MySQL variable when there are many threads (that is why we will report ALL DISTINCT VALUES FROM ALL TASKS).
root@rspadim-Latitude:/proc/2398/task# ls
2398 2400 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
Here every oom_score_adj was 0. In this report I check only 2398 (main) and 2400 (child), but all other (child) tasks behaved the same as 2400.
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
Tested as the root user; we must check what happens (permissions) when mysqld itself uses fopen/fread/fwrite/fclose.
—
Other open-source projects that implement this:
Linux oom.h
http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8
/*
 * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
 * pid.
 */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy
 * purposes.
 */
#define OOM_DISABLE (-17)
/* inclusive */
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15
Chromium (the open-source project behind Google Chrome)
http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc
Relevant helper files:
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h
// NOTE: This is not the only version of this function in the source:
// the setuid sandbox (in process_util_linux.c, in the sandbox source)
// also has its own C version.
bool AdjustOOMScore(ProcessId process, int score) {
  if (score < 0 || score > kMaxOomScore)
    return false;

  FilePath oom_path(internal::GetProcPidDir(process));

  // Attempt to write the newer oom_score_adj file first.
  FilePath oom_file = oom_path.AppendASCII("oom_score_adj");
  if (PathExists(oom_file)) {
    std::string score_str = IntToString(score);
    DVLOG(1) << "Adjusting oom_score_adj of " << process << " to "
             << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  // If the oom_score_adj file doesn't exist, then we write the old
  // style file and translate the oom_adj score to the range 0-15.
  oom_file = oom_path.AppendASCII("oom_adj");
  if (PathExists(oom_file)) {
    // Max score for the old oom_adj range. Used for conversion of new
    // values to old values.
    const int kMaxOldOomScore = 15;

    int converted_score = score * kMaxOldOomScore / kMaxOomScore;
    std::string score_str = IntToString(converted_score);
    DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  return false;
}
–
Maybe we should implement something for Windows later, too?
http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc (see void OnNoMemory(); and void EnableTerminationOnOutOfMemory();)
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -16 to +15) when server start it must write this value to /proc/<pid>/oom_adj file, and that's all |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -16 to +15) - http://linux-mm.org/OOM_Killer when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -16 to +15) - http://linux-mm.org/OOM_Killer when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- First Checks: from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. 
For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. -- using oom_adj didn't changed oom_score_adj automatically as docs reported |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- First Checks: from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. 
For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. -- using oom_adj didn't changed oom_score_adj automatically as docs reported |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- First Checks: from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. 
For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. -- using oom_adj didn't changed oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql mysql need to repair tables and a crash (kill signal) at database is more risk than a crash at application well the idea is include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, to allow server configure the /proc kernel interface and use the right option at oom_adf file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html reading this i think (i'm not sure yet) that we only need a option at my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). when server start it must write this value to /proc/<pid>/oom_adj file, and that's all -- thinking about a MySQL Variable (show variables) when readed it return contents of /proc/<pid>/oom_adj when write it write values to /proc/<pid>/oom_adj must check if <pid> is main pid (from first mysqld, or current process pid) -- First Checks: from kernel 4.xxx: [179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. 
For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. -- using oom_adj didn't changed oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 |
Hi guys,

We have a big problem when the system runs Out Of Memory. On a server running apache+mysql, I would prefer the kernel to kill the apache server instead of mysql: mysql needs to repair its tables, so a crash (kill signal) in the database is riskier than a crash in the application.

The idea is to include an option in my.cnf / mysqld / mysqld_multi / mysqld_safe that lets the server configure the /proc kernel interface and write the right value to the oom_adj file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

Reading this, I think (I'm not sure yet) that we only need one option in my.cnf:
linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer

Important note (from http://linux-mm.org/OOM_Killer):
It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).

When the server starts, it must write this value to the /proc/<pid>/oom_adj file, and that's all.
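The startup step proposed above (write the configured value into /proc when the server starts) could look roughly like this. It is a minimal sketch, not mysqld code: `apply_oom_setting`, `preferred_oom_interface` and the `proc_root` parameter are hypothetical names, and the sketch assumes oom_score_adj (-1000..1000) should be preferred whenever that file exists, since kernel 4.x logs oom_adj (-17..+15) as deprecated. The caller passes the interface name explicitly, so it is always clear which scale the number belongs to.

```python
import os

# Limits quoted in this issue: oom_score_adj from proc.txt,
# oom_adj from linux-mm.org (OOM_DISABLE = -17, max +15).
LIMITS = {
    "oom_score_adj": (-1000, 1000),
    "oom_adj": (-17, 15),
}


def preferred_oom_interface(pid="self", proc_root="/proc"):
    """Prefer oom_score_adj; fall back to oom_adj on kernels without it."""
    base = os.path.join(proc_root, str(pid))
    if os.path.exists(os.path.join(base, "oom_score_adj")):
        return "oom_score_adj"
    return "oom_adj"


def apply_oom_setting(value, name="oom_score_adj", pid="self", proc_root="/proc"):
    """Write `value` to /proc/<pid>/<name> and return the path written.

    The range is checked against the chosen interface before touching /proc.
    /proc/self refers to the calling process, i.e. the server itself.
    """
    if name not in LIMITS:
        raise ValueError(f"unknown OOM interface: {name}")
    lo, hi = LIMITS[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be in [{lo}, {hi}], got {value}")
    path = os.path.join(proc_root, str(pid), name)
    # Lowering the value normally needs root (CAP_SYS_RESOURCE), so this
    # may raise PermissionError when the server runs as an unprivileged user.
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

For example, `apply_oom_setting(-500, preferred_oom_interface())` would raise `ValueError` on an old kernel that only has oom_adj, instead of silently writing an out-of-range number.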
--
Thinking about a MySQL variable (SHOW VARIABLES): when read, it returns the contents of /proc/<pid>/oom_adj; when written, it writes the value to /proc/<pid>/oom_adj. It must check that <pid> is the main pid (from the first mysqld, or the current process pid).
--
First checks:

From kernel 4.xxx:
[179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead.

From https://www.kernel.org/doc/Documentation/filesystems/proc.txt:
The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0.

Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work.
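To make the quoted proc.txt rule concrete, here is a simplified model of it: the score is the badness (0..1000, roughly the share of allowed memory in use) plus oom_score_adj, kept within 0..1000. This is only an approximation of the documented behaviour, not the kernel's actual computation (which differs between kernel versions), and `effective_oom_score` is a made-up name. It does agree with the shell session in this issue, where a small test mysqld showed oom_score 200 with oom_score_adj 200 and 0 for every negative value.

```python
def effective_oom_score(badness, oom_score_adj):
    """Simplified model of the proc.txt text quoted above.

    badness: 0..1000, roughly the share of allowed memory the task uses.
    oom_score_adj: -1000..1000, added to badness before choosing a victim.
    The result is kept in 0..1000, so -1000 always yields 0 ("never kill").
    """
    if not 0 <= badness <= 1000:
        raise ValueError("badness must be in 0..1000")
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj must be in -1000..1000")
    return max(0, min(1000, badness + oom_score_adj))


# proc.txt's example: a task using half of its allowed memory scores 500.
for adj in (0, -200, 200, 800, -1000):
    print(f"oom_score_adj {adj:>5}: score {effective_oom_score(500, adj)}")
```

With badness 500 the loop prints scores 500, 300, 700, 1000 and 0, which is why a negative adjustment makes mysqld a less likely victim than apache at similar memory use.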
--
In my tests, using oom_adj didn't update oom_score_adj automatically as the docs describe: it only changed for values > 1 or < -1, while 0, 1 and -1 all gave 0.
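A possible explanation for the oom_adj observation, assuming the kernel converts between the two files the way fs/proc/base.c does in the kernels I know of: writing oom_adj stores oom_adj * 1000 / 17 in oom_score_adj (with +15 mapped straight to +1000), and reading oom_adj scales back with * 17 / 1000, both using C integer division, which truncates toward zero. Under that assumption, and if the value was read back through oom_adj, ±1 round-trips to 0 while ±2 comes back as ±1. The helper names below are hypothetical; this is a model of the arithmetic, not kernel code.

```python
OOM_DISABLE = -17
OOM_ADJUST_MAX = 15
OOM_SCORE_ADJ_MAX = 1000


def c_div(a, b):
    """Integer division truncating toward zero, like C (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def oom_adj_to_score_adj(oom_adj):
    """Scaling assumed to happen when oom_adj is written."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be in -17..15")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX  # +15 maps straight to +1000
    return c_div(oom_adj * OOM_SCORE_ADJ_MAX, -OOM_DISABLE)


def score_adj_to_oom_adj(score_adj):
    """Scaling assumed to happen when oom_adj is read back."""
    if score_adj == OOM_SCORE_ADJ_MAX:
        return OOM_ADJUST_MAX
    return c_div(score_adj * -OOM_DISABLE, OOM_SCORE_ADJ_MAX)


for v in (-17, -2, -1, 0, 1, 2, 15):
    s = oom_adj_to_score_adj(v)
    print(f"oom_adj {v:>3} -> oom_score_adj {s:>5} -> oom_adj {score_adj_to_oom_adj(s):>3}")
```

For instance, writing 1 gives oom_score_adj 58, and 58 * 17 / 1000 truncates back to oom_adj 0; writing -17 gives -1000, the "never kill" value.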
Changing the main process also changes its child tasks (the threads under /proc/<pid>/task), but this is a bit confusing; maybe changing it for every mysqld pid is better. The problem is knowing which value to report in the MySQL variable when there are many threads. Maybe report a list of distinct values, like "100 200 -100"?

root@rspadim-Latitude:/proc/2398/task# ls
*2398* __2400__ 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
--
Writing to "/proc/<pid>/task/*/oom_score_adj" works OK here.
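The two ideas above, writing the value to every /proc/<pid>/task/*/oom_score_adj and reporting the distinct values when the MySQL variable is read, could be sketched like this. The helper names are hypothetical, and `proc_root` only exists so the functions can be tried on a fake directory tree. One thing worth checking: in the session, writing to task/2400 changed the value seen through the main pid and vice versa, which fits my understanding that Linux keeps oom_score_adj per process (shared by all its threads) rather than per thread. If so, the distinct-values list would normally hold a single number and the per-thread loop is just defensive.

```python
import glob
import os


def _task_files(pid, proc_root):
    # One oom_score_adj file per task (thread) under /proc/<pid>/task/.
    pattern = os.path.join(proc_root, str(pid), "task", "*", "oom_score_adj")
    return sorted(glob.glob(pattern))


def write_all_tasks(pid, value, proc_root="/proc"):
    """Write `value` to every task's oom_score_adj; return how many were written."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be in -1000..1000")
    written = 0
    for path in _task_files(pid, proc_root):
        try:
            with open(path, "w") as f:
                f.write(str(value))
            written += 1
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listing and writing
    return written


def distinct_task_values(pid, proc_root="/proc"):
    """Return distinct oom_score_adj values in first-seen order, e.g. "100 200 -100"."""
    seen = []
    for path in _task_files(pid, proc_root):
        try:
            with open(path) as f:
                value = int(f.read().strip())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were reading
        if value not in seen:
            seen.append(value)
    return " ".join(str(v) for v in seen)
```

Returning a space-separated string keeps the "100 200 -100" format suggested above, so a read-only variable could show it as-is.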
Hi guys,
We have a big problem when the system runs Out Of Memory on an apache+mysql server: I would prefer the kernel to kill the apache server instead of mysql. MySQL needs to repair tables, and a crash (kill signal) in the database is riskier than a crash in the application.
So the idea is to add an option to my.cnf / mysqld / mysqld_multi / mysqld_safe that lets the server configure the /proc kernel interface and write the right value to the oom_adj file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
Reading this, I think (I'm not sure yet) that we only need one option in my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer
Important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).
When the server starts, it must write this value to the /proc/<pid>/oom_adj file, and that's all.
--
Thinking about a MySQL variable (SHOW VARIABLES): when read, it returns the contents of /proc/<pid>/oom_adj; when written, it writes the value to /proc/<pid>/oom_adj. It must check that <pid> is the main pid (the first mysqld, or the current process pid).
--
First checks, from kernel 4.xxx:
[179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead.
https://www.kernel.org/doc/Documentation/filesystems/proc.txt
The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work.
--
Using oom_adj did not change oom_score_adj automatically as the docs describe; it only changed for values >1 or <-1 (0, 1 and -1 all resulted in 0). Changing the main process also changes the child tasks, but it's a bit confusing; maybe changing the value for every mysqld pid is better. The problem is knowing which value to report in the MySQL variable when there are many threads. Maybe report a list of distinct values, e.g. "100 200 -100"?
{code}
root@rspadim-Latitude:/proc/2398/task# ls
2398 2400 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
{code}
---
Writing to "/proc/<pid>/task/*/oom_score_adj" works OK here. |
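This may explain the oom_adj observation above, although it is an assumption rather than a verified diagnosis. As far as we understand fs/proc/base.c, the kernel does not keep a separate oom_adj value. A write to oom_adj is scaled onto the oom_score_adj range, and a read scales it back. The C sketch below mirrors that arithmetic; the helper names are hypothetical. Because of integer division, writing oom_adj=1 stores oom_score_adj=58, and that value reads back as oom_adj=0.

```c
/* Constants of the legacy oom_adj interface, as we understand them. */
#define OOM_DISABLE        (-17)   /* legacy oom_adj minimum: "never kill" */
#define OOM_ADJUST_MAX     15      /* legacy oom_adj maximum */
#define OOM_SCORE_ADJ_MAX  1000    /* oom_score_adj maximum */

/* Hypothetical helper: oom_score_adj stored after writing `oom_adj`
 * to /proc/<pid>/oom_adj (sketch of the kernel's scaling). */
int oom_adj_to_score_adj(int oom_adj)
{
    if (oom_adj == OOM_ADJUST_MAX)
        return OOM_SCORE_ADJ_MAX;                 /* keep the maximum reachable */
    /* C integer division truncates toward zero: 1 -> 58, -17 -> -1000 */
    return oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;
}

/* Hypothetical helper: value shown when reading /proc/<pid>/oom_adj
 * for a given oom_score_adj. */
int score_adj_to_oom_adj(int score_adj)
{
    if (score_adj == OOM_SCORE_ADJ_MAX)
        return OOM_ADJUST_MAX;
    /* truncation again: 58 -> 0, so a written oom_adj of 1 reads back as 0 */
    return score_adj * -OOM_DISABLE / OOM_SCORE_ADJ_MAX;
}
```

If the kernel really behaves like this, a linux-oom_adj option would lose precision on every round trip. That is one more reason to prefer oom_score_adj.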
Description |
Hi guys,
We have a big problem when the system runs Out Of Memory on an apache+mysql server: I would prefer the kernel to kill the apache server instead of mysql. MySQL needs to repair tables, and a crash (kill signal) in the database is riskier than a crash in the application.
So the idea is to add an option to my.cnf / mysqld / mysqld_multi / mysqld_safe that lets the server configure the /proc kernel interface and write the right value to the oom_adj file, like in this example: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
Reading this, I think (I'm not sure yet) that we only need one option in my.cnf: linux-oom_adj=xxx (a number from -17 to +15) - http://linux-mm.org/OOM_Killer
Important note: It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).
When the server starts, it must write this value to the /proc/<pid>/oom_adj file, and that's all.
--
Thinking about a MySQL variable (SHOW VARIABLES): when read, it returns the contents of /proc/<pid>/oom_adj; when written, it writes the value to /proc/<pid>/oom_adj. It must check that <pid> is the main pid (the first mysqld, or the current process pid).
--
First checks, from kernel 4.xxx:
[179443.053386] bash (19390): /proc/2398/oom_adj is deprecated, please use /proc/2398/oom_score_adj instead.
https://www.kernel.org/doc/Documentation/filesystems/proc.txt
The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work.
--
Using oom_adj did not change oom_score_adj automatically as the docs describe; it only changed for values >1 or <-1 (0, 1 and -1 all resulted in 0). Changing the main process also changes the child tasks, but it's a bit confusing; maybe changing the value for every mysqld pid is better. The problem is knowing which value to report in the MySQL variable when there are many threads. Maybe report a list of distinct values, e.g. "100 200 -100"?
{code}
root@rspadim-Latitude:/proc/2398/task# ls
2398 2400 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
{code}
Here every oom_score_adj was 0. In this report I'm checking only 2398 (main) and 2400 (child), but all the other (child) tasks are equal to 2400.
{code}
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
{code}
---
Writing to "/proc/<pid>/task/*/oom_score_adj" works OK here.
Global variable write: we write to all tasks (task/*).
Global variable read: we read from all tasks and report the distinct values to the user.
--
Tested as the root user; we must check what happens (permissions) with the fopen/fread/fwrite/fclose calls in the mysql code. |
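The "read from all tasks and report distinct values" idea could look roughly like the C sketch below. It is an illustration only. read_all_tasks() and format_distinct() are hypothetical helpers, not existing MariaDB code. A caller would map a return of -1 (or zero values read) to NULL. Distinct values keep their first-seen order, so 100, 200, 100, -100 is reported as "100 200 -100".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the distinct entries of values[0..n) into buf
 * as a space-separated list, keeping first-seen order.
 * Returns 0 on success, -1 if n == 0 or buf is too small. */
int format_distinct(const int *values, size_t n, char *buf, size_t len)
{
    size_t used = 0;

    if (n == 0 || len == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (values[j] == values[i]) {
                seen = 1;
                break;
            }
        }
        if (seen)
            continue;                       /* already reported */
        int w = snprintf(buf + used, len - used, "%s%d",
                         used ? " " : "", values[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;                      /* output would be truncated */
        used += (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: read /proc/<pid>/task/<tid>/<file> for every thread
 * of `pid` ("self" works too), storing up to `max` values in `out`.
 * Returns how many values were read, or -1 if the task directory cannot be
 * opened (the proposal would report NULL in that case). */
int read_all_tasks(const char *pid, const char *file, int *out, size_t max)
{
    char path[512];
    struct dirent *ent;
    int count = 0;

    snprintf(path, sizeof path, "/proc/%s/task", pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((size_t)count < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        snprintf(path, sizeof path, "/proc/%s/task/%s/%s",
                 pid, ent->d_name, file);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                       /* thread exited or no permission */
        int v;
        if (fscanf(f, "%d", &v) == 1)
            out[count++] = v;
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

For example, read_all_tasks("self", "oom_score_adj", vals, 256) followed by format_distinct(vals, n, buf, sizeof buf) would produce the string that SELECT @@global... returns under this proposal.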
Summary | Linux Kernel Out Of Memory Configuration to my.cnf file | Implement Linux Kernel Out Of Memory Configuration (oom_score_adj) to my.cnf file and mysqld global variable |
Description |
Hi guys,
We have a big problem when the system runs Out Of Memory on an apache+mysql server: I would prefer the kernel to kill the apache server instead of mysql. When mysqld restarts it can repair all tables (myisam/innodb/aria/toku/etc...), but a crash (kill signal when out of memory) of the database pid (something like killall -9 mysqld) is more critical than a crash of the application (something like killall -9 httpd). In this task we will not check other processes' oom_score_adj; we will only think about the mysqld oom_score_adj values.
---
The idea is to add an option to my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES), to allow configuring the server's oom_score_adj. The /proc kernel interface uses the *oom_adj* OR the *oom_score_adj* file (the second has a bigger range, -1000 to 1000 instead of -17 to +15).
A comment from someone who had the same problem with an Oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
In the my.cnf file we could include:
*linux-oom_adj=xxx* (a number from -17 to +15; the user MUST know whether we are using oom_adj or oom_score_adj)
*linux-oom_score_adj=xxx* (a number from -1000 to 1000; the user MUST know whether we are using oom_adj or oom_score_adj)
Tested with kernel 4.xxx, we should use *oom_score_adj*:
[179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead.
https://www.kernel.org/doc/Documentation/filesystems/proc.txt
Important notes (from 2.6 kernel): It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).
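Below is a minimal C sketch of how the two proposed my.cnf options might be range-checked, and of how the server could prefer oom_score_adj and fall back to oom_adj only where the newer file is missing. The option names come from this proposal. Both functions are hypothetical; nothing here is an existing mysqld API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <string.h>
#include <unistd.h>

/* Hypothetical range check for the two proposed my.cnf options. */
int oom_option_in_range(const char *option, long value)
{
    if (strcmp(option, "linux-oom_adj") == 0)
        return value >= -17 && value <= 15;       /* legacy oom_adj range */
    if (strcmp(option, "linux-oom_score_adj") == 0)
        return value >= -1000 && value <= 1000;   /* oom_score_adj range */
    return 0;                                      /* unknown option name */
}

/* Prefer oom_score_adj (4.x kernels log that oom_adj is deprecated) and
 * fall back to oom_adj only if the newer file does not exist.
 * Returns NULL when neither exists (no /proc, or not Linux). */
const char *pick_oom_file(void)
{
    if (access("/proc/self/oom_score_adj", F_OK) == 0)
        return "oom_score_adj";
    if (access("/proc/self/oom_adj", F_OK) == 0)
        return "oom_adj";
    return NULL;
}
```

Rejecting out-of-range values at startup would keep a typo in my.cnf from silently writing a value the kernel refuses.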
(from 4.xxx kernels): The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
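The C sketch below makes the arithmetic in the quoted 4.x documentation concrete. It is a deliberately simplified model, an assumption and not the kernel's exact code. It takes the share of allowed memory in use on a 0..1000 scale, adds oom_score_adj, and clamps the result to 0..1000. Under this model, a task using half its allowed memory with oom_score_adj=-200 scores 300, and -1000 always gives 0.

```c
/* Simplified model of the quoted 4.x documentation, NOT the kernel's exact
 * arithmetic. usage_permille: share of allowed memory in use (0..1000).
 * adj: oom_score_adj (-1000..1000). Returns a badness score in 0..1000. */
int model_badness(int usage_permille, int adj)
{
    if (usage_permille < 0)
        usage_permille = 0;
    if (usage_permille > 1000)
        usage_permille = 1000;

    int score = usage_permille + adj;   /* oom_score_adj is simply added */
    if (score < 0)
        score = 0;                      /* so adj = -1000 always yields 0 */
    if (score > 1000)
        score = 1000;
    return score;
}
```

The oom_score values in the test session of this issue read 200 with oom_score_adj=200 and 0 with any negative value. That looks consistent with this model for a process whose own usage share is close to 0, although this is an inference.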
--------------------------------------------------
When linux-oom_adj is set in my.cnf, the server *MUST* write this value to the _/proc/<pid>/task/*/oom_adj_ files at mysqld start.
When linux-oom_score_adj is set in my.cnf, the server *MUST* write this value to the _/proc/<pid>/task/*/oom_score_adj_ files at mysqld start.
--------------------------------------------------
Global variable:
READING THE GLOBAL VARIABLE VALUE (SELECT @@global.linux_oom_adj): we should report ALL DISTINCT task values.
*linux-oom_adj* will report: DISTINCT FREAD */proc/<pid>/task/\*/oom_adj*; if it does not exist, or we don't have permission to read it, it MUST report *NULL*.
*linux-oom_score_adj* will report: DISTINCT FREAD(/proc/<pid>/task/*/oom_score_adj); if it does not exist, or we don't have permission to read it, it MUST report *NULL*.
SETTING THE GLOBAL VARIABLE VALUE (SET @@global.linux_oom_adj=-17):
*linux-oom_adj* will FWRITE to all files at */proc/<pid>/task/\*/oom_adj*; if they do not exist, or we don't have permission, it MUST report a WARNING message (no permission - error 13).
*linux-oom_score_adj* will FWRITE to all files at */proc/<pid>/task/\*/oom_score_adj*; if they do not exist, or we don't have permission, it MUST report a WARNING message (no permission - error 13).
--
TEST NOTES: using oom_adj sometimes doesn't change oom_score_adj automatically as the docs describe; it only changes for values >1 or <-1 (0, 1 and -1 all result in 0). Changing the main process sometimes changes the child tasks too, but it's a bit confusing; maybe changing the value for every mysqld pid is better. The problem is knowing which value to report in the MySQL variable if we have many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS).
{code}
root@rspadim-Latitude:/proc/2398/task# ls
2398 2400 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
{code}
Here every oom_score_adj was 0. In this report I'm checking only 2398 (main) and 2400 (child), but all the other (child) tasks are equal to 2400.
{code}
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
{code}
Tested as the *root* user; we must check what happens (permissions) with the fopen/fread/fwrite/fclose calls in the mysql code. |
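The proposed startup write and SET @@global behaviour could be sketched in C as follows; the function name is hypothetical. The sketch writes the value to every /proc/<pid>/task/<tid>/<file>. It returns the first errno it hits (EACCES is 13, the "no permission - error 13" case), so the caller can turn it into a WARNING.

Two hedged notes apply. First, the test session suggests that writing one task's file already updates the others, so a single write to /proc/self/oom_score_adj may be enough. Second, the kernel's proc.txt says oom_score_adj cannot be lowered below the last value set by a CAP_SYS_RESOURCE process without CAP_SYS_RESOURCE. A mysqld running as the mysql user would therefore typically only be able to raise its value.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Remember only the first failure; use EIO if errno was not set. */
static void note_error(int *first_err)
{
    if (*first_err == 0)
        *first_err = errno ? errno : EIO;
}

/* Hypothetical helper: write `value` to /proc/<pid>/task/<tid>/<file> for
 * every thread of `pid` ("self" works too). Returns 0 if every write
 * succeeded, EINVAL for an out-of-range value, otherwise the first errno
 * seen (EACCES == 13 is "no permission"). A thread that exits during the
 * loop also shows up as an error here. */
int write_all_tasks(const char *pid, const char *file, int value)
{
    int legacy = strcmp(file, "oom_adj") == 0;
    int lo = legacy ? -17 : -1000;
    int hi = legacy ? 15 : 1000;
    char path[512];
    struct dirent *ent;
    int first_err = 0;

    if (value < lo || value > hi)
        return EINVAL;

    snprintf(path, sizeof path, "/proc/%s/task", pid);
    DIR *dir = opendir(path);
    if (dir == NULL) {
        note_error(&first_err);
        return first_err;
    }
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                           /* skip "." and ".." */
        snprintf(path, sizeof path, "/proc/%s/task/%s/%s",
                 pid, ent->d_name, file);
        FILE *f = fopen(path, "w");
        if (f == NULL) {
            note_error(&first_err);
            continue;
        }
        errno = 0;
        if (fprintf(f, "%d", value) < 0)
            note_error(&first_err);
        /* /proc may reject the value only when the buffer is flushed */
        if (fclose(f) != 0)
            note_error(&first_err);
    }
    closedir(dir);
    return first_err;
}
```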
Description |
Hi guys,

We have a big problem when the system runs out of memory. On a server running Apache + MySQL, I would prefer that the kernel kill the Apache server instead of mysqld. When mysqld restarts it can repair all tables (MyISAM/InnoDB/Aria/TokuDB/etc.), but a crash of the database process (a kill signal from the OOM killer, similar to killall -9 mysqld) is more critical than a crash of the application (similar to killall -9 httpd).

In this task we will not check the oom_score_adj of other processes; we will only consider the oom_score_adj values of mysqld.

---

The idea: add an option to my.cnf / mysqld / mysqld_multi / mysqld_safe, plus a global variable (SHOW VARIABLES), to configure the server's oom_score_adj. The /proc kernel interface uses either the *oom_adj* or the *oom_score_adj* file (the latter has a wider range, -1000 to +1000, instead of -17 to +15).

A comment from someone who had the same problem with an Oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

In my.cnf we could include:
*linux-oom_adj=xxx* (a number from -17 to +15; the user MUST know whether oom_adj or oom_score_adj is being used)
*linux-oom_score_adj=xxx* (a number from -1000 to +1000; the user MUST know whether oom_adj or oom_score_adj is being used)

Tested with kernel 4.xxx; we should use *oom_score_adj*:
[179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead.
https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Important notes (from 2.6 kernel docs):
{code}
It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*).
{code}

(from 4.xxx kernel docs):
{code}
The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0.

Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work.
{code}
--------------------------------------------------
When linux-oom_adj is set in my.cnf, the server *MUST* write this value to the *_/proc/<pid>/task/\*/oom_adj_* files at mysqld startup.
When linux-oom_score_adj is set in my.cnf, the server *MUST* write this value to the *_/proc/<pid>/task/\*/oom_score_adj_* files at mysqld startup.
--------------------------------------------------
GLOBAL VARIABLE

*READING* THE GLOBAL VARIABLE VALUE (*SELECT @@global.linux_oom_adj*)
We should report ALL *DISTINCT* values from *ALL TASKS* of the main mysqld process <pid> (I tested with a child task and it works too):
*linux-oom_adj* will report the DISTINCT values read (FREAD) from */proc/<pid>/task/_\*_/oom_adj*; if the files do not exist or cannot be read (no permission), it MUST report *NULL*.
*linux-oom_score_adj* will report the DISTINCT values read (FREAD) from */proc/<pid>/task/_\*_/oom_score_adj*; if the files do not exist or cannot be read (no permission), it MUST report *NULL*.

*SETTING* THE GLOBAL VARIABLE VALUE (*SET @@global.linux_oom_adj=-17*)
*linux-oom_adj* will write (FWRITE) to every */proc/<pid>/task/_\*_/oom_adj* file; if the files do not exist or cannot be written (no permission), it MUST report a WARNING (no permission: error 13).
*linux-oom_score_adj* will write (FWRITE) to every */proc/<pid>/task/_\*_/oom_score_adj* file; if the files do not exist or cannot be written (no permission), it MUST report a WARNING (no permission: error 13).
--
TEST NOTES:
Writing oom_adj sometimes does not change oom_score_adj automatically as the docs describe; it only changes if the value is >1 or <-1 (0, 1 and -1 all result in 0).
Changing the main process sometimes changes the child tasks too, but this is a bit confusing; maybe changing every mysqld task is better. The problem is knowing which value the MySQL variable should report when there are many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS).
{code}
root@rspadim-Latitude:/proc/2398/task# ls
*2398* *_2400_* 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
{code}
Here every oom_score_adj was 0 at the start. In this report I check only 2398 (main) and 2400 (child), but all other (child) tasks are equal to 2400:
{code}
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
{code}
Tested as the *root* user; we must check what happens (permissions) with the MySQL code that uses fopen/fread/fwrite/fclose.
---
Other open-source code that implements this: Chromium?! http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc
Important include files:
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h
http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8
{code:c++}
// NOTE: This is not the only version of this function in the source:
// the setuid sandbox (in process_util_linux.c, in the sandbox source)
// also has its own C version.
bool AdjustOOMScore(ProcessId process, int score) {
  if (score < 0 || score > kMaxOomScore)
    return false;

  FilePath oom_path(internal::GetProcPidDir(process));

  // Attempt to write the newer oom_score_adj file first.
  FilePath oom_file = oom_path.AppendASCII("oom_score_adj");
  if (PathExists(oom_file)) {
    std::string score_str = IntToString(score);
    DVLOG(1) << "Adjusting oom_score_adj of " << process << " to "
             << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  // If the oom_score_adj file doesn't exist, then we write the old
  // style file and translate the oom_adj score to the range 0-15.
  oom_file = oom_path.AppendASCII("oom_adj");
  if (PathExists(oom_file)) {
    // Max score for the old oom_adj range. Used for conversion of new
    // values to old values.
    const int kMaxOldOomScore = 15;

    int converted_score = score * kMaxOldOomScore / kMaxOomScore;
    std::string score_str = IntToString(converted_score);
    DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  return false;
}
{code}
--
Maybe later we must implement something for Windows too? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory();
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql when mysqld restart, it can repair all tables (myisam/innodb/aria/toku/etc...), a crash (kill signal when out of memory) at database pid (somethiing killall -9 mysqld) is more critical than a crash at application (something like killall -9 httpd) in this task we will not check others process oom_score_adj, we will only think about mysqld oom_score_adj values --- the idea is: include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES) to allow server oom_score_adj configuration, the /proc kernel interface use *oom_adj* OR *oom_score_adj* file (the second have a bigger range -1000 to 1000, instead of -17 to +15) a comment about someone that have the same problem using oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html at my.cnf file we could include: *linux-oom_adj=xxx* (a number from -17 to +15 user MUST know that we are using oom_adj or oom_score_adj) *linux-oom_score_adj=xxx* (a number from -1000 to 1000, user MUST know that we are using oom_adj or oom_score_adj) tested with kernel 4.xxx, we should use *oom_score_adj*: [179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt important notes: (from 2.6 kernel docs): {code} It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). 
{code} (from 4.xxx kernels docs): {code} The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
{code} -------------------------------------------------- when linux-oom_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_adj_* file at mysqld start when linux-oom_score_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_score_adj_* file at mysqld start -------------------------------------------------- GLOBAL VARIABLE *READING* GLOBAL VARIABLE VALUE (*SELECT @@global.linux_oom_adj*) we should report ALL *DISTINCT* values from *ALL TASKS* in the main mysqld process <pid> (i tested with child and it works too...): *linux-oom_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission to read it MUST report *NULL* *linux-oom_score_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission to read it MUST report *NULL* *SETTING* GLOBAL VARIABLE VALUE (*SET @@global.linux_oom_adj=-17*) *linux-oom_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) *linux-oom_score_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) -- TEST NOTES: using oom_adj sometimes dont change oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 changing main process sometimes change child tasks too, but it's a bit confusing, maybe change all mysqld pid is better, the problem is know what's the value to report to mysql variable if we have many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS) {code} root@rspadim-Latitude:/proc/2398/task# ls *2398* *_2400_* 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423 {code} here every oom_score_adj was =0, i'm checking only 2398 (main), and 2400 (child) in this report, but all others taks (child) 
are equal to 2400 {code} root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -100 root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -150 root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -500 {code} tested with *root* user, must check what happens (permissions) with mysql code fopen/fread/fwrite/fclose --- others opensource projects that implement it: Linux OOM.H http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8 {code:c} /* * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for * pid. */ #define OOM_SCORE_ADJ_MIN (-1000) #define OOM_SCORE_ADJ_MAX 1000 * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy * purposes. */ #define OOM_DISABLE (-17) /* inclusive */ #define OOM_ADJUST_MIN (-16) #define OOM_ADJUST_MAX 15 {code} Chromium (Google Chrome?!) 
http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc
Important include files:
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h
{code:c++}
// NOTE: This is not the only version of this function in the source:
// the setuid sandbox (in process_util_linux.c, in the sandbox source)
// also has its own C version.
bool AdjustOOMScore(ProcessId process, int score) {
  if (score < 0 || score > kMaxOomScore)
    return false;

  FilePath oom_path(internal::GetProcPidDir(process));

  // Attempt to write the newer oom_score_adj file first.
  FilePath oom_file = oom_path.AppendASCII("oom_score_adj");
  if (PathExists(oom_file)) {
    std::string score_str = IntToString(score);
    DVLOG(1) << "Adjusting oom_score_adj of " << process << " to "
             << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  // If the oom_score_adj file doesn't exist, then we write the old
  // style file and translate the oom_adj score to the range 0-15.
  oom_file = oom_path.AppendASCII("oom_adj");
  if (PathExists(oom_file)) {
    // Max score for the old oom_adj range.  Used for conversion of new
    // values to old values.
    const int kMaxOldOomScore = 15;

    int converted_score = score * kMaxOldOomScore / kMaxOomScore;
    std::string score_str = IntToString(converted_score);
    DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  return false;
}
{code}
--
Maybe later we should also implement something for Windows? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory();
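A minimal C sketch of the READING (DISTINCT over all tasks) and SETTING (write to all tasks) rules in the spec, using plain fopen/fscanf/fprintf/fclose. The helper names oom_read_distinct() and oom_write_all() are hypothetical, not existing MariaDB functions, and taking the task directory as a parameter is an assumption made so the caller can pass /proc/self/task (or /proc/<pid>/task). A return of -1 from the reader maps to reporting NULL; a non-zero failure count from the writer maps to the WARNING, with errno 13 (EACCES) meaning "no permission".

```c
/*
 * Sketch of the proposed READ (DISTINCT over all tasks) and SET (write to
 * all tasks) semantics. oom_read_distinct()/oom_write_all() are
 * hypothetical names, not existing MariaDB functions.
 */
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

#define MAX_DISTINCT 64

/* Read one integer from a /proc file; returns 0 on success, -1 on error. */
static int read_int_file(const char *path, int *value)
{
  FILE *f = fopen(path, "r");
  if (f == NULL)
    return -1;
  int ok = (fscanf(f, "%d", value) == 1);
  fclose(f);
  return ok ? 0 : -1;
}

/*
 * Collect the DISTINCT values of <task_dir>/<tid>/<file> over all tasks.
 * Returns how many distinct values were stored in `values`, or -1 when no
 * task file could be read (the server would then report NULL).
 */
int oom_read_distinct(const char *task_dir, const char *file,
                      int *values, int max_values)
{
  DIR *dir = opendir(task_dir);
  if (dir == NULL)
    return -1;
  int count = 0, readable = 0;
  char path[512];
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    if (entry->d_name[0] == '.')
      continue;                       /* skip "." and ".." */
    snprintf(path, sizeof path, "%s/%s/%s", task_dir, entry->d_name, file);
    int v;
    if (read_int_file(path, &v) != 0)
      continue;                       /* missing file, exited task, EACCES */
    readable++;
    int seen = 0;
    for (int i = 0; i < count && !seen; i++)
      seen = (values[i] == v);
    if (!seen && count < max_values)
      values[count++] = v;
  }
  closedir(dir);
  return readable > 0 ? count : -1;
}

/*
 * Write `value` to <task_dir>/<tid>/<file> for every task. Returns the
 * number of failed writes (0 = all succeeded) or -1 if task_dir cannot be
 * opened; *last_errno receives the errno of the last failure (EACCES is
 * error 13, "no permission") so the caller can raise a WARNING.
 */
int oom_write_all(const char *task_dir, const char *file, int value,
                  int *last_errno)
{
  *last_errno = 0;
  DIR *dir = opendir(task_dir);
  if (dir == NULL) {
    *last_errno = errno;
    return -1;
  }
  int failures = 0;
  char path[512];
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    if (entry->d_name[0] == '.')
      continue;
    snprintf(path, sizeof path, "%s/%s/%s", task_dir, entry->d_name, file);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
      failures++;
      *last_errno = errno;
      continue;
    }
    int bad = (fprintf(f, "%d", value) < 0);
    if (fclose(f) != 0)               /* buffered data reaches the kernel here */
      bad = 1;
    if (bad) {
      failures++;
      *last_errno = errno;
    }
  }
  closedir(dir);
  return failures;
}
```

As far as I know, on recent kernels oom_score_adj is stored per process and shared by all of its threads, which would be consistent with the test above where writing task/2400 also changed the main task; if so, the DISTINCT read will normally return a single value and the per-task loop mostly matters on kernels where that is not the case.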
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql when mysqld restart, it can repair all tables (myisam/innodb/aria/toku/etc...), a crash (kill signal when out of memory) at database pid (somethiing killall -9 mysqld) is more critical than a crash at application (something like killall -9 httpd) in this task we will not check others process oom_score_adj, we will only think about mysqld oom_score_adj values --- the idea is: include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES) to allow server oom_score_adj configuration, the /proc kernel interface use *oom_adj* OR *oom_score_adj* file (the second have a bigger range -1000 to 1000, instead of -17 to +15) a comment about someone that have the same problem using oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html at my.cnf file we could include: *linux-oom_adj=xxx* (a number from -17 to +15 user MUST know that we are using oom_adj or oom_score_adj) *linux-oom_score_adj=xxx* (a number from -1000 to 1000, user MUST know that we are using oom_adj or oom_score_adj) tested with kernel 4.xxx, we should use *oom_score_adj*: [179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt important notes: (from 2.6 kernel docs): {code} It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). 
{code} (from 4.xxx kernels docs): {code} The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
{code} -------------------------------------------------- when linux-oom_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_adj_* file at mysqld start when linux-oom_score_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_score_adj_* file at mysqld start -------------------------------------------------- GLOBAL VARIABLE *READING* GLOBAL VARIABLE VALUE (*SELECT @@global.linux_oom_adj*) we should report ALL *DISTINCT* values from *ALL TASKS* in the main mysqld process <pid> (i tested with child and it works too...): *linux-oom_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission to read it MUST report *NULL* *linux-oom_score_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission to read it MUST report *NULL* *SETTING* GLOBAL VARIABLE VALUE (*SET @@global.linux_oom_adj=-17*) *linux-oom_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) *linux-oom_score_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) -- TEST NOTES: using oom_adj sometimes dont change oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 changing main process sometimes change child tasks too, but it's a bit confusing, maybe change all mysqld pid is better, the problem is know what's the value to report to mysql variable if we have many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS) {code} root@rspadim-Latitude:/proc/2398/task# ls *2398* *_2400_* 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423 {code} here every oom_score_adj was =0, i'm checking only 2398 (main), and 2400 (child) in this report, but all others taks (child) 
are equal to 2400 {code} root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -100 root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -150 root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -500 {code} tested with *root* user, must check what happens (permissions) with mysql code fopen/fread/fwrite/fclose --- others opensource projects that implement it: Linux OOM.H http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8 {code:c} /* * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for * pid. */ #define OOM_SCORE_ADJ_MIN (-1000) #define OOM_SCORE_ADJ_MAX 1000 * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy * purposes. */ #define OOM_DISABLE (-17) /* inclusive */ #define OOM_ADJUST_MIN (-16) #define OOM_ADJUST_MAX 15 {code} Chromium (Google Chrome?!) 
http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc important include files http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h {code:c++} // NOTE: This is not the only version of this function in the source: // the setuid sandbox (in process_util_linux.c, in the sandbox source) // also has its own C version. bool AdjustOOMScore(ProcessId process, int score) { if (score < 0 || score > kMaxOomScore) return false; FilePath oom_path(internal::GetProcPidDir(process)); // Attempt to write the newer oom_score_adj file first. FilePath oom_file = oom_path.AppendASCII("oom_score_adj"); if (PathExists(oom_file)) { std::string score_str = IntToString(score); DVLOG(1) << "Adjusting oom_score_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } // If the oom_score_adj file doesn't exist, then we write the old // style file and translate the oom_adj score to the range 0-15. oom_file = oom_path.AppendASCII("oom_adj"); if (PathExists(oom_file)) { // Max score for the old oom_adj range. Used for conversion of new // values to old values. const int kMaxOldOomScore = 15; int converted_score = score * kMaxOldOomScore / kMaxOomScore; std::string score_str = IntToString(converted_score); DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } return false; } {code} -- maybe after we must implement something at windows source too? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory(); |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql when mysqld restart, it can repair all tables (myisam/innodb/aria/toku/etc...), a crash (kill signal when out of memory) at database pid (somethiing killall -9 mysqld) is more critical than a crash at application (something like killall -9 httpd) in this task we will not check others process oom_score_adj, we will only think about mysqld oom_score_adj values --- the idea is: include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES) to allow server oom_score_adj configuration, the /proc kernel interface use *oom_adj* OR *oom_score_adj* file (the second have a bigger range -1000 to 1000, instead of -17 to +15) a comment about someone that have the same problem using oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html at my.cnf file we could include: *linux-oom_adj=xxx* (a number from -17 to +15 user MUST know that we are using oom_adj or oom_score_adj) *linux-oom_score_adj=xxx* (a number from -1000 to 1000, user MUST know that we are using oom_adj or oom_score_adj) tested with kernel 4.xxx, we should use *oom_score_adj*: [179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt important notes: (from 2.6 kernel docs): {code} It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). 
{code} (from 4.xxx kernels docs): {code} The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
{code} -------------------------------------------------- when linux-oom_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_adj_* file at mysqld start when linux-oom_score_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_score_adj_* file at mysqld start -------------------------------------------------- GLOBAL VARIABLE *READING* GLOBAL VARIABLE VALUE (*SELECT @@global.linux_oom_adj*) we should report ALL *DISTINCT* values from *ALL TASKS* in the main mysqld process <pid> (i tested with child and it works too...): *linux-oom_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission to read it MUST report *NULL* *linux-oom_score_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission to read it MUST report *NULL* *SETTING* GLOBAL VARIABLE VALUE (*SET @@global.linux_oom_adj=-17*) *linux-oom_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) *linux-oom_score_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) -- TEST NOTES: using oom_adj sometimes dont change oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 changing main process sometimes change child tasks too, but it's a bit confusing, maybe change all mysqld pid is better, the problem is know what's the value to report to mysql variable if we have many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS) {code} root@rspadim-Latitude:/proc/2398/task# ls *2398* *_2400_* 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423 {code} here every oom_score_adj was =0, i'm checking only 2398 (main), and 2400 (child) in this report, but all others taks (child) 
are equal to 2400 {code} root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -100 root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -150 root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -500 {code} tested with *root* user, must check what happens (permissions) with mysql code fopen/fread/fwrite/fclose --- others opensource projects that implement it: Linux OOM.H http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8 {code:c} /* * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for * pid. */ #define OOM_SCORE_ADJ_MIN (-1000) #define OOM_SCORE_ADJ_MAX 1000 /* * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy * purposes. */ #define OOM_DISABLE (-17) /* inclusive */ #define OOM_ADJUST_MIN (-16) #define OOM_ADJUST_MAX 15 {code} Chromium (Google Chrome?!) 
http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc important include files http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h {code:c++} // NOTE: This is not the only version of this function in the source: // the setuid sandbox (in process_util_linux.c, in the sandbox source) // also has its own C version. bool AdjustOOMScore(ProcessId process, int score) { if (score < 0 || score > kMaxOomScore) return false; FilePath oom_path(internal::GetProcPidDir(process)); // Attempt to write the newer oom_score_adj file first. FilePath oom_file = oom_path.AppendASCII("oom_score_adj"); if (PathExists(oom_file)) { std::string score_str = IntToString(score); DVLOG(1) << "Adjusting oom_score_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } // If the oom_score_adj file doesn't exist, then we write the old // style file and translate the oom_adj score to the range 0-15. oom_file = oom_path.AppendASCII("oom_adj"); if (PathExists(oom_file)) { // Max score for the old oom_adj range. Used for conversion of new // values to old values. const int kMaxOldOomScore = 15; int converted_score = score * kMaxOldOomScore / kMaxOomScore; std::string score_str = IntToString(converted_score); DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } return false; } {code} -- maybe after we must implement something at windows source too? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory(); |
Description |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql when mysqld restart, it can repair all tables (myisam/innodb/aria/toku/etc...), a crash (kill signal when out of memory) at database pid (somethiing killall -9 mysqld) is more critical than a crash at application (something like killall -9 httpd) in this task we will not check others process oom_score_adj, we will only think about mysqld oom_score_adj values --- the idea is: include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES) to allow server oom_score_adj configuration, the /proc kernel interface use *oom_adj* OR *oom_score_adj* file (the second have a bigger range -1000 to 1000, instead of -17 to +15) a comment about someone that have the same problem using oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html at my.cnf file we could include: *linux-oom_adj=xxx* (a number from -17 to +15 user MUST know that we are using oom_adj or oom_score_adj) *linux-oom_score_adj=xxx* (a number from -1000 to 1000, user MUST know that we are using oom_adj or oom_score_adj) tested with kernel 4.xxx, we should use *oom_score_adj*: [179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt important notes: (from 2.6 kernel docs): {code} It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. *Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). 
{code} (from 4.xxx kernels docs): {code} The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
{code} -------------------------------------------------- when linux-oom_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_adj_* file at mysqld start when linux-oom_score_adj is set at my.cnf, server *MUST* write this value to *_/proc/<pid>/task/\*/oom_score_adj_* file at mysqld start -------------------------------------------------- GLOBAL VARIABLE *READING* GLOBAL VARIABLE VALUE (*SELECT @@global.linux_oom_adj*) we should report ALL *DISTINCT* values from *ALL TASKS* in the main mysqld process <pid> (i tested with child and it works too...): *linux-oom_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission to read it MUST report *NULL* *linux-oom_score_adj* will report: DISTINCT FREAD */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission to read it MUST report *NULL* *SETTING* GLOBAL VARIABLE VALUE (*SET @@global.linux_oom_adj=-17*) *linux-oom_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) *linux-oom_score_adj* will FWRITE to all files at */proc/<pid>/task/_\*_/oom_score_adj*, if not exists, or don't have permission it MUST report a WARNING message (no permission - error 13) -- TEST NOTES: using oom_adj sometimes dont change oom_score_adj automatically as docs reported, it only change if value >1 or <-1, 0/1/-1 = 0 changing main process sometimes change child tasks too, but it's a bit confusing, maybe change all mysqld pid is better, the problem is know what's the value to report to mysql variable if we have many threads (that's why we will report ALL DISTINCT VALUES FROM ALL TASKS) {code} root@rspadim-Latitude:/proc/2398/task# ls *2398* *_2400_* 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423 {code} here every oom_score_adj was =0, i'm checking only 2398 (main), and 2400 (child) in this report, but all others taks (child) 
are equal to 2400 {code} root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 200 200 root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -100 root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -150 root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -200 root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj 0 -500 {code} tested with *root* user, must check what happens (permissions) with mysql code fopen/fread/fwrite/fclose --- others opensource projects that implement it: Linux OOM.H http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8 {code:c} /* * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for * pid. */ #define OOM_SCORE_ADJ_MIN (-1000) #define OOM_SCORE_ADJ_MAX 1000 /* * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy * purposes. */ #define OOM_DISABLE (-17) /* inclusive */ #define OOM_ADJUST_MIN (-16) #define OOM_ADJUST_MAX 15 {code} Chromium (Google Chrome?!) 
http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc important include files http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h {code:c++} // NOTE: This is not the only version of this function in the source: // the setuid sandbox (in process_util_linux.c, in the sandbox source) // also has its own C version. bool AdjustOOMScore(ProcessId process, int score) { if (score < 0 || score > kMaxOomScore) return false; FilePath oom_path(internal::GetProcPidDir(process)); // Attempt to write the newer oom_score_adj file first. FilePath oom_file = oom_path.AppendASCII("oom_score_adj"); if (PathExists(oom_file)) { std::string score_str = IntToString(score); DVLOG(1) << "Adjusting oom_score_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } // If the oom_score_adj file doesn't exist, then we write the old // style file and translate the oom_adj score to the range 0-15. oom_file = oom_path.AppendASCII("oom_adj"); if (PathExists(oom_file)) { // Max score for the old oom_adj range. Used for conversion of new // values to old values. const int kMaxOldOomScore = 15; int converted_score = score * kMaxOldOomScore / kMaxOomScore; std::string score_str = IntToString(converted_score); DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str; int score_len = static_cast<int>(score_str.length()); return (score_len == WriteFile(oom_file, score_str.c_str(), score_len)); } return false; } {code} -- maybe after we must implement something at windows source too? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory(); |
Hi guys We have a big problem when system is Out Of Memory running a apache+mysql server, i prefer that kernel kill the apache server instead of mysql when mysqld restart, it can repair all tables (myisam/innodb/aria/toku/etc...), a crash (kill signal when out of memory) at database pid (somethiing killall -9 mysqld) is more critical (consume more time to repair / can loose important data) than a crash at application (only static/cached/transaction data) (something like killall -9 httpd) in this task we will not check others process oom_score_adj, we will only think about mysqld oom_score_adj values --- the idea is: include a option at my.cnf / mysqld / mysqld_multi / mysqld_safe, and a global variable (SHOW VARIABLES) to allow server oom_score_adj configuration, the /proc kernel interface use *oom_adj* OR *oom_score_adj* file (the second have a bigger range -1000 to 1000, instead of -17 to +15) a comment about someone that have the same problem using oracle server: http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html at my.cnf file we could include: *linux-oom_adj=xxx* (a number from -17 to +15 user MUST know that we are using oom_adj or oom_score_adj) *linux-oom_score_adj=xxx* (a number from -1000 to 1000, user MUST know that we are using oom_adj or oom_score_adj) tested with kernel 4.xxx, we should use *oom_score_adj*: [179443.053386] bash (19390): */proc/2398/oom_adj is deprecated*, please use /proc/2398/oom_score_adj instead. https://www.kernel.org/doc/Documentation/filesystems/proc.txt important notes: (from 2.6 kernel docs): {code} It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. 
*Any particular process leader may be immunized against the oom killer* if the value of its /proc/<pid>/oomadj is set to the constant OOM_DISABLE (currently defined as *-17*). {code} (from 4.xxx kernels docs): {code} The badness heuristic assigns a value to each candidate task ranging from *0 (never kill) to 1000 (always kill)* to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from *-1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX)*. This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, *-1000, is equivalent to disabling oom killing entirely for that task* since it will always report a badness score of 0. Caveat: when a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. 
{code}
--------------------------------------------------
When *linux-oom_adj* is set in my.cnf, the server *MUST* write this value to *_/proc/<pid>/task/\*/oom_adj_* at mysqld startup.
When *linux-oom_score_adj* is set in my.cnf, the server *MUST* write this value to *_/proc/<pid>/task/\*/oom_score_adj_* at mysqld startup.
--------------------------------------------------
GLOBAL VARIABLE

*READING* the global variable value (*SELECT @@global.linux_oom_adj*): we should report ALL *DISTINCT* values from *ALL TASKS* of the main mysqld process <pid> (I tested with a child task and it works too):
*linux-oom_adj* will report the DISTINCT values read (FREAD) from */proc/<pid>/task/_\*_/oom_adj*; if the file does not exist or cannot be read (no permission), it MUST report *NULL*.
*linux-oom_score_adj* will report the DISTINCT values read (FREAD) from */proc/<pid>/task/_\*_/oom_score_adj*; if the file does not exist or cannot be read (no permission), it MUST report *NULL*.

*SETTING* the global variable value (*SET @@global.linux_oom_adj=-17*):
*linux-oom_adj* will write (FWRITE) the value to every */proc/<pid>/task/_\*_/oom_adj* file; if a file does not exist or cannot be written (no permission), it MUST report a WARNING message (no permission - error 13).
*linux-oom_score_adj* will write (FWRITE) the value to every */proc/<pid>/task/_\*_/oom_score_adj* file; if a file does not exist or cannot be written (no permission), it MUST report a WARNING message (no permission - error 13).

--
TEST NOTES:
Writing oom_adj does not always update oom_score_adj automatically as the docs say it should: it only changes when the value is >1 or <-1 (0, 1 and -1 all give 0).
Changing the main process sometimes changes the child tasks too, but this is a bit confusing; changing every task of the mysqld pid may be better. The problem is knowing which value to report in the MySQL variable when there are many threads (that is why we will report ALL DISTINCT VALUES FROM ALL TASKS).
{code}
root@rspadim-Latitude:/proc/2398/task# ls
2398 2400 2402 2403 2404 2406 2413 2414 2415 2416 2417 2418 2419 2420 2423
{code}
Here every oom_score_adj was initially 0. In this report I check only 2398 (main) and 2400 (child), but all other tasks (children) show the same values as 2400:
{code}
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -200 > task/2400/oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n 200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
200
200
root@rspadim-Latitude:/proc/2398# echo -n -100 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-100
root@rspadim-Latitude:/proc/2398# echo -n -150 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-150
root@rspadim-Latitude:/proc/2398# echo -n -200 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-200
root@rspadim-Latitude:/proc/2398# echo -n -500 > oom_score_adj; cat oom_score; cat task/2400/oom_score_adj
0
-500
{code}
Tested as the *root* user; we must check what happens (permissions) when the mysqld code uses fopen/fread/fwrite/fclose.
---
Other open-source projects that implement it:

Linux oom.h: http://lxr.free-electrons.com/source/include/uapi/linux/oom.h#L8
{code:c}
/*
 * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
 * pid.
 */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy
 * purposes.
 */
#define OOM_DISABLE (-17)
/* inclusive */
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15
{code}
Chromium (Google Chrome?!): http://src.chromium.org/chrome/trunk/src/base/process/memory_linux.cc
Important include files:
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.cc
http://src.chromium.org/chrome/trunk/src/base/process/internal_linux.h
{code:c++}
// NOTE: This is not the only version of this function in the source:
// the setuid sandbox (in process_util_linux.c, in the sandbox source)
// also has its own C version.
bool AdjustOOMScore(ProcessId process, int score) {
  if (score < 0 || score > kMaxOomScore)
    return false;

  FilePath oom_path(internal::GetProcPidDir(process));

  // Attempt to write the newer oom_score_adj file first.
  FilePath oom_file = oom_path.AppendASCII("oom_score_adj");
  if (PathExists(oom_file)) {
    std::string score_str = IntToString(score);
    DVLOG(1) << "Adjusting oom_score_adj of " << process << " to " << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  // If the oom_score_adj file doesn't exist, then we write the old
  // style file and translate the oom_adj score to the range 0-15.
  oom_file = oom_path.AppendASCII("oom_adj");
  if (PathExists(oom_file)) {
    // Max score for the old oom_adj range. Used for conversion of new
    // values to old values.
    const int kMaxOldOomScore = 15;

    int converted_score = score * kMaxOldOomScore / kMaxOomScore;
    std::string score_str = IntToString(converted_score);
    DVLOG(1) << "Adjusting oom_adj of " << process << " to " << score_str;
    int score_len = static_cast<int>(score_str.length());
    return (score_len == WriteFile(oom_file, score_str.c_str(), score_len));
  }

  return false;
}
{code}
--
Maybe later we should implement something in the Windows source too? http://src.chromium.org/chrome/trunk/src/base/process/memory_win.cc - void OnNoMemory(); / void EnableTerminationOnOutOfMemory(); |
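A minimal C sketch of the startup rule and the *SETTING* rule above, assuming POSIX opendir/readdir and plain open/write. The helper names oom_write_file and oom_write_all_tasks are hypothetical, not existing MariaDB code. It writes one value to the chosen file (oom_adj or oom_score_adj) of every task under a task directory such as /proc/self/task, and returns the errno of the first failure so the caller can turn EACCES (13) or ENOENT into the WARNING the proposal asks for.

{code:c}
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `value` as decimal text to one /proc-style file.
   Returns 0 on success or an errno value (EACCES = 13 without privileges,
   ENOENT when the file does not exist on this kernel). */
int oom_write_file(const char *path, int value)
{
  char buf[16];
  int len = snprintf(buf, sizeof buf, "%d", value);

  /* O_TRUNC mirrors the shell's "echo -n N > file"; there is no O_CREAT,
     so a missing file is reported instead of being created. */
  int fd = open(path, O_WRONLY | O_TRUNC);
  if (fd < 0)
    return errno;

  int err = 0;
  ssize_t n = write(fd, buf, (size_t) len);
  if (n < 0)
    err = errno;
  else if (n != len)
    err = EIO;                         /* short write */
  if (close(fd) != 0 && err == 0)
    err = errno;
  return err;
}

/* Hypothetical helper: write `value` to <task_dir>/<tid>/<file> for every
   task, e.g. task_dir = "/proc/self/task", file = "oom_score_adj".
   Returns 0 if every write succeeded, otherwise the errno of the first
   failure (the caller would report it as a WARNING). */
int oom_write_all_tasks(const char *task_dir, const char *file, int value)
{
  DIR *dir = opendir(task_dir);
  if (dir == NULL)
    return errno;

  int first_err = 0;
  struct dirent *de;
  while ((de = readdir(dir)) != NULL) {
    if (de->d_name[0] == '.')          /* skip "." and ".." */
      continue;
    char path[4096];
    snprintf(path, sizeof path, "%s/%s/%s", task_dir, de->d_name, file);
    int err = oom_write_file(path, value);
    if (err != 0 && first_err == 0)
      first_err = err;
  }
  closedir(dir);
  return first_err;
}
{code}

Per the kernel proc documentation linked in this issue, oom_score_adj cannot be lowered below the last value set by a CAP_SYS_RESOURCE process without that capability, so an unprivileged mysqld calling oom_write_all_tasks("/proc/self/task", "oom_score_adj", -500) will typically get EACCES (13) and should emit the warning rather than fail startup.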
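The *READING* rule above can be sketched the same way (again hypothetical helpers, not MariaDB code): read every task's file, keep the distinct values, and return -1, meaning SQL NULL, if the directory or any file is missing or unreadable. The root session in the test notes suggests the value is shared by the whole process (writing task/2400 also changed the main task's oom_score), in which case the distinct set would normally hold a single value.

{code:c}
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: parse one integer from a /proc-style file.
   Returns 0 on success, -1 if the file is missing, unreadable or malformed. */
int oom_read_file(const char *path, long *out)
{
  char buf[32];
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  ssize_t n = read(fd, buf, sizeof buf - 1);
  close(fd);
  if (n <= 0)
    return -1;
  buf[n] = '\0';

  char *end;
  errno = 0;
  long v = strtol(buf, &end, 10);      /* a trailing newline is ignored */
  if (end == buf || errno != 0)
    return -1;
  *out = v;
  return 0;
}

/* Collect the DISTINCT values of <task_dir>/<tid>/<file> over all tasks into
   out[] (values beyond max_out are dropped). Returns the number of distinct
   values, or -1 when the directory or any task's file cannot be read; the
   proposal maps -1 to NULL. A thread exiting mid-scan also yields -1 here. */
int oom_read_distinct(const char *task_dir, const char *file,
                      long *out, int max_out)
{
  DIR *dir = opendir(task_dir);
  if (dir == NULL)
    return -1;

  int count = 0;
  struct dirent *de;
  while ((de = readdir(dir)) != NULL) {
    if (de->d_name[0] == '.')
      continue;
    char path[4096];
    long v;
    snprintf(path, sizeof path, "%s/%s/%s", task_dir, de->d_name, file);
    if (oom_read_file(path, &v) != 0) {
      closedir(dir);
      return -1;
    }
    int seen = 0;
    for (int i = 0; i < count && !seen; i++)
      seen = (out[i] == v);
    if (!seen && count < max_out)
      out[count++] = v;
  }
  closedir(dir);
  return count;
}
{code}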
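The 0/1/-1 observation in the TEST NOTES is what one would expect if the kernel converts between the two files by plain integer scaling and the value is read back through oom_adj. The formulas below are an assumption (my understanding of fs/proc/base.c in recent kernels, not something quoted in this issue) and should be checked against the target kernel. The sketch reuses the oom.h constants quoted above and adds range checks for the two proposed my.cnf options.

{code:c}
/* Constants as quoted above from include/uapi/linux/oom.h. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000
#define OOM_DISABLE (-17)
#define OOM_ADJUST_MAX 15

/* ASSUMED kernel scaling when oom_adj is written: -17..15 onto -1000..1000,
   with 15 mapped exactly to 1000. C integer division truncates toward 0. */
int oom_adj_to_score_adj(int oom_adj)
{
  if (oom_adj == OOM_ADJUST_MAX)
    return OOM_SCORE_ADJ_MAX;
  return oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;
}

/* ASSUMED kernel scaling when oom_adj is read back from oom_score_adj. */
int score_adj_to_oom_adj(int score_adj)
{
  if (score_adj == OOM_SCORE_ADJ_MAX)
    return OOM_ADJUST_MAX;
  return score_adj * -OOM_DISABLE / OOM_SCORE_ADJ_MAX;
}

/* Range checks for the proposed linux-oom_adj / linux-oom_score_adj. */
int oom_adj_valid(int v)
{
  return v >= OOM_DISABLE && v <= OOM_ADJUST_MAX;
}

int oom_score_adj_valid(int v)
{
  return v >= OOM_SCORE_ADJ_MIN && v <= OOM_SCORE_ADJ_MAX;
}
{code}

With these formulas, oom_adj 1 becomes oom_score_adj 58, which reads back as oom_adj 0; oom_adj 2 becomes 117 and reads back as 1; -17 (OOM_DISABLE) maps to -1000 and back to -17.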
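Chromium's AdjustOOMScore above rejects negative scores (score < 0 returns false), so it can only make a process a more likely victim, while this issue needs negative values to protect mysqld. The hypothetical variant below keeps Chromium's shape (prefer oom_score_adj, fall back to oom_adj only when the newer file does not exist) but accepts -1000..1000. The legacy scaling by 17/1000, clamped to 15, is an assumption chosen so that -1000 lands on OOM_DISABLE (-17); it is not Chromium's 15/1000. Note that the proposal itself keeps two explicit options so the user knows which file is used; this only illustrates the Chromium-style alternative.

{code:c}
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write `value` as decimal text to `path`; returns 0 or an errno value. */
static int write_int_file(const char *path, int value)
{
  char buf[16];
  int len = snprintf(buf, sizeof buf, "%d", value);
  int fd = open(path, O_WRONLY | O_TRUNC);
  if (fd < 0)
    return errno;
  ssize_t n = write(fd, buf, (size_t) len);
  int err = n < 0 ? errno : (n != len ? EIO : 0);
  if (close(fd) != 0 && err == 0)
    err = errno;
  return err;
}

/* Map -1000..1000 onto the legacy -17..15 oom_adj range (assumed scaling,
   clamped because e.g. 999 * 17 / 1000 = 16 would exceed 15). */
int score_adj_to_legacy(int score_adj)
{
  int legacy = score_adj * 17 / 1000;
  return legacy > 15 ? 15 : legacy;
}

/* Hypothetical: prefer <pid_dir>/oom_score_adj, fall back to
   <pid_dir>/oom_adj only if the newer file does not exist. Negative values
   are accepted, unlike in Chromium's AdjustOOMScore. */
int oom_set_preference(const char *pid_dir, int score_adj)
{
  char path[4096];
  if (score_adj < -1000 || score_adj > 1000)
    return EINVAL;

  snprintf(path, sizeof path, "%s/oom_score_adj", pid_dir);
  int err = write_int_file(path, score_adj);
  if (err != ENOENT)
    return err;                        /* success, or e.g. EACCES (13) */

  snprintf(path, sizeof path, "%s/oom_adj", pid_dir);
  return write_int_file(path, score_adj_to_legacy(score_adj));
}
{code}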
Priority | Critical [ 2 ] | Minor [ 4 ] |
Labels | need_feedback |
Workflow | MariaDB v3 [ 73121 ] | MariaDB v4 [ 130420 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Labels | need_feedback |
Fix Version/s | N/A [ 14700 ] | |
Resolution | Incomplete [ 4 ] | |
Status | Needs Feedback [ 10501 ] | Closed [ 6 ] |
Updated the systemd documentation to mention OOMScoreAdjust: https://mariadb.com/kb/en/mariadb/systemd/
Not really critical, as over-allocation is a user problem, dealt with by using dedicated memory per process, monitoring (https://mariadb.com/kb/en/mariadb/memory-instrumentaion/), or good planning. innodb_buffer_pool_populate can be used to detect obvious allocation problems.