Details
- Type: Bug
- Status: Needs Feedback
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 11.7.2, 11.8.3
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
Hi,

Since upgrading the nodes of my Galera cluster to MariaDB 11.7 (from 10.something), SQL statements fail with the error "Local temporary space limit reached" at least once per week. Upgrading to MariaDB 11.8 did not solve the issue.

The error seems to happen at random, except that once it occurs, several unrelated statements are affected in a row. Users of the frontend usually get this error for about 10 seconds and then everything starts working again. This morning, however, the error kept happening for more than one minute on a very lightly loaded server.
I did my homework. The error appears to be triggered by the following code in function temp_file_size_cb_func() of mysqld.cc:
    if (thd->status_var.tmp_space_used + size_change >
        thd->variables.max_tmp_space_usage && !no_error &&
        thd->variables.max_tmp_space_usage)
    {
      global_tmp_space_used-= size_change;
      error= EE_LOCAL_TMP_SPACE_FULL;
      my_errno= ENOSPC;
      goto exit;
    }
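To reason about when this check can fire, here is a minimal Python sketch of the condition above. This is an illustrative model, not MariaDB code; the function name and the underflow scenario are assumptions. It shows that with the 1 TiB default, a legitimate trigger requires the per-session counter to actually approach 1 TiB, so a spurious trigger would suggest the counter itself is wrong (for example, an unsigned 64-bit underflow after a missed or double-counted decrement):

```python
# Hypothetical model of the limit check in temp_file_size_cb_func().
# All names here are illustrative assumptions, not MariaDB internals.
MAX_TMP_SPACE_USAGE = 1099511627776  # default max_tmp_session_space_usage: 1 TiB

def limit_reached(tmp_space_used, size_change, no_error=False):
    # Mirrors: tmp_space_used + size_change > max_tmp_space_usage
    #          && !no_error && max_tmp_space_usage
    return (tmp_space_used + size_change > MAX_TMP_SPACE_USAGE
            and not no_error
            and MAX_TMP_SPACE_USAGE != 0)

# Normal usage far below 1 TiB never trips the check:
print(limit_reached(10 * 1024**3, 1024**2))   # 10 GiB + 1 MiB -> False

# But if the counter were an unsigned 64-bit value that underflowed
# (0 - 4096 wraps to nearly 2**64), the check would fire immediately:
underflowed = (1 << 64) - 4096
print(limit_reached(underflowed, 4096))       # True
```

If something like the second case is what is happening, the error would indeed appear at "random" and clear once the affected session ends, which matches the observed behaviour.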
max_tmp_space_usage is initialized from the max_tmp_session_space_usage system variable. Its current value is the default, 1024 GiB:
> show variables like "max_tmp_session_space_usage";
+-----------------------------+---------------+
| Variable_name               | Value         |
+-----------------------------+---------------+
| max_tmp_session_space_usage | 1099511627776 |
+-----------------------------+---------------+
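As a quick sanity check, the value shown does correspond to 1024 GiB:

```python
# Verify the default max_tmp_session_space_usage value in bytes.
value = 1099511627776
print(value == 1024 * 1024**3)  # True: exactly 1024 GiB, i.e. 1 TiB
print(value / 1024**3)          # 1024.0
```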
My servers don't have that much free disk space (or RAM, for that matter) 🙂. So the condition should never be fulfilled: the server would run out of disk space first.
I'm a bit reluctant to set the value to zero to disable the error check entirely, because if MariaDB really does try to write that many bytes, the server will be in trouble for much longer.
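Rather than disabling the check, one middle ground would be to set a limit that reflects the space actually available. Here is a hypothetical helper sketching that idea; the function name, the default tmpdir path, and the 50% fraction are all assumptions, not an official recommendation:

```python
# Hypothetical helper (an assumption, not an official recommendation):
# derive a realistic temporary-space limit from the free space on the
# filesystem that holds MariaDB's tmpdir, instead of disabling the check.
import shutil

def suggested_tmp_limit(tmpdir="/tmp", fraction=0.5):
    """Return `fraction` of the free space on `tmpdir`, in bytes."""
    free = shutil.disk_usage(tmpdir).free
    return int(free * fraction)

# The resulting value could then be applied with, e.g.:
#   SET GLOBAL max_tmp_session_space_usage = <value>;
print(suggested_tmp_limit())
```

With a limit sized to the real disk, a genuine runaway query would still be stopped, while the current 1 TiB default can never fire legitimately on this hardware.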
Neither error.log nor the Linux journal (journalctl) contains any related errors. I don't know which of the three Galera nodes is reporting the error.

One of the Galera nodes is the master of an asynchronous replication. The slave never lags behind the master: the lag remains at 0 s according to the Site24x7 agent installed on the slave.
Could you please tell me what might produce this error, or advise how to figure that out?