[MDEV-24393] table_open_cache creates same number of advisory /proc/locks. Is it worth it? Created: 2020-12-11  Updated: 2023-07-04  Resolved: 2021-07-27

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - InnoDB
Affects Version/s: 10.5.8, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4

Type: Bug Priority: Minor
Reporter: naox Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-28495 Corruption due to attempting to start... Closed
causes MDEV-28662 [ERROR] mysqld got signal 11 after up... Closed
causes MDEV-28697 One Node crashed and did not recover Closed
causes MDEV-28976 InnoDB: Missing FILE_CHECKPOINT Closed
Relates
relates to MDEV-31568 innodb protection against dual proces... Closed

 Description   

MariaDB table_open_cache generates same and excessive number of advisory /proc/locks that causes OTHER programs on machine to waste cpu power ("sys load") when checking for locks. Are those really needed/worth it?
Executing lsof command while there are 100k locks in /proc/locks (made by mariadb with table_open_cache=100k) takes 2 minutes.
I see huge decrease in "sys" load after reducing table_open_cache from 100k to 4k on graphs.



 Comments   
Comment by Daniel Black [ 2020-12-11 ]

While its parially obvious the section of locks here are all mariadb lock (same pid - 28588, The inode of each entry seems unique (14......). Do you have https://mariadb.com/kb/en/innodb-file-per-table-tablespaces/ ? How many table_open_cache records are being used? Is there lots of temporary files open?

Its not obvious that the kernel locks are the issue. With kernel and mariadb debug symbols installed, can you use `perf record -a -g – sleep 5`, can you show with `perf report-g --no-children --stdio -` what sys time is being used? If you look at `perf top` where is CPU time being spend?

Have you eliminated other more obvious causes like slow queries, maybe a query plan of common queries has changed in the version upgrade from previous major version.

Comment by naox [ 2020-12-11 ]

Afer further investigation I've completly changed ticket contents so need_feedback flag is obsolete

Comment by Daniel Black [ 2020-12-11 ]

If you have a distro kernel there will be a package of kernel debug info symbols available for install. Packages mariadb versions also have debug symbol packages available. ref: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#installing-debug-info-packages-on-linux . You will not been to recompile mariadb or kernel (unless you have already and aren't using packages).

While I see locks, and I believe you see sys usage in load, there are many other changes between 10.3 and 10.5 and the cause may be something else. Please try to get some perf data to identify the cause of where CPU time is spent.

Comment by naox [ 2020-12-11 ]

I'll rephrase:
MariaDB table_open_cache generates same and excessive number of advisory /proc/locks that causes OTHER programs on machine to waste cpu power ("sys load") when checking for locks. Are those really needed/worth it?
Executing lsof command while there are 100k locks in /proc/locks (made by mariadb with table_open_cache=100k) takes 2 minutes.
I see huge decrease in "sys" load after reducing table_open_cache from 100k to 4k on graphs.

Comment by Daniel Black [ 2020-12-14 ]

Thanks for the rephrase. The advisory locks are pretty good at identifying and protecting duplicate mariadb processes from running on the same data.

If both the advisory locks and the other tools are needed in their current form, I suggest we get some test cases together to take to the linux kernel people to get help resolving the usage (https://lore.kernel.org/linux-fsdevel/).

Comment by Sergei Golubchik [ 2020-12-14 ]

as far as I understand, advisory locking should not be used by default. You'd need to do something like --skip-external-locking=0 to make MariaDB server to use advisory locking.

Comment by naox [ 2020-12-15 ]

I did not modify default settings of external-locking and still mariadb creates advisory locks in /proc/locks up to number of table_open_cache

mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 2152674
Server version: 10.5.8-MariaDB MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> show global variables like '%skip_external_locking%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| skip_external_locking | ON    |
+-----------------------+-------+
1 row in set (0.001 sec)
MariaDB [(none)]> show global variables like '%table_open_cache%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| table_open_cache           | 500   |
| table_open_cache_instances | 8     |
+----------------------------+-------+
2 rows in set (0.001 sec)
 
pidof mysqld
7108
 
cat /proc/locks|grep '7108'|wc -l
501
 
cat /proc/locks|grep '7108
 
1: POSIX  ADVISORY  WRITE 7108 09:03:14320733 0 EOF
2: POSIX  ADVISORY  WRITE 7108 09:03:14320749 0 EOF
3: POSIX  ADVISORY  WRITE 7108 09:03:14320731 0 EOF
4: POSIX  ADVISORY  WRITE 7108 09:03:14320735 0 EOF
5: POSIX  ADVISORY  WRITE 7108 09:03:14320729 0 EOF
6: POSIX  ADVISORY  WRITE 7108 09:03:14320969 0 EOF
7: POSIX  ADVISORY  WRITE 7108 09:03:14466387 0 EOF
8: POSIX  ADVISORY  WRITE 7108 09:03:14466088 0 EOF
9: POSIX  ADVISORY  WRITE 7108 09:03:14466086 0 EOF
10: POSIX  ADVISORY  WRITE 7108 09:03:14321007 0 EOF
...

Comment by Sergei Golubchik [ 2020-12-18 ]

InnoDB also uses advisory locks, but only when it creates new files. Do you constantly create new InnoDB tables?
What kind of load do you have?

Comment by naox [ 2020-12-18 ]

Hosting kind of load (not fully known) so its not out of the question but I doubt it since I've observed same issue on every mariadb server out of 7 servers.

Comment by Elena Stepanova [ 2021-01-16 ]

If I understand the complaint correctly, it's reproducible rather easily as long as InnoDB is involved on MariaDB 10.2+ (at least) and MySQL 5.7, 8.0 (at least).

The raw test below creates 500 tables (more than table_open_cache allows) and checks the number of locks belonging to the MariaDB server in /proc/locks at different points.
The test doesn't show it, but all of them are advisory locks.

--source include/have_innodb.inc
 
create database db;
use db;
select @@skip_external_locking, @@table_open_cache;
 
--let $pidfile= `select @@pid_file`
 
--echo # /proc/locks before creating tables
--exec sh -c "grep -c `cat $pidfile` /proc/locks"
 
--let $tablenum= 500
--let $n= 1
--disable_query_log
while ($n <= $tablenum) 
{
  --eval create table t$n (a int) engine=InnoDB
  --inc $n
}
--enable_query_log
 
--echo # /proc/locks after creating tables
--exec sh -c "grep -c `cat $pidfile` /proc/locks"
flush tables;
 
--echo # /proc/locks after flush
--exec sh -c "grep -c `cat $pidfile` /proc/locks"
 
--source include/restart_mysqld.inc
 
--echo # /proc/locks after server restart
--exec sh -c "grep -c `cat $pidfile` /proc/locks"
--let $n= 1
 
--disable_query_log
--disable_result_log
while ($n <= $tablenum)
{
  --eval select * from t$n
  --inc $n
}
--enable_query_log
--enable_result_log
 
--echo # /proc/locks after opening tables
--exec sh -c "grep -c `cat $pidfile` /proc/locks"
 
drop database db;
 
--echo # /proc/locks after dropping tables
--exec sh -c "grep -c `cat $pidfile` /proc/locks"

10.5 a5b54f78

select @@skip_external_locking, @@table_open_cache;
@@skip_external_locking	@@table_open_cache
1	421
# /proc/locks before creating tables
6
# /proc/locks after creating tables
421
flush tables;
# /proc/locks after flush
421
# restart
# /proc/locks after server restart
22
# /proc/locks after opening tables
422
drop database db;
# /proc/locks after dropping tables
5

It doesn't happen with MyISAM:

Test above but with MyISAM instead of InnoDB as a table engine

select @@skip_external_locking, @@table_open_cache;
@@skip_external_locking	@@table_open_cache
1	421
# /proc/locks before creating tables
6
# /proc/locks after creating tables
6
flush tables;
# /proc/locks after flush
6
# restart
# /proc/locks after server restart
6
# /proc/locks after opening tables
6
drop database db;
# /proc/locks after dropping tables
6

Comment by Sergei Golubchik [ 2021-01-28 ]

marko, why InnoDB takes these advisory locks?

May be it could look at my_disable_locking variable (server's --skip-external-locking that disables advisory locks in MyISAM, Aria, and generally in the server), and do the same?

Comment by Marko Mäkelä [ 2021-04-26 ]

InnoDB acquires advisory locks in order to prevent data corruption due to misconfiguration (starting multiple instances of the server on the same data files).

This protection mechanism has proven useful from time to time, if you search for the message in the Internet:

Check that you do not already have another mysqld process using the same InnoDB data or log files.

I suppose that we could disable os_file_lock() for users who claim that they know what they are doing.

Comment by Sergei Golubchik [ 2021-04-28 ]

Yes, that's what I suggested above. MyISAM/Aria also take advisory locks to prevent data corruption due to misconfiguration (starting multiple instances of the server on the same data files).

This protection is disabled if the server is started with --skip-external-locking.

It would be logical if InnoDB would also obey this option.

Comment by naox [ 2021-04-28 ]

I'm just reminding that huge number of locks is only present if table_open_cache is set to high number (100k). When I've lowered it problem went away. So it can't be locks used for preventing multiple mysqld instances working on same datadir.
Huge number of locks hurts performance of other software running on same machine if it checks for those regulary (seen as "sys" load). So having table_open_cache set to high number currently adversely affect perfomrnace if machine is running anything other than mariadb.

Comment by Marko Mäkelä [ 2021-07-23 ]

serg, I applied the change that you suggested. Please review.
Note: Contrary to your claim, the InnoDB function that creates and opens files is named in a similar way to the Microsoft Windows function CreateFile(). Thus, the advisory locking affects all InnoDB files, not only newly created tables.

I ran the test that elenst helpfully provided. With the following invocation, I am seeing 422 entries in /proc/locks; without the extra parameter, there would be just 1 entry.

/mtr --mysqld=--external-locking innodb.locks

I would not add the test to the regression test suite.
The 1 offending entry comes from the Aria storage engine:

10.2 173e562dc2bd339de32d17de73b720e7ca863ff2

#1  0x000055555624aa82 in my_lock (fd=13, locktype=locktype@entry=1, 
    start=start@entry=0, length=length@entry=0, MyFlags=MyFlags@entry=416)
    at /mariadb/10.2o/mysys/my_lock.c:175
#2  0x0000555556123686 in lock_control_file (
    name=name@entry=0x7fffffffd5b0 "/dev/shm/10.2/mysql-test/var/mysqld.1/data/aria_log_control") at /mariadb/10.2o/storage/maria/ma_control_file.c:235
#3  0x000055555612321e in ma_control_file_open (
    create_if_missing=<optimized out>, print_error=1 '\001')
    at /mariadb/10.2o/storage/maria/ma_control_file.c:314
#4  0x00005555561207ae in ha_maria_init (p=<optimized out>)
    at /mariadb/10.2o/storage/maria/ha_maria.cc:3603

Comment by Sergei Golubchik [ 2021-07-26 ]

I think I'd prefer InnoDB to invoke my_lock() for locking instead. But your patch is ok too.

Comment by Marko Mäkelä [ 2021-07-27 ]

my_lock() does more than we might want in low-level InnoDB code.

Note: the default behavior will be changed. Because --skip-external-locking is activated by default, InnoDB will no longer acquire advisory file locks.

Generated at Thu Feb 08 09:29:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.