[MDEV-16790] Server crashes on 'show table status' Created: 2018-07-20  Updated: 2018-09-09  Resolved: 2018-09-09

Status: Closed
Project: MariaDB Server
Component/s: Admin statements, Storage Engine - TokuDB
Affects Version/s: 10.1.32
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jonathan Levin Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Ubuntu 16.04.4 LTS


Attachments: error.log, serverparm.txt

 Description   

Hello,

We seem to have a strange issue where running 'show table status' takes 3-4 minutes and then crashes the server.
The server is used largely with TokuDB with quite a few partitions.
We have had other errors with too many open files, but those are intermittent and I am not sure if they are related.

I have attached the error log during the crash and the server parameters in the attachment to this ticket.



 Comments   
Comment by Elena Stepanova [ 2018-07-20 ]

There is no configuration for TokuDB in serverparm.txt – do you really use all defaults, or do you have it in a separate file maybe?
I've found an upstream bug, https://jira.percona.com/browse/PS-4307, with the same assertion failure; there they claim it's related to the tokudb-max-lock-memory value.

Comment by Jonathan Levin [ 2018-07-21 ]

The tokudb_max_lock_memory defaults to 4294967296.
This is higher than the memory on the machine. Are you suggesting I tune it down to fit in the machine's 64Gb?
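For reference, the quoted value can be converted from bytes to GiB with plain shell arithmetic. This is only a unit check of the number given above, not a statement about what the server's actual default is; the variable names are illustrative.

```shell
# Convert the quoted tokudb_max_lock_memory value from bytes to GiB
# (1 GiB = 1024 * 1024 * 1024 bytes).
bytes=4294967296
gib=$((bytes / 1024 / 1024 / 1024))
echo "${gib} GiB"
```

The value works out to 4 GiB, which is below the 64 GB of RAM mentioned for this machine.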

Comment by Jonathan Levin [ 2018-07-21 ]

I've changed the following two settings:
table_open_cache = 6096;
table_definition_cache = 6096;

Now, when I run 'show table status' it does not crash, but I do get the following output for some of the tables in the output.

| a4120_db           | NULL   |    NULL | NULL       |      NULL |           NULL |         NULL |                NULL |         NULL |                 NULL |           NULL | NULL                | NULL                | NULL       | NULL              |     NULL | NULL                                             | Failed to read from the .par file                                                                                |
| a4123_db           | NULL   |    NULL | NULL       |      NULL |           NULL |         NULL |                NULL |         NULL |                 NULL |           NULL | NULL                | NULL                | NULL       | NULL              |     NULL | NULL                                             | Failed to read from the .par file                                                                                |
| db_failed_template | NULL   |    NULL | NULL       |      NULL |           NULL |         NULL |                NULL |         NULL |                 NULL |           NULL | NULL                | NULL                | NULL       | NULL              |     NULL | NULL                                             | Failed to read from the .par file                                                                                |
| db_template        | NULL   |    NULL | NULL       |      NULL |           NULL |         NULL |                NULL |         NULL |                 NULL |           NULL | NULL                | NULL                | NULL       | NULL              |     NULL | NULL                                             | Failed to read from the .par file                                                                                |
| engine2.old | NULL   |    NULL | NULL       |      NULL |           NULL |         NULL |                NULL |         NULL |                 NULL |           NULL | NULL                | NULL                | NULL       | NULL              |     NULL | NULL                                             | Got error 24 "Too many open files" from storage engine TokuDB                               

Comment by Elena Stepanova [ 2018-07-21 ]

"Are you suggesting I tune it down to fit in the machine's 64Gb?"

No, I wasn't suggesting that (yet), I was just asking about your TokuDB configuration, since I couldn't find it in the attached file.

"Now, when I run 'show table status' it does not crash, but I do get the following output for some of the tables in the output."

Open tables mean open files, so the values are related to ulimit -n.

Comment by Jonathan Levin [ 2018-07-21 ]

Also, it did crash a few moments after that show table status command.

Comment by Jonathan Levin [ 2018-07-21 ]

root@db:~# sudo su - mysql
mysql@db:~$ ulimit -n
1024000

but the open-files-limit is set to 10,000 and, for some reason, I cannot seem to change it.

Comment by Jonathan Levin [ 2018-07-21 ]

There are no tokudb settings in the my.cnf file.

Comment by Elena Stepanova [ 2018-08-04 ]

Stricter ulimit -n might be set by service configuration or whatever startup scripts are used to handle the server. Please check them. This effect

| engine2.old | NULL   |  <...> | NULL                                             | Got error 24 "Too many open files" from storage engine TokuDB                               

is very easily reproducible with insufficient ulimit -n.

Further,

| a4120_db           | NULL   |    <...> | NULL                                             | Failed to read from the .par file |

it is unclear whether this error is related to any of this or not. Do the files exist? Can they be corrupt? Does it work when you run show table status for only one of the tables?
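Since a stricter limit may be applied by the service configuration, one way to see what the running server actually received is to read the kernel's view of the process. A minimal sketch, assuming Linux with /proc mounted; the function names are hypothetical helpers, and `pidof mysqld` in the comment assumes the server binary is named mysqld.

```shell
# Print the soft and hard "Max open files" limits of a process, given its PID.
nofile_limits_of() {
    awk '/^Max open files/ { print "soft=" $4 " hard=" $5 }' "/proc/$1/limits"
}

# Count the file descriptors the process currently holds open.
# Reading another user's /proc/PID/fd usually requires root.
open_fd_count_of() {
    ls "/proc/$1/fd" | wc -l
}

# Example against the current shell; for the server, pass "$(pidof mysqld)".
nofile_limits_of "$$"
open_fd_count_of "$$"
```

If the soft limit reported for the server process is lower than what `ulimit -n` shows in a login session, the limit is being lowered somewhere between login and server start. On a systemd-managed host, the unit's LimitNOFILE setting is one place to look.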

Comment by Jonathan Levin [ 2018-08-05 ]

File limit outside of mysql seems to be fine.
It's the open_files_limit that is set to 10k, and when I briefly looked at the open files in the status variables, it was over 230k.

Would you know what is causing mysql to ignore the my.cnf variable for open_files_limit and keep it at 10k?

Comment by Elena Stepanova [ 2018-08-05 ]

From all I see, at least in 10.1.32 which you are using, the only thing that affects open_files_limit is the system limit; the value always ends up being set to whatever getrlimit returns.
It's not that the value is completely ignored, it's used first, but then it's changed to the system limit.
The logic is admittedly weird, it was modified later in 10.1 (although I can't claim that it's become better), but it's not the point – I don't see any path where the value that you see on the running server would be less than the system limit that the server had upon startup. You can experiment with it yourself by setting different limits in your session and running something like

mysqld --open-files-limit=100000 --verbose --help | grep 'open-files-limit'

(or using your config as --defaults-file, for an even cleaner experiment).
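The experiment described above could be scripted roughly as follows. This is a sketch only: `with_nofile` is a hypothetical helper, the limit values are arbitrary, and raising the limit above the session's hard limit fails unless run as root.

```shell
# Run a command in a subshell whose open-files limit is set to $1,
# leaving the calling session's limit untouched.
with_nofile() {
    (
        ulimit -n "$1" || exit 1
        shift
        "$@"
    )
}

# Show what mysqld reports for open-files-limit under several session limits.
# The loop is skipped when mysqld is not on PATH.
if command -v mysqld >/dev/null 2>&1; then
    for n in 1024 10000 100000; do
        echo "session ulimit -n = $n"
        with_nofile "$n" mysqld --open-files-limit=100000 --verbose --help 2>/dev/null |
            grep 'open-files-limit' ||
            echo "  (limit could not be set, or no matching output)"
    done
fi
```

Each iteration runs in its own subshell, so lowering the limit for one run does not prevent a higher value from being tried in the next.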

Comment by Jonathan Levin [ 2018-08-08 ]

Your line worked, but on the actual server, no matter how many times I change the my.cnf file or attempt to change the OS ulimit in numerous places, I cannot change it in mysql.

From the online articles, this appears to be a common problem.

Comment by Elena Stepanova [ 2018-08-08 ]

As mentioned before, there are various places where the limit can be tweaked, service scripts/configuration would be one of those. It should be fairly easy to trace, e.g. add the debug output of the limit (raw from the system and one from mysqld similar to above) at the different stages of the startup.
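One way to add such debug output is a small function pasted into the startup script and called at each stage. This is a sketch under assumptions: the function name, the stage labels, and the NOFILE_DEBUG_LOG variable with its default path are all hypothetical.

```shell
# Record the soft and hard open-files limits seen at a given startup stage.
# NOFILE_DEBUG_LOG is a hypothetical variable; it defaults to a file in /tmp.
log_nofile() {
    printf '%s stage=%s soft=%s hard=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$1" \
        "$(ulimit -Sn)" "$(ulimit -Hn)" \
        >> "${NOFILE_DEBUG_LOG:-/tmp/nofile-debug.log}"
}

# Example calls, placed around the point where the server is launched:
log_nofile "script-start"
log_nofile "before-mysqld"
```

Comparing consecutive lines in the log shows at which stage the limit drops. The value mysqld itself reports can be captured separately with the `--verbose --help` command from the earlier comment.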

Which packages are you using, btw, those provided by MariaDB or Debian? Given that you're on Xenial, it could be either. If you don't know, please paste the output of dpkg -l | grep -i maria.

Generated at Thu Feb 08 08:31:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.