[MDEV-14292] MariaDB threads increasing suddenly from 400 to 3k~5k Created: 2017-11-05  Updated: 2017-11-08

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.2.9
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maziar Sojoudian Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Ubuntu, php web app


Attachments: Text File innodb_status_6_nov.txt     Text File innodb_status_6_nov_after_reboot.txt     Text File master_status_6_nov.txt     Text File master_status_6_nov_after_reboot.txt     JPEG File photo_2017-11-08_15-42-19.jpg     JPEG File photo_2017-11-08_15-42-24.jpg     JPEG File photo_2017-11-08_15-42-28.jpg     Text File processlist_6_nov.txt     Text File processlist_6_nov_after_reboot.txt     Text File status_6_nov.txt     Text File status_6_nov_after_reboot.txt    

 Description   

We have a big database, about 750GB, we have a Master and 4 slaves. some times without any issues, MariaDB threads increasing suddenly from 300~400 to around 5K and its make MariaDB unavailable and web app get `connection time out`.

By chance we found that we can recover MariaDB trough stoping slaves and starting them after 30 second or 1~2 minute.

But this some times work and some times not work ! and in this situations we should stop MariaDB on Master server and staring it again !

its happened after MariaDB 10.2.9, before. when we were at MariaDB 10.1.X everything were fine !!!

PS : I should mention that some times one of the slaves face gap and it cannot sync data with Master ! and we should stop MariaDB server on this slave server and starting it again.

PS : Also we have `Lock process` in MariaDB 10.2.9 !!!!! before we didn't have this much !!!!!
One of the reason that I check when MariaDB threads went up was these `Locks` , I should mention that these locks are not natural and they are not for my application before we didn't have such a issue on MariaDB 10.1.X



 Comments   
Comment by Elena Stepanova [ 2017-11-07 ]

sojoudian,
Please provide some useful information about the circumstances of the described problem. If you create database applications, you have an idea what the useful information could be, and if the problem happens regularly, it shouldn't be difficult to collect it:

  • processlists from the affected server(s);
  • error logs;
  • configuration files;
  • stack traces from the running server;
  • output of resource monitors.
Comment by Maziar Sojoudian [ 2017-11-08 ]

@Elena Stepanova
innodb_status_6_nov_after_reboot.txt innodb_status_6_nov.txt master_status_6_nov_after_reboot.txt master_status_6_nov.txt processlist_6_nov_after_reboot.txt processlist_6_nov.txt status_6_nov_after_reboot.txt status_6_nov.txt
As you can see in the process list file, we have some process/transactions which is running about 24 hours and we can't kill them (the KILL command doesn't work, it only changes the state of them to KILLED) and because we're using InnoDB row locking (for update) inside of those transactions their records are not accessible until we restart the database. I should mention, it's not true for all of the transactions which have row locking, only 20-30 transactions each day.

As I said before, we face another issue which is suddenly the number of connected threads on master DB goes high without any patterns every day. Sometimes we stop replications (executing STOP SLAVE) on each slave and it works, but sometimes we have to restart the master DB. because the
number of connected threads keep increasing and when it reaches 100k connected threads, it won't respond to any new connections and the applications will be down. The logs I attached are from two situations, one of them is before restarting the DB (the innodb_max_dirty_pages_pct was set to 0 to make shutdown time a little bit faster) and the other one is after restarting the DB. (when everything was stable)

PS : please check the attachments

Generated at Thu Feb 08 08:12:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.