[MDEV-17481] mariadb service won't shutdown when it's running and the OS datetime updated backwards Created: 2018-10-17 Updated: 2020-08-25 Resolved: 2020-07-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3.7, 10.5.2, 10.2, 10.3, 10.4 |
| Fix Version/s: | 10.2.33, 10.3.24, 10.4.14, 10.5.5 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jossef Harush | Assignee: | Thirunarayanan Balathandayuthapani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang, server, shutdown | ||
| Environment: |
ubuntu 14.04.05 |
||
| Description |
|
When I update the OS datetime *backwards* while `mariadb` is running, `mariadb` hangs when I command its service to stop. Steps to reproduce:
> setting the date, starting the service, resetting the date -10 hours, stopping the service => hang |
| Comments |
| Comment by Jossef Harush [ 2018-10-17 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Also posted on StackExchange. Why does it hang, and how can I avoid it? | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2018-12-29 ] | |||||||||||||||||||||||||||||||||||||||||||
|
I can reproduce it reliably with the current 10.2/10.3 running on Trusty, but not on Stretch. I didn't try other OS flavors. Service involvement or build specifics are irrelevant, reproducible with a debug server started and shut down manually, as well as with release binary tarballs and debian packages. Not reproducible with 10.0/10.1.
All threads are attached as threads.trusty. Also, curiously, if I set the date back while it is hanging (or rather waiting for something), the shutdown proceeds to a successful finish. Note the timestamps:
| |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-12-30 ] | |||||||||||||||||||||||||||||||||||||||||||
|
At first sight, this looks like a design problem in os_event::timed_wait(). On Windows, it uses a delay in milliseconds; on POSIX systems, it invokes pthread_cond_timedwait() to wait until a specified absolute timestamp. If the system time is moved backwards after the timestamp has been created, the wait could be much longer than anticipated.
In earlier versions of InnoDB, similar code is part of the function os_event_wait_time_low(). Maybe setting the system time backwards would cause a spurious wakeup and an immediate return from pthread_cond_timedwait() there. After all, spurious wakeups are documented to be possible.
The most likely cause of the hang seems to be the way the page cleaner threads have been implemented in MySQL 5.7 and MariaDB 10.2. The function pc_sleep_if_needed(), its caller buf_flush_page_cleaner_coordinator(), or the page cleaner subsystem could make matters worse by storing absolute timestamps over extended periods of time. For example, page_cleaner_flush_pages_recommendation() stores prev_time across calls, and pc_flush_slot() does not prevent the time differences list_tm and lru_tm from becoming negative. | |||||||||||||||||||||||||||||||||||||||||||
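A minimal sketch of the POSIX behaviour described in this comment, assuming a Linux/glibc system where pthread_condattr_setclock() is available. The names monotonic_event, monotonic_event_init() and monotonic_event_timed_wait() are hypothetical and are not InnoDB code. The sketch computes the absolute deadline on CLOCK_MONOTONIC instead of the wall clock, so that moving the system time does not stretch the wait. It only illustrates the design issue; it is not presented as the fix adopted in this ticket.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>
#include <cerrno>

// Hypothetical event type for illustration only (not os_event).
struct monotonic_event {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool is_set;
};

// Bind the condition variable to CLOCK_MONOTONIC, so that deadlines passed
// to pthread_cond_timedwait() are unaffected by changes of the system time.
void monotonic_event_init(monotonic_event *ev) {
  pthread_condattr_t attr;
  int rc = pthread_condattr_init(&attr);
  assert(rc == 0);
  rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
  assert(rc == 0);
  rc = pthread_mutex_init(&ev->mutex, nullptr);
  assert(rc == 0);
  rc = pthread_cond_init(&ev->cond, &attr);
  assert(rc == 0);
  pthread_condattr_destroy(&attr);
  ev->is_set = false;
}

// Wait until the event is set or timeout_ms has elapsed.
// Returns true if the event was set, false on timeout.
bool monotonic_event_timed_wait(monotonic_event *ev, long timeout_ms) {
  timespec deadline;
  // The deadline must be read from the same clock the condvar is bound to.
  clock_gettime(CLOCK_MONOTONIC, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }

  pthread_mutex_lock(&ev->mutex);
  int rc = 0;
  // Loop to tolerate spurious wakeups; stop once the deadline has passed.
  while (!ev->is_set && rc != ETIMEDOUT) {
    rc = pthread_cond_timedwait(&ev->cond, &ev->mutex, &deadline);
  }
  const bool was_set = ev->is_set;
  pthread_mutex_unlock(&ev->mutex);
  return was_set;
}
```

With a default condition variable, the deadline would instead be compared against CLOCK_REALTIME, which is the clock that the reproduction steps move backwards.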
| Comment by Thirunarayanan Balathandayuthapani [ 2019-09-06 ] | |||||||||||||||||||||||||||||||||||||||||||
|
The following patch solves this issue, but it causes a performance regression for an idle server.
It should be fixable in MDEV-16526 | |||||||||||||||||||||||||||||||||||||||||||
| Comment by somak92 [ 2020-04-20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
HI~! | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-04-20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Because this is repeatable with 10.5.2, I think that we must run some tests to assess the performance impact of the proposed change, and we should actually fix this in the earliest affected major version (10.2). | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Ivan Novak [ 2020-06-03 ] | |||||||||||||||||||||||||||||||||||||||||||
|
I have a VPS whose hwclock is set 2 hours ahead of local time, and I can't get the provider to fix it. As a result, the NTP client (chronyd) sets the time back by 2 hours every time the VPS starts. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Thirunarayanan Balathandayuthapani [ 2020-07-13 ] | |||||||||||||||||||||||||||||||||||||||||||
|
__pthread_cond_timedwait() hangs if the time is moved backwards. A hack could be to wake up the page cleaner thread in logs_empty_and_mark_files_at_shutdown(). | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-07-13 ] | |||||||||||||||||||||||||||||||||||||||||||
|
wlad, what do you think of this hack? Basically, we would intentionally trigger some ‘spurious wakeups’ to work around the problem. Ultimately we want to rewrite the page cleaner ( | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2020-07-13 ] | |||||||||||||||||||||||||||||||||||||||||||
|
waking at shutdown seems fine, but do we really need it at other times, frequently? buf_flush_event is set here and there anyway, in different situations. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2020-07-13 ] | |||||||||||||||||||||||||||||||||||||||||||
|
https://jira.mariadb.org/browse/MDEV-17481?focusedCommentId=133652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-133652
`ulint next_loop_time = ut_time_ms() + 1000;` vs `ulint next_loop_time = my_interval_timer() + 1000;` is a big difference. | |||||||||||||||||||||||||||||||||||||||||||
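To illustrate why the clock behind next_loop_time matters, here is a small arithmetic sketch. It assumes that one of the two clocks follows the system time and the other does not, which this thread implies but does not spell out. The units returned by ut_time_ms() and my_interval_timer() are also not stated here, so the sketch uses plain milliseconds throughout. The helpers remaining_ms() and wait_after_clock_shift() are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: milliseconds left until deadline_ms, as seen by a
// clock that currently reads now_ms. Returns 0 if the deadline has passed.
std::int64_t remaining_ms(std::int64_t now_ms, std::int64_t deadline_ms) {
  return deadline_ms > now_ms ? deadline_ms - now_ms : 0;
}

// Hypothetical helper: a deadline of "now + 1000 ms" is computed, and then
// the clock reading changes by shift_ms (negative means the clock was set
// backwards). Returns how long a sleeper would still wait for the deadline.
std::int64_t wait_after_clock_shift(std::int64_t now_ms,
                                    std::int64_t shift_ms) {
  const std::int64_t next_loop_time = now_ms + 1000;
  return remaining_ms(now_ms + shift_ms, next_loop_time);
}
```

For a clock that is never adjusted, shift_ms is 0 and the remaining wait is the intended 1000 ms. For a clock set back by 10 hours (36,000,000 ms), as in the reproduction steps, the remaining wait grows to 36,001,000 ms.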
| Comment by Marko Mäkelä [ 2020-07-14 ] | |||||||||||||||||||||||||||||||||||||||||||
|
wlad, the old comment that you refer to is not part of the currently planned solution. We would not increase the use of my_interval_timer(), but instead basically just ensure that a call to os_event_set(buf_flush_event) will end the infinite wait here and there. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-07-22 ] | |||||||||||||||||||||||||||||||||||||||||||
|
The work-around looks OK to me. We’d be waking up the page cleaner in logs_empty_and_mark_files_at_shutdown() and srv_master_do_idle_tasks(). |
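The idea behind the work-around can be sketched as follows, using standard C++ primitives rather than InnoDB's os_event. The names flush_event, event_set(), event_wait() and shutdown_wakes_waiter() are hypothetical stand-ins, not the actual patch. The point is only that an explicit set-and-notify ends a timed wait at once, however far away its deadline is.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an event such as buf_flush_event.
struct flush_event {
  std::mutex m;
  std::condition_variable cv;
  bool is_set = false;
};

// Mark the event as set and wake every waiter.
void event_set(flush_event &ev) {
  {
    std::lock_guard<std::mutex> guard(ev.m);
    ev.is_set = true;
  }
  ev.cv.notify_all();
}

// Wait until the event is set or the timeout expires.
// Returns true if the event was set, false on timeout.
bool event_wait(flush_event &ev, std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(ev.m);
  return ev.cv.wait_for(lock, timeout, [&ev] { return ev.is_set; });
}

// A waiter that sleeps with a very long timeout (standing in for a wait
// whose deadline has drifted far into the future) is released as soon as
// the shutdown path sets the event.
bool shutdown_wakes_waiter() {
  flush_event ev;
  bool woken = false;
  std::thread cleaner(
      [&] { woken = event_wait(ev, std::chrono::hours(10)); });
  event_set(ev);  // what the shutdown path would do
  cleaner.join();
  return woken;
}
```

Because the waiter checks the is_set flag under the mutex before sleeping, it does not matter whether event_set() runs before or after the waiter starts waiting.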