[MCOL-2245] altersystem-disablemodule returns with failure on a busy system Created: 2019-03-15 Updated: 2023-10-26 Resolved: 2019-07-10 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | 1.2.2 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hill (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2um 2pm system |
||
| Description |
|
Reported by a customer and seen by support in a shared session. The altersystem-disableModule command failed on the first try and passed on the second. I tried to reproduce the issue on a local 2um/2pm system while it was idle and couldn't.

I looked at the logs from the customer system and saw that a message was sent to UM2 to stop the process. UM2 received the message and replied 1:02 minutes later. But the timeout in ProcMgr on PM1 is 1 minute, so it timed out and returned an error to the user. The logs show the system was active with bulk loading, which probably contributed to the message timeout. So the timeout might need to be increased from 1 minute to something higher for the command to work on a busy system. BUT it's not recommended that users run this command on an active system to start with.

From the logs below, it shows why the altersystem-disablemodule command failed. Bulk loading (cpimport jobs) was running when the disablemodule was run. So this might have contributed to the timeout and to the disablemodule taking longer to complete, and it explains why I couldn't reproduce the issue: I was running on an idle system. So the best bet is that altersystem-disableModule would have worked on your system if it had been idle. Can't say for sure. I will go ahead and open a new bug to request an increase in the timeout, but development might say that altersystem-disablemodule should only be run on an idle system and that 1 minute is valid. Thought I would pass this on. This is my report on the altersystem-disablemodule failure your system had.
Pm1 logs
Thu Mar 14 15:01:02 2019: altersystem-disablemodule um2 y
BULK LOAD WAS GOING ON AT THE TIME OF THE DISABLE
Mar 14 15:01:01 ip-172-48-32-68 cpimport.bin[30605]: 01.613470 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-3049; db-tradealert
Mar 14 15:01:02 ip-172-48-32-68 ProcessManager[4727]: 02.970810 |0|0|0| I 17 CAL0000: MSG RECEIVED: Stop Module request on um2
THE WAIT IS 1 MINUTE, SO A TIMEOUT OCCURRED WHICH CAUSES THE FAILURE TO THE USER ON THE DISABLE-MODULE COMMAND
Mar 14 15:02:03 ip-172-48-32-68 ProcessManager[4727]: 03.034561 |0|0|0| E 17 CAL0000: line: 6901 sendMsgProcMon: ProcMon Msg timeout on module um2
Mar 14 15:02:03 ip-172-48-32-68 ProcessManager[4727]: 03.034635 |0|0|0| W 17 CAL0000: um2 module failed to stop!!
um2 – took 1:04 minutes to respond
Mar 14 15:01:02 ip-172-48-44-207 ProcessMonitor[3853]: 02.988421 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request… |
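As a rough check of the timing described in this report, the Python sketch below computes the gap between the two ProcessManager entries on PM1 (the Stop Module request and the ProcMon timeout error) and compares the response delays quoted in the report against the 1 minute wait. It is not ColumnStore code: the names PROCMGR_TIMEOUT, parse_ts and timed_out are made up for illustration, and the year is assumed from the command-history line.

```python
from datetime import datetime, timedelta

# 1 minute ProcMgr wait, as stated in the report (hypothetical constant name).
PROCMGR_TIMEOUT = timedelta(minutes=1)


def parse_ts(stamp: str, year: int = 2019) -> datetime:
    """Parse a syslog-style stamp such as 'Mar 14 15:01:02'.

    Syslog lines carry no year, so it is supplied separately (assumed 2019).
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def timed_out(response_delay: timedelta,
              timeout: timedelta = PROCMGR_TIMEOUT) -> bool:
    """Return True when a reply arrives after the wait has expired."""
    return response_delay > timeout


# Timestamps copied from the PM1 ProcessManager log lines.
request_received = parse_ts("Mar 14 15:01:02")  # Stop Module request on um2
timeout_logged = parse_ts("Mar 14 15:02:03")    # ProcMon Msg timeout on um2

elapsed = timeout_logged - request_received
print("seconds between request and timeout error:", elapsed.total_seconds())

# Response delays for UM2 as quoted in the report text and log notes.
for label, delay in [("1:02", timedelta(minutes=1, seconds=2)),
                     ("1:04", timedelta(minutes=1, seconds=4))]:
    print(label, "exceeds the 1 minute wait:", timed_out(delay))
```

Either quoted delay is only a few seconds past the wait, which is consistent with the command succeeding on a second try once the bulk load pressure eased.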