[MCOL-4224] Improve error message: "Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks" Created: 2020-07-30  Updated: 2020-08-03  Resolved: 2020-08-03

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: 1.5.3
Fix Version/s: N/A

Type: New Feature Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Todd Stoffel (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-4228 mcs-controllernode process fails on n... Closed
relates to MCOL-4223 CMAPI Server doesn't gracefully handl... Closed

 Description   

I'm testing multi-node ColumnStore on Ubuntu 20.04 with MariaDB Enterprise Server 10.5.4-2 and ColumnStore 1.5.3.

When I try to add a second node to the cluster, the syslog contains errors like this:

Jul 30 23:07:47 mcs2 IDBFile[6621]: 47.659400 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file  
Jul 30 23:07:47 mcs2 controllernode[6621]: 47.659469 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks

It is true that this file does not exist:

$ sudo ls -l /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks
ls: cannot access '/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks': No such file or directory

This error happens every time the mcs-controllernode process is started on a non-primary node:

$ sudo systemctl restart mcs-controllernode
$ sudo systemctl status mcs-controllernode
● mcs-controllernode.service - mcs-controllernode
     Loaded: loaded (/lib/systemd/system/mcs-controllernode.service; static; vendor preset: enabled)
     Active: active (running) since Mon 2020-08-03 22:15:07 UTC; 1s ago
   Main PID: 18879 (controllernode)
      Tasks: 2 (limit: 1074)
     Memory: 3.2M
     CGroup: /system.slice/mcs-controllernode.service
             └─18879 /usr/bin/controllernode fg
 
Aug 03 22:15:07 mcs-ubuntu2004-3 systemd[1]: Started mcs-controllernode.
Aug 03 22:15:07 mcs-ubuntu2004-3 IDBFile[18879]: 07.994772 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Aug 03 22:15:07 mcs-ubuntu2004-3 controllernode[18879]: 07.995245 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks

I have been told that this is probably "normal" for a non-primary node. If so, then the error message should be improved to clarify that this error is not serious.

If the mcs-controllernode process isn't supposed to run at all on non-primary nodes, then it might be best to avoid starting the process on those nodes in general. Why is ColumnStore starting a process that it doesn't need?



 Comments   
Comment by Todd Stoffel (Inactive) [ 2020-08-03 ]

I cannot reproduce this issue. Maybe there is a timing problem on your system. Try to increase the timeout in your api calls.

Comment by Geoff Montee (Inactive) [ 2020-08-03 ]

This seems to be unrelated to the API timeouts. This error happens when the mcs-controllernode process tries to start:

$ sudo systemctl status mcs-controllernode
● mcs-controllernode.service - mcs-controllernode
     Loaded: loaded (/lib/systemd/system/mcs-controllernode.service; static; vendor preset: enabled)
     Active: inactive (dead)
 
Aug 03 21:35:18 mcs-ubuntu2004-3 systemd[1]: Started mcs-controllernode.
Aug 03 21:35:18 mcs-ubuntu2004-3 IDBFile[1432]: 18.548124 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Aug 03 21:35:18 mcs-ubuntu2004-3 controllernode[1432]: 18.549225 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks
Aug 03 21:40:33 mcs-ubuntu2004-3 systemd[1]: Stopping mcs-controllernode...
Aug 03 21:40:33 mcs-ubuntu2004-3 controllernode[1432]: DBRM Controller: Waiting for threads to finish...
Aug 03 21:40:33 mcs-ubuntu2004-3 controllernode[1432]: Exiting...
Aug 03 21:40:33 mcs-ubuntu2004-3 systemd[1]: mcs-controllernode.service: Succeeded.
Aug 03 21:40:33 mcs-ubuntu2004-3 systemd[1]: Stopped mcs-controllernode.

But if I manually restart the process:

$ sudo systemctl restart mcs-controllernode

Then it seems to ignore the error:

$ sudo systemctl status mcs-controllernode
● mcs-controllernode.service - mcs-controllernode
     Loaded: loaded (/lib/systemd/system/mcs-controllernode.service; static; vendor preset: enabled)
     Active: active (running) since Mon 2020-08-03 21:57:47 UTC; 2s ago
   Main PID: 18124 (controllernode)
      Tasks: 2 (limit: 1074)
     Memory: 3.2M
     CGroup: /system.slice/mcs-controllernode.service
             └─18124 /usr/bin/controllernode fg
 
Aug 03 21:57:47 mcs-ubuntu2004-3 systemd[1]: Started mcs-controllernode.
Aug 03 21:57:47 mcs-ubuntu2004-3 IDBFile[18124]: 47.669057 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Aug 03 21:57:47 mcs-ubuntu2004-3 controllernode[18124]: 47.669112 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks

Comment by Geoff Montee (Inactive) [ 2020-08-03 ]

This error is very easy for me to reproduce. It happens every time I restart the mcs-controllernode process on the non-primary nodes.

See here:

$ sudo systemctl restart mcs-controllernode
$ sudo systemctl status mcs-controllernode
● mcs-controllernode.service - mcs-controllernode
     Loaded: loaded (/lib/systemd/system/mcs-controllernode.service; static; vendor preset: enabled)
     Active: active (running) since Mon 2020-08-03 22:15:07 UTC; 1s ago
   Main PID: 18879 (controllernode)
      Tasks: 2 (limit: 1074)
     Memory: 3.2M
     CGroup: /system.slice/mcs-controllernode.service
             └─18879 /usr/bin/controllernode fg
 
Aug 03 22:15:07 mcs-ubuntu2004-3 systemd[1]: Started mcs-controllernode.
Aug 03 22:15:07 mcs-ubuntu2004-3 IDBFile[18879]: 07.994772 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Aug 03 22:15:07 mcs-ubuntu2004-3 controllernode[18879]: 07.995245 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks

And I can do it again too:

$ sudo systemctl restart mcs-controllernode
$ sudo systemctl status mcs-controllernode
● mcs-controllernode.service - mcs-controllernode
     Loaded: loaded (/lib/systemd/system/mcs-controllernode.service; static; vendor preset: enabled)
     Active: active (running) since Mon 2020-08-03 22:16:57 UTC; 1s ago
   Main PID: 19023 (controllernode)
      Tasks: 2 (limit: 1074)
     Memory: 3.3M
     CGroup: /system.slice/mcs-controllernode.service
             └─19023 /usr/bin/controllernode fg
 
Aug 03 22:16:57 mcs-ubuntu2004-3 systemd[1]: Started mcs-controllernode.
Aug 03 22:16:57 mcs-ubuntu2004-3 IDBFile[19023]: 57.112230 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Aug 03 22:16:57 mcs-ubuntu2004-3 controllernode[19023]: 57.112293 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks

Comment by Todd Stoffel (Inactive) [ 2020-08-03 ]

Controller functions do not need to run on replicas. This is a primary only function. The journal entry is expected and perfectly fine.

Generated at Thu Feb 08 02:48:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.