[MCOL-1024] postConfigure reported system catalog creation error but database continues to work Created: 2017-11-09  Updated: 2018-01-30  Resolved: 2017-11-10

Status: Closed
Project: MariaDB ColumnStore
Component/s: installation
Affects Version/s: 1.1.2
Fix Version/s: 1.1.2

Type: Bug Priority: Critical
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 2017-22

 Description   

Build tested: 1.1.2 Github source

/root/columnstore/mariadb-columnstore-server
commit ed21e674cfd70db421957d0212ff7fc1835d06d5
Author: david hill <david.hill@mariadb.com>
Date: Mon Nov 6 08:41:53 2017 -0600

Update README.md

updated version

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit e0febdcd1a88bc3fab55519bd679f93719687ccc
Merge: 1603ce9 f1dd92a
Author: david hill <david.hill@mariadb.com>
Date: Wed Nov 8 15:32:50 2017 -0600

Merge pull request #313 from mariadb-corporation/MCOL-989

MCOL-989

This issue surfaced during a 1um2pm localquery installation test. postConfigure reported:

==> s1pm1: System Catalog Create Failure
==> s1pm1: Check latest log file in /tmp/dbbuilder.log.*

This caused the replication setup to be skipped, but database operations from UM1 were functioning normally. Therefore, this could be a false error.

Because the replication setup was skipped, local query was not set up correctly.

This error does not happen all the time. It occurred one time out of three installations today.



 Comments   
Comment by Daniel Lee (Inactive) [ 2017-11-09 ]

Info from PM1

[root@localhost columnstore]# cat crit.log
Nov 9 16:52:16 localhost writeengine[9284]: 16.803682 |0|0|0| C 19 CAL0060: dbbuilder system catalog error: Creating TableName column OID: 1001 The File already exists. [BRM error status: UNKNOWN (255)]
[root@localhost columnstore]# ls -la --time-style=full-iso *
-rw-r--r--. 1 root root 0 2017-11-09 16:50:44.746791841 +0000 activeAlarms
-rw-r--r--. 1 root root 216 2017-11-09 16:50:44.746791841 +0000 alarm.log
-rw-------. 1 root root 207 2017-11-09 16:52:16.805683324 +0000 crit.log
-rw-r--r--. 1 root root 1871 2017-11-09 16:52:16.680948779 +0000 dbbuilder.log
-rwxr-xr-x. 1 root root 95920 2017-11-09 16:54:29.392919593 +0000 debug.log

The content of dbbuilder.log indicated that the system catalog was successfully created. That file has a timestamp of "16:52:16.680948779". Slightly later, at "16:52:16.805683324", crit.log was written, reporting a system catalog creation error.

Does that mean another process tried to create the system catalog again?
or
Did something interpret the system catalog creation status incorrectly after dbbuilder.log was written?
or
.......
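
To make the timing argument above concrete, here is a minimal Python sketch that pulls the error code, OID, and "already exists" marker out of the crit.log line and computes the gap between the two file timestamps. The field layout is inferred only from the single line quoted in this comment, so the regular expression is an assumption rather than a documented log format, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one crit.log line quoted in this issue:
# "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: ... CALnnnn: <message>"
CRIT_LINE = re.compile(
    r"^(?P<month>\w{3}) +(?P<day>\d+) (?P<time>[\d:]+) (?P<host>\S+) "
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: .*?(?P<code>CAL\d{4}): (?P<msg>.*)$"
)


def parse_crit_line(line):
    """Return a dict of fields from one log line, or None if it does not match."""
    m = CRIT_LINE.match(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    oid = re.search(r"OID: (\d+)", info["msg"])
    info["oid"] = int(oid.group(1)) if oid else None
    # An "already exists" message suggests the object was created by an earlier run.
    info["already_exists"] = "already exists" in info["msg"]
    return info


def seconds_between(earlier, later):
    """Difference in seconds between two HH:MM:SS.fraction strings on the same day."""
    def to_time(text):
        head, frac = text.split(".")
        # strptime's %f accepts at most 6 digits, so nanoseconds are truncated.
        return datetime.strptime(head + "." + frac[:6], "%H:%M:%S.%f")
    return (to_time(later) - to_time(earlier)).total_seconds()
```

Applied to the values in this comment, `seconds_between("16:52:16.680948779", "16:52:16.805683324")` gives the delay between the dbbuilder.log write and the crit.log write, and `parse_crit_line` flags the CAL0060 entry as an "already exists" failure on OID 1001.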

Comment by David Hill (Inactive) [ 2017-11-10 ]

I worked with Daniel on his system where it occurred and found the issue. ProcMgr created the system catalog, and postConfigure errored when it tried to create it again. Working on a fix.

Comment by David Hill (Inactive) [ 2017-11-10 ]

Fixed: removed the dbbuilder 7 run from ProcMgr, which was causing issues with postConfigure.

You will see that ProcessManager no longer mentions running the system catalog creation as it did before:
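
The failure mode described here, two components each trying to create the same object, can be illustrated with a small Python sketch. This is not ColumnStore code: the file name and the function names are hypothetical, and an exclusive file create merely stands in for the catalog creation step to show why the second caller sees an "already exists" error even though the first one succeeded.

```python
import os
import tempfile


def create_catalog_file(path):
    """Create the file exclusively; return True if created, False if it already existed."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo():
    """Run the same creation step twice, as two independent callers would."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "oid_1001.dat")  # hypothetical file name
        first = create_catalog_file(path)   # stands in for the run started by ProcMgr
        second = create_catalog_file(path)  # stands in for the run started by postConfigure
        return first, second
```

Removing one of the two callers, as the fix does, means only a single creation attempt is made and the second, failing attempt never happens.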

Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.223468 |0|0|0| D 17 CAL0000: Set System State = ACTIVE
Nov 10 19:35:51 ip-172-30-0-17 ProcessMonitor[3437]: 51.223730 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = ACTIVE
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.228224 |0|0|0| D 17 CAL0000: startSystemThread Exit
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.228611 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.233856 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module um1
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.234574 |0|0|0| D 17 CAL0000: um1 distributeConfigFile success.
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.238365 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.238884 |0|0|0| D 17 CAL0000: pm2 distributeConfigFile success.
Nov 10 19:35:51 ip-172-30-0-17 ProcessManager[3526]: 51.238954 |0|0|0| D 17 CAL0000: startMgrProcessThread Exit
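
A check like this can also be done mechanically. The Python sketch below filters log lines by process name and reports any that contain caller-supplied substrings. The line layout is assumed from the excerpt above, and the search terms are up to the reader, since the exact wording ProcessManager used before the fix is not shown in this issue.

```python
import re

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <rest>"
LOG_LINE = re.compile(r"^\w{3} +\d+ [\d:]+ \S+ (?P<proc>\w+)\[\d+\]: (?P<rest>.*)$")


def lines_mentioning(lines, process, needles):
    """Return the lines logged by `process` that contain any of `needles` (case-insensitive)."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or m.group("proc") != process:
            continue
        rest = m.group("rest").lower()
        if any(needle.lower() in rest for needle in needles):
            hits.append(line)
    return hits
```

For example, calling `lines_mentioning(log_lines, "ProcessManager", ["dbbuilder", "system catalog"])` on the excerpt above returns an empty list; the two search terms are guesses at what the earlier message might have contained.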

Also added a comment in postConfigure to inform users that if a failure occurs, they always need to rerun postConfigure. I also added that to the multi-node install guides.

commit b63d9fcce1815e178701599b12df3f9810f6acac
Author: david hill <david.hill@mariadb.com>
Date: Fri Nov 10 13:41:50 2017 -0600

MCOL-1024

oamapps/postConfigure/installer.cpp | 5 ++
oamapps/postConfigure/postConfigure.cpp | 299 ++++++++++++-----------------------------------------------------------------------
procmgr/processmanager.cpp | 8 --------

Comment by Daniel Lee (Inactive) [ 2017-11-10 ]

Build verified: 1.1.2 GitHub source

/root/columnstore/mariadb-columnstore-server
commit ed21e674cfd70db421957d0212ff7fc1835d06d5
Author: david hill <david.hill@mariadb.com>
Date: Mon Nov 6 08:41:53 2017 -0600

Update README.md

updated version

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit fda44955ea05e33f8bb6666df25b02926a17d6f8
Author: david hill <david.hill@mariadb.com>
Date: Fri Nov 10 14:13:27 2017 -0600

MCOL-1020 - added checks to prevent crash

Repeated 6 installations successfully.

Comment by Allen Chan [ 2018-01-30 ]

How can I skip this error?

Comment by David Hill (Inactive) [ 2018-01-30 ]

What are the contents of the log file(s) in /tmp?

/tmp/dbbuilder.log.*
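
For anyone gathering that information, here is a minimal Python sketch that picks the most recently modified file matching the pattern above. The function name is hypothetical, and "latest" is assumed to mean latest by modification time.

```python
import glob
import os


def latest_log(pattern="/tmp/dbbuilder.log.*"):
    """Return the most recently modified file matching `pattern`, or None if none match."""
    paths = glob.glob(pattern)
    if not paths:
        return None
    return max(paths, key=os.path.getmtime)


if __name__ == "__main__":
    path = latest_log()
    if path is None:
        print("No dbbuilder log files found")
    else:
        print(path)
        # errors="replace" avoids a crash if the log contains undecodable bytes.
        with open(path, errors="replace") as handle:
            print(handle.read())
```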
