[MCOL-540] Nonroot installation: PrimProc restarts when creating tables Created: 2017-02-01 Updated: 2019-07-10 Resolved: 2019-07-10 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | 1.0.7 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Minor |
| Reporter: | Daniel Lee (Inactive) | Assignee: | Andrew Hutchings (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Build tested: 1.0.7-1, binary distribution package for Ubuntu 16.04.

This issue occurs only under the following condition: Nonroot installation.

When creating a table:

UM1: MariaDB [(none)]> use mytest

PM1 crit.log and err.log have the same entries:

Feb 1 20:41:01 vagrant controllernode[5053]: 01.215946 |0|0|0| C 29 CAL0000: BRMShmImpl::BRMShmImpl(): retrying on size==0

The same test with a root install was successful. I stopped the system, set <MysqlRep> to n in the Columnstore.xml file, and started the system again. I was then able to create the table successfully. |
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2017-02-01 ] |
|
Hi Daniel, Do you have a core file for PrimProc for when this happens? Also is there anything useful in DEBUG/INFO logs at the time? |
| Comment by Daniel Lee (Inactive) [ 2017-02-01 ] |
|
The debug, info and warning logs did not have any more useful info.

While trying to enable the core file, I noticed that if I stop ColumnStore and start it again without making any changes, ColumnStore comes up in an operational state. I was able to create the table afterwards.

To enable core dumps (disabled by default), I enabled the flag in Columnstore.xml right after I untarred the binary package, so the system came up with core dumps enabled. But no core file was generated when PrimProc crashed.

Using new, clean VMs, I installed ColumnStore and attached gdb to PrimProc. When PrimProc crashed while creating a table, I got the following:

root@vagrant:~# ps -ef |grep -i primproc
warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
|
| Comment by Andrew Hutchings (Inactive) [ 2017-02-02 ] |
|
OK, so the problem is that your /dev/shm is not writable by your non-root user (which it isn't by default). This causes the "BRMShmImpl::BRMShmImpl(): retrying on size==0" message, after which it throws an exception that is uncaught and raises SIGABRT. This would have been fixed by our post-install script if it had been run as root.

Suggestions for fix: 1. Update documentation to state that post-install should be run as root (or sudo)

Point 3 should be addressed in this ticket (as well as documentation in point 1). You can fix your own installation by either running post-install as root (or sudo) or using chmod. |
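A quick way to check the condition described above is to try creating a file in /dev/shm as the non-root user. The following is a minimal sketch, not part of ColumnStore; the function name `shm_is_writable` is hypothetical, and it only tests directory writability rather than inspecting any existing shared-memory segments.

```python
import os
import tempfile


def shm_is_writable(path="/dev/shm"):
    """Return True if the current user can create a file in `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real file is a more direct test than reading mode bits;
        # the temporary file is removed when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        # Covers PermissionError and other filesystem failures.
        return False
    return True


if __name__ == "__main__":
    print("/dev/shm writable:", shm_is_writable())
```

Run it as the same user that runs ColumnStore; a False result would be consistent with the permissions problem described in this comment.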
| Comment by Daniel Lee (Inactive) [ 2017-02-02 ] |
|
Thanks. I made the /dev/shm directory writable by the guest user in the base VM, then ran the test again. It still failed with the same error. After the test failed, I verified that the guest user was able to write to the /dev/shm directory. |
| Comment by David Thompson (Inactive) [ 2017-02-05 ] |
|
Tried reproducing this manually with 3 VMs, 2 new Ubuntu 16 with the latest updates, and cannot. Nonroot install, and verified local query is enabled and working. Also deliberately reset /dev/shm permissions beforehand, and postCfg does update them to 777. One annoyance that took time to resolve was understanding that LD_LIBRARY_PATH needs to be set at the top of .bashrc to avoid install failures, because the ssh remote install runs in a non-interactive login shell. |
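The placement matters because a stock Ubuntu ~/.bashrc typically begins with a guard that returns early for non-interactive shells, so anything set below it is never seen by the remote install. The sketch below shows one way to put an export line at the very top of the file contents. It is an illustration only: the helper name `prepend_line` and the library path used in the example are hypothetical, not values from this ticket.

```python
def prepend_line(bashrc_text, export_line):
    """Return bashrc_text with export_line as its first line.

    If the first line already matches, the text is returned unchanged,
    so the operation is safe to repeat.
    """
    export_line = export_line.strip()
    lines = bashrc_text.splitlines()
    if lines and lines[0].strip() == export_line:
        return bashrc_text
    return export_line + "\n" + bashrc_text


if __name__ == "__main__":
    # Typical non-interactive guard found near the top of ~/.bashrc.
    sample = "case $- in\n    *i*) ;;\n      *) return;;\nesac\n"
    # Hypothetical path; substitute the real ColumnStore lib directory.
    line = 'export LD_LIBRARY_PATH="/hypothetical/columnstore/lib"'
    print(prepend_line(sample, line))
```

The function works on the file contents as a string, so the caller decides whether and how to write the result back to ~/.bashrc.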
| Comment by Andrew Hutchings (Inactive) [ 2017-02-05 ] |
|
The LD_LIBRARY_PATH is unavoidable until a data directory can be configured (I think there is a Jira for that). After that the ColumnStore libs and binaries could be installed in a standard path (as part of apt/yum) and the data in the non-root area. I think this will be nearly impossible to reproduce without an exact way of duplicating Daniel's environment so that we can figure out why /dev/shm is not writable in his case. We can, however, artificially create this problem easily which is what I will do when improving the error messages. |
| Comment by David Thompson (Inactive) [ 2017-06-05 ] |
|
Improve the error message and documentation on this for now. |
| Comment by Andrew Hutchings (Inactive) [ 2017-07-27 ] |
|
Changed priority and version due to this just being a case of sorting out an error message |
| Comment by David Hill (Inactive) [ 2017-08-18 ] |
|
In 1.0.11 testing, I found that PrimProc was crashing on this same test. But when I set it up to capture a core file, it didn't crash, though the create table did hang. I got this from the PrimProc gdb session: [Thread debugging using libthread_db enabled]
|
| Comment by Andrew Hutchings (Inactive) [ 2017-08-18 ] |
|
Discussed on Slack, but mentioning here for tracking: that backtrace shows an idle PrimProc with no in-progress commands. Whatever caused the hang wasn't PrimProc and is unlikely to be related to the /dev/shm permissions that this ticket is for. |
| Comment by Daniel Lee (Inactive) [ 2018-08-16 ] |
|
The issue also affected root installation. 1.0.15-1 is also affected. |