[MCOL-930] Cannot execute queries Created: 2017-09-18 Updated: 2018-11-28 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | 1.0.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jimmy Atauje Hidalgo | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Red Hat Enterprise Linux Server release 7.2 (Maipo) |
||
| Attachments: |
|
||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
A simple summary query generally works, but sometimes it fails. Is there a workaround for when the query fails? My errors look like this:

[root@server6653 ~]# grep "error" /var/log/mariadb/columnstore/err.log
Sep 18 13:10:08 localhost PrimProc[29337]: 08.069259 |0|0|0| C 28 CAL0000: /home/builder/mariadb-columnstore-server/mariadb-columnstore-engine/primitives/primproc/columncommand.cpp error on projectResultRG for oid 529299 lbid 1363416840: input rids 8192, output rids 8190

Additionally, I get this error at random times:

Sep 18 13:03:18 localhost [29337]: 18.682927 |0|0|0| W 28 CAL0000: BRP::getBlock(): got a BRM lookup error. LBID=5490343937 ver= SCN: 0#012 Txns: txn=0 vbFlg=0 |
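For triage, the row-count mismatch lines can be pulled out of the log and summarized. The following is a minimal sketch, assuming the message format is exactly the one quoted in the description; `find_rid_mismatches` is a hypothetical helper name, not part of ColumnStore, and other releases may word the message differently.

```python
import re

# Matches the PrimProc message quoted in the description, e.g.
# "... error on projectResultRG for oid 529299 lbid 1363416840: input rids 8192, output rids 8190"
# Assumption: the wording is stable across the log file being scanned.
MISMATCH = re.compile(
    r"error on projectResultRG for oid (?P<oid>\d+) lbid (?P<lbid>\d+): "
    r"input rids (?P<input>\d+), output rids (?P<output>\d+)"
)


def find_rid_mismatches(lines):
    """Return one dict per log line that reports an input/output rid mismatch."""
    hits = []
    for line in lines:
        match = MISMATCH.search(line)
        if match is None:
            continue
        record = {key: int(value) for key, value in match.groupdict().items()}
        # Number of row ids that went in but did not come out.
        record["missing"] = record["input"] - record["output"]
        hits.append(record)
    return hits


if __name__ == "__main__":
    # Log path taken from the description; adjust for your installation.
    with open("/var/log/mariadb/columnstore/err.log", errors="replace") as log:
        for hit in find_rid_mismatches(log):
            print(hit)
```

Grouping the results by `oid` shows whether the failures cluster on one column object or are spread across the table, which is useful context for a support report.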
| Comments |
| Comment by David Thompson (Inactive) [ 2017-09-19 ] | ||||||||||
|
Hi, could you attach, or send over separately, a support report? That will give us the info we need to triage further:

You can also send this directly to me if you don't feel comfortable attaching it to this Jira.

It might also help if you can describe the symptoms a bit more. Will a given query work for some time and then fail, or will a given query always fail and then make the system unstable? Is anything else going on around the same time when the problems occur? The logs in the support report may tell us this, but it is always good to get a human perspective. | ||||||||||
| Comment by Jimmy Atauje Hidalgo [ 2017-09-27 ] | ||||||||||
|
Hi, the error did not appear for several days, but today it happened again. I have attached the support report; I removed some archived logs because the file was too big to attach. The scenario is like this:
[root@elastic103 space]# mcsmysql -D cdrdatos
MariaDB [cdrdatos]> SELECT access_point_name_NI,SUM(duration) FROM cdr20170926 GROUP BY access_point_name_NI;
.....
after server restart
.......
MariaDB [cdrdatos]> SELECT access_point_name_NI,SUM(duration) FROM cdr20170926 GROUP BY access_point_name_NI;
---------------------
Any suggestion is appreciated. | ||||||||||
| Comment by David Thompson (Inactive) [ 2017-09-27 ] | ||||||||||
|
Thanks, that is definitely helpful. What I can see today is that around 12:44, ExeMgr crashed:

You can also see this from its later start date when you run mcsadmin getProcessStatus (ProcMon will restart it if it detects a crash).

I don't know if it's the cause or just a long-running query, but the following query never finished:

$' and (uli_ci + uli_sac + uli_ecgi) in ( 10900, 10901, 10902, 10903, 10908, 10909, 62201, 10905, 10906, 10907, 20361, 20362, 20363, 20365, 20366, 20367, 57170, 57176, 57177, 62200, 62206, 62207, 130839143, 130839141, 130839142, 132152421 ) and (uli_lac + uli_tai) in (1183,1412,14121) group by served_MSISDN, hour(DATE_ADD(record_opening_time, INTERVAL duration SECOND)) order by served_MSISDN, hour(DATE_ADD(record_opening_time, INTERVAL duration SECOND)) asc limit 10; |cdrdatos|

This is likely tied to the original errors you reported, but I don't see any reported between 12:41 and 12:47. If it's OK to do on your system, can you try running the above query manually and see whether it triggers any error? It's possible it is exhausting memory, but you have a very beefy machine, so this seems less likely.

One thing I can say is that we fixed some similar stability bugs in 1.0.10 and 1.0.11, so it may be worthwhile considering an upgrade to 1.0.11 first. | ||||||||||
| Comment by Sasha V [ 2017-10-24 ] | ||||||||||
|
I reported a similar issue for version 1.1.0. As a possible workaround, one may try replacing the SMALLINT columns with INT. | ||||||||||
| Comment by Jimmy Atauje Hidalgo [ 2017-10-30 ] | ||||||||||
|
Thanks for your help! My table uses many data types (bigint, tinyint, smallint, char, varchar). Reading your issue, someone said that this will happen whenever the columns are not all the same. Does this mean that all my columns need to be integer, for example? | ||||||||||
| Comment by Sasha V [ 2017-10-30 ] | ||||||||||
|
According to LinuxJedi, all columns should have the same width (in bytes) as the first column. Since "one cannot currently use CHANGE COLUMN to change the definition of that column," I used DROP/CREATE to change the column data type. | ||||||||||
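To illustrate the width rule, here is a minimal sketch that lists which columns would differ from the first one. It assumes only the standard MariaDB integer storage sizes (TINYINT 1, SMALLINT 2, INT 4, BIGINT 8 bytes). The helper `width_mismatches` and the column types in the sample are hypothetical, not taken from the reporter's schema. CHAR and VARCHAR are reported with an unknown width, because how ColumnStore stores them is not covered here.

```python
# Storage width in bytes of the MariaDB integer types discussed in this thread.
INT_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def width_mismatches(columns):
    """Return (name, type, width) for every column whose width differs from the first.

    `columns` is a list of (name, type) pairs in table order. Types missing from
    INT_WIDTHS (for example CHAR or VARCHAR) get a width of None and are always
    listed, so they can be checked by hand.
    """
    first_name, first_type = columns[0]
    target = INT_WIDTHS.get(first_type.lower())
    if target is None:
        raise ValueError(f"unknown width for first column {first_name!r}")
    mismatches = []
    for name, col_type in columns[1:]:
        width = INT_WIDTHS.get(col_type.lower())
        if width != target:
            mismatches.append((name, col_type, width))
    return mismatches


if __name__ == "__main__":
    # Hypothetical layout: names appear in this thread, types are invented.
    table = [
        ("duration", "INT"),
        ("uli_ci", "SMALLINT"),
        ("uli_lac", "INT"),
        ("access_point_name_NI", "VARCHAR"),
    ]
    for name, col_type, width in width_mismatches(table):
        print(name, col_type, width)
```

Under the rule as quoted, each column this prints would be a candidate for the DROP/CREATE change described above; whether that actually avoids the error is not confirmed in this thread.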
| Comment by Jimmy Atauje Hidalgo [ 2017-10-30 ] | ||||||||||
|
Guys, correct me if I am wrong: | ||||||||||
| Comment by Sasha V [ 2017-12-12 ] | ||||||||||
|
That is a great example of thinking outside the box! However, as discussed in |