[MCOL-1638] Suse12 regression failure test023 median::nextValue crashing PrimProc
Created: 2018-08-09  Updated: 2018-11-20  Resolved: 2018-11-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.2.0
Fix Version/s: 1.2.2

Type: Bug
Priority: Major
Reporter: Ben Thompson (Inactive)
Assignee: Andrew Hutchings (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment: Suse12 Buildbot Nightly

Sprint: 2018-18, 2018-19, 2018-20

 Description   

This is currently happening ONLY on the SUSE12 buildbot for 1.2 (develop branch).

Note that test022 is also failing, but test023 fails and then hangs indefinitely, which prevents the regression tests from continuing.

010 Drop Partition Test:              Passed (18 counts all matched!)
Running test011.sh.
011 cpimport Features Test:           Passed
Running test012.sh.
012 Varbinary Test:                   Passed
Running test013.sh.
013 BLOB Test:                        Passed
Running test014.sh.
014 TIME Test:                        Passed
Running test022.sh.
022 GROUP BY handler test:                 Failed (check /data/mariadb-columnstore-regression-test/mysql/queries/nightly/alltest/test022.log)
Running test023.sh.

/var/log/mariadb/columnstore/trace/PrimProc.13222.log

Date/time: 2018-08-09 14:19:58
Signal: 11
 
[0x560e115dfe90]
/lib64/libpthread.so.0(+0x10b00)[0x7f5144fa7b00]
/usr/local/mariadb/columnstore/lib/libudfsdk.so.1(_ZN8mcsv1sdk6median9nextValueEPNS_12mcsv1ContextEPNS_11ColumnDatumE+0x16c)[0x7f51483768dc]
/usr/local/mariadb/columnstore/lib/librowgroup.so.1(_ZN8rowgroup14RowAggregation6doUDAFERKNS_3RowElllRm+0x3f8)[0x7f514894eb38]
/usr/local/mariadb/columnstore/lib/librowgroup.so.1(_ZN8rowgroup14RowAggregation11updateEntryERKNS_3RowE+0x172)[0x7f51489484a2]
/usr/local/mariadb/columnstore/lib/librowgroup.so.1(_ZN8rowgroup14RowAggregation12aggregateRowERNS_3RowE+0x354)[0x7f5148951144]
/usr/local/mariadb/columnstore/lib/librowgroup.so.1(_ZN8rowgroup14RowAggregation11addRowGroupEPKNS_8RowGroupE+0x7d)[0x7f5148946fed]
[0x560e1159e180]
[0x560e1159eb63]
[0x560e115af109]
/usr/local/mariadb/columnstore/lib/libthreadpool.so.1(_ZN10threadpool18PriorityThreadPool9threadFcnENS0_8PriorityE+0x37b)[0x7f5144976f9b]
/usr/lib64/libboost_thread.so.1.54.0(+0xcc0a)[0x7f5145e58c0a]
/lib64/libpthread.so.0(+0x8734)[0x7f5144f9f734]
/lib64/libc.so.6(clone+0x6d)[0x7f51439d9d3d]
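
For readers who do not read mangled C++ names, the frames above can be decoded far enough to see which functions are involved. The following is a minimal sketch in Python, not a full demangler: it assumes the simple Itanium-ABI nested form `_ZN<len><name>...E` that every symbol in this trace uses, and it ignores parameter types, templates, and substitutions. The helper names `frame_symbol` and `qualified_name` are hypothetical.

```python
import re

# Pulls the mangled symbol out of a glibc-style backtrace frame such as
# "lib.so(_ZN...E+0x16c)[0x...]". Frames without a "_Z" symbol yield None.
FRAME = re.compile(r"\((_Z\w+)\+0x[0-9a-f]+\)")
DIGITS = re.compile(r"\d+")


def frame_symbol(line):
    """Return the mangled symbol in a backtrace line, or None."""
    m = FRAME.search(line)
    return m.group(1) if m else None


def qualified_name(mangled):
    """Decode only the leading nested name of an Itanium-mangled symbol.

    Assumption: the symbol starts with "_ZN" followed by length-prefixed
    identifiers. Anything after the name (argument types) is skipped.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a simple nested Itanium-mangled name")
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos].isdigit():
        m = DIGITS.match(mangled, pos)
        length = int(m.group())
        pos = m.end()
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


# The crashing frame from the trace above.
CRASH_FRAME = (
    "/usr/local/mariadb/columnstore/lib/libudfsdk.so.1"
    "(_ZN8mcsv1sdk6median9nextValueEPNS_12mcsv1ContextEPNS_11ColumnDatumE"
    "+0x16c)[0x7f51483768dc]"
)

if __name__ == "__main__":
    print(qualified_name(frame_symbol(CRASH_FRAME)))
```

On a machine with binutils installed, piping the trace through `c++filt` gives the complete signatures, including argument types.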

Debug logging shows the last query that was executed:

2018-08-09T14:19:53.794323-05:00 ip-10-0-0-188 ExeMgr[13344]: 53.794266 |2147483682|0|0| D 16 CAL0041: Start SQL statement: select objectid,columnname from syscolumn where schema='tpch1' and tablename='lineitem' --columnRIDs/FE; ||
2018-08-09T14:19:53.805442-05:00 ip-10-0-0-188 ExeMgr[13344]: 53.805398 |2147483682|0|0| D 16 CAL0042: End SQL statement
2018-08-09T14:19:53.825289-05:00 ip-10-0-0-188 ExeMgr[13344]: 53.825246 |34|0|0| D 16 CAL0041: Start SQL statement: select l_shipmode,count(*),avg_mode(l_extendedprice) from lineitem group by l_shipmode order by l_shipmode; |tpch1|
2018-08-09T14:19:56.233495-05:00 ip-10-0-0-188 ExeMgr[13344]: 56.233412 |34|0|0| D 16 CAL0042: End SQL statement
2018-08-09T14:19:56.572488-05:00 ip-10-0-0-188 ExeMgr[13344]: 56.572401 |34|0|0| D 16 CAL0041: Start SQL statement: select l_shipmode,count(*),avg_mode(l_extendedprice) from lineitem group by l_shipmode order by avg_mode(l_extendedprice); |tpch1|
2018-08-09T14:19:58.882301-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.882224 |34|0|0| D 16 CAL0042: End SQL statement
2018-08-09T14:19:58.926924-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.926839 |2147483683|0|0| D 16 CAL0041: Start SQL statement: select objectid,columnname from syscolumn where schema='tpch1' and tablename='lineitem' --columnRIDs/FE; ||
2018-08-09T14:19:58.938124-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.938062 |2147483683|0|0| D 16 CAL0042: End SQL statement
2018-08-09T14:19:58.977351-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.977260 |35|0|0| D 16 CAL0041: Start SQL statement: select l_shipmode,avg_mode(l_extendedprice),l_shipmode,median(l_extendedprice) from lineitem group by l_shipmode order by l_shipmode; |tpch1|
2018-08-09T14:30:55.673851-05:00 ip-10-0-0-188 joblist[13474]: 55.673765 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.686612-05:00 ip-10-0-0-188 joblist[13562]: 55.686558 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.687295-05:00 ip-10-0-0-188 joblist[13389]: 55.687254 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.688119-05:00 ip-10-0-0-188 joblist[13344]: 55.688075 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.688914-05:00 ip-10-0-0-188 joblist[13344]: 55.688873 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.689735-05:00 ip-10-0-0-188 joblist[13344]: 55.689699 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.690572-05:00 ip-10-0-0-188 joblist[13344]: 55.690534 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:55.698317-05:00 ip-10-0-0-188 joblist[13344]: 55.698278 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
2018-08-09T14:30:56.601230-05:00 ip-10-0-0-188 ProcessMonitor[11127]: 56.601108 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: PrimProc, old PID = 13222
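
In the log above, session 35 logs a CAL0041 start for the `median(l_extendedprice)` query and never logs a matching CAL0042 before the DEC connection is lost about 11 minutes later. A rough Python sketch of how that pairing could be checked automatically is below. It assumes the `|session|...| D nn CAL0041/CAL0042:` layout seen in these lines; the function name `unfinished_statements` is hypothetical and the sample lines are copied from the excerpt.

```python
import re

# Captures session id, message code, and message text from an ExeMgr debug line.
LINE = re.compile(r"\|(\d+)\|\d+\|\d+\| D \d+ (CAL0041|CAL0042): (.*)")


def unfinished_statements(lines):
    """Return {session_id: text} for statements that started but never ended."""
    open_stmts = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # not a start/end statement record
        session, code, rest = m.groups()
        if code == "CAL0041":
            # Keep everything after the fixed prefix (SQL plus schema field).
            open_stmts[session] = rest.split("Start SQL statement: ", 1)[-1]
        else:
            open_stmts.pop(session, None)
    return open_stmts


SAMPLE = [
    "2018-08-09T14:19:56.572488-05:00 ip-10-0-0-188 ExeMgr[13344]: 56.572401 |34|0|0| D 16 CAL0041: Start SQL statement: select l_shipmode,count(*),avg_mode(l_extendedprice) from lineitem group by l_shipmode order by avg_mode(l_extendedprice); |tpch1|",
    "2018-08-09T14:19:58.882301-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.882224 |34|0|0| D 16 CAL0042: End SQL statement",
    "2018-08-09T14:19:58.977351-05:00 ip-10-0-0-188 ExeMgr[13344]: 58.977260 |35|0|0| D 16 CAL0041: Start SQL statement: select l_shipmode,avg_mode(l_extendedprice),l_shipmode,median(l_extendedprice) from lineitem group by l_shipmode order by l_shipmode; |tpch1|",
    "2018-08-09T14:30:55.673851-05:00 ip-10-0-0-188 joblist[13474]: 55.673765 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/suse12/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1",
]

if __name__ == "__main__":
    for session, sql in unfinished_statements(SAMPLE).items():
        print(session, sql)
```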



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-10-05 ]

Since we don't use the median plugin any more I'm guessing this is gone now?

Comment by David Hall (Inactive) [ 2018-10-08 ]

Maybe. However, any time we crash on one OS, it is often something that needs fixing and might break elsewhere later.

Comment by Roman [ 2018-11-19 ]

JFYI, the query that crashed PrimProc is from test022 (GROUP BY handler), not test023 (exotic identifiers test suite). Moreover, the query in question:

select l_shipmode,avg_mode(l_extendedprice),l_shipmode,median(l_extendedprice) from lineitem group by l_shipmode order by l_shipmode;

was replaced with another query by commit fec19269431248e41ce1baa57311b069ea7fae2d:

select l_shipmode,avg_mode(l_extendedprice),l_shipmode,avg_mode(l_extendedprice) from lineitem group by l_shipmode order by l_shipmode;

Could you recheck whether the regression suite contains the latest develop?

Comment by Andrew Hutchings (Inactive) [ 2018-11-20 ]

Confirmed in BuildBot. This is fixed now.
