[MCOL-2106] InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR ) appears during cpimport Bulk load Created: 2019-01-24  Updated: 2023-03-06  Resolved: 2023-03-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.0.16, 1.1.7
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Environment:

mcs on 1UM-1PM ; CentOS7.5


Attachments: File columnstoreSupportReport 1.7z    

 Description   

InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR ) appears during cpimport Bulk load

that problem was observed during concurrent data cpimport Bulk load
issue appeared after killing one query and starting cpimport Bulk load

note: that problem was not observed before when was performed the same concurrent data cpimport Bulk load

Loading MammothDB Table call_center
Loading MammothDB Table catalog_page
Loading MammothDB Table catalog_returns
Loading MammothDB Table catalog_sales
Loading MammothDB Table customer_address
Loading MammothDB Table customer_demographics
Loading MammothDB Table customer
Loading MammothDB Table date_dim
Loading MammothDB Table household_demographics
Loading MammothDB Table income_band
Loading MammothDB Table inventory
Loading MammothDB Table item
Loading MammothDB Table promotion
Loading MammothDB Table reason
Loading MammothDB Table ship_mode
Loading MammothDB Table store_returns
Loading MammothDB Table store_sales
Loading MammothDB Table store
Loading MammothDB Table time_dim
Loading MammothDB Table warehouse
Loading MammothDB Table web_page
Loading MammothDB Table web_returns
Loading MammothDB Table web_sales
Loading MammothDB Table web_site
2019-01-24 14:14:50 (9326) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9298) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9310) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9294) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9318) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9328) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9288) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9312) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9308) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9306) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9304) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9320) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9296) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9332) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9324) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9316) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9314) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9322) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9292) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9290) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:50 (9334) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:51 (9300) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:51 (9302) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:51 (9328) INFO : For table tpcds_1000.web_page: 3000 rows processed and 3000 rows inserted.
2019-01-24 14:14:51 (9328) INFO : Bulk load completed, total run time : 1.25263 seconds
2019-01-24 14:14:51 (9326) INFO : For table tpcds_1000.warehouse: 20 rows processed and 20 rows inserted.
2019-01-24 14:14:51 (9326) INFO : Bulk load completed, total run time : 1.26604 seconds
2019-01-24 14:14:51 (9330) INFO : Running distributed import (mode 1) on all PMs...
2019-01-24 14:14:51 (9304) INFO : For table tpcds_1000.household_demographics: 7200 rows processed and 7200 rows inserted.
2019-01-24 14:14:51 (9304) INFO : Bulk load completed, total run time : 1.53098 seconds
2019-01-24 14:14:52 (9314) INFO : For table tpcds_1000.reason: 65 rows processed and 65 rows inserted.
2019-01-24 14:14:52 (9314) INFO : Bulk load completed, total run time : 2.45287 seconds
2019-01-24 14:14:52 (9322) INFO : For table tpcds_1000.store: 1002 rows processed and 1002 rows inserted.
2019-01-24 14:14:52 (9322) INFO : Bulk load completed, total run time : 2.70559 seconds
2019-01-24 14:14:52 (9306) INFO : For table tpcds_1000.income_band: 20 rows processed and 20 rows inserted.
2019-01-24 14:14:52 (9306) INFO : Bulk load completed, total run time : 2.78959 seconds
2019-01-24 14:14:53 (9316) INFO : For table tpcds_1000.ship_mode: 20 rows processed and 20 rows inserted.
2019-01-24 14:14:53 (9316) INFO : Bulk load completed, total run time : 3.17326 seconds
2019-01-24 14:14:53 (9290) INFO : For table tpcds_1000.catalog_page: 30000 rows processed and 30000 rows inserted.
2019-01-24 14:14:53 (9290) INFO : Bulk load completed, total run time : 3.37585 seconds
2019-01-24 14:14:53 (9288) INFO : For table tpcds_1000.call_center: 42 rows processed and 42 rows inserted.
2019-01-24 14:14:53 (9288) INFO : Bulk load completed, total run time : 3.42519 seconds
2019-01-24 14:14:53 (9312) INFO : For table tpcds_1000.promotion: 1500 rows processed and 1500 rows inserted.
2019-01-24 14:14:53 (9312) INFO : Bulk load completed, total run time : 3.53151 seconds
2019-01-24 14:15:22 (9324) INFO : For table tpcds_1000.time_dim: 86400 rows processed and 86400 rows inserted.
2019-01-24 14:15:22 (9324) INFO : Bulk load completed, total run time : 32.4227 seconds
2019-01-24 14:15:37 (9298) INFO : For table tpcds_1000.customer_demographics: 1920800 rows processed and 1920800 rows inserted.
2019-01-24 14:15:37 (9298) INFO : Bulk load completed, total run time : 47.3799 seconds
2019-01-24 14:15:50 (9302) INFO : For table tpcds_1000.date_dim: 73049 rows processed and 73049 rows inserted.
2019-01-24 14:15:50 (9302) INFO : Bulk load completed, total run time : 60.5715 seconds
2019-01-24 14:16:26 (9334) INFO : For table tpcds_1000.web_site: 54 rows processed and 54 rows inserted.
2019-01-24 14:16:26 (9334) INFO : Bulk load completed, total run time : 96.7708 seconds
2019-01-24 14:16:39 (9310) INFO : For table tpcds_1000.item: 300000 rows processed and 300000 rows inserted.
2019-01-24 14:16:39 (9310) INFO : Bulk load completed, total run time : 109.856 seconds
2019-01-24 14:19:03 (9296) INFO : For table tpcds_1000.customer_address: 6000000 rows processed and 6000000 rows inserted.
2019-01-24 14:19:03 (9296) INFO : Bulk load completed, total run time : 253.643 seconds
2019-01-24 14:24:12 (9300) INFO : For table tpcds_1000.customer: 12000000 rows processed and 12000000 rows inserted.
2019-01-24 14:24:12 (9300) INFO : Bulk load completed, total run time : 562.465 seconds
2019-01-24 14:28:52 (9330) INFO : For table tpcds_1000.web_returns: 71997522 rows processed and 71997522 rows inserted.
2019-01-24 14:28:52 (9330) INFO : Bulk load completed, total run time : 842.168 seconds
2019-01-24 14:39:33 (9292) INFO : For table tpcds_1000.catalog_returns: 143996756 rows processed and 143996756 rows inserted.
2019-01-24 14:39:33 (9292) INFO : Bulk load completed, total run time : 1483.28 seconds
InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR )
fClnt read error on 1
2019-01-24 14:43:02 (9332) ERR  : Received a Cpimport Failure from PM1
2019-01-24 14:43:02 (9332) INFO : Please verify error log files in PM1
InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR )
fClnt read error on 1
2019-01-24 14:43:02 (9318) ERR  : Received a Cpimport Failure from PM1
2019-01-24 14:43:02 (9318) INFO : Please verify error log files in PM1
InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR )
fClnt read error on 1
2019-01-24 14:43:02 (9294) ERR  : Received a Cpimport Failure from PM1
2019-01-24 14:43:02 (9294) INFO : Please verify error log files in PM1
InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR )
fClnt read error on 1
2019-01-24 14:43:02 (9308) ERR  : Received a Cpimport Failure from PM1
2019-01-24 14:43:02 (9308) INFO : Please verify error log files in PM1
InetStreamSocket::readToMagic(): I/O error1: rc-1; poll signal interrupt ( POLLHUP POLLERR )
fClnt read error on 1
2019-01-24 14:43:02 (9320) ERR  : Received a Cpimport Failure from PM1
2019-01-24 14:43:02 (9320) INFO : Please verify error log files in PM1
2019-01-24 14:43:02 (9318) INFO : Trying to reconnect and rollback
2019-01-24 14:43:02 (9294) INFO : Trying to reconnect and rollback
2019-01-24 14:43:02 (9332) INFO : Trying to reconnect and rollback
2019-01-24 14:43:02 (9308) INFO : Trying to reconnect and rollback
2019-01-24 14:43:03 (9320) INFO : Trying to reconnect and rollback
Could not connect to pm1_WriteEngineServer: Connection refused
Unable to connect to PM1; Trying again...
Could not connect to pm1_WriteEngineServer: Connection refused
Unable to connect to PM1; Trying again...
 
 

Jan 24 14:42:06 pm1 ServerMonitor[21350]: 05.782719 |0|0|0| C 09 CAL0000: Swap Space usage over Major threashold, perform OAM command restartSystem
Jan 24 14:43:15 pm1 controllernode[13071]: 15.561663 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)
Jan 24 14:43:16 pm1 ProcessMonitor[13016]: 16.547842 |0|0|0| E 18 CAL0000: statusControl: EXCEPTION ERROR on getSystemConfig: std::bad_alloc

 



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-01-24 ]

Looks like something crashed. Support report available?

Comment by Zdravelina Sokolovska (Inactive) [ 2019-01-29 ]

the problem appears also after executing > select calFlushCache();
and running cpimport Bulk load .
it's seems that issue it's deriving from the PrimProc process not deliberating memory after query execution

below the memory allocation before recreating the problem

top - 10:39:06 up 1 day,  9:02,  1 user,  load average: 0.04, 0.06, 0.05
Tasks: 187 total,   1 running, 186 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  0.5 sy,  0.0 ni, 98.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 61848092 total, 16167556 free, 45340432 used,   340104 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used. 15998124 avail Mem
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
11250 root      19  -1   44.1g  42.6g  12552 S   0.0 72.2   1150:54 PrimProc
 7290 root      20   0  358192  29144   7032 S   0.0  0.0   0:00.74 firewalld
11138 root      20   0  535412  21228  15752 S   0.0  0.0   0:03.29 workernode
 7942 root      20   0 1626380  19128  11684 S   1.0  0.0  15:15.76 ProcMon
 7773 root      20   0  573820  17272   6128 S   0.0  0.0   0:12.71 tuned
11277 root      20   0  576452  16440  10048 S   0.0  0.0   0:00.21 WriteEngineServ
 8050 root      20   0 1336956  16028   9060 S   3.3  0.0  36:34.95 ProcMgr
11108 root      20   0  454848  15904   8600 S   0.0  0.0   1:15.12 ServerMonitor
 7235 polkitd   20   0  615060  13976   4944 S   0.0  0.0   0:00.09 polkitd

Comment by Todd Stoffel (Inactive) [ 2023-03-06 ]

This ticket was created prior to convergence with the server and may be obsolete. If you find this issue still exists in a modern version, please open a new ticket.

Generated at Thu Feb 08 02:33:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.