|
Looking at the stack trace, it seems like it is not hanging, but rather executing a query:
Thread 3 (Thread 0x7f0351148700 (LWP 29644)):
|
#0 JOIN_CACHE::read_record_field (this=this@entry=0x7f02da9ae908, copy=copy@entry=0x7f02e5bfd5b0, blob_in_rec_buff=blob_in_rec_buff@entry=false) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_join_cache.cc:1863
|
#1 0x00007f036a040c52 in JOIN_CACHE::read_all_record_fields (this=this@entry=0x7f02da9ae908) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_join_cache.cc:1742
|
#2 0x00007f036a040ce7 in JOIN_CACHE::get_record (this=0x7f02da9ae908) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_join_cache.cc:1609
|
#3 0x00007f036a04177e in JOIN_CACHE::join_matching_records (this=0x7f02da9ae908, skip_last=false) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_join_cache.cc:2291
|
#4 0x00007f036a04120a in JOIN_CACHE::join_records (this=0x7f02da9ae908, skip_last=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_join_cache.cc:2087
|
#5 0x00007f0369f8326b in evaluate_join_record (join=join@entry=0x7f02da19a630, join_tab=join_tab@entry=0x7f02e34cd2f8, error=error@entry=0) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18874
|
#6 0x00007f0369f8a773 in sub_select (join=0x7f02da19a630, join_tab=0x7f02e34cd2f8, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18693
|
#7 0x00007f036a04fff7 in join_tab_execution_startup (tab=tab@entry=0x7f02e02fe468) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/opt_subselect.cc:5294
|
#8 0x00007f0369f8a6f6 in sub_select (join=0x7f02da19a630, join_tab=0x7f02e02fe468, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18642
|
#9 0x00007f0369f8326b in evaluate_join_record (join=join@entry=0x7f02da19a630, join_tab=join_tab@entry=0x7f02e02fe0b8, error=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18874
|
#10 0x00007f0369f8a72e in sub_select (join=0x7f02da19a630, join_tab=0x7f02e02fe0b8, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18654
|
#11 0x00007f0369f8326b in evaluate_join_record (join=join@entry=0x7f02da19a630, join_tab=join_tab@entry=0x7f02e02fdd08, error=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18874
|
#12 0x00007f0369f8a72e in sub_select (join=0x7f02da19a630, join_tab=0x7f02e02fdd08, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18654
|
#13 0x00007f0369f8326b in evaluate_join_record (join=join@entry=0x7f02da19a630, join_tab=join_tab@entry=0x7f02e02fd958, error=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18874
|
#14 0x00007f0369f8a72e in sub_select (join=0x7f02da19a630, join_tab=0x7f02e02fd958, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18654
|
#15 0x00007f0369f8326b in evaluate_join_record (join=join@entry=0x7f02da19a630, join_tab=join_tab@entry=0x7f02e02fd5a8, error=error@entry=0) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18874
|
#16 0x00007f0369f8a773 in sub_select (join=0x7f02da19a630, join_tab=0x7f02e02fd5a8, end_of_records=<optimized out>) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18693
|
#17 0x00007f0369fa8662 in do_select (procedure=<optimized out>, join=0x7f02da19a630) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:18198
|
#18 JOIN::exec_inner (this=this@entry=0x7f02da19a630) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:3530
|
#19 0x00007f0369fa8869 in JOIN::exec (this=this@entry=0x7f02da19a630) at /home/zdravelina.sokolovska/mariadb-10.2.12/sql/sql_select.cc:3325
|
|
|
It could produce an impression of being hung if the query plan is inefficient and so the query examines lots of rows, which takes a lot of time.
|
|
One way to check this : instead of SHOW PROCESSLIST, run select * from information_schema.processlist. It will include EXAMINED_ROWS column. Do several selects from information_schema.processlist and see how many records per second it examines.
|
note2 : the hanging query does no cause the problem ; after killing and rerun only that query is does not stick
This one is interesting, something I dont have a ready explanation for...
|
|
winstone, can you provide details how does one get that 1G data? (I assume it's less than 1G if compressed, if so, can you upload the data dump somewhere? That would be the fastest option)
data even if compressed fails to be attached there
|
|
Another thing to check is SHOW EXPLAIN FOR for the running query.
note2 : the hanging query does no cause the problem ; after killing and rerun only that query is does not stick
This could be a background statistics update in MyRocks, which causes a different query plan. Does this (run query the first time - it hangs, kill it and run it again - it works) happen every time you try it?
|
|
attached information_schema.processlist and show explain ,thist time hanging at query 95 ;
interesting previous time that query finished with correct result :
order count total shipping cost total net profit
73 120440.34 42133.12
but after rerun it remained more than 20h ;
|
|
winstone, thanks for the info!
Reading... so, the query is
with ws_wh as
|
(
|
select
|
ws1.ws_order_number,
|
ws1.ws_warehouse_sk wh1,
|
ws2.ws_warehouse_sk wh2
|
from
|
web_sales ws1,
|
web_sales ws2
|
where
|
ws1.ws_order_number = ws2.ws_order_number
|
and
|
ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk
|
)
|
|
select
|
count(distinct ws_order_number) as "order count"
|
,sum(ws_ext_ship_cost) as "total shipping cost"
|
,sum(ws_net_profit) as "total net profit"
|
from
|
web_sales ws1
|
,date_dim
|
,customer_address
|
,web_site
|
where
|
d_date between '1999-2-01' and
|
ADDDATE(cast('1999-2-01' as date),60)
|
and ws1.ws_ship_date_sk = d_date_sk
|
and ws1.ws_ship_addr_sk = ca_address_sk
|
and ca_state = 'IL'
|
and ws1.ws_web_site_sk = web_site_sk
|
and web_company_name = 'pri'
|
and ws1.ws_order_number in (select ws_order_number
|
from ws_wh)
|
and ws1.ws_order_number in (select wr_order_number
|
from web_returns,ws_wh
|
where wr_order_number = ws_wh.ws_order_number)
|
order by count(distinct ws_order_number)
|
limit 100
|
INFORMATION_SCHEMA.processlist.examined_rows is zero. This is actually expected - I've checked and it turns out that column is only updated by UPDATE and DELETE statements. We are running a SELECT...
|
|
The provided EXPLAIN is:
+------+--------------+------------------+--------+---------------+--------------+---------+-----------------------------+--------+--------------------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+--------------+------------------+--------+---------------+--------------+---------+-----------------------------+--------+--------------------------------------------------------+
|
| 1 | PRIMARY | ws1 | ALL | NULL | NULL | NULL | NULL | 719384 | Using where |
|
| 1 | PRIMARY | customer_address | eq_ref | PRIMARY | PRIMARY | 4 | tpcds_1.ws1.ws_ship_addr_sk | 1 | Using where |
|
| 1 | PRIMARY | date_dim | eq_ref | PRIMARY | PRIMARY | 4 | tpcds_1.ws1.ws_ship_date_sk | 1 | Using where |
|
| 1 | PRIMARY | web_site | eq_ref | PRIMARY | PRIMARY | 4 | tpcds_1.ws1.ws_web_site_sk | 1 | Using where |
|
| 1 | PRIMARY | <subquery3> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | |
|
| 1 | PRIMARY | <subquery4> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | |
|
| 3 | MATERIALIZED | ws1 | ALL | NULL | NULL | NULL | NULL | 719384 | |
|
| 3 | MATERIALIZED | ws2 | ALL | NULL | NULL | NULL | NULL | 719384 | Using where; Using join buffer (flat, BNL join) |
|
| 4 | MATERIALIZED | web_returns | index | NULL | PRIMARY | 8 | NULL | 71763 | Using index |
|
| 4 | MATERIALIZED | ws1 | ALL | NULL | NULL | NULL | NULL | 719384 | Using where; Using join buffer (flat, BNL join) |
|
| 4 | MATERIALIZED | ws2 | ALL | NULL | NULL | NULL | NULL | 719384 | Using where; Using join buffer (incremental, BNL join) |
|
+------+--------------+------------------+--------+---------------+--------------+---------+-----------------------------+--------+--------------------------------------------------------+
|
This query plan is not fast. Take a look at this for example:
| 3 | MATERIALIZED | ws1 | ALL | NULL | NULL | NULL | NULL | 719384 | |
|
| 3 | MATERIALIZED | ws2 | ALL | NULL | NULL | NULL | NULL | 719384 | Using where; Using join buffer (flat, BNL join) |
|
computing a cross-self-join for a table with 700K rows is not going to be fast.
Q1: The query actually has a restriction: ws1.ws_order_number = ws2.ws_order_number. Is there no index on that column?
Q2: So this is a mergeable CTE that's used from the two mergeable semi-join subqueries. Why wasn't it merged, and we see select_id=3 and 4 in the EXPLAIN? (winstone are you using non-standard @@optimizer_switch settings?)
|
|
winstone, more questions:
- if you just run the query manually, does it execute fast or slow?
- in case the answer to this is "fast", can you run {{ANALYZE SELECT .... }} for the query and post the output here?
|
|
Q1:
ws_order_number is defined as primary key
create table web_sales
|
(
|
ws_sold_date_sk integer ,
|
ws_sold_time_sk integer ,
|
ws_ship_date_sk integer ,
|
ws_item_sk integer not null,
|
ws_bill_customer_sk integer ,
|
ws_bill_cdemo_sk integer ,
|
ws_bill_hdemo_sk integer ,
|
ws_bill_addr_sk integer ,
|
ws_ship_customer_sk integer ,
|
ws_ship_cdemo_sk integer ,
|
ws_ship_hdemo_sk integer ,
|
ws_ship_addr_sk integer ,
|
ws_web_page_sk integer ,
|
ws_web_site_sk integer ,
|
ws_ship_mode_sk integer ,
|
ws_warehouse_sk integer ,
|
ws_promo_sk integer ,
|
ws_order_number integer not null,
|
ws_quantity integer ,
|
ws_wholesale_cost decimal(7,2) ,
|
ws_list_price decimal(7,2) ,
|
ws_sales_price decimal(7,2) ,
|
ws_ext_discount_amt decimal(7,2) ,
|
ws_ext_sales_price decimal(7,2) ,
|
ws_ext_wholesale_cost decimal(7,2) ,
|
ws_ext_list_price decimal(7,2) ,
|
ws_ext_tax decimal(7,2) ,
|
ws_coupon_amt decimal(7,2) ,
|
ws_ext_ship_cost decimal(7,2) ,
|
ws_net_paid decimal(7,2) ,
|
ws_net_paid_inc_tax decimal(7,2) ,
|
ws_net_paid_inc_ship decimal(7,2) ,
|
ws_net_paid_inc_ship_tax decimal(7,2) ,
|
ws_net_profit decimal(7,2) ,
|
primary key (ws_item_sk, ws_order_number)
|
);
|
|
Q2: the optimizer_switch is the default one :
select @@optimizer_switch;
index_merge=on
index_merge_union=on
index_merge_sort_union=on
index_merge_intersection=on
index_merge_sort_intersection=off
engine_condition_pushdown=off
index_condition_pushdown=on
derived_merge=on
derived_with_keys=on
firstmatch=on
loosescan=on
materialization=on
in_to_exists=on
semijoin=on
partial_match_rowid_merge=on
partial_match_table_scan=on
subquery_cache=on
mrr=off
mrr_cost_based=off
mrr_sort_keys=off
outer_join_with_cache=on
semijoin_with_cache=on
join_cache_incremental=on
join_cache_hashed=on
join_cache_bka=on
optimize_join_buffer_size=off
table_elimination=on
extended_keys=on
exists_to_in=on
orderby_uses_equalities=on
condition_pushdown_for_derived=on
have not yet exec time from the consecutive run of query 95 ; when run manually query 95 was done for 1 day 6 hours 19 min
------------------------------------------------
| order count |
total shipping cost |
total net profit |
------------------------------------------------
------------------------------------------------
1 row in set (1 day 6 hours 19 min 16.09 sec)
because the query is with 'with' construction analyze with returns error ; will be
analyze select * from (the_query)ff ; useful or there is some other analyze specific for 'with' select ?
|