[MCOL-1368] join on table with different charset performance Created: 2018-04-25 Updated: 2021-01-16 Resolved: 2021-01-16 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | 1.1.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | patrice | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Hi, I discovered that joining tables with different charsets (latin1 and utf8) on a varchar column (to be verified whether it also affects numeric columns) results in very poor performance and erratic behavior. When using a WITH statement the join stays on the UM, and the UM process allocates memory until none is left (128G); if the join is pushed down to the PM, memory stays stable but the query is very slow: an INSERT ... SELECT did not finish after more than one hour, whereas when the tables use the same charset the same operation takes 5s. I verified it both ways, all tables in utf8 and all tables in latin1: both work, although latin1 seems faster. |
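A minimal sketch of the mismatch being described (table and column names here are illustrative, not the reporter's actual schema):

```sql
-- Two ColumnStore tables identical except for the character set.
CREATE TABLE t_latin1 (id INT, b VARCHAR(64))
  ENGINE=ColumnStore DEFAULT CHARSET=latin1;
CREATE TABLE t_utf8 (id INT, b VARCHAR(64))
  ENGINE=ColumnStore DEFAULT CHARSET=utf8;

-- Joining across the charset boundary on both the int and the varchar
-- column is the slow / memory-hungry case described above; with both
-- tables in the same charset the same join completes in seconds.
SELECT COUNT(*)
FROM t_latin1 l
JOIN t_utf8 u ON (l.id = u.id AND l.b = u.b);
```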
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2018-04-25 ] | |||
|
Can you please give an example of what you are doing in both the fast and slow cases so that we can reproduce it here? | |||
| Comment by patrice [ 2018-04-25 ] | |||
|
The whole thing is not well suited for here; I can do a videoconf to demonstrate it. I'm not sure a minimal example will show the behavior. | |||
| Comment by patrice [ 2018-04-25 ] | |||
|
Here is the code. Even with such a low number of rows, the trace shows the difference. Note that without the join on the integer the performance is better; the degradation appears when both conditions are in the join:

drop table toto_u;
CREATE PROCEDURE `insert_it`(p1 int) ...
call insert_it(200); -- 5 times
insert into toto_l select ...
select calSetTrace(1);

select * from toto_u;
select calgettrace();
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
... (trace rows truncated)

select * from toto_l join feed on (feed.b=toto_l.b);
MariaDB [sc]> select calgettrace();
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
... (trace rows truncated) | |||
| Comment by Andrew Hutchings (Inactive) [ 2018-04-26 ] | |||
|
It should probably go without saying, but joins on dictionary columns in ColumnStore are generally a bad idea, and mixing character sets in an equality match is also a bad idea in ColumnStore. That said, I don't have an immediate explanation for why that particular query would be slower than the last one. Something that may give us more information before we have to dive into profiling tools is doing calgettrace(1) instead, which will give a little more breakdown of what was executed. | |||
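For reference, the tracing workflow being discussed looks roughly like this (a sketch, reusing the reporter's tables):

```sql
-- Enable query tracing for the current session.
SELECT calSetTrace(1);

-- Run the query under investigation, e.g. the cross-charset join.
SELECT * FROM toto_l JOIN feed ON (feed.b = toto_l.b);

-- Retrieve the per-step execution trace
-- (Desc, Mode, Table, TableOID, PIO, LIO, PBE, Elapsed, Rows).
SELECT calGetTrace();
```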
| Comment by patrice [ 2018-04-26 ] | |||
|
1) Sometimes this can't be avoided; varchars are important in our case. The guidance cannot just be "do not use long varchars, they are supported but not really optimized"; it should rather be "can we get better performance/support for them?". Do those joins build a joint dictionary, or do they perform a double lookup/comparison for every row?
2) The charset mix was more an oversight than a real design decision. It should still work, even if less than optimally; maybe a warning should be emitted?
3) It is not only about slowness; it is also about memory "leaking" or growing out of control. There is no reason different charsets should result in a very large memory allocation, and it seems to happen only when conditions are on both the int and the varchar. It should be reproducible from the example I gave. |
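Until the underlying cause is found, aligning the character sets of the joined tables sidesteps the problem (per the description, same-charset joins complete in seconds). A sketch of one way to do that; the schema name and the `feed` column list are assumptions, and older ColumnStore versions may not support in-place charset conversion, so the table is recreated and reloaded instead:

```sql
-- Find columns whose charset differs between the joined tables.
SELECT table_name, column_name, character_set_name
FROM information_schema.columns
WHERE table_schema = 'sc'
  AND table_name IN ('toto_l', 'feed');

-- Recreate the mismatched table in the matching charset and reload it
-- (column list here is illustrative).
CREATE TABLE feed_latin1 (b VARCHAR(64))
  ENGINE=ColumnStore DEFAULT CHARSET=latin1;
INSERT INTO feed_latin1 SELECT b FROM feed;
```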