[MDEV-6096] Ideas about parallel query execution Created: 2014-04-14 Updated: 2023-03-03 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 12 |
| Labels: | optimizer | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Some ideas about using multiple threads to run a query.

== Position at N% of table/index ==

If we want to run these with N threads, we need to give 1/Nth of the table to each thread. (An alternative is to run one "reader" thread and distribute work to multiple compute threads. The problem with this is that reading from the table won't be parallel. This will put a cap on the performance.) In order to do that, we will need storage engine calls that do

These calls would also let us build equi-height histograms based on sampling.

== General execution ==

== Evaluation == |
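The "position at N% of an index" idea above can be sketched in plain Python. This is a hypothetical illustration, not engine code: a sorted list stands in for the index, and `split_into_ranges` is an invented name for the kind of storage engine call the description asks for. A real engine would position on B-tree page boundaries rather than exact row offsets.

```python
def split_into_ranges(sorted_keys, n_threads):
    """Return (low, high) key ranges that cover the index in n_threads pieces.

    Each thread would then scan only its own key range, so reads are parallel
    instead of funneled through one "reader" thread.
    """
    total = len(sorted_keys)
    ranges = []
    for i in range(n_threads):
        lo = sorted_keys[i * total // n_threads]        # first key of chunk i
        hi = sorted_keys[(i + 1) * total // n_threads - 1]  # last key of chunk i
        ranges.append((lo, hi))
    return ranges

keys = list(range(0, 1000, 7))   # stand-in for the keys stored in an index
ranges = split_into_ranges(keys, 4)
```

The same positioning calls would support equi-height histogram sampling: picking the keys at 0%, 25%, 50%, ... of the index gives the bucket boundaries directly.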
| Comments |
| Comment by Sergei Petrunia [ 2014-05-14 ] |
|
Shard-query is able to run a query on multiple CPUs if the queried tables are partitioned. There is some data about how much this brings: http://www.mysqlperformanceblog.com/2014/05/01/parallel-query-mysql-shard-query/ (this is probably not the only piece of data). Shard-query has been around for a long time, but didn't get much momentum for some reason. (An "obvious" technical explanation is that people don't want to partition their tables. But why wouldn't they?) |
| Comment by roberto spadim [ 2014-07-30 ] |
|
Can the Spider engine execute parallel queries? (the "shard" feature) |
| Comment by Sergei Petrunia [ 2014-08-02 ] |
|
Yes, it can. It has some limitations, though:
On the other hand, it would be stupid to re-implement features that are already implemented in Spider. |
| Comment by roberto spadim [ 2014-08-02 ] |
|
Nice. Any idea how to control ("throttle") whether a query will execute with many threads or not? |
| Comment by Sergei Petrunia [ 2014-10-16 ] |
|
Parallel query execution. Splitting the query across multiple threads requires:

== R1. Enlisting the worker threads ==

== R2. Dividing the work into work units ==

We need to split the work into work units somehow. The most basic task is to

== R3. Executing work units in parallel ==

When using InnoDB, there is a problem with doing reads from multiple threads:

== R4. Collecting the results of work back together ==

Need a multiple-producer, single-consumer queue. |
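The R4 step above, a multiple-producer single-consumer queue, can be sketched with Python's thread-safe `queue.Queue`. This is an illustration of the collection pattern only; the work units and the `worker` function are invented stand-ins for R1-R3.

```python
import queue
import threading

def worker(unit, out_q):
    # Stand-in for R3: process one work unit and produce result rows.
    for row in unit:
        out_q.put(row * 10)
    out_q.put(None)  # sentinel: this producer is finished

units = [[1, 2], [3, 4], [5, 6]]      # R2: the work, pre-divided into units
out_q = queue.Queue()                  # R4: the MPSC queue
threads = [threading.Thread(target=worker, args=(u, out_q)) for u in units]
for t in threads:                      # R1: enlist the worker threads
    t.start()

results = []
done = 0
while done < len(units):               # single consumer drains the queue
    item = out_q.get()
    if item is None:
        done += 1                      # one more producer has finished
    else:
        results.append(item)
for t in threads:
    t.join()
```

The per-producer sentinel is what lets the single consumer know when all producers are done without polling thread state.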
| Comment by roberto spadim [ 2014-10-16 ] |
|
One question: how will the optimizer know whether a query should or shouldn't execute in parallel? |
| Comment by Sergei Petrunia [ 2014-10-16 ] |
|
I think parallel execution should only be attempted when the join is sufficiently big. That is, we run the optimizer, then look at the query plan, and if it is expected to enumerate sufficiently many rows, we try parallel execution. |
| Comment by VAROQUI Stephane [ 2014-10-16 ] |
|
A temporary Spider table can be created, partitioned over the same local table: create x partitions by range, where x is the number of threads. The query is then run with the original table replaced by the Spider table, et voilà. |
| Comment by VAROQUI Stephane [ 2014-10-17 ] |
|
If you want this to work, we first need a merge of the parallel partition scan plugin from the Spiral patches, plus additional patches. Using the same idea, we could define each partition on a list of slaves to get parallel queries against a cluster of nodes, controlled by a SESSION variable or an SQL hint: SELECT SQL_LOCAL_ANALYTIC if run locally, SQL_SLAVES_ANALYTIC if run on slaves. For this to work we need partition range condition push down added to Spider. |
| Comment by roberto spadim [ 2014-10-17 ] |
|
Hmm, maybe with UNION selects too? In other words: union/union all, many rows, partitions?, SQL hints/optimizer variable. |
| Comment by Federico Razzoli [ 2014-10-17 ] |
|
Wouldn't this be useful for MRR queries, when many rows are expected to be read? |
| Comment by roberto spadim [ 2014-10-18 ] |
|
Here is an example of a query that should (and shouldn't?) be executed in parallel: how do we know whether to execute in parallel (what is the condition) when we have a UNION + LIMIT with a mix of possibly fast queries and possibly bigger/slower queries? Should we divide the union into many threads and start all the queries? If the first one gets the full LIMIT result, should we stop all the other queries and discard their rows? In that case, if we execute fast queries AND slow queries at the same time, the fast query could become slower (because we generate more IOPS at the disks, which is something we could tune better). It's a bit complex to decide whether we should execute in parallel, isn't it? In any case, parallel execution (even behind an optimizer switch) is a nice feature to have. |
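The "start all branches, stop the rest once LIMIT is satisfied" idea in the comment above can be sketched with threads and a shared stop flag. This is a hypothetical illustration: the branches are plain Python lists standing in for SELECTs, it behaves like UNION ALL (no deduplication), and it assumes the branches together produce at least `limit` rows.

```python
import queue
import threading

def run_branch(rows, stop, out_q):
    for row in rows:
        if stop.is_set():       # another branch already filled the LIMIT
            return              # discard this branch's remaining work
        out_q.put(row)

def parallel_union_limit(branches, limit):
    stop = threading.Event()
    out_q = queue.Queue()
    threads = [threading.Thread(target=run_branch, args=(b, stop, out_q))
               for b in branches]
    for t in threads:
        t.start()
    results = []
    while len(results) < limit: # consume until LIMIT rows collected
        results.append(out_q.get())
    stop.set()                  # signal slower branches to give up
    for t in threads:
        t.join()
    return results

fast = [1, 2, 3, 4, 5]
slow = [100 + i for i in range(1000)]
rows = parallel_union_limit([fast, slow], 4)
```

Note the cost the comment worries about: both branches consume I/O until the flag is set, so the fast branch really can get slower under contention.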
| Comment by Sergei Petrunia [ 2014-10-22 ] |
|
Results of discussion with igor: code-wise, before getting something to run, we need to learn:
|
| Comment by roberto spadim [ 2014-10-22 ] |
|
Hi Sergei Petrunia! Yeah, but for example:

2) "if the first one got the total limit result we should stop all queries and discard rows from the other queries": I think it is fairly rare that a query needs just LIMIT n rows, no matter which rows (please share your experience; do you agree with this?). Sometimes it's an application problem, but today what people do is this: instead of a big (select) UNION (select) UNION (select) LIMIT 1234, the client executes the first query; if total rows < 1234, it executes the second; if total rows < 1234, ... It would be nice to execute this at the server side, with an option to execute the 'faster' (more optimized, lower cost) query first.

2.1) "Typically it is ORDER BY ... LIMIT n, where the first N rows are needed, according to some criteria." Yes, I agree; the criteria are normally at the MySQL user, not the MySQL database.

2.2) "In that case, getting N rows from one part of the union still means you need to get N rows from the other part and then pick the N rows with the least ORDER BY criteria." Yeah, that's the 'standard' way to solve this query, but we could execute the UNION in an 'optimized' order: first the queries with lower cost, then those with higher cost. Here an optimizer switch or something like it must be exposed to server/client/optimizer to change the union order.

2.3) Sorry for the many post edits; I forgot to explain why union execution order is important. An example:

(select where some_index LIKE "%some thing%") UNION (select where full text match something) LIMIT 10

Which is the more expensive query? I think only the optimizer/server side can tell. And yes, I don't care about order; maybe I could rewrite it as

(select where some_index LIKE "%some thing%") UNION (select where full text match something) ORDER BY NULL LIMIT 10

or something like it, just to explain that this query doesn't need an order at the "union" execution. |
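The cheapest-first server-side execution proposed in 2.2) can be sketched as follows. The cost numbers are hypothetical; a real server would take them from the optimizer's estimates for each UNION part.

```python
def union_cheapest_first(parts, limit):
    """parts: list of (estimated_cost, row_source) pairs.

    Run UNION parts in ascending estimated cost, stopping as soon as
    LIMIT rows are collected; expensive parts may never run at all.
    """
    results = []
    for _cost, rows in sorted(parts, key=lambda p: p[0]):
        for row in rows:
            results.append(row)
            if len(results) == limit:
                return results   # remaining, costlier parts are skipped
    return results

parts = [
    (900.0, ['fulltext-%d' % i for i in range(50)]),  # expensive branch
    (10.0,  ['like-%d' % i for i in range(8)]),       # cheap branch
]
rows = union_cheapest_first(parts, 10)
```

Here the cheap branch yields only 8 rows, so just 2 rows of the expensive branch are fetched before the LIMIT is met.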
| Comment by VAROQUI Stephane [ 2014-10-22 ] |
|
In my experience 99.999% of workloads use ORDER BY ... LIMIT, and many have issues with such queries; most clients stop using an RDBMS for processing that, and now use external indexers (Sphinx, Solr, Elasticsearch), not for full text but for this. This is the heart of every back office presenting data in a single page based on user input for filtering and sorting.

Partitioning is used, yes, for most big tables, because a B-tree stops performing once a table no longer fits in memory. I would say that many workloads do not use covering indexes, so all secondary indexes reference the clustered table, making such tables hot most of the time.

There are more reasons to use a parallel plan. Most storage engines do not have an I/O materialization layer for reads: every read from a secondary index to the clustered table is queued in a single execution thread, so on a SAN or a networked FS, 5 ms of latency is added for every page miss. The total latency of such queries can be divided by the concurrency, as long as we don't ask for more than the total I/O read capacity. Prefetching is only used when scanning an index or the clustered index, but not when using both on the same table, or for joins to a secondary index without MRR. So we have added MRR, but InnoDB does exactly the same as before, because of this two-level index scan from secondary to primary index.

Now Spider partitioning can be created on the fly with a temporary table, to create virtual partitioning based on the query plan. What needs to be added is partition condition push down to every node in the reduce thread. This way every thread can work on a different range of the same table by inheritance of the range conditions. |
| Comment by roberto spadim [ 2014-10-22 ] |
|
1) I don't like the idea of "multi-threading only works if you partition your data", or "create a table to use multi-threading", either; maybe implicit optimizations could be done, and in the documentation we could explain other methods.

2) I think this MDEV will solve problems like: how to "get" data (a single SELECT); how to "order by", "group by", "limit", "union" (after fetching data, or how to execute many selects). I don't know how to easily explain/extrapolate these statistics to the user/optimizer, or how to limit the number of threads that could be created in each part of the query (fetch, order, group, limit, union, etc.).

3) About the workload at ORDER BY/LIMIT: I agree, read/write contention when executing an ORDER BY is a killer with some apps, but I don't know whether we can remove it. My workaround today is to reduce lock contention by committing before ordering, and ordering afterwards with other engines or other tools. I really like the idea of "please, database, order by the fastest way you can; I don't care whether the ordering is the best one, just order it with a low response time". |
| Comment by VAROQUI Stephane [ 2014-10-23 ] |
|
From what I have learned, ORDER BY is not such a costly operation: it's the creation of an index on the results, which costs many CPU cycles without much latency between cycles, and InnoDB can do this in background threads by merging the insert buffer. If you don't have the memory you are dead anyway, and parallel query will not help. Also, ORDER BY ... LIMIT can be optimized by keeping the best-ordered rows from all streams, so attention should be put on fetching data in parallel, and on the granularity of such fetches. If you take for granted that a single core can fetch 600K to 1M records/sec from memory, then even with enough memory this is still around 20 minutes for SUM(C1) on a billion-record table. With a 64-core machine and enough memory I would like to get this result in a few seconds, and if I have 20 slaves I would like the answer in less than a second. Instead of trying to make every type of operation multi-threaded, it's easier to partition a big table. The optimizer should be adapted for range queries: historically the optimizer computes the plan that produces the least cost without knowing about disk/memory I/O distributions; for most OLAP queries this does not work, and the best plan is always going from the big table to the small tables. Materialization of small tables should be done to apply filters, to keep hash or MRR joins as small as possible. |
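The "ORDER BY ... LIMIT optimized by keeping the best order from all streams" idea above can be sketched with a k-way merge: each parallel worker returns its rows already sorted, and the collector merges the streams, taking only the first N. A sketch using Python's standard library:

```python
import heapq
from itertools import islice

def top_n_from_sorted_streams(streams, n):
    """Merge already-sorted per-thread result streams, keep the first n rows.

    heapq.merge does a lazy k-way merge, so only about n rows are pulled
    from the streams in total, not the full result sets.
    """
    return list(islice(heapq.merge(*streams), n))

s1 = [1, 4, 9, 12]   # results of worker thread 1, already ordered
s2 = [2, 3, 15]      # worker thread 2
s3 = [5, 6, 7, 8]    # worker thread 3
best = top_n_from_sorted_streams([s1, s2, s3], 5)
```

Because the merge is lazy, slower streams are only consumed as far as needed, which is the point of the optimization.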
| Comment by roberto spadim [ 2014-10-23 ] |
|
I agree, and I see that, assuming the I/O scheduler does a good job, multi-threading creates more IOPS than a single thread, and a sharded/clustered table (engine) creates more "workers" than a single thread; no doubt about it. My doubt remains: how many threads should be used in each part of the query? How do we optimize this number? Should we execute every part multi-threaded, or should some parts stay single-threaded? There are people in the ARM world talking about 65k cores in a single machine, while on Intel x86-64 we talk about 128 cores. I don't know if we should consider only the size of the data; CPU power is something that will create some "complex" scenarios in the future, like many cores with small "compute power" (ARM) versus fewer cores with big "compute power" (Xeon x86-64). |
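One hedged answer to the "how many threads per query part?" question above is a simple cap: never more than the cores available, never more than the work units, and never so many that each thread gets trivially little work. The function name and the 100,000-rows-per-thread floor below are illustrative choices, not tuned values.

```python
import os

def pick_degree(est_rows, n_work_units, min_rows_per_thread=100_000):
    """Pick a degree of parallelism for one part of a query plan."""
    cores = os.cpu_count() or 1
    by_size = est_rows // min_rows_per_thread  # small queries stay single-threaded
    return max(1, min(cores, n_work_units, by_size))
```

For example, a 50,000-row scan gets one thread regardless of core count, while a billion-row scan over 4 partitions is capped by the partition count (and by the cores on smaller machines).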
| Comment by Federico Razzoli [ 2015-06-06 ] |
|
I totally agree with Varoqui about the need for hash joins; this is probably the most important limitation that makes MariaDB not viable for big data warehouses. Any realistic chance to see this in 10.2? |
| Comment by Larry Adams [ 2022-05-05 ] |
|
It would be nice if we could get the ShardQuery concept embedded in MariaDB. For example, if I need to run a GROUP BY across 20 partitions or 20 tables, it would be nice if that were at least 20 parallel queries using a map-reduce algorithm. I keep telling myself that I'm just going to have to write this myself. I looked at Shawn's ShardQuery, and though it's great, it places too many dependencies onto any system deployment. MariaDB should just be doing what Shawn was doing in his ShardQuery. Could it be that there are patent implications preventing its broader adoption in MariaDB? ColumnStore, though interesting, is likely not the right solution in all cases. |
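The map-reduce GROUP BY described above can be sketched as: run the same aggregation on each partition in parallel (map), then merge the partial aggregates (reduce). SUM and COUNT merge trivially, as below; AVG would need (sum, count) pairs. The partitions here are in-memory lists standing in for per-shard queries.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_partition(rows):
    """Per-partition 'SELECT key, SUM(value) GROUP BY key'."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def group_by_sum(partitions):
    # Map step: one task per partition, run concurrently.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(map_partition, partitions)
    # Reduce step: merge the partial sums into the final result.
    total = Counter()
    for p in partials:
        total.update(p)   # Counter.update adds counts per key
    return dict(total)

parts = [
    [('a', 1), ('b', 2)],
    [('a', 10), ('c', 3)],
]
result = group_by_sum(parts)
```

With 20 partitions this becomes 20 concurrent map tasks, which is exactly the ShardQuery-style parallelism the comment asks for.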
| Comment by VAROQUI Stephane [ 2022-05-06 ] |
|
Hi Larry, MariaDB already offers Spider for this: just create a server object for each partition (or modulo the number of cores to use). Did you try it? |
| Comment by VAROQUI Stephane [ 2022-05-06 ] |
|
If you look closely, that's one direction the core devs of MariaDB follow, while putting a majority of resources into staying the best for OLTP (NVMe, PMem, CPU architecture contention, for InnoDB and the thread pool). At the same time they have enabled query fragment push down to storage engines, and a global threading service so that a storage engine does not have to worry about background thread allocation. This enables storage engines like ColumnStore or Xpand to be plugged into MariaDB instead of being forks, as was forced in the past. So not yet perfect, but it looks to be moving in one architectural direction, and Larry's dream of coding something could be done by taking the map and reduce steps of an SQL fragment. There are some particular issues that have been fixed regarding XA that make me think reentrant transactions may arrive one day, and this is required for consistency of concurrent execution. Maybe Sergei or Monty could share their view on the roadmap for this topic, but I guess so many other important features have been needed (JSON support, CTEs, external foreign keys and statistics, S3 storage, non-blocking DDL, replication-lag-free DDL) that we can't blame anyone for moving slowly in that direction. |
| Comment by Larry Adams [ 2022-05-06 ] |
|
Okay, coming up to speed on Spider. Got a few more questions. 1) I see map-reduce for GROUP BY, ORDER & LIMIT, and aggregation functions, which is great. Thanks!! |
| Comment by Larry Adams [ 2022-05-06 ] |
|
Oh, I see a problem: I'm just trying to query these tables, not insert into them. If the tables already exist, and I just want a virtual table that queries from them in a read-only fashion, am I going to run into a problem? Let me apologize in advance for my ignorance. This is why I was speaking in terms of a temporary table. It's more like a shard-enabled view. |
| Comment by Larry Adams [ 2022-05-06 ] |
|
Okay, testing now. So, let's say I have to create a table with 50 partitions, and I want to use 10 connections per server to 5 servers. How do you do that? The way it's working for me, each query is handled serially, so my speedup with 50 partitions would only be about 5x, even though I've got 120 threads per server and NVMe everywhere. How do I get Spider to hit each server with 10 concurrent threads? Oh, and I faked out a partition creation by using "partition by hash" to get it to work, but ended up crashing the database server when I tried to "alter table add partition". The underlying tables were not hashed. 10.8.x. Sorry for polluting this thread. I'll blow things away afterwards. |
| Comment by Rick James [ 2022-05-07 ] |
|
I suggest the following plan:

1. Put something simple in. This lets us say that "parallel execution" does exist. (Marketing!) Sorry for being cynical. The "simple" implementation might limit itself to
I would be happy to add another reason for using Partitioning to my very short list. (Oops, I am being cynical again.) |
| Comment by Federico Razzoli [ 2022-05-07 ] |
|
Agree with Rick, but I'm a bit afraid of performance regressions. Please support a way to avoid parallelisation for a specific query or globally. |
| Comment by Larry Adams [ 2022-05-07 ] |
|
Okay, I set up two shard servers, created a faux partitioned table with an even more faux hash and a few million records of real data, then ran the simplest of GROUP BY queries with an aggregate function, and proceeded to observe zero parallelism. Everything ran serially; the query ran like a simple union hitting one table at a time. My use case is biased towards SELECT; these are read-only tables. So, disappointed. I agree with Rick: add a table or system option to define the level of parallelism, and it should work as I described, giving me a nearly 50x speedup for reads to a read-only set of tables. As far as I am concerned, make it a read-only option, and give us a flag to mark the table as read-only if that helps. If this technique is to work, we also need a dummy partitioning algorithm, and yes, I know that is sacrilegious. The workaround would be to add a useless sequence column to every table, but now I'm quickly entering the hacker zone. It's a write-vs-buy decision at this point, as I refuse to give up on MariaDB. If Shawn can do it using Gearman and PHP, I can too. I simply don't want to go that route. |
| Comment by VAROQUI Stephane [ 2022-05-08 ] |
|
For Spider to work with concurrency, you need to tell it when the servers a partition points to are on localhost, and whether to query them concurrently; please refer to the documentation or the test cases for this. |
| Comment by Rick James [ 2022-05-08 ] |
|
I picture Spider as being mainly used to access separate, potentially remote, servers. I have trouble imagining that Spider provides any efficiency when used to run multiple threads on a single server. I expected this MDEV to discuss how to coordinate and optimize the multi-threading of a single SQL statement running on a single server. Am I wrong about Spider? Does it have special hooks to realize that the targets happen to be on the same server, and take advantage of that? |
| Comment by VAROQUI Stephane [ 2022-05-09 ] |
|
Yes, I used it many years ago, without using UDFs. Concurrency happens at the connection level, so you need to declare different servers that point to the same backend; if the backend is the same server as the Spider instance, you also need to set spider_same_server_link. |
| Comment by VAROQUI Stephane [ 2022-05-09 ] |
|
I'm sharing some parameters that are of interest for this case. Those are in my favorites, but I no longer have a clue whether they play a role here. Interested to get any feedback if they do. |
| Comment by VAROQUI Stephane [ 2022-05-09 ] |
|
When Kentoku was onboarded, he was supposed to work on an SQL rewrite plugin that would have enabled easy SQL syntax integration of such concurrent execution. |
| Comment by VAROQUI Stephane [ 2022-05-09 ] |
|
The first integration of this feature was possible only for queries using a single table, but follow-up work made it so that if the other tables in the query are federated to the same backend as the partition, it is possible to use parallel partition scan as well. |
| Comment by Larry Adams [ 2022-05-10 ] |
|
Stephane, I've been going through the documentation, and I don't think there is a really good write-up on how to run Spider queries in a true parallel map-reduce fashion (or how to force it to). I'll do more investigation in my home lab. Thus far, I have distributed a year's worth of tables (365) to 4 servers that are all separate from the server that will serve as the Spider server, so I've got a good setup. When I did my first test in a customer environment, as mentioned, it was purely serial, one server and one partition at a time (disappointing). But that environment used the Spider server as a data server as well, so I'm not sure if that had any impact. What I'm really trying to do I will attach as a screen grab. My plan will be to: 1) Set up 20 servers to 4 back-ends Larry |
| Comment by Larry Adams [ 2022-05-11 ] |
|
Stephane, with all those settings, zero performance difference. Each partition is handled serially. There is a document where Shiba-san claimed he had a patch that added parallelization, but he provided no instructions in the write-up on how to leverage it. I'm not sure what ever became of that, though. https://blog.mariadb.org/wp-content/uploads/2014/05/Spider_in_MariaDB_20140403.pdf Larry |