[MCOL-383] R Programming In-Database: MariaDB ColumnStore ENGINE Created: 2016-10-31 Updated: 2022-11-05 Resolved: 2022-11-05 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | 1.0.4 |
| Fix Version/s: | Icebox |
| Type: | New Feature | Priority: | Major |
| Reporter: | Juan Telleria | Assignee: | Todd Stoffel (Inactive) |
| Resolution: | Won't Do | Votes: | 1 |
| Labels: | None | ||
| Environment: |
Windows, Linux. |
||
| Attachments: |
|
| Description |
|
I would like to suggest that MariaDB implements into its new Engine "ColumnStore" Built-In Database R Programming (Based on Microsoft Open R), similar to the one that |
| Comments |
| Comment by David Thompson (Inactive) [ 2016-10-31 ] |
|
Some details on sql servers offering: https://msdn.microsoft.com/en-us/library/mt604845.aspx Currently use of R requires a separate installation and use of the RMySQL package. |
| Comment by Juan Telleria [ 2017-03-04 ] |
|
¿Any news on this topic? I personally think that future of Databases goes somehow in this direction. In fact, SQL Server 2016 was considered to be one of the best database from 2016 (http://db-engines.com/en/blog_post/67), and one of the features it included was this. The point is to avoid having to move big amounts of data from MariaDB to Open R (With RStudio GUI), and instead analyze it in the database itself. |
| Comment by David Thompson (Inactive) [ 2017-03-04 ] |
|
Today you could deploy a single larger server system running both columnstore and R which would help reduce network round traffic time. To do this you would need to tune columnstores memory settings down to leave adequate memory for R. Going forward we are looking into adding lower level api's to the PM level for higher performance direct access. We are prioritizing the write path first but the intention of the read api would be to support higher performance access (but with much simpler query capabilities) from systems like R or Spark. We also plan to add a native regression function and support for custom aggregate functions that would allow you to develop other solutions. Probably the main benefit of the microsoft r / sql server support is the scale-r libraries which i believe are proprietary not open source. Do you have a specific use case in mind that does not currently work with columnstore and R installed either remotely or on a shared server? |
| Comment by Juan Telleria [ 2017-03-04 ] |
|
I did no thought of installing R and Columstore on the same server and adequate memory. But It's a brilliant idea! By doing so network traffic for data transmission can be avoided. In fact, it is possible to use RStudio Server (Open Source) to access directly to a remote R server connection. I think it might work for my purpose of alarm log analysis. I'll give it a try! Thank you |
| Comment by David Thompson (Inactive) [ 2017-03-05 ] |
|
Let me know how it goes! As a reminder here are the memory settings to play with to reduce the amount of memory columnstore will consume to ensure you have enough for R: |
| Comment by Juan Telleria [ 2017-05-31 ] |
|
My review on the topic: I installed MariaDB and "Microsft R Open" in the same machine, in order to perform Big Data Analytics, and what I can say about it is that:
Therefore, my recommendation are:
And some solutions could be:
And a disruptive innovation could be:
|
| Comment by David Thompson (Inactive) [ 2017-05-31 ] |
|
Thanks for your research. Would you be ok if i converted this to a knowledge base article? It looks like the RMySQL community is fairly active so if you have some specific test cases you can file with them maybe we can make it bettter? |
| Comment by David Thompson (Inactive) [ 2017-05-31 ] |
|
I checked with some developers internally and one suggestion was to try increasing the mariadb server variable net_write_timeout. The default is 60s but potentially larger queries could be inactive for longer. Also it's possible RMySQL has its own timeout but i can't find any reference to such a setting yet. |
| Comment by Juan Telleria [ 2017-06-02 ] |
|
Yes, but I would prefer to prepare a Word document and pass it to you through here so that you can check it before releasing anything in MariaDB Knowledge Base. I learned how to program in R recently by means of 2 excellent books concept-oriented:
Also, my main development GUI is RStudio. And in my personal PC, I use the general distribution of R (From R Project Foundation), but for a Production Environment I use "Microsoft R Open", which uses a fixed snapshot of CRAN. The best webpage for searching R related information is: And explanations on how to use RODBC can be found here: Kind regards, |
| Comment by Juan Telleria [ 2017-06-02 ] |
|
If I worked in MariaDB, I would also recompile RMySQL from scrach, and:
Thank you. |
| Comment by Juan Telleria [ 2017-06-05 ] |
|
There is a very interesting github project for using R formulas in Excel, which could also be extended to MariaDB: https://bert-toolkit.com/bert-quick-start Although I have not used it and I do not know how well it works. |
| Comment by Juan Telleria [ 2017-08-09 ] |
|
Just discovered a package in R (Based on RODBC) which allows IBM in-Database Analytics for R (ibmdbR): Porting such package to MariaDB is extremely simple, as it is all Open Source (GPL-v3). This would be really useful for performing Machine Learning with R using MariaDB as the background database. |
| Comment by Juan Telleria [ 2017-08-13 ] |
|
Dear Mr. Thompson, I attach to this issue, as you requested some time ago, a Word document containing some basic notions on "How-To" start using R Statistical Programming along with MariaDB. If appropriate, this document could be added to the MariaDB Knowledge Base. Although simple, hope it is useful. Kind regards, R Statistical Programming using MariaDB as the background database.docx |
| Comment by Juan Telleria [ 2017-08-14 ] |
|
Just added the article to the MariaDB Knowledge Base: R Statistical Programming using MariaDB as the background database Kind regards, |
| Comment by David Thompson (Inactive) [ 2017-08-15 ] |
|
Awesome - many thanks and looks great! I'll hopefully have some time to run through an example later this week. |
| Comment by Juan Telleria [ 2019-06-09 ] |
|
Current Microsoft Implementation I think is no adequate for a Database Level. I feel that the integration between R with MariaDB shall be understood only as Scheduled Events, or Triggers, that are able to call from within MariaDB Server certain RMarkdown Documents or RScripts. You could directly contact RStudio Company for collaborating with them in this sense. |
| Comment by Todd Stoffel (Inactive) [ 2022-11-05 ] |
|
Item is out of date. Closing due to inactivity. If you feel this was done in error please open a new ticket. |