[MCOL-5] Build for Ubuntu 16.04 Created: 2016-05-02  Updated: 2016-08-23  Resolved: 2016-08-23

Status: Closed
Project: MariaDB ColumnStore
Component/s: Build
Affects Version/s: None
Fix Version/s: 1.0.2

Type: Task Priority: Major
Reporter: Dipti Joshi (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-70 systemd service support Closed
Sprint: 1.0.2-2

 Comments   
Comment by David Hill (Inactive) [ 2016-05-29 ]

The CentOS 6.6 build installs and works on Ubuntu, so the changes for community builds should allow an Ubuntu build to be done.

Comment by David Hill (Inactive) [ 2016-07-20 ]

Still working in a private repo called mcol-5.

1. Can successfully build a binary package for Ubuntu 16 and CentOS 7 on this repo.
2. Can successfully install and run regression on U 16 and C 7.
3. But it's failing to build on the CentOS 6.6 build server at this time, so I'm currently working on that.

Comment by David Hill (Inactive) [ 2016-08-02 ]

There is a major issue in the Ubuntu 16.04 builds. It's failing many of the test suites, and the main reason is that LDI (LOAD DATA INFILE) and cpimport mode 1 are failing.

Comment by David Hall (Inactive) [ 2016-08-08 ]

It turns out there are three separate issues involved. Any of them could fail on any OS, so it's been pure luck that they haven't before (or have they, and we just moved on?).

1) When setting up the cpimport command line for LDI, the command line is built as a string, then parsed into a std::vector<std::string>. (Why not put it in the vector to start?) The pointer returned by each string's c_str() is then stored in another vector to be passed as the exec function's argv. The issue is that the pointers were taken in the same loop that was adding the strings to the first vector, so whenever the vector had to re-allocate, all the pointers taken so far were invalidated. What happened to that memory next determined the behavior. If it was left unmolested (as is probable on CentOS), it worked. On Ubuntu, that memory was overwritten and it broke, because the values passed in argv had been corrupted. To fix it, the code was rewritten to finish filling the first vector before taking the pointers, so no re-allocation can happen after we get them.
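
A minimal C++ sketch of the fixed ordering, assuming a whitespace-separated command line with no quoting. splitArgs and buildArgv are hypothetical helpers, not the actual ColumnStore functions, and the cpimport arguments in the test are only illustrative. All strings go into the vector first; only then are pointers taken, so no later push_back can move them. A likely contributing factor (our reading, not stated in the ticket): GCC 5, the Ubuntu 16.04 compiler, switched std::string to a small-string-optimized layout that keeps short strings inside the string object itself, so a vector reallocation moves those characters too, while the older copy-on-write strings of the CentOS toolchains kept them on the heap.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a command line on whitespace (no quote handling).
std::vector<std::string> splitArgs(const std::string& cmdLine) {
    std::istringstream in(cmdLine);
    std::vector<std::string> args;
    for (std::string tok; in >> tok;) {
        args.push_back(tok);  // 'args' may reallocate here; no pointers exist yet
    }
    return args;
}

// Hypothetical helper: build an exec-style argv from a FINISHED vector.
// The buggy pattern took each c_str() right after its own push_back, so a
// later reallocation of the string vector could leave earlier pointers
// dangling. Here 'args' is complete before any pointer is taken, and the
// caller must keep it alive and unmodified while argv is in use.
std::vector<char*> buildArgv(const std::vector<std::string>& args) {
    assert(!args.empty() && "argv[0] (the program name) is required");
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for (const std::string& a : args) {
        // exec* takes char* const[] but does not write through these pointers.
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);  // exec* requires a null-terminated argv
    return argv;
}
```

A caller would keep args in scope until the exec call, for example execvp(argv[0], argv.data()).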

2) ExeMgr was crashing. Different versions of the boost library differ slightly in their implementations. In this case, a lock was being unlocked twice because of a typo: the line was supposed to re-lock the lock, but it called unlock. Later, when the lock was destroyed, it asserted because the count wasn't zero. Fixed the typo.
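
The ticket does not show the ExeMgr code, so this is only a sketch of the pattern, using std::unique_lock in place of the boost lock; stateMutex, sharedCounter, and both function names are hypothetical. The intended sequence is unlock, do the unlocked work, re-lock. With the typo, the second call is another unlock. std::unique_lock reports that immediately by throwing std::system_error, whereas in the boost build described above the failure only surfaced as an assertion when the lock was destroyed.

```cpp
#include <cassert>
#include <mutex>
#include <system_error>

std::mutex stateMutex;  // hypothetical mutex guarding sharedCounter
int sharedCounter = 0;  // hypothetical shared state

// Intended pattern: drop the lock around work that doesn't need it,
// then re-acquire it before touching shared state again.
void updateWithRelock() {
    std::unique_lock<std::mutex> lk(stateMutex);
    ++sharedCounter;
    lk.unlock();  // release while doing work that must not hold the lock
    // ... unlocked work ...
    lk.lock();    // the typo'd line called unlock() here instead
    assert(lk.owns_lock());
    ++sharedCounter;
}  // lk's destructor releases the mutex exactly once

// The typo'd sequence: a second unlock on a lock we no longer own.
// std::unique_lock throws std::system_error (operation_not_permitted).
bool doubleUnlockIsDetected() {
    std::unique_lock<std::mutex> lk(stateMutex);
    lk.unlock();
    try {
        lk.unlock();  // should have been lk.lock()
    } catch (const std::system_error& e) {
        return e.code() == std::errc::operation_not_permitted;
    }
    return false;
}
```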

3) PrimProc was crashing during test005, on a specific query involving a join. It could happen for any join, but the timing had to be right. PrimProc takes all the commands it gets from ExeMgr and loads them into various thread pools. The code running on any given thread can decide that something isn't ready yet and return -1 to the thread pool, which tells the pool to re-schedule the task and try it again later. During a join, a lock is maintained that is locked in one thread and unlocked in another, with some nasty logic trying to keep it consistent. A lock must be unlocked before it is destroyed, or it asserts and aborts. I don't think it does this check on CentOS. When the BATCH_PRIMITIVE_END_JOINER command is received, it often (always?) reschedules. This command must run before the destroy, because it unlocks the lock. Some time later, a BATCH_PRIMITIVE_DESTROY command arrives and destroys the object containing the lock, and thus the lock itself. Assert, abort. So I added code to the BATCH_PRIMITIVE_DESTROY handler to check a specific flag, joinDataReceived, which is set only by BATCH_PRIMITIVE_END_JOINER. If it isn't set, the destroy is rescheduled.
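
A single-threaded C++ sketch of the reschedule-until-ready fix. It models a thread pool as a queue that re-queues any task returning -1, as described above. Only joinDataReceived and the -1 convention come from the ticket; BatchPrimitive, lockHeld, runPool, and simulateJoinTeardown are hypothetical names. END_JOINER is made to reschedule once, so DESTROY comes up first and has to wait.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the object that owns the join lock.
struct BatchPrimitive {
    bool lockHeld = true;           // taken when the join started
    bool joinDataReceived = false;  // set only by END_JOINER
};

// A task returns 0 when finished, or -1 to ask the pool to run it again later.
using Task = std::function<int()>;

// Single-threaded stand-in for a PrimProc thread pool.
void runPool(std::deque<Task>& queue) {
    while (!queue.empty()) {
        Task task = std::move(queue.front());
        queue.pop_front();
        if (task() == -1) {
            queue.push_back(std::move(task));  // reschedule
        }
    }
}

std::vector<std::string> simulateJoinTeardown() {
    BatchPrimitive bp;
    std::vector<std::string> events;
    int endJoinerRuns = 0;
    std::deque<Task> queue;

    // BATCH_PRIMITIVE_END_JOINER: not ready on its first run, so it reschedules.
    queue.push_back([&]() -> int {
        if (endJoinerRuns++ == 0) {
            events.push_back("END_JOINER rescheduled");
            return -1;
        }
        bp.lockHeld = false;  // releases the join lock
        bp.joinDataReceived = true;
        events.push_back("END_JOINER done");
        return 0;
    });

    // BATCH_PRIMITIVE_DESTROY with the fix: wait until END_JOINER has run.
    queue.push_back([&]() -> int {
        if (!bp.joinDataReceived) {
            events.push_back("DESTROY rescheduled");
            return -1;
        }
        assert(!bp.lockHeld);  // destroying a held lock is what aborted
        events.push_back("DESTROY done");
        return 0;
    });

    runPool(queue);
    return events;
}
```

Removing the joinDataReceived check makes DESTROY run while lockHeld is still true, and the assert fires, mirroring the abort seen in test005. In a real multi-threaded pool the flag would also need synchronization, for example std::atomic<bool>.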

Comment by Dipti Joshi (Inactive) [ 2016-08-16 ]

@dhill Is this fixed now?

Comment by David Hill (Inactive) [ 2016-08-16 ]

There's no fix for the mysqld crash during the test211 run. Both D Hall and I have seen it on the CentOS systems as well.

Comment by Andrew Hutchings (Inactive) [ 2016-08-17 ]

Added a fix for D Hall's fix, which failed to compile on CentOS 7 and Ubuntu 16.04: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3

Comment by David Hill (Inactive) [ 2016-08-19 ]

Hey Andrew, can you review Hall's changes, since you already know the code from your fix for the compiler error?

Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ]

Review done. I've also tested these changes and haven't hit any related problems since.

Moving to Daniel for QA

Comment by Daniel Lee (Inactive) [ 2016-08-23 ]

Have been testing binary files (not deb packages) on 16.04.
There is a separate ticket for generating deb package.

Generated at Thu Feb 08 02:17:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.