Fixpoint

2021-10-31

Muzzling MySQL into Gales shape, part 1: groundwork

Filed under: Gales Linux, JWRD, MySQL, News, Software — Jacob Welsh @ 05:10

MySQL, the "world's most popular open source database" as it'll have you know, is coming at last to Gales Linux, the original operating system distribution from JWRD Computing. This development will enable a variety of existing server and workstation applications to run on the system largely unmodified, from Web publishing to messaging and collaboration platforms to organizational record management to data analysis. MySQL will be joining BerkeleyDB and SQLite as supported Database Management System options in the GPorts collection.

While this has been in the works for a while, it's now pretty much there. As may be expected when porting a large piece of software,(i) however portable it may fancy itself, to a new environment while keeping an eye to quality assurance, smooth system integration and ease of administration, this took me a fair amount of doing; indeed it was quite possibly my most involved gport yet.(ii) I ended up using Git to wrangle the larger-than-usual number of patches required for MySQL and its multiple versions under consideration, though nobody needs that tool just to read the patches, build the result or propose new ones. The upstream version currently used as the basis is 5.6.45; this is not necessarily final but is unlikely to change much, for reasons we'll see in a bit.

To take a closer look at the journey, we'll start by returning to its first steps, which I took in July of 2020. My starting point was version 5.6.38, because it had been proven to work, after a fashion, in a prototype Cuntoo environment. This shared with Gales its most significant divergence from earlier Linux practice, namely the choice of the (mostly) tidy and readable musl over the obese and obtuse glibc as its standard C library. Beyond this, I expected some further minor but constructive difficulties from Gales's deliberate rejection of dynamically linked libraries (aka shared objects), its somewhat unconventional filesystem layout and its "opt-in" approach to library visibility.

Before even getting to that point, though, there was the fact that all of the even vaguely recent versions of MySQL demanded CMake as a configuration and build system. CMake is this beast that first came into my view when it was adopted by the KDE project, around the time the latter was going to the dogs (unbeknownst to my eager young self).(iii) I imagine it would like to think of itself as a sort of "autotools done right"; in reality it is nothing of the sort. While the GNU autotools may be ugly and in many cases unjustifiable, CMake is worse from every practical angle I've seen thus far; and even were it as good or a bit better, it would still be a crime against sysadmin-kind, because it's not compatible, nor even anywhere near compatible, so now we're stuck with two of the damned things. And yes, if there's two then there's infinity, and I expect the altcoin lineup has gone around the block a few times by now. But hey, apparently it has a Windows GUI, because it's 1995 and that's totally where the future is at. What an incredible time of progress to be alive!

While I've had good results in several codebases - at varying costs - replacing autoconf/automake/libtool tangles with plain Makefiles like the gods intended, the prospects of pulling that off here were slim, given the lesser comprehensibility of CMake, the complexity of MySQL and the lack of an initial working build to guide the process. So, not seeing any real alternative, I happily proceeded to an initial cmake gport. I already had a manual recipe for it (from some previous odds and ends that really should have just been using "make"), and the rest was fairly straightforward, leaving only two choice caveats:

# Uses bundled: expat, zlib, bzip2, liblzma, curl, jsoncpp, libarchive

See, the problem with "make" was that it didn't come with a builtin web browser. Much like the problem with the electric drill was that it didn't come with a builtin nuclear reactor.

check () {
	echo "skipping slow and failing test suite"
	#cd $P-$V
	#make test
}

But I'm sure these two items are entirely coincidental and unrelated. And just by the way, have I used up my sarcasm quota yet?

Now the "configuration" piece of any autotools setup, that is the "autoconf" part, works like this: there's a series of questions about what is present, or in what variant, in the target environment, usually in yes/no form, sometimes numeric or an enumerated choice. This much can be automated, either declaratively ("it says Linux on the tin so from this we assume xyz") or empirically ("guess and check" - compile a minimal program using the desired feature and see if it works).(iv) Then there's a similar series of questions about things that are fundamental enough that they must be decided in advance (at compile time) by the user, even if only by default: usually things like optional components to include, namespace matters and compiler paranoia levels. Traditionally these were discovered by running ./configure --help; and if you've used a unixlike system without having to do this, I dare say it's because you've had a sysadmin or distributor somewhere making the choices on your behalf.
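To make the empirical "guess and check" method concrete, here is a minimal sketch in the style of a configure probe, assuming a POSIX shell with mktemp and a C compiler reachable as $CC or cc. The function name check_backtrace is made up for illustration; real autoconf output is far more elaborate. It tries to compile and link a tiny program that calls backtrace(3), then reports the verdict as a config.h-style line.

```shell
#!/bin/sh
# Hypothetical autoconf-style "guess and check" probe (check_backtrace is an
# illustrative name, not a real autoconf macro): compile and link a tiny
# program that uses backtrace(3), and turn success or failure into a define.
check_backtrace() {
	tmp=$(mktemp -d) || return 1
	cat >"$tmp/conftest.c" <<'EOF'
#include <execinfo.h>
int main(void)
{
	void *frames[8];
	return backtrace(frames, 8) < 0;
}
EOF
	# Any failure (missing header, missing symbol, no compiler) counts as "no".
	if "${CC:-cc}" -o "$tmp/conftest" "$tmp/conftest.c" >/dev/null 2>&1; then
		echo "#define HAVE_BACKTRACE 1"
	else
		echo "/* #undef HAVE_BACKTRACE */"
	fi
	rm -rf "$tmp"
}

check_backtrace
```

On a glibc system this would typically print the #define line; against musl, which provides no <execinfo.h>, the compile fails and the symbol stays undefined. It also shows why such probes aren't airtight: a broken compiler setup reads as "feature absent" just the same.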

With CMake, that second part, i.e. the fiddling of user-facing knobs, is done through variables which can be set on the command line. As far as I've seen, it draws very little distinction between this sort of variable and the sort that's set internally, either by the project's CMake code (yes, it has a scripting language all of its own and it sucks) or by CMake itself, and the way to discover them is murky at best. So partly by reading docs and partly by trial and error I came up with about a dozen settings to adapt the build to the Gales way and disable various annoyances.(v)
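For readers who haven't met CMake, its user-facing knobs are cache variables passed as -D NAME=VALUE on the command line. The sketch below wraps such an invocation in a shell function. The three settings are real CMake or MySQL 5.6 variables chosen purely for illustration; they are not the dozen actually used for the gport, which aren't listed here. configure_mysql and the source path are hypothetical, and the CMAKE override exists only so the command can be dry-run with echo.

```shell
#!/bin/sh
# Illustrative only: these are NOT the actual settings used for the gport.
#   CMAKE_INSTALL_PREFIX   - standard CMake install location
#   CMAKE_VERBOSE_MAKEFILE - make the generated Makefiles echo each command
#   WITH_UNIT_TESTS        - MySQL option controlling its unit-test build
configure_mysql() {
	srcdir=$1
	# ${CMAKE:-cmake} is left unquoted on purpose so CMAKE=echo gives a dry run.
	${CMAKE:-cmake} "$srcdir" \
		-DCMAKE_INSTALL_PREFIX=/usr \
		-DCMAKE_VERBOSE_MAKEFILE=ON \
		-DWITH_UNIT_TESTS=OFF
}

# Dry run: prints the command line instead of running cmake.
CMAKE=echo configure_mysql ../mysql-5.6.45
```

Once a build directory has been configured, cmake -LAH lists the cached variables with their help strings, which is about as close as CMake gets to ./configure --help, though the listing mixes the project's options in with CMake's own. CMAKE_VERBOSE_MAKEFILE=ON (or running make VERBOSE=1 with the Makefile generator) brings back the command echoing that footnote (v) grumbles about.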

As expected, I observed the same build failure as Diana, having something to do with threads and backtrace code. If I got to the heart of the matter at the time, I'd quite forgotten it by now and had to dig it back up, but with the benefit of the full history in Git it's quite clear what's going on. At one point there was some code to detect at runtime which threading library was in use, because at the time the Linux people were struggling to figure out what this threading business was all about, and there was an older, half-assed LinuxThreads (LT) and a newer, pthreads-compliant Native POSIX Thread Library (NPTL). These days nobody uses LinuxThreads, and musl came onto the scene with an original implementation of the standard pthreads. How complete it is I can't quite say - the last time I delved into their mailing list and changelogs, there were all sorts of discussions about the merits and implementation considerations of exotic varieties of mutexes and the like that went over my head, even having dabbled in threaded programming - but in practice it seems quite solid enough for most applications.
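The runtime check described above can be imitated from the shell. This is not MySQL's removed C code, only an analogue resting on the assumption that glibc's getconf answers GNU_LIBPTHREAD_VERSION (the command-line face of confstr(_CS_GNU_LIBPTHREAD_VERSION)); detect_thread_lib is a hypothetical name. Elsewhere, such as on musl, I'd expect the query to fail or come back empty, which the sketch reports as "other".

```shell
#!/bin/sh
# Hypothetical analogue of a runtime thread-library probe. On current glibc,
# "getconf GNU_LIBPTHREAD_VERSION" prints something like "NPTL 2.31";
# LinuxThreads-era glibc is assumed to have reported a "linuxthreads-..." string.
detect_thread_lib() {
	version=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null) || version=
	case $version in
	NPTL*) echo NPTL ;;
	[Ll]inux[Tt]hreads*) echo LinuxThreads ;;
	*) echo other ;;   # musl, non-glibc systems, or no getconf at all
	esac
}

detect_thread_lib
```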

But it turns out the actual failure here has nothing to do with threads at all. The detection code in MySQL got removed at some point, but incompletely, leaving dangling references to a THD_LIB_LT constant and a thd_lib_detected variable. So how did this not fail loudly and unignorably before being injected into an unsuspecting world, you ask? The stale bits sit in an #elif defined(TARGET_OS_LINUX) preprocessor block that serves as a fallback when an earlier HAVE_BACKTRACE test (among others) comes up false. HAVE_BACKTRACE is the configure-time indicator for the availability of backtrace(3) and related functions, which, as the manual notes, are GNU (read: glibc) extensions; so on glibc the stale branch was never even compiled. Thus by using musl, which doesn't have all that backtrace jazz - as well it shouldn't, there already being perfectly serviceable and vastly more powerful debugging facilities around - we exposed some rot that had otherwise slipped under the radar.
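The masking effect can be reproduced in miniature. The sketch below, assuming a C compiler available as $CC or cc, writes a reduced file that mirrors only the shape of the preprocessor logic described above; print_stack and try_build are invented names, while THD_LIB_LT and thd_lib_detected are the dangling identifiers, deliberately left undeclared. Defining HAVE_BACKTRACE, as a glibc configure run would, hides the stale branch; leaving it undefined, as on musl, hands it to the compiler.

```shell
#!/bin/sh
# Reduced, hypothetical reproduction of the masking pattern described above.
# print_stack and try_build are made-up names; THD_LIB_LT and thd_lib_detected
# are the dangling identifiers from the post, deliberately left undeclared.
try_build() {
	tmp=$(mktemp -d) || return 1
	cat >"$tmp/stack.c" <<'EOF'
#if defined(HAVE_BACKTRACE)
void print_stack(void) { /* backtrace(3) path would go here */ }
#elif defined(TARGET_OS_LINUX)
void print_stack(void)
{
	/* stale leftovers of the removed thread-library detection */
	if (thd_lib_detected == THD_LIB_LT)
		return;
}
#else
void print_stack(void) { }
#endif
EOF
	# Compile only (no link); the flags passed in choose the branch.
	if "${CC:-cc}" "$@" -c -o "$tmp/stack.o" "$tmp/stack.c" >/dev/null 2>&1; then
		echo ok
	else
		echo FAIL
	fi
	rm -rf "$tmp"
}

try_build -DTARGET_OS_LINUX -DHAVE_BACKTRACE   # glibc-like config: ok with a working compiler
try_build -DTARGET_OS_LINUX                    # musl-like config: FAIL (undeclared identifiers)
```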

My objective being a more definitive "get MySQL on Gales" rather than just exploring whether it can be done, I set about expanding Diana's workaround(vi) into a proper cut of the offending nonsense. I'll go ahead and publish the result for the curious; here are the stats:

 b/client/mysqltest.cc                 |   46 --
 b/cmake/os/WindowsCache.cmake         |    4
 b/config.h.cmake                      |    6
 b/configure.cmake                     |   16
 b/include/my_stacktrace.h             |   41 -
 b/mysys/stacktrace.c                  |  700 ----------------------------------
 b/sql/CMakeLists.txt                  |    1
 b/sql/mysqld.cc                       |  123 -----
 b/storage/ndb/include/ndb_global.h    |    7
 b/storage/ndb/include/util/NdbTap.hpp |    8
 b/unittest/mytap/tap.c                |   44 --
 sql/signal_handler.cc                 |  241 -----------
 12 files changed, 2 insertions(+), 1235 deletions(-)

It amounts to a complete excision of 9 preprocessor defines, 8 functions and one whole source file. Only 14999-N to go!

So then it built fine and they all lived happily ever after, right??

Well, not quite... actually, we're only just getting started.

~ To be continued ~

  1. This one is large, at 3.5 GB for the full history packed in Git, or 30 MB for a single release containing over 15k files. Fortunately not all of this goes into the final product!
  2. The Gales packaging utilities involved held up just fine, as expected. They are, however, due for a user-facing switch to enable or disable execution of test suites when building packages; this would provide an alternative to my current awkward practice of commenting out the test invocation in those ports where it proves excessively slow or spuriously broken. This should be a simple change, as the build scripts were properly structured for it.
  3. No, that quip didn't come out of nowhere.
  4. Neither method is airtight; they can fail, for the same reason that no amount of confirmatory evidence can prove a theory correct while a single contradictory observation can unravel it. Still, in practice they can work quite well, especially when used sparingly for the rare and well-understood divergences in a field that's otherwise well standardized, such as C and unix systems today.
  5. Such as the part where CMake suppresses the normal display of build commands, so that you have no idea what's going on or what it's messing up when it gets underway. Red or green, happy or sad, :D or D:, that's the outer limit of comprehension the user is expected to attain and I suppose it's not even such an unreasonable default these days. Perhaps it's time I gave up being reasonable altogether.
  6. Rereading that in the context of further and ongoing struggles with the beast had me second- and third-guessing whether this was indeed my objective and whether it was sane; at least it helps to have others around with more than just the view from the front lines.
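The switch proposed in footnote 2 might look something like the following sketch. RUN_TESTS and run_check are hypothetical names, not existing Gales settings; the idea is that the packaging driver, rather than each port, decides whether to invoke the port's check step, so a port like the cmake one could keep its real test commands instead of commenting them out.

```shell
#!/bin/sh
# Hypothetical sketch: RUN_TESTS and run_check are illustrative names only.
# A port keeps its real test step, as in the cmake recipe with the comments
# removed; P and V are presumably the port's name and version, as there.
check () {
	cd "$P-$V" && make test
}

# The packaging driver decides whether that step runs at all.
run_check () {
	case ${RUN_TESTS:-yes} in
	no) echo "skipping test suite (RUN_TESTS=no)" ;;
	*) (check) ;;   # subshell so the port's cd doesn't leak into the driver
	esac
}

RUN_TESTS=no run_check
```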

5 Comments »

  1. [...] We left off where I'd cleared out a neglected broken appendage along with the body it came attached to, namely MySQL's internal fatal signal interception and backtracing code, on the theory that if it's that broken we're going to need a full-scale debugger anyway. It wouldn't be long at all before that played out, too, but meanwhile there were further wrinkles in just getting the code to compile. [...]

    Pingback by MySQL in Gales 2: bundles of joy « Fixpoint — 2021-11-06 @ 05:21

  2. > But hey, apparently it has a Windows GUI, because it's 1995 and that's totally where the future is at.

    ...

    > See, the problem with "make" was that it didn't come with a builtin web browser. Much like the problem with the electric drill was that it didn't come with a builtin nuclear reactor.

    Nuts.

    > Red or green, happy or sad, :D or D:, that's the outer limit of comprehension the user is expected to attain and I suppose it's not even such an unreasonable default these days. Perhaps it's time I gave up being reasonable altogether.

    Maybe here is where you exceeded my sarcasm quota, but fuck giving up being reasonable altogether, that's the edge we defend and sharpen.

    > Thus by using musl, which doesn't have all that backtrace jazz - as well it shouldn't, there already being perfectly serviceable and vastly more powerful debugging facilities around - we exposed some rot that had otherwise slipped under the radar.

    Nice.

    > So then it built fine and they all lived happily ever after, right??

    > Well, not quite... actually, we're only just getting started.

    I'm thankful you've the patience to work through this and write about it in an educational and enjoyable manner.

    Comment by Robinson Dorion — 2021-11-09 @ 17:12

  3. > fuck giving up being reasonable altogether, that's the edge we defend and sharpen.

    How did it go?

    The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

    Cheers.

    Comment by Jacob Welsh — 2021-11-09 @ 19:49

  4. > The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

    Touche TouShaw.

    Comment by Robinson Dorion — 2021-11-10 @ 13:43

  5. [...] third patch came more recently when I was craving a break from the MySQL writing and returns attention to the build system with a high-leverage change (as in, tiny change with big [...]

    Pingback by A tetrad of tuneups for bitcoind « Fixpoint — 2021-11-13 @ 15:54

Powered by MP-WP. Copyright Jacob Welsh.