Building R on CentOS 6

Filed under: Data, Software — Jacob Welsh @ 01:19

R, also known by its more searchable handle of R-project, introduces itself as "a language and environment for statistical computing and graphics." A free software project since its early days in 1995, it inherited its syntax from the earlier S language, and other aspects from Scheme such as proper functional programming with lexical scoping and garbage collection. Sounds cool enough to me!

I'd never worked with it before, but I remember hearing of it way back, listed as a peer among the most powerful statistics tools. Since then, it's kept popping up in enough of the right places that I figured it was at least worth a look, now that I have some data visualization needs that seem to exceed the capacities of my trusty old gnuplot, or at least what it can handle without some serious help.(i)

So I checked if it was available from the repositories for my CentOS 6 box:(ii) partly because it's my go-to for low-commitment software experimentation and partly because it's where the data is.(iii) It was not, so I set to downloading the latest version (4.2.1, now one before latest as I'd taken that first step back in June of 2022) and looked into the build process. It was a normal enough autotools build - you know, the 60`000 lines of configure generated from 3`000 lines of macro input in order to adapt it to every system under the Sun that nobody uses anymore and none of the systems that actual people use; the usual drill. After drinking all I could handle from the firehose of the "R Installation and Administration" manual and looking over the ./configure options, I determined that I'd simply give it a self-contained prefix and otherwise follow the defaults and let it do as it pleased:

$ tar xf R-4.2.1.tar.gz
$ cd R-4.2.1
$ ./configure --prefix=/opt/R

After a while it bombed out with a:

checking whether zlib supports suffices...(iv) configure: error: zlib library and headers are required

This being CentOS, I reflexively went for the yum install zlib-devel, finding it already installed. Wait, what?! So a second look revealed the preceding line where the actual problem was confessed:

checking if zlib version >= 1.2.5... no

Great, again with that nonsense except this time the system already had the relatively recent version 1.2.3; not recent enough, apparently, for the rigorous demands of the mighty R Project. How many times does a lousy compression library have to reinvent itself already?! And how was I supposed to have known about this requirement, prior to spending my own resources on discovering it fresh? Well, buried down in Appendix A, section A.1, sixth paragraph of that installation manual we find sure enough,

Installations of zlib (version 1.2.5 or later), libbz2 (version 1.0.6 or later: called bzip2-libs/bzip2-devel or libbz2-1.0/libbz2-dev by some Linux distributions) and liblzma version 5.0.3 or later are required.
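For anyone who'd rather check that zlib minimum up front than learn of it from configure, here's a rough sketch. The version_ge helper is my own hypothetical name, the header path is an assumption about where the distribution puts zlib.h, and the comparison leans on the -V option of GNU sort:

```shell
# Hypothetical helper, not part of R or CentOS: succeeds if version $1 >= $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read the version string from the installed zlib header.
# The path is an assumption; override it through ZLIB_HEADER if yours differs.
zlib_header=${ZLIB_HEADER:-/usr/include/zlib.h}
if [ -r "$zlib_header" ]; then
    have=$(sed -n 's/^#define ZLIB_VERSION "\(.*\)"/\1/p' "$zlib_header")
    if [ -n "$have" ] && version_ge "$have" 1.2.5; then
        echo "zlib $have: meets the 1.2.5 minimum"
    else
        echo "zlib ${have:-unknown}: below the 1.2.5 minimum"
    fi
fi
```

The same helper would serve for the bzip2 and xz minimums, given some way of reading their installed versions.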

So not only does a mildly dusty version of the ubiquitous compression library not suffice, one must also have three different such libraries each coming with its own specific demands. We were not amused, as the royal saying goes, but I persisted: not accepting that I'd give it a free surgical treatment just because I can, I searched back in the R version history for the latest that was still possible to build as-is. At least going by the latest point release of each (major.minor) branch, this worked out to version 3.2.5 dated April 2016. And was that because the compression code hadn't yet drunk the poison at that point? Not at all; rather, the equivalent passage at the time read:

If you have them installed (including the appropriate headers and of suitable versions), system versions of zlib (version 1.2.5 or later), libbz2 (version 1.0.6 or later: called bzip2-libs/bzip2-devel or libbz2-1.0/libbz2-dev by some Linux distributions) and PCRE (version 8.10 or later, preferably 8.32 or later) will be used, otherwise versions in the R sources will be compiled in. The external versions can be avoided by configure options --without-system-zlib, --without-system-bzlib and --without-system-pcre.

So I'd say it looks like they were still trying to maintain some semblance or at least appearances of responsibility for their dependencies. By the time 3.6 rolled around even this was too much to ask and in later versions we even get to see the "rationale" in the .texi source of that manual:

@c zlib 1.2.5 is from July 2010, bzip2 1.0.6 from Sept 2010
@c xz 5.0.3 is from May 2011

I therefore recommend that all their manuals, code commentary, identifier naming and message strings be translated to and henceforth maintained solely in Esperanto. Because newer is better, and English is this mongrel of a language that evolved into a recognizable form somewhere around the late middle ages whereas Esperanto is from the late 19th century, consistent and intelligently designed for that matter!

Speaking of manuals, after the main R build completed I noticed that the various formattings of the manuals had kinda quietly failed to build in the process:

make[1]: Entering directory `/home/user/src/R-3.2.5/doc/manual'
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-FAQ.html will be missing and linked from CRAN
creating doc/manual/version.texi
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-admin.html will be missing and linked from CRAN
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-data.html will be missing and linked from CRAN
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-exts.html will be missing and linked from CRAN
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-intro.html will be missing and linked from CRAN
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-ints.html will be missing and linked from CRAN
'texi2any' v5.1 or later needed to make HTML docs but missing on your system.
file R-lang.html will be missing and linked from CRAN

So what's this texi2any? It's also not in CentOS, but texinfo is, which includes makeinfo which ought to at least get me the .info files built. Come to find out upstream basically renamed the program, leaving the old makeinfo name for compatibility, which the geniuses of R chose to ditch. So yes, I can still build the manuals, and in HTML format too, it just takes a good whack upside the head to the recent substitutions. I'd also found my /opt/R/ prefixing to be unhelpful since most of its stuff goes into a whole dedicated lib64/R/ subtree anyway, and the parts that don't, such as executables and manuals, are better off going in the recognized search paths.

$ keksum R-3.2.5.tar.gz
f31acc07bc5460a2e8bf1ebdb14ecf8478dbb1b712e8fffb023832b7cee73c7bdc3fc1b198f36464fc7b58598186ea71d79c90f49d6376f5082ea8d023cef354 R-3.2.5.tar.gz
$ tar xf R-3.2.5.tar.gz
$ cd R-3.2.5
$ ./configure
$ make MAKEINFO=makeinfo
$ su
# make install INSTALL_INFO=install-info
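The MAKEINFO override above is hardcoded for my box. A sketch of picking whichever name is present might look like the following, where first_available is a hypothetical helper of my own and nothing is checked about the tool's version:

```shell
# Hypothetical helper: print the first of the given commands found in PATH.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Prefer the new name where it exists, fall back to the old one.
if MAKEINFO=$(first_available texi2any makeinfo); then
    echo "would run: make MAKEINFO=$MAKEINFO"
else
    echo "neither texi2any nor makeinfo found" >&2
fi
```

This only echoes the command it would run; whether an older makeinfo actually copes with a given manual is a separate question.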

I was then able to simply run 'R', getting a command prompt, then load up the fresh /usr/local/lib64/R/doc/html/index.html in browser and start working the introductory examples, happily spinning up data frames and throwing them into plots like a pro.

Yet while thinking about a writeup I had that one last niggling thought: I'd noticed that there was an R package available in EPEL: Extra Packages for Enterprise Linux, the sorta community maintained repository of extras boasting seamless compatibility with the stable RHEL and CentOS releases. Had I done a stupid thing by not simply going with the fruits of their past labors? Another hour or two later I had all 11 GB of the EPEL 6 sources added to my archives(v) alongside the full CentOS 6 collection, and my RPM packaging skills dusted off. So does it build?

$ rpmbuild --rebuild R-3.5.2-2.el6.src.rpm
Installing R-3.5.2-2.el6.src.rpm
warning: user mockbuild does not exist - using root
warning: group mock does not exist - using root
warning: user mockbuild does not exist - using root
warning: group mock does not exist - using root
[snipped a bunch more of those...]
error: Failed build dependencies:
        libmetalink-devel is needed by R-3.5.2-2.el6.x86_64
        libssh2-devel is needed by R-3.5.2-2.el6.x86_64
        stunnel is needed by R-3.5.2-2.el6.x86_64
        texinfo-tex is needed by R-3.5.2-2.el6.x86_64
        tcl-devel is needed by R-3.5.2-2.el6.x86_64
        tk-devel is needed by R-3.5.2-2.el6.x86_64
        pcre-devel is needed by R-3.5.2-2.el6.x86_64
        pcre2-devel is needed by R-3.5.2-2.el6.x86_64
        valgrind-devel is needed by R-3.5.2-2.el6.x86_64
        libtiff-devel is needed by R-3.5.2-2.el6.x86_64
        gcc-objc is needed by R-3.5.2-2.el6.x86_64
        xz-devel is needed by R-3.5.2-2.el6.x86_64
        libicu-devel is needed by R-3.5.2-2.el6.x86_64

Clearly the more correct introduction is that R is a language and environment for statistical computing, graphics, SSL tunneling, Macintosh programming and exotic Unicode acrobatics.
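Should anyone wish to chase those dependencies down, the package names can be pulled out of the rpmbuild error with a bit of awk. This is a sketch only: missing_deps is a hypothetical name, and the pattern assumes the "is needed by" wording shown above.

```shell
# Hypothetical helper: read rpmbuild output on stdin, print the missing package names.
missing_deps() {
    awk '/ is needed by / { print $1 }'
}

# Sample input copied from the error above; in practice, pipe in rpmbuild's stderr.
printf '%s\n' \
    'error: Failed build dependencies:' \
    '        libssh2-devel is needed by R-3.5.2-2.el6.x86_64' \
    '        xz-devel is needed by R-3.5.2-2.el6.x86_64' \
    | missing_deps
```

The resulting list could then be handed to yum install, assuming the packages exist in whatever repositories are configured.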

In any case, it seemed interesting that they had a newer version than mine working at least somewhere, while EPEL is designed not to introduce conflicting versions of base system libraries. Had they done a patch to fix the broken zlib code? I had to know! From R.spec we find these choice snippets:

# R really wants zlib 1.2.5, bzip2 1.0.6, xz 5.0.3, curl 7.28, and pcre 8.10+
# These are too new for RHEL 5/6. HACKITY HACK TIME.
%global zlibhack 0

%if 0%{?rhel} == 5
%global zlibhack 1

%if 0%{?rhel} == 6
%global zlibhack 1
%if %{texi2any}
# If we have texi2any 5.1+, we can generate the docs on the fly.
# If not, we're building for a very old target (RHEL 6 or older)
# In this case, we need to use pre-built manuals.
# NOTE: These need to be updated for every new version.
%if %{zlibhack}
%global zlibv 1.2.11
%global bzipv 1.0.6
%global xzv 5.2.4
%global pcrev 8.42
%global curlv 7.63.0
BuildRequires: glibc-devel
BuildRequires: groff
BuildRequires: krb5-libs
BuildRequires: krb5-devel
BuildRequires: libgssapi-devel
BuildRequires: libidn-devel
BuildRequires: libmetalink-devel
BuildRequires: libssh2-devel
BuildRequires: openldap
BuildRequires: openldap-devel
BuildRequires: openssl-devel
BuildRequires: openssh-clients
BuildRequires: openssh-server
BuildRequires: pkgconfig
BuildRequires: python
BuildRequires: stunnel
# If you're seeing this, I'm sorry. This is ugly.
# But short of updating RHEL 5/6 (which isn't happening), this is the best worst way to keep R working.
%if %{zlibhack}
pushd zlib-%{zlibv}
./configure --libdir=%{_libdir} --includedir=%{_includedir} --prefix=%{_prefix} --static
make %{?_smp_mflags} CFLAGS='%{optflags} -fpic -fPIC'
mkdir -p target
make DESTDIR=./target install

(and so on in that vein for bzip etc.)

%if %{texi2any}
    make MAKEINFO=texi2any info
# Well, this used to work, but now rhel 6 is too old and buggy.
# make MAKEINFO=makeinfo info
%if 0%{?zlibhack}
# Clean our shameful shame out of the files.
sed -i 's|-Wl,--whole-archive %{_builddir}/%{name}-%{version}/zlib-%{zlibv}/target%{_libdir}/libz.a %{_builddir}/%{name}-%{version}/bzip2-%{bzipv}/target%{_libdir}/libbz2.a %{_builddir}/%{name}-%{version}/xz-%{xzv}/target%{_libdir}/liblzma.a %{_builddir}/%{name}-%{version}/pcre-%{pcrev}/target%{_libdir}/libpcre.a %{_builddir}/%{name}-%{version}/curl-%{curlv}/target%{_libdir}/libcurl.a -Wl,--no-whole-archive -L%{_builddir}/%{name}-%{version}/curl-%{curlv}/target%{_libdir}/||g' %{buildroot}%{_libdir}/R/etc/Makeconf

(and a bunch more in that vein. And here I thought the Gales bootstrap recipe was hairy!)

In short, instead of hitting the mailing lists to point out to the intellectual idiots that their shit was broken and thereby either getting it fixed or at least burying the dead, they papered over the problems and ensured I'd still be stuck taking out everyone's garbage a good seven years later. Or if they did communicate, I couldn't find it, nor any sort of discussion about the zlibhack, much like on the R side regarding the breakage itself. Indeed I quite expect I'm the first person on the Internet who cared even enough to notice your shameful shame, much less bring it to light.

Repent today, before it's too late. Actually, for you it's too late already, odds are.

  i. There's surely more to that program than I've explored yet too, but apparently it can't even do histograms without my manually reinventing the binning process with due care to various complications. Lacking even such basics, I find it hard to see much promise in further exploration right now.
  ii. For any onlookers not tuned in to the minutiae of the branch of the general unwinding of civilization at work here, that was the last usable release of a major stable Linux distribution, running up primarily against the systemd fracture with assorted others coming later. It's younger than Windows 7, a system where the latest software packages by and large continue to work fine. There is no "upgrading" on the menu and if you don't support CentOS 6, it means you don't support Linux, simple as that.
  iii. I'm tempted to properly write "where the data are" but it seems quite a lost cause by now: the word came to signify a lump of data as a whole, de-emphasizing the separate points within it, and it's as if we lacked the word "forest" as distinct from "trees". How much trees is in your trees set? Are you failing to see the data for all the data? I'm sure I do sometimes.
  iv. D'oh, "suffixes" please! I can't promise it's correct Latin but "suffices" is a different English word entirely.
  v. No thanks to your "MirrorManager" distractions, Fedora Project, and yes I ended up rsyncing directly from your "master" mirror. Whatcha gonna do?


Powered by MP-WP. Copyright Jacob Welsh.