Fixpoint

2020-04-28

Build system overhaul for bitcoind

Filed under: Bitcoin, Historia, Software — Jacob Welsh @ 21:19

Background

The Real Bitcoin's build system for some years has consisted at the top level of a number of GNU Makefiles and a thing called "Rotor", building on an earlier "Stator". According to its 2015 introduction by Stanislav Datskovskiy, it served to compile the "bitcoind" executable deterministically and with full static linking, given a reasonable starting environment. The key to this magic was "buildroot", essentially a miniature, non-self-hosting Linux distribution designed for cross-compiling embedded systems.

The determinism came from capturing a full set of dependencies all the way down to the compiler. This came at the cost of adding considerable complexity to the process, as what was formerly a mere application took on all the potential problems involved in bootstrapping an operating system from an unpredictable environment, in addition to the already intricate build systems of the required libraries Boost, Berkeley DB (BDB) and OpenSSL. In an early sign of trouble, Michael Trinque found it wouldn't build BDB on his system without some CPU-architecture-specific configuration. In my own experience, I got it to work once, but when demonstrating to some friends on fairly similar Gentoo systems I'd built for them, it failed in multiple different ways. Ultimately I couldn't be bothered to track them all down, partly because of how unbearably slow it was: to try any change you would have to repeat the whole toolchain bootstrap.

The V cryptographic source code management system was introduced, with Bitcoin as its first user, shortly after Rotor; somehow the Rotor scripts and patches didn't end up in the V-tree proper, meaning that they, in addition to the library and toolchain sources, had to be rounded up in order to do offline (i.e. reliable) builds or study the code.

Finally, having already taken on the publication of a Linux distribution with similar static linking and deterministic bootstrapping goals, but going further in providing a self-sufficient system with native compilers, I had little desire to be stuck maintaining two different such beasts.

The vpatch

Thus I now present bitcoin_system_compiler.vpatch (with seals on the shelf). Building on my previous raw transactions patch, it:

  • Rewrites the Makefiles almost entirely. This includes the "makefile.unix" inherited from earlier developers, greatly simplifying it and eliminating historical baggage such as dynamic linking tweaks, Ubuntu bug workarounds, linking of "libssl" and way too much "sed" magic. Additions include compiler warning flags (resulting in quite a bit of warning spew, some of which might be interesting) and building "test_bitcoin" by default. Automatic header dependency analysis is preserved.
  • Removes some minor GNUisms like "tar xvfz" in order to work on BusyBox systems.
  • Brings "openssl-004-musl-termios.patch" formerly found in the external rotor sources into the tree.
  • Adds "boost-no-demangler.patch", discussed below.
  • Removes the various "src/obj" directories and moves all compiler output to the "build" directory. (For instance, this makes it easier to "grep" or "diff" the code without tripping on binary files.)
  • Avoids copying the "bitcoind" binary all around and removes the "bin" subdirectory: one place is enough.
  • Corrects the oversight that a library build failure would be ignored on a second "make" invocation because the mere extracted directories were used as the targets for dependency calculation.
  • Fixes parallel "make" by forcibly serializing the recursion into OpenSSL's ever-so-special custom build system.
  • Avoids recursing into "deps" on a top-level "make clean" so that dependency tarballs won't need to be re-downloaded. (Ultimately these need to get cleaned up and imported directly to the tree.)
  • Tweaks the BDB configuration to prevent "libtool" from attempting to build unwanted shared libraries.
  • Tweaks the Boost "bjam" invocation (the "compression" module gets built at install time if suppressed only at build time) and removes some "|| true" constructs that caused failures to be ignored.

It still supports the "make ONLINE=1" mode to download out-of-tree dependencies into the "deps" subdirectory from deedbot; these are reduced to the three essentials (Boost, BDB, OpenSSL).

In short, it makes both development and deployment much less painful, with a sane starting system as the price of admission.

The undemangling

Special mention is in order for the new boost-no-demangler.patch. Gales Linux uses an older branch of GCC that didn't receive fixes when the long-existing "stack clashing" (archived) family of attacks was stirred up in 2017, meaning some applications could end up vulnerable, especially those using the hazardous yet popular "alloca" or variable-length array features.
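
To make the hazard concrete, here is a minimal sketch of the alloca/VLA pattern in question; it is mine, not taken from any of the code under discussion. A stack allocation sized by input can, with a large enough size, step clean over the stack guard page, so the eventual write lands in some other mapping instead of faulting.

#include <cstring>
#include <cstddef>

void copy_packet(const char *data, size_t len)
{
    char buf[len];              // variable-length array: stack growth chosen by the caller
    std::memcpy(buf, data, len);
    // With a sufficiently large len, buf begins beyond the guard page, so the
    // overflow is never caught; that is the "stack clash" scenario in miniature.
}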

As a first step in investigating this, I enabled a number of warnings by default in the GCC configuration relating to excessive or dynamic stack frame size. While these warnings produce many false positives, they've done nicely to illuminate some suspicious code, such as binutils/libiberty/cp-demangle.c. Got that all read? Me neither... but so what, that "libiberty" is just an internal part of the toolchain, or so say the docs, right? And it's "well-known" that you don't want to feed untrusted input to the linker and friends. But wait: the GCC build system copies that code into libstdc++; from there it gets linked into C++ programs. This happens even if the program doesn't use the nonstandard "__cxa_demangle" extension it provides, by way of the default exception handler ("terminate called after throwing an instance of std::whatever").
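
A trivial program is enough to see the demangler pulled in without being asked for; the message in quotes above comes from libstdc++'s verbose terminate handler, which demangles the type name of the uncaught exception. The example is mine, purely for illustration.

#include <stdexcept>

int main()
{
    // No demangling is requested anywhere here, yet a statically linked build
    // still carries the demangler, because the default terminate handler uses
    // it to print the exception's type name:
    //
    //   terminate called after throwing an instance of 'std::runtime_error'
    //     what():  boom
    throw std::runtime_error("boom");
}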

So my gcc-4.7.4-demangler-amputation.patch, included in the Gales toolchain, removes __cxa_demangle along with the copying of cp-demangle.c, and simplifies the termination function to print the raw symbol name, as it previously did anyway for the case of demangler failure (hah! - we learn that they knew their code doesn't always work). These names can be fed to "c++filt" to decode them manually if need be. It then turns out that Boost contains a couple of uses of __cxa_demangle - none of them in components actually needed by bitcoind, harrumph - so the Boost patch simply cuts out the assumption that GNU compilers support it.

Stats

$ diffstat bitcoin_system_compiler.vpatch
 .gitignore                          |   21 ----
 Makefile                            |   21 ----
 bin/Makefile                        |   13 --
 bin/Manifest.sha512                 |    1
 build/Makefile                      |   97 +++++++++++++--------
 build/Makefile.rotor                |   56 ------------
 deps/Makefile                       |  166 +-----------------------------------
 deps/Manifest.sha512                |   17 ---
 deps/boost-no-demangler.patch       |   49 ++++++++++
 deps/openssl-004-musl-termios.patch |   46 +++++++++
 manifest                            |    1
 src/makefile.unix                   |  145 -------------------------------
 src/obj-test/.gitignore             |    2
 src/obj/.gitignore                  |    2
 src/obj/nogui/.gitignore            |    2
 src/obj/test/.gitignore             |    2
 verify.mk                           |    5 -
 17 files changed, 168 insertions(+), 478 deletions(-)

Looking only at the "make" code, 471 lines across seven files are reduced to 112 lines across three: quite the improvement I should think!

Future directions

Some work that sorely needs doing, as suggested earlier, is getting those external libraries under control, through some combination of pruning their code to just the necessary parts, replacing their build systems, and importing to the tree, or changing bitcoind code to eliminate them.

2020-04-22

New and improved raw transaction support for bitcoind

Filed under: Bitcoin, Software — Jacob Welsh @ 05:25

Previously on Fixpoint, I introduced my implementation of the getrawtransaction command and reviewed an existing one for sendrawtransaction. There were some more and less serious problems and stylistic considerations outstanding. To recap, for the "get":

  1. Not a proper vpatch.
  2. Loose parsing, ignoring trailing non-hex or excess characters in the txid argument.
  3. Directly accessing the mapTransactions object without acquiring the cs_mapTransactions lock.
  4. Unnecessarily exposing the mapTransactions object globally.

And for the "send", continuing the numbering from the list above (with some points not previously noted, indicated by *, for my teeth grow sharper with the passage of time):

  5. Needing a regrind to build on other work from the intervening years.
  6. Redundant inclusion of CDataStream constructor defaults SER_NETWORK and VERSION.
  7. Loose parsing, ignoring excess data in the hex argument after the first decoded transaction.
  8. Refusal to re-broadcast transactions already in the node's memory pool.
  9. Returning -5 for already-seen transactions, a code otherwise used only for invalid arguments or nonexistent objects. *
  10. New TransactionSeen function, clumsy in both usage and implementation.
  11. Supposed "missing inputs" condition that can't occur and wouldn't distinguish error codes if it did.
  12. Application of anti-spam checks by way of calling AcceptToMemoryPool with fCheckInputs=true.
  13. No reason provided for some rejection cases. *
  14. "txHash" breaking the conventional Hungarian notation, suggesting it's a CTransaction instance. *

Now meet bitcoin_rawtx_get_send.vpatch.

It addresses points 1, 3, 4, 5, 6, 8, 9, 10, 11, and 14. The risky exposure of mapTransactions is replaced with specific accessor functions that take care of locking. Each of these also turns out to simplify other code in main.cpp: always a nice sign that you're onto a good interface.
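
I haven't reproduced the patch text here, but the shape of such an accessor is simple enough to sketch. A minimal illustration follows, assuming it sits in main.cpp next to mapTransactions and its lock; the names are mine, not necessarily those in the vpatch, and CRITICAL_BLOCK is the lock-scoping macro the codebase already uses.

bool MempoolContains(const uint256& hash)
{
    CRITICAL_BLOCK(cs_mapTransactions)
        return mapTransactions.count(hash) != 0;
    return false; // not reached: the block above always runs exactly once
}

bool MempoolLookup(const uint256& hash, CTransaction& txOut)
{
    CRITICAL_BLOCK(cs_mapTransactions)
    {
        std::map<uint256, CTransaction>::iterator it = mapTransactions.find(hash);
        if (it == mapTransactions.end())
            return false;
        txOut = it->second;
        return true;
    }
    return false; // not reached
}

With something of this shape in place, callers ask a question ("is this txid in the pool?") rather than reaching into the map and remembering to take the lock themselves.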

#2 I've discussed previously and don't see as a problem with this patch.

#7 I've let slide; please speak up if you think it's important.

#12 I've preserved deliberately, on the theory that it's better to reject transactions likely to be rejected by peers so the sender has a chance to determine the cause and fix it. This strikes me as a weak argument though; perhaps better would be to have a way to force it if necessary.

#13 is tricky because AcceptToMemoryPool is a monster of a function that could fail for a number of reasons, and doesn't make that reason available to the caller. It does at least write to the debug log so I've made the error message provide that referral.
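
In hypothetical form (neither the error code nor the wording below is taken from the vpatch, and the surrounding handler is abbreviated), the referral amounts to something like:

CTransaction tx;
ssData >> tx;                       // decoded from the hex argument
if (!tx.AcceptToMemoryPool(true))   // fCheckInputs=true applies the usual checks
    throw JSONRPCError(-22, "transaction rejected; see debug.log for the reason");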

The two previous patches added up to +99/-2 lines; the new unified one to +85/-10.

Further, I've now made the whole Real Bitcoin patch set and known seals available on my own reference code shelves.(i) I went through them all afresh to identify those on which I was comfortable adding my own signature; for the rest I used my "jfw_unchecked" subordinate key (both keys are available through my contact page).

On that shelf you may also notice a novel bitcoin_system_compiler.vpatch. It amounts to a full overhaul of the build system and will be the subject of an upcoming article.

  1. At some point I intend to put a formal, efficient, GNU bloatware free sync mechanism in place. For now, the best I can suggest to grab the whole collection in one shot is:

    wget --mirror --no-parent http://fixpoint.welshcomputing.com/v/

    [^]

2020-04-11

Preliminary report on the bitcoind hex conversion mess

Filed under: Bitcoin, Politikos, Software — Jacob Welsh @ 16:46

As I previously noted about my getrawtransaction patch,

It does accept invalid hex strings, perhaps a flaw in that SetHex method.

Indeed, SetHex in the base_uint class is completely permissive: it accepts leading whitespace, optional "0x" or "0X", then however many hex digits it finds, up to the length of the template specialization (160 or 256 bits), zero-filling if too short and ignoring excess if too long. It also includes some pointer arithmetic I believe has technically undefined behavior: comparing after decrementing past the start of the object. And it's one source of the obnoxious byte order reversal in bitcoind hash I/O, with GetHex being its counterpart.
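
Restated as code, the description above means every one of these calls "succeeds"; this is my own illustration, written only from the behavior just described, with the tree's uint256.h providing the class.

#include <string>
#include "uint256.h"

int main()
{
    uint256 h;
    h.SetHex("   0xdeadbeef");                  // leading whitespace and 0x prefix: accepted
    h.SetHex("ff");                             // too short: zero-filled out to 256 bits
    h.SetHex(std::string(100, 'a').c_str());    // too long: digits past 256 bits ignored
    h.SetHex("zzzz");                           // no hex digits at all: the value comes out zero
    return 0;
}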

The users of the function are the RPC commands listsinceblock and gettransaction. There's another SetHex function in CBigNum, similar but taking arbitrary length, entirely unused.

On the other hand, ParseHex, the subject of mod6_phexdigit_fix.vpatch, is oriented toward "dumps" in that it allows space between digit pairs and doesn't reverse byte order. It's used in some tests, in the hard-coded genesis block output in main.cpp, in the mining RPC commands getwork and getmemorypool, and in sendrawtransaction as seen in polarbeard_add_sendrawtransaction.vpatch.

My conclusion is that getrawtransaction is doing the right thing here, in that it shouldn't be made a special case, but the hex parsing should be cleaned up generally. If it were just me, I'd rip it all out and use a single, strict, non-reversing hex decoder. But there's no telling how much outside code or data has built up around the old rules. Parties interested in the matter, in the sense of having a meaningful amount of coin on the line, are encouraged to write in.(i)
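
For contrast, the sort of single strict decoder meant here is a small thing; the following sketch is an illustration only, not part of any patch, and fails on anything but an exact run of hex digits of the expected length, in the order given.

#include <string>
#include <vector>

static int HexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode exactly n bytes from 2*n hex digits: no whitespace, no "0x" prefix,
// no trailing junk, no byte order reversal.
bool DecodeHexStrict(const std::string& in, std::vector<unsigned char>& out, size_t n)
{
    if (in.size() != 2 * n)
        return false;
    out.resize(n);
    for (size_t i = 0; i < n; i++)
    {
        int hi = HexDigitValue(in[2 * i]);
        int lo = HexDigitValue(in[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return true;
}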

  1. Not that I expect this to do much on its own, as politics involves actively going out to find people where they are and talk to them. [^]

2020-04-10

The missing Adacore public download index, vintage 2018, while it lasts

Filed under: Historia, Software — Jacob Welsh @ 21:16

I usually browse the web with JavaScript disabled, if present at all: it's bad for your computer and it's bad for your mind. That it's bad for your computer should be clear if you've dug at all into browser security in the past, say, 15 years. Failing that, consider the intended behavior of the thing: to allow any page to consume unlimited processing and network resources even once loaded. It's as if the mailman, upon completion of a delivery, strolls right into your house and helps himself to whatever is in the fridge. As for why it's bad for your mind, one aspect is that it leads you to believe that a page "works" or contains some piece of information when it in fact does not, like a mirage, or the screensavers of the cathode-ray era, falling away as soon as you reach out to touch it.(i)

Being thus grounded in reality has its difficulties, such as when those around you mistakenly believe a certain link points to something other than a blank page, even expecting you to have an opinion on it. Worse is when you actually want the promised item. On these occasions you can of course choose to let in the hungry mailman; sometimes he'll even give you helpful instructions on how to do so (as if you'd need them, having barred the door in the first place). But sometimes, a little effort can reveal the secrets of the code and allow you to possess the thing for yourself. Some might call it "reverse engineering", but isn't that a strong term for what's really just reading? It's reading some text that was foisted on you, albeit not in the manner the foisters would like for it to be read.

So it went the last time I was looking into the Ada programming language, in 2018. GNAT, the foremost public implementation, as I understand, was mostly developed as a frontend for the GNU Compiler Collection at US taxpayer expense by a company presently known as Adacore.(ii) But their public download site demanded use of a JavaScript menu tree to filter options by platform and other categories, and offered no readily accessible listing of files or links otherwise.

A survey of the page source didn't directly reveal the hidden list, but turned up a number of scripts by reference. Skipping over some well-known "framework" wads left a promising "script.js", which straightaway pointed to a JSON "feed" containing a sizable collection of file metadata. A search in the code for how it turned this into URLs, and then some custom coding to do the same in a controlled context, was all that remained to produce a usable index. This done, I downloaded a couple GNAT binary releases for x86_64 Linux(iii) and added them to my archives.

Sadly, at the time I had neither a blog on which to boast of my exploits nor the kind of social engagement to motivate it. A year and change went by and I rectified that, but I didn't substantially revisit the Ada bootstrapping process until now, and had forgotten all about the indexing work.

Through looking into the publications of the now disbanded Republic and asking around, I found a series of recipes, notes, more notes, and a partial collection of dependencies, but little assurance that I'd obtained all necessary ingredients. In particular, I realized that the initial binary required to bootstrap the process had not been nailed down, but reports indicated the 2016 version was known to work. After the back-and-forth over what pieces someone might have on hand, I decided what I really wanted was access to the full collection.

In the unsurprising heathen manner, Adacore had broken their download links, apparently in the course of a move to Amazon's Cloudfront delivery network. Looking for the new locations, I was again greeted by the non-index page, felt the deja vu, and unearthed my old work. It appeared the "feed" format was unchanged and the new URL format was easily constructed from the old data. More surprisingly, the feed URL itself still pointed to the vanished "mirrors.cdn.adacore.com" hostname; as far as I could see, downloads wouldn't be working even with full JS. Behold the incredible bandwidth savings realized by moving to the Cloud! On the bright side, my existing SHA1 checksums provide some assurance that the historical files have not been diddled since 2018.

The code

Despite the tree-structured JSON format, the feed is essentially tabular. We'll use Python so as to accurately parse the JSON and convert to a more manageable comma-separated format (taking care that the separator characters don't occur in the values themselves).

import json

# http://mirrors.cdn.adacore.com/gpl_feed
rels = json.load(open('gpl_feed.json'))['feed']['releases']

fields = (
	'name',
	'id',
	'size',
	'sha1',
	'type',
	'client',
	'component',
	'date',
	'display_name',
	'display_order',
	'kind',
	'platform',
	'platform_display_name',
	'platform_display_order',
	'release_date',
	'release_name',
	'title',
)

# Flatten each release record into a row of stringified field values.
table = [[str(obj[f]) for f in fields] for obj in rels]
table.sort()

# Confirm the separator and newline never occur in the data itself.
for rec in table:
	for val in rec:
		assert ',' not in val and '\n' not in val

print ','.join(fields)
for rec in table:
	print ','.join(rec)

The output includes column headers and can be processed by any number of standard tools. How about a shell one-liner, reading that CSV on standard input, to produce the full URL listing?

awk -F, 'NR>1 { print "https://community.download.adacore.com/v1/" $4 "?filename=" $1 }' | uniq

Note that downloads are not uniquely identified by filename!

The data

And no, I don't intend to go mirroring the whole 21 GB of it, though if you do, I'll gladly link it here. Once I'm more clear on what parts are truly needed, I'll probably host those.

Enjoy!

  1. Terms pertaining to this effect, ranging from technical to marketing, include XHR, AJAX, and Web 2.0. [^]
  2. At some point the code was imported by the main GCC project, but for reasons I haven't yet ascertained, that version was considered broken for the purposes of The Most Serene Republic, so Adacore remained the source of record while efforts were made for the Republic to take over that role. [^]
  3. Namely, one dated 2007, being the oldest available, and 2014, for reasons I forget but perhaps because the next one came with a precipitous size increase. [^]

2020-04-07

Selection and other sundries for MP-WP

Filed under: MP-WP, Software — Jacob Welsh @ 04:49

Here are four patches to the V-tree for Mircea Popescu's Wordpress, which amount to an approximation of the changes I've been running since initial setup of Fixpoint, reground to build upon subsequent improvements by billymg and Diana Coman.

I plan to sign them shortly unless cause for revision comes to light (such as for example server-side selection being implemented outside the theme-specific code).

Quoth the manifest:

624752 mp-wp_remove-textselectionjs-pop3-etc jfw Remove the unreliable JS-based selection, posting by POP3 login, and a stray .php.orig file. Neutralize and comment the example pingback updater.
624752 mp-wp_svg-screenshots-and-errorreporting jfw Allow .svg extensions in theme screenshot search. Don't clobber the user's errorreporting level without WP_DEBUG.
624752 mp-wp_serverside-selection jfw Add server-side text selection to example htaccess and themes, roughly as seen in http://trilema.com/2019/proper-html-linking-the-crisis-the-solution-the-resolution-conclusion/ (xmlrpc.php changes not included)
624752 mp-wp_footnote-link-tweaks jfw Avoid turning double-quotes into backquotes in footnote tooltips. Expand the link part of footnote identifiers to cover the pre_identifier and post_identifier strings, for a larger clickable area.

Update: I've reground the first based on billymg's feedback to prune the wrapper "span" tags originating from mp-wp_add-footnotes-and-textselectionjs.vpatch, and the second trivially to follow the changed antecedent. As I had previously signed these, I've bumped the canonical patch names to avoid confusion. The latter two patches remain in a draft state. Current patches and seals:

2020-04-02

V in Perl with parsing fix, keksum, and starter, plus the ill-fated vdiff

Filed under: Software, V — Jacob Welsh @ 17:50

Following my prior adventures, I reoriented my efforts toward some simpler changes to the v.pl tree, abandoning hopes of a robust patch creation tool built on Busybox diff.

I've split the changes into two patches. The first is "v_strict_headers", which I think would be of interest to any v.pl user. It tightens vpatch parsing to prevent false-positive header matches that could cause incorrect or nonsensical antecedent information to be extracted from valid vpatches. Following the precedent of the vtools vpatch program, this is done by requiring the string "diff " at the start of a line preceding the header, which works because all other lines of a diff "packet"(i) start with either @, +, -, or space characters. This patch also backfills the manifest file and brings it fully in line with the spec.
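
For illustration, the headers in question have roughly this shape (paths and hashes here are invented and abbreviated):

diff -uNr a/bitcoin/src/main.cpp b/bitcoin/src/main.cpp
--- a/bitcoin/src/main.cpp 57aa1a9f...
+++ b/bitcoin/src/main.cpp c3be0025...
@@ -812,7 +812,7 @@

The stricter parser refuses to treat a ---/+++ pair as a header unless a "diff " line comes first, and nothing inside a hunk can imitate that, since hunk lines begin only with @, +, - or a space.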

The second patch, "v_keksum_busybox", swaps keksum and patch in for ksum and vpatch, making V presses possible again on systems with little more than a C toolchain, Busybox utilities and Perl.

I have also mirrored the rest of the VTree and contributed my own seals, which can be found in the same directory.

For deployment on systems with no previous V, there's a starter tarball which includes the tree pressed to v_keksum_busybox, the keksum code, and an install script. Take a look at what it does, then run as root, from the extracted directory:

# sh install.sh

Download

The ill-fated vdiff

What follows is my abandoned attempt at vdiff in awk, supporting any conforming diff program. It identifies headers using a three-state machine to recognize the ---, +++, @@ sequence. This would still be fooled by a ---, +++ sequence followed immediately by another hunk, except that the lines of context prevent this, unless the change comes at the end of the file, in which case there can't be another hunk prior to the next file header.

It works as far as parsing both GNU and Busybox diff output, produces working vpatches in the GNU case, and could even be expanded to do the same for Busybox. But since fully-reproducible output seems to be desirable, I can't presently justify further work in this direction or recommend it over the vtools vdiff.

#!/bin/sh
export LC_COLLATE=C
diff -uNr "$1" "$2" | awk -v sq=\' '
function shell_quote(s) {
	gsub(sq, sq "\\" sq sq, s);
	return sq s sq;
}

function vhash(path) {
	if (path == "/dev/null") return "false";
	qpath = shell_quote(path);
	cmd = "test -e " qpath " && keksum -s256 -l512 -- " qpath;
	gotline = cmd | getline rec;
	close(cmd);
	if (!gotline) return "false";
	split(rec, parts);
	return parts[1];
}

function print_header(line) {
	split(line, parts);
	print parts[1], parts[2], vhash(parts[2]);
}

{
	if (state == 0) {
		if ($0 ~ /^---/) {
			from = $0;
			state = 1;
		}
		else {
			print;
		}
	}
	else if (state == 1) {
		if ($0 ~ /^\+\+\+/) {
			to = $0;
			state = 2;
		}
		else if ($0 ~ /^---/) {
			print from;
			from = $0;
		}
		else {
			print from;
			print;
			state = 0;
		}
	}
	else if (state == 2) {
		if ($0 ~ /^@@/) {
			print_header(from);
			print_header(to);
			print;
			state = 0;
		}
		else if ($0 ~ /^---/) {
			print from;
			print to;
			from = $0;
			state = 1;
		}
		else {
			print from;
			print to;
			print;
			state = 0;
		}
	}
}

END {
	if (state == 1) {
		print from;
	}
	else if (state == 2) {
		print from;
		print to;
	}
}'

  1. Or what else do you call the header and sequence of hunks associated with a single file? [^]

Powered by MP-WP. Copyright Jacob Welsh.