Fixpoint

2019-12-10

A visit to the Oriental Republic of Uruguay, part 1

Filed under: Politikos, Vita — Jacob Welsh @ 18:22

Having bought some of the remains of the historic but sadly liquidated Bitcoin firms No Such lAbs and Pizarro ISP, and with expected overseas shipping costs being comparable to a personal courier run, I seized the opportunity for some travel and networking. It's been a success on all three fronts: retrieving the gear, getting a taste of Montevideo, and meeting and spending some quality time with Aaron Rogier aka BingoBoingo, whom I'd previously known mainly as the humorously grandiose voice of Qntra and a thoughtful contributor to IRC discussions; I found him to demonstrate the same insight in person and be quite likable besides. I made it a three-night stay to allow one full day for the hauling and packing and one for tourism.(i)

My biggest mess-up of the trip, as I see it, was not allowing enough time for my initial departure from the "Hub of the Americas", Panama's recently expanded Tocumen International Airport - for which I'm starting to develop a hearty loathing - or for getting there; my previous departures here had been either in the wee hours or from locations with better toll road access. I got stuck in the check-in cattle queue for the better part of an hour.(ii) By the time my turn came, I was informed that due to late check-in my bags would be subject to "voluntary separation" and might end up on the next flight. Since apparently I couldn't find out whether they made it until arrival, I worried and contemplated my options on the flight. At security, while I was not subjected to the gate-side mandatory gropings reserved for the US-bound, there were still US-inspired theatrics like shoe removal and inspecting my carry-on for liquids, confiscating my over-100ml sunscreen. Serves me right for being such a terrorist, huh.

Things went much smoother from there; immigration in Montevideo was a breeze at least for chip-enabled passport holders, there were no kilometers to walk to the airport's one baggage claim, and my bags had made it just fine. Having been warned about the pricey airport taxi service, I elected to wait for a shuttle, which departed once the next flight had dumped enough passengers to form a group. On exiting the airport (around 2am local time) I was welcomed by the delightfully cool, spring-like air: always a nice thing after months in the tropics, though my skin and nose didn't adjust to the dryness too well.

All the travel intel Aaron had given that I had a chance to verify proved accurate, and the Punta Trouville hotel he recommended was the perfect fit for my needs: budget but clean, functional, well located and with 24-hour service. Power outlets and money proved easier than anticipated. The hotel had multi-format outlets; it's just as well I came prepared with adapters, as Aaron said those can be flaky, though they worked for me. While there are cambios all over for changing currency with around 4% spread, I never ended up needing one as the airport transit and the merchants I tried were all equipped and even glad to take my specie (well, USD) and give change in pesos Uruguashos; the local currency sees the sort of inflation that gets automatically priced into yearly contracts.

To be continued (and with photos).

  1. Not ideal for really getting to know a place, but I already had a longer holiday coming up and lots to get done before it. [^]
  2. The "web check-in" line turned out to move faster; I can't see any good reason as it doesn't save much time at the counter: you still need to get docs checked, bags weighed and tagged, and any overage paid. The main reason as far as I could tell was simply that they'd allocated more agents there and didn't rebalance until the line was entirely exhausted. [^]

2019-12-05

Basic getrawtransaction patch proposal for bitcoind

Filed under: Bitcoin, Software — Jacob Welsh @ 17:35

I present a bitcoind patch, of my own authorship, for consideration: a basic but functional getrawtransaction RPC command. I expect the need for it is clear: if it's your node then you shouldn't accept any sort of "I'm sorry Dave, I'm afraid I can't do that" especially regarding the data it already has handy.

To speak plainly about some deficits, this time not my own: the Real Bitcoin project presently rests on some shaky foundations. Despite apparently intricate efforts, a Keccak-based vtree has seen little progress, in that the original patch signers besides mod6 have not signed on to the regrinds of their work, and some recommended patches have not been reground at all, updated for the very necessary manifest spec, or included in the mainline collection. Further, the sorry state of the inherited code, such as magic numbers everywhere, has seen little improvement. Perhaps I will have to take up some of these burdens in time; for now I'll leave it as an entreaty to the elder hands to please find a way to make it happen!

The patch

As in the original introduced somewhere around PRB 0.7.0, this command takes a transaction ID (256 bits of hex), searches first the node's own memory pool then the confirmed transaction database (blkindex.dat) and returns a hex dump of the encoded transaction. Unlike the original it does not support a "verbose" option to give a JSON representation of the data. This task seems to me better suited to an external tool, but I could see including it here if the implementation is concise and obviously correct.

Backporting the original was not possible due to the many intervening changes, though I did consult it to confirm I hadn't missed anything important and matched its numeric error code.

Based on the overall idea in the V version control system of building yggdrasil, I'm breaking from one of the project's prior naming conventions by including "bitcoin" but not the author in the patch name; the author is still recorded in the enclosed manifest file. Due to the problems noted earlier with the prior patch tree it's also not a proper vpatch yet.

Download: bitcoin_getrawtransaction.draft.patch.

To try this you will need to:

  1. Perform a press to asciilifeform_whogaveblox.vpatch;
  2. Manually apply mod6_phexdigit_fix.vpatch (which could be missed otherwise due to lacking a manifest entry);
  3. Manually apply the patch in question.

In detail

I added a manifest entry for the phexdigit fix, to make its inclusion explicit:

--- a/bitcoin/manifest.txt
+++ b/bitcoin/manifest.txt
@@ -28,3 +28,5 @@
 542413 asciilifeform_aggressive_pushgetblocks asciilifeform Issue PushGetBlocks command to any peer that issues 'version' command
 542413 mod6_excise_hash_truncation mod6 Regrind of ben_vulpes original; removes truncation of hashes printed to TRB log file
 543661 asciilifeform_whogaveblox asciilifeform Record the origin of every incoming candidate block (whether accepted or rejected)
+606789 mod6_phexdigit_fix mod6 Fix decoding LUT which wrongly accepted certain invalid characters as hex
+606789 bitcoin_getrawtransaction jfw Add RPC to get transactions from memory pool or database (hex only)

The RPC itself is about as simple as it gets in this codebase. First we try the mempool, as this should be fast and may contain unconfirmed transactions.(i)

--- a/bitcoin/src/bitcoinrpc.cpp
+++ b/bitcoin/src/bitcoinrpc.cpp
@@ -1351,7 +1351,31 @@
     return entry;
 }

+Value getrawtransaction(const Array& params, bool fHelp)
+{
+    if (fHelp || params.size() != 1)
+        throw runtime_error(
+            "getrawtransaction <txid>\n"
+            "Get hex serialization of <txid> from memory pool or database.");

+    uint256 hash;
+    map<uint256, CTransaction>::iterator it;
+    CTransaction tx;
+    CDataStream ssTx;
+
+    hash.SetHex(params[0].get_str());
+    it = mapTransactions.find(hash);
+    if (it != mapTransactions.end())
+        tx = it->second;

Functions everywhere have to open their own database connections - though some have the luxury of getting one passed in - which then implies a whole caching mechanism so as not to be horribly inefficient. Odin knows why there couldn't just be one global (or at least per-thread) "This Is The Database; Use It" object.

+    else {
+        CTxDB txdb("r");
+        if (!txdb.ReadDiskTx(hash, tx))
+            throw JSONRPCError(-5, "Transaction not found in memory pool or database.");
+    }
+    ssTx << tx;
+    return HexStr(ssTx.begin(), ssTx.end());
+}
+
 Value backupwallet(const Array& params, bool fHelp)
 {
     if (fHelp || params.size() != 1)

Wiring the function into the RPC dispatch table (I don't recall how I chose where to insert it, as the list was already non-alphabetical; probably based on where it seemed sensible in the help listing):

@@ -1865,6 +1889,7 @@
     make_pair("getreceivedbyaccount",   &getreceivedbyaccount),
     make_pair("listreceivedbyaddress",  &listreceivedbyaddress),
     make_pair("listreceivedbyaccount",  &listreceivedbyaccount),
+    make_pair("getrawtransaction",      &getrawtransaction),
     make_pair("backupwallet",           &backupwallet),
     make_pair("keypoolrefill",          &keypoolrefill),
     make_pair("walletpassphrase",       &walletpassphrase),

The mempool object now needs to be visible between compilation units. I suggest doing a grep to verify this introduces no name conflicts.

--- a/bitcoin/src/main.cpp
+++ b/bitcoin/src/main.cpp
@@ -26,7 +26,7 @@

 CCriticalSection cs_main;

-static map<uint256, CTransaction> mapTransactions;
+map<uint256, CTransaction> mapTransactions;
 CCriticalSection cs_mapTransactions;
 unsigned int nTransactionsUpdated = 0;
 map<COutPoint, CInPoint> mapNextTx;
--- a/bitcoin/src/main.h
+++ b/bitcoin/src/main.h
@@ -46,6 +46,7 @@

 extern CCriticalSection cs_main;
+extern std::map<uint256, CTransaction> mapTransactions;
 extern std::map<uint256, CBlockIndex*> mapBlockIndex;
 extern uint256 hashGenesisBlock;
 extern CBlockIndex* pindexGenesisBlock;

I tested that it builds, successfully fetches transactions from both mempool and database, and returns the expected errors for missing argument or transaction not found. It does accept invalid hex strings, perhaps a flaw in that SetHex method. I've been running the patch in production since around August 10th of this year.
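Until SetHex is looked at, the argument could be pre-checked before it gets that far. A minimal sketch in plain C rather than the codebase's C++, assuming a txid should be exactly 64 hex digits with no prefix or whitespace; the helper name is hypothetical and not part of the patch:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical pre-check, not part of the patch: returns 1 if s consists of
 * exactly 64 hex digits (256 bits), 0 otherwise. */
int is_txid_hex(const char *s)
{
    size_t i;

    for (i = 0; i < 64; i++) {
        /* isxdigit wants a value representable as unsigned char; it also
         * returns 0 for the terminating NUL, which catches short strings
         * before anything past the end is read. */
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    }
    return s[64] == '\0';   /* reject anything longer than 64 digits */
}
```

A string failing this check would then be refused with an RPC error instead of being handed to hash.SetHex.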

  1. The original cause for my writing the patch was a stuck transaction that wasn't getting relayed to miners or block explorers for unknown reasons. Upon fishing the raw tx from the mempool and submitting it to one such site, a useful error was finally obtained identifying the problem as the S-value normalization mess; the -lows option provided a workaround, after double-spending to self for safety, which was a whole other pain. [^]

2019-12-04

keksum, a Keccak implementation in C as standalone Unix utility: genesis

Filed under: Software — Jacob Welsh @ 17:36

I produced a Keccak implementation in May 2019, through about one week of intensive study and hacking. It builds on some techniques and routines I'd been developing for small, self-contained C programs on Unix, whereby the standard I/O library is thrown out in favor of a minimalistic interface fitting the needs of the program, requiring for portability only the system call wrappers, which generally have direct translations to assembly language. The approach ensures that system errors are detected without requiring effort by calling code and leaves no uncertainty regarding signal behavior, flushing of buffers or integer overflow. The resulting binary here (musl/amd64, static, unstripped) weighs in at 19K and contains not so much as a "malloc".
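To give a flavor of that style, here is a sketch (not the code from keksum; the name write_all is made up for illustration) of an output routine built directly on the write system call wrapper, handling short writes and signal interruption explicitly:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd, retrying after
 * short writes and EINTR. Returns 0 on success, -1 on any other error. */
int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;      /* real error; errno says which */
        }
        buf += n;           /* advance past what was actually written */
        len -= (size_t)n;
    }
    return 0;
}
```

Returning a status keeps the sketch easy to exercise; a routine meant to spare callers from checking at all would presumably report the error and exit on the spot instead.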

Regarding the permutation and sponge construction themselves, I found the Keccak reference quite comprehensible; the only topics I needed to brush up on were Linear Feedback Shift Registers and the polynomial algebra used to describe them.

Having previously written what ended up as possibly the world's slowest hash functions in an interpreted language, I was itching to make something fast for once: not pouring vast efforts into optimization but at least avoiding obvious slowdowns. I wish I could say I got it all right on the first try, or that I identified all mistakes through careful re-reading, but not quite:

jfw: well my keccak proggy hashes; now to look for some test vectors
jfw: sure enough I botched it:
jfw: mathematically, (x-1) mod y is equivalent to (x+y-1) mod y
jfw: (pretty much the definition of mod...)
jfw: but if x is an unsigned type -- which I made it, because it's used as an array index and those better be nonnegative -- the subtraction wraps *mod 2*
jfw: er, mod power-of-two
jfw: the arrays in question are indexed mod 5, which being coprime to any power of 2, gives a decidedly different result from if x were a signed or mathematical integer.
jfw: (in another part of the code I had in fact anticipated this.)

The arithmetic in question sure looked correct on the surface, so I tracked this down by adding fine-grained test probes to compare each step of the permutation against provided data.(i) Further contemplation led me to see that the mod-5 operations were entirely invariant in the permutation's input and thus amenable to a lookup table without introducing a timing side-channel.
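The pitfall and both ways around it fit in a few lines. This is a sketch with made-up names, not an excerpt from keksum:

```c
/* The botched form: with unsigned x == 0, x - 1 wraps around to UINT_MAX
 * before the modulo is taken, so the result is UINT_MAX % 5, not 4. */
unsigned prev_mod5_wrong(unsigned x) { return (x - 1) % 5; }

/* Adding the modulus first keeps the intermediate value from wrapping. */
unsigned prev_mod5(unsigned x) { return (x + 5 - 1) % 5; }

/* Table form: these indices depend only on loop counters, never on the data
 * being hashed, so the lookups leak nothing through timing. */
static const unsigned char prev5[5] = {4, 0, 1, 2, 3};
static const unsigned char next5[5] = {1, 2, 3, 4, 0};
```

Where unsigned is 32 bits wide, prev_mod5_wrong(0) comes out as 0, since 2^32 - 1 happens to be divisible by 5.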

I did a full re-read a month after writing; the only mistake found was a comment with off-by-one range description.

Capacity and output length parameters are user-selectable; I chose a default of 512 bits for both, as explained in the code:

Sponge capacity is an upper bound on security level because the permutation is readily inverted if its full output is known. Beyond that, its relationship to actual security is not clear to me (or perhaps anyone); some margin of safety seems prudent. In the FIPS202 parameters, capacity under 512 is seen only in "SHAKE128" and none of the "SHA3" fixed-width hashes. The EuCrypt default is 256 (bitrate 1344); I have not found a discussion of this choice.

I should have piped up and asked about this at the time; though I was still a WoT non-entity living in the shadows, it happens that blog comments are one of the easier ways to get started in the grand conversation, and certainly if you have something interesting to say or ask.

The past being what it is, I will ask now: do any mathematical minds in the readership have input on what constitutes a "good" choice of capacity and why? If as I'd been thinking it's more than 256 to be at least as secure as SHA3, it would seem to suggest a need to regrind the existing Keccak vpatches or otherwise deal with a multiplicity of standards.
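For reference, the arithmetic tying capacity to bitrate for the 1600-bit permutation is simple enough to sketch; the function name and the zero-on-error convention are made up here:

```c
enum { KECCAK_WIDTH = 1600 };   /* permutation width in bits */

/* Rate in bits for a given capacity in bits. Returns 0 for a capacity that
 * leaves no room for input (an error convention invented for this sketch). */
unsigned rate_bits(unsigned capacity)
{
    if (capacity == 0 || capacity >= KECCAK_WIDTH)
        return 0;
    return KECCAK_WIDTH - capacity;
}
```

So rate_bits(512) is 1088 bits, or 136 bytes absorbed per call of the permutation, against 1344 bits (168 bytes) at capacity 256: about 19% less input per call is the price of the larger capacity.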

A compiler with 64-bit integer support is required; there is no dependence on machine byte order. The little-endian convention is used for interpreting bytes as the bit strings the permutation is defined on. I believe the code to be timing-invariant with respect to potentially secret bits, with the exception of output hex encoding. This would be good to fix. The other main deficiency I see is lack of a working "-c" option to verify provided hashes.
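One common way to get that independence from machine byte order (a sketch with hypothetical helper names, not necessarily how keksum does it) is to assemble each 64-bit lane with shifts instead of casting the byte buffer:

```c
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit value: b[0] is least significant. */
uint64_t load64_le(const unsigned char *b)
{
    uint64_t x = 0;
    int i;

    for (i = 7; i >= 0; i--)
        x = (x << 8) | b[i];
    return x;
}

/* Write a 64-bit value as 8 little-endian bytes. */
void store64_le(unsigned char *b, uint64_t x)
{
    int i;

    for (i = 0; i < 8; i++) {
        b[i] = (unsigned char)(x & 0xff);
        x >>= 8;
    }
}
```

Since only arithmetic on values is involved, the result is the same on big- and little-endian hosts.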

Finally, some basic performance numbers, from a modest Core 2 @2.26GHz, 3072K cache:

$ dd if=/dev/zero bs=1048576 count=100 | keksum
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 2.656227 seconds, 37.6MB/s
7a37879dcc634482775f4c1ebf294a0a02c9bcf924222cf6d2fe1c6beca3574debfe73f9034b868deefd7bbad4f5c251333bc3c735f1a82de045c7980814a2c2

$ dd if=/dev/urandom bs=1048576 count=100 > rand
$ dd if=rand bs=1048576 count=100 | keksum
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 2.648968 seconds, 37.8MB/s
ec4eab1cff9198e1ef23d663cc9da505eaeda713168bf31ed7807ec63fb1ddfa3454aea212fab193b071ad98e2cc47b5d004a4d1737d23ae8804978995ae1b7a

Download vpatches

  1. The spiffy modular types in Ada would have prevented this mistake entirely, though not without some cost either in runtime performance or compiler complexity (and thus potential bugs...), I would imagine. [^]

2019-12-03

Keccak background

Filed under: Bitcoin, Historia, Software — Jacob Welsh @ 18:52

"Keccak" is a cryptographic hash function, or rather, some primitives for constructing such functions in a desired size and shape, of relatively recent design as these things go. It was brought to the attention of the forum in early 2016 in the context of contemplating changes to the Bitcoin protocol,(i) (ii) (iii) and subsequently differentiated from SHA3.(iv)

Compared to the prevailing standards at the time - mostly variants on the MD4 concept, processing blocks of input through an iterated compression function - Keccak is based on a large pseudorandom permutation (1600 bits, though the spec also defines smaller variants). As this is readily invertible, the desired "one-way" property is provided by a "sponge construction", mixing in blocks of input and extracting output while iterating the permutation and keeping some number of its bits secret as internal state. This number is called the capacity (or by complement the rate, the two summing to the permutation bit width) and can be tuned for the desired balance of security and computational intensity. The construction can take unlimited input, or produce unlimited output as a kind of stream cipher.(v)

I started out in 2017 playing with a C implementation found in the wild, supposedly a "readable and compact" version written by the original team. With some cleanup I got it into a state that could be described as compact, but I couldn't get very far in reading it, at least without having first digested the spec. And it had the unfortunate limitation of requiring the full input and output to exist in memory, no streaming. My confidence as an applied cryptographer was growing and I soon implemented a number of classical hash functions, but set Keccak aside as not being an immediate necessity. Meanwhile, Diana Coman produced and incrementally published a very nice and documented reference implementation in Ada, which was adopted for use in V and soon became non-optional.

While I was well convinced by the Republican rationale for Ada, I was much less keen on introducing GNAT, the flagship implementation, into my environment. It was a million-plus-line-of-code beast that I wouldn't stand a chance to ever really understand; making matters worse, it was a "Thompsonism", a circular dependency requiring existing binaries in order to build from source and thus dubiously "open source" at all. While I already depended on one such thing - the C compiler - I was hoping to somehow keep this to ONE thing, or at least ensure a way to work with the crucial V on existing machines without pulling all this in.

Stay tuned for the result.


  1. mircea_popescu: actually i wouldn't go to war over keccak.
    mircea_popescu: letting bitfury & friends eat 100mn in unrecoupable engineering costs would provide exactly the correct lesson as to what it's a good idea to say and when it's a good idea to shut the fuck up and toe the line.

    [^]

  2. The necessary prerequisite for any change to the Bitcoin protocol [^]

  3. mircea_popescu: http://log.bitcoin-assets.com/?date=01-02-2016#1393026 << at least it wasn;t fucking developed by teh nsa.
    assbot: Logged on 01-02-2016 19:29:18; ascii_butugychag: ;;later tell mircea_popescu in what sense is adoptinc keccak a rejection of usg standards? it was actually adopted as sha3...
    mircea_popescu: as far as we know. whatevs. minor point.
    ascii_butugychag: btw between that thread and now i went and read the keccak spec
    ascii_butugychag: it is mighty spiffy.
    ascii_butugychag: accordionizes to size.
    mircea_popescu: :)
    mircea_popescu: i don't need to explain what i meant by not finite then ?
    ascii_butugychag: aha.
    ascii_butugychag: other hashes also accept infinite bits but they eat where they shit.
    mircea_popescu: quite.
    mircea_popescu: and mind that while in no means do i propose this is "Asic resistant", from a designer perspective you must appreciate i'm giving you a fun job to do.
    mircea_popescu: at least therer's that.
    mircea_popescu: always make sure everyone's having fun.
    ascii_butugychag: quite! nobody will be plagiarizing old verilog from fpga docs to bake this one.
    ascii_butugychag: very asian-resistant.
    ascii_butugychag: which is a mega-plus.

    [^]


  4. asciilifeform: holyshit the original keccak www is gone
    asciilifeform: replaced with some horrorshow
    asciilifeform: ada code -- gone
    asciilifeform: fortunately still on my hdd
    asciilifeform: check this out, keccak.noekeon.org now forwards to buncha tards
    asciilifeform: https://archive.is/GkmgU < original
    shinohai: Notice that happened after nist.gov declared their spec
    asciilifeform: shinohai: not immediately , iirc was still intact last yr
    asciilifeform: incidentally shinohai keccak != usg.sha3
    asciilifeform: they adopted ~particular params~ of keccak as the new national whatever
    asciilifeform: orig is ~family~ of functions.
    asciilifeform: see also https://archive.is/lViVh << since 'unhappened' article
    asciilifeform: ' The SHA-3 version of Keccak being proposed appears to provide essentially the same level of security guarantees as SHA-2, its predecessor. If we are going to develop a next generation hash, there certainly should be standardized versions that provide a higher security level than the older hash functions! NIST, in the original call for submissions, specifically asked for four versions in each submission, with at least two that would
    asciilifeform: be stronger than what was currently available, so it's hard to understand this post-competition weakening.'
    asciilifeform: didjaknow.
    asciilifeform: notice how 'everyone' nao thinks 'oh, keccak? that's called sha3 nao' [^]
  5. Since state is still finite, output will of course repeat eventually; one would hope this cycle length approaches that order of 2^1600. [^]

2019-12-02

Mapping context and scope for a Keccak writeup

Filed under: Software, Writing — Jacob Welsh @ 19:42

I intended to briefly introduce and present my implementation of the Keccak family of cryptographic hash functions today. A relatively small and self-contained piece of code, doing a well-defined thing, solving a well-understood problem... should be easy, right? Or so I thought, for about two minutes.

As I considered what context would be necessary even for readers already educated in "wtf are hash functions?", it soon blew up into a whole web of connected topics: some already written up elsewhere, some clear enough in my head to write up now, some needing a refresher, and some requiring further research.

There'd be history of Keccak itself; history of its interest to the forum, including why not SHA3; the seeming lack of a spec in place of SHA3 for "last hundred yards" questions like byte order and parameter selection; the degree to which the original paper does or doesn't help with those questions; history of interest in the Ada programming language in the forum; how I wanted to embrace it but was averse to accepting GNAT; approaches I made to that problem; an attempt I made to tidy up an older Keccak implementation; deciding to do my own in C; the level I aim to work at in that language regarding support libraries and why; my experience of reading the Keccak paper and filling in knowledge gaps; my observations in comparing it with SHA3; how I approached the parameter question and what remains open there; the process of implementing, mistakes made and tracking them down. Then could come presentation of the code itself, noting requirements, tested platforms and observed performance.

Perhaps I've now exaggerated the problem a wee tad; surely something sensible could be produced with a scope in between "everything that could possibly be said" and "here's some code: good luck". Certainly the debts of my past writing avoidance are making themselves felt. I hope I've at least better illuminated for myself a path out of the pit and provided a hint of things to come. Onwards!

2019-12-01

Gales Linux initial release

Filed under: Software — Jacob Welsh @ 21:05

Today I am pleased to present, at long last, an initial public release of my Gales Linux project.

The goods

To get started you may wish to examine the files:

  • BUILD, a lengthy recipe for bootstrapping the system from source.
  • INSTALL, listing the steps to install the result on a machine. Presently quite terse; some experience with manual Linux installation such as in Gentoo will likely be needed as well as some clarifications.
  • PORTS, explaining how to get started working with gports and turn the rather minimal base system into something more comfortable.
  • gports/gales-util/gales-mirror-sync, with which you can download the much larger collection of sources referenced from this repository, both base and ports.(i) (ii)

Mirror sync is designed to work with minimal dependencies: you should be able to copy the script from the repository and run it on an existing Unix-like system, setting the DST variable to a fresh download directory and possibly replacing the "wget" call with "curl" or your preferred download tool. HTML scraping is neither required nor supported. If you'd rather not use my script you could parse the manifest yourself. (Just don't go putting something that blindly re-downloads the whole thing in a cron job, m'kay?) The hashes in the manifest act as a first pass of error detection in storage and transit, while canonical ones are in the signed repository.

The full mirror presently weighs around 440MB. The size distribution is quite skewed; here are the top 10 files by size:

$ awk '{print $2,$1}' Manifest.sha512 | sort -nr | head
93192404 linux-4.9.tar.xz
82935453 gcc-4.7.4.tar.bz2
76156220 gcc-6.4.0.tar.xz
22887305 db-4.8.30.tar.gz
22716802 binutils-2.24.tar.bz2
12495628 Python-2.7.13.tar.xz
12465748 php-5.6.34.tar.xz
11961692 perl-5.26.0.tar.xz
11570420 perl-5.24.2.tar.xz
9377313 bash-4.4.tar.gz

You'll need to supply the Linux kernel patch of your choice, or the whole thing if you need something other than 4.9.

The ports collection consists of:

acpica autoconf automake bash bc bison bzip2 cl-hyperspec clisp dash db flex gales-util gcc64 git gnupg less libevent libressl libusb links m4 man-pages man-pages-posix mandoc ncurses nginx ocaml openssh patch pciutils perl php56 py-setuptools python python-docs qmail readline redis sbcl sqlite sqlite-doc tmux ucspi-tcp vim xz zlib

Note that some of these may be trimmed down or configured in more minimal ways than you're accustomed to, either due to compatibility problems with musl or static linking or as a general precaution. I'm not opposed to re-enabling features upon reasonable demonstration of safety, or you can certainly edit the builds to suit your own needs. Things I consider high priorities for porting effort include:

apache emacs gdb ghc gnat hdparm iptables mysql sdparm shred smartctl strace texinfo X11

Obligatory disclaimer: Like any operating system targeting the "modern" hardware and software cocktail, Gales Linux contains large amounts of toxic and hazardous materials both known and unknown. While I have striven to make prudent and security-conscious choices, I am not attempting to keep up with the "penetrate and patch" rat-race in its many third-party components. Be careful out there.

Next steps

As my goal for now has been to release what I have without too much further fiddling, a few obstacles remain for a proper V genesis of the repository.(iii) What I've noted so far are some executable scripts in the repository that would either need to be invoked differently or somehow "chmod"ed before use, and one patch (to bin86) that includes a control character in its context lines. There is also some clutter for potential removal: a number of old-style "context diffs" that I reground to unified format for BusyBox compatibility while keeping the originals alongside (e.g. in the bash and vim ports).

  1. I've fixed the previously noted subdirectory flaw. [^]
  2. Present mirror IP: 198.199.79.106 welshcomputing.com [^]
  3. If such a direct conversion is even a sensible way to go; for instance, including externally maintained items as tarballs by hash reference is not in keeping with V principles, but taking ownership of the whole mess will be a larger project. [^]

2019-11-29

Introducing Gales Linux, a cross-bootstrapped, do-it-yourself, fully-static, discriminatory distribution

Filed under: Software — Jacob Welsh @ 20:57

Motto: Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.(i)

Gales Linux is a new operating system distribution, navigating the stormy seas of software since 2017 and to be released shortly. While composed mainly of existing components, their selection, the build and install processes and glue are of my own design and implementation, not derived from existing distributions though certainly informed by them. It provides a bulwark against the seemingly uncontrollable growth of accidental complexity and technological churn that characterizes the modern Linux scene, harking back to a time of more comprehensible computing while aiming to incorporate some of the better ideas that have come along since. This said, it is not intended to be for all comers or all purposes. It's been an experiment in applying certain design elements and seeing how far they can go. I find it useful already; if you do too, then great; and if you're inclined to build your own system you may find it a useful starting point or resource to draw parts from.

It includes a Linux kernel (that you will need to configure yourself to fit your hardware), a GCC 4.7.4(ii) toolchain supporting C and C++, the musl C library, BusyBox utilities and a custom pdksh-derived shell.(iii) At present the environment is text only. Notably, the bootstrap procedure is documented and does not require an existing Gales system or matching architecture; in theory it can work from any reasonably POSIX-like system, which so far has been demonstrated on Gentoo, OpenBSD and Gales itself.

I've had three main guiding principles in the design process. First, the system should loyally serve the operator; for example, the act of installing, reinstalling, upgrading, patching or whatnot on a program should not "helpfully" modify live configuration or daemon process state. Second, the system should preserve meaning: while the ideal of direct execution from human-readable source code may not be presently practical, it should be the preference, and full reproduction of the system from source should be regarded as a primary necessity. Third is the old "Keep It Simple, Stupid" - perhaps better formulated as fits in head.

History

I had made some earlier attempts to take control of an OS starting in 2016. At the time I was running Fedora, Debian and OpenBSD, having been soured by the constantly broken builds in Gentoo after years of using it.(iv) The idea was to adapt the Linux From Scratch process to bootstrap a musl-based system from source, then use the existing RedHat Package Manager for applications. The effort was unsuccessful but instructive. I then tried Alpine Linux; it appeared to be elegant and developed by knowledgeable musl people, but I was alarmed to find that despite a claimed focus on security, its package management tool was written in C and "secured" by HTTPS. Next I tried the experimental "Gentoo hardened musl" project, using its "stage3" binary as a base but building further libraries and applications manually, thus forcing myself to inspect the upstream offerings, read the READMEs and run the ./configures. This went fairly well and I built up a personal archive of sources, patches and recipes to document my steps. I planned to deal with the remaining mystery blob of the stage3 by reproducing it from some pre-existing system; in the sort of surprise that by then was becoming unsurprising, I found that Gentoo's bootstrapping tool, "catalyst", did not support cross builds,(v) nor had the "hardened musl" project left any documentation on how they'd seeded their image.

Around the same time I had started studying the work of Daniel J. Bernstein aka djb. Some revelations were that much of what I had understood as "package management" was an unnecessary result of poor filesystem layout; that bug-free code wasn't such an unrealistic thing to aim for, but required questioning established interfaces; and that the more enticing aspects from a sysadmin's standpoint of the "systemd" abomination had been available at least a decade prior and with vastly less code. With Gentoo appearing to have minimal value left to offer, I set out to revive my from-scratch process and build a full system around these ideas.

Key design decisions

1. No separate "package database." A package-major hierarchy plus symbolic links is enough.

2. Fully static linking. The operator should be free to modify libraries, even keep multiple versions around, without risk of breaking existing programs. Thanks to musl the cost of object code duplication is low in most cases; in theory, program load time and memory consumption can even improve compared to traditional GNU/Linux and without extra caching mechanisms. Questions of how to build a given thing static or dynamic or both become unnecessary. In combination with 1, packages can be updated or rolled back more-or-less atomically.

3. Minimal PID 1 (init), as it occupies a position with special privileges and reliability requirements. Use external scripts for the boot process and daemontools for service management.

4. Static device management. No layers of "tweak the udev rules to tell the daemon how to regenerate the nodes" - that's what the filesystem is for. If you need to tweak /dev, you just do it.

5. Use initramfs for install and rescue environments and ensure its contents are easily customized. One result is a "viral" property that installing the system does not require physical boot media, merely an existing Linux-compatible bootloader (though of course a bootloader can be installed on external media).

6. Lightly automated build and install tools for additional software ported to the system, working based on build definitions including metadata, source checksums, patches, and build scripts.

7. A conservatively curated library of such software, known as the gports tree.

8. No effort to keep up with churning data sets such as Unicode, time zones, or message translations. News from the OS should pertain only to the functioning of the OS; definitionally unstable databases are the operator's business.

9. Config files protected without any special mechanism: shipped configs are installed exclusively to /etc/examples, from which you can copy or diff at leisure. This does mean you sometimes need to check for such examples for things to work as expected.

10. Simplifying third-party build systems as a gradual effort, typically replacing autotools spew with static config.h and Makefile, which often provides a noticeable build speedup and greatly eases investigation of questions like "wtf code am I even running?!"

11. Few libraries visible in standard search paths. To link with a non-standard library you use the -I and -L compiler flags to indicate its installed path. Thus linkage becomes much more explicit even in the presence of magical build systems that try everything they see.

12. Self-extracting shell archives, allowing precise and deterministic specification of metadata for trees of text or binary files without inheriting the complexities of the "tar" formats (plural!).

13. Deterministic build for the base so as to truly factor out the bootstrap host. While much progress was made here to the extent that results were bitwise-reproducible from two Linux systems, the goal of extending this to any host remains elusive, particularly in GCC and the kernel.

14. HTTP mirror for third-party source tarballs, including base and ports, with a script to replicate and efficiently synchronize without allowing existing files to change or disappear.(vi)

15. Original sources (including documentation, scripts, base config files, gports and patches) kept in a single relatively lightweight repository suitable for management with V.
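Decisions 1 and 2 together suggest that switching a package version can come down to repointing a single symbolic link. What follows is only a minimal sketch of that idea in Python, not the system's actual tooling: the function name `switch_version` and the directory names in the usage note are hypothetical, and it assumes a POSIX filesystem where rename(2) replaces the destination atomically.

```python
import os

def switch_version(link_path, target):
    """Repoint the symlink at link_path to target in one atomic step.

    Hypothetical helper: a temporary link is created next to the real one
    and then renamed over it, so readers always see either the old or the
    new target, never a missing link.
    """
    tmp = link_path + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)          # clear a leftover from an interrupted run
    os.symlink(target, tmp)     # target is stored as given (may be relative)
    os.replace(tmp, link_path)  # rename(2): replaces the link itself
```

With a layout such as `foo-1.0/` and `foo-1.1/` beside a link `foo`, calling `switch_version("foo", "foo-1.1")` upgrades and `switch_version("foo", "foo-1.0")` rolls back. Statically linked programs that are already running are unaffected either way.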

Over the next few days I will be dusting off the repository and publishing the code and some stats, so stay tuned!

  1. R. Kelsey, W. Clinger, J. Rees (eds.), The Revised⁵ Report on the Algorithmic Language Scheme. [^]
  2. Last series that can be bootstrapped purely from C. [^]
  3. IMHO providing a good compromise between comfort, code size and standards compliance. Bash is available as an option. [^]
  4. For one thing, it tries to take no stands and be adaptable to any purpose through a system of USE flags controlling how programs are built; for another, it has a "rolling release" model and generally accepts upstream updates. The result is a combinatorial explosion such that nothing really gets tested and every Gentoo system becomes unique, uncharted territory. [^]
  5. Which raises doubts as to the extent to which it really builds from sources rather than importing artifacts from the host system, something that can easily happen by accident given the complexity of the toolchain. [^]
  6. A present flaw is that the sync script doesn't allow subdirectories - validating server-provided paths in a shell script is tricky! - yet the mirror has one. Manual intervention required for now. [^]

2019-11-28

This is not an article.

Filed under: Tempus fugit — Jacob Welsh @ 19:25

Today's "morning article" is a public admission of failure to produce a morning article or even a coherent slice of an article. I've got a start on presenting my Linux distro, but I'd rather not waste anyone's time with a half-assed mess on this complex topic. Faced with a couple of bad options, I'm choosing not to snowball my problems further and to move on for now. May I be more productive tomorrow.

Gales Bitcoin Wallet spec and battle plan

Filed under: Bitcoin, Software — Jacob Welsh @ 04:51

This proposal will detail the software part of my wallet under development,(i) consisting of a security critical offline part and less sensitive online part.

All monetary values are input and displayed in decimal BTC but kept internally in exact integer (satoshi) form; conversion must be lossless.
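As an illustration of the lossless conversion requirement, here is a minimal Python sketch that avoids floating point entirely by working on the digit strings. The function names and the strict input rules (no sign, no exponent, at most eight decimal places) are assumptions for the example, not part of the spec.

```python
SATOSHI_PER_BTC = 100_000_000
DIGITS = set("0123456789")

def btc_to_satoshi(text):
    """Parse a decimal BTC string such as '0.1' into an exact satoshi integer."""
    whole, dot, frac = text.partition(".")
    if not whole or (dot and not frac):
        raise ValueError("malformed amount: %r" % text)
    if not set(whole + frac) <= DIGITS:
        raise ValueError("non-digit character in amount: %r" % text)
    if len(frac) > 8:
        raise ValueError("more than 8 decimal places: %r" % text)
    # Pad the fraction to exactly 8 digits so it counts satoshi directly.
    return int(whole) * SATOSHI_PER_BTC + int(frac.ljust(8, "0"))

def satoshi_to_btc(value):
    """Format a non-negative satoshi integer as decimal BTC with 8 places."""
    if value < 0:
        raise ValueError("negative amount")
    whole, frac = divmod(value, SATOSHI_PER_BTC)
    return "%d.%08d" % (whole, frac)
```

Rejecting inputs with more than eight decimal places, rather than rounding them, is what keeps the round trip exact in both directions.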

Offline

Wallet data is stored in a directory tree containing a keys subdirectory and outputs, change, fee, and transactions files.

Private keys are represented in hex, with one key per file in the keys directory. (Probably little-endian; need to check existing conventions.)

Each key file is named by the Base58Check-encoded Bitcoin address corresponding to the key. (This implies a case-sensitive filesystem.) Support for compressed public keys may be added using an extra tag byte in the private key.(ii)
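To make the file naming concrete, the Base58Check step can be sketched in Python as below. This covers only the final encoding of an address from its 20-byte public key hash; deriving that hash from the private key (secp256k1 public key, then SHA-256 and RIPEMD-160) is left out. The function name is hypothetical, and the version byte 0 for mainnet pay-to-pubkey-hash addresses is the usual convention rather than something the spec above states.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check_encode(version, payload):
    """Encode version byte + payload with a 4-byte double-SHA256 checksum."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    # Treat the whole byte string as one big-endian number and convert to base 58.
    n = int.from_bytes(raw, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Each leading zero byte is represented by a literal '1'.
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))
```

The alphabet mixes upper and lower case, which is why the note about a case-sensitive filesystem matters: two distinct addresses can differ only in letter case.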

A list of outputs believed to be spendable is maintained in the tabular text file outputs, with records delimited by linefeeds and the following fields delimited by one or more space or tab characters:

  • Address (Base58Check)
  • Value (decimal BTC)
  • TXID (hash of transaction containing the output, in the conventional little-endian hex format)
  • Index (position in the transaction's output vector, decimal integer)
  • Comment (optional; may contain spaces and tabs)
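A reader for this table format might look like the following Python sketch. The `Output` record type and `parse_outputs` name are made up for illustration; the value is kept as the decimal string found in the file, and lines are assumed to carry no leading whitespace. Splitting at most four times leaves any spaces and tabs inside the comment untouched.

```python
import re
from typing import NamedTuple

class Output(NamedTuple):
    address: str   # Base58Check
    value: str     # decimal BTC, exactly as written in the file
    txid: str      # hex
    index: int     # position in the transaction's output vector
    comment: str   # empty string when absent

def parse_outputs(text):
    """Parse the contents of an outputs file into a list of Output records."""
    records = []
    for line in text.split("\n"):
        if not line:
            continue  # tolerate the empty piece after the final linefeed
        # Fields are separated by runs of spaces or tabs; the fifth piece,
        # if present, is the free-form comment.
        fields = re.split(r"[ \t]+", line, maxsplit=4)
        if len(fields) < 4:
            raise ValueError("too few fields: %r" % line)
        address, value, txid, index = fields[:4]
        comment = fields[4] if len(fields) == 5 else ""
        records.append(Output(address, value, txid, int(index), comment))
    return records
```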

Encryption is supported by keeping the wallet directory in a tmpfs and persisting it with tar+gpg or similar.

The offline wallet program provides the following commands:

  • gen-key - generate a new key file and print the resulting address to stdout. (Nice to have: numeric argument to generate many.)
  • send ADDR VAL ... - generate a signed transaction sending the given addresses the given amounts in BTC, pairwise. Inputs are selected from the outputs table in the order they appear. The change address is read from the change file; the fee in BTC per 1000 bytes is read from the fee file. The hex-encoded raw transaction is appended to the transactions file. (Nice to have: summarize the proposed transaction and prompt for confirmation.) The outputs file is overwritten to remove the spent ones and add the change and any others that pay the wallet's own addresses.
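The input selection described for send can be sketched as follows in Python. Several things here are assumptions rather than part of the spec: the size formula is a common rough estimate for pay-to-pubkey-hash transactions with uncompressed keys, the fee is rounded up to a whole satoshi, a change output is always included, and amounts are already in satoshi (so the fee file's BTC value would be converted first). The function names are hypothetical.

```python
def estimate_size(n_inputs, n_outputs):
    """Rough byte size of a P2PKH transaction (assumed approximation)."""
    return 10 + 180 * n_inputs + 34 * n_outputs

def select_inputs(available, send_total, fee_per_1000_bytes, n_recipients):
    """Take outputs in table order until they cover the payment plus fee.

    available: list of (identifier, value_in_satoshi) in file order.
    Returns (selected, fee, change), all amounts in satoshi.
    """
    selected = []
    total_in = 0
    for item in available:
        selected.append(item)
        total_in += item[1]
        # One extra output is counted for the change address.
        size = estimate_size(len(selected), n_recipients + 1)
        fee = (size * fee_per_1000_bytes + 999) // 1000  # round up
        if total_in >= send_total + fee:
            return selected, fee, total_in - send_total - fee
    raise ValueError("insufficient funds")
```

Because each added input also enlarges the transaction, the fee is recomputed on every step rather than fixed up front.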

Once its contents have been transferred to the online machine, the transactions file can be removed or truncated at will.

Online

The online part has a database that serves to index a lightweight subset of blockchain data pertinent to a single offline wallet. It tracks the following objects:

  • Watched addresses
  • Confirmed outputs funding them and inputs spending from them
  • Confirmed transaction metadata (hash, block hash+height+index, size, fee, comment)
  • Raw transactions submitted by the operator
  • Scan state
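One possible relational layout for these objects is sketched below using Python's sqlite3 module. It is a guess at the general shape only, not the schema drafted on paper: every table and column name is hypothetical, spending is modeled as a nullable `spent_by` column on the output row, and values are stored as integer satoshi.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS addresses (
    address TEXT PRIMARY KEY,
    comment TEXT
);
CREATE TABLE IF NOT EXISTS txs (
    txid         TEXT PRIMARY KEY,
    block_hash   TEXT NOT NULL,
    block_height INTEGER NOT NULL,
    block_index  INTEGER NOT NULL,
    size         INTEGER NOT NULL,
    fee          INTEGER,
    comment      TEXT
);
CREATE TABLE IF NOT EXISTS outputs (
    txid     TEXT NOT NULL REFERENCES txs(txid),
    n        INTEGER NOT NULL,
    address  TEXT NOT NULL REFERENCES addresses(address),
    value    INTEGER NOT NULL,
    spent_by TEXT REFERENCES txs(txid),
    PRIMARY KEY (txid, n)
);
CREATE TABLE IF NOT EXISTS raw_txs (
    id      INTEGER PRIMARY KEY,
    hex     TEXT NOT NULL,
    comment TEXT
);
CREATE TABLE IF NOT EXISTS scan_state (
    id          INTEGER PRIMARY KEY CHECK (id = 1),
    next_height INTEGER NOT NULL
);
"""

def open_db(path):
    """Open (or create) the database and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def balance(db):
    """Sum of unspent output values in satoshi, as a balance command might report."""
    row = db.execute(
        "SELECT COALESCE(SUM(value), 0) FROM outputs WHERE spent_by IS NULL"
    ).fetchone()
    return row[0]
```

With a layout like this, resetting the scan would amount to rewriting the single scan_state row, and the unspent output table falls out of the same `spent_by IS NULL` filter used for the balance.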

The online wallet program communicates with a Bitcoin node to populate its database and transmit new transactions. It provides sync commands:

  • scan - iterate blocks, filter for relevant transactions and update database.
  • reset - clear the scan pointer, e.g. to pick up past transactions affecting newly watched addresses. (Nice to have: argument to set to a given height)

Input/output commands:

  • unspent-outs - print unspent output table in the format used by the offline wallet. (Nice to have: other views like all outputs or only new outputs)
  • watch - import new addresses to watch from stdin, with optional comments.
  • push - import raw transaction(s) to push from stdin, with optional comments.

Accounting commands:

  • balance - print the sum of unspent output values.
  • register - print transaction history, format TBD.

Plan and time estimates

Online:
Write SQLite schema (currently drafted on paper): 1h
Import RPC client, block and transaction parsing code (I considered using python-bitcoinlib from Garzik et al. here; happily I have original implementations of the necessary parts): 1h
Block fetching ("dumpblock" from TRB to named pipe): 1h
Sync commands: 3h
I/O commands: 3h
Accounting commands (using something quick and dirty for "register" format for v1): 2h
Write manual: 3h
Total online part: 14h
Proposed deadline: Sunday, Dec 1.

Offline:
Port transaction signing prototype to Scheme: 2h
Key file I/O: 1h
BTC numeric I/O: 1h
Outputs table I/O: 3h
Update "gen-key" command for new spec: 1h
"send" command: 3h
Sample tar+gpg wallet open and close scripts: 1h
Design and implement tests: 5h
Write manual: 3h
Total offline part: 20h
Proposed deadline: Thursday, Dec 12.

Other:
"sendrawtransaction" RPC for TRB: 5h

Total: 39h
Deadline: Friday, Dec 13.

Now: I see 40 available hours remaining in my schedule for this time interval. Given that I'm far from confident in my estimation abilities here and there's always something unexpected, this is worrying. Perhaps the documentation would be a light enough job at this point to handle on the plane. Perhaps there's a "sendrawtransaction" patch floating around somewhere. Perhaps someone can be motivated to lend a hand - this would be the most separable task of the pile.

  1. Through further consideration of the coding tasks and data dependencies, in particular mapping out the online part as a relational database, I've realized my earlier notion of transactions as the central data structure had introduced unnecessary complications. If accounting of historical transactions is left to the online database, the offline signing part need only be aware of currently unspent outputs, which reduces its storage needs to a flat table, no S-expressions or JSON or similar. I believe this also brings it in closer alignment with the ideal. A second simplification is in the storage of addresses and keys, following a philosophy of "the filesystem is a database; use it", and naming of keys has been dropped. [^]
  2. This may be desirable to import keys or sweep funds from legacy wallets without having to execute their code. If used by default it would reduce outbound transaction size, thus perhaps fees, though as ever miner behavior is uncertain. [^]

2019-11-27

Early history of me, part 6

Filed under: Ego, Historia, Paidagogia, Vita — Jacob Welsh @ 18:22

Continued from part 5

Another eventually-successful parental negotiation involved my music studies. While my violin skills had advanced substantially from ages six to twelve, both solo and in orchestra, and I enjoyed performing, I had never quite accepted the burden internally, and the rigors of daily practice continued to grate. It probably didn't help that my parents weren't demonstrating much musical discipline themselves. If you want to raise a Wolfgang Mozart, it helps to be a Leopold Mozart, y'know?

At the same time, I'd dabbled a bit with the piano, because it was there, and it called out to be tackled properly. I convinced them to let me switch; we found a local teacher (at greater expense, if I recall, for having to look outside the organization) and I pursued the study with vigor. Unfortunately this only lasted about two years until we couldn't seem to make time for it among the increasing demands of school.

Some words about extended family would seem in order to round out an overview of my childhood. There was one set of grandparents surviving, my mother's side, who had retired about an hour north (a seemingly interminable drive at that age) in Gettysburg, Pennsylvania.(i) We'd visit every month or two. I liked them better than my parents did, probably due to less historical baggage on one hand and their inclination to spoil me on the other. When I slept over I'd be able to watch cartoons and play with Grandpa's Mac (with color display!) for hours. They had an affinity for the Arab world, having spent their careers as professors at the American University of Beirut. "Sittou", as we called my grandmother, was the only churchgoer (Lutheran) in the clan, while Grandpa was a kind of tolerant non-believer. There was an uncle with family that I'd usually see at the grandparents' place.

On my father's side there was an elder aunt and family in Maine; due to the distance we'd see them yearly, at least in the good years when we could afford the vacation. They had picked up the tab on a coastal summer cottage that had been in the family a few generations; I remember with great fondness the change of scenery, climate and pace afforded by these trips; the smell of pine forests and ocean.

While all sorts of details could be relevant to the story of childhood, I will close this series with one that made a distinct mark on me and my generation: the events of the morning of September 11, 2001 and subsequent descent into war on an emotion. It was a school day in the sixth grade. The administration's first reaction was to say nothing, but by lunchtime a growing list of names was being called to report for early pickup, and rumor spread: "the country is under attack!" The superficial facts became clear soon enough, if not the interpretation. Following my parents I was skeptical of the official narrative; LaRouche had even spoken of the possibility of a "Reichstag fire" i.e. false flag event, before it happened. Whatever the Bush/Cheney administration's negligence or even complicity may have been, things played right into their hands. There was an upswell of patriotic fervor, with the songs, "United We Stand" posters and "Fight Terrorism" bumper stickers. I noted the blue skies vacant of contrails as civilian flight was suspended in the following weeks, and the later conversion of airport "security" from this quaint thing with X-ray machines to the complete exercise in humiliation that the inmates now take for granted. As the war whoops escalated, the average low-information voter didn't seem to perceive a difference between supposed Saudi hijackers, Taliban, or Saddam Hussein. Someone had to pay and it didn't much matter who. It marked the beginning of an end of innocence, both in the culture as a whole and my relationship to it.

  1. Perhaps most famous for its battlefield, regarded as the turning point of the American Civil War. [^]