Fixpoint

2020-09-29

Gales Scheme

Filed under: Historia, JWRD, Scheme, Software — Jacob Welsh @ 22:26

There's a project I've been working on, passionately though intermittently, for some years now that I'm at last going to start laying out, one bit at a time.(i)

Its framing started out ambitiously enough as a simple yet complete Scheme interpreter for the Unix environment. The idea was to build a robust implementation of this classical, expressive and compact programming language, that would fit in my own head and enable me to meet business needs as technical co-founder of what's now JWRD Computing, with greatly reduced reliance on the conventional options of the day. Those seemed to offer only "industrial scale" bug-infested quagmires on one end and half-assed hobbyist efforts on the other.(ii) It has achieved this to some degree and is already quite serviceable, albeit for a rather narrow sort of domain.

I now envision its future as nothing less than a complete and self-sufficient computing system, enabling all of the accreted layers of software and hardware sludge - historical, modern and postmodern - to be safely jettisoned. It has a lot of growing to do, and so do I if I'm to get it there. And I'll be needing some help.

The code is already available for what I consider a maintenance branch, in V patch format here,(iii) but major changes are afoot.

  1. Hopefully not the binary sort of "bit" - though I'm going for small-scale articles on the theory that I can be more consistent about it that way. [^]
  2. For what it's worth, it was an attempt at reading the TinyScheme source code that convinced me that writing my own was quite doable and yet quite necessary if I was to have the kind of diamond I sought. [^]
  3. I recently issued a re-genesis to better follow V conventions, and some bug fixes: the recommended head is gscm_fix_m_whitespace_package_install.vpatch. [^]

2020-06-27

A bevy of fixes for V in Perl

Filed under: Software, V — Jacob Welsh @ 21:49

"Fixes" may be a strong term, in that it could be argued the item in question wasn't exactly broken. It was however darn near unusable in practice, and contained a number of fragile and unstated assumptions about its environment and invocation.

I thought I'd introduce my work by putting it in context with a quick recap of what V is; but then it turned out that my understanding of the matter, and perhaps even my approach to gaining understanding, needed some fixing too.(i) In short, it's nearsighted to define V as an improved kind of version control system; rather, its proper placement is a few levels up in the tree of concepts as a new way of thinking about, talking about, and deploying software, broadly construed. Thus I surmise it's also inadequate to consider the present "V in Perl" artifact, its label notwithstanding, to be a kind of reference implementation of V, but rather an early implementation of a relatively small part of the overall vision.

Still, whether I fully manage to see in this horseless carriage prototype a "transportation revolution to permanently alter the shape of civilization" or "just a faster kind of cart" - or something in between - if I'm to be riding in it then I want the wheels on straight; and I'm plenty capable of seeing to that.

The first patch packs in quite a few relatively small and independent fixes, building off the GNAT-demanding ksum/vpatch branch. Paralleling my previous work, the second swaps in keksum and patch to avoid the secondary compiler requirement (a change that's now more straightforward).

Download

Changes

Taking the main patch, one item at a time, from the manifest:

(1) Eliminate use of external binaries (cat ls sort pwd which) and provide more hygienic directory listing;

Now I'm no Perl monk (and would likely never choose it myself for a new project) but I know you don't have to open subshells and pipelines just in order to read a file or list a directory, really now! The "ls" scraping is a particularly risky pattern due to the possibility of unexpected control characters in filenames. I was surprised to learn that stock Perl lacks an internal "getcwd" function, but the "pwd" and "which" usage turned out to be doing more harm than good anyway and easily eliminated.

(2) add missing error handling in build_wot;

This is a repeated pattern in the code, and one I probably haven't entirely eliminated. Much like the underlying C, many Perl functions don't raise exceptions but expect the caller to check manually for errors. Much like the underlying C programmers, many Perl programmers don't bother with that pesky error checking stuff. Combine this with mutable variables repeatedly reinitialized by a loop on the assumption that nothing fails, and you get all sorts of interesting leakage possibilities. In this case, if one of the GPG key files placed in ".wot" was invalid, the program would not only fail to notice but label the bad key with the metadata of an unrelated victim that was seen previously.(ii)

(3) eliminate some variable regex patterns;

Using a complex tool when a simple one suffices - and, predictably, not using it correctly (i.e. by quoting regex metacharacters).

(4) avoid slurping(iii) full vpatches and verbose output into memory;

Thus it should now work (albeit slowly) on arbitrarily large vpatch files with respect to the system's main memory, and for verbose mode flush the "patch" output to terminal in closer to real time.

(5) fix exponential recursion blowup in traverse_press_path and get_all_descendant_nodes;

This was the original motivation for this work, as I'd noted:

Algorithmic inefficiency is a serious drawback of this tool. I suspect the "toposort" is something like O(n^2 log n) while "traverse_press_path" is exponential, like the textbook Fibonacci example of how not to do recursion. This becomes acutely noticeable around 32+ patches.

Exponential traversals turned out to be in two different functions; happily, the memoization needed to avoid revisiting the same subtrees over and over was already inherent in the data structures being built.

After tracking down the second instance, I used some "awk" to build a full list of recursive calls in the program for scrutiny (namely: traverse_press_path verify_ante remove_desc add_desc_edges add_desc_src_files), so I'm fairly confident there are no more heads on this particular hydra.

(6) allow the patchdir to be a relative path;

It happened that relative paths worked already for the seals and wot directories.

(7) replace some numeric indexing with named variables;
(8) tweak hash program parsing to not require two spaces;
(9) document restricted positioning of global options handled distinctly from commands;

The option-handling code clearly wasn't doing what it was meant to be doing; I haven't exactly fixed that but have at least noted the extant restrictions on where the wotdir/patchdir/sealdir options may be given.

(10) make help and version commands work without a wotdir;

You can't demand I already know how to use a program in order to view its documentation! Well I mean, you can, but... you know what I mean?!

(11) allow patches and seals to share a directory but require standard extensions (.vpatch .sig);

I never quite saw the point of separate subdirectories here, so now you can just ln -s patches .seals and keep them all under "patches", saving a fair amount of pointless shuffling in my own usage. (The default ".seals" path is preserved for compatibility.)

(12) take basenames of patch arguments to allow tab-completeable paths;

That is:

v.pl press a some_big_long_name.vpatch

can now be spelled as

v.pl press a patches/some_big_long_name.vpatch

(i.e. the actual path, although the prefix could be anything). This applies to all subcommands that take patch names.

(13) clean up the tempdir on SIGINT (^C);

Less important now that it doesn't get effectively wedged on exponential algorithms, but still.

(14) factor out some repetition;
(15) other minor simplifications.
Version 99989.

Stats

$ diffstat v_fix_exptimes_paths_etc.vpatch
 manifest |    1
 v.pl     |  234 +++++++++++++++++++++++++++++++++------------------------------
 2 files changed, 124 insertions(+), 111 deletions(-)

Enjoy!

  1. Full details in the logs:

    jfw: diana_coman: I'm trying for a concise intro/description of V, in the present (post-Republic) context. Does this about capture it: "versioning system that supports owner control of computing by placing primary focus on the change and explicit management of trust through strong cryptography" ?
    diana_coman: jfw - hm, what do you mean by "of computing" there?
    jfw: well, of the operation of one's own computers
    jfw: possibly a bit circular with "ownership of one's own"...
    diana_coman: it's more that the definition as you gave it doesn't do all that much - though it takes a few readings, hm.
    diana_coman: it's a bit tortured on various fences by the looks of it; for one thing, defining it as a versioning system cuts away an important part - the deployment of software that is usually not all that much the traditional concern of versioning systems
    diana_coman: jfw - what's the audience you have in mind there or is this blog/generic?
    jfw: it's the blog, yes - and partly for clarifying it for myself, heh.
    jfw: my grasp of what V does for deployment is basically to say that the other tools traditionally used for it aren't necessary
    diana_coman: ahaha, going for once fully-negative-space there (and that getting rid of all the other "tools" is not a tiny thing either at that, but it's more of a consequence than anything else)
    diana_coman: V is a complete solution in that sense, hence "the other tools [...] aren't necessary"
    jfw: (though um, it's still known to lean on 'wget' etc.)
    diana_coman: well, it also still requires an OS!
    diana_coman: anyways, I wouldn't say that "other tools are not necessary" - it's more that the change is so fundamental that previous tools don't fit /don't have a useful place anymore; other tools though *are* still necessary - only they need to be built
    diana_coman: it changes the whole landscape if you want
    diana_coman: but let's rewind and try to grab it from some more concrete end perhaps
    jfw: alright
    diana_coman: so for one thing, V is not some particular implementation but essentially a paradigm for software
    diana_coman: and software as a whole, not just development, nor even just deployment, it goes all the way to even what software *is*
    diana_coman: sure, one can use V for some narrow part that they care about and it's true that the first implementation was just that, a very narrow thing in fact, but that doesn't mean much.
    diana_coman: and I suppose that the current state of V-use and development otherwise might give the impression that there isn't anything more to it either, huh
    jfw: I suppose I've tried to understand the species based on observations of what's shared by the known instances
    diana_coman: jfw - you know, I think your attempt and question there hits actually deeper (and well done for it, too) than you intended, lol
    jfw: haha, indeed
    diana_coman: jfw - so where did you start from, anyway? from the current implementations of V, is that what you mean by the instances?
    jfw: right
    jfw: heh, you know the one about the blind men and the elephant?
    diana_coman: that kind of locks you unhelpfully into some rather sterile and narrow mindframe, myeah (and I'll leave the tracing of the root cause there to each log reader)
    diana_coman: jfw - hm? doesn't come to mind, no.
    jfw: apparently a story that exists in many versions, but basically each man feels a different part of the elephant and extrapolates a completely different (& quite incomplete) picture of what an elephant is.
    diana_coman: ah, the fable, yes
    diana_coman: I can see the similarity, indeed
    jfw: https://allpoetry.com/The-Blind-Man-And-The-Elephant - possibly the main English version.
    jfw: ponders how to "see true v-elephant with mind's eye"
    diana_coman: the thing is, V is not just a different type of versioning system - a bit like a car is not just a faster cart, hm
    diana_coman: jfw - well, better start from the beginning as it were which indeed is *not* whatever implementation, no matter what claims are made otherwise; e.g. [http://trilema.com/2015/no-such-labs-releases-v-for-victory/?b=change&e=satellites#select][the change similar to that introduced by the understanding and controlling movement in terms of mass, impulse and energy, such as it occurs in the launching of
    diana_coman: satellites]
    diana_coman: damn, it still broke the link, didn't it
    jfw: space between the words in the text, yeah
    diana_coman: jfw - my, yrc can't recall previous line??
    jfw: nope :/
    diana_coman: jfw - why, why, why whyyyyyy
    diana_coman: the change similar to that introduced by the understanding and controlling movement in terms of mass, impulse and energy, such as it occurs in the launching of satellites
    jfw: because it's young still
    diana_coman: so based on the above, you can start perhaps with a broad definition of V as a new way of understanding software - and therefore, as a consequence of this deeper and more precise understanding, the resulting more efficient way of talking about software, developing (version controlling being only one part of that developing) software, deploying software, maintaining software and so on.
    diana_coman: jfw - well, yrc may be young and have all the time ahead of it indeed but what can I say, I'm getting older day by day here so pleaaaase: can haz tab-completion and last-line recall?
    jfw: yes; and kill/yank (cut/paste) for the input is needed too.
    jfw: "manage his investment of trust at all junctures so that he is never required to implicitly trust either an unknown code author, or a code snippet of unknown provenance." - hey I pretty much got that part, right?
    diana_coman: with that broad definition at hand to help you avoid the pitfalls of stupid compartmentalizing, narrow focus, childish pick-and-choose and other numerous afflictions of the "software industry/engineering", the next step is to review the stated principles at the root of it all:
    jfw: (but yes, paradigm rather than particular set of scripts was missing.)
    diana_coman: namely software being the property of those running it and identity being constructed by others' view, upon a fixed support
    diana_coman: jfw - trust is possibly the skin of that particular elephant and at least the word itself has been repeatedly brandied about for sure
    diana_coman: it might have been bandied, but I do like brandied better.
    jfw: mmm, brandytrust!
    diana_coman: quite, it can produce... intoxication!
    jfw: especially hazardous when pregnant with concepts & definitions
    diana_coman: ahahah, indeed!
    diana_coman: looking back at your original definition, I'm afraid there isn't much of it left though.
    diana_coman: making a first attempt at tightening up that previous definition:
    sonofawitch: 2020-06-23 21:56:55 (#ossasepia) diana_coman: so based on the above, you can start perhaps with a broad definition of V as a new way of understanding software - and therefore, as a consequence of this deeper and more precise understanding, the resulting more efficient way of talking about software, developing (version controlling being only one part of that developing) software, deploying software, maintaining software and so on.
    diana_coman: V is a new conceptual framework for software, emerging from a better understanding of what software is and providing as main benefits the means for explicit, verifiable enforcement of software ownership by users as well as the correct incentives and supporting concepts for a qualitative jump in the way software is developed, deployed, maintained and evolved.
    diana_coman: jfw - does the above sound like the sort of concise definition you were looking for?
    diana_coman: it aims for a more practical intro so it necessarily leaves some stuff out/picks some to highlight.
    jfw: diana_coman: it's the sort of definition, yes - I don't know that I'll use it here directly though because if I'm to give a definition I'd want it to be one I fully understand myself (i.e. to have that new understanding of software & be able to explain why it's better)
    jfw: I'll work on getting there but the present article can make do without it.
    diana_coman: jfw - ah, no need to use it directly anywhere, lol; and anyways, if not clear, ask further tomorrow or whenever, sure.
    jfw: yep, & thanks for the pointers.
    diana_coman: yw
    diana_coman: such excellent questions are a pleasure to answer, so...keep asking them!
    [^]

  2. "User error", yes; but as the Python folks say, "errors should never pass silently, unless explicitly silenced", one reason that I still grade that language a cut above Perl. [^]
  3. It's the official Perl term, what can I say? [^]

2020-06-08

yrc re-genesis and patch for smooth scrolling and other fixes

Filed under: Software, yrc — Jacob Welsh @ 06:50

I revisit now my IRC client from the other side of seven months since initial public release and a full year since my last round of development on it. Its usage in this interval has been pleasantly free of unexpected problems; indeed the Unix process containing my own "daily driver" has an uptime of that full year, handling everything Freenode's managed to throw at it without a hitch. On the other hand, the software didn't see all that much attention from others, and I became rather complacent about its known quirks and problems.

The interval also saw me coming around to Mircea Popescu's insistence that indentation is done with tabs and not spaces(i) while line breaks are for conveying some kind of meaning beyond "my terminal is too dumb to wrap text on its own". I also switched to the tree structuring convention, prevalent in the software that grew in TMSR, of having a top-level directory named to match the project. As making these changes to an existing tree results in quite a noisy patch, perhaps to the point of effectively severing the connection to the previous state for all the good it does the reader, I've taken this occasion to re-bake the genesis patch yet again. As I'm unaware of anyone else having reviewed the code, I expect this won't pose any real inconvenience.

This out of the way, we get to the more substantive change: refining the scrolling model to count by lines on screen rather than full IRC lines, while still staying anchored to the visible text when the window is resized. I also managed to preserve algorithmic independence from scrollback log length: wrapping is computed on demand for only the necessary lines, ensuring UI responsiveness.

The reason for prioritizing this fix was that a new user pointed out its importance. There's more work upcoming, including tab-completion of names.

A few other fixes and improvements are included, including Python 2.6 compatibility (as yet untested by me); see the included NEWS file for details.

To install, you will need a Keccak V implementation such as my v.pl starter, plus the patches and seals:

Or download in bulk, e.g.:

wget -m --no-parent http://fixpoint.welshcomputing.com/v/yrc/

Once the tree is pressed, see yrc/README.txt for futher instructions.

  1. I'm still not sure what if anything this says about Lisp. [^]

2020-04-28

Build system overhaul for bitcoind

Filed under: Bitcoin, Historia, Software — Jacob Welsh @ 21:19

Background

The Real Bitcoin's build system for some years has consisted at the top level of a number of GNU Makefiles and a thing called "Rotor", building on an earlier "Stator". According to its 2015 introduction by Stanislav Datskovskiy, it served to compile the "bitcoind" executable deterministically and with full static linking, given a reasonable starting environment. The key to this magic was "buildroot", essentially a miniature, non-self-hosting Linux distribution designed for cross-compiling embedded systems.

The determinism came from capturing a full set of dependencies all the way down to the compiler. This came at the cost of adding considerable complexity to the process, as what was formerly a mere application took on all the potential problems involved in bootstrapping an operating system from an unpredictable environment, in addition to the already intricate build systems of the required libraries Boost, Berkeley DB (BDB) and OpenSSL. In an early sign of trouble, Michael Trinque found it wouldn't build BDB on his system without some CPU architecture specific configuration. In my own experience, I got it to work once, but when demonstrating to some friends on fairly similar Gentoo systems I'd built for them, it failed in multiple different ways. Ultimately I couldn't be bothered to track them all down, partly because of how unbearably slow it was: to try any change you would have to repeat the whole toolchain bootstrap.

The V cryptographic source code management system was introduced, with Bitcoin as its first user, shortly after Rotor; somehow the Rotor scripts and patches didn't end up in the V-tree proper, meaning that they in addition to the library and toolchain sources had to be rounded up in order to do offline (i.e. reliable) builds or study the code.

Finally, having already taken on the publication of a Linux distribution with similar static linking and deterministic bootstrapping goals, but going further in providing a self-sufficient system with native compilers, I had little desire to be stuck maintaining two different such beasts.

The vpatch

Thus I now present bitcoin_system_compiler.vpatch (with seals on the shelf). Building on my previous raw transactions patch, it:

  • Rewrites the Makefiles almost entirely. This includes the "makefile.unix" inherited from earlier developers, greatly simplifying it and eliminating historical baggage such as dynamic linking tweaks, Ubuntu bug workarounds, linking of "libssl" and way too much "sed" magic. Additions include compiler warning flags (resulting in quite a bit of warning spew, some of which might be interesting) and building "test_bitcoin" by default. Automatic header dependency analysis is preserved.
  • Removes some minor GNUisms like "tar xvfz" in order to work on BusyBox systems.
  • Brings "openssl-004-musl-termios.patch" formerly found in the external rotor sources into the tree.
  • Adds "boost-no-demangler.patch", discussed below.
  • Removes the various "src/obj" directories and moves all compiler output to the "build" directory. (For instance, this makes it easier to "grep" or "diff" the code without tripping on binary files.)
  • Avoids copying the "bitcoind" binary all around and removes the "bin" subdirectory: one place is enough.
  • Corrects the oversight that a library build failure would be ignored on a second "make" invocation because the mere extracted directories were used as the targets for dependency calculation.
  • Fixes parallel "make" by forcibly serializing the recursion into OpenSSL's ever-so-special custom build system.
  • Avoids recursing into "deps" on a top-level "make clean" so that dependency tarballs won't need to be re-downloaded. (Ultimately these need to get cleaned up and imported directly to the tree.)
  • Tweaks the BDB configuration to prevent "libtool" from attempting to build unwanted shared librarires.
  • Tweaks the Boost "bjam" invocation (the "compression" module gets built at install time if suppressed only at build time) and removes some "|| true" constructs that caused failures to be ignored.

It still supports the "make ONLINE=1" mode to download out-of-tree dependencies into the "deps" subdirectory from deedbot; these are reduced to the three essentials (Boost, BDB, OpenSSL).

In short, it makes both development and deployment much less painful, with a sane starting system as the price of admission.

The undemangling

Special mention is in order for the new boost-no-demangler.patch. Gales Linux uses an older branch of GCC that didn't receive fixes when the long-existing "stack clashing" (archived) family of attacks was stirred up in 2017, meaning some applications could end up vulnerable, especially those using the hazardous yet popular "alloca" or variable-length array features.

As a first step in investigating this, I enabled a number of warnings by default in the GCC configuration relating to excessive or dynamic stack frame size. While these warnings produce many false positives, they've done nicely to illuminate some suspicious code, such as binutils/libiberty/cp-demangle.c. Got that all read? Me neither... but so what, that "libiberty" is just an internal part of the toolchain, or so say the docs, right? And it's "well-known" that you don't want to feed untrusted input to the linker and friends. But wait: the GCC build system copies that code into libstdc++; from there it gets linked into C++ programs. This happens even if the program doesn't use the nonstandard "__cxa_demangle" extension it provides, by way of the default exception handler ("terminate called after throwing an instance of std::whatever").

So my gcc-4.7.4-demangler-amputation.patch, included in the Gales toolchain, removes __cxa_demangle along with the copying of cp-demangle.c, and simplifies the termination function to print raw symbol name, as it previously did anyway for the case of demangler failure (hah! - we learn that they knew their code doesn't always work). These names can be fed to "c++filt" to decode them manually if need be. It then turns out that Boost contains a couple uses of __cxa_demangle - none of them in components actually needed by bitcoind, harrumph - so the Boost patch simply cuts out the assumption that GNU compilers support it.

Stats

$ diffstat bitcoin_system_compiler.vpatch
 .gitignore                          |   21 ----
 Makefile                            |   21 ----
 bin/Makefile                        |   13 --
 bin/Manifest.sha512                 |    1
 build/Makefile                      |   97 +++++++++++++--------
 build/Makefile.rotor                |   56 ------------
 deps/Makefile                       |  166 +-----------------------------------
 deps/Manifest.sha512                |   17 ---
 deps/boost-no-demangler.patch       |   49 ++++++++++
 deps/openssl-004-musl-termios.patch |   46 +++++++++
 manifest                            |    1
 src/makefile.unix                   |  145 -------------------------------
 src/obj-test/.gitignore             |    2
 src/obj/.gitignore                  |    2
 src/obj/nogui/.gitignore            |    2
 src/obj/test/.gitignore             |    2
 verify.mk                           |    5 -
 17 files changed, 168 insertions(+), 478 deletions(-)

Looking only at the "make" code, 471 lines across seven files is reduced to 112 lines across three: quite the improvement I should think!

Future directions

Some work that sorely needs doing, as suggested earlier, is getting those external libraries under control, through some combination of pruning their code to just the necessary parts, replacing their build systems, and importing to the tree, or changing bitcoind code to eliminate them.

2020-04-22

New and improved raw transaction support for bitcoind

Filed under: Bitcoin, Software — Jacob Welsh @ 05:25

Previously on Fixpoint, I introduced my implementation of the getrawtransaction command and reviewed an existing one for sendrawtransaction. There were some more and less serious problems and stylistic considerations outstanding. To recap, for the "get":

  1. Not a proper vpatch.
  2. Loose parsing, ignoring trailing non-hex or excess characters in the txid argument.
  3. Directly accessing the mapTransactions object without acquiring the cs_mapTransactions lock.
  4. Unnecessarily exposing the mapTransactions object globally.

And for the "send", (with some points not previously noted, indicated by *, for my teeth grow sharper with the passage of time):

  1. Needing a regrind to build on other work from the intervening years.
  2. Redundant inclusion of CDataStream constructor defaults SER_NETWORK and VERSION.
  3. Loose parsing, ignoring excess data in the hex argument after the first decoded transaction.
  4. Refusal to re-broadcast transactions already in the node's memory pool.
  5. Returning -5 for already-seen transactions, a code otherwise used only for invalid arguments or inexistant objects. *
  6. New TransactionSeen function, clumsy in both usage and implementation.
  7. Supposed "missing inputs" condition that can't occur and wouldn't distinguish error codes if it did.
  8. Application of anti-spam checks by way of calling AcceptToMemoryPool with fCheckInputs=true.
  9. No reason provided for some rejection cases. *
  10. "txHash" breaking the conventional Hungarian notation, suggesting it's a CTransaction instance. *

Now meet bitcoin_rawtx_get_send.vpatch.

It addresses points 1, 3, 4, 5, 6, 8, 9, 10, 11, and 14. The risky exposure of mapTransactions is replaced with specific accessor functions that take care of locking. Each of these also turns out to simplify other code in main.cpp: always a nice sign that you're onto a good interface.

#2 I've discussed previously and don't see as a problem with this patch.

#7 I've let slide; please speak up if you think it's important.

#12 I've preserved deliberately, on the theory that it's better to reject transactions likely to be rejected by peers so the sender has a chance to determine the cause and fix it. This strikes me as a weak argument though; perhaps better would be to have a way to force it if necessary.

#13 is tricky because AcceptToMemoryPool is a monster of a function that could fail for a number of reasons, and doesn't make that reason available to the caller. It does at least write to the debug log so I've made the error message provide that referral.

The two previous patches added up to +99/-2 lines; the new unified one to +85/-10.

Further, I've now made the whole Real Bitcoin patch set and known seals available on my own reference code shelves.(i) I went through them all afresh to identify those on which I was comfortable adding my own signature; for the rest I used my "jfw_unchecked" subordinate key (both keys are available through my contact page).

On that shelf you may also notice a novel bitcoin_system_compiler.vpatch. It amounts to a full overhaul of the build system and will be the subject of an upcoming article.

  1. At some point I intend to put a formal, efficient, GNU bloatware free sync mechanism in place. For now, the best I can suggest to grab the whole collection in one shot is:

    wget --mirror --no-parent http://fixpoint.welshcomputing.com/v/

    [^]

2020-04-11

Preliminary report on the bitcoind hex conversion mess

Filed under: Bitcoin, Politikos, Software — Jacob Welsh @ 16:46

As I previously noted about my getrawtransaction patch,

It does accept invalid hex strings, perhaps a flaw in that SetHex method.

Indeed, SetHex in the base_uint class is completely permissive: it accepts leading whitespace, optional "0x" or "0X", then however many hex digits it finds, up to the length of the template specialization (160 or 256 bits), zero-filling if too short and ignoring excess if too long. It also includes some pointer arithmetic I believe has technically undefined behavior: comparing after decrementing past the start of the object. And it's one source of the obnoxious byte order reversal in bitcoind hash I/O, with GetHex being its counterpart.

The users of the function are the RPC commands listsinceblock and gettransaction. There's another SetHex function in CBigNum, similar but taking arbitrary length, entirely unused.

On the other hand, ParseHex, the subject of mod6_phexdigit_fix.vpatch, is oriented toward "dumps" in that it allows space between digit pairs and doesn't reverse byte order. It's used in some tests, in the hard-coded genesis block output in main.cpp, in the mining RPC commands getwork and getmemorypool, and in sendrawtransaction as seen in polarbeard_add_sendrawtransaction.vpatch.

My conclusion is that getrawtransaction is doing the right thing here, in that it shouldn't be made a special case, but the hex parsing should be cleaned up generally. If it were just me, I'd rip it all out and use a single, strict, non-reversing hex decoder. But there's no telling how much outside code or data has built up around the old rules. Parties interested in the matter, in the sense of having a meaningful amount of coin on the line, are encouraged to write in.(i)

  1. Not that I expect this to do much on its own, as politics involves involves actively going out to find people where they are and talk to them. [^]

2020-04-10

The missing Adacore public download index, vintage 2018, while it lasts

Filed under: Historia, Software — Jacob Welsh @ 21:16

I usually browse the web with JavaScript disabled, if present at all: it's bad for your computer and it's bad for your mind. That it's bad for your computer should be clear if you've dug at all into browser security in the past, say, 15 years. Failing that, consider the intended behavior of the thing: to allow any page to consume unlimited processing and network resources even once loaded. It's as if the mailman, upon completion of a delivery, strolls right into your house and helps himself to whatever is in the fridge. As for why it's bad for your mind, one aspect is that it leads you to believe that a page "works" or contains some piece of information when it in fact does not, like a mirage, or the screensavers of the cathode-ray era, falling away as soon as you reach out to touch it.(i)

Being thus grounded in reality has its difficulties, such as when those around you mistakenly believe a certain link points to something other than a blank page, even expecting you to have an opinion on it. Worse is when you actually want the promised item. On these occasions you can of course choose to let in the hungry mailman; sometimes he'll even give you helpful instructions on how to do so (as if you'd need them, having barred the door in the first place). But sometimes, a little effort can reveal the secrets of the code and allow you to possess the thing for yourself. Some might call it "reverse engineering", but isn't that a strong term for what's really just reading ? It's reading some text that was foisted on you, albeit not in the manner the foisters would like for it to be read.

So it went the last time I was looking into the Ada programming language, in 2018. GNAT, the foremost public implementation, as I understand, was mostly developed as a frontend for the GNU Compiler Collection at US taxpayer expense by a company presently known as Adacore.(ii) But their public download site demanded use of a JavaScript menu tree to filter options by platform and other categories, and offered no readily accessible listing of files or links otherwise.

A survey of the page source didn't directly reveal the hidden list, but turned up a number of scripts by reference. Skipping over some well-known "framework" wads left a promising "script.js", which straightaway pointed to a JSON "feed" containing a sizable collection of file metadata. A search in the code for how it turned this into URLs, and then some custom coding to do the same in a controlled context, was all that remained to produce a usable index. This done, I downloaded a couple GNAT binary releases for x86_64 Linux(iii) and added them to my archives.

Sadly, at the time I had neither a blog on which to boast of my exploits nor the kind of social engagement to motivate it. A year and change went by and I rectified that, but I didn't substantially revisit the Ada bootstrapping process until now, and had forgotten all about the indexing work.

Through looking into the publications of the now disbanded Republic and asking around, I found a series of recipes, notes, more notes, and a partial collection of dependencies, but little assurance that I'd obtained all necessary ingredients. In particular, I realized that the initial binary required to bootstrap the process had not been nailed down, but reports indicated the 2016 version was known to work. After the back-and-forth over what pieces someone might have on hand, I decided what I really wanted was access to the full collection.

In the unsurprising heathen manner, Adacore had broken their download links, apparently in the course of a move to Amazon's Cloudfront delivery network. Looking for the new locations, I was again greeted by the non-index page, felt the deja vu, and unearthed my old work. It appeared the "feed" format was unchanged and the new URL format was easily constructed from the old data. More surprisingly, the feed URL itself still pointed to the vanished "mirrors.cdn.adacore.com" hostname; as far as I could see, downloads wouldn't be working even with full JS. Behold the incredible bandwidth savings realized by moving to the Cloud! On the bright side, my existing SHA1 checksums provide some assurance that the historical files have not been diddled since 2018.

The code

Despite the tree-structured JSON format, the feed is essentially tabular. We'll use Python so as to accurately parse the JSON and convert to a more manageable comma-separated format (taking care that the separator characters don't occur in the values themselves).

import json

# http://mirrors.cdn.adacore.com/gpl_feed
rels = json.load(open('gpl_feed.json'))['feed']['releases']

fields = (
	'name',
	'id',
	'size',
	'sha1',
	'type',
	'client',
	'component',
	'date',
	'display_name',
	'display_order',
	'kind',
	'platform',
	'platform_display_name',
	'platform_display_order',
	'release_date',
	'release_name',
	'title',
)

table = [[str(obj[f]) for f in fields] for obj in rels]
table.sort()

for rec in table:
	for val in rec:
		assert ',' not in val and '\n' not in val

print ','.join(fields)
for rec in table:
	print ','.join(rec)

The output includes column headers and can be processed by any number of standard tools. How about a shell one-liner to produce the full URL listing?

awk -F, 'NR>1 { print "https://community.download.adacore.com/v1/" $4 "?filename=" $1 }' | uniq

Note that downloads are not uniquely identified by filename!

The data

And no, I don't intend to go mirroring the whole 21 GB of it, though if you do, I'll gladly link it here. Once I'm more clear on what parts are truly needed, I'll probably host those.

Enjoy!

  1. Terms pertaining to this effect, ranging from technical to marketing, include XHR, AJAX, and Web 2.0. [^]
  2. At some point the code was imported by the main GCC project, but for reasons I haven't yet ascertained, that version was considered broken for the purposes of The Most Serene Republic, so Adacore remained the source of record while efforts were made for the Republic to take over that role. [^]
  3. Namely, one dated 2007, being the oldest available, and 2014, for reasons I forget but perhaps because the next one came with a precipitous size increase. [^]

2020-04-07

Selection and other sundries for MP-WP

Filed under: MP-WP, Software — Jacob Welsh @ 04:49

Here are four patches to the V-tree for Mircea Popescu's Wordpress, which amount to an approximation of the changes I've been running since initial setup of Fixpoint, reground to build upon subsequent improvements by billymg and Diana Coman.

I plan to sign them shortly unless cause for revision comes to light (such as for example server-side selection being implemented outside the theme-specific code).

Quoth the manifest:

624752 mp-wp_remove-textselectionjs-pop3-etc jfw Remove the unreliable JS-based selection, posting by POP3 login, and a stray .php.orig file. Neutralize and comment the example pingback updater.
624752 mp-wp_svg-screenshots-and-errorreporting jfw Allow .svg extensions in theme screenshot search. Don't clobber the user's errorreporting level without WP_DEBUG.
624752 mp-wp_serverside-selection jfw Add server-side text selection to example htaccess and themes, roughly as seen in http://trilema.com/2019/proper-html-linking-the-crisis-the-solution-the-resolution-conclusion/ (xmlrpc.php changes not included)
624752 mp-wp_footnote-link-tweaks jfw Avoid turning double-quotes into backquotes in footnote tooltips. Expand the link part of footnote identifiers to cover the pre_identifier and post_identifier strings, for a larger clickable area.

Update: I've reground the first based on billymg's feedback to prune the wrapper "span" tags originating from mp-wp_add-footnotes-and-textselectionjs.vpatch, and the second trivially to follow the changed antecedent. As I had previously signed these, I've bumped the canonical patch names to avoid confusion. The second two patches remain in a draft state. Current patches and seals:

2020-04-02

V in Perl with parsing fix, keksum, and starter, plus the ill-fated vdiff

Filed under: Software, V — Jacob Welsh @ 17:50

Following my prior adventures, I reoriented my efforts toward some simpler changes to the v.pl tree, abandoning hopes of a robust patch creation tool built on Busybox diff.

I've split the changes into two patches. The first is "v_strict_headers", which I think would be of interest to any v.pl user. It tightens vpatch parsing to prevent false-positive header matches that could cause incorrect or nonsensical antecedent information to be extracted from valid vpatches. Following the precedent of the vtools vpatch program, this is done by requiring the string "diff " at the start of a line preceding the header, which works because all other lines of a diff "packet"(i) start with either @, +, -, or space characters. This patch also backfills the manifest file and brings it fully in line with the spec.

The second patch, "v_keksum_busybox", swaps keksum and patch in for ksum and vpatch, making V presses possible again on systems with little more than a C toolchain, Busybox utilities and Perl.

I have also mirrored the rest of the VTree and contributed my own seals, which can be found in the same directory.

For deployment on systems with no previous V, there's a starter tarball which includes the tree pressed to v_keksum_busybox, the keksum code, and an install script. Take a look at what it does, then run as root, from the extracted directory:

# sh install.sh

Download

The ill-fated vdiff

What follows is my abandoned attempt at vdiff in awk, supporting any conforming diff program. It identifies headers using a three-state machine to recognize the ---, +++, @@ sequence. This would still be fooled by a ---, +++ sequence followed immediately by another hunk, except that the lines of context prevent this, unless the change comes at the end of the file in which case there can't be another hunk prior to the next file header.

It works as far as parsing both GNU and Busybox diff output, produces working vpatches in the GNU case, and could even be expanded to do the same for Busybox. But since fully-reproducible output seems to be desirable, I can't presently justify further work in this direction or recommend it over the vtools vdiff.

#!/bin/sh
export LC_COLLATE=C
diff -uNr $1 $2 | awk -v sq=\' '
function shell_quote(s) {
	gsub(sq, sq "\\" sq sq, s);
	return sq s sq;
}

function vhash(path) {
	if (path == "/dev/null") return "false";
	qpath = shell_quote(path);
	cmd = "test -e " qpath " && keksum -s256 -l512 -- " qpath;
	gotline = cmd | getline rec;
	close(cmd);
	if (!gotline) return "false";
	split(rec, parts);
	return parts[1];
}

function print_header(line) {
	split(line, parts);
	print parts[1], parts[2], vhash(parts[2]);
}

{
	if (state == 0) {
		if ($0 ~ /^---/) {
			from = $0;
			state = 1;
		}
		else {
			print;
		}
	}
	else if (state == 1) {
		if ($0 ~ /^\+\+\+/) {
			to = $0;
			state = 2;
		}
		else if ($0 ~ /^---/) {
			print from;
			from = $0;
		}
		else {
			print from;
			print;
			state = 0;
		}
	}
	else if (state == 2) {
		if ($0 ~ /^@@/) {
			print_header(from);
			print_header(to);
			print;
			state = 0;
		}
		else if ($0 ~ /^---/) {
			print from;
			print to;
			from = $0;
			state = 1;
		}
		else {
			print from;
			print to;
			print;
			state = 0;
		}
	}
}

END {
	if (state == 1) {
		print from;
	}
	else if (state == 2) {
		print from;
		print to;
	}
}'
  1. Or what else do you call the header and sequence of hunks associated with a single file? [^]

2020-03-31

Adventures in the forest of V

Filed under: Historia, Software, V — Jacob Welsh @ 19:11

It started as what I thought a simple enough job: take the existing SHA512 v.pl I'd been using to press the Bitcoin code, or rather the VTree that grew from it, swap out the hash with my own keksum so as to avoid a hefty and otherwise unnecessary GNAT requirement, add my version of the classic vdiff modified likewise, bundle up a "starter" to cut the bootstrapping knot, and publish the bunch as my own tested and supported offering for wherever a V may be needed.

Such a thing would still require Perl, itself not an insignificant liability. While work had been underway to replace that, the results fell short of completeness, and from the ensuing discussion I decided it would be best to shore up my own grounding in the historical tools before venturing deeper into the frontier. I suppose I should be glad, because I got even more of that grounding - or swamping, more like - than I had asked for.

I.

One pitfall I already knew was that file header lines in the "unified diff" format used by V, which begin with "---" and "+++", cannot be accurately distinguished from deleted lines beginning "--" and inserted lines beginning "++", if parsing linewise and statelessly as done by the original "one-liner" vdiff. This was discovered in practice through an MP-WP patch containing base64-encoded images, and the potential damage is hardly restricted to that; for instance both SQL and Ada programming languages use "--" as comment marker. This was part of the motivation behind vtools, which took the approach of avoiding the system's existing "diff" program in favor of a stripped-down version of the GNU codebase with integrated hashing. My own approach had been more lightweight: tightening up the awk regex to at least reduce the false positive cases. It wasn't too satisfying, but had worked well enough so far.

II.

The first surprise I hit (stupidly late in the process, after I'd already signed my patch and starter) was that the Busybox version of "diff -N" replaces the input or output file path with "/dev/null" for the cases of creation and deletion respectively.

This reflects a larger reservation I have about Busybox code: it's been hacked extensively toward the goal of minimizing executable and memory footprint, which sometimes but only sometimes coincides with clear code and sensible interfaces. In this case, from brief inspection I surmise that it literally uses /dev/null so as to avoid some kind of null check in the downstream code that compares and emits the header. It's clever, but breaks compatibility with the GNU format in unforeseen ways.(i) In fairness to Busybox, the format was poorly specified in the first place - and nobody involved with V apparently found this important enough to remedy either.

III.

Another surprise for me was that the sloppy parsing affects not just diffing but pressing too. At least v.py and v.pl exhibit the same sort of blind regexing in extracting antecedent information from vpatches. (I'd guess that use of somewhat tighter regexes has prevented this from causing trouble in practice yet.) Further, v.pl extracts file paths only from the "---" part of the header which suggests it would indeed be broken by "/dev/null" style patches. On the extended vtools side, vfilter makes yet another assumption not backed by either such documentation as exists for the format or the Busybox version: a line showing a diff pseudo-command at the start of the header.

IV.

Finally, I've noticed what strikes me as a design problem affecting all V implementations, which I haven't seen mentioned before: it's not possible to have the same (path, hash) pair as an output of two different patches in the same VTree. More simply put, you can't have a patch that changes a file back to a previous state, contrary to the suggestion that "adding and removing the null character from the manifest file in every other patch would work" seen in the manifest spec. The reason is that both patches would end up in the antecedent set of a patch referencing either version of the file, in some cases producing a cyclic graph.(ii)

Stay tuned for the aforementioned patch and starter that make progress on a few of these fronts.

  1. A related annoyance I've had is Busybox "diff -qr" doesn't report added or removed directories, while adding -N replaces "Only in ..." messages with the less helpful "Files ... differ". [^]
  2. At this point I must say I wonder why V wasn't made to simply include in the header of each patch the hash of its antecedent patch as a whole. It would have neatly bypassed all these problems, forcing a tree topology and simplifying implementation. Would it have smelled too much like Git, or what? [^]
Older Posts »

Powered by MP-WP. Copyright Jacob Welsh.