Fixpoint

2020-03-31

Adventures in the forest of V

Filed under: Historia, Software, V — Jacob Welsh @ 19:11

It started as what I thought a simple enough job: take the existing SHA512 v.pl I'd been using to press the Bitcoin code, or rather the VTree that grew from it, swap out the hash with my own keksum so as to avoid a hefty and otherwise unnecessary GNAT requirement, add my version of the classic vdiff modified likewise, bundle up a "starter" to cut the bootstrapping knot, and publish the bunch as my own tested and supported offering for wherever a V may be needed.

Such a thing would still require Perl, itself not an insignificant liability. While work had been underway to replace that, the results fell short of completeness, and from the ensuing discussion I decided it would be best to shore up my own grounding in the historical tools before venturing deeper into the frontier. I suppose I should be glad, because I got even more of that grounding - or swamping, more like - than I had asked for.

I.

One pitfall I already knew of was that file header lines in the "unified diff" format used by V, which begin with "---" and "+++", cannot be reliably distinguished from deleted lines whose content begins with "--" and inserted lines whose content begins with "++" (these show up in the diff as "---" and "+++" once the change marker is prepended), if parsing linewise and statelessly as done by the original "one-liner" vdiff. This was discovered in practice through an MP-WP patch containing base64-encoded images, and the potential damage is hardly restricted to that; for instance both the SQL and Ada programming languages use "--" as a comment marker. This was part of the motivation behind vtools, which took the approach of avoiding the system's existing "diff" program in favor of a stripped-down version of the GNU codebase with integrated hashing. My own approach had been more lightweight: tightening up the awk regex to at least reduce the false positive cases. It wasn't too satisfying, but had worked well enough so far.
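To make the ambiguity concrete, here is a minimal Python sketch. The function names and the sample patch are made up for illustration and are not taken from any V implementation. A stateless matcher mistakes a deleted SQL comment for a file header, while a parser that trusts the line counts in the "@@" hunk header does not.

```python
import re

# Sample diff: the hunk deletes the line "-- drop the old table"
# and inserts the line "++ new marker".
PATCH = """\
--- a/schema.sql
+++ b/schema.sql
@@ -1,2 +1,2 @@
 CREATE TABLE t (id INT);
--- drop the old table
+++ new marker
"""

def stateless_headers(text):
    """Linewise matching with no memory of context, in the spirit of the one-liner."""
    return [l for l in text.splitlines() if re.match(r"^(---|\+\+\+) ", l)]

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def stateful_headers(text):
    """Track how many old/new lines the current hunk still owes before looking for headers."""
    headers = []
    old_left = new_left = 0
    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body the first character is only a change marker.
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" counts for neither side
            else:
                old_left -= 1  # context line counts for both sides
                new_left -= 1
            continue
        m = HUNK.match(line)
        if m:
            # An omitted count means 1 in the unified format.
            old_left = int(m.group(2) or 1)
            new_left = int(m.group(4) or 1)
        elif line.startswith(("--- ", "+++ ")):
            headers.append(line)
    return headers
```

On the sample, stateless_headers reports four "headers" where stateful_headers reports only the two real ones. The cost is that the parser must now believe the hunk counts, which is its own kind of trust.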

II.

The first surprise I hit (stupidly late in the process, after I'd already signed my patch and starter) was that the Busybox version of "diff -N" replaces the input or output file path with "/dev/null" for the cases of creation and deletion respectively.

This reflects a larger reservation I have about Busybox code: it's been hacked extensively toward the goal of minimizing executable and memory footprint, which sometimes but only sometimes coincides with clear code and sensible interfaces. In this case, from brief inspection I surmise that it literally uses /dev/null so as to avoid some kind of null check in the downstream code that compares and emits the header. It's clever, but breaks compatibility with the GNU format in unforeseen ways.(i) In fairness to Busybox, the format was poorly specified in the first place - and nobody involved with V apparently found this important enough to remedy either.

III.

Another surprise for me was that the sloppy parsing affects not just diffing but pressing too. At least v.py and v.pl exhibit the same sort of blind regexing in extracting antecedent information from vpatches. (I'd guess that the use of somewhat tighter regexes has kept this from causing trouble in practice so far.) Further, v.pl extracts file paths only from the "---" part of the header, which suggests it would indeed be broken by "/dev/null" style patches. On the extended vtools side, vfilter makes yet another assumption not backed by either such documentation as exists for the format or the Busybox version: a line showing a diff pseudo-command at the start of the header.
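For illustration, a minimal Python sketch of header handling that would tolerate both conventions. It assumes header lines of the form "--- PATH [extra fields]" and "+++ PATH [extra fields]", with one leading directory component to strip and no spaces in paths. The function name and sample values are hypothetical, and this is not how v.pl is written.

```python
DEV_NULL = "/dev/null"

def target_path(minus_line, plus_line):
    """Return the file path a header pair refers to, given its '---' and '+++' lines.

    Assumes each line is 'MARKER PATH [extra fields]'; whatever follows the
    path (a hash, a timestamp) is ignored here.
    """
    old = minus_line.split()[1]
    new = plus_line.split()[1]
    # Busybox-style creation puts /dev/null on the '---' side and deletion
    # puts it on the '+++' side, so take whichever side names a real file.
    path = new if old == DEV_NULL else old
    if path == DEV_NULL:
        raise ValueError("both sides are /dev/null")
    # Strip the leading 'a/' or 'b/' style directory component.
    return path.split("/", 1)[1]
```

With this, target_path("--- /dev/null", "+++ b/src/new.c 0f0f") gives "src/new.c", where a reader of the "---" line alone would come away with "/dev/null".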

IV.

Finally, I've noticed what strikes me as a design problem affecting all V implementations, which I haven't seen mentioned before: it's not possible to have the same (path, hash) pair as an output of two different patches in the same VTree. More simply put, you can't have a patch that changes a file back to a previous state, contrary to the suggestion that "adding and removing the null character from the manifest file in every other patch would work" seen in the manifest spec. The reason is that both patches would end up in the antecedent set of a patch referencing either version of the file, in some cases producing a cyclic graph.(ii)
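The cycle can be reproduced in a toy model. The Python sketch below assumes the antecedent rule is "any patch with an output (path, hash) equal to one of my inputs"; the Patch type, the names, and the hash labels are invented for the example.

```python
from collections import namedtuple

# A patch maps input (path, hash) pairs to output (path, hash) pairs.
Patch = namedtuple("Patch", "name inputs outputs")

def antecedents(patch, patches):
    """Every other patch having some output equal to one of this patch's inputs."""
    return [p for p in patches
            if p is not patch and set(p.outputs) & set(patch.inputs)]

def find_cycle(patches):
    """Depth-first search along antecedent edges; returns a cycle as a list of names, or None."""
    state = {}  # name -> "visiting" or "done"

    def visit(patch, trail):
        if state.get(patch.name) == "done":
            return None
        if state.get(patch.name) == "visiting":
            # Reached a patch already on the current trail: that is a cycle.
            return trail[trail.index(patch.name):] + [patch.name]
        state[patch.name] = "visiting"
        for ante in antecedents(patch, patches):
            found = visit(ante, trail + [patch.name])
            if found:
                return found
        state[patch.name] = "done"
        return None

    for p in patches:
        found = visit(p, [])
        if found:
            return found
    return None

# "revert" restores file f to the state that "genesis" produced.
genesis = Patch("genesis", [], [("f", "h1")])
change = Patch("change", [("f", "h1")], [("f", "h2")])
revert = Patch("revert", [("f", "h2")], [("f", "h1")])
```

Here find_cycle([genesis, change]) returns None, but adding revert makes both genesis and revert antecedents of change, while change is an antecedent of revert, so find_cycle([genesis, change, revert]) returns ["change", "revert", "change"].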

Stay tuned for the aforementioned patch and starter that make progress on a few of these fronts.

  1. A related annoyance I've had is that Busybox "diff -qr" doesn't report added or removed directories, while adding -N replaces "Only in ..." messages with the less helpful "Files ... differ".
  2. At this point I must say I wonder why V wasn't made to simply include in the header of each patch the hash of its antecedent patch as a whole. It would have neatly bypassed all these problems, forcing a tree topology and simplifying implementation. Would it have smelled too much like Git, or what?

2020-03-18

Virus terror shuts down Panama City

Filed under: News — Jacob Welsh @ 00:27

On a normal Tuesday rush hour, this would be a busy sidewalk.

pty-shutdown-1

There wouldn't be space between cars on Via España.

pty-shutdown-2

Three cops rolled out purposefully. Maybe there's a gathering to be suppressed.

pty-shutdown-3

At the supermarket, the normal street-level entrance was closed. The masked - and gloved, for good measure! - guard at the exit was a bit distracted and didn't at first notice me stroll in, but then hollered and pointed me to the parking entrance.

pty-shutdown-4

The bread lines are here - not for lack of bread, as yet, but for limited headcount permitted inside.

pty-shutdown-5

Maybe the checkout lines at least go faster now. I have my doubts: they seem to be always just a bit understaffed by design. I didn't stick around to find out.

I wonder how often they miss someone leaving and don't let the next in.

The chino's shelves are holding up fine, on the off chance they have something you'd want to buy.

pty-shutdown-6

Back at the cell block, celebrations prohibited, and a first hint of concern about the water supply, though that's not uncommon here as the growth of the city outpaced its internal improvements and maintenance for years. At least there's clip art!

pty-shutdown-7

But yes, I'm holding out fine so far, thanks.

2020-03-07

JFW's 130 top Trilema picks to date

Filed under: Bitcoin, Hardware, Historia, Lex, Paidagogia, Philosophia, Politikos, Software, Vita — Jacob Welsh @ 16:25

Inquiring minds have asked of me to please shed a bit more light on what this Republic thing and that Popescu fellow in particular are all about. Is there more to it than the ravings that first meet the eye, of sluts and slaves and scandalous sexual predations and every "ism" and trigger word known to man or woman? What's the value I see in it that keeps me coming back? And what's the plan for this world domination thing anyway?

I gave the most accurate response I could, if not the most helpful: see, all you gotta do is read a couple thousand articles in multiple languages averaging maybe a thousand words each, a couple times over, and likely a bunch of the imported cultural environment and extensive chat logs besides, and then all will become clear! At least as clear as it can be so far. At least I think it will. But what would I know, I'm a long ways from being there.

Well great, so couldn't I at least give an executive summary? Not exactly an easy task either. Short of that, here's an attempt at picking some of the especially interesting, informative or significant articles on Trilema from my reading so far, a map of sorts of enticing entries to the rabbit hole.

The very unfair process that articles went through to make this list was as follows:

  1. I extracted an initial set of 957 items from my presently accessible browsing history, using some CLI magic.(i)
  2. I narrowed the list to those where I believed I recalled something of the article, going off the title alone. This brought it down to 424.
  3. I further selected based on roughly the above "interesting, informative or significant" standard in my subjective perception, again by memory from title alone.(ii) I also ended up skipping some that would have met this by way of having especially horrified me; not sure if I've done anyone any favors thus, but there it is.
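For the curious, the first step can be approximated in a few lines of Python rather than at the shell. This is a sketch only, not the commands I actually ran: it assumes a copy of the Firefox profile's places.sqlite and relies on the url and title columns of its moz_places table; the function name and the URL pattern are made up.

```python
import sqlite3

def history_matching(conn, pattern):
    """Distinct (url, title) rows from moz_places whose URL matches a SQL LIKE pattern."""
    rows = conn.execute(
        "SELECT DISTINCT url, title FROM moz_places WHERE url LIKE ? ORDER BY url",
        (pattern,),
    )
    return rows.fetchall()

# Hypothetical usage, on a copy of the database (Firefox locks the live file):
#   conn = sqlite3.connect("places-copy.sqlite")
#   for url, title in history_matching(conn, "%trilema.com/%"):
#       print(title, url)
```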

The ordering within each publication year is merely alphabetical (because I can't quite see a pressing need to do it better in this context).

Enjoy... if you dare. What can I say, it's not for everyone.

2012

2013

2014

2015

2016

2017

2018

2019

2020

  • The slap and human dignity
  • Fin.

    1. You know Firefox keeps this in a SQL database, yes? Because they told you about it in the manual, and documented the schema and all?
    2. At times I was overpowered by the temptation to go check, with the inevitable expenditure of time on re-reading which, useful as it can be, I hadn't planned on getting drawn into just now. And while my shiny tools got this down to a minimal "this button to keep, that button to skip" flow, they were entirely powerless to speed up the thinking.

2020-03-04

Bitcoin transactions and their signing, 2: attachment

Filed under: Bitcoin, Software — Jacob Welsh @ 20:10

Having outlined the shape of the building block provided by digital signatures, we now face the potential problem of how to attach signatures to the messages they sign. The one hard requirement for any attachment scheme is that the verification function can work, that is, can answer unambiguously whether a signature is valid for a specific message and key. I will explore the space of possible approaches here,(i) then describe the one used in Bitcoin.

The simplest approach is to say: "what problem?" That is, treat the message and signature as separate objects (bitstrings, numbers, files or however you like to think of them) and use some external system to organize them. This is known in the traditional GPG toolset as detached signing. It has its advantages, besides the obvious "less work to implement". The original, unmodified message is directly available to the reader and his tools. New signatures can be added to a collection without duplicating or modifying the message object, and thus without needing further verification that they in fact refer to the same message. These properties are exploited in present manifestations of the V version control concept.

Assuming one does indeed want attached signatures, then, the first option is to package the message and signature together in some container format. Depending on how it's done, this can preserve the advantage that at least a semblance of the original message is readily visible in plain text, as with GPG clearsigning.(ii) New signatures can be added either with support from the container format, producing a single multiply-signed document, or without such support, either by nesting (such that each new signature references the previous stack) or duplication.

A second option, when the message represents a formal data structure, is to embed signatures in that structure itself in an application-specific way. At first sight this appears to be a circular data dependency: how can a signature be computed for a message that includes a representation of that signature?(iii) However, this can be worked around by applying a transformation to clip or whiteout the signature field at both signing and verification time.
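A minimal Python sketch of the whiteout idea follows. Loud assumptions: the message is a flat dictionary serialized as sorted JSON, and HMAC-SHA256 stands in for a real public-key signature scheme purely to keep the example within the standard library. The function names are invented.

```python
import hashlib
import hmac
import json

def whiteout(message):
    """Copy of the message with the embedded signature field blanked."""
    return dict(message, signature="")

def digest(message):
    # Canonical serialization, so signer and verifier hash identical bytes.
    return hashlib.sha256(json.dumps(message, sort_keys=True).encode()).digest()

def sign(message, key):
    """Return a copy of the message with its signature field filled in."""
    # HMAC is a stand-in here; a real scheme would sign with a private key.
    tag = hmac.new(key, digest(whiteout(message)), hashlib.sha256).hexdigest()
    return dict(message, signature=tag)

def verify(message, key):
    """Recompute the tag over the whited-out message and compare with the embedded one."""
    expected = hmac.new(key, digest(whiteout(message)), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message.get("signature", ""))
```

Because both sides apply the same whiteout before hashing, the signature never has to cover itself, and the circularity disappears.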

The third and final option is to generalize the previous into a flexible or perhaps even universal embedding scheme. For example, signatures can be wrapped in whatever comment delimiters are available in a programming language, as seen in Mircea Popescu's recent proposal.(iv)

Bitcoin transactions, we can now say, use option #2: format-specific embedding, though with some added complications as follows.

The signature on each input is wrapped using the "script" encoding, in a field originally named "scriptSig", and its interpretation is determined by a corresponding script in the linked output being spent, originally "scriptPubKey". If we constrain our interest to transactions in the standard pay-to-pubkey-hash form, these considerations reduce to a formality.

The whiteout procedure is basically to replace the scriptSig on each input with an empty script. This implies the signatures are independent of each other. The twist, though, is that for the input for which a signature is being computed, the scriptSig is replaced instead by the corresponding scriptPubKey. I can't see any security advantage in doing this, since the previous output is already referenced by a unique identifier(v) covered by the signature. The result is that a different message must be signed for each input, and transaction verification takes quadratic time with respect to the number of inputs. This makes for a good reminder that the Bitcoin protocol externalizes much of the cost of transacting onto all node operators, and unless a satisfactory solution to that tough problem is deployed, transaction throughput must be kept a scarce resource.
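The quadratic growth can be seen in a toy model. The Python sketch below is not the real Bitcoin serialization or hashing: it keeps only a 36-byte outpoint and a one-byte-length-prefixed script per input, leaves out outputs and every other field, and all names are made up.

```python
import hashlib

def signing_message(inputs, index):
    """Bytes to hash when signing input `index`.

    Every scriptSig is blanked, except that the input being signed carries
    the scriptPubKey of the output it spends. `inputs` is a list of
    (outpoint, script_pubkey) byte-string pairs.
    """
    parts = []
    for i, (outpoint, script_pubkey) in enumerate(inputs):
        script = script_pubkey if i == index else b""
        parts.append(outpoint + bytes([len(script)]) + script)
    return b"".join(parts)

def bytes_hashed(n):
    """Total bytes fed to the hash when signing or verifying all n inputs."""
    # Dummy inputs: 36-byte outpoints and 25-byte scripts (the size of a
    # pay-to-pubkey-hash scriptPubKey).
    inputs = [(i.to_bytes(36, "big"), b"\x00" * 25) for i in range(n)]
    total = 0
    for index in range(n):
        message = signing_message(inputs, index)
        hashlib.sha256(message).digest()
        total += len(message)
    return total
```

In this model each message is 37n + 25 bytes and there are n of them, so bytes_hashed(10) is 3950 while bytes_hashed(100) is 372500: ten times the inputs, nearly a hundred times the hashing.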

To be continued.

  1. I struggled more than usual in writing about these, perhaps indicating I didn't grasp them as well as I'd thought. I don't claim to be equipped to discuss why one choice might be philosophically preferable to others; yet neither can I take a "purely technical" approach since cryptography is necessarily shaped as much from above by its utility to human society as from below by mathematical possibility. Maybe search the logs?
  2. That format however incurs further complexity from tackling the additional perceived problems of linefeed normalization and in-band bracketing for inclusion in a larger text, with the drawback of having to quote instances of the magical bracket sequence in the signed message.
  3. Such a message can be conceived as a fixpoint of the hash-sign-attach pipeline, but finding one in practice would seem to constitute a severe break in the cryptographic primitives.
  4. It's not yet clear to me if or how this can be implemented reliably. For starters, how would you distinguish actual signatures from, say, quoted signatures, without knowing the lexical rules of the target language? How would the "whiteout" work to produce the same hash after addition of new signatures, without knowing same?
  5. Well, not quite unique but at least identifying its contents including the scriptPubKey in question, to the extent you trust SHA256. And if you don't trust that, the signature hash would seem to be the bigger problem.

Powered by MP-WP. Copyright Jacob Welsh.