Fixpoint

2020-01-16

Draft gbw-node frontend, part 2

Filed under: Bitcoin, Software — Jacob Welsh @ 18:52

Continued from schema and part 1. Source: gbw-node.py.

What follows is a kluge and probably the sorest point of the thing in my mind. I will attempt to retrace the decision tree that led to it.

Ultimately what we're after is a way to examine all confirmed transactions so as to filter for relevant ones and record these in the database ("scanning"). Since on disk they're already grouped into blocks of constrained size, it makes sense from an I/O standpoint to load them blockwise. But how? We could read directly from the blk####.dat files, but this would go against the "loose coupling" design. Concretely, this manifests as several messes waiting to happen: do we know for sure that only validated blocks get written to these files? How will we detect and handle blocks that were once valid but abandoned when a better chain was found? Might we read a corrupt block if the node is concurrently writing? Much more sensible would be to work at the RPC layer which abstracts over internal storage details and provides the necessary locking.

Unfortunately there is presently no getblock method in TRB, only dumpblock which strikes me as more of a debugging hack than anything else, returning its result by side effect through the filesystem. I worried this could impose a substantial cost in performance and SSD wear, once multiplied across the many blocks and possible repeated scans, especially if run on a traditional filesystem with more synchronous operations than the likes of ext4. It occurred to me to use a named pipe to allow the transfer to happen in memory without requiring changes to the TRB code. I took this route, after some discussion with management confirmed that minimizing TRB changes was preferable. I hadn't really spelled out the thinking on files versus pipes, or anticipated what now seems a fundamental problem with the pipe approach: if the reading process aborts for any reason, the write operation in bitcoind can't complete; in fact, due to coarse locking, the whole node ends up frozen. (This can be cleaned up manually e.g. by cat ~/.gbw/blockpipe >/dev/null).

Given this decision, the next problem was the ordering of reads and writes, as reading the block data through the pipe must complete before the RPC method will return. Thus some sort of threading is needed: either decomposing the RPC client into separate send and receive parts so as to read the pipe in between, or using a more general facility. Hoping to preserve the abstraction of the RPC code, I went with the latter in the form of Python's thread support. I tend to think this was a mistake; besides all the unseen complexity it invokes under the hood, it turned out the threading library provides no reliable way to cancel a thread, which I seemed to need for the case where the RPC call returns an error without sending data. I worked around by making the reader a long-lived "daemon" thread, which ensures it terminates with the program, and using an Event object to synchronize handoff of the result through a global variable.

getblock_thread = None
getblock_done = Event()
getblock_result = None
def getblock_reader(pipe):
	global getblock_result
	while True:
		fd = os_open(pipe, O_RDONLY)
		getblock_result = read_all(fd)
		getblock_done.set()
		close(fd)

def getblock(height):
	global getblock_thread
	pipe = gbw_home + '/blockpipe'
	if getblock_thread is None:
		require_fifo(pipe)
		getblock_thread = Thread(target=getblock_reader, args=(pipe,))
		getblock_thread.daemon = True
		getblock_thread.start()
	if not rpc('dumpblock', height, pipe):
		raise ValueError('dumpblock returned false')
	getblock_done.wait()
	getblock_done.clear()
	return getblock_result

The astute reader may notice a final and rather severe "rake" waiting to be stepped on: a fixed filesystem path is used for the named pipe, so concurrent scan processes will interfere with the result of corrupted blocks and all sorts of possible failures following. While I'm not sure there's any reason to run such concurrent scans, at least this possibility should be excluded. I'm thinking the process ID would make a sufficiently unique identifier to add to the filename, but then the pipe will need to be removed afterward, and what if the process dies first? Yet more code to clean up possibly-defunct pipes? Or one could keep the fixed name and use filesystem locking. At any rate I hope this all makes it clear why I'm not too enamored of dumpblock.

To be continued.

2020-01-07

Errata for gbw-node drafts to date, and Bitcoin txid collisions

Filed under: Bitcoin, Software — Jacob Welsh @ 18:11

I've discovered a few mistakes in my wallet code published so far, one from pure carelessness and two from insufficient cognizance of environmental hazards, which together seem interesting enough for a dedicated article.

The first is in the HTTP Basic authentication header in the JSON-RPC client: I slipped in an erroneous space character after the colon in the base64 input, in the course of a hasty and perhaps excessive last-minute style cleanup. I discovered this through basic testing as it broke authentication altogether. I will update the article but preserve the original file for whoever cares to diff.

The second, found in code not yet written up, is the obnoxious but standardized behavior of the SQL SUM function, whereby the sum of the empty set is NULL rather than zero. In Python this becomes the None object then my code passes it to functions that expect integers. I first tried to work around at the Python level, but soon found this to be awkward especially for queries returning multiple columns where some were perfectly well-behaved integers. A fix closer to the source of the problem is found in the standard SQL coalesce function, though having to use this as boilerplate around every use of SUM is not exactly satisfying.

The above fixes are in: draft2/gbw-node.py.

The third and deepest might not even be a problem in practice, but seems to warrant further investigation. From the schema:

CREATE UNIQUE INDEX i_tx_hash ON tx(hash);

The problem is that Bitcoin doesn't guarantee uniqueness of transaction contents - miners can use identical coinbase inputs - despite the fact that the implementation assumes unique transaction hashes! The possibility of collision was realized in 2010,(i) condemning all future implementers to bugwise compatibility, whatever that means. The Power Rangers in their boundless wisdom addressed this in BIP 30 except without solving all that much. Further discussions I've been chewing on include Mirco… Mezzo… Macroflation—Overheated Economy (archived) and of course the forum log. Takeaways so far are that 1) the sky is probably (still) not falling regarding Bitcoin itself, but 2) this could possibly be used by malicious peers to hard-wedge syncing TRB nodes.

The conservative approach for my program would seem to be leaving the schema as is and letting SQL throw an integrity error if you try to monitor the addresses in question. Relaxing the unique constraint should be possible with some further changes to the code, but the question would arise of how exactly the quasi-duplicate outputs should be interpreted. Please speak up in the comments if you know of further references or conclusions on this topic!

  1. Block pair 91722/91880 paying address 1GktTvnY8KGfAS72DhzGYJRyaQNvYrK9Fg and 91812/91842 paying address 16va6NxJrMGe5d2LP6wUzuVnzBBoKQZKom. [^]

2019-12-18

Draft gbw-node frontend, part 1

Filed under: Bitcoin, Software — Jacob Welsh @ 01:18

The database schema for the "node" part of Gales Bitcoin Wallet covered, we now proceed to the frontend program that puts it to work: collecting data from bitcoind, parsing its various binary encodings and extracting something useful.

Source: draft1/gbw-node.py draft2/gbw-node.py (and the previous schema-node.sql).

You'd be well advised to read the downloaded thing prior to executing, especially since it's in an unsigned draft state. As for what's necessary to graduate to a vpatch I'd be ready to sign, my thinking is that it's this review and annotation process itself, plus whatever important changes come out of it, and the previously suggested schema tweaks (since changing that is the most obnoxious part once deployed).

At present there's not much of an installation process and the database is initialized manually. I'd suggest creating some directory to hold the two sources. Then from that directory:

$ chmod +x gbw-node.py
$ mkdir ~/.gbw
$ sqlite3 ~/.gbw/db < schema-node.sql
$ ./gbw-node.py help

In preparing this code for publication I observed that I had continued by force of habit (and editor settings) with the Python style guidelines of four-space indents and some fixed line width limit, in opposition to Republican doctrine. I've attempted to clean it up such that line breaks occur only for good reasons, though I can't say I'm happy with how my browser wraps the long lines. And it's not like I expect the poor thing to know good indentation rules for every possible programming language now... wut do?!

Prologue

We start with the usual Pythonistic pile of imports. The ready libraries are a big reason the language is hard to beat for getting things working quickly, and at the same time a dangerous temptation toward thinking you don't need to care what's inside them.

#!/usr/bin/python2
# J. Welsh, December 2019

from os import getenv, open as os_open, O_RDONLY, O_WRONLY, mkdir, mkfifo, read, write, close, stat
from stat import S_ISDIR, S_ISFIFO
from sys import argv, stdin, stdout, stderr, exit
from socket import socket
from threading import Thread, Event
from binascii import a2b_hex, b2a_hex
from base64 import b64encode
from struct import Struct
from hashlib import sha256 as _sha256
from decimal import Decimal
from inspect import getdoc
import errno
import signal
import string
import json
import sqlite3
from sqlite3 import IntegrityError

The above are all in the standard library, assuming they're enabled on your system. The ones that stick out like sore thumbs to me are threading and decimal; more on these to come.

As the comments say:

# Safety level: scanning stops this many blocks behind tip
CONFIRMS = 6

# There's no provision for handling forks/reorgs. In the event of one deeper than CONFIRMS, a heavy workaround would be:
#   $ sqlite3 ~/.gbw/db
#   sqlite> DELETE FROM output;
#   sqlite> DELETE FROM input;
#   sqlite> DELETE FROM tx;
#   sqlite> .exit
#   $ gbw-node reset
#   $ gbw-node scan

At least a semi-automated and lighter-touch recovery procedure would certainly be nice there.

gbw_home = getenv('HOME') + '/.gbw'
bitcoin_conf_path = getenv('HOME') + '/.bitcoin/bitcoin.conf'

# Further knobs in main() for database tuning.
db = None

This Is The Database; Use It.

For reasons I don't quite recall (probably interpreting hashes as integers, combined with pointer type punning - an unportable C programming practice common in Windows-land), bitcoind ended up reversing byte order compared to the internal representation for hex display of certain things including transaction and block hashes. Thus we have "bytes to little-endian hex" wrappers.

b2lx = lambda b: b2a_hex(b[::-1])
lx2b = lambda x: a2b_hex(x)[::-1]

Not taking any chances with display of monetary amounts, a function to convert integer Satoshi values to fixed-point decimal BTC notation. The remainder/modulus operators have varying definitions between programming languages (sometimes even between implementations of the same language!) when it comes to negative inputs, so we bypass the question.

def format_coin(v):
	neg = False
	if v < 0:
		v = -v
		neg = True
	s = '%d.%08d' % divmod(v, 100000000)
	if neg:
		return '-' + s
	return s

Preloading and giving more intelligible names to some "struct" based byte-packing routines.

u16 = Struct('<H')
u32 = Struct('<I')
u64 = Struct('<Q')
s64 = Struct('<q')
unpack_u16 = u16.unpack
unpack_u32 = u32.unpack
unpack_u64 = u64.unpack
unpack_s64 = s64.unpack
unpack_header = Struct('<i32s32sIII').unpack
unpack_outpoint = Struct('<32sI').unpack

Some shorthand for hash functions.

def sha256(v):
	return _sha256(v).digest()

def sha256d(v):
	return _sha256(_sha256(v).digest()).digest()

An exception type to indicate certain "should not happen" database inconsistencies.

class Conflict(ValueError):
	pass

For reading a complete stream from a low-level file descriptor; experience has led me to be suspicious of Python's file objects.

def read_all(fd):
	parts = []
	while True:
		part = read(fd, 65536)
		if len(part) == 0:
			break
		parts.append(part)
	return ''.join(parts)

Ensuring needed filesystem objects exist.

def require_dir(path):
	try:
		mkdir(path)
	except OSError, e:
		if e.errno != errno.EEXIST:
			raise
		if not S_ISDIR(stat(path).st_mode):
			die('not a directory: %r' % path)

def require_fifo(path):
	try:
		mkfifo(path)
	except OSError, e:
		if e.errno != errno.EEXIST:
			raise
		if not S_ISFIFO(stat(path).st_mode):
			die('not a fifo: %r' % path)

RPC client

Bitcoind uses a password-authenticated JSON-RPC protocol. I expect this is one of the more concise client implementations around.

class JSONRPCError(Exception):
	"Error returned in JSON-RPC response"

	def __init__(self, error):
		super(JSONRPCError, self).__init__(error['code'], error['message'])

	def __str__(self):
		return 'code: {}, message: {}'.format(*self.args)

Some of this code was cribbed from earlier experiments on my shelf. The fancy exception class above doesn't really look like my style; it may have hitchhiked from an outside JSON-RPC library.

The local bitcoin.conf is parsed to get the node's credentials. This is done lazily to avoid unnecessary error conditions for the many commands that won't be needing it.

bitcoin_conf = None
def require_conf():
	global bitcoin_conf
	if bitcoin_conf is None:
		bitcoin_conf = {}
		with open(bitcoin_conf_path) as f:
			for line in f:
				line = line.split('#', 1)[0].rstrip()
				if not line:
					continue
				k, v = line.split('=', 1)
				bitcoin_conf[k.strip()] = v.lstrip()

Side note: I detest that "global" keyword hack. It's "necessary" only because variable definition is conflated with mutation in the single "=" operator, and completely misses the case of a nested function setting a variable in an outer but not global scope. ("So they added 'nonlocal' in Python 3, solves your problem!!")

def rpc(method, *args):
	require_conf()
	host = bitcoin_conf.get('rpcconnect', '127.0.0.1')
	port = int(bitcoin_conf.get('rpcport', 8332))
	auth = 'Basic ' + b64encode('%s:%s' % (
		bitcoin_conf.get('rpcuser', ''),
		bitcoin_conf.get('rpcpassword', '')))
	payload = json.dumps({'method': method, 'params': args})
	headers = [
		('Host', host),
		('Content-Type', 'application/json'),
		('Content-Length', len(payload)),
		('Connection', 'close'),
		('Authorization', auth),
	]
	msg = 'POST / HTTP/1.1\r\n%s\r\n\r\n%s' % ('\r\n'.join('%s: %s' % kv for kv in headers), payload)
	sock = socket()
	sock.connect((host, port))
	sock.sendall(msg)
	response = read_all(sock.fileno())
	sock.close()
	headers, payload = response.split('\r\n\r\n', 1)
	r = json.loads(payload, parse_float=Decimal)
	if r['error'] is not None:
		raise JSONRPCError(r['error'])
	return r['result']

I could see removing the "parse_float=Decimal", and thus the corresponding import, as we won't be calling here any of the problematic interfaces that report monetary values as JSON numbers. But then, I'd also see value in one RPC client implementation that can just be copied for whatever use without hidden hazards.

Bitcoin data parsing

Now things might get interesting. To parse the serialized data structures in a manner similar to the C++ reference implementation and hopefully efficient besides, I used memory views, basically bounds-checking pointers.(i)

# "load" functions take a memoryview and return the object and number of bytes consumed.

def load_compactsize(v):
	# serialize.h WriteCompactSize
	size = ord(v[0])
	if size < 253:
		return size, 1
	elif size == 253:
		return unpack_u16(v[1:3])[0], 3
	elif size == 254:
		return unpack_u32(v[1:5])[0], 5
	else:
		return unpack_u64(v[1:9])[0], 9

def load_string(v):
	# serialize.h Serialize, std::basic_string and CScript overloads
	n, i = load_compactsize(v)
	return v[i:i+n].tobytes(), i+n

def vector_loader(load_element):
	# serialize.h Serialize_impl
	def load_vector(v):
		n, i = load_compactsize(v)
		r = [None]*n
		for elem in xrange(n):
			r[elem], delta = load_element(v[i:])
			i += delta
		return r, i
	return load_vector

def load_txin(v):
	# main.h CTxIn
	i = 36
	txid, pos = unpack_outpoint(v[:i])
	scriptsig, delta = load_string(v[i:])
	i += delta
	i += 4 # skipping sequence
	return (txid, pos, scriptsig), i

load_txins = vector_loader(load_txin)

def load_txout(v):
	# main.h CTxOut
	i = 8
	value, = unpack_s64(v[:i])
	scriptpubkey, delta = load_string(v[i:])
	return (value, scriptpubkey), i+delta

load_txouts = vector_loader(load_txout)

def load_transaction(v):
	# main.h CTransaction
	i = 4 # skipping version
	txins, delta = load_txins(v[i:])
	i += delta
	txouts, delta = load_txouts(v[i:])
	i += delta
	i += 4 # skipping locktime
	hash = sha256d(v[:i])
	return (hash, i, txins, txouts), i

load_transactions = vector_loader(load_transaction)

def load_block(v):
	# main.h CBlock
	i = 80
	head = v[:i]
	version, prev, root, time, target, nonce = unpack_header(head)
	hash = sha256d(head)
	txs, delta = load_transactions(v[i:])
	return (hash, prev, time, target, txs), i+delta

The code dig to come up with this magic for identifying standard pay-to-pubkey-hash outputs and extracting the enclosed addresses was ugly.

def out_script_address(s):
	# Standard P2PKH script: OP_DUP OP_HASH160 20 ... OP_EQUALVERIFY OP_CHECKSIG
	if len(s) == 25 and s[:3] == '\x76\xA9\x14' and s[23:] == '\x88\xAC':
		return s[3:23]
	return None

To be continued.(ii)

Updated for errata.

  1. I'm just now noticing these were added in 2.7, ugh... sorry, 2.6 users. [^]
  2. My blog will be going on hiatus as far as new articles until early January. There's quite a ways to go on this file and I might not make it all the way through on this pass. If the suspense gnaws, you can always keep reading the source! [^]

2019-12-16

Draft gbw-node schema

Filed under: Bitcoin, Software — Jacob Welsh @ 22:09

This series will present my initial implementation of a Bitcoin wallet built up from loosely-coupled parts ("simple droplets"). This follows from necessity as much as design philosophy, in that: 1) the only practical way to secure keys against remote compromise is with dedicated offline hardware; 2) additional online code is needed to support this; and 3) the existing bitcoind is a costly and treacherous thing to hack on. Further background on this work is at:

Today we will look at the data model for the awk-like part, which has shaped up as a line-mode command suite run on demand to collect validated block data, index and make accessible the parts needed to track funds and issue transactions.

Source: schema-node.sql.

Compared to a key-value store like BerkeleyDB as used by bitcoind, using SQL allows formalizing the structure and relationships in the data, working at a high level and enforcing integrity along several dimensions. Besides simplifying implementation, this means the human operator can bypass any limitations of the frontend software and work directly with the database if need be.

Because SQL in practice isn't quite as standard as one might like, one needs to pick either a specific implementation or some abstraction layer to target. While I've worked with MySQL, PostgreSQL and SQLite, I'm most familiar with the last, and figure it has some nice properties for this application: simple to install, widely available, without daemon processes, network listeners, user accounts and passwords to configure. Don't be fooled by the "lite": it's still a large item with fairly comprehensive and optimized SQL engine, claiming robust concurrency and durability properties (ACID transactions) and large, well-defined and tested limits. It includes a command-line shell sqlite3 for manual operation, dump and restore. You are of course welcome to try other implementations and report findings; I haven't used inherently SQLite-specific features like dynamic typing.

A "watched address" model is used such that only transaction outputs and inputs affecting Bitcoin addresses of interest are recorded. The benefit is much lower storage requirement than indexing the full blockchain; the cost is having to rescan after importing addresses with existing history.(i) Addresses are grouped into sets (tags) such that one node can serve many wallets or include decoy sets.

Prologue

--- Gales Bitcoin Wallet: node (online component) schema
--- J. Welsh, December 2019
--- Dialect: SQLite (3.7.0 for WAL)

SQLite originally used a writeback rollback journaling approach, which is terribly slow for write-heavy uses, at least on mechanical HDDs lacking battery-backed cache, due to requiring at least two fsync operations per transaction. It provides no way to relax durability without sacrificing integrity guarantees, which would precisely suit the needs of the scan process. We'll instead use the newer Write-Ahead Logging mode; once activated, it's remembered in the database file.

PRAGMA journal_mode=WAL;
BEGIN;

Transactions

The tx table tracks confirmed Bitcoin transaction metadata. All "_id" fields are database-assigned integers for efficiency of storage and lookup; hashes are stored as raw bytes.

CREATE TABLE tx (
	tx_id        INTEGER PRIMARY KEY,
	hash         BLOB    NOT NULL,
	block_hash   BLOB    NOT NULL,
	block_height INTEGER NOT NULL,
	n            INTEGER NOT NULL, -- position in block
	comment      TEXT,
	size         INTEGER NOT NULL,
	fee          INTEGER
);
CREATE UNIQUE INDEX i_tx_hash ON tx(hash);
CREATE UNIQUE INDEX i_tx_height_n ON tx(block_height, n);

The purist might note that this table isn't entirely normalized since many-to-one block information is stored in two fields; but then, there can also be upsides to avoiding excessive joins. It might be nice to track more block metadata anyway since it's not that big and bitcoind does it quite opaquely.

Comment and fee fields are unused at present: computing fees, while desirable, would seem to require a full transaction index; comments don't seem to fit with the concept of minimizing private information stored by the node.

Transactions can be found either by their 256-bit hash or the more concise block coordinates.

Outputs and inputs

The transaction output is the main structure of interest, holding a fixed parcel of spendable or spent coin.

The REFERENCES clause, besides making relationships explicit, creates a foreign key constraint. Unfortunately SQLite doesn't enforce these by default for compatibility reasons; it must be enabled by the application on each connection. To ensure the sqlite3 shell gets it too, I include in my ~/.sqliterc a PRAGMA foreign_keys=ON;, as well as a .headers on and .mode quote (one per line) for better handling the blobs.

CREATE TABLE output (
	output_id  INTEGER PRIMARY KEY,
	tx_id      INTEGER NOT NULL REFERENCES tx,
	n          INTEGER NOT NULL, -- position in output vector
	address_id INTEGER NOT NULL REFERENCES address,
	value      INTEGER NOT NULL,

SQLite integers take up to 8 bytes as needed, thus can safely represent Satoshi values up to the maximum BTC that will ever exist (1e8 x 21e6 = 2.1e15; 263 > 1018).

	spent      INTEGER REFERENCES input(input_id),

An initial mistake was keeping the spent reference on the input side, following the raw transaction structure. The trouble with this is there's no efficient way to find unspent outputs: "the rows in this table that aren't referenced from another table" are necessarily not contained in that table's index! So instead we keep the link on this side, allowing NULL, and update the row when an input is seen that spends it.

	flags      TEXT
);
CREATE UNIQUE INDEX i_output_txid_n ON output(tx_id, n);
CREATE INDEX        i_output_addrid ON output(address_id);
CREATE UNIQUE INDEX i_output_spent  ON output(spent);

The flags field is presently unused: the idea was to track local state bits like seen or acknowledged.

CREATE TABLE input (
	input_id   INTEGER PRIMARY KEY,
	tx_id      INTEGER NOT NULL REFERENCES tx,
	n          INTEGER NOT NULL -- position in input vector
);
CREATE UNIQUE INDEX i_input_txid_n ON input(tx_id, n);

Having a separate table for inputs is looking pointless now that it's a mere one-to-one reference from output.spent... ah, the joys of schema refactoring.

Addresses and tags

CREATE TABLE address (
	address_id INTEGER PRIMARY KEY,
	address    BLOB NOT NULL
);
CREATE UNIQUE INDEX i_address_address ON address(address);

Like transaction and block hashes, addresses are represented in raw byte form as opposed to Base58Check-encoded. This makes scanning faster but importing and formatting for display slower. "Version" is assumed to be zero, meaning P2SH / "3-addresses" aren't a thing.

CREATE TABLE tag (
	tag_id INTEGER PRIMARY KEY,
	name   TEXT NOT NULL
);
CREATE UNIQUE INDEX i_tag_name ON tag(name);

Unlike the "accounts" seen in bitcoind, addresses can have multiple tags.

CREATE TABLE address_tag (
	address_id INTEGER NOT NULL REFERENCES address,
	tag_id     INTEGER NOT NULL REFERENCES tag,
	PRIMARY KEY (address_id, tag_id)
);
CREATE INDEX i_addrtag_tag ON address_tag(tag_id);

Like many databases, SQLite uses B-tree storage. One implication is that in the case of composite keys (those spanning multiple fields as seen here), a dedicated index is not necessary for the first field as B-trees provide efficient prefix search. Subsequent fields however may benefit from them.

State

Finally, a "singleton" table to track scan state. A more extensible approach to global variables would be textual name and value fields, but I dunno, I like the explicit-ness of mapping variables to column rather than row.

CREATE TABLE state (
	scan_height INTEGER NOT NULL DEFAULT(-1)
);
INSERT INTO state DEFAULT VALUES;

COMMIT;

Stay tuned for the code that puts it all to work. To run it you'll be needing Python 2 (2.7 tested) built with the sqlite3 module, the standalone sqlite3 shell (as loading the schema is manual for now), and of course, a Bitcoin node.

  1. A test scan of 607`849 blocks on 1`102 addresses finding 8`929`351 transactions took 13 hours producing a 1.8GB database file. [^]

2019-12-15

Review of polarbeard_add_sendrawtransaction_rpc.vpatch

Filed under: Bitcoin, Software — Jacob Welsh @ 20:14

This patch to The Real Bitcoin was published somewhere around 2016 by someone who as I gather showed up to dump some fairly verbose and undocumented patches then didn't stick around to advocate for them or keep them updated for the meanwhile changing tree.

One of them fills a still-unmet requirement of any split wallet so I'm reviewing it in hopes of saving some time on having to redo from scratch. Full patch is here.

+Value sendrawtransaction(const Array& params, bool fHelp)
+{
+    if (fHelp || params.size() < 1 || params.size() > 1)
+        throw runtime_error(
+            "sendrawtransaction <hex string>\n"
+            "Submits raw transaction (serialized, hex-encoded) to local node and network.");
+
+    vector<unsigned char> txData(ParseHex(params[0].get_str()));
+    CDataStream ssData(txData, SER_NETWORK, VERSION);

Above we see the three-argument form of the CDataStream constructor(i) with explicit arguments matching the defaults for the nTypeIn and nVersionIn parameters.

Some documentation of what if anything these parameters are for would be nice (though not a deficiency of this patch). For now it seems to me a reasonable demand that the defaults never change, or at any rate that the costs of such a change be borne by he who makes it. Thus their inclusion here would be redundant, and not really the helpful kind of redundancy at that i.e. the kind that catches mistakes. I had omitted them in my getrawtransaction patch.

+    CTransaction tx;
+
+    try {
+        ssData >> tx;
+    }
+    catch (std::exception &e) {
+        throw JSONRPCError(-22, "tx decode failed");
+    }
+

The -22 isn't seen elsewhere in the tree, but matches the number in the PRB implementation and named RPC_DESERIALIZATION_ERROR.

The condition of excess data in the stream after the transaction does not appear to be detected, which I figure would be nice to have.

+    CTxDB txdb("r");
+    uint256 txHash = tx.GetHash();
+    uint256 hashBlock = 0;
+
+    if (TransactionSeen(txHash, hashBlock))
+    {
+        throw JSONRPCError(-5, string("tx already ") +
+            (hashBlock != 0 ?
+                string("included in block ") + hashBlock.ToString() :
+                "in memory pool"));
+    }
+

Here it gets iffy in my opinion. Just because my node sees the transaction, whether in block or mempool, it doesn't follow that other nodes do too. What if I want to re-broadcast?

Secondarily, the use of zero as a sentinel seems mildly unhygienic as it's a legal hash value, though I suppose if a preimage were found then the whole system would be screwed anyway. Along with the use of argument mutation, I see it more as an indication that too much implicit behavior is being packed into this new TransactionSeen interface, such that its (ONE!) caller needs a hack to do what it actually wants.

+    bool fMissingInputs = false;
+
+    if (!tx.AcceptToMemoryPool(txdb, true, &fMissingInputs) || fMissingInputs)
+    {
+        throw JSONRPCError(-22, string("tx rejected") +
+            (fMissingInputs ? ", missing inputs" : ""));
+    }
+

Perhaps a nitpick but why bother having error codes if you don't use them to distinguish things? Is the human text part of API now?

Looking into AcceptToMemoryPool, it doesn't actually do anything with that fMissingInputs besides initializing to false!

By the earlier argument it's also questionable whether that second fCheckInputs parameter should be true. It activates what looks like assorted anti-spam code of dubious value in this context.

+    SyncWithWallets(tx, NULL, true);
+    CInv inv(MSG_TX, txHash);
+    RelayMessage(inv, tx);
+
+    return txHash.GetHex();
+}

Oh yes, there's code to support handling wallets plural, not that anything is done with it.

The CInv boilerplate is demanded by the poor abstraction of RelayMessage. That function includes some 15-minute expiration mechanism whose purpose and workings aren't readily apparent to me.

Generally, it seems awkward that all these steps are needed here; shouldn't there be a function somewhere that handles transactions received from the network, that we'd just reuse? I'm almost afraid to look.

+// return wether a given transaction is found on mempool or already included in a block
+bool TransactionSeen(const uint256 &hash, uint256 &hashBlock)
+{
+    CRITICAL_BLOCK(cs_mapTransactions)
+        if (mapTransactions.count(hash))
+            return true;
+
+    CTransaction tx;
+    CTxDB txdb("r");
+    CTxIndex txindex;
+
+    if (tx.ReadFromDisk(txdb, COutPoint(hash, 0), txindex))
+    {
+        CBlock block;
+        if (block.ReadFromDisk(txindex.pos.nFile, txindex.pos.nBlockPos, false))
+            hashBlock = block.GetHash();
+        return true;
+    }
+
+    return false;
+}

All the objects getting passed around and mutated here are painful to follow. This appears to do what it says, though I'd much rather be rid of it.

Painfully apparent in the debts of the codebase is documentation of locking protocol. Getting this sort of thing wrong results in non-deterministic failures, either data corruption or deadlock depending on the direction of the mistake.

 // make sure all wallets know about the given transaction, in the given block
-void static SyncWithWallets(const CTransaction& tx, const CBlock* pblock = NULL, bool fUpdate = false)
+void SyncWithWallets(const CTransaction& tx, const CBlock* pblock, bool fUpdate)
 {
     BOOST_FOREACH(CWallet* pwallet, setpwalletRegistered)
         pwallet->AddToWalletIfInvolvingMe(tx, pblock, fUpdate);

Parameter defaults move to the declaration; I gather C++ puts the burden of setting these on the generated calling code, which seems sensible for a statically-compiled language.


 void RegisterWallet(CWallet* pwalletIn);
 void UnregisterWallet(CWallet* pwalletIn);
+void SyncWithWallets(const CTransaction& tx, const CBlock* pblock = NULL, bool fUpdate = false);
+bool TransactionSeen(const uint256 &hash, uint256 &hashBlock);
 bool ProcessBlock(CNode* pfrom, CBlock* pblock);
 bool CheckDiskSpace(uint64 nAdditionalBytes=0);
 FILE* OpenBlockFile(unsigned int nFile, unsigned int nBlockPos, const char* pszMode="rb");

Perhaps polarbeard had other patches that make use of this TransactionSeen. I'd rather this sort of patch not introduce globally-visible interfaces unnecessarily.

TransactionSeen is introduced as a globally-visible interface so that the underlying mapTransactions object stays static, in contrast with my more liberal approach.

In conclusion, the patch appears safe to apply and should do for now for the sake of testing my offline wallet, which might help inform any changes needed here. After some testing I might even end up signing it, though I'd much prefer something simpler and more clearly defined.

  1. And here I long for a self-hosted and accurately cross-referenced i.e. language- and context-aware source browser, but thanks to jurov for this approximate one. [^]

2019-12-05

Basic getrawtransaction patch proposal for bitcoind

Filed under: Bitcoin, Software — Jacob Welsh @ 17:35

I present a bitcoind patch, of my own authorship, for consideration: a basic but functional getrawtransaction RPC command. I expect the need for it is clear: if it's your node then you shouldn't accept any sort of "I'm sorry Dave, I'm afraid I can't do that" especially regarding the data it already has handy.

To speak plainly about some deficits, this time not my own: the Real Bitcoin project presently rests on some shaky foundations. Despite apparently intricate efforts, a Keccak based vtree has seen little progress, in that the original patch signers besides mod6 have not signed on to the regrinds of their work, and some recommended patches have not been reground at all, updated for the very necessary manifest spec, or included in the mainline collection. Further, the sorry state of the inherited code such as magic numbers everywhere has seen little improvement. Perhaps I will have to take up some of these burdens in time; for now I'll leave it as an entreaty to the elder hands to please find a way to make it happen!

The patch

As in the original introduced somewhere around PRB 0.7.0, this command takes a transaction ID (256 bits of hex), searches first the node's own memory pool then the confirmed transaction database (blkindex.dat) and returns a hex dump of the encoded transaction. Unlike the original it does not support a "verbose" option to give a JSON representation of the data. This task seems to me better suited to an external tool, but I could see including it here if the implementation is concise and obviously correct.

Backporting the original was not possible due to the many intervening changes, though I did consult it to confirm I hadn't missed anything important and matched its numeric error code.

Based on the overall idea in the V version control system of building yggdrasil, I'm breaking from one of the project's prior naming conventions by including "bitcoin" but not the author in the patch name; the author is still recorded in the enclosed manifest file. Due to the problems noted earlier with the prior patch tree it's also not a proper vpatch yet.

Download: bitcoin_getrawtransaction.draft.patch.

To try this you will need to:

  1. Perform a press to asciilifeform_whogaveblox.vpatch;
  2. Manually apply mod6_phexdigit_fix.vpatch (which could be missed otherwise due to lacking a manifest entry);
  3. Manually apply the patch in question.

In detail

I added a manifest entry for the phexdigit fix, to make its inclusion explicit:

--- a/bitcoin/manifest.txt
+++ b/bitcoin/manifest.txt
@@ -28,3 +28,5 @@
 542413 asciilifeform_aggressive_pushgetblocks asciilifeform Issue PushGetBlocks command to any peer that issues 'version' command
 542413 mod6_excise_hash_truncation mod6 Regrind of ben_vulpes original; removes truncation of hashes printed to TRB log file
 543661 asciilifeform_whogaveblox asciilifeform Record the origin of every incoming candidate block (whether accepted or rejected)
+606789 mod6_phexdigit_fix mod6 Fix decoding LUT which wrongly accepted certain invalid characters as hex
+606789 bitcoin_getrawtransaction jfw Add RPC to get transactions from memory pool or database (hex only)

The RPC itself is about as simple as it gets in this codebase. First we try the mempool, as this should be fast and may contain unconfirmed transactions.(i)

--- a/bitcoin/src/bitcoinrpc.cpp
+++ b/bitcoin/src/bitcoinrpc.cpp
@@ -1351,7 +1351,31 @@
     return entry;
 }

+Value getrawtransaction(const Array& params, bool fHelp)
+{
+    if (fHelp || params.size() != 1)
+        throw runtime_error(
+            "getrawtransaction <txid>\n"
+            "Get hex serialization of <txid> from memory pool or database.");

+    uint256 hash;
+    map<uint256, CTransaction>::iterator it;
+    CTransaction tx;
+    CDataStream ssTx;
+
+    hash.SetHex(params[0].get_str());
+    it = mapTransactions.find(hash);
+    if (it != mapTransactions.end())
+        tx = it->second;

Functions everywhere have to open their own database connections - though some have the luxury of getting one passed in - which then implies a whole caching mechanism so as not to be horribly inefficient. Odin knows why there couldn't just be one global (or at least per-thread) "This Is The Database; Use It" object.

+    else {
+        CTxDB txdb("r");
+        if (!txdb.ReadDiskTx(hash, tx))
+            throw JSONRPCError(-5, "Transaction not found in memory pool or database.");
+    }
+    ssTx << tx;
+    return HexStr(ssTx.begin(), ssTx.end());
+}
+
 Value backupwallet(const Array& params, bool fHelp)
 {
     if (fHelp || params.size() != 1)

Wiring the function into the RPC dispatch table (I don't recall how I chose where to insert it, as the list was already non-alphabetical; probably based on where it seemed sensible in the help listing):

@@ -1865,6 +1889,7 @@
     make_pair("getreceivedbyaccount",   &getreceivedbyaccount),
     make_pair("listreceivedbyaddress",  &listreceivedbyaddress),
     make_pair("listreceivedbyaccount",  &listreceivedbyaccount),
+    make_pair("getrawtransaction",      &getrawtransaction),
     make_pair("backupwallet",           &backupwallet),
     make_pair("keypoolrefill",          &keypoolrefill),
     make_pair("walletpassphrase",       &walletpassphrase),

The mempool object now needs to be visible between compilation units. I suggest doing a grep to verify this introduces no name conflicts.

--- a/bitcoin/src/main.cpp
+++ b/bitcoin/src/main.cpp
@@ -26,7 +26,7 @@

 CCriticalSection cs_main;

-static map<uint256, CTransaction> mapTransactions;
+map<uint256, CTransaction> mapTransactions;
 CCriticalSection cs_mapTransactions;
 unsigned int nTransactionsUpdated = 0;
 map<COutPoint, CInPoint> mapNextTx;
--- a/bitcoin/src/main.h
+++ b/bitcoin/src/main.h
@@ -46,6 +46,7 @@

 extern CCriticalSection cs_main;
+extern std::map<uint256, CTransaction> mapTransactions;
 extern std::map<uint256, CBlockIndex*> mapBlockIndex;
 extern uint256 hashGenesisBlock;
 extern CBlockIndex* pindexGenesisBlock;

I tested that it builds, successfully fetches transactions from both mempool and database, and returns the expected errors for missing argument or transaction not found. It does accept invalid hex strings, perhaps a flaw in that SetHex method. I've been running the patch in production since around August 10th of this year.

  1. The original cause for my writing the patch was a stuck transaction that wasn't getting relayed to miners or block explorers for unknown reasons. Upon fishing the raw tx from the mempool and submitting it to one such site, a useful error was finally obtained identifying the problem as the S-value normalization mess; the -lows option provided a workaround, after double-spending to self for safety, which was a whole other pain. [^]

2019-12-03

Keccak background

Filed under: Bitcoin, Historia, Software — Jacob Welsh @ 18:52

"Keccak" is a cryptographic hash function, or rather, some primitives for constructing such functions in a desired size and shape, of relatively recent design as these things go. It was brought to the attention of the forum in early 2016 in the context of contemplating changes to the Bitcoin protocol,(i) (ii) (iii) and subsequently differentiated from SHA3.(iv)

Compared to the prevailing standards at the time - mostly variants on the MD4 concept, processing blocks of input through an iterated compression function - Keccak is based on a large pseudorandom permutation (1600 bits, though the spec also defines smaller variants). As this is readily invertible, the desired "one-way" property is provided by a "sponge construction", mixing in blocks of input and extracting output while iterating the permutation and keeping some number of its bits secret as internal state. This number is called the capacity (or by complement the rate, the two summing to the permutation bit width) and can be tuned for the desired balance of security and computational intensity. The construction can take unlimited input, or produce unlimited output as a kind of stream cipher.(v)

I started out in 2017 playing with a C implementation found in the wild, supposedly a "readable and compact" version written by the original team. With some cleanup I got it into a state that could be described as compact, but I couldn't get very far in reading it, at least without having first digested the spec. And it had the unfortunate limitation of requiring the full input and output to exist in memory, no streaming. My confidence as an applied cryptographer was growing and I soon implemented a number of classical hash functions, but set Keccak aside as not being an immediate necessity. Meanwhile, Diana Coman produced and incrementally published a very nice and documented reference implementation in Ada, which was adopted for use in V and soon became non-optional.

While I was well convinced by the Republican rationale for Ada, I was much less keen on introducing GNAT, the flagship implementation, into my environment. It was a million-plus-line-of-code beast that I wouldn't stand a chance to ever really understand; making matters worse, it was a "Thompsonism", a circular dependency requiring existing binaries in order to build from source and thus dubiously "open source" at all. While I already depended on one such thing - the C compiler - I was hoping to somehow keep this to ONE thing, or at least ensure a way to work with the crucial V on existing machines without pulling all this in.

Stay tuned for the result.


  1. mircea_popescu: actually i wouldn't go to war over keccak.
    mircea_popescu: letting bitfury & friends eat 100mn in unrecoupable engineering costs would provide exactly the correct lesson as to what it's a good idea to say and when it's a good idea to shut the fuck up and toe the line.

    [^]

  2. The necessary prerequisite for any change to the Bitcoin protocol [^]

  3. mircea_popescu: http://log.bitcoin-assets.com/?date=01-02-2016#1393026 << at least it wasn;t fucking developed by teh nsa.
    assbot: Logged on 01-02-2016 19:29:18; ascii_butugychag: ;;later tell mircea_popescu in what sense is adoptinc keccak a rejection of usg standards? it was actually adopted as sha3...
    mircea_popescu: as far as we know. whatevs. minor point.
    ascii_butugychag: btw between that thread and now i went and read the keccak spec
    ascii_butugychag: it is mighty spiffy.
    ascii_butugychag: accordionizes to size.
    mircea_popescu: :)
    mircea_popescu: i don't need to explain what i meant by not finite then ?
    ascii_butugychag: aha.
    ascii_butugychag: other hashes also accept infinite bits but they eat where they shit.
    mircea_popescu: quite.
    mircea_popescu: and mind that while in no means do i propose this is "Asic resistant", from a designer perspective you must appreciate i'm giving you a fun job to do.
    mircea_popescu: at least therer's that.
    mircea_popescu: always make sure everyone's having fun.
    ascii_butugychag: quite! nobody will be plagiarizing old verilog from fpga docs to bake this one.
    ascii_butugychag: very asian-resistant.
    ascii_butugychag: which is a mega-plus.

    [^]


  4. asciilifeform: holyshit the original keccak www is gone
    asciilifeform: replaced with some horrorshow
    asciilifeform: ada code -- gone
    asciilifeform: fortunately still on my hdd
    asciilifeform: check this out, keccak.noekeon.org now forwards to buncha tards
    asciilifeform: https://archive.is/GkmgU < original
    shinohai: Notice that happened after nist.gov declared their spec
    asciilifeform: shinohai: not immediately , iirc was still intact last yr
    asciilifeform: incidentally shinohai keccak != usg.sha3
    asciilifeform: they adopted ~particular params~ of keccak as the new national whatever
    asciilifeform: orig is ~family~ of functions.
    asciilifeform: see also https://archive.is/lViVh << since 'unhappened' article
    asciilifeform: ' The SHA-3 version of Keccak being proposed appears to provide essentially the same level of security guarantees as SHA-2, its predecessor. If we are going to develop a next generation hash, there certainly should be standardized versions that provide a higher security level than the older hash functions! NIST, in the original call for submissions, specifically asked for four versions in each submission, with at least two that would
    asciilifeform: be stronger than what was currently available, so it's hard to understand this post-competition weakening.'
    asciilifeform: didjaknow.
    asciilifeform: notice how 'everyone' nao thinks 'oh, keccak? that's called sha3 nao' [^]
  5. Since state is still finite, output will of course repeat eventually; one would hope this cycle length approaches that order of 21600. [^]

2019-11-28

Gales Bitcoin Wallet spec and battle plan

Filed under: Bitcoin, Software — Jacob Welsh @ 04:51

This proposal will detail the software part of my wallet under development,(i) consisting of a security critical offline part and less sensitive online part.

All monetary values are input and displayed in decimal BTC but kept internally in exact integer (satoshi) form; conversion must be lossless.

Offline

Wallet data is stored in a directory tree containing a keys subdirectory and outputs, change, fee, and transactions files.

Private keys are represented in hex, with one key per file in the keys directory. (Probably little-endian; need to check existing conventions.)

Each key file is named by the Base58Check-encoded Bitcoin address corresponding to the key. (This implies a case-sensitive filesystem.) Support for compressed public keys may be added using an extra tag byte in the private key.(ii)

A list of outputs believed to be spendable is maintained in the tabular text file outputs, with records delimited by linefeeds and the following fields delimited by one or more space or tab characters:

  • Address (Base58Check)
  • Value (decimal BTC)
  • TXID (hash of transaction containing the output, in the conventional little-endian hex format)
  • Index (position in the transaction's output vector, decimal integer)
  • Comment (optional; may contain spaces and tabs)

Encryption is supported by keeping these in a tmpfs and persisting with tar+gpg or similar.

The offline wallet program provides the following commands:

  • gen-key - generate a new key file and print the resulting address to stdout. (Nice to have: numeric argument to generate many.)
  • send ADDR VAL ... - generate a signed transaction sending the given addresses the given amounts in BTC, pairwise. Inputs are selected from the outputs table in the order they appear. The change address is read from the change file; the fee in BTC per 1000 bytes is read from the fee file. The hex-encoded raw transaction is appended to the transactions file. (Nice to have: summarize the proposed transaction and prompt for confirmation.) The outputs file is overwritten to remove the spent ones and add the change and any others that pay the wallet's own addresses.

Once transferred, the transactions file can be removed or truncated at will.

Online

The online part has a database that serves to index a lightweight subset of blockchain data pertinent to a single offline wallet. It tracks the following objects:

  • Watched addresses
  • Confirmed outputs funding them and inputs spending from them
  • Confirmed transaction metadata (hash, block hash+height+index, size, fee, comment)
  • Raw transactions submitted by the operator
  • Scan state

The online wallet program communicates with a Bitcoin node to populate its database and transmit new transactions. It provides sync commands:

  • scan - iterate blocks, filter for relevant transactions and update database.
  • reset - clear the scan pointer, e.g. to pick up past transactions affecting newly watched addresses. (Nice to have: argument to set to a given height)

Input/output commands:

  • unspent-outs - print unspent output table in the format used by the offline wallet. (Nice to have: other views like all outputs or only new outputs)
  • watch - import new addresses to watch from stdin, with optional comments.
  • push - import raw transaction(s) to push from stdin, with optional comments.

Accounting commands:

  • balance - print the sum of unspent output values.
  • register - print transaction history, format TBD.

Plan and time estimates

Online:
Write SQLite schema (currently drafted on paper): 1h
Import RPC client, block and transaction parsing code (I considered using python-bitcoinlib from Garzik et al. here; happily I have original implementations of the necessary parts): 1h
Block fetching ("dumpblock" from TRB to named pipe): 1h
Sync commands: 3h
I/O commands: 3h
Accounting commands (using something quick and dirty for "register" format for v1): 2h
Write manual: 3h
Total online part: 14h
Proposed deadline: Sunday, Dec 1.

Offline:
Port transaction signing prototype to Scheme: 2h
Key file I/O: 1h
BTC numeric I/O: 1h
Outputs table I/O: 3h
Update "gen-key" command for new spec: 1h
"send" command: 3h
Sample tar+gpg wallet open and close scripts: 1h
Design and implement tests: 5h
Write manual: 3h
Total offline part: 20h
Proposed deadline: Thursday, Dec 12.

Other:
"sendrawtransaction" RPC for TRB: 5h

Total: 39h
Deadline: Friday, Dec 13.

Now: I see 40 available hours remaining in my schedule for this time interval. Given that I'm far from confident in my estimation abilities here and there's always something unexpected, this is worrying. Perhaps the documentation would be a light enough job at this point to handle on the plane. Perhaps there's a "sendrawtransaction" patch floating around somewhere. Perhaps someone can be motivated to lend a hand - this would be the most separable task of the pile.

  1. Through further consideration of the coding tasks and data dependencies, in particular mapping out the online part as a relational database, I've realized my earlier notion of transactions as the central data structure had introduced unnecessary complications. If accounting of historical transactions is left to the online database, the offline signing part need only be aware of currently unspent outputs, which reduces its storage needs to a flat table, no S-expressions or JSON or similar. I believe this also brings it in closer alignment with the ideal. A second simplification is in the storage of addresses and keys, following a philosophy of "the filesystem is a database; use it", and naming of keys has been dropped. [^]
  2. This may be desirable to import keys or sweep funds from legacy wallets without having to execute their code. If used by default it would reduce outbound transaction size, thus perhaps fees, though as ever miner behavior is uncertain. [^]

2019-11-20

Next steps in wallet planning

Filed under: Bitcoin, Software — Jacob Welsh @ 18:40

The planning in my last article left a fair amount to the imagination, and in particular several pending decisions without sufficient detail to move discussion forward. The picture coming into view is that through a habit of insufficiently structured communications between engineering and management, I've been attempting to bear more weight of decision-making than is my due, trying to optimize things from the technical point of view which, while not unimportant, does not always align with the needs of a business to survive and grow in a dynamic market. I'll also take the opportunity to encourage present or future management to lean on me for support should it happen that I wander off into my own world.

With that in mind, let's expand a bit on the remaining tasks and decisions.

Switching to the now-builtin bignum support

I noted this as something that could potentially be deferred due to time constraints, because it does work as is. Unfortunately upon testing I've found that performance is noticeably degraded from what it used to be, at 14 seconds to derive an address from a private key on my dev system where I recall this having been closer to 5 seconds on earlier versions of the interpreter. (Transaction signing would be several times this.) I suspect a big part of the slowdown is the now-redundant layers of type and overflow checking; I'm unsure how to confirm this other than just trying the change. FWIW, I don't anticipate it taking all that long; maybe I take an hour and see if I can get it done.

Unspent output accounting; import of inbound or previously unknown outbound payments

The wallet tracks three types of data: private keys, the public keys and addresses derived from them, and transactions affecting its addresses. Queries such as balance and unspent outputs can be resolved by scanning the list of transactions, or tracked separately. As I see it, the "full scan" will have to work anyway so separate tracking, while desirable from an algorithmic scalability standpoint, would be a premature optimization at this stage. Without any such derived data to maintain, the "import" operation can simply mean appending lines containing encoded transaction data to a text file. Which brings us to:

External representation of transaction data (S-expression or maybe JSON)

I'm presently thinking S-exprs are the way to go; it keeps things simple on the Scheme side, and the Python side would only need to emit them, not parse them. What I'm not in a position to see is whether interoperability with other JSON-based tools would be desirable. I do have a parser in Scheme for a simplified form of JSON - omitting some Unicode details - that would suffice here.

Transaction input selection (menu-based?)

A possible cut here is to skip any sort of interface and use a simple selection strategy like FIFO. A manual interface could be added later. The idea here was to print a numbered listing of outputs by address/age/value and prompt for which ones to spend until sufficient value is accumulated.

ECDSA S-value normalization?

Turns out I've already got this: "As (r, -s mod n) is also a valid signature, the result is canonicalized to use the lesser of the two possible s-values." Thanks, 2017-me! (Some background.)

Fetching blocks from TRB (new RPC? Bitcoin protocol?)

In fact this could be done using the existing "dumpblock", perhaps using a named pipe. An RPC to return a block directly still seems preferable to me, but patching TRB isn't free. The "Bitcoin protocol" angle here meant connecting to port 8333 and using the p2p protocol commands to fetch a block, which is probably overkill if this is to be used in a batched or "pull" style.

Pushing raw transactions to TRB (likewise)

This one I don't think can be avoided.

Tracking watched addresses and synchronization state (SQL?)

The simple approach here, if indeed we're going for a batch model, would be a text file listing addresses and another that gives the last scanned block number/hash. When you run a scan, new transactions affecting the given addresses are emitted to a text file for transfer offline, and the last-scanned-block file is rewritten. If whatever data gets lost, you just rescan from an older block. Dealing with reorgs is where it could all get messier; one approach could be to insist on, say, 6 confirmations, and do a full rescan in the rare event of a deeper fork. How exactly one detects this and what could happen if one doesn't seems to warrant further consideration.

2019-11-19

Gales Bitcoin Wallet: status, preliminary work plan and code dump

Filed under: Bitcoin, Software — Jacob Welsh @ 21:01

The item in question is the software component of a combination of hardware, software and operational practices with the goal of sending and receiving Bitcoin while excluding the possibility of remote compromise of keys (so long as underlying cryptographic assumptions hold). Proper airgapping is the foundation, but it's easier said than done when it comes to Bitcoin because of the dependency on data from the outside world for the inputs of a transaction. Popular "cold storage" solutions may involve storing keys offline, but have windows of vulnerability when the keys are generated and funds are swept using an online (i.e. insecure) machine.

Our approach is to allow data to jump the gap as selectively as possible, using a simple and low-speed channel (serial port) with optical isolation and manually switched simplex operation (one way at a time). The system is intended for manual, low-frequency usage; thus we're not especially concerned with performance or timing side channels.(i) At the same time, we're very concerned with correctness, which informs a preference for straightforward "textbook" implementations when possible, minimizing external dependencies and overall system complexity, maximizing auditability (e.g. avoiding any sort of processor-specific "crypto acceleration"), and publishing code.(ii)

Design points include:

  • An offline machine, running the offline component of the wallet. The machine is to have no swap, be operated on battery power during key operations, and powered off after use.
  • A set of private keys, generated offline by explicit request and associated with names of the operator's choice, stored in a tmpfs during operation and GPG-encrypted on disk using a strong password when at rest.
  • A true (i.e. hardware) random number generator, used for all key generation including the ephemeral keys used in ECDSA signing.
  • Transactions constructed offline (potentially reducing round trips through the airgap).
  • An online machine running a TRB node and the online component of the wallet, which pushes outbound transactions, scans for sufficiently-confirmed new transactions affecting the set of watched addresses, and produces minimal data structures for updating state of the offline component.

Status

I've used my own Scheme system for implementation of the offline component so far, after an earlier prototype in the much larger (albeit more battle-tested) Python. Much of the code falls into one of two categories: math, or shuffling data from one format to another.

On the first category, I've implemented the following algorithms, with reference primarily to Menezes, van Oorschot and Vanstone and the SECG standards.

  • Bignum arithmetic: addition/subtraction, multiplication, division/remainder. My first pass here was intended more as an exercise, a "userspace" (pure Scheme) implementation supporting only unsigned values. I subsequently redid this, with signed integers and plugging into the generic arithmetic operators, still mostly in Scheme. The wallet code is presently still using the former. This is a conventional arbitrary-precision framework and thus does not support constant-time operations.
  • Modular inversion (extended Euclidean GCD).
  • Unbiased random integer in a given range.
  • Elliptic curve point addition and scalar multiplication, for the secp256k1 curve required by Bitcoin. Scalar multiplication is the most intensive operation, done once for key generation and several times for signing, and comes out noticeably slow; one optimization was to precompute iterated doublings of the generator point.
  • ECDSA signing.
  • SHA256 and RIPEMD160 hash functions.

On the second, the following, with reference primarily to the TRB and OpenSSL code (and sometimes the Bitcoin Wiki which was almost always a mistake):

  • Base58 and Base58Check
  • Little-endian hex and octet encoding
  • Fixed and variable-length integer byte encodings as seen in Bitcoin
  • The OpenSSL EC point (public key) encoding
  • The subset of ASN.1 DER as used by OpenSSL for ECDSA signatures
  • Bitcoin script, with code to construct p2pkh and p2sh outputs(iii)

These are presently wrapped up into commands by which one can generate private key files, the associated addresses, and (in case of emergency) export private keys to the "WIF" format for import to other wallet software.

The aforementioned Python prototype also implements the somewhat convoluted Bitcoin transaction signing algorithm, and I have Python code for deserializing blocks and transactions.

Remaining tasks

This list will surely need some revision upon contact with reality. The practical deadline so as not to wreck my own holiday is December 13.

For the offline component:

  • Porting transaction signing to Scheme
  • External representation of transaction data (S-expression or maybe JSON)
  • Switching to the now-builtin bignum support
  • Unspent output accounting; import of inbound or previously unknown outbound payments
  • Transaction input selection (menu-based?)
  • ECDSA S-value normalization?
  • Frontend commands
  • Shell scripts for "opening/closing" the wallet (GPG operations)
  • More tests

The online component, for which I'll probably stick to Python and make use of available libraries:

  • Fetching blocks from TRB (new RPC? Bitcoin protocol?)
  • Pushing raw transactions to TRB (likewise)
  • Tracking watched addresses and synchronization state (SQL?)
  • Export of new payment data for transfer offline
  • Alert for unexpected outbound payments

Partial code dump

Python prototype:

Offline component, Scheme code:

  • pkg.scm - basic import/export mechanism used by the remaining
  • bignum.scm - Unsigned "userspace" bignum library
  • ecdsa.scm
  • bit-ops.scm - some rather sad complexity for implementing fixed-width arithmetic and bitwise operations, which in hindsight would have been better done as interpreter extensions
  • hashes.scm - the RIPEMD and SHA families, based on the above
  • wallet.scm
  1. Certainly it would be preferable to eliminate these. [^]
  2. At least when it comes to cryptography, you don't gain much from obscuring your algorithms or implementation, because chances are you're not as clever as you think: all the ways to mess it up are already known to someone and will eventually be probed. [^]
  3. The latter might be a contentious point; it does not involve any support for "multisig", "segwit" or similar on my side, but enables one to transact with those who insist on these Gavinistic 3-addresses. [^]
« Newer Posts

Powered by MP-WP. Copyright Jacob Welsh.