Fixpoint

2019-12-18

Draft gbw-node frontend, part 1

Filed under: Bitcoin, Software — Jacob Welsh @ 01:18

The database schema for the "node" part of Gales Bitcoin Wallet covered, we now proceed to the frontend program that puts it to work: collecting data from bitcoind, parsing its various binary encodings and extracting something useful.

Source: draft1/gbw-node.py, draft2/gbw-node.py (and the previous schema-node.sql).

You'd be well advised to read the downloaded thing prior to executing, especially since it's in an unsigned draft state. As for what's necessary to graduate to a vpatch I'd be ready to sign, my thinking is that it's this review and annotation process itself, plus whatever important changes come out of it, and the previously suggested schema tweaks (since changing that is the most obnoxious part once deployed).

At present there's not much of an installation process and the database is initialized manually. I'd suggest creating some directory to hold the two sources. Then from that directory:

$ chmod +x gbw-node.py
$ mkdir ~/.gbw
$ sqlite3 ~/.gbw/db < schema-node.sql
$ ./gbw-node.py help

In preparing this code for publication I observed that I had continued by force of habit (and editor settings) with the Python style guidelines of four-space indents and some fixed line width limit, in opposition to Republican doctrine. I've attempted to clean it up such that line breaks occur only for good reasons, though I can't say I'm happy with how my browser wraps the long lines. And it's not like I expect the poor thing to know good indentation rules for every possible programming language now... wut do?!

Prologue

We start with the usual Pythonistic pile of imports. The ready libraries are a big reason the language is hard to beat for getting things working quickly, and at the same time a dangerous temptation toward thinking you don't need to care what's inside them.

#!/usr/bin/python2
# J. Welsh, December 2019

from os import getenv, open as os_open, O_RDONLY, O_WRONLY, mkdir, mkfifo, read, write, close, stat
from stat import S_ISDIR, S_ISFIFO
from sys import argv, stdin, stdout, stderr, exit
from socket import socket
from threading import Thread, Event
from binascii import a2b_hex, b2a_hex
from base64 import b64encode
from struct import Struct
from hashlib import sha256 as _sha256
from decimal import Decimal
from inspect import getdoc
import errno
import signal
import string
import json
import sqlite3
from sqlite3 import IntegrityError

The above are all in the standard library, assuming they're enabled on your system. The ones that stick out like sore thumbs to me are threading and decimal; more on these to come.

As the comments say:

# Safety level: scanning stops this many blocks behind tip
CONFIRMS = 6

# There's no provision for handling forks/reorgs. In the event of one deeper than CONFIRMS, a heavy workaround would be:
#   $ sqlite3 ~/.gbw/db
#   sqlite> DELETE FROM output;
#   sqlite> DELETE FROM input;
#   sqlite> DELETE FROM tx;
#   sqlite> .exit
#   $ gbw-node reset
#   $ gbw-node scan

At least a semi-automated and lighter-touch recovery procedure would certainly be nice there.
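The CONFIRMS rule can be read as a bound on the scan range. Below is a minimal sketch of that reading. `scan_range` is a hypothetical name, not a function in gbw-node.py. It takes the next unscanned height and the current tip height (which bitcoind reports through its `getblockcount` RPC). It treats tip minus CONFIRMS as the last height that is safe to scan.

```python
CONFIRMS = 6

def scan_range(next_height, tip_height, confirms=CONFIRMS):
	# Hypothetical helper: heights safe to scan now, stopping `confirms` blocks behind the tip
	last = tip_height - confirms
	if last < next_height:
		return range(0)  # nothing new is buried deeply enough yet
	return range(next_height, last + 1)

print(list(scan_range(0, 10)))  # heights 0 through 4
```

Any reorg shallower than CONFIRMS then only touches blocks that were never scanned, which is the whole point of the knob.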

gbw_home = getenv('HOME') + '/.gbw'
bitcoin_conf_path = getenv('HOME') + '/.bitcoin/bitcoin.conf'

# Further knobs in main() for database tuning.
db = None

This Is The Database; Use It.

For reasons I don't quite recall (probably interpreting hashes as integers, combined with pointer type punning, an unportable C programming practice common in Windows-land), bitcoind ended up displaying certain things in hex, including transaction and block hashes, with the byte order reversed relative to the internal representation. Thus we have "bytes to little-endian hex" wrappers.

b2lx = lambda b: b2a_hex(b[::-1])
lx2b = lambda x: a2b_hex(x)[::-1]
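A quick check of the convention uses a well-known value. bitcoind displays the genesis block's merkle root as a hex string starting with `4a5e1e4b`. In the raw 80-byte header the same 32 bytes appear in the opposite order. The snippet runs under Python 3, where `b2a_hex` returns bytes rather than str.

```python
from binascii import a2b_hex, b2a_hex

b2lx = lambda b: b2a_hex(b[::-1])
lx2b = lambda x: a2b_hex(x)[::-1]

# Genesis block merkle root as displayed by bitcoind
display = b'4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b'
internal = lx2b(display)   # bytes in the order they sit inside the block header
print(b2a_hex(internal))   # starts with 3ba3edfd, the tail of the display form reversed
```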

Not taking any chances with display of monetary amounts, a function to convert integer Satoshi values to fixed-point decimal BTC notation. The remainder/modulus operators have varying definitions between programming languages (sometimes even between implementations of the same language!) when it comes to negative inputs, so we bypass the question.

def format_coin(v):
	neg = False
	if v < 0:
		v = -v
		neg = True
	s = '%d.%08d' % divmod(v, 100000000)
	if neg:
		return '-' + s
	return s
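This sketch shows what is being bypassed. Python's `divmod` floors toward negative infinity. A negative Satoshi value fed straight into the format string therefore comes out wrong. C99's `/` and `%` truncate toward zero instead, which would produce a negative remainder, a different failure. `naive_format_coin` is a hypothetical counterexample, not anything in gbw-node.py.

```python
def format_coin(v):
	# Same logic as above: strip the sign first so divmod only ever sees v >= 0
	neg = False
	if v < 0:
		v = -v
		neg = True
	s = '%d.%08d' % divmod(v, 100000000)
	if neg:
		return '-' + s
	return s

def naive_format_coin(v):
	# Hypothetical naive version: divmod(-150000000, 100000000) == (-2, 50000000)
	return '%d.%08d' % divmod(v, 100000000)

print(naive_format_coin(-150000000))  # prints -2.50000000 (wrong)
print(format_coin(-150000000))        # prints -1.50000000
```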

Preloading and giving more intelligible names to some "struct"-based byte-packing routines.

u16 = Struct('<H')
u32 = Struct('<I')
u64 = Struct('<Q')
s64 = Struct('<q')
unpack_u16 = u16.unpack
unpack_u32 = u32.unpack
unpack_u64 = u64.unpack
unpack_s64 = s64.unpack
unpack_header = Struct('<i32s32sIII').unpack
unpack_outpoint = Struct('<32sI').unpack
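The header format string encodes the fixed 80-byte block header layout. That layout is a 4-byte signed version, two 32-byte hashes (previous block and merkle root), then time, target ("bits") and nonce as little-endian 32-bit unsigned integers. The round trip below uses made-up hash fields and is only meant to show the sizes and field order.

```python
from struct import Struct

header = Struct('<i32s32sIII')
print(header.size)  # 4 + 32 + 32 + 4 + 4 + 4 = 80

# Illustrative values only; the two hash fields are filler bytes
raw = header.pack(1, b'\x00' * 32, b'\x11' * 32, 1231006505, 0x1d00ffff, 2083236893)
version, prev, root, time, target, nonce = header.unpack(raw)

# '<' forces little-endian with no padding, regardless of the host machine
print(repr(Struct('<I').pack(1)))
```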

Some shorthand for hash functions.

def sha256(v):
	return _sha256(v).digest()

def sha256d(v):
	return _sha256(_sha256(v).digest()).digest()
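A sanity check on `sha256d` together with the display order: hashing the 80-byte genesis block header and reversing the result should give the familiar genesis block hash. The header bytes below are transcribed from the well-known genesis block. They are version 1, an all-zero previous hash, the merkle root in internal order, then time, bits and nonce.

```python
from binascii import a2b_hex, b2a_hex
from hashlib import sha256 as _sha256

def sha256d(v):
	return _sha256(_sha256(v).digest()).digest()

genesis_header = a2b_hex(
	'01000000' + '00' * 32 +
	'3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a' +
	'29ab5f49' + 'ffff001d' + '1dac2b7c')

block_hash = sha256d(genesis_header)
print(b2a_hex(block_hash[::-1]))  # reversed for display, as b2lx does
```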

An exception type to indicate certain "should not happen" database inconsistencies.

class Conflict(ValueError):
	pass

For reading a complete stream from a low-level file descriptor; experience has led me to be suspicious of Python's file objects.

def read_all(fd):
	parts = []
	while True:
		part = read(fd, 65536)
		if len(part) == 0:
			break
		parts.append(part)
	return ''.join(parts)
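Under Python 3 the same loop needs a bytes joiner (`b''.join`), because `os.read` returns bytes there. The demo below uses a pipe. A short read is not treated as the end; only an empty read after the write end is closed stops the loop.

```python
import os

def read_all(fd):
	# Read until EOF in 64 KiB chunks; only an empty read counts as end of stream
	parts = []
	while True:
		part = os.read(fd, 65536)
		if len(part) == 0:
			break
		parts.append(part)
	return b''.join(parts)

r, w = os.pipe()
os.write(w, b'hello, ')
os.write(w, b'node')
os.close(w)        # without this, read_all would block forever waiting for more
data = read_all(r)
os.close(r)
print(data)
```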

Ensuring needed filesystem objects exist.

def require_dir(path):
	try:
		mkdir(path)
	except OSError, e:
		if e.errno != errno.EEXIST:
			raise
		if not S_ISDIR(stat(path).st_mode):
			die('not a directory: %r' % path)

def require_fifo(path):
	try:
		mkfifo(path)
	except OSError, e:
		if e.errno != errno.EEXIST:
			raise
		if not S_ISFIFO(stat(path).st_mode):
			die('not a fifo: %r' % path)
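Both helpers are idempotent: a second call finds the object already there and checks only its type. A sketch of that behaviour for the FIFO case follows. `die` is not defined in this part of the file, so the sketch raises `SystemExit` in its place. It also uses the `except ... as e` spelling, which works in Python 2.6+ and 3.

```python
import errno, os, stat, tempfile

def require_fifo(path):
	# Create the FIFO, tolerate an existing one, refuse any other kind of file at that path
	try:
		os.mkfifo(path)
	except OSError as e:
		if e.errno != errno.EEXIST:
			raise
		if not stat.S_ISFIFO(os.stat(path).st_mode):
			raise SystemExit('not a fifo: %r' % path)  # stands in for die()

base = tempfile.mkdtemp()
fifo = os.path.join(base, 'feed')  # hypothetical file name
require_fifo(fifo)  # creates it
require_fifo(fifo)  # second call is a no-op
```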

RPC client

Bitcoind uses a password-authenticated JSON-RPC protocol. I expect this is one of the more concise client implementations around.

class JSONRPCError(Exception):
	"Error returned in JSON-RPC response"

	def __init__(self, error):
		super(JSONRPCError, self).__init__(error['code'], error['message'])

	def __str__(self):
		return 'code: {}, message: {}'.format(*self.args)

Some of this code was cribbed from earlier experiments on my shelf. The fancy exception class above doesn't really look like my style; it may have hitchhiked from an outside JSON-RPC library.

The local bitcoin.conf is parsed to get the node's credentials. This is done lazily to avoid unnecessary error conditions for the many commands that won't be needing it.

bitcoin_conf = None
def require_conf():
	global bitcoin_conf
	if bitcoin_conf is None:
		bitcoin_conf = {}
		with open(bitcoin_conf_path) as f:
			for line in f:
				line = line.split('#', 1)[0].rstrip()
				if not line:
					continue
				k, v = line.split('=', 1)
				bitcoin_conf[k.strip()] = v.lstrip()
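The parsing rules are easiest to see on a sample. `parse_conf` below is a hypothetical stand-alone version of the loop in `require_conf`, and the credentials are made up.

```python
def parse_conf(lines):
	# Hypothetical helper mirroring require_conf(): drop comments, skip blanks, split on the first '='
	conf = {}
	for line in lines:
		line = line.split('#', 1)[0].rstrip()
		if not line:
			continue
		k, v = line.split('=', 1)
		conf[k.strip()] = v.lstrip()
	return conf

sample = '''
# example bitcoin.conf
rpcuser=gbw
rpcpassword = s3cret  # trailing comment
rpcport=8332
'''.splitlines()
conf = parse_conf(sample)
print(sorted(conf.items()))
```

Two consequences of these rules are worth knowing. A password containing '#' gets silently truncated. A non-blank line without '=' raises ValueError from the tuple unpacking.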

Side note: I detest that "global" keyword hack. It's "necessary" only because variable definition is conflated with mutation in the single "=" operator, and completely misses the case of a nested function setting a variable in an outer but not global scope. ("So they added 'nonlocal' in Python 3, solves your problem!!")
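The scoping complaint is illustrated below. Assigning to a name anywhere in a function makes it local to that function. Module-level state therefore needs `global`. For an enclosing function's variable, the usual Python 2 workaround is to mutate a container rather than rebind the name. Python 3 lets you write `nonlocal` instead.

```python
counter = 0

def bump_global():
	global counter  # without this line, "counter += 1" raises UnboundLocalError
	counter += 1

def make_counter():
	count = [0]  # Python 2 workaround: mutate a list instead of rebinding the name
	def bump():
		count[0] += 1
		return count[0]
	return bump

bump_global()
bump = make_counter()
bump()
bump()
```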

def rpc(method, *args):
	require_conf()
	host = bitcoin_conf.get('rpcconnect', '127.0.0.1')
	port = int(bitcoin_conf.get('rpcport', 8332))
	auth = 'Basic ' + b64encode('%s:%s' % (
		bitcoin_conf.get('rpcuser', ''),
		bitcoin_conf.get('rpcpassword', '')))
	payload = json.dumps({'method': method, 'params': args})
	headers = [
		('Host', host),
		('Content-Type', 'application/json'),
		('Content-Length', len(payload)),
		('Connection', 'close'),
		('Authorization', auth),
	]
	msg = 'POST / HTTP/1.1\r\n%s\r\n\r\n%s' % ('\r\n'.join('%s: %s' % kv for kv in headers), payload)
	sock = socket()
	sock.connect((host, port))
	sock.sendall(msg)
	response = read_all(sock.fileno())
	sock.close()
	headers, payload = response.split('\r\n\r\n', 1)
	r = json.loads(payload, parse_float=Decimal)
	if r['error'] is not None:
		raise JSONRPCError(r['error'])
	return r['result']
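To see exactly what goes over the wire without a running node, the message construction can be split out from the socket. `build_request` is a hypothetical helper with made-up credentials; `getblockhash` is just a sample method name. Note that Content-Length must count bytes. That equals `len(payload)` here only because `json.dumps` escapes non-ASCII by default.

```python
import json
from base64 import b64encode

def build_request(method, args, host, user, password):
	# Hypothetical helper: same request format as rpc() above, minus the socket
	auth = 'Basic ' + b64encode(('%s:%s' % (user, password)).encode()).decode('ascii')
	payload = json.dumps({'method': method, 'params': list(args)})
	headers = [
		('Host', host),
		('Content-Type', 'application/json'),
		('Content-Length', len(payload)),
		('Connection', 'close'),
		('Authorization', auth),
	]
	head = '\r\n'.join('%s: %s' % kv for kv in headers)
	return 'POST / HTTP/1.1\r\n%s\r\n\r\n%s' % (head, payload)

msg = build_request('getblockhash', [0], '127.0.0.1', 'gbw', 's3cret')
print(msg)
```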

I could see removing the "parse_float=Decimal", and thus the corresponding import, as none of the calls made here go to the problematic interfaces that report monetary values as JSON numbers. But then, I also see value in having one RPC client implementation that can be copied for any use without hidden hazards.
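The hazard in question shows up with a made-up reply like the one below, which is not the output of any particular bitcoind method. A plain `json.loads` turns 0.1 into the nearest binary float. With `parse_float=Decimal`, the digits arrive exactly as sent.

```python
import json
from decimal import Decimal

reply = '{"result": {"amount": 0.1}, "error": null, "id": null}'
as_float = json.loads(reply)['result']['amount']
as_decimal = json.loads(reply, parse_float=Decimal)['result']['amount']

print(Decimal(as_float))            # the float's exact binary value, many digits long
print(as_decimal)                   # 0.1
print(int(as_decimal * 100000000))  # 10000000 Satoshi, exactly
```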

Bitcoin data parsing

Now things might get interesting. To parse the serialized data structures in a manner similar to the C++ reference implementation and hopefully efficient besides, I used memory views, basically bounds-checking pointers.(i)
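A few properties of memory views matter for the loaders that follow. Slicing shares the underlying buffer instead of copying it. Indexing out of range raises IndexError, but a slice past the end quietly comes back short. Also, indexing a byte view gives a one-character string in Python 2.7 (hence `ord(v[0])`) but an int in Python 3.

```python
buf = b'\x01\x02\x03\x04\x05'
v = memoryview(buf)

tail = v[2:]           # a view into the same buffer; no bytes are copied
print(len(tail))       # 3
print(tail.tobytes())  # copy out only when a real string is needed
print(len(v[10:]))     # 0: slicing past the end is not an error
try:
	v[10]              # but indexing out of range is
except IndexError:
	print('out of bounds')
```

Because of that slicing behaviour, truncated input surfaces as a `struct.error` from the fixed-size unpack calls. A truncated length-prefixed string in `load_string` would instead come back short without complaint.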

# "load" functions take a memoryview and return the object and number of bytes consumed.

def load_compactsize(v):
	# serialize.h WriteCompactSize
	size = ord(v[0])
	if size < 253:
		return size, 1
	elif size == 253:
		return unpack_u16(v[1:3])[0], 3
	elif size == 254:
		return unpack_u32(v[1:5])[0], 5
	else:
		return unpack_u64(v[1:9])[0], 9

def load_string(v):
	# serialize.h Serialize, std::basic_string and CScript overloads
	n, i = load_compactsize(v)
	return v[i:i+n].tobytes(), i+n

def vector_loader(load_element):
	# serialize.h Serialize_impl
	def load_vector(v):
		n, i = load_compactsize(v)
		r = [None]*n
		for elem in xrange(n):
			r[elem], delta = load_element(v[i:])
			i += delta
		return r, i
	return load_vector

def load_txin(v):
	# main.h CTxIn
	i = 36
	txid, pos = unpack_outpoint(v[:i])
	scriptsig, delta = load_string(v[i:])
	i += delta
	i += 4 # skipping sequence
	return (txid, pos, scriptsig), i

load_txins = vector_loader(load_txin)

def load_txout(v):
	# main.h CTxOut
	i = 8
	value, = unpack_s64(v[:i])
	scriptpubkey, delta = load_string(v[i:])
	return (value, scriptpubkey), i+delta

load_txouts = vector_loader(load_txout)

def load_transaction(v):
	# main.h CTransaction
	i = 4 # skipping version
	txins, delta = load_txins(v[i:])
	i += delta
	txouts, delta = load_txouts(v[i:])
	i += delta
	i += 4 # skipping locktime
	hash = sha256d(v[:i])
	return (hash, i, txins, txouts), i

load_transactions = vector_loader(load_transaction)

def load_block(v):
	# main.h CBlock
	i = 80
	head = v[:i]
	version, prev, root, time, target, nonce = unpack_header(head)
	hash = sha256d(head)
	txs, delta = load_transactions(v[i:])
	return (hash, prev, time, target, txs), i+delta
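The variable-length "compact size" prefix is the piece most of these loaders share, and it is easy to check with a round trip. Below is a Python 3 port of `load_compactsize`: indexing the view yields an int, so `ord` goes away. It is paired with `dump_compactsize`, a hypothetical inverse that follows the same thresholds as WriteCompactSize in serialize.h.

```python
from struct import Struct

u16, u32, u64 = Struct('<H'), Struct('<I'), Struct('<Q')

def load_compactsize(v):
	# Python 3 port: v[0] is already an int for a byte memoryview
	size = v[0]
	if size < 253:
		return size, 1
	elif size == 253:
		return u16.unpack(v[1:3])[0], 3
	elif size == 254:
		return u32.unpack(v[1:5])[0], 5
	else:
		return u64.unpack(v[1:9])[0], 9

def dump_compactsize(n):
	# Hypothetical inverse: one byte below 253, else a marker byte plus a 2/4/8-byte integer
	if n < 253:
		return bytes(bytearray([n]))
	elif n <= 0xffff:
		return b'\xfd' + u16.pack(n)
	elif n <= 0xffffffff:
		return b'\xfe' + u32.pack(n)
	else:
		return b'\xff' + u64.pack(n)

for n in (0, 252, 253, 0xffff, 0x10000, 0xffffffff, 0x100000000):
	enc = dump_compactsize(n)
	assert load_compactsize(memoryview(enc)) == (n, len(enc))
```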

The code dig to come up with this magic for identifying standard pay-to-pubkey-hash outputs and extracting the enclosed address hashes was ugly.

def out_script_address(s):
	# Standard P2PKH script: OP_DUP OP_HASH160 20 ... OP_EQUALVERIFY OP_CHECKSIG
	if len(s) == 25 and s[:3] == '\x76\xA9\x14' and s[23:] == '\x88\xAC':
		return s[3:23]
	return None
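Below is a check of the pattern against a hand-built script. It is written for Python 3, where the comparisons must be against bytes literals. The 20-byte hash is made up. The function returns that hash (the payload an address encodes), not an encoded address string. A pay-to-script-hash output (OP_HASH160, a 20-byte push, OP_EQUAL) is correctly rejected.

```python
from binascii import a2b_hex

def out_script_address(s):
	# Standard P2PKH script: OP_DUP OP_HASH160 <push 20> <hash> OP_EQUALVERIFY OP_CHECKSIG
	if len(s) == 25 and s[:3] == b'\x76\xa9\x14' and s[23:] == b'\x88\xac':
		return s[3:23]
	return None

pkh = a2b_hex('00112233445566778899aabbccddeeff00112233')  # made-up 20-byte hash
script = b'\x76\xa9\x14' + pkh + b'\x88\xac'
p2sh = b'\xa9\x14' + pkh + b'\x87'
print(out_script_address(script) == pkh, out_script_address(p2sh))
```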

To be continued.(ii)

Updated for errata.

  1. I'm just now noticing these were added in 2.7, ugh... sorry, 2.6 users.
  2. My blog will be on hiatus from new articles until early January. There's quite a ways to go on this file and I might not make it all the way through on this pass. If the suspense gnaws, you can always keep reading the source!

3 Comments »

  1. [...] 1, 2 [...]

    Pingback by Draft gbw-node frontend, part 3 « Fixpoint — 2020-01-19 @ 02:52

  2. [...] 1, 2, 3 [...]

    Pingback by Draft gbw-node frontend, part 4 « Fixpoint — 2020-01-19 @ 04:38

  3. [...] 1, 2, 3, 4 [...]

    Pingback by Draft gbw-node frontend, part 5 « Fixpoint — 2020-01-19 @ 19:03



Powered by MP-WP. Copyright Jacob Welsh.