Fixpoint

2020-01-20

Draft gbw-node frontend, part 6

Filed under: Bitcoin, Software — Jacob Welsh @ 21:32

Continued from:

The first of the input/output commands is to print a table of possibly-spendable outputs in the format required by the offline signer. While the Bitcoin protocol refers to transactions by 256-bit hash, the more compact confirmation coordinates (height, index) are included for convenience in the comment field. The queries are a bit lengthy, since we now join several tables to build the flat output file, but aren't doing anything too fancy once you break them down. The only difference when a tag is specified is the extra join to filter on its ID.

In some cases, BLOB fields need to be converted back to str.(i)

def cmd_unspent_outs(argv):
	'''
	unspent-outs [TAG]

	Display the unspent outputs table for addresses with the given TAG (or all watched addresses), as required by the offline wallet, ordered by age.
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		r = db.execute('SELECT address, value, hash, output.n, block_height, tx.n FROM output \
				JOIN address ON output.address_id = address.address_id \
				JOIN tx ON output.tx_id = tx.tx_id \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE spent IS NULL AND tag_id=? \
				ORDER BY block_height DESC', (tag_id,))
	else:
		r = db.execute('SELECT address, value, hash, output.n, block_height, tx.n FROM output \
				JOIN address ON output.address_id = address.address_id \
				JOIN tx ON output.tx_id = tx.tx_id \
				WHERE spent IS NULL \
				ORDER BY block_height DESC')
	for a, v, hash, n_out, height, n_tx in r:
		stdout.write('%s %s %s %s #blk %s tx %s\n' % (format_address(str(a)), format_coin(v), b2lx(hash), n_out, height, n_tx))

Idea: Add a command to print the outputs table for a given raw transaction. For example, this would enable spending unconfirmed or too recently confirmed outputs in a pinch, without requiring any further changes. Or more generally: all the data conversion code is already here so might as well make it accessible.

Next we proceed to the accounting commands, as they're really just another kind of output command. The balance of an address set is the total value of unspent outputs to addresses in the set.

def cmd_balance(argv):
	'''
	balance [TAG]

	Display confirmed balance of addresses with the given TAG (or all watched addresses).
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		r = db.execute('SELECT COALESCE(SUM(value),0) FROM output \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE spent IS NULL AND tag_id=?', (tag_id,))
	else:
		r = db.execute('SELECT COALESCE(SUM(value),0) FROM output WHERE spent IS NULL')
	bal, = r.fetchone()
	stdout.write('%s\n' % format_coin(bal))

Things get tricker for the register report as it attempts to usefully summarize several things in a small space. In particular, summing the incoming and outgoing value per transaction seems to require separate queries since the join criteria differ.(ii)

def cmd_register(argv):
	'''
	register [TAG]

	Display a tab-delimited transaction register report for addresses with the given TAG (or all watched addresses). Columns are:

	- confirmation block height
	- number of transaction within block
	- total deposits (new outputs)
	- total withdrawals (spent outputs)
	- running balance
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		outs = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN output ON output.tx_id = tx.tx_id \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE tag_id=? \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n', (tag_id,))
		ins = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN input ON input.tx_id = tx.tx_id \
				JOIN output ON input.input_id = output.spent \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE tag_id=? \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n', (tag_id,))
	else:
		outs = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN output ON output.tx_id = tx.tx_id \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n')
		ins = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN input ON input.tx_id = tx.tx_id \
				JOIN output ON input.input_id = output.spent \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n')
	bal = 0
	for height, n, o_val, i_val in merge_moves(outs.fetchall(), ins.fetchall()):
		bal = bal + o_val - i_val
		stdout.write('%s\t%s\t%s\t%s\t%s\n' % (height, n, format_coin(o_val), format_coin(-i_val), format_coin(bal)))

A helper is used to join the two possibly uneven lists by transaction, inserting zeros for transactions found on only one side. Perhaps it could all be done in SQL with subqueries and some type of outer joins, but I wasn't quite seeing it, so resorted to the low level with an algorithm reminiscent of the merging step of classical mergesort.

# Merge ordered lists of total input and output values per transaction into single table with columns for both.
def merge_moves(outs, ins):
	i = o = 0

	while True:
		if o == len(outs):
			for height, n, val in ins[i:]:
				yield (height, n, 0, val)
			return
		o_height, o_n, o_val = outs[o]
		o_key = (o_height, o_n)

		if i == len(ins):
			for height, n, val in outs[o:]:
				yield (height, n, val, 0)
			return
		i_height, i_n, i_val = ins[i]
		i_key = (i_height, i_n)

		if o_key < i_key:
			yield (o_height, o_n, o_val, 0)
			o += 1
		elif i_key < o_key:
			yield (i_height, i_n, 0, i_val)
			i += 1
		else:
			yield (o_height, o_n, o_val, i_val)
			i += 1
			o += 1

Next, the input commands. For sanity's sake, we exclude newlines in tag names as implicitly required by the tags listing format.

def cmd_watch(argv):
	'''
	watch [TAG]

	Import a set of addresses to watch linewise from stdin, optionally named by the given TAG. Addresses can be associated with multiple tags using multiple watch commands.
	'''
	tag_id = None
	if len(argv) > 0:
		name = argv.pop(0)
		if '\n' in name:
			die('newline not allowed in tag name')
		tag_id = insert_or_get_tag_id(name)
	while True:
		l = stdin.readline()
		if len(l) == 0:
			break
		addr_id = insert_or_get_address_id(parse_address(l.rstrip('\n')))
		if tag_id is not None:
			try:
				db.execute('INSERT INTO address_tag (address_id, tag_id) VALUES (?,?)',
						(addr_id, tag_id))
			except IntegrityError:
				pass
		db.commit()

def cmd_push(argv):
	'''
	push

	Import raw hex transactions linewise from stdin and send to bitcoind.
	'''
	while True:
		line = stdin.readline()
		if len(line) == 0:
			break
		tx_hex = line.rstrip('\n')
		stdout.write('txid %s\n' % rpc('sendrawtransaction', tx_hex))

General or command-specific help, and a command registry allowing abbreviation:

def cmd_help(argv):
	'''
	help [COMMAND]

	Display help for a given command or list all commands.
	'''
	if len(argv) > 0:
		name = argv.pop(0)
		name, func = get_command(name)
		doc = getdoc(func)
		if doc is None:
			stdout.write('No help for %r\n' % name)
		else:
			stdout.write('gbw-node %s\n' % doc)
	else:
		stdout.write('''Usage: gbw-node COMMAND [ARGS]

Available commands (can be abbreviated when unambiguous):

%s
''' % '\n'.join([name for name, f in cmds]))

cmds = (
	('help', cmd_help),
	('scan', cmd_scan),
	('reset', cmd_reset),
	('tags', cmd_tags),
	('addresses', cmd_addresses),
	('unspent-outs', cmd_unspent_outs),
	('watch', cmd_watch),
	('push', cmd_push),
	('balance', cmd_balance),
	('register', cmd_register),
)

def get_command(name):
	rows = [r for r in cmds if r[0].startswith(name)]
	if len(rows) == 0:
		die('command not found: %s' % name)
	if len(rows) > 1:
		die('ambiguous command %s. Completions: %s' % (name, ' '.join([r[0] for r in rows])))
	return rows[0]

When invoked as a program (as opposed to imported elsewhere e.g. for testing), we connect to the database, enable foreign key constraints, and boost cache size and checkpoint interval from the meager defaults. These can be tuned if needed to optimize the scan process for your machine. Finally we dispatch to the given command.

Ideally, we'd create the database from schema here if not found.

def main():
	global db
	signal.signal(signal.SIGINT, signal.SIG_DFL)
	require_dir(gbw_home)
	db = sqlite3.connect(gbw_home + '/db', timeout=600) # in seconds
	db.execute('PRAGMA foreign_keys=ON')
	db.execute('PRAGMA cache_size=-8000') # negative means in KiB
	db.execute('PRAGMA wal_autocheckpoint=10000') # in pages (4k)
	if len(argv) < 2:
		die('missing command', help=True)
	get_command(argv[1])[1](argv[2:])

if __name__ == '__main__':
	main()

This concludes the node frontend. Congratulations if you've followed thus far! There's no magic in programming, just a ruthless decomposition of bigger problems into smaller ones, a search for useful and robust abstractions -- and of course a whole lot of background reading and practice.

In the next month or two I will be completing the missing pieces of the signer; meanwhile, the code here is quite ready to play with. Import some addresses, run a scan, run the reports, and let me know how it goes in the comments below.

  1. Well, so far the only such case is format_address, so perhaps it should just be changed to allow passing a buffer. [^]
  2. It's looking like the COALESCE trick is pointless here, since rows are only generated by the join when matching outputs are present; that is, the SUM aggregation is always getting at least one row. Was I overzealous before? I don't recall if I observed an actual problem here rather than just in cmd_balance. It does no harm to leave in though, at least as far as correctness. [^]

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by MP-WP. Copyright Jacob Welsh.