Fixpoint

2020-02-02

From the forum log, 6-18 January 2020

Filed under: Summaries — Jacob Welsh @ 20:55

From #trilema: 2020-01-06 - 2020-01-18.

BingoBoingo reported on "Madre Dios" promoters, LabMD vs. Tiversa, and Iran striking back against US hostilities in Iraq.

Lobbes was planning a Gales install then return to mp-wp logger work.

Asciilifeform's FFA series sprung back into action.

MP and BingoBoingo discussed USG politics between the military, executive and bureaucracy in the context of Iran escalations and Trump impeachment.

MP poked dorion, noting his recent silence after an initial self-imposed overload. Dorion responded in convalescence from his intense Hawaiian cruise.

Diana_coman and spyked investigated feedbot behavior in the face of a deluge of repaired trackbacks in the trilema comments feed. Spyked set himself a deadline for implementing synchronous unsubscribing.

Lulz were had from Warner Brothers, Paul Krugman, and contemplation of the Great Again arriving by way of hot war in the US. One of BingoBoingo's articles attracted some friendly hate-mail apparently from a Vancouver IP.

Trinque was busy with research for his upcoming BusyBox-centered article, and still needed to get the deedbot web view updated. Dorion noted his blog was stripping html tags and links from comments.

Diana_coman republished the current Eulora client and dependency pile, it having otherwise gone offline. Mocky set himself a date to get his portion restored.

Dorion wondered if MP's recent comment about Linux 2.6 being good enough meant bvt should backport his RNG work, and re-poked ave1 about availability for compiler work. MP and bvt figured the backport shouldn't be too hard, due to good decisions in that regard of bvt and asciilifeform. Trinque and dorion seem to be in harmony on software or more general topics, in particular having reasons for decisions and avoiding runaway complexity.

Upon full restoration of deedbot functionality, BingoBoingo proceeded with the mike_c case and MP acknowledged the ~123 BTC claim with minimal further ado, to be settled once he's off the road. He critiqued BingoBoingo's handling of the fee structure.

MP was displeased by link rot on therealbitcoin.org; mod6 repaired it.

Diana_coman wondered about the rationale for detached signatures in V; MP attributed it to an unexamined(i) stand by asciilifeform.

MP reflected on the decline of the US since the 9/11 headshot.

A series of computer graphics articles by diana_coman inspired bvt to write about his formative experiences with the subject. She and MP highlight the benefits of writing vigorously since you never know what spark you might strike.

Lobbes's latest penance piece was well received. MP and diana_coman contemplated how it might have helped phf and when men or women may be too late to be fixed.

MP, lobbes and whaack explored some details of the mp-wp html selection mechanism.

Dorion did some article chasing and reporting on the OS project, and had received a positive response from font of EFI knowledge Rod Smith.

  1. Funny how my Firefox spellchecker doesn't know this word. [^]

2020-01-31

From the forum log, 31 December 2019 - 5 January 2020

Filed under: Summaries — Jacob Welsh @ 22:29

From #trilema: 2019-12-31 through 2020-01-05.

BingoBoingo deployed Qntra DNS service to redundant sites, until the whole fleet ran aground a few days later. Initial reporting suggested the process of propagating records between NSD instances was vulnerable to clock skew. Meanwhile, trinque woke up the deeding part of deedbot from a slumber of unclear duration, and mod6 restored his blog after October's untimely death of the all-conquering ISP in October.(i)

Diana_coman figured it wasn't the start of a new decade because the AD timeline starts at 1.

MP pointed out that seemingly impossible choices of the sort diana_coman was facing in computer graphics are how you know you're getting somewhere.

To assorted lulz with a side of rage, MP brought out a recent Reddit thread wherein a noob's bewilderment on discovering Trilema was answered by Bitcoin Power Ranger Gregory Maxwell with a pile of lies and projection. The trouble of reading with fixed codecs was contemplated, as well as the phenomenon of the inept attempting to emulate institutions they haven't understood while thinking their own superior.

Not unrelatedly, MP stumbled on a quirk of PHP, ascribing it to mental failings of the language's originators.(ii)

BingoBoingo offered four new public Bitcoin nodes for mod6's advertised list.

  1. Leaving Mocky as the last of the lordship still out his mouthpiece. [^]
  2. Such as for example: let's make something that looks like C, because that's popular and all, but For The Web!, fix only its most obvious failings and remain by all appearances unaware that such a thing as Lisp ever existed. [^]

2020-01-30

From the forum log, 30 December 2019

Filed under: Paidagogia, Software, Summaries — Jacob Welsh @ 18:48

From #trilema: 2019-12-30.

MP clarified for trinque that nobody's proposing all software be installed everywhere, and his own point was more about making gradual code and design cleanups so the mess becomes much smaller. A V tree could grow in different directions for different purposes, with a GUI environment likely being an early such split.

He weighed in on the Lisp critique thread. The latest ASDF was wrecked by Francois-Rene Rideau aka fare when the elder Lispers were no longer around to stop it. MP's experience with Lisp code had been that most ended up being Python anyway. He cited the late Erik Naggum on the trouble with free Lisp implementations lacking the more powerful features, but lambasted him for checking out of life before he could do anything useful, along with recent TMSR dropouts. He formulated the purpose of Lisp as for solving closed problems in the most efficient way, once problems are adequately understood.

In computer graphics, the nonsense of "XML shaders" was more popular than diana_coman knew.

Hanbot noted the use of Scheme scripting in Gimp, apropos of both practical Lisp usage and bypassing large dependency trees as seen in ImageMagick.

Mocky said he'll catch up.

MP and diana_coman picked up a discussion from blog comments about Eulora work, wherein various parts are stalled; the project outgrew the world it came from, in terms of both tools and artists. MP explored the art question by comparing valuation of a lone fighter aircraft, to a black-box Go-playing AI, to a cube of colored pixels: all being piles of accumulated tinkering, results of another's process, useless on their own without access to the process. Maintaining infrastructure to cater to the current crop of artists would be costly; growing new talent likewise. Hanbot had once done some Blender character modeling work but got stuck on the last step of exporting to a usable format; unfortunately she didn't recall documenting the struggle. MP posed the practical options as either trawling the entire Internet for artwork and sorting out what's usable, or machine-generating it and doing similar, at least knowing it's in a usable format. Diana_coman wasn't enthused by either. MP was interested in doing the first anyway, inviting unemployed readership to talk.

For the first time ever, MP found himself struggling with recent publication volume. He and diana_coman pointed out some kinks in trinque's blog setup; trinque started looking into it.

2020-01-29

From the forum log, 27-29 December 2019

Filed under: Summaries — Jacob Welsh @ 00:36

From #trilema: 2019-12-27, 2019-12-28, 2019-12-29.

Diana_coman updated some WoT ratings, most notably of asciilifeform.

MP pushed trinque to expand on a point and not redeploy the suppression machine now. He found an article that someone had been looking for, discussing the toddler's notion of a static world and surprise at encountering dynamic equilibrium, but then couldn't recall where the reference had been needed. He noted some shortcomings in a TMSR-OS article by dorion; diana_coman was already on the case.

Mocky returned 1 BTC that MP had lent him in late 2018 toward the cause of escaping stupidity. MP was pleased, but pushed him to get up to speed on the blogging and prospective reporting game that had been stepping up across the board in his absence.

Trinque not having replied yet, MP expanded his own point on the suppression machine: going through life, one is constantly uncovering new personal deficiencies. Each time, one faces a difficult choice: to face and overcome it, painful because it requires killing a part of one's youth; to accept insufficiency; or to pretend it doesn't exist or doesn't matter and eventually end up an overgrown teenage loser (possibly hanging out in #asciilifeform to commiserate). He went on to note that the existence of evil(i) is essential for life to be interesting, but that this doesn't make it any less evil.

Meanwhile, Wikipedia deprecated itself from hanbot's web browser, to amusement all around, and BingoBoingo observed vast police reporting discrepancies on a cocaine bust based on differing local market conditions.

Trinque followed up with willingness to adapt; to that end he was working on an OS article series, and kind-of requested a castle deed. As a teaser, he noted his finding on the weight of the GRUB2 bootloader denominated in units of BusyBox. He tried to get a point through to jfw and dorion that their Gales project didn't exist as a thing worth naming, by reason of building on far larger components, yet he still thought the work was valuable. Dorion characterized it as a stepping stone.

MP was surprised by spyked's feedbot sending him a deluge, but realized this was "user error". He humorously noted the superlative passivity of trinque's deed request, but resolved to grant it.

BingoBoingo was awaiting deedbot processing to proceed with the mike_c case; MP said they'd pick it back up after New Year's.

MP praised the absence of unnecessary scripting language dependencies in the base Gales system, noted by bvt in his installation report. On the question of how practical the system would be for wrangling packages with large dependency graphs, he figured the preferable approach would be to work on trimming down those packages toward sanity and reject traditional notions of "installing". Trinque was keen that this not result in everything ending up installed everywhere; there was brief discussion of the nature or desired shape of V trees.

Experience had soured trinque on the relative usefulness of Lisp, or at least the popular free implementations. He discussed with whaack, pointing out the subtleties of different complexity proxies.

Diana_coman had tried to run trinque's deedbot paste uploader code(ii) and urged him to publish the server part too; he agreed. She had found the SBCL/ASDF "ecosystem" to be a mess rivaling Python.

  1. Or problems. [^]
  2. My own "bash one-liner" for pasting from standard input (it happens to be written in Python, because I tend to have that handy and trust it a bit more than curl):

    import urllib2, urllib, sys; print urllib2.urlopen(urllib2.Request('http://paste.deedbot.org', data=urllib.urlencode([('pastebox', sys.stdin.read())]))).geturl() [^]

2020-01-26

From the forum log, 24-26 December 2019

Filed under: Summaries — Jacob Welsh @ 02:40

From #trilema: 2019-12-24, 2019-12-25, 2019-12-26.

The logs were short for this interval around "thanksgiving" (as MP joked), though entry points tempting the reader deeper into the maze of links were plentiful as ever.

MP had liked an article by whaack. He noted being confused by the lack of sidebar on articles. Whaack said he'd have a look.

Bvt furnished the links to the English versions of the rest of MP's "essences" trilogy which MP and dorion hadn't found.

Diana_coman signed and mirrored jfw's updated keksum genesis and noted he had borked the article title.

BingoBoingo was pleased to see Uruguayan leftists fighting amongst themselves.

Hanbot pointed to the toxicity of evoorhees not having clued in a younger dorion about a then-recent meetup with MP in Romania; dorion clarified.

MP asked lobbes and diana_coman to de-list the #asciilifeform channel from their log display banners, and wondered what to do about his recently-signed castle deed, lamenting the troubles of working with people. Diana_coman had been pondering what should qualify for listing anyway; MP proposed that it's better to be right than principled, drawing analogy to the old theory vs. craptice distinction.

Diana_coman answered shrysr's insistent fuck-off with a -8 WoT rating. BingoBoingo updated several of his ratings.

Trinque said that the year's events had affected him and he suffered from an environmentally-induced suppression machine.

2020-01-25

From the forum log, 22-23 December 2019

Filed under: Summaries — Jacob Welsh @ 04:50

This time I'm aiming for a lighter approach to the summarizing, primarily relying on memory.

From #trilema: 2019-12-22, 2019-12-23.

MP talked about how you can't get 1960's Texas back, from all sorts of angles; and that the people then would have longed for their own perceived good old days, and so on recursively. Spanish poetry was involved. Finding happiness in a place requires reasonable expectations.

"Pichiaje" was introduced to the lexicon from the streets of Montevideo.

MP contemplated the merits of just killing all the other players in game - but then how would he sell them stuff? A buyers' internment camp was offered as a solution off-screen; he concluded that's what the US is after all.

Mocky dropped in to say hello and that he'd taken on more than he could handle with three girlfriends at once. MP wasn't buying it, demanding the pictures. Lobbes offered a visit to include settling a lunch debt.

MP went on to explain various female-centered or at least female-demanding arts - dance, theater, film, singing - as having always been a form of male competition. "Gentlemen" - the term itself meaning that they seek less bloody ways to compete than direct combat - had devised ever more elaborate feats for which to train their sluts (his general term of approval for acceptable young women), in much the same vein as horse racing.

BingoBoingo took some heat for his manner of proceeding in the emerging mike_c dispute, where he had been hired to represent mike_c's claims in the forum. Particular missteps included direct citation of text from a channel downstream in the flow of authority, and delaying in getting to the point of establishing the details and basis of the claim. He went on to rectify that by filing a deed of mike_c's last MPEx-signed account statement, seemingly revealing the assets in question to be valued around 110 BTC. There was also ongoing skeptical inquiry into his arguments presented so far, for example whether the swift payment of a large fee for a Lord's representation constituted an act of submission to the sovereignty of the Republic and its WoT-based power structure.

MP prompted dorion to learn the distinction between imminence and immanence. Dorion had read a Trilema series on essences, but not Kant.

2020-01-24

Forum log not-quite-summary, 18-19 December 2019

Filed under: Summaries, Tempus fugit — Jacob Welsh @ 06:30

Thanks to my buggy, unspecified, slow implementation of half a personal scheduling algorithm - and I dunno, perhaps some degree of avoidance for reasons yet uncertain - I've repeatedly deferred the task of catching up with the Forum Log of #trilema from where I left off for holidays in December. With the idea of taking care of both this and daily blogging in one stroke, I've been assigned to write articles summarizing what I read for the day, on the usual one-hour time limit, though I'm going for two hours daily on the reading. Here's the stubby result of my first try; on the reading I had reached the end of the 21st.

End of 2019-12-18(i)

Trinque expands for jfw on what he's after in a minimal operating system, excludes building towards the purpose of supporting particular large applications, and shares his favored type of client for this sort of work and how the client's needs affect the outcome. He critiques jfw's wallet security strategy and holds him to blame for the acknowledged flaws of the inherited software in his distribution; trinque notes his self-critical nature as he had made the same choices.

In a discussion of the consequences of negrating that arose upon the reappearance of absentee investor mike_c, trinque weighs in against repossession of funds.

2019-12-19

MP adds some wordplay to the fun he's having in morning "whorship". BingoBoingo has a tentative agreement for representing mike_c. He and MP two chat about local happenings; BingoBoingo has over two years now in Uruguay. He reports on his datacenter employment search; MP offers encouragement and likens the southern and northern Argentines (i.e. USians), their only difference in outcomes owing to the historical accident of life support from China. Strained Argentina-Uruguay relations are mentioned, then BingoBoingo's choice of venues for speaking with mike_c. BingoBoingo had two pleasant visits that month and is keen to see Northern Latino lands (presumably such as Costa Rica where MP will be returning).

  1. This part was a re-read, but included since that's where I started today. Its being addressed to me while I've not yet otherwise responded might have added some pointless awkwardness to the writing process. [^]

2020-01-20

Draft gbw-node frontend, part 6

Filed under: Bitcoin, Software — Jacob Welsh @ 21:32

Continued from:

The first of the input/output commands is to print a table of possibly-spendable outputs in the format required by the offline signer. While the Bitcoin protocol refers to transactions by 256-bit hash, the more compact confirmation coordinates (height, index) are included for convenience in the comment field. The queries are a bit lengthy, since we now join several tables to build the flat output file, but aren't doing anything too fancy once you break them down. The only difference when a tag is specified is the extra join to filter on its ID.

In some cases, BLOB fields need to be converted back to str.(i)

def cmd_unspent_outs(argv):
	'''
	unspent-outs [TAG]

	Display the unspent outputs table for addresses with the given TAG (or all watched addresses), as required by the offline wallet, ordered by age.
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		r = db.execute('SELECT address, value, hash, output.n, block_height, tx.n FROM output \
				JOIN address ON output.address_id = address.address_id \
				JOIN tx ON output.tx_id = tx.tx_id \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE spent IS NULL AND tag_id=? \
				ORDER BY block_height DESC', (tag_id,))
	else:
		r = db.execute('SELECT address, value, hash, output.n, block_height, tx.n FROM output \
				JOIN address ON output.address_id = address.address_id \
				JOIN tx ON output.tx_id = tx.tx_id \
				WHERE spent IS NULL \
				ORDER BY block_height DESC')
	for a, v, hash, n_out, height, n_tx in r:
		stdout.write('%s %s %s %s #blk %s tx %s\n' % (format_address(str(a)), format_coin(v), b2lx(hash), n_out, height, n_tx))

Idea: Add a command to print the outputs table for a given raw transaction. For example, this would enable spending unconfirmed or too recently confirmed outputs in a pinch, without requiring any further changes. Or more generally: all the data conversion code is already here so might as well make it accessible.

Next we proceed to the accounting commands, as they're really just another kind of output command. The balance of an address set is the total value of unspent outputs to addresses in the set.

def cmd_balance(argv):
	'''
	balance [TAG]

	Display confirmed balance of addresses with the given TAG (or all watched addresses).
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		r = db.execute('SELECT COALESCE(SUM(value),0) FROM output \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE spent IS NULL AND tag_id=?', (tag_id,))
	else:
		r = db.execute('SELECT COALESCE(SUM(value),0) FROM output WHERE spent IS NULL')
	bal, = r.fetchone()
	stdout.write('%s\n' % format_coin(bal))

Things get tricker for the register report as it attempts to usefully summarize several things in a small space. In particular, summing the incoming and outgoing value per transaction seems to require separate queries since the join criteria differ.(ii)

def cmd_register(argv):
	'''
	register [TAG]

	Display a tab-delimited transaction register report for addresses with the given TAG (or all watched addresses). Columns are:

	- confirmation block height
	- number of transaction within block
	- total deposits (new outputs)
	- total withdrawals (spent outputs)
	- running balance
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		outs = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN output ON output.tx_id = tx.tx_id \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE tag_id=? \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n', (tag_id,))
		ins = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN input ON input.tx_id = tx.tx_id \
				JOIN output ON input.input_id = output.spent \
				JOIN address_tag ON output.address_id = address_tag.address_id \
				WHERE tag_id=? \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n', (tag_id,))
	else:
		outs = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN output ON output.tx_id = tx.tx_id \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n')
		ins = db.execute('SELECT block_height, tx.n, COALESCE(SUM(value),0) FROM tx \
				JOIN input ON input.tx_id = tx.tx_id \
				JOIN output ON input.input_id = output.spent \
				GROUP BY tx.tx_id \
				ORDER BY block_height, tx.n')
	bal = 0
	for height, n, o_val, i_val in merge_moves(outs.fetchall(), ins.fetchall()):
		bal = bal + o_val - i_val
		stdout.write('%s\t%s\t%s\t%s\t%s\n' % (height, n, format_coin(o_val), format_coin(-i_val), format_coin(bal)))

A helper is used to join the two possibly uneven lists by transaction, inserting zeros for transactions found on only one side. Perhaps it could all be done in SQL with subqueries and some type of outer joins, but I wasn't quite seeing it, so resorted to the low level with an algorithm reminiscent of the merging step of classical mergesort.

# Merge ordered lists of total input and output values per transaction into single table with columns for both.
def merge_moves(outs, ins):
	i = o = 0

	while True:
		if o == len(outs):
			for height, n, val in ins[i:]:
				yield (height, n, 0, val)
			return
		o_height, o_n, o_val = outs[o]
		o_key = (o_height, o_n)

		if i == len(ins):
			for height, n, val in outs[o:]:
				yield (height, n, val, 0)
			return
		i_height, i_n, i_val = ins[i]
		i_key = (i_height, i_n)

		if o_key < i_key:
			yield (o_height, o_n, o_val, 0)
			o += 1
		elif i_key < o_key:
			yield (i_height, i_n, 0, i_val)
			i += 1
		else:
			yield (o_height, o_n, o_val, i_val)
			i += 1
			o += 1

Next, the input commands. For sanity's sake, we exclude newlines in tag names as implicitly required by the tags listing format.

def cmd_watch(argv):
	'''
	watch [TAG]

	Import a set of addresses to watch linewise from stdin, optionally named by the given TAG. Addresses can be associated with multiple tags using multiple watch commands.
	'''
	tag_id = None
	if len(argv) > 0:
		name = argv.pop(0)
		if '\n' in name:
			die('newline not allowed in tag name')
		tag_id = insert_or_get_tag_id(name)
	while True:
		l = stdin.readline()
		if len(l) == 0:
			break
		addr_id = insert_or_get_address_id(parse_address(l.rstrip('\n')))
		if tag_id is not None:
			try:
				db.execute('INSERT INTO address_tag (address_id, tag_id) VALUES (?,?)',
						(addr_id, tag_id))
			except IntegrityError:
				pass
		db.commit()

def cmd_push(argv):
	'''
	push

	Import raw hex transactions linewise from stdin and send to bitcoind.
	'''
	while True:
		line = stdin.readline()
		if len(line) == 0:
			break
		tx_hex = line.rstrip('\n')
		stdout.write('txid %s\n' % rpc('sendrawtransaction', tx_hex))

General or command-specific help, and a command registry allowing abbreviation:

def cmd_help(argv):
	'''
	help [COMMAND]

	Display help for a given command or list all commands.
	'''
	if len(argv) > 0:
		name = argv.pop(0)
		name, func = get_command(name)
		doc = getdoc(func)
		if doc is None:
			stdout.write('No help for %r\n' % name)
		else:
			stdout.write('gbw-node %s\n' % doc)
	else:
		stdout.write('''Usage: gbw-node COMMAND [ARGS]

Available commands (can be abbreviated when unambiguous):

%s
''' % '\n'.join([name for name, f in cmds]))

cmds = (
	('help', cmd_help),
	('scan', cmd_scan),
	('reset', cmd_reset),
	('tags', cmd_tags),
	('addresses', cmd_addresses),
	('unspent-outs', cmd_unspent_outs),
	('watch', cmd_watch),
	('push', cmd_push),
	('balance', cmd_balance),
	('register', cmd_register),
)

def get_command(name):
	rows = [r for r in cmds if r[0].startswith(name)]
	if len(rows) == 0:
		die('command not found: %s' % name)
	if len(rows) > 1:
		die('ambiguous command %s. Completions: %s' % (name, ' '.join([r[0] for r in rows])))
	return rows[0]

When invoked as a program (as opposed to imported elsewhere e.g. for testing), we connect to the database, enable foreign key constraints, and boost cache size and checkpoint interval from the meager defaults. These can be tuned if needed to optimize the scan process for your machine. Finally we dispatch to the given command.

Ideally, we'd create the database from schema here if not found.

def main():
	global db
	signal.signal(signal.SIGINT, signal.SIG_DFL)
	require_dir(gbw_home)
	db = sqlite3.connect(gbw_home + '/db', timeout=600) # in seconds
	db.execute('PRAGMA foreign_keys=ON')
	db.execute('PRAGMA cache_size=-8000') # negative means in KiB
	db.execute('PRAGMA wal_autocheckpoint=10000') # in pages (4k)
	if len(argv) < 2:
		die('missing command', help=True)
	get_command(argv[1])[1](argv[2:])

if __name__ == '__main__':
	main()

This concludes the node frontend. Congratulations if you've followed thus far! There's no magic in programming, just a ruthless decomposition of bigger problems into smaller ones, a search for useful and robust abstractions -- and of course a whole lot of background reading and practice.

In the next month or two I will be completing the missing pieces of the signer; meanwhile, the code here is quite ready to play with. Import some addresses, run a scan, run the reports, and let me know how it goes in the comments below.

  1. Well, so far the only such case is format_address, so perhaps it should just be changed to allow passing a buffer. [^]
  2. It's looking like the COALESCE trick is pointless here, since rows are only generated by the join when matching outputs are present; that is, the SUM aggregation is always getting at least one row. Was I overzealous before? I don't recall if I observed an actual problem here rather than just in cmd_balance. It does no harm to leave in though, at least as far as correctness. [^]

2020-01-19

Draft gbw-node frontend, part 5

Filed under: Bitcoin, Software — Jacob Welsh @ 19:02

Continued from:

Command implementations

The core scanning logic is in a helper function that takes a block's height and a memory view of its contents.

Referential integrity between blocks is ensured by scanning sequentially by height; that is, all relevant tx and output records from prior blocks will be known by the time we see the inputs that spend them. However, as far as I know this topological ordering is not guaranteed for the transaction sequence within a block (eg. tx 1 could spend outputs of tx 2, or vice versa) so we do separate passes over the transaction list for outputs and inputs.

def scan_block(height, v):
	stdout.write('block %s' % height)
	# [perf] computing every tx hash
	(blkhash, prev, time, target, txs), size = load_block(v)

The performance comment above was just to note some not-strictly-necessary work being done, in case the scan ended up horribly slow.(i)

An output is relevant if its script is standard and pays a known address. At least with foreign key constraints enabled, we can't insert an output until the tx record it references exists, but we don't know whether to insert the tx until we see if any of its outputs are relevant, so we again use a two-pass approach.

	count_out = 0
	n_tx = 0
	for (hash, size, txins, txouts) in txs:
		matched_outs = []
		for n, txout in enumerate(txouts):
			val, script = txout
			a = out_script_address(script)
			if a is not None:
				#print format_address(a)
				addr_id = get_address_id(a)
				if addr_id is not None:
					matched_outs.append((n, addr_id, val))
		if len(matched_outs) > 0:
			tx_id = insert_or_get_tx_id(hash, blkhash, height, n_tx, size)
			for n, addr_id, val in matched_outs:
				insert_output(tx_id, n, addr_id, val)
			count_out += len(matched_outs)
		n_tx += 1
	stdout.write(' new-outs %s' % count_out)

An input is relevant if it spends a known output. Recall that insert_input updates the corresponding output to create the back-reference, indicating it has been spent.

	# Inputs scanned second in case an output from the same block is spent.
	# Coinbase (input of first tx in block) doesn't reference anything.
	count_in = 0
	n_tx = 1
	for (hash, size, txins, txouts) in txs[1:]:
		matched_ins = []
		for n, txin in enumerate(txins):
			prevout_hash, prevout_n, scriptsig = txin
			prevout_tx_id = get_tx_id(prevout_hash)
			if prevout_tx_id is not None:
				prevout_id = get_output_id(prevout_tx_id, prevout_n)
				if prevout_id is not None:
					matched_ins.append((n, prevout_id))
		if len(matched_ins) > 0:
			tx_id = insert_or_get_tx_id(hash, blkhash, height, n_tx, size)
			for n, prevout_id in matched_ins:
				insert_input(tx_id, n, prevout_id)
			count_in += len(matched_ins)
		n_tx += 1
	stdout.write(' spent-outs %s\n' % count_in)

Assorted helpers: handling usage errors; looking up a tag ID that must exist.

def die(msg, help=False):
	stderr.write('gbw-node: %s\n' % msg)
	if help:
		cmd_help([])
	exit(-1)

def require_tag(name):
	i = get_tag_id(name)
	if i is None:
		die('tag not found: %r' % name)
	return i

The entry point for any user command "X" is the function "cmd_X", having help text in its docstring and taking a list of any supplied CLI arguments past the command name.

First, the sync commands. The scan process commits one database transaction per block.

def cmd_scan(argv):
	'''
	scan

	Iterate blocks from bitcoind, indexing transaction inputs and outputs affecting watched addresses. May be safely interrupted and resumed.

	NOT PRESENTLY SAFE TO RUN CONCURRENT INSTANCES due to the dumpblock to named pipe kludge.
	'''
	db.execute('PRAGMA synchronous=NORMAL')
	height = db.execute('SELECT scan_height FROM state').fetchone()[0]
	blockcount = max(-1, rpc('getblockcount') - CONFIRMS)
	while height < blockcount:
		height += 1
		scan_block(height, memoryview(getblock(height)))
		db.execute('UPDATE state SET scan_height = ?', (height,))
		db.commit()

def cmd_reset(argv):
	'''
	reset

	Reset the scan pointer so the next scan will proceed from the genesis block, to find transactions associated with newly watched addresses.
	'''
	db.execute('UPDATE state SET scan_height = -1')
	db.commit()

Next, commands to query the watched address sets (not in the original spec but trivial and clearly useful).

def cmd_tags(argv):
	'''
	tags

	List all tag names.
	'''
	for name, in db.execute('SELECT name FROM tag'):
		stdout.write(name + '\n')

def cmd_addresses(argv):
	'''
	addresses [TAG]

	List addresses with the given TAG (or all watched addresses).
	'''
	if len(argv) > 0:
		tag_id = require_tag(argv.pop(0))
		r = db.execute('SELECT address FROM address \
				JOIN address_tag ON address.address_id=address_tag.address_id \
				WHERE tag_id=?', (tag_id,))
	else:
		r = db.execute('SELECT address FROM address')
	for a, in r:
		stdout.write(format_address(str(a)) + '\n')

To be continued.

  1. I've found the Python profiler quite useful so far compared to such guesswork; still, optimization is something of a balance between experimentally-driven efforts and not doing obviously wasteful things from the start. [^]

Draft gbw-node frontend, part 4

Filed under: Bitcoin, Software — Jacob Welsh @ 04:36

Continued from:

Common database operations

As an internal convention, a "get_X_id" function will return the database ID for the row in table "X" named by its bulkier external reference, or None if not found. Similarly, "insert_or_get_X_id" will insert a row if needed and in either case return the ID. Some of these have only a single caller, but I find that collecting the various similar queries in one place and wrapping them into tidy functions helps readability.

The mapping of Python to SQLite types is fairly straightforward, except that buffer is needed to specify a BLOB.

The "parameter substitution" feature is used throughout, avoiding improper mixing of code and data that could manifest as SQL injection or thrashing the compiled statement cache.

def get_address_id(a):
	r = db.execute('SELECT address_id FROM address WHERE address=?', (buffer(a),)).fetchone()
	return None if r is None else r[0]

def insert_or_get_address_id(a):
	i = get_address_id(a)
	if i is not None:
		return i
	return db.execute('INSERT INTO address (address) VALUES (?)', (buffer(a),)).lastrowid

def get_tx_id(hash):
	r = db.execute('SELECT tx_id FROM tx WHERE hash=?', (buffer(hash),)).fetchone()
	return None if r is None else r[0]

def insert_or_get_tx_id(hash, blkhash, height, n, size):
	try:
		return db.execute('INSERT INTO tx (hash, block_hash, block_height, n, size) VALUES (?,?,?,?,?)',
				(buffer(hash), buffer(blkhash), height, n, size)).lastrowid
	except IntegrityError:
		# XXX check equality?
		return get_tx_id(hash)

I now think we should indeed catch that condition (differing transactions with identical hash), especially given the possibility of TXID collisions. Perhaps I left it out from excessive worry about scan performance. Or just laziness.

The mixture of check-first and try-first styles seen above also doesn't sit well. The possibility of TOCTTOUs,(i) depending on the details of transaction isolation level, would seem to make a strong case for try-first. It's a minor point though; the worst case here would be an uncaught IntegrityError halting the program gracefully.

def insert_output(tx_id, n, addr_id, val):
	try:
		db.execute('INSERT INTO output (tx_id, n, address_id, value) VALUES (?,?,?,?)',
				(tx_id, n, addr_id, val))
	except IntegrityError:
		r = db.execute('SELECT address_id, value FROM output WHERE tx_id=? AND n=?',
				(tx_id, n)).fetchone()
		if r != (addr_id, val):
			raise Conflict('output differs from previous content', tx_id, n, (addr_id, val), r)

def insert_input(tx_id, n, prevout_id):
	try:
		input_id = db.execute('INSERT INTO input (tx_id, n) VALUES (?,?)', (tx_id, n)).lastrowid
	except IntegrityError:
		input_id = db.execute('SELECT input_id FROM input WHERE tx_id=? AND n=?',
				(tx_id, n)).fetchone()[0]
	db.execute('UPDATE output SET spent=? WHERE output_id=?', (input_id, prevout_id))

def get_output_id(tx_id, n):
	r = db.execute('SELECT output_id FROM output WHERE tx_id=? AND n=?', (tx_id, n)).fetchone()
	return None if r is None else r[0]

def get_tag_id(name):
	r = db.execute('SELECT tag_id FROM tag WHERE name=?', (name,)).fetchone()
	return None if r is None else r[0]

def insert_or_get_tag_id(name):
	i = get_tag_id(name)
	if i is not None:
		return i
	return db.execute('INSERT INTO tag (name) VALUES (?)', (name,)).lastrowid

Next up, we'll finally get to implementing the commands themselves. To be continued.

  1. The "time of check to time of use" race condition. You know, like sitting down when some trickster's meanwhile moved the chair. [^]
« Newer PostsOlder Posts »

Powered by MP-WP. Copyright Jacob Welsh.