Fixpoint

2020-01-17

Draft gbw-node frontend, part 3

Filed under: Bitcoin, Software — Jacob Welsh @ 18:02

Continued from:

Base58

Bitcoin addresses are conventionally written in a special-purpose encoding and include a hash truncated to 32 bits for error detection. As the reference implementation explains:

Why base-58 instead of standard base-64 encoding?
- Don't want 0OIl characters that look the same in some fonts and could be used to create visually identical looking account numbers.
- A string with non-alphanumeric characters is not as easily accepted as an account number.
- E-mail usually won't line-break if there's no punctuation to break at.
- Doubleclicking selects the whole number as one word if it's all alphanumeric.

Of course, all these points would have been answered just as well by hexadecimal, and without the various burdens: case-sensitivity for the user (the surest way I've found to read these out is the fully explicit: "one five big-A three little-X ..."); more code for the implementer; and more work for the machine (as the lack of bit alignment demands a general base conversion algorithm).
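For comparison, here is a minimal Python 3 sketch of the hexadecimal alternative using the standard binascii module. The helper names to_hex and from_hex are hypothetical and not part of gbw-node. Because each byte maps to exactly two digits, no general base conversion or leading-zero convention is needed, and decoding accepts either letter case.

```python
import binascii

def to_hex(data: bytes) -> str:
    # Every byte becomes exactly two hex digits: no big-integer arithmetic,
    # and leading zero bytes survive as "00" pairs without special handling.
    return binascii.hexlify(data).decode("ascii")

def from_hex(text: str) -> bytes:
    # Accepts either letter case; raises binascii.Error (a ValueError)
    # on odd length or non-hex characters.
    return binascii.unhexlify(text)

print(to_hex(b"\x00\x00\xab\xcd"))           # 0000abcd
print(from_hex("ABCD") == from_hex("abcd"))  # True
```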

We start with lookup tables to convert the digits 0-57 to the specified alphabet and back. I was once surprised to learn that the scope of iteration variables in a Python "for" loop is not restricted to the loop body: a potential source of referential confusion, reflecting the language's casual approach to mutation. Thus, when working at global scope, I like to ensure that throwaway names, like "index" and "character" here, are safely contained in a function.

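import string

# ascii_* rather than uppercase/lowercase, which are locale-dependent in Python 2
base58_alphabet = (string.digits + string.ascii_uppercase + string.ascii_lowercase).translate(None, '0OIl')
# Byte value -> digit value; None marks characters outside the alphabet
base58_inverse = [None]*256
def init_base58_inverse():
	for index, character in enumerate(base58_alphabet):
		base58_inverse[ord(character)] = index
init_base58_inverse()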
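The snippet above relies on Python 2 APIs (the two-argument str.translate does not exist in Python 3). A minimal Python 3 sketch of the same tables might look like the following. The upper-case names are hypothetical, and so is last_loop_value, which only demonstrates the loop-variable leak described above.

```python
import string

# Python 3: str.translate takes a mapping, and str.maketrans with a third
# argument builds one that deletes the four excluded characters.
BASE58_ALPHABET = (string.digits + string.ascii_uppercase
                   + string.ascii_lowercase).translate(str.maketrans("", "", "0OIl"))

def make_inverse(alphabet):
    # The loop variables stay local to this function instead of
    # leaking into module scope.
    inverse = [None] * 256
    for index, character in enumerate(alphabet):
        inverse[ord(character)] = index
    return inverse

BASE58_INVERSE = make_inverse(BASE58_ALPHABET)

def last_loop_value():
    for index in range(3):
        pass
    # "index" is still bound after the loop: for loops open no new scope.
    return index

print(len(BASE58_ALPHABET))  # 58
print(last_loop_value())     # 2
```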

To do base conversion we'll need to treat byte sequences as integers with the same ordering conventions as the reference code; put otherwise, to decode from base-256 to abstract integers. Python 2 doesn't have a builtin for this. The shift-and-add loop is not optimal (each step builds a new, ever larger integer, so the total work is quadratic in the input length), but the base-58 part will be worse anyway.

def bytes_to_int(b):
	"Convert big-endian byte sequence to unsigned integer"
	i = 0
	for byte in b:
		i = (i << 8) + ord(byte)
	return i
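Python 3.2 and later do have this builtin, int.from_bytes. As a sanity check for anyone porting, here is a sketch of the same loop in Python 3, compared against the builtin. It also shows the loss of field width that the padding convention has to make up for.

```python
def bytes_to_int(b: bytes) -> int:
    "Convert big-endian byte sequence to unsigned integer (Python 3 port)"
    i = 0
    for byte in b:  # iterating over bytes yields ints in Python 3, so no ord()
        i = (i << 8) + byte
    return i

sample = b"\x00\x01\x02\xff"
print(bytes_to_int(sample) == int.from_bytes(sample, "big"))  # True
# Leading zero bytes contribute nothing to the value:
print(bytes_to_int(b"\x00\x00\x05") == bytes_to_int(b"\x05"))  # True
```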

To complete the bytes-to-ASCII converter, we extract digits from the integer, least significant first, by iterated division with remainder by 58. Since the conversion to integer loses track of field width, the convention is to pad with as many base-58 zeros as there were leading zero bytes in the input. In further fallout from using a non-bit-aligned encoding, these are not naturally constant-time or constant-control-flow operations: the number of loop iterations depends on the value being encoded.

For the same bit cost of the error detection code we could have had error correction. But that would have required, like, math, and stuff.

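def b2a_base58check(data):
	# Append the checksum: first 4 bytes of sha256d (double SHA-256, defined elsewhere)
	data += sha256d(data)[:4]

	# Each leading zero byte will be written as one '1' (digit zero)
	leading_zeros = 0
	for b in data:
		if b != '\x00':
			break
		leading_zeros += 1

	data_num = bytes_to_int(data)

	# Extract base-58 digits, least significant first
	digits = []
	while data_num:
		data_num, digit = divmod(data_num, 58)
		digits.append(digit)
	digits.extend([0] * leading_zeros)

	return ''.join(base58_alphabet[digit] for digit in reversed(digits))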
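For comparison, here is a Python 3 sketch of the same encoder. It assumes sha256d is double SHA-256 (the usual Bitcoin checksum hash), since the post's own definition lies outside this part. It also swaps the explicit leading-zero loop for bytes.lstrip, which counts the same thing.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def sha256d(data: bytes) -> bytes:
    # Assumption: SHA-256 applied twice, as the helper's name suggests.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def b2a_base58check(data: bytes) -> str:
    data = data + sha256d(data)[:4]          # append 32-bit checksum
    # Number of leading zero bytes, each written as one '1' (digit zero).
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    data_num = int.from_bytes(data, "big")
    digits = []
    while data_num:                          # least significant digit first
        data_num, digit = divmod(data_num, 58)
        digits.append(digit)
    digits.extend([0] * leading_zeros)
    return "".join(BASE58_ALPHABET[d] for d in reversed(digits))

encoded = b2a_base58check(b"\x00" * 21)
print(encoded)  # begins with twenty-one '1' characters
```

Since a nonzero integer's leading base-58 digit is never zero, any '1' characters at the front of the output come only from the padding step.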

Converting back to bytes uses the inverse operation at each step, but now there is invalid input to reject: digits outside the specified alphabet, and corruption detected by the checksum. (The precise function decomposition is a bit arbitrary and asymmetrical, I'll admit.)

class Base58Error(ValueError):
	pass

class BadDigit(Base58Error):
	pass

class BadChecksum(Base58Error):
	pass

def a2b_base58(data):
	# Map characters to digit values; None marks characters outside the alphabet
	digits = [base58_inverse[ord(b)] for b in data]
	if None in digits:
		raise BadDigit

	# Each leading '1' (digit zero) stands for one leading zero byte
	leading_zeros = 0
	for digit in digits:
		if digit != 0:
			break
		leading_zeros += 1

	data_num = 0
	for digit in digits:
		data_num = 58*data_num + digit

	# Extract bytes least significant first, then restore the zero padding
	data_bytes = []
	while data_num:
		data_bytes.append(data_num & 0xFF)
		data_num = data_num >> 8
	data_bytes.extend([0] * leading_zeros)

	return ''.join(chr(b) for b in reversed(data_bytes))

def a2b_base58check(data):
	data = a2b_base58(data)
	payload = data[:-4]
	check = data[-4:]
	if check != sha256d(payload)[:4]:
		raise BadChecksum
	return payload
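Here is a Python 3 sketch of the decoding side with its error checks. It makes two deliberate departures, both assumptions rather than anything in the original. First, the inverse table is a dict, so any character outside the alphabet, including non-ASCII text, raises BadDigit. Second, byte extraction uses int.to_bytes. A plain encoder, b2a_base58 (a hypothetical name mirroring a2b_base58), is included only to build test input, and sha256d is assumed to be double SHA-256.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BASE58_INVERSE = {c: i for i, c in enumerate(BASE58_ALPHABET)}

class Base58Error(ValueError):
    pass

class BadDigit(Base58Error):
    pass

class BadChecksum(Base58Error):
    pass

def sha256d(data: bytes) -> bytes:
    # Assumed to match the post's helper: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def b2a_base58(data: bytes) -> str:
    # Hypothetical plain encoder, used here only to build test input.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, digit = divmod(n, 58)
        out.append(BASE58_ALPHABET[digit])
    return "1" * leading_zeros + "".join(reversed(out))

def a2b_base58(text: str) -> bytes:
    digits = [BASE58_INVERSE.get(c) for c in text]
    if None in digits:
        raise BadDigit(text)
    leading_zeros = 0
    for digit in digits:
        if digit != 0:
            break
        leading_zeros += 1
    data_num = 0
    for digit in digits:
        data_num = 58 * data_num + digit
    # Minimal big-endian bytes; a zero value yields b"".
    body = data_num.to_bytes((data_num.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_zeros + body

def a2b_base58check(text: str) -> bytes:
    data = a2b_base58(text)
    payload, check = data[:-4], data[-4:]
    if check != sha256d(payload)[:4]:
        raise BadChecksum(text)
    return payload

payload = b"\x00example payload"
good = b2a_base58(payload + sha256d(payload)[:4])
assert a2b_base58check(good) == payload
# Change the final character: the checksum (almost certainly) no longer matches.
bad = good[:-1] + ("2" if good[-1] != "2" else "3")
try:
    a2b_base58check(bad)
except BadChecksum:
    print("corruption detected")
```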

Finally we apply this encoding to Bitcoin addresses, which carry a fixed 160-bit hash plus an extra "version" byte. Here that byte is zero, so by the leading-zero convention it encodes as the familiar leading "1".

class BadAddressLength(ValueError):
	pass

class BadAddressVersion(ValueError):
	pass

def parse_address(a):
	b = a2b_base58check(a)
	if len(b) != 21:
		raise BadAddressLength
	if b[0] != '\x00':
		raise BadAddressVersion(ord(b[0]))
	return b[1:]

def format_address(b):
	return b2a_base58check('\x00' + b)
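To tie it together, here is a self-contained Python 3 sketch of the address functions. It uses compact inline Base58Check helpers in place of the ones above, and they raise plain ValueError on failure; both are simplifications for the sketch. In Python 3, indexing bytes yields an int, so the version test compares against 0 rather than '\x00'.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

class BadAddressLength(ValueError):
    pass

class BadAddressVersion(ValueError):
    pass

def sha256d(data: bytes) -> bytes:
    # Assumed: double SHA-256, the usual Bitcoin checksum hash.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def b2a_base58check(data: bytes) -> str:
    # Compact stand-in for the encoder above.
    data += sha256d(data)[:4]
    zeros = len(data) - len(data.lstrip(b"\x00"))
    n, out = int.from_bytes(data, "big"), ""
    while n:
        n, digit = divmod(n, 58)
        out = ALPHABET[digit] + out
    return "1" * zeros + out

def a2b_base58check(text: str) -> bytes:
    # Compact stand-in for the decoder above; raises plain ValueError.
    if any(c not in ALPHABET for c in text):
        raise ValueError("bad base58 digit")
    zeros = len(text) - len(text.lstrip("1"))
    n = 0
    for c in text:
        n = 58 * n + ALPHABET.index(c)
    data = b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")
    payload, check = data[:-4], data[-4:]
    if check != sha256d(payload)[:4]:
        raise ValueError("bad checksum")
    return payload

def parse_address(a: str) -> bytes:
    b = a2b_base58check(a)
    if len(b) != 21:
        raise BadAddressLength(len(b))
    if b[0] != 0:  # indexing bytes yields an int in Python 3
        raise BadAddressVersion(b[0])
    return b[1:]

def format_address(h: bytes) -> str:
    # Prefix the zero version byte, which encodes as a leading "1".
    return b2a_base58check(b"\x00" + h)

addr = format_address(bytes(range(1, 21)))
print(addr[0])  # 1
```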

With all this format-conversion groundwork out of the way, we'll next start talking to the database and putting it all together. To be continued!


Powered by MP-WP. Copyright Jacob Welsh.