The missing Adacore public download index, vintage 2018, while it lasts

2020-04-10


Filed under: Data, Historia, Software — Jacob Welsh @ 21:16

I usually browse the web with JavaScript disabled, if present at all: it's bad for your computer and it's bad for your mind. That it's bad for your computer should be clear if you've dug at all into browser security in the past, say, 15 years. Failing that, consider the intended behavior of the thing: to allow any page to consume unlimited processing and network resources even once loaded. It's as if the mailman, upon completion of a delivery, strolls right into your house and helps himself to whatever is in the fridge. As for why it's bad for your mind, one aspect is that it leads you to believe that a page "works" or contains some piece of information when it in fact does not, like a mirage, or the screensavers of the cathode-ray era, falling away as soon as you reach out to touch it.⁽ⁱ⁾

Being thus grounded in reality has its difficulties, such as when those around you mistakenly believe a certain link points to something other than a blank page, even expecting you to have an opinion on it. Worse is when you actually want the promised item. On these occasions you can of course choose to let in the hungry mailman; sometimes he'll even give you helpful instructions on how to do so (as if you'd need them, having barred the door in the first place). But sometimes, a little effort can reveal the secrets of the code and allow you to possess the thing for yourself. Some might call it "reverse engineering", but isn't that a strong term for what's really just reading? It's reading some text that was foisted on you, albeit not in the manner the foisters would like for it to be read.

So it went the last time I was looking into the Ada programming language, in 2018. GNAT, the foremost public implementation, as I understand, was mostly developed as a frontend for the GNU Compiler Collection at US taxpayer expense by a company presently known as Adacore.⁽ⁱⁱ⁾ But their public download site demanded use of a JavaScript menu tree to filter options by platform and other categories, and offered no readily accessible listing of files or links otherwise.

A survey of the page source didn't directly reveal the hidden list, but turned up a number of scripts by reference. Skipping over some well-known "framework" wads left a promising "script.js", which straightaway pointed to a JSON "feed" containing a sizable collection of file metadata. A search in the code for how it turned this into URLs, and then some custom coding to do the same in a controlled context, was all that remained to produce a usable index. This done, I downloaded a couple of GNAT binary releases for x86_64 Linux⁽ⁱⁱⁱ⁾ and added them to my archives.
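Listing the scripts a page pulls in by reference, and setting aside the usual frameworks, can be sketched with Python 3's standard `html.parser`. This is a minimal illustration, not a record of the original procedure: `FRAMEWORK_HINTS` is a hypothetical substring heuristic, and the sample page is invented.

```python
from html.parser import HTMLParser

# Hypothetical substrings marking well-known "framework" wads to skip.
FRAMEWORK_HINTS = ('jquery', 'bootstrap', 'angular', 'react')


class ScriptLister(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            src = dict(attrs).get('src')
            if src:  # inline scripts have no src and are ignored
                self.srcs.append(src)


def candidate_scripts(html):
    """Return referenced script URLs that don't look like known frameworks."""
    parser = ScriptLister()
    parser.feed(html)
    parser.close()
    return [s for s in parser.srcs
            if not any(h in s.lower() for h in FRAMEWORK_HINTS)]


if __name__ == '__main__':
    page = '''<html><head>
<script src="/static/jquery.min.js"></script>
<script src="/static/script.js"></script>
<script>var inline = 1;</script>
</head></html>'''
    print(candidate_scripts(page))  # ['/static/script.js']
```

What remains is then a short enough list to read by hand, looking for the request that fetches the data.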

Sadly, at the time I had neither a blog on which to boast of my exploits nor the kind of social engagement to motivate it. A year and change went by and I rectified that, but I didn't substantially revisit the Ada bootstrapping process until now, and had forgotten all about the indexing work.

Through looking into the publications of the now disbanded Republic and asking around, I found a series of recipes, notes, more notes, and a partial collection of dependencies, but little assurance that I'd obtained all necessary ingredients. In particular, I realized that the initial binary required to bootstrap the process had not been nailed down, but reports indicated the 2016 version was known to work. After the back-and-forth over what pieces someone might have on hand, I decided what I really wanted was access to the full collection.

In the unsurprising heathen manner, Adacore had broken their download links, apparently in the course of a move to Amazon's CloudFront delivery network. Looking for the new locations, I was again greeted by the non-index page, felt the déjà vu, and unearthed my old work. It appeared the "feed" format was unchanged and the new URL format was easily constructed from the old data. More surprisingly, the feed URL itself still pointed to the vanished "mirrors.cdn.adacore.com" hostname; as far as I could see, downloads wouldn't be working even with full JS. Behold the incredible bandwidth savings realized by moving to the Cloud! On the bright side, my existing SHA1 checksums provide some assurance that the historical files have not been diddled since 2018.
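Comparing a fresh download against the 2018 checksums takes only Python 3's `hashlib`. A minimal sketch, assuming the expected value is the 40-character hex digest from the feed's `sha1` field; the function names are illustrative.

```python
import hashlib


def sha1_of_stream(f, chunk_size=1 << 20):
    """Hex SHA1 digest of a binary file object, read in 1 MiB chunks."""
    h = hashlib.sha1()
    for chunk in iter(lambda: f.read(chunk_size), b''):
        h.update(chunk)
    return h.hexdigest()


def matches_feed(path, expected_sha1):
    """True if the file at `path` hashes to the checksum recorded in the feed."""
    with open(path, 'rb') as f:
        # Tolerate stray whitespace or upper-case hex in the recorded value.
        return sha1_of_stream(f) == expected_sha1.strip().lower()
```

Reading in chunks keeps memory use flat even for the larger tarballs; run `matches_feed` on each file on hand with its row's `sha1` value.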

The code

Despite the tree-structured JSON format, the feed is essentially tabular. We'll use Python so as to accurately parse the JSON and convert to a more manageable comma-separated format (taking care that the separator characters don't occur in the values themselves).

import json

# Saved copy of http://mirrors.cdn.adacore.com/gpl_feed (that host has since vanished)
with open('gpl_feed.json') as feed_file:
	rels = json.load(feed_file)['feed']['releases']

fields = (
	'name',
	'id',
	'size',
	'sha1',
	'type',
	'client',
	'component',
	'date',
	'display_name',
	'display_order',
	'kind',
	'platform',
	'platform_display_name',
	'platform_display_order',
	'release_date',
	'release_name',
	'title',
)

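# One row of strings per release entry, columns in the order of `fields`
table = [[str(obj[f]) for f in fields] for obj in rels]
table.sort()

# The output is CSV without any quoting, so neither the field separator
# nor the line terminator may appear inside a value.
for rec in table:
	for val in rec:
		assert ',' not in val and '\n' not in val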

print(','.join(fields))
for rec in table:
	print(','.join(rec))

The output includes column headers and can be processed by any number of standard tools. How about a shell one-liner to produce the full URL listing?

awk -F, 'NR>1 { print "https://community.download.adacore.com/v1/" $4 "?filename=" $1 }' | uniq

Note that downloads are not uniquely identified by filename!
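For those who'd rather stay in Python, the same listing can be sketched with the standard `csv` module, together with a check for the filename collisions just mentioned. This assumes the CSV produced above (a header row including `name` and `sha1` columns) and, like the one-liner, does no percent-encoding of filenames; `url_listing` and `ambiguous_names` are hypothetical helper names.

```python
import csv
from collections import defaultdict

BASE = 'https://community.download.adacore.com/v1/'


def url_listing(csv_lines):
    """Build download URLs from the CSV, dropping exact duplicates in order."""
    seen = set()
    urls = []
    for row in csv.DictReader(csv_lines):
        url = BASE + row['sha1'] + '?filename=' + row['name']
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def ambiguous_names(csv_lines):
    """Map each filename seen with more than one sha1 to those sha1s."""
    by_name = defaultdict(set)
    for row in csv.DictReader(csv_lines):
        by_name[row['name']].add(row['sha1'])
    return {n: sorted(s) for n, s in by_name.items() if len(s) > 1}
```

Unlike `uniq`, which only collapses adjacent repeats, the set drops duplicates anywhere in the input while keeping first-seen order. Either function accepts an open file, e.g. `open(path, newline='')`.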

The data

The feed as retrieved 2018-08-16.
The CSV conversion.
The URL listing.

And no, I don't intend to go mirroring the whole 21 GB of it, though if you do, I'll gladly link it here. Once I'm more clear on what parts are truly needed, I'll probably host those.
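A total like that can be tallied from the `size` column of the CSV. A minimal sketch, assuming `size` holds a plain integer byte count (not verified against the feed) and that entries sharing a `sha1` are the same file and so need storing only once; the helper names are hypothetical.

```python
import csv


def total_bytes(csv_lines, unique_by_sha1=True):
    """Sum the 'size' column, optionally counting each distinct sha1 once."""
    total = 0
    seen = set()
    for row in csv.DictReader(csv_lines):
        if unique_by_sha1:
            if row['sha1'] in seen:
                continue
            seen.add(row['sha1'])
        total += int(row['size'])  # assumes an integer byte count
    return total


def human(n):
    """Format a byte count in binary units, topping out at GiB."""
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if n < 1024 or unit == 'GiB':
            return '%.1f %s' % (n, unit)
        n /= 1024.0
```

Comparing the two settings of `unique_by_sha1` shows how much a mirror would save by storing each distinct file once.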

Enjoy!

(i) Terms pertaining to this effect, ranging from technical to marketing, include XHR, AJAX, and Web 2.0.
(ii) At some point the code was imported by the main GCC project, but for reasons I haven't yet ascertained, that version was considered broken for the purposes of The Most Serene Republic, so Adacore remained the source of record while efforts were made for the Republic to take over that role.
(iii) Namely, the one dated 2007, being the oldest available, and the 2014 one, for reasons I forget, but perhaps because the next one came with a precipitous size increase.

2 Comments

AFAIK 2016 was the very last edible Gnat: IIRC all more recent editions are infected with gcc5ism.

Comment by Stanislav Datskovskiy — 2020-04-10 @ 21:45

> I don't intend to go mirroring the whole 21 GB of it, though if you do, I'll gladly link it here

Done; section 2 of this article.

Comment by Stanislav Datskovskiy — 2020-04-11 @ 17:49


Fixpoint
