Fixpoint

2021-03-04

#jwrd Logs for Mar 2021

Filed under: #jwrd logs, Logs — Jacob Welsh @ 17:54
Day changed to 2021-03-04
[17:54] cruciform: jfw, I won't be needing the office hours support this week, cheers
[18:27] jfw: alrighty
Day changed to 2021-03-05
[21:45] jfw learns that stripping SATA cables, even of the thicker 26 AWG variety, is kind of a bitch
Day changed to 2021-03-08
[04:02] jfw: Finally dug up how to kill the infernal Apple-aping scrollbars in my gentoo environment: cd /usr/share/gtk-2.0 && mv gtkrc gtkrc.disable
[04:04] jfw: gtk apps now look like windows 95 but that's an improvement over the modernity I'd been suffering with.
Day changed to 2021-03-15
[22:08] jfw: cruciform, will you be terribly disappointed if we skip the Tuesday class this week? Main issue is I haven't managed to do much digging yet on the flaky ethernet question or decide what to do about it, yet there's not much we can do without it so I'm not sure it will be a productive use of the time.
[22:09] jfw: Going afk but will be back within the hour to check.
[22:31] cruciform: jfw, no problem
[23:01] jfw: cruciform: thanks, I'll keep you posted but otherwise let's assume we're back on next week.
[23:01] cruciform: ok
[23:07] jfw: to expand a bit for the log: one of the refurb thinkpads we delivered has a NIC that tested out fine on my desk but has now been found to intermittently wedge, with link status according to port LEDs and ifconfig getting stuck either up (when the cable is disconnected) or down (after it's reconnected) and in both cases not passing traffic.
[23:09] jfw: my present approach is to get more acquainted with the relevant 'e1000e' Linux driver documentation and possibly code and look around for known bugfixes
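A rough sketch of how the wedged link state described above could be sampled over time rather than checked once. It assumes a Linux system exposing the standard `/sys/class/net/<iface>/carrier` and `operstate` files; the interface name `eth0` and the helper names `read_link_state`, `sample_link` and `transitions` are made up for illustration and are not from the log. Since the failure described leaves the reported status itself stuck, a log like this would only be telling alongside a note of when the cable was actually plugged or pulled, and a separate traffic test such as a ping.

```python
import time
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return (carrier, operstate) as the kernel reports them for iface.

    carrier is True or False, or None when the file cannot be read
    (for example when the interface is administratively down).
    operstate is a string such as "up" or "down", or None if unreadable.
    """
    base = Path(sysfs_root) / iface
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        carrier = None
    try:
        operstate = (base / "operstate").read_text().strip()
    except OSError:
        operstate = None
    return carrier, operstate


def sample_link(iface, count, interval, sysfs_root="/sys/class/net"):
    """Take count timestamped samples of the link state, interval seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_link_state(iface, sysfs_root)))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def transitions(samples):
    """Keep only the samples where the state differs from the previous one."""
    changes = []
    last = object()  # sentinel that never equals a real state
    for stamp, state in samples:
        if state != last:
            changes.append((stamp, state))
            last = state
    return changes


# Hypothetical usage: watch eth0 for a minute and print each change.
# for stamp, (carrier, operstate) in transitions(sample_link("eth0", 60, 1.0)):
#     print(time.strftime("%H:%M:%S", time.localtime(stamp)), carrier, operstate)
```

A long run that shows no transition across a known unplug or replug would match the "stuck up" or "stuck down" symptom; it would not by itself distinguish a driver fault from a hardware one.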
Day changed to 2021-03-17
[20:52] jfw: sourcerer: grow up & learn to timeout & reconnect on your own already. freenode: you're not even worth grumbling at anymore; you're more of an inanimate object than a 219-LoC logbot.
Day changed to 2021-03-18
[18:18] jfw: cruciform, did you get a chance to look into the workings of the blog much yet?
[19:03] cruciform: jfw, not yet - I've been super busy this week
[19:03] cruciform: on that note, would it terribly inconvenience you to move this tuesday's session - perhaps to Thursday?
[19:06] jfw: alright, well the blog isn't going anywhere. Yes I can switch next week to Thursday.
[19:07] cruciform: ok, let's do that
[19:07] cruciform: 18:00 UTC Thursday?
[19:08] cruciform: how's the troubleshooting of the laptop NIC issue going?
[19:08] jfw: yes, same time is good. Not much to report there yet, started looking into driver docs.
[19:08] cruciform: I diagnosed a faulty RAM stick yesterday - was causing all sorts of issues
[19:09] jfw: in one of the thinkpads or is this a different machine?
[19:09] cruciform: a thinkpad, yes - though not a JWRD one
[19:10] cruciform: an old T400 that I've been using for my prb-node
[19:10] cruciform: crashed during memtest86
[19:11] cruciform: currently running off of a mere 2GB RAM, with a 500MB swap file usage
[19:11] cruciform: trying to find replacement DDR2 SODIMMs
[19:19] jfw: yes finding ddr2 was a pain last I tried, possible but not cheap is my impression
[19:20] cruciform: yes, and given that the machine runs fine as is (albeit with a bit of swap on the SSD), may not bother
[19:21] cruciform: unless you think that'd be unwise (Samsung 860 Evo 1TB)?
[19:22] jfw: prb as I recall relies heavily on the ability of the 'utxo set' to be well cached; so 2G may not continue to be enough, though possibly with ssd this is moot.
[19:22] jfw: if it were me I'd try to max out the machine now, if I thought I wanted to keep it around; getting ddr2 isn't going to get any easier after all
[19:23] cruciform: good point
[19:24] cruciform: in terms of prb's performance, haven't noticed any problems with 2GB RAM + swap
Day changed to 2021-03-23
[18:35] jfw: cruciform: I've read what's supposed to pass for documentation on e1000e, the linux driver for "Intel Pro/1000" PCI-Express NICs such as the "82567LM" found in the Thinkpad X200's ICH9 chipset. There's a number of references to documents long since disappeared from intel.com, and some undefined symbols or abbreviations besides, so no claim to actual understanding is made or could possibly be made.
[18:35] jfw: There are some driver settings that could be tweaked for the sake of experiment, though they don't sound very likely to help to me, and for the most part the defaults are already conservative. ...
[18:38] jfw: For another approach, there are a number of patches to this driver code in the 4.9.x kernel branch subsequent to the 4.9.95 release we've been using, some of which do look conceivably relevant eg "Fix link check race condition". ...
[18:43] jfw: But clearly, these approaches would be new ground being explored from our perspective and not any kind of known solution that we have available to recommend. Thus it bears pointing out the extent of the obligations involved as I understand them:
[18:54] jfw: The machine was delivered as advertised, with working NIC; certainly it's unfortunate that it's been found in the field to exhibit intermittent failure and I'll be tightening the testing regimen to include multiple samples over some time span rather than just once; nonetheless it's understood that this is old and used hardware, not fundamentally of our design, and sold as-is apart from those
[18:54] jfw: aspects we explicitly check, add and remove. This of course is complicated by the situation that we are contracted to complete delivery of the training; however, known and working hardware is required to do this, whether sourced from us or however else; in other words, I don't find that we are required either to replace the laptop or to continue the training until the network problem is resolved,
[18:54] jfw: just as if the laptop had got lost or damaged in your possession. If you disagree, you're welcome to present the case here (within a reasonable time frame). Otherwise, the options I can offer you are:
[19:00] jfw: 1) we declare the previously "secure" machine to now be "online" and vice versa, since the secure machine doesn't need a NIC and perhaps the other one is working better. As far as hygiene, "online" has had relatively low exposure to the outside since its firmware flashing and OS install; that exposure isn't much additional risk in any case since those installation images derive from online
[19:00] jfw: systems, i.e. they aren't some fully independent line of Unixes developed in vitro or something; the airgap discipline is more that the secure machine branches off from that history prior to being used for sensitive information and never subsequently interacts directly with the outside (at least prior to destruction of its storage devices).
[19:03] jfw: 2) you could order a new "online" machine, which would take some time as before; we can probably offer a discount for return of the old one since indeed I'd rather get it back in the lab and dig a bit more on the NIC failure.
[19:06] jfw: 3) you could essentially work with us as "remote lab hands" for driver-level experimentation on the NIC i.e. the knobs and patches aforementioned, understanding that this will take some back-and-forth and isn't guaranteed to fix it; indeed I'd have to say a hardware level failure seems the most likely explanation to me at this point, e.g. cracked solder joints from a decade of thermal
[19:07] jfw: expansion/contraction cycles manifesting as temperature-dependent failure.
[19:16] jfw: or 4) you could help test out a new hardware line for us, replacing the online thinkpad with an apu1 based system. This looks to be where we're headed for the medium term; we now have the supply chain verified, inventory in hand, and proven method for connecting SSD and TRNG such that it's a very promising "TRB node in a box" - a compact, silent and low-power box. This however is a "headless"
[19:16] jfw: machine, requiring an external serial console to configure, and no onboard battery, so less versatile than a laptop; and the performance of TRB on such a system is not yet known though in general the thing seems to be about half the speed of the Thinkpads if I recall.
[19:22] jfw: Personally I'd suggest option 1 as it's by far the simplest and least disruptive. It will involve swapping RAM and HDDs between the two machines, and preferably disconnecting & reconnecting the front-right daughterboard (usb, audio, modem, sdcard) from new-secure and new-online machines respectively. You'll need a pretty standard jeweler's screwdriver.
[19:23] jfw: Phillips head #0 to be specific is known to work.
[19:24] jfw: Let me know if you want to go that route and we can take Thursday to work on it.
[19:29] jfw: The full Thinkpad-replacement idea mentioned earlier involves apu1 as the system board but with mini-PCIe graphics card, some sort of chargeable battery and LCD panel, with a custom enclosure and external USB keyboard.
[19:29] sourcerer: 2021-02-16 04:42:36 (#jwrd) jfw: I'll also leave the teaser that we've begun looking into how to replace our thinkpad offering with all-new parts that aren't such a dwindling resource (not to mention hassle to source).
[19:34] jfw: Better integration can come in time, but in order to get there we need to start with something that we can sell that doesn't require a whole week of my time just to squeeze out two units one of which ends up failing in the field anyway.
Day changed to 2021-03-24
[17:19] cruciform: jfw, thanks for the update - I'll get back to you this eve
Day changed to 2021-03-25
[02:11] cruciform: jfw, let's go with option 1
[02:12] cruciform: (though option 4 is intriguing!)
[17:04] jfw: cruciform: ok, see you in ~1h
[17:09] cruciform: jfw, see you soon
Day changed to 2021-03-29
[19:28] jfw: whaack / Guest66709 or anyone else: interested in a beginner level, self-contained C socket programming task?
Day changed to 2021-03-30
[16:27] cruciform_alt: jfw, I'm running rather behind schedule, and haven't got round to installing the router. May we reschedule for Thursday, at 18:00 UTC? Sorry for the late notice
[16:31] jfw cackles madly (not just because of you though, also this code I was skimming)
[16:31] jfw: yes I can do Thursday.
[16:33] jfw: "#if 0 /* wtf are 680 leds ... *//* <-- WTF is this comment? */" onto which I'd add, "wtf is this whole tarball?"
[16:33] cruciform_alt: heh - in other garbled messes, I'm still unpacking, having moved again this morning
[16:36] jfw: bills of lading all in order?
[16:37] jfw: I'd say something about "at least you're getting practiced at moving efficiently" but... can't say as I have, it's a pain every time
[16:40] cruciform_alt: heh - certainly heavily laden; no documentation, though
[16:41] cruciform_alt: as for efficiency; I've invested in plenty of labels and containers to untangle everything - but ye, moving SUCKS
[16:44] jfw: I suppose the labels & lists get used once there's too much to get away *without* using them, as they take extra time up front, so it doesn't exactly save time compared to a prior situation of less stuff
[16:47] jfw: perhaps the lesser organization is even an improvement for you though
[16:47] sourcerer: 2021-02-05 19:24:56 (#jwrd) cruciform: jfw, I have the opposite problem of neurotically over-cataloging/labelling - as recently, with moving shit into storage (the moving taking 10% the time of labelling)


Powered by MP-WP. Copyright Jacob Welsh.