Fixpoint

2022-12-02

#jwrd Logs for Dec 2022

Filed under: #jwrd logs, Logs — Jacob Welsh @ 04:41
Day changed to 2022-12-02
[04:41] jfw: for the log: this week an old problem bubbled back up for us, where some of our email correspondents' domains fail to resolve on our outbound email server, despite resolving fine using the same djbdns client from other points in the network. cloudflare and akamai came up so far among the failing servers from a preliminary analysis of the resolver logs.
[04:42] jfw: (them and only them)
[04:45] jfw: not wanting to dive deep into dns protocol debugging for someone else so many steps removed, we added a qmail smtproutes file for the problem domains, mapping them statically to IP addresses. but they change from time to time (naturally, what with all that tremendous stability of the cloud) and we don't really have a way of noticing besides "emails not going through"
[04:49] jfw: so I've been putting some more thought & coding into our own monitoring system, into which will plug a dns resolver agent which can run wherever it happens to work, periodically reporting the lookup results for its whole configured list, to a central server which can check for changes as well as unexpected silence from the agent, and report it all in one place.
[04:52] jfw: this obviously supports the mail stack we're putting together for clients.
[04:56] jfw: the more general monitoring system works similarly - a scheduled job on each client queries whatever local data sources, collects them into a report and transmits that to the server.
[05:01] jfw: this is the style that in Nagios is called "passive checks" and in my experience didn't work that well there; but it makes much more sense to me than the "active" approach where the queries are initiated by the server (making it actually... the client??) in one big uber-schedule; just to name one annoyance there, every probe on a given server will light up in red and start sending alerts just
[05:01] jfw: because the network link is down, rather than just the ping probe.
[05:03] jfw: some such 'active' or at any rate remote-initiated checks still probably make sense eg simply checking that expected network ports are responding.
[05:05] jfw: that fits fine in my framework though, it'd just be another kind of agent, which perhaps runs on the report server (but doesn't have to) and simply tests those network connections instead of 'df' etc.
[05:10] jfw: so far it's shaping up to be mostly shell scripts; my first prototype from last year (whose main purpose was detecting changes to the public IP of a remote dhcp-encumbered network, because comcast isn't capable of being an ISP) had some php for the report receiver, but that's getting replaced with plain tcp based on my positive experience with the pastebin service.
[05:15] jfw: it used a cleartext shared secret to authenticate the agent (client), but now I'm doing GPG signing & encryption of the reports. the identity of the sender then determines how exactly the report is interpreted.
Day changed to 2022-12-04
[20:52] jfw: dorion: I've been debating how to process & present the dns change data; the two options that came up are either comparing the current state with the previous, making for an 'edge triggered' configuration which logs the event each time something changes; or else comparing with the currently-in-production data (or a copy thereof, if the servers are different), making it 'level triggered' ie
[20:52] jfw: highlights whether something is currently mismatched but not the changes.
[20:56] jfw: the first seems most useful for capturing the history, while the second for having a single place to check whether there's a mismatch (red light / green light). for the first, the interface could start as just a directory index of event files ordered by mtime, with the browser link coloring serving as 'unread' status.
[21:00] jfw: we may end up wanting both, and that's certainly doable too; but I'm thinking the first is the place to start because it captures the most data for the record.
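The two comparison styles discussed above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dict-of-sets data layout, and the example hostname and addresses (taken from the documentation ranges) are all hypothetical, not the actual monitoring code.

```python
def edge_events(previous, current):
    """Edge-triggered: report every name whose result changed since the last report."""
    events = []
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name), current.get(name)
        if old != new:
            events.append((name, old, new))
    return events


def level_mismatches(production, current):
    """Level-triggered: report names whose lookup currently differs from production."""
    return sorted(name for name in production
                  if current.get(name) != production[name])


if __name__ == "__main__":
    # Hypothetical data: name -> set of addresses.
    production = {"mail.example.com": {"192.0.2.10"}}
    reports = [
        {"mail.example.com": {"192.0.2.10"}},
        {"mail.example.com": {"192.0.2.20"}},
        {"mail.example.com": {"192.0.2.20"}},
    ]
    previous = reports[0]
    for current in reports[1:]:
        print("events:", edge_events(previous, current),
              "mismatched:", level_mismatches(production, current))
        previous = current
```

In this sketch the edge-triggered function goes quiet once the change has been logged, while the level-triggered one keeps flagging the name until the production data is updated, which is the history versus red light / green light distinction.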
[21:09] jfw: rather than dropping files in a web directory it could also simply dispatch emails. the problem with just doing that from the client was "how then do we find out when the client's emails are failing to go through", but now that it's distributed, we can just put the report receiver on the email server itself, eliminating network issues.
[21:11] jfw: the "haven't heard from agent" checking will be another step, because obviously it can't be triggered by report receipt like the rest.
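Since the silence check can't be driven by report receipt, one plausible shape is a separate scheduled job that compares each agent's last-report time against a limit. A minimal sketch, assuming the receiver records a timestamp per agent; the names `silent_agents`, `last_seen` and `max_age` are made up for illustration.

```python
import time


def silent_agents(last_seen, max_age, now=None):
    """Return the agents whose most recent report is older than max_age seconds.

    last_seen maps an agent name to the Unix time of its last report.
    """
    if now is None:
        now = time.time()
    return sorted(agent for agent, seen in last_seen.items()
                  if now - seen > max_age)


if __name__ == "__main__":
    # Hypothetical agents and timestamps; 'now' is fixed so the demo is repeatable.
    seen = {"dns-agent": 100, "disk-agent": 900}
    print(silent_agents(seen, max_age=300, now=1000))
```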
Day changed to 2022-12-05
[15:01] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5612 -- ok, makes sense, sounds good.
[15:01] sourcerer: 2022-12-04 21:00:02 (#jwrd) jfw: we may end up wanting both, and that's certainly doable too; but I'm thinking the first is the place to start because it captures the most data for the record.
[21:37] jfw: taking at least one small step further on the branch of investigating why the dns queries fail in the first place, I found that flushing (removing) the iptables firewall rules "cured" it on the affected server (and yes, other domains still resolve fine with the rules in place).
[21:56] jfw: tcpdumping it, definitely seeing large responses, near but not over the 512b protocol limit; one thing the seemingly dropped/rejected responses have in common is the TC - message truncated - flag being set (shown as a '|' flag in the output)
[22:19] jfw: "With EDNS0, DNS clients (resolvers) can advertise their UDP buffer to the authoritative servers, which would use that value as an upper limit when sending responses. If a response was larger than the EDNS0 buffer advertised by the client, then the authoritative server would truncate it and mark it **TC bit**, so the resolver would use that signal to request the query again, using DNS/TCP." -
[22:19] jfw: needless to say I don't intend to use any "enhanced extended upgraded dns" nor dns/tcp
[22:24] jfw: on the firewall side, I narrowed it down to flushing just the *output* rules (which include a clear "-j ACCEPT -m state --state NEW -p udp --dport domain", and now tried adding RELATED to the ESTABLISHED for good measure, to no effect)
[22:27] jfw: the times that it does work, it seems to be because of trying a different nameserver which gives it a digestible response straightaway. what the firewall has to do with which server it queries, I don't know
[22:29] jfw: a bad response in tcpdump output looks like "23296-| 0/5/17 (498)". 498 is the message length, 0 is the answers, 5/17 are the extraneous records it's dumping to fill the space instead of answering the question
[22:29] jfw: a good one: "19480*- 3/0/0 chan.ns.cloudflare.com. A 108.162.192.82, chan.ns.cloudflare.com. A 173.245.58.82, chan.ns.cloudflare.com. A 172.64.32.82 (88)"
[22:30] jfw: the 'bad' nameservers seem to be 192.* and the 'good', 162.*
[22:35] jfw: eh, that part at least makes sense, the 192.* are root servers and 162.* are cloudflare's own, so the problem is in whether the referral gets through, and most likely my filter is too restrictive to see what's failing there
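The flags and counts that tcpdump summarizes above come from the fixed 12-byte DNS message header. A small sketch of decoding it, following the header layout in RFC 1035; the function name is hypothetical, and the sample header is rebuilt by hand from the 'bad' response quoted above, with the question count of 1 being an assumption since tcpdump's summary doesn't show it.

```python
import struct


def parse_dns_header(message):
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,
        "response": bool(flags & 0x8000),       # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit, tcpdump shows '*'
        "truncated": bool(flags & 0x0200),      # TC bit, tcpdump shows '|'
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }


if __name__ == "__main__":
    # Hand-built header resembling "23296-| 0/5/17": a response with TC set.
    bad = struct.pack("!6H", 23296, 0x8200, 1, 0, 5, 17)
    print(parse_dns_header(bad))
```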
[22:45] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5620 - ha, intentions or not I *am* using dns over tcp, or rather djbdns is when directed to by the fatass responses.
[22:45] sourcerer: 2022-12-05 22:19:36 (#jwrd) jfw: needless to say I don't intend to use any "enhanced extended upgraded dns" nor dns/tcp
[22:46] jfw: and that I learn this is the beauty of a default-deny firewall
[22:49] jfw: dorion: so I'm still pretty happy with the original "nameserver is broken" diagnosis, and the change monitoring still looks quite useful, but now we know specifically what's required for the agent to work.
[22:53] jfw: otherwise, I added a log rule for the outbound blocked traffic, which indeed captures the failed tcp attempts; tons of them in fact, so I've also excluded that from the log, as it's for finding unknown stuff.
[22:58] jfw: "If you're in one of the following situations, you need to configure your DNS server to answer TCP queries: You want to publish record sets larger than 512 bytes. (This is almost always a mistake.)"
[22:58] jfw: ^^^^
[23:04] jfw: but see, cloudflare neeeeeds five nameserver names, each of which with two IPv4s and two IPv6s. because they're now the designated AOL where 'everyone' on the 'independent' 'web' now lives.
Day changed to 2022-12-12
[00:17] jfw: implementing the dns change detection made for a rare occasion to use the 'comm' program
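For readers who haven't met it, comm(1) takes two sorted files and prints three columns: lines only in the first, lines only in the second, and lines in both. A rough Python equivalent of that split, as it might apply to lookup results; this is a sketch using sets, so unlike the real tool it ignores duplicate lines, and the record lines shown are invented examples.

```python
def comm(old_lines, new_lines):
    """Split lines into (only in old, only in new, in both), like comm's three columns."""
    old, new = set(old_lines), set(new_lines)
    return sorted(old - new), sorted(new - old), sorted(old & new)


if __name__ == "__main__":
    # Hypothetical "name address" lines from two successive lookups.
    before = ["mx.example.com 192.0.2.10", "mx.example.com 192.0.2.11"]
    after = ["mx.example.com 192.0.2.11", "mx.example.com 192.0.2.12"]
    removed, added, unchanged = comm(before, after)
    print("removed:", removed)
    print("added:", added)
    print("unchanged:", unchanged)
```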
[06:23] jfw: and with a bit less midnight oil left in the tank: http://fixpoint.welshcomputing.com/2022/jwrd-rng-working-spec/
[14:56] dorion: jfw, niiice !
Day changed to 2022-12-14
[04:18] jfw: found an ugly pitfall in python's distutils/setuptools: 'python setup.py install' won't actually install a new module (.py file) if the target file in /usr/local/lib or wherever has newer mtime than the one you're explicitly telling it to install.
[04:19] jfw: ie they took one fundamental shortcoming of the 'make' approach and made it worse than any makefile in practice ever did.
[04:25] jfw: to expand: you may sometimes conclude things from file timestamps within a bounded frame of reference such as a project's working tree that's only been manipulated in certain 'usual' ways; you may not make such conclusions when installing things i.e. necessarily transplanting them across such a boundary and into a new environment.
[04:29] jfw: in this view it's not merely a pitfall but indeed a bug; and it's now quite impossible to fix because the whole point of the setup.py thing was to reduce proliferation of installation code by having the system already know how to do it right. existing systems won't have the fix.
[04:49] jfw: at least I see there's an "install -f" that gets around it; still messed up that it's not the default, and given that it's not the default, that it doesn't show any positive hint as to the work it's not doing.
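The pitfall can be reproduced in miniature with a timestamp-gated copy. The sketch below illustrates the general "only copy if the source is newer" rule and a force override; it is not distutils' actual code, and every name in it is hypothetical.

```python
import os
import shutil
import tempfile


def newer(source, target):
    """True if target is missing or source has a later mtime than target."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def install_file(source, target, force=False):
    """Copy source over target; without force, silently skip if target looks current."""
    if force or newer(source, target):
        shutil.copyfile(source, target)
        return True
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "module.py")
        target = os.path.join(tmp, "installed_module.py")
        with open(source, "w") as f:
            f.write("VERSION = 2\n")
        with open(target, "w") as f:
            f.write("VERSION = 1\n")
        # Make the stale installed copy carry a later mtime than the source.
        os.utime(source, (1000, 1000))
        os.utime(target, (2000, 2000))
        skipped = not install_file(source, target)
        with open(target) as f:
            after_default = f.read()
        forced = install_file(source, target, force=True)
        with open(target) as f:
            after_forced = f.read()
        return skipped, after_default, forced, after_forced


if __name__ == "__main__":
    print(demo())
```

The default call leaves the old contents in place without any complaint, which is the "no positive hint" problem; only the forced call replaces the file.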
[05:00] jfw: dorion: latest yrc is deployed on the server; restart at your convenience.
[05:02] jfw: and for the log: I did a bunch of work on yrc last week, basically resolving all the most critical shortcomings of the poor too-long-neglected thing. exciting things coming soon, after this round of live testing!
[05:10] jfw: hm, already seeing one detail I don't like about the input history: if the input prompt gets horizontally scrolled due to a long line (pretty common), the scroll position isn't reset when continuing up in the history to shorter lines. It does always place the cursor at end-of-line, ensuring that's always on screen, but looks like that's not quite enough.
[05:13] jfw: already loving the kill-back-word because I often don't notice typos until my fingers already reached the end of the word, so I was stuck tap-tap-tapping backspace, monkey style.
[05:22] jfw: I'm not entirely happy about the existence of horizontal scrolling for input in the first place; it's like that because the implementation was simpler, little more.
[05:23] jfw: 'bash' for instance does multi-line input, though sometimes trips over itself.
[20:03] jfw: http://fixpoint.welshcomputing.com/2022/re-the-panama-gambit-a-new-war-in-serbia/
[20:22] jfw: (updated to add some internal links; one of them just won't send the pingback no matter how many times I mash the button)
Day changed to 2022-12-15
[17:38] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5642 << found a better way: it's possible to set -f/force or any other options as default through the 'options' keyword argument to setup(), in setup.py. slim thanks to the docs, which list it in passing among the supported setup arguments but give no hint as to usage.
[17:38] sourcerer: 2022-12-14 04:49:04 (#jwrd) jfw: at least I see there's an "install -f" that gets around it; still messed up that it's not the default, and given that it's not the default, that it doesn't show any positive hint as to the work it's not doing.
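For anyone else hunting for the usage: as far as I can tell, the 'options' argument takes a dict mapping a command name to a dict of that command's option defaults. A sketch of a setup.py using it, assuming plain distutils (removed from the standard library as of Python 3.12; setuptools' setup() accepts the same keyword); the project and module names are placeholders.

```python
from distutils.core import setup

setup(
    name="example",            # placeholder project name
    version="0.1",
    py_modules=["example"],    # placeholder single-module project
    # Per-command defaults: command name -> {option name: value}.
    # Here: make 'install' behave as if -f / --force were always given.
    options={"install": {"force": True}},
)
```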
[17:53] jfw: dorion: yrc redeployed on server with the easy fix for http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5645 .
[17:53] sourcerer: 2022-12-14 05:10:42 (#jwrd) jfw: hm, already seeing one detail I don't like about the input history: if the input prompt gets horizontally scrolled due to a long line (pretty common), the scroll position isn't reset when continuing up in the history to shorter lines. It does always place the cursor at end-of-line, ensuring that's always on screen, but looks like that's not quite enough.
[23:38] jfw: found & fixed another annoyance: digits not counting as "word" for the purpose of finding word boundaries. not redeploying quite yet, in case more comes up.
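To show the kind of word-boundary rule involved, here is a sketch of a kill-back-word that counts digits as well as letters as word characters. It is not yrc's code; the function names and the choice of `str.isalnum` as the definition of "word" are assumptions for illustration.

```python
def is_word_char(ch):
    """Treat letters and digits alike as part of a word."""
    return ch.isalnum()


def kill_back_word(text, cursor):
    """Delete from the cursor back to the start of the previous word.

    Returns the new text and the new cursor position.
    """
    start = cursor
    # Skip any separators immediately before the cursor.
    while start > 0 and not is_word_char(text[start - 1]):
        start -= 1
    # Then skip the word itself.
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    return text[:start] + text[cursor:], start


if __name__ == "__main__":
    print(kill_back_word("ubuntu 14.04", 12))
    print(kill_back_word("hello world  ", 13))
```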
Day changed to 2022-12-18
[03:58] jfw: http://fixpoint.welshcomputing.com/2022/gpg-was-already-broken-on-centos-6/
[04:05] jfw: and in wordpress/mp-wp weird, somehow I can't have paragraph breaks in footnotes now.
[04:06] jfw: (it shows in the identifier tooltip only)
Day changed to 2022-12-20
[04:33] dorion: welcome back jwm !
Day changed to 2022-12-21
[17:50] jfw: I surmise he got a burst of energy to dust off the linux & bitcoin stuff for one last try before I leave
[17:55] jfw: my small order of breadboards arrived (Elegoo, chinese); sadly the tidy Labrador hookup doesn't work because its pin positioning expects the power rail holes to be staggered halfway between the columnar holes, like the letter alignment between two rows on a keyboard, whereas this breadboard has them all aligned in
[17:55] jfw: a straight grid.
[18:03] jfw: but, rotating it 180 degrees works decently, with only the power supply and 'not connected' pins in the board. holds it in place securely enough, with the row of 10 output & signal gen pins hanging off the top. not a big loss since they can easily be jumpered in from the top if needed.
[18:04] jfw: clever lil board.
[19:30] grethel: testing uno, dos, tres.
[19:30] jfw: hello grethel, how's it going?
[19:31] jfw: you've found the channel!
[19:34] grethel: Hello, I'm well, and you ?
[19:37] grethel: dorion is teaching me irc
[19:37] grethel: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5665 -- test.
[19:37] sourcerer: 2022-12-21 19:30:27 (#jwrd) grethel: testing uno, dos, tres.
[19:40] jfw: I'm well too, thanks. it's simple but useful, the logbot
[19:43] grethel: jfw, I'm leaving now, see you later.
[19:44] jfw: or see you soon, it seems
[21:16] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5660 -- ok.
[21:16] sourcerer: 2022-12-21 17:50:50 (#jwrd) jfw: I surmise he got a burst of energy to dust off the linux & bitcoin stuff for one last try before I leave
[21:20] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5661 -- hmm, ok. is there a breadboard that is compatible w/ the Labrador ? the linked picture doesn't have the staggered breadboard holes either.
[21:20] sourcerer: 2022-12-21 17:55:59 (#jwrd) jfw: my small order of breadboards arrived (Elegoo, chinese); sadly the tidy Labrador hookup doesn't work because its pin positioning expects the power rail holes to be staggered halfway between the columnar holes, like the letter alignment between two rows on a keyboard, whereas this breadboard has them all aligned in
[21:20] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5665 -- welcome.
[21:20] sourcerer: 2022-12-21 19:30:27 (#jwrd) grethel: testing uno, dos, tres.
[21:50] jfw: dorion: no, it does; guess I better illustrate.
[22:00] jfw: dorion: http://fixpoint.welshcomputing.com/wp-content/uploads/2022/12/labrador-breadboard-alignment.jpg - clearer?
[22:04] jfw: I haven't turned up anything yet on whether this is even supposed to be a standard breadboard characteristic; just that the hole spacing is .1" and .3" across the center gap
Day changed to 2022-12-22
[15:46] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5682 -- yeah, thanks.
[15:46] sourcerer: 2022-12-21 22:00:56 (#jwrd) jfw: dorion: http://fixpoint.welshcomputing.com/wp-content/uploads/2022/12/labrador-breadboard-alignment.jpg - clearer?
Day changed to 2022-12-23
[23:13] grethel: jfw, could you recommend me a beginner programming book?
Day changed to 2022-12-24
[00:34] jfw: grethel: not in general like that - it's a long time since I was a beginner and the environment has changed - but perhaps I can help you pick one. Are there any you looked at so far? Do you have something in mind that you want to do with it?
[00:47] jfw: grethel: one way to approach it is to figure out what language you want to learn first. There are some core concepts they all have in common, but each has its own details, accidental complexities and other baggage that you will have to spend time learning to work with. So I figure, if you have to invest that time anyway, best to do it with something you're likely to use in practice. However, some
[00:47] jfw: are definitely worse than others.
[01:25] jfw: my general advice there would be to start with a *simple but difficult* language like C and *not* a more popular "safe, beginner friendly" one like Python or Java (I guess JavaScript these days).
[01:31] jfw: for one thing, you will find out quickly if it's not for you; otherwise, it will force you to learn good habits early, acquiring discipline and rigorous thinking that will serve you well in any language and set you apart from the "just poke around until it seems to work" mediocrities.
[19:29] jfw: grethel: to narrow down the choices, after asking around a bit, dorion & I are happy with recommending either Ada, which has a book by John Barnes, or Scheme, which has SICP.
[19:32] jfw: both of these have a tradition of use for teaching languages, and are also used in practice in our own network, for instance Eulora uses Ada and Gales Bitcoin Wallet uses Scheme.
[19:35] jfw: R also came up favorably for data processing.
[19:42] jfw: and in general, stick around and don't be afraid to ask questions or say what's on your mind. there's a lot of support available here for those who want to learn :)
[19:59] jfw: since I see the SICP site changed its URL but provided a handy zip file, I've mirrored it for safe keeping.
Day changed to 2022-12-25
[00:17] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Dec-2022/#5644 - this is still going to happen; it's been a bit of a pain reducing the history from fine-grained commits down to cohesive readability-oriented patches but I've got halfway through by count, possibly more by substance.
[00:17] sourcerer: 2022-12-14 05:02:31 (#jwrd) jfw: and for the log: I did a bunch of work on yrc last week, basically resolving all the most critical shortcomings of the poor too-long-neglected thing. exciting things coming soon, after this round of live testing!
[00:22] jfw: in particular there were some moves of large functions within the file, as I became more aware of what the ordering should be as I went, which diff doesn't handle well especially when combined with other nearby changes, so I tried to reduce the noise by introducing things in the right place the first time, and of course got that slightly wrong on the first try.
Day changed to 2022-12-26
[19:09] dorion: jfw, retrying the ubuntu 14.04 install on grethel's hp laptop, the error upon trying to partition the disks is : "the efi file system creation in partition #1 of SCSI1 (0,0,0) (sda) failed."
[19:10] dorion: for the log, this HP Pavilion 15-eg0025nr doesn't have a legacy BIOS option, forces UEFI.
[19:11] dorion: jfw, I think the failure may be related to the samsung nvme disk it has.
[19:12] jfw: dorion: does it have any other logs perhaps on tty1-5?
[19:12] dorion: jfw, gonna take me a couple mins to check.
[19:18] dorion: I have half a mind to see if the chino will trade it for 2 lenovos similar to the one I just picked up.
[19:18] jfw: ahh, nvme, sounds plausible because that won't be an sd* at all
[19:23] dorion: the one he sold me had nvme, in the sata bay, but he had an sata adapter and swapped the nvme for the sata.
[19:24] jfw: "nvme in the sata bay" - I didn't think this was possible, did it perhaps have a mini sata adapter of some sort on it already?
[19:25] jfw: like, I thought the whole point was to attach it directly to PCI, bypassing the bridging to & from SATA protocol.
[19:27] dorion: yeah, he had an adapter.
[19:54] jfw: looking at flight options, I'm thinking of springing for the business class (663 for Copa, 619 for American); the alternative is anyway around $400 at best for 2 checked bags. the extra weight capacity could be helpful for the move and getting ahead of the herd should be helpful for my nerves
[19:56] dorion: sounds good.
[19:56] jfw: dorion, I recall you did a vacuum pack for clothing once; any tips on that?
[19:57] dorion: no tips really, but can confirm it worked fine, pretty self-explanatory.
[19:58] dorion: I can look up which I went with if desired.
[19:58] jfw: please; always better to have something at least known to work than sorting through the torrent of same-looking amazon junk
[20:10] dorion: https://www.amazon.com/20-Compression-Comforters-Blankets-Included/dp/B0973DGD8P
[20:12] jfw: dorion: thanks. included hand pump sounds good just in case the electric vac is out of commission.
[20:14] dorion: exactly.
[20:15] jfw: grethel: regarding the Barnes Ada book if you go for it, note that it's editions corresponding to Ada 2012 to look for; newer is possibly fine but not worth paying extra. Basically, the 2012 revision of the language brought significant and useful changes, while the newer ones brought much more dubious ones.
[20:15] sourcerer: 2022-12-24 19:29:38 (#jwrd) jfw: grethel: to narrow down the choices, after asking around a bit, dorion & I are happy with recommending either Ada, which has a book by John Barnes, or Scheme, which has SICP.
Day changed to 2022-12-27
[19:16] jfw: I played with my new toys a bit more the other day, it being Christmas after all; and I must sadly report that the Labrador, for all its great potential in theory, in practice ain't worth the cost of postage at least to me (nor can I really imagine for whom it would be worth something - noobs who don't know basic electronics?
[19:16] sourcerer: 2022-12-21 17:55:59 (#jwrd) jfw: my small order of breadboards arrived (Elegoo, chinese); sadly the tidy Labrador hookup doesn't work because its pin positioning expects the power rail holes to be staggered halfway between the columnar holes, like the letter alignment between two rows on a keyboard, whereas this breadboard has them all aligned in
[19:16] jfw: they're in even worse position to work around broken tools)
[19:18] jfw: not sure it's even worth delving but for starters I can now confirm firsthand that the 24 open / 170 closed github 'issues', while surely not all legitimate or bugs, are indeed a good estimator of the software's bugginess.
[19:19] jfw: on windows no less, the only platform where it even works at all.
[19:20] jfw: (well I guess I can't speak for mac.)

Powered by MP-WP. Copyright Jacob Welsh.