Fixpoint

2022-11-02

#jwrd Logs for Nov 2022

Filed under: #jwrd logs, Logs — Jacob Welsh @ 04:58
Day changed to 2022-11-02
[04:58] jfw: dorion: latest update
[20:58] jfw: so my original confusion re module vs driver was quite justified by the sloppy usage found in dovecot. lacking a proper definition, they came up with ./configure options named like
[20:58] sourcerer: 2022-10-13 04:56:21 (#jwrd) jfw: ("module" seems to mean the same as "plugin" from what I've seen so far)
[20:58] jfw: --with-foobar=plugin, nevermind that foobar isn't and never was a plugin and what they meant was module.
[20:58] jfw: *module vs plugin
[21:36] jfw: dorion: any druthers on what to do with libauthdb_imap, per that latest comment? that is, I can either remove it permanently or leave it as a working option by adding another static module loading category for it. easy enough either way. it's for delegating password checks to a second IMAP server; not really sure when or why one would want that.
[21:41] jfw: besides that, 5, 7 and 13 (old-stats and ssl stuff), I'm about done with the cleanups & conversions to builtin indicated in the article and comments.
[21:45] jfw: namely: removing plugins fs-compress, mail-crypt, var-expand-crypt, fts-lucene; removing now-unused load points 1, 2 (one of its two), 4, 6, 9, 11; removing the 'plugin mode' to force builtin or absent for authdb_ldap, authdb_lua, mech_gssapi, and the 4 sql drivers; removing /proc/self/io stats.
[21:49] jfw: the ssl ones might not be too bad once I stop avoiding them; but, if we're quite satisfied with leaving the ssl support broken, I could just stub them out too.
Day changed to 2022-11-03
[01:14] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5274 -- my first impulse is to say rip it out and see if it breaks anything, could always fall back to sustaining it to the new static modules structure. though on further consideration, that opens us up to potentially even more time sink. so I say go for the static modules structure from the start.
[01:14] sourcerer: 2022-11-02 21:36:23 (#jwrd) jfw: dorion: any druthers on what to do with libauthdb_imap, per that latest comment? that is, I can either remove it permanently or leave it as a working option by adding another static module loading category for it. easy enough either way. it's for delegating password checks to a second IMAP server; not really sure when or why one would want that.
[01:15] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5275 -- nice !
[01:15] sourcerer: 2022-11-02 21:41:55 (#jwrd) jfw: besides that, 5, 7 and 13 (old-stats and ssl stuff), I'm about done with the cleanups & conversions to builtin indicated in the article and comments.
[01:29] jfw: dorion: sounds good, and I'm doing similar with the ssl stuff; feels a bit silly to have all that overhead just to load a single predetermined module, but that's just the way this beast is shaped. better not to mess too much with stuff I have no real way of testing atm.
[01:31] jfw: we're not aiming here to make it beautiful, just to wash the mud off its feet.
[02:02] dorion: jfw, 10-4.
[03:17] jfw: just lovely... some of the modules' <modname>_init functions seem to use an incorrect function signature, taking 'void' (no arguments) instead of the one 'struct module *' argument. seen so far: the authdb_*, mech_*, and SQL driver_* that I converted to builtin only (removing the offending module init code anyway), and now authdb_imap which I suspect the compiler will complain about now that dlsym is out and it has a chance to catch the error
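For illustration, a sketch of what one of those static module categories might look like once dlsym() is gone, with every init function taking the `struct module *` argument. `struct module` here is a minimal stand-in, and the module names, `example_modules` and `example_modules_init_all` are hypothetical, not dovecot's. The point is the type check: if one entry were declared as `void foo_init(void)`, putting it in the `example_inits` table would draw an incompatible-pointer-type diagnostic, whereas a pointer fetched through dlsym() and cast gives the compiler no chance to notice.

```c
#include <stddef.h>

/* Minimal stand-in for a module descriptor (hypothetical fields). */
struct module {
    const char *name;
    int initialized;
};

/* The correct shape: every <modname>_init takes its struct module pointer. */
typedef void module_init_func_t(struct module *module);

/* Hypothetical example modules. */
static void authdb_example_init(struct module *module)
{
    module->initialized = 1;
}

static void stats_example_init(struct module *module)
{
    module->initialized = 1;
}

/* One static "category": a fixed table in place of dlopen() + dlsym(). */
static struct module example_modules[] = {
    { "authdb_example", 0 },
    { "stats_example", 0 },
};

static module_init_func_t *const example_inits[] = {
    authdb_example_init,
    stats_example_init,
};

/* Calls each init with its own module; returns how many report success. */
size_t example_modules_init_all(void)
{
    size_t i, count = 0;

    for (i = 0; i < sizeof(example_inits) / sizeof(example_inits[0]); i++) {
        example_inits[i](&example_modules[i]);
        count += (size_t)example_modules[i].initialized;
    }
    return count;
}
```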
[03:34] jfw: well, there's a ways to go still on the build system rewiring so I guess we'll see when we get there.
[16:39] jfw: down to just #7, the plugins to the old-stats process, not to be confused with the old-stats plugin.
[17:37] jfw: and look at that, those two undocumented modules (old_stats_mail and stats_auth) also fall into the wrong-signature camp.
[17:40] jfw: I'm just taking the usual by now approach with them, not scrutinizing their merit but just adding yet another static module category.
Day changed to 2022-11-04
[01:19] jfw: that and a final clipping of leftover dynamic module system tendrils is done. finally on to giving that new makefile & config.h a try.
[02:26] jfw: a first obstacle, a list of 302 settings in the generated config.h to sift through.
[02:26] jfw: for the first sift, perhaps a script to check which are actually referenced in the code.
[02:35] jfw: at least that weeds out 43 of them.
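A minimal sketch of such a check, assuming the config.h macro names and each source file's contents are already in hand (reading the files, and excluding config.h itself from the scan, are left out). It matches whole identifiers only, so HAVE_SQLITE doesn't count as referenced just because HAVE_SQLITE3 is; it does not skip comments or string literals, so it can overcount. `identifier_referenced` is a hypothetical name, not the script actually used.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if c can appear inside a C identifier. */
static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns true if `name` occurs in `src` as a whole identifier, i.e. not
   embedded in a longer one (HAVE_FOO must not match HAVE_FOO_BAR). */
bool identifier_referenced(const char *src, const char *name)
{
    size_t len = strlen(name);
    const char *p = src;

    if (len == 0)
        return false;
    while ((p = strstr(p, name)) != NULL) {
        bool start_ok = (p == src) || !is_ident_char(p[-1]);
        bool end_ok = !is_ident_char(p[len]);

        if (start_ok && end_ok)
            return true;
        p++;   /* keep searching past this partial match */
    }
    return false;
}
```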
[02:42] jfw: though those 43 aren't necessarily the ones that ought to go, for instance instead of using the perfectly good HAVE_SQLITE, they instead use BUILD_SQLITE which is defined in some more elaborate way but seems to mean exactly the same thing.
[02:43] jfw: I think I'll ignore that "ought" though and go for minimal code changes.
[04:42] jfw: openssl kids pissing in the water supply and the community drinking it up as usual - serves me right for thinking I might give post-2016 software a chance.
[04:44] jfw: from their docs, "ASN1_STRING_data() is similar to ASN1_STRING_get0_data() except the returned value is not constant. This function is deprecated: applications should use ASN1_STRING_get0_data() instead."
[04:46] jfw: which is to say, "we made up some new thing for no particular reason, everyone should now use it so as to force everyone to follow our changes so that we can look relevant"
[04:47] jfw: and I guess that's how we end up with 302 config knobs.
Day changed to 2022-11-05
[03:45] jfw: seeing rampant unchecked heap allocations (BN_new, ECDSA_SIG_new etc) in lib-dcrypt/dcrypt-openssl.c. supposedly the worst impact would be crashing with null pointer deref instead of failing in some more orderly way due to memory exhaustion
[03:46] jfw: (found by way of looking at the usage of some more of these newfangled openssl functions)
[03:54] jfw: coming in through 2019 and 2020 commits to add support for 'JSON Web Signature' and 'JSON Web Key' RFCs
[03:56] jfw: because asn.1 wasn't bad enough on its own, clearly.
[04:20] jfw: hmph, this whole 'lib-dcrypt' thing only dates to 2016.
[04:56] jfw: found a first incorrect autoconf result: it concluded 'mremap' isn't supported, because it's using the glibc private __USE_GNU instead of the documented _GNU_SOURCE macro to enable its visibility.
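A sketch of the difference, assuming a Linux target (mremap is a Linux extension): defining the documented `_GNU_SOURCE` before any header is what makes glibc and musl alike declare mremap() and MREMAP_MAYMOVE, whereas `__USE_GNU` is glibc's internal derivative of it. This is not dovecot's actual configure test; `mremap_works` is a hypothetical name.

```c
/* Must come before every #include: the documented feature-test macro.
   Applications are not meant to set glibc's internal __USE_GNU. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Grows an anonymous one-page mapping to two pages with mremap().
   Returns 1 on success, 0 on failure. */
int mremap_works(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *q;

    if (p == MAP_FAILED)
        return 0;
    q = mremap(p, page, 2 * page, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, page);
        return 0;
    }
    munmap(q, 2 * page);
    return 1;
}
```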
[18:28] jfw: dorion: I'm being quite tempted to rip out this 'dcrypt' thing altogether, since it keeps coming up in a bad way and far as I can see has no legitimate use
[18:28] sourcerer: 2022-10-22 17:32:13 (#jwrd) jfw: which brings us to the thirteenth: lib-dcrypt/dcrypt.c, which loads libdcrypt_openssl, which is built if BUILD_DCRYPT_OPENSSL. this is another that's linked into ~everything by way of libdovecot, but the things that actually attempt to initialize it are few: mail-crypt and var-expand-crypt (plugins), passdb_oauth2 (builtin), and some doveadm dumping commands.
[18:40] jfw: the JWS/JWK stuff seems only used for oauth2. since I had trouble retracing where I even found the JWS term, the only reference is commit 7dee27819, Sep 2019, "dcrypt: Add signature format / Needed to implement RFC7515"
[18:41] jfw: so... I guess they added this dependency but didn't then actually implement the thing.
[18:47] jfw: from a quick scan for heathen homebrew crypto, they're also carrying imported implementations of md4, md5, sha1, sha2, sha3 and hmac, and nominally their own crc32 (table based) and pkcs5. these aren't under dcrypt though and may be trickier to cut.
[18:48] jfw: I'd probably class dcrypt as a whole as heathen homebrew crypto actually; uses openssl primitives but no telling what dubious things it's doing with them. All elliptic curve based, ofc.
[18:53] jfw: dorion: so I'm open to input re dcrypt; on the costs of not cutting side, there's at least 3 more of these openssl api config switches to plow out
[18:53] sourcerer: 2022-11-04 04:46:52 (#jwrd) jfw: which is to say, "we made up some new thing for no particular reason, everyone should now use it so as to force everyone to follow our changes so that we can look relevant"
[18:56] jfw: vs. 5 down so far from dcrypt and 1 from elsewhere.
[18:58] jfw: I'd say the config.h sanity-checking process although laborious has been valuable so far; I'm about halfway through the file.
[18:58] jfw: I've fixed the mremap code.
[18:59] jfw: (that was a musl compat issue if it was unclear)
[20:01] jfw: 'OAuth 2.0, which stands for "Open Authorization", is a standard designed to allow a website or application to access resources hosted by other web apps on behalf of a user... and is now the de facto industry standard for online authorization.' afaik, the soi-disant 'industry' there means google + facebook
[20:08] jfw: as to what it claims to want to be - a delegated authorization mechanism, I'm picturing something like an accounting SaaS connecting to a bank account with read-only access - I don't see the need when you can simply set up dedicated API keys with the desired privileges and give those to the 3rd party services in place of your main account credentials. and as to what it's actually used for esp. wrt. dovecot, it seems to be merely a 'login with facebook' button, i.e. 'the industry' becomes trusted gatekeeper and can help themselves to all your users' data, or at least the users within their domain if that minimal level of segregation is maintained.
[20:38] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5313 -- sounds ok to me to rip it out.
[20:38] sourcerer: 2022-11-05 18:53:25 (#jwrd) jfw: dorion: so I'm open to input re dcrypt; on the costs of not cutting side, there's at least 3 more of these openssl api config switches to plow out
[20:38] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5316 -- nice.
[20:38] sourcerer: 2022-11-05 18:58:04 (#jwrd) jfw: I'd say the config.h sanity-checking process although laborious has been valuable so far; I'm about halfway through the file.
[20:41] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5319 -- exactly.
[20:41] sourcerer: 2022-11-05 20:01:28 (#jwrd) jfw: 'OAuth 2.0, which stands for "Open Authorization", is a standard designed to allow a website or application to access resources hosted by other web apps on behalf of a user... and is now the de facto industry standard for online authorization.' afaik, the soi-disant 'industry' there means google + facebook
[20:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5317 -- nice, sounds good.
[20:43] sourcerer: 2022-11-05 18:58:42 (#jwrd) jfw: I've fixed the mremap code.
[23:12] jfw: dcrypt + oauth2 pruning seems to have gone quite smoothly (the only doubt being that I don't have a build to test yet); they were pretty well modularized, no tangle of deep roots and tendrils, perhaps since they're such recent additions anyway.
[23:24] dorion: cool.
Day changed to 2022-11-06
[17:30] jfw: what that looks like: 1, 2
[17:59] jfw: a confused configure switch description; what they meant was that *their code* will define socklen_t to int if you don't have it, as determined by *not* defining HAVE_SOCKLEN_T in config.h.
[18:02] jfw: although what it actually says would be simpler, simply #defining socklen_t to int in config.h where necessary.
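The two readings side by side, as a sketch; `sockaddr_storage_len` is a hypothetical helper, and dovecot's exact config.h wording isn't reproduced here.

```c
#include <sys/socket.h>

/* config.h, "what they meant" reading: define this where the system
   headers provide socklen_t, leave it undefined where they don't. */
#define HAVE_SOCKLEN_T 1

/* In the code proper: the fallback applies only when the switch is absent. */
#ifndef HAVE_SOCKLEN_T
typedef int socklen_t;
#endif

/* The simpler reading would instead put this in config.h itself, and only
   on systems that need it:
       #define socklen_t int                                              */

/* Either way, code can then use socklen_t uniformly. */
socklen_t sockaddr_storage_len(void)
{
    return (socklen_t)sizeof(struct sockaddr_storage);
}
```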
[18:21] jfw: a very smelly bit, preventing a single config.h from working on different architectures. not sure if they could have done better, might just be one of those toxic waste products of the C world.
[18:50] jfw: a dubious check addition - "fix the build" by simply dropping the attempted initialization step; does that mean the initialization wasn't necessary? then why keep it at all?
[18:51] jfw: in this case, the function exists in libressl in gales, so I'm just dropping the check.
[18:52] jfw: unfortunately there's still a bunch more of these on the older & more basic "ssl iostream" code as opposed to the snipped "dcrypt".
[19:00] jfw: next I learn that openssl has undocumented public APIs despite their otherwise hefty man page collection: SSL_get_current_compression, SSL_COMP_get_name etc.
[19:01] jfw: there's a reason though; from the openssl 1.0.2 docs, "Once the identities of the compression methods for the TLS protocol have been standardized, the compression API will most likely be changed. Using it in the current state is not recommended." meanwhile in libressl, "These functions are deprecated and have no effect. They are provided purely for compatibility with legacy application code."
[19:03] jfw: possibly 'the industry' gave up pushing the integrated compression thing after the 'CRIME' attack disclosure.
[19:10] jfw: which was like 2012, meaning dovecot didn't get the memo after 10 years.
[20:16] jfw: it's looking like TLS 1.3 support is set to become an increasing obstacle to heathen interfacing - if it isn't there already, eg my link archiver running on centos with stock python+openssl already fails to connect to some servers.
[20:19] jfw: my hunch is it'll be more of a problem on the client side like in that example ("surely everyone runs the Latest Browser!") than on the server side (where it's more likely to hurt the client developers who drop compatibility with existing service providers)
[20:20] jfw: at least it looks like we'll have the option of upgrading libressl in gales, though I wouldn't do that lightly.
[20:29] jfw: if it's true that it's only a client side problem, then it might be preferable to just do the bridging on toilet boxes
[20:31] jfw: in the larger view, ssl isn't even the only game in town anymore, eg telegram cooked up their own protocol, perhaps whatsapp likewise
[20:37] jfw: plus, the 'upgrade libressl' path comes with no guarantee that it won't need to be done again, and again
[20:42] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5332 -- "34 files changed, 14304 deletions(-)" -- nice !
[20:42] sourcerer: 2022-11-06 17:30:08 (#jwrd) jfw: what that looks like: 1, 2
[20:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5335 -- myeah
[20:43] sourcerer: 2022-11-06 18:21:23 (#jwrd) jfw: a very smelly bit, preventing a single config.h from working on different architectures. not sure if they could have done better, might just be one of those toxic waste products of the C world.
[20:44] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5336 -- only thought criminals ask such questions, jfw.
[20:44] sourcerer: 2022-11-06 18:50:39 (#jwrd) jfw: a dubious check addition - "fix the build" by simply dropping the attempted initialization step; does that mean the initialization wasn't necessary? then why keep it at all?
[20:44] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5337 -- makes sense.
[20:44] sourcerer: 2022-11-06 18:51:33 (#jwrd) jfw: in this case, the function exists in libressl in gales, so I'm just dropping the check.
[21:43] jfw: I'd think http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5346 is the greater thought crime, as seen from both the "omg why wouldn't you upgrade?" and the "omg why would you talk to heathens?" camps
[21:43] sourcerer: 2022-11-06 20:29:35 (#jwrd) jfw: if it's true that it's only a client side problem, then it might be preferable to just do the bridging on toilet boxes
Day changed to 2022-11-08
[19:48] jfw: possibly the justification for unchecked allocations was that they replace the ssl/crypto memory allocation routines with ones that explicitly abort on failure
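A minimal sketch of that kind of allocator, with a hypothetical name. How it gets installed into the crypto library is the part that varies (per the surrounding discussion, the OpenSSL hook changed signatures and the approach doesn't carry over to LibreSSL), so only the wrapper itself is shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* Never returns NULL, so callers need no check: on exhaustion the process
   stops here with a message, instead of later at some arbitrary NULL
   dereference. A zero-byte request is bumped to 1 so that a NULL return
   from malloc always means failure. */
void *xmalloc_or_abort(size_t size)
{
    void *p = malloc(size == 0 ? 1 : size);

    if (p == NULL) {
        fprintf(stderr, "Out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}
```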
[19:48] sourcerer: 2022-11-05 03:45:55 (#jwrd) jfw: seeing rampant unchecked heap allocations (BN_new, ECDSA_SIG_new etc) in lib-dcrypt/dcrypt-openssl.c. supposedly the worst impact would be crashing with null pointer deref instead of failing in some more orderly way due to memory exhaustion
[19:51] jfw: this however won't work on libressl, on which a twitter thread was the whole of the 'discussion' that turned up.
[19:55] jfw: while on the openssl side, they quietly changed the function signatures at some point, resulting in this HAVE_SSL_NEW_MEM_FUNCS mess. on the bright side, I can just drop all that code.
[21:16] jfw: possibly the first case of "woah, I actually have no idea what to do about this" - interaction of mmap() and write()
[21:18] jfw: specifically, it's testing whether write() calls to a file are immediately visible through a memory mapping of the same file, and at least on linux it concluded that they are.
[21:19] jfw: on first glance I can't tell from the docs whether this can be relied on, and evidently they couldn't either; but it's also unclear that the test is reliable.
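A sketch of what such a probe does, under stated assumptions: POSIX mkstemp/pwrite/mmap, a writable /tmp, and a one-page file. It is not dovecot's autoconf test, and `mmap_sees_write` is a hypothetical name. One positive result (as reported for Linux) is an observation about this system, not something the standard promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Is a write() to a file immediately visible through an existing
   MAP_SHARED mapping of it? Returns 1 if yes, 0 if not, -1 if the probe
   couldn't run. */
int mmap_sees_write(void)
{
    char path[] = "/tmp/mmap-probe-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    volatile char *map;
    int fd, result = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file lives on until fd is closed */

    if (page > 0 && ftruncate(fd, page) == 0) {
        map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            (void)map[0];         /* fault the page in before writing */
            if (pwrite(fd, "x", 1, 0) == 1)
                result = (map[0] == 'x');
            munmap((void *)map, (size_t)page);
        }
    }
    close(fd);
    return result;
}
```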
[21:22] jfw: if I set it to the conservative option ("shared mmaps don't get updated by write()s" as they put it) then it effectively disables at least some uses of mmap
[21:26] jfw: posix says "The application must ensure correct synchronization when using mmap() in conjunction with any other file access method"
[21:28] jfw: linux says of MAP_SHARED, "Updates to the mapping are visible to other processes mapping the same region, and (in the case of file-backed mappings) are carried through to the underlying file." but this doesn't treat the reverse direction.
[22:10] jfw: the history indicates that the check was added (and modified in various ways over time) by necessity for OpenBSD, versions up to 3.5 at least.
[22:14] jfw: there's also an "mmap_disable" setting which does about the same thing, overridden by the detected MMAP_CONFLICTS_WRITE. Indexes on NFS also come up as a case where it's needed.
[22:17] jfw: so I figure we're safe enough to leave MMAP_CONFLICTS_WRITE off (which due to its negative definition means leaving mmap enabled).
[22:18] jfw: but adding the openbsd context to the config.h comments.
Day changed to 2022-11-09
[00:20] jfw: I'm about 90% through config.h now, mostly out of the SSL thickets and into the integer type swamp
[00:20] sourcerer: 2022-11-06 18:21:23 (#jwrd) jfw: a very smelly bit, preventing a single config.h from working on different architectures. not sure if they could have done better, might just be one of those toxic waste products of the C world.
[00:24] jfw: the planned approach is to take advantage of musl and gcc specifics so that one static config.h will work at least on any platform we port Gales to.
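One possible shape for that, assuming the per-architecture values in question are type sizes: let gcc's predefined `__SIZEOF_*__` macros (also provided by Clang) supply them, so the same config.h stays correct on any target the compiler is invoked for. The SIZEOF_* names follow the usual autoconf pattern; which of them dovecot actually consults, and `long_is_64bit`, are assumptions for illustration.

```c
#include <stddef.h>

/* config.h fragment: sizes come from the compiler, which knows the target,
   rather than being numbers configure measured on the build host. */
#define SIZEOF_INT     __SIZEOF_INT__
#define SIZEOF_LONG    __SIZEOF_LONG__
#define SIZEOF_VOID_P  __SIZEOF_POINTER__
#define SIZEOF_SIZE_T  __SIZEOF_SIZE_T__

/* Still visible to the preprocessor, so code can branch on them with #if. */
#if SIZEOF_LONG == 8
#  define LONG_IS_64BIT 1
#else
#  define LONG_IS_64BIT 0
#endif

/* Compile-time cross-checks against sizeof (C11). */
_Static_assert(SIZEOF_LONG == sizeof(long), "SIZEOF_LONG mismatch");
_Static_assert(SIZEOF_VOID_P == sizeof(void *), "SIZEOF_VOID_P mismatch");
_Static_assert(SIZEOF_SIZE_T == sizeof(size_t), "SIZEOF_SIZE_T mismatch");

int long_is_64bit(void)
{
    return LONG_IS_64BIT;
}
```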
[02:33] jfw: another oddity, AC_DEFINE(STATIC_CHECKER,, [Building with static code analyzer]). at first I thought it was the "do it right only when the boss is looking" thing, but then I find it *reducing* the amount of static checking being done when STATIC_CHECKER is enabled.
[20:56] jfw: so triggered. adding new y2ks because we totally didn't have enough of those already.
[21:06] jfw: and the usage looks just as FUBAR - tries to handle negative timestamps yet uses -1 as some internal error indicator
[21:08] jfw: unfortunately Gales musl doesn't allow us to just assume 64-bit time_t, though I'm seeing that upstream recently bit the bullet on making that happen.
Day changed to 2022-11-10
[18:38] jfw: then I'd say this branch of it reveals a pronounced bathophobia. for background, 'gmtime' is a standard libc routine for converting a time_t (linear unix timestamp) to broken-down (year-month-day...) representation. somehow, the inverse operation didn't make it into the standards but is supported at least on linux & bsd, as 'timegm'. so what's the application to do if that's not supported? well I dunno, maybe take a look at any old libc implementation and copy the absolutely trivial conversion formula, then use that across the board and forget about the useless timegm. but no, they must not Reinvent the Wheel! and must leverage the (defective) system library to the max! so they brute-force the inversion by doing a binary search on the domain of the forward function. and *that* part is hand-coded, even, no library bsearch routine.
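For comparison, a sketch of the direct conversion, using Howard Hinnant's public-domain days_from_civil formula (named plainly: not taken from any particular libc or from dovecot). Unlike timegm, it rejects out-of-range fields instead of normalizing them, and reports failure through the return value so that -1 remains an ordinary timestamp. `checked_timegm` is a hypothetical name; it yields a long long so the caller decides how to fit the result into time_t.

```c
#include <stdbool.h>
#include <time.h>

/* Days from 1970-01-01 to y-m-d, proleptic Gregorian (m 1..12, d 1..31). */
static long long days_from_civil(long long y, unsigned m, unsigned d)
{
    long long era;
    unsigned yoe, doy, doe;

    y -= m <= 2;                                  /* years start in March */
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = (unsigned)(y - era * 400);                         /* [0, 399] */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;   /* [0, 365] */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* [0, 146096] */
    return era * 146097 + (long long)doe - 719468;
}

static bool is_leap(long long y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* timegm() replacement that rejects rather than normalizes bad fields.
   tm_year/tm_mon follow struct tm conventions (years since 1900, months
   0..11); leap seconds (tm_sec == 60) are not accepted. On success stores
   the UTC timestamp in *out and returns true. */
bool checked_timegm(const struct tm *tm, long long *out)
{
    static const unsigned char mdays[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    long long year = (long long)tm->tm_year + 1900;
    int dim;

    if (tm->tm_mon < 0 || tm->tm_mon > 11)
        return false;
    dim = mdays[tm->tm_mon] + (tm->tm_mon == 1 && is_leap(year));
    if (tm->tm_mday < 1 || tm->tm_mday > dim ||
        tm->tm_hour < 0 || tm->tm_hour > 23 ||
        tm->tm_min < 0 || tm->tm_min > 59 ||
        tm->tm_sec < 0 || tm->tm_sec > 59)
        return false;
    *out = days_from_civil(year, (unsigned)tm->tm_mon + 1,
                           (unsigned)tm->tm_mday) * 86400LL
           + tm->tm_hour * 3600LL + tm->tm_min * 60LL + tm->tm_sec;
    return true;
}
```

For example, 1969-12-31 23:59:59 comes out as -1 with a `true` return, while 1900-02-29 is refused outright.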
[18:43] jfw: now, binary search happens to be easy to fuck up with off-by-one errors and the like; add in the "maybe signed, maybe not" and magic MAX_BITS and who knows what's going on.
[18:44] jfw: and we know how their testing coverage is for code branches they "aren't [currently] using".
[18:46] jfw: but behold, even on the other branch they don't trust timegm either.
[18:51] jfw: worse actually - they want to detect invalid broken-down times as an error, and seem to rely on undocumented behavior of timegm to do this.
[19:02] jfw: ...though that may be just a failing of the docs (linux man-pages), as the glibc manual says "timegm is functionally identical to mktime except..." which provides the missing link to the normalization behavior dovecot's using. and the musl implementation does the same.
[19:21] jfw: bottom line, I'm cutting out the binary search code and requiring timegm, but still don't have a grip on TIME_T_MAX_BITS, of which there's still more murky usage.
Day changed to 2022-11-11
[00:58] jfw: an interesting and aggravating find coming out of the timekeeping mess
[01:07] jfw: the dovecot corner of the timekeeping mess, that is, because computer timekeeping is universally, absolutely and categorically a mess from what I've seen.
[01:09] jfw: but dovecot seems to be at fault for relying on it for more things than strictly necessary.
[01:59] jfw: a bit of good news is I was able to excise the usage of TIME_T_MAX_BITS from that code by dropping only the detection of *forward* time jumps, which don't have any of the problems cited on the wiki page, while the detection was unreliable and they weren't doing anything with it anyway besides logging.
[02:02] jfw: that leaves just imap_mktime and some tests to sort out.
[02:02] sourcerer: 2022-11-09 21:06:41 (#jwrd) jfw: and the usage looks just as FUBAR - tries to handle negative timestamps yet uses -1 as some internal error indicator
[16:52] jfw: and the more I look at its error behavior, the less sense it makes
[17:00] jfw: it's called only by imap_parse_date and imap_parse_datetime ; those will return false if the string doesn't fit the expected format, but return true and blithely pass through the in-band overflow if imap_mktime fails (such as for overflow or simply invalid month/day numbers etc). so I guess the search spills out to all their callers to see wtf happens in that case
[18:39] jfw: so here's one result. the IMAP APPEND command is used to add a new client-supplied message to a given mailbox. (why? perhaps for saving a draft, or the local copy of a sent message; there's a separate COPY command which seems to be the way to move existing messages to another mailbox.) It allows the client to specify an "internal date" for the new message, which basically amounts to the file timestamp; it's interpreted as the date/time the message was received. Now because IMAP was designed by complete idiots, this internal date is transmitted in a broken-down representation, and so must be parsed, and might be numerically invalid even if structurally valid (punctuation in the right places and such). based on the demented error handling of imap_mktime and friends, the exceptional cases are as follows:
[18:45] jfw: if the internal date overflows a 32-bit signed time_t *or* is invalid with the year field under 100, it's set to INT_MIN which comes out to year 1901 (the mirror image of y2038).
[18:45] jfw: *underflows
[18:50] jfw: if it overflows same *or* is invalid with year over 100, it's set to some large number (derived from TIME_T_MAX_BITS in different possible ways)
[18:53] jfw: it then receives the same treatment as a valid date in the future, which is to discard it and just use the current system time as if no internal date had been supplied to the APPEND.
[18:54] jfw: because reasons, and perhaps also other reasons.
[18:59] jfw: if the date is set to one second before the epoch, it's the magic forbidden value and is again treated as if no time was given. finally, if there's a y2038 overflow going on, bets are off.
[19:02] jfw: (I suppose everything would be considered "in the future" since system time would wrap to 1901.)
[19:04] jfw: an error is returned to the client only in the case of *structurally* invalid date.
[19:15] jfw: so at least for this instance, I could drop the "unpredictable large number" overflow value in imap_mktime and instead just leave it at -1.
[19:18] jfw: but really if -1 is disallowed then so should be all negative values.
Day changed to 2022-11-12
[01:42] jfw: I think I'm going to do exactly that (changing both underflows and overflows to -1, signalling an error), in a "not sure exactly what effects this will have but can't be any worse off" sort of move.
[01:46] jfw: besides the APPEND case there are 4 other uses of imap_parse_date*, 1 of which is a thin wrapper yielding a further 2 uses. the value always gets stored somewhere for later consumption which is why the current behavior is still not apparent, but at least in the nearsighted view, in none of the cases is the "clipping" style overflow detected, while some of them detect -1 and handle it as an error.
[01:47] jfw: well possibly only one of them even does that. but that's easily remedied.
[01:49] jfw: I've had absolutely enough of this bit of shit-shovelling and am quite ready to point a firehose at the stable and let the runoff flow where it may.
[02:25] jfw: "changing both underflows and overflows to -1" - I meant, changing *all* negative values as well as positive overflows to -1.
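A sketch of that policy as a standalone helper, assuming a signed integer time_t (as on glibc and musl); `timestamp_or_error` and `MY_TIME_T_MAX` are hypothetical names, not the actual change to imap_mktime.

```c
#include <limits.h>
#include <time.h>

/* Largest value of a signed time_t of this platform's width (assumes
   time_t is a signed integer type). */
#define MY_TIME_T_MAX \
    ((time_t)((((unsigned long long)1) << (sizeof(time_t) * CHAR_BIT - 1)) - 1))

/* Maps a computed timestamp onto a single error value: every negative
   input, and every input too large for time_t, becomes -1. */
time_t timestamp_or_error(long long secs)
{
    if (secs < 0 ||
        (unsigned long long)secs > (unsigned long long)MY_TIME_T_MAX)
        return (time_t)-1;
    return (time_t)secs;
}
```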
[02:57] jfw: done, and tightened up some of the callers to notice the error. then removed the last uses of TIME_T_MAX_BITS from test case definitions, with the result that there should be some failures on 32-bit signed time_t (y2038 broken) systems, which seems to me quite as it should be.
[03:09] jfw: how's that for a pareto distribution: I get through the first half of the pre-sifted config.h in a day and change; the second half takes a week, with a single one of the 259 switches - around the "95% complete" mark - taking a full three of those days.
[21:28] jfw: on wrapping up a loose end on the time_t stuff, I'm finding some places where it actually forces 32-bit timestamps eg struct mbox_index_header and struct mbox_list_index_record, Director's struct user
[21:31] jfw: and we know they had occasion to notice the mismatch at the very least because there are casts in order to compare
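A sketch of handling such a fixed 32-bit field explicitly rather than through casts at each comparison; `struct example_index_record` and both helpers are hypothetical and do not mirror dovecot's actual structs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk record with a fixed 32-bit timestamp field. */
struct example_index_record {
    uint32_t last_update;
};

/* Stores t only if it fits, returning false instead of silently
   truncating. As an unsigned field it wraps in 2106 rather than 2038,
   but negative times can't be stored. */
bool record_set_time(struct example_index_record *rec, time_t t)
{
    if (t < 0 || (uint64_t)t > UINT32_MAX)
        return false;
    rec->last_update = (uint32_t)t;
    return true;
}

/* Compares in 64 bits, widening both sides rather than narrowing the
   live value. */
bool record_is_stale(const struct example_index_record *rec, time_t now)
{
    return (int64_t)rec->last_update < (int64_t)now;
}
```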
[21:36] jfw: and I'd probably be more than willing to drop mbox support as a whole, but dunno where else the pathology might manifest.
[21:39] jfw: myeah, not just mbox.
Day changed to 2022-11-13
[04:24] jfw: with config.h dusted, conversion of the recursive nest of automake files to a simple makefile is underway, with a first milestone reached of building libdovecot.a from 341 source files.
[04:26] jfw: the step of "auto-download UnicodeData.txt so we can pretend it's not part of our codebase" will NOT be replicated in the new build system.
[19:27] jfw: and 76 test programs for the various library components now link.
[20:19] jfw: getting into the real payloads (bin & libexec programs) it's looking to get more annoying, due to the various micro-libraries besides libdovecot that the objects are grouped into for linking into just a couple programs each.
[20:21] jfw: that whole structure could be reproduced in the makefile if needed but it might be simpler to just say everything that's shared at all goes into libdovecot.
[20:32] jfw: assuming no symbol naming conflicts, the results should be identical; and the codebase seems pretty good about prefixing all the externally visible symbols to prevent conflicts
[20:39] jfw: a downside is it's reducing the information readily apparent from the build system about what links into what; it'll be "anything could link into anything, grep around or check the linker map to see for sure"
[20:40] jfw: but this opacity already exists for the bulk of libdovecot
[20:40] jfw: and it seems pretty shallow really
Day changed to 2022-11-15
[04:28] jfw: 8 makefiles ingested so far after the 20 that made up libdovecot.
[04:29] jfw: like 31 to go.
[04:32] jfw: my process has been to import a listing of *.c from each subdir, comparing that against the sources listed in that dir's Makefile.am to ensure every file is accounted for one way or another.
[04:33] jfw: sometimes all the files go into one binary but usually they're split into a couple. it's especially annoying that the Makefile.am file lists aren't sorted.
[04:33] jfw: the files all have a pretty similar structure which I've gotten used to, but sometimes there are quirks.
[04:35] jfw: since each makefile.am triggers changes to multiple parts of my new unified makefile, I'm attempting to accelerate things a bit by 'transposing' those, aiming to do each step codebase-wide.
[04:36] jfw: the simpler ones at least.
[04:40] jfw: I might give up that comparing against the directory part and just take the lists straight from the makefile.am.
[04:42] jfw: the new makefile is up to 1137 physical lines, which really just reflects how many files are involved.
Day changed to 2022-11-16
[05:02] jfw: 47 of 59 automakefiles down.
[23:28] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5426 - some first exceptions: imap/imap_client.c and login-common/client-common.c both define client_input, client_destroy and client_disconnect
[23:28] sourcerer: 2022-11-13 20:32:25 (#jwrd) jfw: assuming no symbol naming conflicts, the results should be identical; and the codebase seems pretty good about prefixing all the externally visible symbols to prevent conflicts
[23:29] jfw: and clients_destroy_all too
[23:33] jfw: and they're not the only ones to define those same functions, ugh.
[23:39] jfw: easily enough sorted by pulling imap_client.o back out of libdovecot.a and just listing it for each of the two programs it links into.
[23:41] jfw: I don't have a particularly fancy way to find these conflicts, just a nm | awk | sort | uniq -d thing
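For reference, the sort | uniq -d part of that pipeline written out in C, as a sketch: it assumes the defined global symbol names have already been extracted (the nm | awk part). `find_duplicate_symbols` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts names in place, then writes each name occurring more than once
   to dups, once. Returns the number of duplicated names; dups must have
   room for n / 2 entries. */
size_t find_duplicate_symbols(const char **names, size_t n, const char **dups)
{
    size_t i, ndups = 0;

    qsort(names, n, sizeof(names[0]), cmp_str);
    for (i = 1; i < n; i++) {
        if (strcmp(names[i], names[i - 1]) == 0 &&
            (ndups == 0 || strcmp(dups[ndups - 1], names[i]) != 0))
            dups[ndups++] = names[i];
    }
    return ndups;
}
```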
Day changed to 2022-11-17
[00:08] jfw: an uglier one, which triggered this, was a conflict not even between symbols within libdovecot but between it and a directly linked object file. a global 'stats_metrics' pointer is defined by both stats/main.c and stats/test-stats-common.c. I had put the latter into libdovecot because it was otherwise linked explicitly into several test programs. however, what seems to happen is that some files in libdovecot (stats/client-reader.o, stats/client-writer.o, stats/stats-service-openmetrics.o) refer to a global stats_metrics, which is supposed to be the one from stats/main.o. But when stats/test-stats-common.o is included in libdovecot, the linker seemingly misses that it already has the symbol in main.o and concludes that it needs to pull in that unit, which in turn has undefined references to things that are only in the test programs it's intended for.
[00:12] jfw: the link command in question is simply "cc -o stats/stats stats/main.o libdovecot.a". clearly main.o comes first in the search order. so I guess this is one of those arcane details of linker behavior that you don't get from the summary in the manual.
[00:14] jfw: "the reference was in the library so it refers preferentially to symbols defined in the same library" or something.
[00:23] jfw: in this case I'd say the root problem is that stats/stats-common.h declares that global *stats_metrics, and stats_startup_time too, but there's no stats-common.c to actually hold the variables so they got duplicated between main and test programs.
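What the missing piece would look like, as a sketch collapsed into one file for brevity: the header keeps only `extern` declarations, and a single new stats-common.c (hypothetical; per the above, no such file exists) holds the one definition that main.c and the test programs would all link against.

```c
#include <stddef.h>
#include <time.h>

/* --- stats-common.h: declarations only --- */
struct stats_metrics;                        /* opaque at this level */
extern struct stats_metrics *stats_metrics;
extern time_t stats_startup_time;

/* --- stats-common.c (hypothetical): the single definitions ---
   main.c and the test programs then stop carrying their own copies. */
struct stats_metrics *stats_metrics = NULL;
time_t stats_startup_time = 0;
```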
[00:31] jfw: "make the pain go away" rather than notice that the error is trying to tell you something.
[04:36] jfw: I've got through all those makefiles, including a bunch more that emerged from subdirectories (including plugins), and I seem to be down to one last link failure to untangle, another one where more stuff is being pulled in than should be.
[04:39] jfw: another duplicate definition "just to avoid linker error", indeed.
[05:07] jfw: 'twas a unit test for a humble string comparison function (i_strccdascmp) that for some reason was written in doveadm/doveadm-util.c thereby bringing all kinds of baggage along for the ride for any callers.
[05:13] jfw: "make" now completes beautifully, albeit still outside the target environment. final Makefile is 2036 lines and config.h 686 lines, which replaces...
[05:14] jfw: 6287 lines of Makefile.am, 3442 lines of m4 and 907 lines of configure.ac.
[05:15] jfw: and needless to say the structure of the new makefile is way simpler than those.
[05:17] jfw: but to go on comparing apples to pigshit, just for giggles, that's a nearly 4x reduction.
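The "nearly 4x" is just the line counts from the two messages above:

$$
\frac{6287 + 3442 + 907}{2036 + 686} = \frac{10636}{2722} \approx 3.9
$$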
[05:20] jfw: (there's also a couple more makefiles to digest, for man pages and the like.)
Day changed to 2022-11-18
[05:04] jfw: doc makefiles duly digested, old build system purged and nearly ready to release. I'm thinking to do it as a fresh tarball with bumped version number since there's been so much work, and publish that alongside a patch series extracted from my work in git for the historical record / easier inspection of what was done.
[05:06] jfw: but for instance the gales tree won't have to carry 26 patches to be applied in the gport, several of them just bulk file removals.
[05:06] jfw: "the world starts here"
Day changed to 2022-11-19
[18:14] jfw: for a latest update: early on (before attacking the dynamic modules & build system) I'd made some cuts to dovecot's fairly short but messy RNG interface code, removing /dev/urandom and arc4random options in favor of the "no failure cases as long as it exists" getrandom() syscall only.
[18:17] jfw: said syscall exists on gales libc and its default 4.9 kernel, but, as it turns out, not on my slightly older gentoo system, or on 2.6.x kernels, which makes me think it's too new to fully rely on.
[18:18] jfw: so I've reverted that change and instead untangled the code so the different options, fallback modes and error handling for each are much clearer.
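For illustration, here is a sketch of the "try getrandom, fall back to /dev/urandom" shape. The function names random_fill and urandom_fill are hypothetical, since dovecot's real RNG code isn't shown in the log. The arc4random option is left out. The syscall is reached through syscall(SYS_getrandom, ...) rather than a libc wrapper, so the code also builds where the headers lack one. getrandom first appeared in Linux 3.17, which is why a 2.6.x kernel answers ENOSYS. That case, and only that case, drops to the device file.

```c
#define _GNU_SOURCE            /* for syscall() and SYS_getrandom */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback: fill buf from /dev/urandom. Returns 0, or -1 with errno set. */
static int urandom_fill(unsigned char *buf, size_t len)
{
	int fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t n = read(fd, buf, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {
			int saved = (n == 0) ? EIO : errno; /* EOF is an error here */
			close(fd);
			errno = saved;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}

/* Prefer getrandom; fall back to /dev/urandom only when the running
   kernel says ENOSYS or the headers don't know the syscall at all. */
int random_fill(void *out, size_t len)
{
	unsigned char *buf = out;

	assert(buf != NULL || len == 0);
#ifdef SYS_getrandom
	while (len > 0) {
		long n = syscall(SYS_getrandom, buf, len, 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (errno == ENOSYS)
				break;          /* old kernel: use the fallback below */
			return -1;          /* any other failure is a real error */
		}
		buf += n;
		len -= (size_t)n;
	}
	if (len == 0)
		return 0;
#endif
	return urandom_fill(buf, len);
}
```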
[18:20] jfw: now on finally running the test suite, I get a pile of debug spew from doveadm/dsync/test-dsync-mailbox-tree-sync, which doesn't seem right as it buries the actual test results; then lib-charset/test-charset shows three assertion failures then segfaults.
[18:21] jfw: "but besides that, Mrs. Lincoln, how did you enjoy the play?"
Day changed to 2022-11-20
[01:11] jfw: well, both those failures are still there on the minimally-patched, no-modules, full-autoconf version. possibly I didn't see them before because it runs the test programs in a different order, with the biggest one, lib/test-lib, first; and that one reports "1 / 41562 tests failed", which failure stops it from getting to the remaining programs (because it's too simplistic to be able to report the
[01:11] jfw: overall status if it didn't die early, I suppose)
[01:26] jfw: oh lord, testing the unpatched release now on a glibc gentoo, it doesn't even build at all with --disable-shared.
[01:29] jfw: "not a bug - we don't build it that way!"
[01:35] jfw: past that, test suite fails though at a later stage, lib-program-client/test-program-client-net. looks like maybe ipv6 related.
[01:36] jfw: "not a bug - nobody disables ipv6 even though nobody uses it!"
[01:41] jfw: but, test-dsync-mailbox-tree-sync includes the debug spew so at least that one is entirely unrelated to my gales/musl environments or patching.
[03:28] jfw: "which failure stops it from getting to the remaining programs (because it's too simplistic to be able to report the overall status if it didn't die early, I suppose)" - smartened up by the mysql experience, I set to fixing that offensive laziness *first* thing so as to get some idea of the scale of the brokenness. the good news is that at the high level, only 3 of the 113 test programs are
[03:28] jfw: failing (each of which has potentially many individual test cases).
[03:29] jfw: it took over 6 minutes to run the whole set, making for the third way in which they're trying their best to train me to NEVER EVER RUN TEST SUITES
[03:31] jfw: (that's hardly the worst I've seen of slow test suites either, to be sure)
Day changed to 2022-11-21
[14:41] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5362 -- looks like a mess, but glad we can just drop it.
[14:41] sourcerer: 2022-11-08 19:55:42 (#jwrd) jfw: while on the openssl side, they quietly changed the function signatures at some point, resulting in this HAVE_SSL_NEW_MEM_FUNCS mess. on the bright side, I can just drop all that code.
[14:41] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5371 -- ok, sounds reasonable.
[14:41] sourcerer: 2022-11-08 22:17:30 (#jwrd) jfw: so I figure we're safe enough to leave MMAP_CONFLICTS_WRITE off (which due to its negative definition means leaving mmap enabled).
[14:42] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5392 -- man, bathophobia indeed on the timekeeping. glad you got to the bottom of it and sounds like a good result.
[14:42] sourcerer: 2022-11-11 01:59:32 (#jwrd) jfw: a bit of good news is I was able to excise the usage of TIME_T_MAX_BITS from that code by dropping only the detection of *forward* time jumps, which don't have any of the problems cited on the wiki page, while the detection was unreliable and they weren't doing anything with it anyway besides logging.
[14:42] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5410 -- ok, makes sense. when drinking from spittoon, puke.
[14:42] sourcerer: 2022-11-12 01:42:05 (#jwrd) jfw: I think I'm going to do exactly that (changing both underflows and overflows to -1, signalling an error), in a "not sure exactly what effects this will have but can't be any worse off" sort of move.
[14:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5413 -- yeah, I hear ya.
[14:43] sourcerer: 2022-11-12 01:49:58 (#jwrd) jfw: I've had absolutely enough of this bit of shit-shovelling and am quite ready to point a firehose at the stable and let the runoff flow where it may.
[14:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5415 -- cool.
[14:43] sourcerer: 2022-11-12 02:57:43 (#jwrd) jfw: done, and tightened up some of the callers to notice the error. then removed the last uses of TIME_T_MAX_BITS from test case definitions, with the result that there should be some failures on 32-bit signed time_t (y2038 broken) systems, which seems to me quite as it should be.
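One way to get that overflow-to-minus-one behavior without a configure-time constant like TIME_T_MAX_BITS is to convert the value and check the round trip. The log doesn't name the function that got this treatment, so seconds_to_time_t below is hypothetical. It assumes time_t is an integer type no wider than long long. It also relies on gcc's documented wraparound for out-of-range integer conversions, which the C standard leaves implementation-defined. Like mktime(3), it uses (time_t)-1 as the error value, so callers have to check for it explicitly.

```c
#include <assert.h>
#include <time.h>

/* Convert a wide seconds count to time_t. Returns (time_t)-1 if the value
   doesn't survive the round trip, i.e. it overflows or underflows this
   platform's time_t (e.g. a 32-bit signed time_t past 2038). */
time_t seconds_to_time_t(long long secs)
{
	time_t t = (time_t)secs;   /* narrowing wraps under gcc if out of range */

	if ((long long)t != secs)
		return (time_t)-1;     /* out of range: signal an error */
	return t;
}
```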
[14:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5416 -- yeah, seriously.
[14:43] sourcerer: 2022-11-12 03:09:24 (#jwrd) jfw: how's that for a pareto distribution: I get through the first half of the pre-sifted config.h in a day and change; the second half takes a week, with a single one of the 259 switches - around the "95% complete" mark - taking a full three of those days.
[14:43] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5421 -- nice !
[14:43] sourcerer: 2022-11-13 04:24:43 (#jwrd) jfw: with config.h dusted, conversion of the recursive nest of automake files to a simple makefile is underway, with a first milestone reached of building libdovecot.a from 341 source files.
[14:44] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5422 -- word.
[14:44] sourcerer: 2022-11-13 04:26:20 (#jwrd) jfw: the step of "auto-download UnicodeData.txt so we can pretend it's not part of our codebase" will NOT be replicated in the new build system.
[14:44] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5427 -- doesn't seem like a bad fallback option.
[14:44] sourcerer: 2022-11-13 20:39:45 (#jwrd) jfw: a downside is it's reducing the information readily apparent from the build system about what links into what; it'll be "anything could link into anything, grep around or check the linker map to see for sure"
[14:44] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5444 -- ok, nice.
[14:44] sourcerer: 2022-11-16 23:39:50 (#jwrd) jfw: easily enough sorted by pulling imap_client.o back out of libdovecot.a and just listing it for each of the two programs it links into.
[14:45] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5451 -- good find.
[14:45] sourcerer: 2022-11-17 00:23:44 (#jwrd) jfw: in this case I'd say the root problem is that stats/stats-common.h declares that global *stats_metrics, and stats_startup_time too, but there's no stats-common.c to actually hold the variables so they got duplicated between main and test programs.
[14:45] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5452 -- "we don't deploy it that way anyways.." or something.
[14:45] sourcerer: 2022-11-17 00:31:53 (#jwrd) jfw: "make the pain go away" rather than notice that the error is trying to tell you something.
[14:45] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5456 -- win.
[14:45] sourcerer: 2022-11-17 05:13:26 (#jwrd) jfw: "make" now completes beautifully, albeit still outside the target environment. final Makefile is 2036 lines and config.h 686 lines, which replaces...
[14:46] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5461 -- sounds like a good plan.
[14:46] sourcerer: 2022-11-18 05:04:31 (#jwrd) jfw: doc makefiles duly digested, old build system purged and nearly ready to release. I'm thinking to do it as a fresh tarball with bumped version number since there's been so much work, and publish that alongside a patch series extracted from my work in git for the historical record / easier inspection of what was done.
[14:46] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5466 -- ok.
[14:46] sourcerer: 2022-11-19 18:18:17 (#jwrd) jfw: so I've reverted that change and instead untangled the code so the different options, fallback modes and error handling for each are much clearer.
[14:46] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5476 -- ok.
[14:46] sourcerer: 2022-11-20 03:28:47 (#jwrd) jfw: "which failure stops it from getting to the remaining programs (because it's too simplistic to be able to report the overall status if it didn't die early, I suppose)" - smartened up by the mysql experience, I set to fixing that offensive laziness *first* thing so as to get some idea of the scale of the brokenness. the good news is that at the high level, only 3 of the 113 test programs are
[22:40] jfw: dorion: hey, that made for not a bad fast-forward back through the highlights of the past two weeks.
[22:44] jfw: I should clarify that http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5422 doesn't mean I removed unicode support or something, just checked the referenced data into the repository proper; there ended up another couple similar files in addition to the snapshotted wiki docs (which I already knew about).
[22:44] sourcerer: 2022-11-13 04:26:20 (#jwrd) jfw: the step of "auto-download UnicodeData.txt so we can pretend it's not part of our codebase" will NOT be replicated in the new build system.
[22:46] jfw: combind with the new build system, the result is that there will be no difference between a repository snapshot and the published tarball; as it should be for open source, and contrary to the usual automake slop.
[23:06] jfw: the first of the three test failures, lib-charset/test-charset (specific test "charset iconv", and if we skip past the segfault, also "charset iconv utf7 state") is caused by musl's iconv not supporting UTF-7 source encoding. utf7 is this weird thing for squeezing unicode strings into an ascii-only channel, because base64 wasn't fancy enough or something. I think it might have actually been
[23:06] jfw: invented by the IMAP folks, and dovecot has its own conversion code, so it's unclear why they think they need iconv to support it.
[23:09] jfw: the commit that introduced the test just says "lib-charset: Added UTF-7 iconv() unit test / Possibly crashes on FreeBSD? Not verified yet. But a good test in any case." at least we learn that it wasn't added to test some specific code change.
[23:10] jfw: I at least certainly don't learn why it's a good test, though.
[23:10] jfw: current thinking is to just delete the test.
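Whether a given libc's iconv knows a charset at all can be probed directly with iconv_open(3), which fails with EINVAL for an unsupported conversion. Below is a minimal sketch; iconv_can_decode is a hypothetical name. What it returns for "UTF-7" is exactly the libc-dependent part, so musl and glibc may answer differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <iconv.h>

/* 1 if this libc's iconv can convert from `charset` to UTF-8,
   0 if iconv_open rejects the conversion (EINVAL), -1 on other errors. */
int iconv_can_decode(const char *charset)
{
	iconv_t cd = iconv_open("UTF-8", charset);

	if (cd == (iconv_t)-1)
		return errno == EINVAL ? 0 : -1;
	iconv_close(cd);
	return 1;
}
```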
[23:11] jfw: the second failure is the same network thing seen also with stock dovecot on glibc.
[23:11] sourcerer: 2022-11-20 01:35:14 (#jwrd) jfw: past that, test suite fails though at a later stage, lib-program-client/test-program-client-net. looks like maybe ipv6 related.
[23:14] jfw: the third is the 1 / 41562 in lib/test-lib, namely "file_cache_errors", and seems to have something to do with mmap or mremap.
[23:14] sourcerer: 2022-11-20 01:11:48 (#jwrd) jfw: well, both those failures are still there on the minimally-patched, no-modules, full-autoconf version. possibly I didn't see them before because it runs the test programs in a different order, with the biggest one, lib/test-lib, first; and that one reports "1 / 41562 tests failed", which failure stops it from getting to the remaining programs (because it's too simplistic to be able to report the
[23:15] jfw: since I'd noticed and supposedly fixed some brokennes wrt mremap, this seems the most interesting.
[23:15] sourcerer: 2022-11-05 04:56:41 (#jwrd) jfw: found a first incorrect autoconf result: it concluded 'mremap' isn't supported, because it's using the glibc private __USE_GNU instead of the documented _GNU_SOURCE macro to enable its visibility.
[23:20] jfw: *combined, *brokenness. can't seem to type today! this membrane keyboard 'disposable' might be nearing its mileage limit.
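For reference, the documented way to get mremap declared is to define _GNU_SOURCE before the first system header. On glibc, <features.h> undefines and then recomputes the internal __USE_* macros from the public feature-test macros, so defining __USE_GNU by hand doesn't reliably expose anything. Below is a minimal Linux-only sketch; grow_mapping is a hypothetical name.

```c
#define _GNU_SOURCE      /* the documented switch; __USE_GNU is glibc-internal */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Grow a mapping, letting the kernel move it if needed.
   Returns the (possibly new) address, or MAP_FAILED. */
void *grow_mapping(void *old, size_t old_size, size_t new_size)
{
	assert(new_size >= old_size);
	return mremap(old, old_size, new_size, MREMAP_MAYMOVE);
}
```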
Day changed to 2022-11-22
[14:52] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5514 -- glad to hear it.
[14:52] sourcerer: 2022-11-21 22:40:40 (#jwrd) jfw: dorion: hey, that made for not a bad fast-forward back through the highlights of the past two weeks.
[14:53] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5515 -- ok, thanks for clarifying.
[14:53] sourcerer: 2022-11-21 22:44:12 (#jwrd) jfw: I should clarify that http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5422 doesn't mean I removed unicode support or something, just checked the referenced data into the repository proper; there ended up another couple similar files in addition to the snapshotted wiki docs (which I already knew about).
[14:54] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5517 -- word.
[14:54] sourcerer: 2022-11-21 22:46:01 (#jwrd) jfw: combind with the new build system, the result is that there will be no difference between a repository snapshot and the published tarball; as it should be for open source, and contrary to the usual automake slop.
[14:55] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5522 -- makes sense to me.
[14:55] sourcerer: 2022-11-21 23:10:46 (#jwrd) jfw: current thinking is to just delete the test.
Day changed to 2022-11-23
[19:00] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5473 - it's indeed ipv6 at issue, specifically the lack thereof in kernel, but the test design is pretty lame too; it's aiming to verify the handling of connection refused errors, and does so by trying a couple variants of the localhost address, but doesn't check the actual error codes, just the number of times an error handler was
[19:00] sourcerer: 2022-11-20 01:35:14 (#jwrd) jfw: past that, test suite fails though at a later stage, lib-program-client/test-program-client-net. looks like maybe ipv6 related.
[19:00] jfw: triggered. the one using ::1 (ipv6 localhost) emits "Address family not supported by protocol" (EAFNOSUPPORT) on both socket() and connect() calls, thus the error count is one higher than expected.
[19:02] jfw: kind of suspicious actually that it gets to trying a connect() even after the socket() failed
[19:23] jfw: feh, just a misleading error message (in program_client_net_connect_real); it doesn't actually try connect() after the failed socket(), they just meant "net_connect_ip(%s) failed" not "connect(%s) failed", net_connect_ip being a convenience function from their lower level library which could fail for several different causes only one of which is connect().
[19:39] jfw: corrected the error message but there's another weirdness: the subsequent (expected) errors for ipv4 addresses are still halfway-showing the ipv6 one
[19:52] jfw: looks like that's because it's being done as a multi-IP fallback thing and the particular struct field being referenced in that case only tracks the first one, though another field tracks the remaining ones
[19:59] jfw: myeah, here, prclient->address gets the first of the "ips" array (posing as a pointer) while prclient->ips gets all of them, and the broken semiduplication is ok because... wait, comments? docs? what are those?
[20:00] jfw: (#L250 for the equally barren struct definition)
[20:01] jfw: I'm not even sure what this whole "program client" subsystem is, didn't even find docs at that level.
[20:03] jfw: anyways I think I'll just shut up this failure by removing the ipv6 address from the list, not seeing any obvious way to acknowledge the specific error of ipv6 not existing.
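Here is a sketch of telling those two failures apart by errno rather than by counting error-handler calls. It assumes plain BSD sockets rather than dovecot's net_connect_ip and program-client layers, and neither of those APIs is reproduced here. loopback_connect_errno and family_unavailable are hypothetical names. Reporting which step failed also sidesteps the misleading "connect() failed" wording noted earlier.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Attempt a TCP connect to the loopback address of `family` (AF_INET or
   AF_INET6). Returns 0 on success, else the errno of the step that failed;
   *failed_at_socket tells socket() failures from connect() failures. */
int loopback_connect_errno(int family, unsigned short port, int *failed_at_socket)
{
	struct sockaddr_storage ss;
	socklen_t len;
	int fd, err = 0;

	memset(&ss, 0, sizeof ss);
	*failed_at_socket = 0;
	if (family == AF_INET6) {
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;
		sin6->sin6_family = AF_INET6;
		sin6->sin6_port = htons(port);
		if (inet_pton(AF_INET6, "::1", &sin6->sin6_addr) != 1)
			return EINVAL;
		len = sizeof *sin6;
	} else {
		struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
		sin->sin_family = AF_INET;
		sin->sin_port = htons(port);
		if (inet_pton(AF_INET, "127.0.0.1", &sin->sin_addr) != 1)
			return EINVAL;
		len = sizeof *sin;
	}
	fd = socket(family, SOCK_STREAM, 0);
	if (fd < 0) {
		*failed_at_socket = 1;
		return errno;      /* e.g. EAFNOSUPPORT when the kernel has no IPv6 */
	}
	if (connect(fd, (struct sockaddr *)&ss, len) < 0)
		err = errno;       /* e.g. ECONNREFUSED when nothing is listening */
	close(fd);
	return err;
}

/* 1 if the error means "this address family doesn't exist here", which a
   test could account for separately from the refusal it actually expects. */
int family_unavailable(int err)
{
	return err == EAFNOSUPPORT;
}
```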
[21:20] jfw: re iconv utf7 support, we decided to at least have a look at what that code is used for. what comes up at first glance is message header decoding, message body decoding, and searching.
[21:20] sourcerer: 2022-11-21 23:06:39 (#jwrd) jfw: the first of the three test failures, lib-charset/test-charset (specific test "charset iconv", and if we skip past the segfault, also "charset iconv utf7 state") is caused by musl's iconv not supporting UTF-7 source encoding. utf7 is this weird thing for squeezing unicode strings into an ascii-only channel, because base64 wasn't fancy enough or something. I think it might have actually been
[21:23] jfw: looking for hints as to what might be valid character sets for email messages, I check rfc2822 which of course offers, "At the most basic level, a message is a series of characters. A message that is conformant with this standard is comprised of characters with values in the range 1 through 127 and interpreted as US-ASCII characters [ASCII]." ahhh, if only.
[21:28] jfw: dorion: it occurs to me there's a decision here which could use making explicit: do we mean to support all the email extensions that the consumer has come to expect, like unicode bodies, unicode headers, attachments, html and multipart alternatives? I expect the software we're looking at pretty much covers it all, but it's possible to take the approach of whittling things away as they come up
[21:30] jfw: I've had some experience by now with the "email is ASCII, dammit" lifestyle, both at the extreme end when using 'less' as my mail reader, where I admit it was kinda satisfying to see spam mails having headers like "From: =?utf-8?B?5Lya5ZOh6LOH5qC844Gv5Y+W44KK5raI44GV44KM44G+44GZIA==?= <contact@amazon.co.jp>"
[21:31] jfw: and in the middle with alpine on an ascii terminal, where it at least decodes the junk but squashes it for display, resulting in the no less satisfying "From: ???????????? <contact@amazon.co.jp>"
[21:32] jfw: otoh, it's a pain for communicating with the locals en espa?ol
[21:33] jfw: plus the autocorrect-induced ?microsoft quotes?, en?dashes, condensed ellipsis and whatnot
[21:37] jfw: and for said locals, microsoft-email still seems a preferable bridging point compared to the chat apps
[21:40] jfw: the multipart alternatives thing gets twisted further into nonsense like "We have tried to send you this email as HTML (pictures and words) but it wasn't possible. In order for you to see what we had hoped to show you please click here to view online in your browser:" (-paypal)
[21:42] jfw: and I've seen worse from the panamanians where the message is just html containing an img - not even attached but loaded from their server - containing the rasterized message.
[21:45] jfw: I reckon there's no real limit to how deranged people can get in their choice of and use of the tools, so there's no sense in trying to accommodate all of it, but, where to draw the line?
[21:46] jfw: next up, "we tried to email you but you're not using the latest iphone, click here to buy it"
[21:52] jfw: (I share my experience for better illustration, but the real question is what will be the needs of the users of the service)
[23:07] jfw: let's try some data science on it: popularity contest of charsets seen in the wild, from my mail heap of ~4 years, spam and all.
[23:08] jfw: kinda fun when you hit the "-bash: /bin/grep: Argument list too long" limit and have to switch to find|xargs.
[23:10] jfw: filtering first on a 'content-type' header cuts out a lot of html-embedded garbage, which I certainly hope dovecot wouldn't be trying to parse.
[23:10] jfw: so there's exactly one reference to utf7, namely "unicode-1-1-utf-7"
[23:14] jfw: ... which looks quite odd, indeed glibc iconv doesn't accept it; looking at the message in question, it's spam from a very confused sort of server, looking at first like backscatter (a bounce message from a legit server responding to a message that forged your address as return path), except parts of it purport to be gmail but the top level has none of the usual gmail trappings.
[23:16] jfw: so, thus far I'd say we're quite safe to not support utf7 message encoding.
[23:21] jfw: checked what was up with that lone 'utf-8' that didn't fit with the rest of them, there's a ^M (CR) character that snuck onto the end of it.
[23:29] jfw: the '3dus-ascii' etc. is an artifact of the sloppy parsing, it's from a message that was quoted in full raw form in another message so it got escaped.
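The survey itself was done with grep and find|xargs. Purely to make the parsing pitfalls concrete, here is a hypothetical C restatement of pulling the charset parameter out of a Content-Type header. Stopping at CR keeps a stray ^M from creating its own "utf-8" bucket. Feeding it a quoted-printable-escaped header ("charset=3Dus-ascii") reproduces the '3dus-ascii' artifact. extract_charset is an illustrative name, and it ignores header folding and other MIME corner cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Copy the lowercased charset parameter of a Content-Type header line
   into out. Returns 1 if found, 0 otherwise. Stops at a quote, ';',
   whitespace or CR, so "utf-8\r" is counted as plain "utf-8". */
int extract_charset(const char *line, char *out, size_t outlen)
{
	static const char key[] = "charset=";
	const size_t keylen = sizeof key - 1;
	const char *p;
	size_t n = 0;

	assert(outlen > 0);
	if (strncasecmp(line, "content-type:", 13) != 0)
		return 0;
	for (p = line + 13; *p != '\0'; p++) {
		if (strncasecmp(p, key, keylen) == 0)
			break;
	}
	if (*p == '\0')
		return 0;
	p += keylen;
	if (*p == '"')
		p++;
	while (*p != '\0' && *p != '"' && *p != ';' && *p != '\r' &&
	       !isspace((unsigned char)*p) && n + 1 < outlen) {
		out[n++] = (char)tolower((unsigned char)*p);
		p++;
	}
	out[n] = '\0';
	return n > 0;
}
```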
Day changed to 2022-11-25
[23:35] jfw: http://fixpoint.welshcomputing.com/2022/a-quick-survey-of-filthy-character-sets-from-my-mail-archives/
Day changed to 2022-11-26
[01:55] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5478 - because it became pretty much required for debugging the last test failure, I got the piece of this consumed by lib/test-lib down from 117 s to 12.1 s. Mostly by cutting out some of the more aggressive tests of CPU time limit enforcement (their hack for generating system time consumption to measure was particularly fucked up),
[01:55] sourcerer: 2022-11-20 03:29:56 (#jwrd) jfw: it took over 6 minutes to run the whole set, making for the third way in which they're trying their best to train me to NEVER EVER RUN TEST SUITES
[01:55] jfw: then the rest by reducing iteration counts by 10x-20x on what were basically random fuzz tests.
[01:58] jfw: fuzz testing is nifty and all but 1) guided fuzz testing using compiler instrumentation ala AFL is way smarter than unguided which they're doing and 2) it benefits from all the cycles it can get so is a very poor fit for a general test suite that everyone's supposed to run every time.
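To make the cost argument concrete, an unguided fuzz test is just a loop: generate random input, check a property, repeat. Its run time scales linearly with the iteration count, which is why cutting counts by 10x-20x cut the time so sharply. Below is a hypothetical example that uses a hex round-trip as the property and a fixed-seed xorshift generator, so any failure is reproducible. dovecot's actual fuzz targets aren't shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny deterministic PRNG (xorshift32); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

static void hex_encode(const unsigned char *in, size_t n, char *out)
{
	static const char digits[] = "0123456789abcdef";
	for (size_t i = 0; i < n; i++) {
		out[2 * i] = digits[in[i] >> 4];
		out[2 * i + 1] = digits[in[i] & 0xf];
	}
}

static int hexval(char c)
{
	return c <= '9' ? c - '0' : c - 'a' + 10;   /* lowercase hex only */
}

static void hex_decode(const char *in, size_t n, unsigned char *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = (unsigned char)(hexval(in[2 * i]) << 4 | hexval(in[2 * i + 1]));
}

/* Unguided fuzz: random inputs, check decode(encode(x)) == x.
   Returns how many iterations failed; cost is linear in `iterations`. */
size_t fuzz_hex_roundtrip(unsigned iterations, uint32_t seed)
{
	unsigned char buf[64], back[64];
	char hex[128];
	size_t failures = 0;
	uint32_t st = seed ? seed : 1;

	for (unsigned it = 0; it < iterations; it++) {
		size_t n = xorshift32(&st) % sizeof buf;
		for (size_t i = 0; i < n; i++)
			buf[i] = (unsigned char)xorshift32(&st);
		hex_encode(buf, n, hex);
		hex_decode(hex, n, back);
		for (size_t i = 0; i < n; i++) {
			if (back[i] != buf[i]) {
				failures++;
				break;
			}
		}
	}
	return failures;
}
```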
[02:11] jfw: grrrr, so at least one culprit in the file_cache_errors failure is the usual (by now) expecting specific error strings, because who ever heard of checking the standard error code symbols and "Cannot allocate memory" is totally something different from "Out of memory".
[02:11] sourcerer: 2022-11-21 23:14:40 (#jwrd) jfw: the third is the 1 / 41562 in lib/test-lib, namely "file_cache_errors", and seems to have something to do with mmap or mremap.
[02:17] jfw: they streamlined that madness with test_expect_error_string and test_expect_error_string_n_times functions, which together show up in 75 uses.
[02:29] jfw: yup, replaced 2 instances of that with the weaker test_expect_errors and now it passes. which means my work as far as we know so far was done right the first time. of course.
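Here is the distinction in miniature, with hypothetical helper names (these are not dovecot's test_expect_error_string or test_expect_errors). The text of strerror(ENOMEM) is "Cannot allocate memory" on glibc but "Out of memory" on musl, while the ENOMEM code is the same on both.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Fragile: the wording of strerror() text is libc-specific. */
int error_text_matches(int err, const char *expected_text)
{
	return strstr(strerror(err), expected_text) != NULL;
}

/* Robust: compare the symbolic error code instead. */
int error_code_matches(int err, int expected_code)
{
	return err == expected_code;
}
```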
[02:29] sourcerer: 2022-11-21 23:15:27 (#jwrd) jfw: since I'd noticed and supposedly fixed some brokennes wrt mremap, this seems the most interesting.
[02:41] jfw: ah, those numbers were with optimization disabled for debugging; with normal build flags the new time is down to 10.6 s, while the old time is essentially unchanged so it's an even better ratio.
[02:41] sourcerer: 2022-11-26 01:55:54 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Nov-2022/#5478 - because it became pretty much required for debugging the last test failure, I got the piece of this consumed by lib/test-lib down from 117 s to 12.1 s. Mostly by cutting out some of the more aggressive tests of CPU time limit enforcement (their hack for generating system time consumption to measure was particularly fucked up),
[02:48] jfw: overall test time down to 4m27s (still 2x slower than the build time but what can you do), and indeed it's now 100% passing. fuck yea.
Day changed to 2022-11-28
[22:23] jfw: for an update, I made some simple code tweaks in places suggested by compiler warnings; most were not-fully-tidy bits from my own recent work; one was a possibly uninitialized pointer use. like gcc I wasn't readily able to confirm whether the pointer was properly set before being referenced, so I zero-initialized it so that it'll at least trap if there is such a bug rather than silently fucking
[22:23] jfw: with who knows what memory.
[22:24] jfw: that was perhaps a more valuable find than anything coming from running the test suite.
[22:27] jfw: I also bumped the version and updated the NEWS file to summarize all the changes more from the sysadmin's than code reviewer's perspective, preparing for publication as jwrd-dovecot-2.4.0.
[22:30] jfw: Using the release candidate tarball, I tried building a much-simplified gport; only two minor difficulties came up from the change of environment and testing setup (one compile error and one test hitting the unix domain socket address path length limit)
[22:31] jfw: after iterating the tarball I now have the gport built, with clean bill of health from the test suite.
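The unix domain socket limit mentioned there comes from the fixed-size sun_path array in struct sockaddr_un (108 bytes on Linux). Portable code also keeps room in that array for the terminating NUL. Below is a minimal sketch of checking for the limit up front instead of hitting it at bind or connect time. make_unix_addr is a hypothetical name, and the log doesn't say how the test was actually adjusted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a sockaddr_un for `path`. Returns 0, or -1 with errno set to
   ENAMETOOLONG if the path (plus its NUL) won't fit in sun_path. */
int make_unix_addr(struct sockaddr_un *sa, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof sa->sun_path) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(sa, 0, sizeof *sa);
	sa->sun_family = AF_UNIX;
	memcpy(sa->sun_path, path, len + 1);
	return 0;
}
```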
[22:36] jfw: so it's finally time for the more traditional parts of sysadminning, RTFMing and all that, which we've found usually goes pretty quick after wrestling with the beasts in the code pit.
[22:36] sourcerer: 2022-10-10 01:11:19 (#jwrd) jfw: I won't mind the work itself, but the grass does momentarily appear greener on the "imap is easy, you just apt-get something and edit some configs" side
[22:38] jfw: *one was a possibly uninitialized pointer use *in the original*, to make that clear.
[22:41] jfw: given how much informed & defensive coding they seem to do otherwise, I'm almost thinking it's a deliberate thing, don't initialize variables at point of declaration so that the compiler warnings can help prove that it's later set in the proper sequence, and this one just comes up for me due to different gcc version.
[22:46] jfw: informed & defensive in a particular, modern, head-cockroachy way, that is. a programmer enlightened in the true ways would not go jerking off on dlsym wrapper frameworks and m4 libraries.
Day changed to 2022-11-29
[01:49] jfw: I just couldn't bear to leave that one uncertain so I looked closer at the code and, disappointingly perhaps, the pointer is not in fact used uninitialized; nothing much to see here. I'll leave my zeroing though to keep the compiler happy.
[01:50] jfw: (new_tag in imap_client_input_idle_cmd)
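Here is the trade-off from the last few messages, sketched with a hypothetical function; this is not the actual imap_client_input_idle_cmd code. The variable is set on one path and used on a later path guarded by the same condition. Whether gcc's -Wmaybe-uninitialized fires on the uninitialized form depends on compiler version and optimization level. With the NULL initializer there is nothing to warn about, and a logic slip would typically crash on a null dereference rather than read leftover stack contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* tag is assigned under one check and used under a second, identical one.
   Drop the "= NULL" and gcc may or may not prove the use is safe. */
size_t tag_length(int have_tag, const char *line)
{
	const char *tag = NULL;

	if (have_tag)
		tag = line;
	/* ... unrelated work here ... */
	if (have_tag)
		return strlen(tag);
	return 0;
}
```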

Powered by MP-WP. Copyright Jacob Welsh.