The selection bug that turned my content fleet into a copy machine
532 tests passing is what shipping looks like when you do the boring work first.
The problem: once the git history well actually started producing postable drafts, the global next_candidate() function would hand every live platform the same newest commit on the same day. Same topic, same angle, across every slug. A cross-platform duplicate and an authenticity break. And because the selection had no memory of what each slug had already posted, it would keep posting that one commit indefinitely.
Two bugs in one. Collision across the fleet, and infinite recycling within a slug.
The fix is _select_candidate(ctx). It pulls a pool of 25 candidates (TOPIC_POOL_SIZE), drops every candidate whose section is already in that slug’s posted ledger, then offsets the pick into the fresh set using a stable sha1(slug) hash. The hash is deterministic per slug and constant across runs, so the fleet disperses across different commits instead of all landing on the newest. When a slug has exhausted the pool, it falls back to pool[0]. A revisit beats starvation.
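A minimal sketch of that selection logic follows. The Candidate and Ctx shapes, their field names, and passing the pool in as an argument are assumptions for illustration. The real _select_candidate(ctx) presumably fetches its own pool from the git history.

```python
import hashlib
from dataclasses import dataclass, field

TOPIC_POOL_SIZE = 25


@dataclass(frozen=True)
class Candidate:
    commit: str   # commit hash (hypothetical field name)
    section: str  # the commit "angle", used as the dedup key


@dataclass
class Ctx:
    slug: str
    # Hypothetical shape of the slug's posted ledger: sections it has already posted.
    posted_sections: set[str] = field(default_factory=set)


def stable_offset(slug: str) -> int:
    # sha1 of the slug: the same integer on every run and every machine.
    return int(hashlib.sha1(slug.encode("utf-8")).hexdigest(), 16)


def select_candidate_sketch(ctx: Ctx, pool: list[Candidate]) -> Candidate:
    # Assumes the pool arrives newest-first; keep only the top TOPIC_POOL_SIZE.
    pool = pool[:TOPIC_POOL_SIZE]
    if not pool:
        raise ValueError("empty candidate pool")
    # Dedup on section, not commit, so a new commit on an old angle is dropped too.
    fresh = [c for c in pool if c.section not in ctx.posted_sections]
    if not fresh:
        return pool[0]  # exhausted: a revisit beats starvation
    # Per-slug deterministic offset into the fresh set disperses the fleet.
    return fresh[stable_offset(ctx.slug) % len(fresh)]
```

Because the filter compares section rather than commit, two commits sharing an angle are treated as one topic, which is the point of the first design decision.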
Two design decisions worth calling out.
First, the dedup key is section (the commit angle), not the raw commit hash. Two different commits can approach the same angle from slightly different directions, and that is still a duplicate worth suppressing. Keying on section catches it; keying on the hash would let it through.
Second, the dispersion offset is sha1(slug), not random. Random would work within a single run but would drift from tick to tick. Deterministic per slug means each platform consistently lands in its own region of the pool regardless of when the job fires. The fleet spreads without coordination overhead.
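The contrast can be made concrete with a small sketch. The slug names are hypothetical. One related pitfall: Python's built-in hash() would not be a safe substitute for sha1 here, because str hashes are randomized per interpreter process unless PYTHONHASHSEED is set. It would drift across job runs just like random.

```python
import hashlib
import random


def sha1_offset(slug: str, n: int) -> int:
    """Stable index in [0, n): depends only on the slug and n."""
    return int(hashlib.sha1(slug.encode("utf-8")).hexdigest(), 16) % n


def random_offset(n: int) -> int:
    """The rejected alternative: a fresh draw on every tick."""
    return random.randrange(n)


# Hypothetical platform slugs, for illustration only.
SLUGS = ["linkedin", "mastodon", "bluesky", "threads", "devto", "hashnode", "medium", "x"]

if __name__ == "__main__":
    for tick in range(3):  # simulate three scheduled job runs
        picks = {s: sha1_offset(s, 25) for s in SLUGS}
        print(f"tick {tick}: {picks}")  # the same mapping on every tick
        print(f"tick {tick} random: {random_offset(25)}")  # varies between ticks
```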
The tests cover two properties. The dedup test: a section already in the posted ledger is never re-picked, and once every section is used up, the fallback to pool[0] fires cleanly. The dispersion test: 8 slugs over one pool yield at least 2 distinct picks. Not a perfect spread, but proof that the hash offset actually separates them.
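Here is a sketch of those two tests in pytest style. They run against a condensed stand-in for the selector, where the pool is just a list of section keys. The pick helper and the section-N names are hypothetical; the real suite presumably exercises _select_candidate directly.

```python
import hashlib


def pick(slug: str, pool: list[str], posted: set[str]) -> str:
    """Condensed stand-in for the selector: pool holds section keys."""
    fresh = [s for s in pool if s not in posted]
    if not fresh:
        return pool[0]  # exhausted: revisit rather than starve
    offset = int(hashlib.sha1(slug.encode("utf-8")).hexdigest(), 16)
    return fresh[offset % len(fresh)]


POOL = [f"section-{i}" for i in range(25)]


def test_posted_section_is_never_repicked():
    posted: set[str] = set()
    for _ in range(len(POOL)):
        choice = pick("linkedin", POOL, posted)
        assert choice not in posted  # dedup holds on every step
        posted.add(choice)
    # Every section is now in the ledger, so the fallback must fire.
    assert pick("linkedin", POOL, posted) == POOL[0]


def test_slugs_disperse_over_one_pool():
    slugs = [f"slug-{i}" for i in range(8)]
    picks = {pick(s, POOL, set()) for s in slugs}
    assert len(picks) >= 2  # not a perfect spread, but not a single pick
```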
What I would do differently: the fallback to pool[0] is a pragmatic shortcut I am not fully happy with. When a slug exhausts the pool, it lands on the same commit as every other exhausted slug. In practice the pool is large enough that this should not happen often, but it is the kind of silent collision that shows up in logs six months later when someone wonders why two platforms posted the same thing. A better fallback would be a secondary dispersion pass over already-posted sections, picking the one with the oldest last-posted date. That forces a revisit before it forces a collision. I did not ship that because it adds complexity to a path that should not be hot under normal usage. But I would write that version if the pool size ever shrinks or the fleet grows.
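The unshipped fallback could look roughly like the sketch below. It assumes the ledger records a last-posted datetime per section, which the post does not say it does. The function name is hypothetical. Ties on the oldest date are broken with the same sha1(slug) offset, so exhausted slugs still spread out instead of converging on one section.

```python
import hashlib
from datetime import datetime


def pick_with_lru_fallback(slug: str, pool: list[str], ledger: dict[str, datetime]) -> str:
    """Hypothetical selector; ledger maps section -> last time this slug posted it."""
    if not pool:
        raise ValueError("empty candidate pool")
    offset = int(hashlib.sha1(slug.encode("utf-8")).hexdigest(), 16)
    fresh = [s for s in pool if s not in ledger]
    if fresh:
        return fresh[offset % len(fresh)]  # normal path: unchanged dispersion
    # Exhausted: every pool section is in the ledger, so revisit the stalest one.
    oldest = min(ledger[s] for s in pool)
    stalest = [s for s in pool if ledger[s] == oldest]
    # Secondary dispersion: break ties per slug instead of always taking the first.
    return stalest[offset % len(stalest)]
```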
532 passing. ruff clean. The boring scaffolding is done.