Fedi Presence Is a Discovery Problem
Fediverse evangelists talk about distribution. The real problem is discovery, and it is hard enough that most people claiming to care about fedi just join mastodon.social and disappear.
I wanted real geographic and community diversity, not concentration on a single instance. If one instance defederates something, goes offline, or gets acquired by someone with a different moderation philosophy, you lose everything there. The right setup is multiple live accounts across distinct instances, posted to in parallel.
That means mapping the space first. The starting point was 227 candidate instances across four protocol families: Mastodon, Misskey, Lemmy, and mbin. That is not nearly enough to run a real selection against once you filter for instances that are actually responsive, actually open, and actually multilingual.
The first fix was raising CAP_PER_FAMILY from 70 to 300 in fedi_harvest.py and threading the probe with a 16-worker ThreadPoolExecutor. Serial probing against hundreds of instances with varying response times creates a latency wall, because every slow host delays every host queued behind it. Threading dissolved it. The result was 918 candidate records: 300 Mastodon, 300 Misskey, 300 Lemmy, 18 mbin. That is a workable population to run selection against.
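A minimal sketch of that change, not the actual contents of fedi_harvest.py. CAP_PER_FAMILY and the 16 workers come from the build; cap_by_family, probe_all, and fake_probe are hypothetical names, and fake_probe stands in for whatever HTTP check the real probe performs so the example runs without network access.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

CAP_PER_FAMILY = 300  # raised from 70, per the post
MAX_WORKERS = 16      # worker count named in the post


def cap_by_family(candidates, cap=CAP_PER_FAMILY):
    """Keep at most `cap` (family, host) pairs per protocol family, in input order."""
    counts = defaultdict(int)
    kept = []
    for family, host in candidates:
        if counts[family] < cap:
            counts[family] += 1
            kept.append((family, host))
    return kept


def probe_all(candidates, probe, max_workers=MAX_WORKERS):
    """Run probe(family, host) concurrently and collect results, skipping failures."""
    records = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(probe, family, host) for family, host in candidates]
        for future in as_completed(futures):
            try:
                records.append(future.result())
            except Exception:
                # One slow or broken instance should not sink the whole run.
                continue
    return records


def fake_probe(family, host):
    """Hypothetical stand-in for a real HTTP probe; performs no network access."""
    return {"family": family, "host": host, "open": True}
```

Calling probe_all(cap_by_family(candidates), fake_probe) returns one record per candidate that probed cleanly. Results arrive in completion order rather than input order, so anything downstream that cares about ordering should sort explicitly.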
From those 918, three accounts are now live and posting: piaille.fr, masto.es, and ruhr.social, all as @arihantdeva, each with real, grounded posts verified. The strategy in plan_200.json goes distinct-instance first to eliminate redundancy, then moves to instance multiplication once the initial spread is established. The target is a 200-account floor. Three down.
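One way to picture that ordering, assuming "instance multiplication" means adding further accounts on instances already in use. The schema of plan_200.json is not shown here, so build_plan and its (host, slot) output are hypothetical.

```python
def build_plan(hosts, target=200):
    """Return (host, slot) pairs: one account per distinct host first, then repeats.

    slot 0 is the first account on a host, slot 1 the second, and so on.
    """
    distinct = list(dict.fromkeys(hosts))  # de-duplicate while preserving order
    if not distinct:
        return []
    plan = []
    slot = 0
    while len(plan) < target:
        for host in distinct:
            if len(plan) >= target:
                break
            plan.append((host, slot))
        slot += 1
    return plan
```

With a pool larger than the target, every entry lands in slot 0 and no instance is used twice. Repeats only appear once the distinct hosts run out.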
Three more are parked. nrw.social is pending approval, which is just a time problem. planet.moe and defcon.social are blocked by a captcha on the confirmation page, and that is the real lesson from this build. The probe correctly identifies an instance as accepting registrations. It does not know there is a captcha in the registration flow until you actually attempt to complete it. You find out late, after the instance has already been counted as viable. That is wasted signal from an otherwise solid data run.
Two things I would do differently. First: add a captcha detection pass to the probe itself. Check the registration page HTML for challenge flows before the instance enters the candidate pool. Right now that filtering happens at mint time, which is too late. Second: weight approval-pending instances lower in the first batch. Manual admin review is fine for building long-term presence. It is a bad choice for the first wave, when you need to verify the pipeline end to end quickly.
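Both changes can be sketched with the standard library alone. This is an illustration under assumptions, not the probe's real code: the marker substrings are guesses at what common captcha widgets leave in the markup, the record keys "captcha" and "approval_required" are hypothetical, and a scan like this only sees captchas present in the HTML it is given. A challenge that appears on a later confirmation step would still need that page fetched.

```python
from html.parser import HTMLParser

# Assumed marker substrings; real instances may use others.
CAPTCHA_MARKERS = ("captcha", "cf-turnstile", "challenges.cloudflare.com")


class CaptchaScanner(HTMLParser):
    """Records every tag attribute whose value contains a known captcha marker."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            lowered = (value or "").lower()
            for marker in CAPTCHA_MARKERS:
                if marker in lowered:
                    self.hits.append((tag, name, marker))


def has_captcha(html):
    """Return True if the page markup contains any captcha marker."""
    scanner = CaptchaScanner()
    scanner.feed(html)
    return bool(scanner.hits)


def first_wave(records):
    """Drop captcha-gated records, then put open-registration ones ahead of approval-gated ones."""
    viable = [r for r in records if not r.get("captcha")]
    # sorted() is stable and False sorts before True, so original order is kept within each group.
    return sorted(viable, key=lambda r: bool(r.get("approval_required")))
```

The idea is to set record["captcha"] = has_captcha(page_html) during the probe, so that first_wave never hands a captcha-gated instance to the mint step. Approval-gated instances stay in the pool but move to the back of the first batch.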
The 918-record pool is the right foundation now. Discovery is solved. The remaining friction is at the registration edge, and that is a tractable fix: one targeted update to the probe that detects captcha-gated registration pages before inclusion.