Threads and the fediverse, again.

I have already set out my view on the narrative of large social networks federating through ActivityPub, and explained why that model is inherently unable to scale. What irritates me is the response from people like Eugen Rochko, the developer behind Mastodon, who, instead of showing that it does scale, tend to fixate on details such as 'the interaction between instances relies mainly on the follow mechanism.'

There is some truth in this claim (though not the whole truth), but it only holds for the software that exists today, a fact Rochko conveniently overlooks, as if Mastodon were the only possible way to implement federation.

The reality is more complicated than Rochko admits, because he selectively leaves out other relevant aspects of the problem.


1) No, on Threads and similar platforms, traffic does not depend solely on the follow mechanism, because these platforms do not work the way Mastodon does.

Consider a large social network and a well-known figure, for instance Chiara Ferragni, with an audience of roughly 20 million accounts (followers, in Fediverse terms). Now picture replying to Ferragni with a charming, or perhaps provocative, image, and someone else replying to you in turn.

In Fediverse terms: I, sarawhatever, shared a URL pointing to a picture of me eating an ice cream, and someone replied with their own image that embeds mine. Setting aside the fact that they had to download my image to build theirs, the crux is this: on an account the size of Chiara Ferragni's, even on a slow day, that image alone might collect a mere fifty thousand views.

It can get worse: if my image stirs up controversy, we easily climb to a million downloads. Why? Because Meta's algorithm is designed to maximize views, and that inevitably amplifies contentious content.

A single post delivered via ActivityPub (which sends images as links) can bring Rochko's instance as much traffic as it would get if every one of his users posted that same image and at least one person viewed each copy.

Just one post.

50K GET requests.
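The arithmetic behind that figure can be sketched as a back-of-the-envelope model. This is a hypothetical illustration, not a measurement: `estimated_origin_requests`, `fetches_per_view` and `cache_hit_ratio` are names introduced here, and real numbers depend on caching behaviour the remote platform does not disclose. Note that the number of followers on your own instance appears nowhere in the formula.

```python
def estimated_origin_requests(views: int,
                              fetches_per_view: int = 1,
                              cache_hit_ratio: float = 0.0) -> int:
    """Back-of-the-envelope count of requests that reach the origin instance.

    Hypothetical model: every view of the remote post triggers
    `fetches_per_view` fetches of our media (image, avatar, ...), and a
    fraction `cache_hit_ratio` of them is answered by a cache in between.
    The follower count on our own instance is deliberately not an input.
    """
    if views < 0 or fetches_per_view < 0:
        raise ValueError("views and fetches_per_view must be non-negative")
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # round() absorbs floating-point noise such as 1.0 - 0.9 != 0.1 exactly
    return round(views * fetches_per_view * (1.0 - cache_hit_ratio))


# One post on a quiet day: 50,000 views, one image each.
print(estimated_origin_requests(50_000))                         # 50000
# A controversy: a million views, image plus avatar per view.
print(estimated_origin_requests(1_000_000, fetches_per_view=2))  # 2000000
```

Even with an assumed 90% cache hit ratio, the same model turns 25 million views into 2,500,000 requests to your server.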


Now think of accounts belonging to famous singers or entire television networks, with millions of followers and subscribers, and look at the replies under their posts: the claim that traffic depends only on follows is plainly wrong.

More precisely, the load scales roughly linearly with the follower count of the big account on the other side of the federation, weighted by how much prominence the algorithm over there decides to give a post, or a controversy.

If the algorithm decides that your post, image included, will be seen by 25 million pairs of eyes today, that is the load your server has to withstand. Period.


2) No, your cache or CDN is not the answer.

The naive reply at this point is that your instance can always store its images on S3 or another cloud service. Granted. But we set out on this road to decentralization to free ourselves from the online oligarchs, and now we would entrust our content to yet another digital giant.

Content Delivery Networks (CDNs), however, open a second front. Take this blog: it is occasionally linked from Facebook and other social platforms.

Whenever that happens, visits to the blog spike. And what do I see, watching through my modest Grafana?

Large platforms are not confined to a single server, or to a single IP address per host as Fediverse instances are, because they are a million-headed hydra called a Content Delivery Network. If I am in Düsseldorf browsing Facebook, for instance, a post reaches me from a URL like this:

https://scontent-dus1-1.xx.fbcdn.net/v/t39.2365-6/whatever

Viewed from Rome, however, the same post comes from https://scontent-rome1-1.xx.fbcdn.net/v/t39.2365-6/whatever. So to get the full picture, I have to aggregate my logs under the whole 'fbcdn.net' domain.
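Collapsing those per-node hostnames is a small log-processing step. A minimal sketch, assuming the Referer values have already been pulled out of the access log (the log parsing itself is not shown); `base_domain` and `requests_per_domain` are hypothetical helpers, and keeping the last two host labels is a simplification that works for `fbcdn.net` but not for suffixes like `co.uk`, where the Public Suffix List is the robust tool.

```python
from collections import Counter
from urllib.parse import urlsplit


def base_domain(url: str, labels: int = 2) -> str:
    """Collapse a Referer URL to its last `labels` host labels.

    Simplification: scontent-dus1-1.xx.fbcdn.net -> fbcdn.net. This is wrong
    for suffixes such as co.uk; a Public Suffix List lookup is the robust way.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-labels:])


def requests_per_domain(referers):
    """Count requests per collapsed Referer domain, skipping empty values."""
    counts = Counter()
    for ref in referers:
        domain = base_domain(ref)
        if domain:  # "-" or a missing Referer yields no hostname
            counts[domain] += 1
    return counts


referers = [
    "https://scontent-dus1-1.xx.fbcdn.net/v/t39.2365-6/whatever",
    "https://scontent-rome1-1.xx.fbcdn.net/v/t39.2365-6/whatever",
    "-",
]
print(requests_per_domain(referers))  # Counter({'fbcdn.net': 2})
```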

That leaves two scenarios. In one, a copy of my image is uploaded to Facebook's CDN, which, as far as privacy and profiling go, is much like having a Facebook account. In the other, the image stays where it is (perhaps because it is old, or simply because it was never uploaded or viewed in Rome).

With images, then, the choice is stark: either our server dies under the load, or we surrender control to Facebook, which then handles the content as it sees fit while profiling our Fediverse account (albeit remotely). And since we are not Facebook users, it is not bound by the privacy obligations that depend on a contract, that is, on being a user.

If instead the CDN carries our post or toot with the image as a link, we must brace for a flood of GET requests, each carrying a Referer header that names the specific CDN node handling the request.

Last but not least, the UI will make at least one GET request to fetch the SEO metadata (the link preview) of the instance, user or post.

Either you let Facebook take control of the content and manage its delivery, enabling profiling and giving up some data protections, or your server succumbs to the load the moment someone decides to follow Taylor Swift.


3) Ah, yes. You mentioned 'links'?

Indeed. On a major social platform, clicking an 'external' link does not take you straight to its destination. You are first sent to a system that 'runs checks' (in practice, it extracts information from your browser, profiles you thoroughly and collects every data point it can) and only then redirected to the linked content.

On major systems, this means the 'intermediary link' sends at least a HEAD request to the final content, and loading the so-called SEO thumbnail for the preview takes a GET on top of that, since a HEAD response carries no body. As a result, your server receives a number of requests that grows with the number of people reading the thread, whether or not anyone follows the link.
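What a preview fetcher pulls from your server can be sketched as follows. This assumes the fetcher builds its card from Open Graph `<meta property="og:...">` tags, as many preview generators do; how Meta's crawler actually behaves is not documented here. `OpenGraphParser` and `preview_requests` are hypothetical names, and the sketch only lists the URLs that would be fetched rather than fetching them.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property is repeated.
            self.properties.setdefault(prop, content)


def preview_requests(page_url: str, html: str) -> list[str]:
    """URLs a preview fetcher would GET from the origin: the page, then og:image."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    urls = [page_url]
    image = parser.properties.get("og:image")
    if image:
        urls.append(image)
    return urls


page = ('<head><meta property="og:title" content="Ice cream">'
        '<meta property="og:image" '
        'content="https://my.instance.example/media/icecream.jpg"></head>')
print(preview_requests("https://my.instance.example/@sarawhatever/1", page))
```

Each preview therefore costs at least two requests to the origin (the page and its thumbnail) unless something in between caches them.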

So if you reply to Taylor Swift with a post carrying its SEO preview, or just your avatar, everyone scrolling through that thread will load your avatar. Even if nobody follows your link, every single view loads at least the avatar, either through a GET request to your server or from some cache where it is kept for… well, who knows how long.

In this scenario, the links themselves, including the link to your post, generate traffic. That traffic is proportional not to the number of followers Taylor has on YOUR instance, but to the number she has on HER 'instance', because your link, or at least the link to your post, is fetched by every Meta client or copied over to Meta (which, for privacy and profiling, is akin to having an account there).


4) Then come reports, the proverbial wrench in the works: moderation is the unwelcome guest at the feast.

If I post objectionable content, it can be reported to the moderators. In the Fediverse, reports filed by users of another instance are forwarded to my instance's moderators as well.

So far, reports in the Fediverse have arrived at a leisurely pace, perhaps one every two months, which is manageable at hobbyist scale. But what happens if I post a flamboyant photo and receive 23,000 reports from a major platform? It means I have to work through 23,000 reports.

On major systems, by contrast, I imagine reports are aggregated by the content they target. Since the text of a Mastodon report is free-form (I can report someone for 'Offending the God of Buttered Rabbits'), a diligent moderator has to read every single one, because any of them might raise a legitimate legal issue.
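Aggregating by content can be sketched in a few lines. This is a hypothetical illustration: the `Report` record below is a minimal stand-in, not Mastodon's actual schema, and grouping by the reported object's ActivityPub id plus exact (case- and whitespace-normalized) comment text is just one simple choice, not how any large platform is known to do it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical minimal report record (not Mastodon's actual schema)."""
    reporter: str    # account that filed the report
    target_uri: str  # ActivityPub id of the reported object
    comment: str     # free-form text supplied by the reporter


def group_reports(reports):
    """Map each reported object to {normalized comment: number of reports}."""
    queue = defaultdict(lambda: defaultdict(int))
    for report in reports:
        # Collapse case and whitespace so identical complaints merge.
        key = " ".join(report.comment.lower().split())
        queue[report.target_uri][key] += 1
    return {uri: dict(comments) for uri, comments in queue.items()}


photo = "https://my.instance.example/objects/icecream"
reports = [Report(f"user{i}@threads.example", photo, "Spam") for i in range(22_999)]
reports.append(Report("legal@threads.example", photo, "Copyright  infringement"))
print(group_reports(reports))
# {'https://my.instance.example/objects/icecream': {'spam': 22999, 'copyright infringement': 1}}
```

The 23,000 reports collapse into one queue entry with two distinct texts to read. Any report worded differently still stands on its own, so free-form text keeps the worst case close to one review per report.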

In short: not even the instance’s sysadmin scales effortlessly.


You may wonder: doesn't Eugen Rochko know all this? Of course he does. Yet for some reason he seems inclined to downplay the issue or pretend it does not exist. He knows perfectly well that if a Mastodon user follows Taylor Swift, every one of her posts comes with a thread of replies, and if the user opens that thread, the traffic on the Mastodon server depends on Taylor's popularity, not on how many accounts on that instance follow her.

Yet he pretends not to see it.

Pleroma has a rate-limiting system that I personally like, and it offers a possible way to mitigate the problem. If Pleroma's limits are not enough, I will add rate limiting on the ingress reverse proxy; if that is not enough either, the firewall comes into play; and should even that fail, the maximum throughput of my DOCSIS 3.1 connection will cap the traffic as a last resort.
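The idea behind such limits can be sketched as a classic token bucket kept per remote host. This is a generic technique, not Pleroma's implementation or configuration; `TokenBucket`, `PerHostLimiter`, and the assumption that requests are keyed by the requesting server's hostname are all illustrative choices. The clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TokenBucket:
    """Classic token bucket: refills `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a short burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)  # ignore a clock going backwards
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerHostLimiter:
    """One bucket per remote host, so one busy peer cannot starve the others."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, host: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(host)
        if bucket is None:
            bucket = self.buckets[host] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


fake_now = [0.0]
limiter = PerHostLimiter(rate=10.0, capacity=5.0, clock=lambda: fake_now[0])
print([limiter.allow("threads.net") for _ in range(6)])  # [True, True, True, True, True, False]
print(limiter.allow("mastodon.social"))                   # True: separate bucket
fake_now[0] = 0.5
print(limiter.allow("threads.net"))                       # True: bucket refilled
```

A rejected request would typically be answered with HTTP 429 Too Many Requests. A real limiter would also evict idle buckets, since this dictionary grows with every new host it sees.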

The problem remains, though: if the major players join and do not curb their traffic, many small instances will run into trouble.

Why Rochko feigns ignorance remains a mystery.
