Tags: servers

39

sparkline

Monday, April 7th, 2025

Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries .

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Friday, March 28th, 2025

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica

As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.

Wednesday, March 26th, 2025

Go To Hellman: AI bots are destroying Open Access

AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

Friday, March 21st, 2025

FOSS infrastructure is under attack by AI companies

More on how large language bots are DDOSing the web:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

Thursday, March 20th, 2025

Please stop externalizing your costs directly into my face

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

This matches my experience with The Session. In fact, while I had this article open in a tab, I had to go deal with a tsunami of large language model bots. It’s really fucking depressing.

Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop. If blasting CO2 into the air and ruining all of our freshwater and traumatizing cheap laborers and making every sysadmin you know miserable and ripping off code and books and art at scale and ruining our fucking democracy isn’t enough for you to leave this shit alone, what is?

Friday, January 26th, 2024

Concatenating text

Why the heck is everyone reaching for React as soon as something on the screen needs to update? And why do we insist on squishing our frontend concerns together with our backend concerns?

I’m glad I’m not the only one constantly asking myself those questions.

Look: is the idea of physically separating “code that runs business logic and builds markup” from “code that handles realtime interactions” really that awful? You can write both in JS if you like, I promise I won’t judge, but we can’t keep pretending that they’re basically the same.

Friday, November 24th, 2023

CSS { In Real Life } | Choosing a Green Web Host

Earlier this month, Jeremy Keith posed the question: “How green is my server?”. As Jeremy notes, it’s surprisingly hard to get that information! So how do you ensure that you’re hosting your website on a green server?

Sunday, November 19th, 2023

How green is my server?

The Session does very well in terms of performance. You can see the data from the Chrome UX Report (CRUX).

What’s good for performance is good for the environment. Sure enough, The Session gets a very high score from the website carbon calculator:

Hurrah! This web page achieves a carbon rating of A+

This is cleaner than 99% of all web pages globally

But under the details about hosting it says:

Oh no, it looks like this web page uses bog standard energy

The Session is hosted on DigitalOcean, who tend to be quite tight-lipped about their energy suppliers. Fortunately others have done some sleuthing to figure out which facilities are running on green energy.

One of the locations to get the green thumnbs up is the Amsterdam facility housed by Equinix. That’s where The Session is hosted.

I’m glad that I was able to find out that the site is running on 100% renewable energy, but I wish I didn’t need to go searching to find this out. DigitalOcean need to be a lot more transparent about the energy sources for their hosting facilities.

Tuesday, November 15th, 2022

Mastodon is just blogs

Do you still miss Google Reader, almost a decade after it was shut down? It’s back!

A Mastodon server is a feed reader, shared by everyone who uses that server.

I really like Simon’s description of the fediverse:

A Mastodon server (often called an instance) is just a shared blog host. Kind of like putting your personal blog in a folder on a domain on shared hosting with some of your friends.

Want to go it alone? You can do that: run your own dedicated Mastodon instance on your own domain.

This is spot-on:

Mastodon is just blogs and Google Reader, skinned to look like Twitter.

Wednesday, October 19th, 2022

JavaScript

A recurring theme in my writing and talks is “lay off the JavaScript, people!” But I have to make a conscious effort to specify that I mean client-side JavaScript.

I thought it would be obvious from the context that I was talking about the copious amounts of JavaScript being shipped to end users to download, parse, and execute. But nothing’s ever really obvious. If I don’t explicitly say JavaScript in the browser, then someone inevitably thinks I’m having a go at JavaScript, the language.

I have absolutely nothing against JavaScript the language. Just like I have nothing against Python or Ruby or any other language that you might write with on your machine or your web server. But as soon as you deliver bytes over the wire, I start having opinions. It just so happens that JavaScript is the universal language for client-side coding so that’s why I call for restraint with JavaScript specifically.

There was a time when JavaScript only existed in web browsers. That changed with Node. Now it’s possible to write code for your web server and code for web browsers using the same language. Very handy!

But just because it’s the same language doesn’t mean you should treat it the same in both circumstance. As Remy puts it:

There are two JavaScripts.

One for the server - where you can go wild.

One for the client - that should be thoughtful and careful.

I was reading something recently that referred to Eleventy as a JavaScript library. It really brought me up short. I mean, on the one hand, yes, it’s a library of code and it’s written in JavaScript. It is absolutely technically correct to call it a JavaScript library.

But in my mind, a JavaScript library is something you ship to web browsers—jQuery, React, Vue, and so on. Whereas Eleventy executes its code in order to generate HTML and that’s what gets sent to end users. Conceptually, it’s like the opposite of a JavaScript library. Eleventy does its work before any user requests a URL—JavaScript libraries do their work after a user requests a URL.

To me it seems obvious that there should an entirely different mindset for writing code intended for a web browser. But nothing’s ever really obvious.

I remember when Node was getting really popular and npm came along as a way to manage all the bundles of code that people were assembling in their Node programmes. Makes total sense. But then I thought I heard about people using npm to do the same thing for client-side code. “That can’t be right!” I thought. I must’ve misunderstood. So I talked to someone from npm and explained how I must be misunderstanding something.

But it turned out that people really were treating client-side JavaScript no different than server-side JavaScript. People really were pulling in megabytes of other people’s code to ship to end users so that they could, I dunno, left pad numbers or something.

Listen, I don’t care what you get up to in the privacy of your own codebase. But don’t poison the well of the web with profligate client-side JavaScript.

Thursday, September 9th, 2021

Meet the Self-Hosters, Taking Back the Internet One Server at a Time

Taking the indie web to the next level—self-hosting on your own hardware.

Tired of Big Tech monopolies, a community of hobbyists is taking their digital lives off the cloud and onto DIY hardware that they control.

Monday, May 24th, 2021

Doc Searls Weblog · How the cookie poisoned the Web

Lou’s idea was just for a server to remember the last state of a browser’s interaction with it. But that one move—a server putting a cookie inside every visiting browser—crossed a privacy threshold: a personal boundary that should have been clear from the start but was not.

Once that boundary was crossed, and the number and variety of cookies increased, a snowball started rolling, and whatever chance we had to protect our privacy behind that boundary, was lost.

The Doctor is incensed.

At this stage of the Web’s moral devolution, it is nearly impossible to think outside the cookie-based fecosystem.

Sunday, March 14th, 2021

Solar Protocol

This website is hosted across a network of solar powered servers and is sent to you from wherever there is the most sunshine.

Friday, February 26th, 2021

The Future of Web Software Is HTML-over-WebSockets – A List Apart

One of the other arguments we hear in support of the SPA is the reduction in cost of cyber infrastructure. As if pushing that hosting burden onto the client (without their consent, for the most part, but that’s another topic) is somehow saving us on our cloud bills. But that’s ridiculous.

Sunday, November 29th, 2020

Rendering Spectrum | CSS-Tricks

Sensible advice from Chris:

So what’s the best rendering method? Whatever works best for you, but perhaps a hierarchy like this makes some general sense:

  1. Static HTML as much as you can
  2. Edge functions over static HTML so you can do whatever dynamic things
  3. Server generated HTML what you have to after that
  4. Client-side render only what you absolutely have to

Thursday, October 8th, 2020

The Widening Responsibility for Front-End Developers | CSS-Tricks

Chris shares his thoughts on the ever-widening skillset required of a so-called front-end developer.

Interestingly, the skillset he mentions half way through (which is what front-end devs used to need to know) really appeals to me: accessibility, performance, responsiveness, progressive enhancement. But the list that covers modern front-end dev sounds more like a different mindset entirely: APIs, Content Management Systems, business logic …the back of the front end.

And Chris doesn’t even touch on the build processes that front-end devs are expected to be familiar with: version control, build pipelines, package management, and all that crap.

I wish we could return to this:

The bigger picture is that as long as the job is building websites, front-enders are focused on the browser.

Tuesday, September 22nd, 2020

The Economics of the Front-End - Jim Nielsen’s Weblog

I do think large tech companies employ JavaScript frameworks because, amongst other things, it saves them money at their scale. And what Big Tech does trickles down in the form of default choices for many others (“they’re doing it and are insanely successful, so mimicking them can’t be a bad idea”). However, the scale at which smaller projects operate doesn’t necessarily translate to the same kind of cost savings.

Friday, July 31st, 2020

Pinboard is Eleven (Pinboard Blog)

I probably need to upgrade the Huffduffer server but Maciej nails why that’s an intimidating prospect:

Doing this on a live system is like performing kidney transplants on a playing mariachi band. The best case is that no one notices a change in the music; you chloroform the players one at a time and try to keep a steady hand while the band plays on. The worst case scenario is that the music stops and there is no way to unfix what you broke, just an angry mob. It is very scary.

Monday, July 27th, 2020

Is my host fast yet?

This is an interesting project to try to rank web hosts by performance:

Real-world server response (Time to First Byte) latencies, as experienced by real-world users navigating the web.

Friday, June 19th, 2020