Tags: testing


Sunday, March 2nd, 2025

Hallucinations in code are the least dangerous form of LLM mistakes

The moment you run LLM generated code, any hallucinated methods will be instantly obvious: you’ll get an error. You can fix that yourself or you can feed the error back into the LLM and watch it correct itself.

Compare this to hallucinations in regular prose, where you need a critical eye, strong intuitions and well developed fact checking skills to avoid sharing information that’s incorrect and directly harmful to your reputation.

With code you get a powerful form of fact checking for free. Run the code, see if it works.
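
For example, here’s the kind of instant failure you get (the method name is invented, which is rather the point):

    const tunes = ['The Banshee', 'The Butterfly'];
    tunes.sortByPopularity();
    // => TypeError: tunes.sortByPopularity is not a function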

Tuesday, January 21st, 2025

Moving on from React, a Year Later

Many interactions are not possible without JavaScript, but that doesn’t mean we should look to write more than we have to. The server doing something useful is a requirement for building an interesting business. The client doing something is often a nice-to-have.

There’s also this:

It’s really fast

One of the arguments for a SPA is that it provides a more reactive customer experience. I think that’s mostly debunked at this point, due to the performance creep and complexity that comes in with a more complicated client-server relationship.

Friday, January 10th, 2025

Website Speed Test

Here’s a handy free tool from Calibre that’ll give your website a performance assessment.

Monday, December 16th, 2024

Choosing a geocoding provider

Yesterday when I mentioned my paranoia of third-party dependencies on The Session, I said:

I’ve built in the option to switch between multiple geocoding providers. When one of them inevitably starts enshittifying their service, I can quickly move on to another. It’s like having a “go bag” for geocoding.

(Geocoding, by the way, is when you provide a human-readable address and get back latitude and longitude coordinates.)

My paranoia is well-founded. I’ve been using Google’s geocoding API, which is changing its pricing model from next March.

You wouldn’t know it from the breathlessly excited emails they’ve been sending about it, but this is not a good change for me. I don’t do that much geocoding on The Session—around 13,000 or 14,000 requests a month. With the new pricing model that’ll be around $15 to $20 a month. Currently I slip by under the radar with the free tier.

So it might be time for me to flip that switch in my code. But which geocoding provider should I use?

There are plenty of slop-like listicles out there enumerating the various providers, but they’re mostly just regurgitating the marketing blurbs from the provider websites. What I need is more like a test kitchen.

Here’s what I did…

I took a representative sample of six recent additions to the sessions section of thesession.org. These examples represent places in the USA, Ireland, England, Scotland, Northern Ireland, and Spain, so a reasonable spread.

For each one of those sessions, I’m taking:

  • the venue name,
  • the town name,
  • the area name, and
  • the country.

I’m deliberately not including the street address. Quite often people don’t bother including this information so I want to see how well the geocoding APIs cope without it.
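
Each test case then boils down to a single lookup string. A sketch with made-up example data:

    // One hypothetical test case (deliberately no street address).
    const session = {
      venue: 'The Harp Bar',
      town: 'Belfast',
      area: 'Co. Antrim',
      country: 'Northern Ireland',
    };
    const query = [session.venue, session.town, session.area, session.country]
      .join(', ');
    // "The Harp Bar, Belfast, Co. Antrim, Northern Ireland"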

I’ve scored the results on a simple scale of good, so-so, and just plain wrong.

  • A good result gets a score of one. This is when the provider gives back an accurate street-level location.
  • A so-so result gets a score of zero. This is when it’s got the right coordinates for the town, but no more than that.
  • A wrong result gets a score of minus one. This is when the result is like something from a large language model: very confident but untethered from reality, like claiming the address is in a completely different country. Being wrong is worse than being vague, hence the difference in scoring.

Then I tot up those results for an overall score for each provider.
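
The tallying is nothing fancier than summing six numbers per provider. A sketch using a few rows from the table below:

    // Each provider gets six scores (1 = good, 0 = so-so, -1 = wrong);
    // the overall score is their sum.
    const scores = {
      Mapquest: [1, 1, 1, 1, 1, 1],
      Mapbox: [1, 1, 0, 1, 1, -1],
      Nominatim: [0, 0, 0, 0, -1, 1],
    };
    for (const [provider, results] of Object.entries(scores)) {
      console.log(provider, results.reduce((total, score) => total + score, 0));
    }
    // Mapquest 6, Mapbox 3, Nominatim 0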

When I tried my six examples with twelve different geocoding providers, these were the results:

Geocoding providers
Provider        USA  England  Ireland  Spain  Scotland  Northern Ireland  Total
Google            1        1        1      1         1                 1      6
Mapquest          1        1        1      1         1                 1      6
Geoapify          0        1        1      0         1                 0      3
Here              1        1        0      1         0                 0      3
Mapbox            1        1        0      1         1                -1      3
Bing              1        0        0      0         0                 0      1
Nominatim         0        0        0      0        -1                 1      0
OpenCage         -1        1        0      0         0                -1     -1
Tom Tom          -1       -1        0      0        -1                 1     -2
Positionstack     0       -1        0     -1         1                -1     -2
Locationiq       -1        0       -1      0         0                -1     -3
Map Maker        -1        0       -1     -1        -1                -1     -5

Some interesting results there. I was surprised by how crap Bing is. I was also expecting better results from Mapbox.

Most interesting for me, Mapquest is right up there with Google.

So now that I’ve got a good scoring system, my next question is around pricing. If Google and Mapquest are roughly comparable in terms of accuracy, how would the pricing work out for each of them?

Let’s say I make 15,000 API requests a month. Under Google’s new pricing plan, that works out at $25. Not bad.
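
Working backwards from those figures, the new model looks like 10,000 free requests a month and then $5 per thousand after that. A quick sanity check (my inference from the numbers above, not Google’s published rate card):

    // Inferred pricing: first 10,000 requests free, then $5 per 1,000.
    function monthlyCost(requests, freeTier = 10000, perThousand = 5) {
      return Math.max(0, requests - freeTier) / 1000 * perThousand;
    }
    monthlyCost(14000); // 20
    monthlyCost(15000); // 25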

But if I’ve understood Mapquest’s pricing correctly, I reckon I’ll just squeak in under the free tier.

Looks like I’m flipping the switch to Mapquest.

If you’re shopping around for geocoding providers, I hope this is useful to you. But I don’t think you should just look at my results; they’re very specific to my needs. Come up with your own representative sample of tests and try putting the providers through their paces with your data.

If, for some reason, you want to see the terrible PHP code I’m using for geocoding on The Session, here it is.
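
If PHP isn’t your thing, the gist of that switchable setup looks something like this in JavaScript (a sketch, not the actual code from The Session; only the Nominatim adapter is fleshed out):

    // Flip this constant to change geocoding providers.
    const PROVIDER = 'nominatim';

    async function geocodeWithNominatim(query) {
      const url = 'https://nominatim.openstreetmap.org/search?format=json&q=' +
        encodeURIComponent(query);
      const results = await (await fetch(url)).json();
      if (results.length === 0) return null;
      return { lat: Number(results[0].lat), lng: Number(results[0].lon) };
    }

    const providers = {
      nominatim: geocodeWithNominatim,
      // mapquest: geocodeWithMapquest,
      // google: geocodeWithGoogle,
    };

    function geocode(query) {
      return providers[PROVIDER](query);
    }

Adding a provider means writing one more adapter function; switching means changing one constant.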

Thursday, August 29th, 2024

80 / 20 accessibility · marcus.io

So my observation is that 80% of the subject of accessibility consists of fairly simple basics that can probably be learnt in 20% of the time available. The remaining 20% are the difficult situations, edge cases, assistive technology support gaps and corners of specialised knowledge, but these are extrapolated to 100% of the subject, giving it a bad, anxiety-inducing and difficult reputation overall.

Monday, May 20th, 2024

Home - Sa11y

Another handy accessibility testing tool that can be used as a bookmarklet.

Sunday, March 10th, 2024

Bookmarklets for testing your website

I’m at day two of Indie Web Camp Brighton.

Day one was excellent. It was really hard to choose which sessions to go to because they all sounded interesting. That’s a good problem to have.

I ended up participating in:

  • a session on POSSE,
  • a session on NFC tags,
  • a session on writing, and
  • a session on testing your website that was hosted by Ros

In that testing session I shared some of the bookmarklets I use regularly.

Bookmarklets? They’re bookmarks that sit in the toolbar of your desktop browser. Just like any other bookmark, they’re links. The difference is that these links begin with javascript: rather than http. That means you can put programmatic instructions inside the link. Click the bookmark and the JavaScript gets executed.

In my mind, there are two different approaches to making a bookmarklet. One kind of bookmarklet contains lots of clever JavaScript—that’s where the smart stuff happens. The other kind of bookmarklet is deliberately dumb. All they do is take the URL of the current page and pass it to another service—that’s where the smart stuff happens.

I like that second kind of bookmarklet.

Here are some bookmarklets I’ve made. You can drag any of them up to the toolbar of your browser. Or you could create a folder called, say, “bookmarklets”, and drag these links up there.

Validation: This bookmarklet will validate the HTML of whatever page you’re on.

Validate HTML

Carbon: This bookmarklet will run the domain through the website carbon calculator.

Calculate carbon

Accessibility: This bookmarklet will run the current page through the Website Accessibility Evaluation Tools.

WAVE

Performance: This bookmarklet will take the current page and run it through PageSpeed Insights, which includes a Lighthouse test.

PageSpeed

HTTPS: This bookmarklet will run your site through the SSL checker from SSL Labs.

SSL Report

Headers: This bookmarklet will test the security headers on your website.

Security Headers

Drag any of those links to your browser’s toolbar to “install” them. If you don’t like one, you can delete it the same way you can delete any other bookmark.
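
If you’re curious just how dumb the dumb kind of bookmarklet is, the validation one amounts to a one-liner along these lines (assuming the Nu validator’s doc query parameter):

    javascript:location.href='https://validator.w3.org/nu/?doc='+encodeURIComponent(location.href)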

Monday, March 4th, 2024

Bugs I’ve filed on browsers | Read the Tea Leaves

I think filing bugs on browsers is one of the most useful things a web developer can do.

Agreed!

Thursday, February 22nd, 2024

PageSpeed Insights bookmarklet

I’m a little obsessed with web performance. I like being able to check a page’s core web vitals quickly and easily.

Four years ago, I made a Lighthouse bookmarklet. Whatever web page you were on, when you clicked on the bookmarklet you’d get the Lighthouse results for that page. Handy!

It doesn’t work anymore. This is probably because Google are in the loop. Four years is a pretty good innings for anything involving that company.

I kid (mostly). Lighthouse itself is still going strong, despite being a Google product. But the bookmarklet needs updating.

Rather than just get Lighthouse results, I figured that the full PageSpeed Insights results would be even better. If your website is in the Chrome UX report, you get to see those CrUX details too.

So here’s the updated bookmarklet:

PageSpeed Insights

Drag that up to your desktop browser’s bookmarks toolbar. Press it whenever you want to test the page you’re on.
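
Under the hood it’s the same dumb-bookmarklet pattern: grab the current URL and hand it off. Something like this, assuming PageSpeed Insights accepts the page address in a url query parameter:

    javascript:location.href='https://pagespeed.web.dev/report?url='+encodeURIComponent(location.href)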

Wednesday, August 9th, 2023

Coding prototypes

We do quite a bit of prototyping at Clearleft. There’s no better way to reduce risk than to get something in front of users as quickly as possible to test whether you’re on the right track or not.

As Benjamin said in the podcast episode on prototyping:

It’s something to look at, something to prod. And ideally you’re trying to work out what works and what doesn’t.

Sometimes the prototype is mocked up in Figma. Preferably it’s built in code—HTML, CSS, and JavaScript. Having a prototype built in the materials of the medium helps establish a plausible suspension of disbelief during testing.

Also, as Trys said in that same podcast episode:

Prototypical code isn’t production code. It’s quick and it’s often a little bit dirty and it’s not really fit for purpose in that final deliverable. But it’s also there to be inspiring and to gather a team and show that something is possible.

I can’t reiterate that enough: prototype code isn’t production code.

I’ve written about the two different mindsets before:

So these two kinds of work require very different attitudes. For production work, quality is key. For prototyping, making something quickly is what matters.

Addy recently wrote an excellent blog post on the topic of prototyping. The value of a prototype is in the insight it imparts, not the code.

It’s crucial to remember that in a prototype, the code serves merely as a medium—a way to facilitate understanding. It’s a means to an end, not the end itself. The code of a prototype is disposable and mutable. In contrast, the lessons learned from a prototype, the insights gained from user interaction and feedback, are far more durable and impactful.

This!

It can be tempting to re-use code from a prototype. I get it. It seems like a waste to throw away code and build something from scratch. But trust me—and I speak from experience here—it will take more time to wrangle prototype code into something that’s production-ready.

The problem is that quality is often invisible. Think about semantics, performance, security, privacy, and accessibility. Those matter—for production code—but they’re under the surface. For someone who doesn’t understand the importance of those hidden qualities, a prototype that looks like it works seems ready to ship. It’s understandable that they’d balk at the idea of just throwing that code away and writing new code. Sometimes the suspension of disbelief that a prototype is aiming for works too well.

As is so often the case, this isn’t a technical problem. It’s a communication issue.

Monday, June 5th, 2023

Accessibility Personas

Personas are often toothless, but these accessibility personas from gov.uk are more practical and useful than most:

Each profile has a different simulation of their persona’s condition and runs the assistive technology they use to help them.

You can use these profiles to experience the web from the perspective of the personas and gain more understanding of accessibility issues.

Wednesday, May 31st, 2023

Accessibility audits for all

It’s often said that it’s easier to make a fast website than it is to keep a website fast. Things slip through. If you’re not vigilant, performance can erode without you noticing.

It’s a similar story for other invisible but important facets of your website: privacy, security, accessibility. Because they’re hidden from view, you won’t be able to see if there’s a regression.

That’s why it’s a good idea to have regular audits for performance, privacy, security, and accessibility.

I wrote about accessibility testing a while back, and how there’s quite a bit that you can do for yourself before calling in an expert to look at the really gnarly stuff:

When you commission an accessibility audit, you want to make sure you’re getting the most out of it. Don’t squander it on issues that you can catch and fix yourself. Make sure that the bulk of the audit is being spent on the specific issues that are unique to your site.

I recently did an internal audit of the Clearleft website. After writing up the report, I also did a lunch’n’learn to share my methodology. I wanted to show that there’s some low-hanging fruit that pretty much anyone can catch.

To start with, there’s keyboard navigation. Put your mouse and trackpad to one side and use the tab key to navigate around.

Caveat: depending on what browser you’re using, you might need to update some preferences for keyboard navigation to work on links. If you’re using Safari, go to “Preferences”, then “Advanced”, and tick “Press Tab to highlight each item on a web page.”

Tab around and find out. You should see some nice chunky :focus-visible styles on links and form fields.

Here’s something else that anyone can do: zoom in. Increase the magnification to 200%. Everything should scale proportionally. How about 500%? You’ll probably see a mobile-friendly layout. That’s fine. As long as nothing is broken or overlapping, you’re good.

At this point, I reach for some tools. I’ve got some bookmarklets that do similar things: tota11y and ANDI. They both examine the source HTML and CSS to generate reports on structure, headings, images, forms, and so on.

These tools are really useful, but you need to be able to interpret the results. For example, a tool can tell you if an image has no alt text. But it can’t tell you if an image has good or bad alt text.

Likewise, these tools are great for catching colour-contrast issues. But there’s a big difference between a colour-contrast issue on the body copy compared to a colour-contrast issue on one unimportant page element.

I think that demonstrates the most important aspect of any audit: prioritisation.

Finding out that you have accessibility issues isn’t that useful if they’re all presented as an undifferentiated list. What you really need to know are which issues are the most important to fix.

By the way, I really like the way that the Gov.uk team prioritises accessibility concerns:

The team puts accessibility concerns in 2 categories:

  1. Theoretical: A question or statement regarding the accessibility of an implementation within the Design System without evidence of real-world impact.
  2. Evidenced: Sharing new research, data or evidence showing that an implementation within the Design System could cause barriers for disabled people.

The team will usually prioritise evidenced issues and queries over theoretical ones.

When I wrote up my audit for the Clearleft website, I structured it in order of priority. The most important things to fix are at the start of the audit. I also used a simple scale for classifying the severity of issues: low, medium, and high priority.

Thankfully there were no high-priority issues. There were a couple of medium-priority issues. There were plenty of low-priority issues. That’s okay. That’s a pretty good distribution.

If you’re interested, here’s the report I delivered…

Accessibility audit on clearleft.com

Colour contrast

There are a few issues with the pink colour. When it’s used on a grey background, or when it’s used as a background colour for white text, the colour contrast isn’t high enough.

The SVG arrow icon could be improved too.

Recommendations

Medium priority
  • Change the pink colour universally to be darker. The custom property --red is currently rgb(234, 33, 90). Change it to rgb(210, 20, 73) (thanks, James!)
Low priority
  • The SVG arrow icon currently uses currentColor. Consider hardcoding solid black (or a very, very dark grey) instead.

Images

Alt text is improving on the site. There’s reasonable alt text on the top-level pages and the first screen’s worth of case studies and blog posts. I made a sweep through these pages a while back to improve the alt text, but I haven’t done older blog posts and case studies.

Recommendations

Medium priority
  • Make a sweep of older blog posts and case studies and fix alt text.
Low priority
  • Images on the contact page have alt text that starts with “A photo of…” — this is redundant and can be removed.

Headings

The site is using headings sensibly. Sometimes the nesting of headings isn’t perfect, but this is a low priority issue. For example, on the contact page there’s an h1 followed by two h3s. In theory this isn’t correct. In practice (for screen reader users) it’s not an issue.

Recommendations

Low priority
  • On the home page, “UX London 2023” should probably be h3 instead of h1.
  • On the case studies index page we’re currently using h3 headings for the industry sector (“Charities”, “Education” etc.) but these should probably not be headings at all. On the blog index page we use a class “Tags” for a similar purpose. Consider reusing that pattern on the case studies index page.
  • On the about index page, “We’re driven to be” is an h3 and the subsequent three headings are h2s. Ideally this would be reversed: a single h2 followed by three h3s.

Link text

Sometimes the same text is used for different links.

Recommendations

Low priority
  • On the home page the text “Read the case study” is re-used for multiple links. It would be better if each link were different e.g. “Read about The Natural History Museum.”

Forms

The only form on the site is the newsletter sign-up form. It’s marked up pretty well: the input has an associated label, although a visible (clickable) label would be better.

Tabbing order

The site doesn’t use JavaScript to mess with tabbing order for keyboard users. The source order of elements in the markup generally makes sense so all is good.

The focus styles are nice and clear too!

Structure

The site is using HTML landmark elements sensibly (header, nav, main, footer, etc.).

Tuesday, August 30th, 2022

Improving the information architecture of the Smart Pension member app | Design and tech | Smart – retirement, savings and financial wellbeing

Here’s a really excellent, clearly-written case study that unfortunately includes this accurate observation:

In recent years the practice of information architecture has fallen out of fashion, which is a shame as you can’t design something successfully without it. If a user can’t find a feature, it’s game over - the feature may as well not exist as far as they’re concerned.

I also like this insight:

Burger menus are effective… at hiding things.

Tuesday, June 7th, 2022

Introducing Opportunities & Experiments: Taking the Guesswork out of Performance - WebPageTest Blog

WebPageTest just got even better! Now you can mimic the results of what would’ve previously required actually shipping, like adding third-party scripts, switching from a client-rendered to a server-rendered architecture and other changes that could potentially have a big effect on performance. Now you can run an experiment to get the results before actual implementation.

Monday, June 6th, 2022

Paper Prototype CSS

A stylesheet to imitate paper—perfect for low-fidelity prototypes that you want to test.

Saturday, April 30th, 2022

CSS { In Real Life } | My Browser Support Strategy

This is a great succinct definition of progressive enhancement:

Progressive enhancement is a web development strategy by which we ensure that the essential content and functionality of a website is accessible to as many users as possible, while providing an improved experience using newer features for users whose devices are capable of supporting them.

Tuesday, January 11th, 2022

The monoculture web

Firefox as the asphyxiating canary in the coalmine of the web.

Thursday, October 7th, 2021

My Challenge to the Web Performance Community — Philip Walton

I’ve noticed a trend in recent years—a trend that I’ve admittedly been part of myself—where performance-minded developers will rebuild a site and then post a screenshot of their Lighthouse score on social media to show off how fast it is.

Mea culpa! I should post my CrUX reports too.

But I’m going to respectfully decline Phil’s advice to use any of the RUM analytics providers he recommends that require me to put another script element on my site. One third-party script is one third-party script too many.

Tuesday, September 14th, 2021

Benjamin Parry~ Writing ~ Engineering a better design test ~ @benjaminparry

It sometimes feels like we end up testing the limitations of our tools rather than the content and design itself.

What Benjamin found—and I heartily agree—is that HTML prototypes give you the most bang for your buck:

At the point of preparing for usability testing, it seemed ludicrous to move to any prototyping material other than the one we were already building in. The bedrock of the web: HTML, CSS and Javascript.

Accessibility testing

I was doing some accessibility work with a client a little while back. It was mostly giving their site the once-over, highlighting any issues that we could then discuss. It was an audit of sorts.

While I was doing this I started to realise that not all accessibility issues are created equal. I don’t just mean in their severity. I mean that some issues can—and should—be caught early on, while other issues can only be found later.

Take colour contrast. This is something that should be checked before a line of code is written. When designs are being sketched out and then refined in a graphical editor like Figma, that’s the time to check the ratio between background and foreground colours to make sure there’s enough contrast between them. You can catch this kind of thing later on, but by then it’s likely to come with a higher cost—you might have to literally go back to the drawing board. It’s better to find the issue when you’re at the drawing board the first time.
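
The maths behind those checks is the WCAG contrast-ratio formula. Here’s a sketch of it, tried against the pink from the Clearleft audit above (double-check the numbers against a proper tool):

    // WCAG 2 contrast ratio between two sRGB colours.
    function luminance([r, g, b]) {
      const [R, G, B] = [r, g, b].map((channel) => {
        const c = channel / 255;
        return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
      });
      return 0.2126 * R + 0.7152 * G + 0.0722 * B;
    }

    function contrastRatio(colourOne, colourTwo) {
      const [lighter, darker] =
        [luminance(colourOne), luminance(colourTwo)].sort((a, b) => b - a);
      return (lighter + 0.05) / (darker + 0.05);
    }

    // White text on rgb(234, 33, 90):
    contrastRatio([255, 255, 255], [234, 33, 90]); // ≈ 4.3, below the 4.5:1 AA minimum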

Then there’s the HTML. Most accessibility issues here can be caught before the site goes live. Usually they’re issues of omission: form fields that don’t have an explicitly associated label element (using the for and id attributes); images that don’t have alt text; pages that don’t have sensible heading levels or landmark regions like main and nav. None of these are particularly onerous to fix and they come with the biggest bang for your buck. If you’ve got sensible forms, sensible headings, alt text on images, and a solid document structure, you’ve already covered the vast majority of accessibility issues with very little overhead. Some of these checks can also be automated: alt text for images; labels for inputs.
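
Those two automatable checks can be as simple as a few selectors run in the browser console. A rough sketch (the selectors are mine, not any particular tool’s):

    // Images with no alt attribute at all (an empty alt is fine for
    // decorative images, so only a missing attribute is flagged).
    const missingAlt = document.querySelectorAll('img:not([alt])');

    // Form fields with no associated label, wrapping label, or ARIA label.
    const unlabelled = [...document.querySelectorAll(
      'input:not([type="hidden"]), select, textarea'
    )].filter((field) =>
      !(field.id && document.querySelector(`label[for="${field.id}"]`)) &&
      !field.closest('label') &&
      !field.hasAttribute('aria-label') &&
      !field.hasAttribute('aria-labelledby')
    );

    console.log(`${missingAlt.length} images missing alt attributes`);
    console.log(`${unlabelled.length} form fields without labels`);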

Then there’s interactive stuff. If you only use native HTML elements you’re probably in the clear, but chances are you’ve got some bespoke interactivity on your site: a carousel; a mega dropdown for navigation; a tabbed interface. HTML doesn’t give you any of those out of the box so you’d need to make your own using a combination of HTML, CSS, JavaScript and ARIA. There’s plenty of testing you can do before launching—I always ask myself “What would Heydon do?”—but these components really benefit from being tested by real screen reader users.

So if you commission an accessibility audit, you should hope to get feedback that’s mostly in that third category—interactive widgets.

If you get feedback on document structure and other semantic issues with the HTML, you should fix those issues, sure, but you should also see what you can do to stop those issues going live again in the future. Perhaps you can add some steps in the build process. Or maybe it’s more about making sure the devs are aware of these low-hanging fruit. Or perhaps there’s a framework or content management system that’s stopping you from improving your HTML. Then you need to execute a plan for ditching that software.

If you get feedback about colour contrast issues, just fixing the immediate problem isn’t going to address the underlying issue. There’s a process problem, or perhaps a communication issue. In that case, don’t look for a technical solution. A design system, for example, will not magically fix a workflow issue or route around the problem of designers and developers not talking to each other.

When you commission an accessibility audit, you want to make sure you’re getting the most out of it. Don’t squander it on issues that you can catch and fix yourself. Make sure that the bulk of the audit is being spent on the specific issues that are unique to your site.