Automated Performance Testing: Lighthouse CI, GitHub Actions & Real User Monitoring
Catch Performance Regressions Before They Ship with Automated Pipelines

Performance regressions rarely come from one bad commit. They accumulate. A new font here, an extra third-party script there, an image somebody forgot to compress. Each one passes review because each one is small.
Over weeks, the compound effect drags your Lighthouse score from 92 to 74, and when you dig into git blame looking for the culprit, there isn’t one.
The problem isn’t carelessness. It’s that performance isn’t tested the way functionality is. Nobody ships without running their test suite. But most teams still treat Lighthouse like a manual spot-check, something you remember to run the week before a big release—if that.
This article sets up the antidote: an automated pipeline with four layers. Lighthouse CI blocks bad PRs. Performance budgets fail the build. Real User Monitoring collects field data. Playwright catches interaction slowdowns. Each layer catches what the others miss.
Sounds good? Let’s dive in.
Tip: This builds on my older guide to automating Lighthouse audits with Mocha and Chai. That approach still works, but Lighthouse CI has become the standard since then. Start here if you’re setting up from scratch.
Layer 1: Lighthouse CI in GitHub Actions
Lighthouse CI (@lhci/cli) is the Chrome team’s officially maintained tool. It’s solid, plays nicely with GitHub Actions, and the treosh/lighthouse-ci-action handles all the wiring for you.
Here’s the minimal setup that runs on every PR:
```yaml
name: Performance Check
on: [pull_request]
jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci && npm run build
      - uses: treosh/lighthouse-ci-action@v12
        with:
          urls: |
            http://localhost:4321/
            http://localhost:4321/articles/
          budgetPath: ./budget.json
          uploadArtifacts: true
```

The workflow builds your site; the action spins up a server, runs Lighthouse, and uploads the results as artifacts. If you set performance budgets (Layer 2), it’ll fail the check when thresholds are exceeded. No manual intervention needed.
Tuning with lighthouserc.js
The minimal setup works, but once you want real control, add a config file at your project root:
```js
module.exports = {
  ci: {
    collect: {
      url: [
        'http://localhost:4321/',
        'http://localhost:4321/articles/',
        'http://localhost:4321/gallery/',
      ],
      numberOfRuns: 3,
      startServerCommand: 'npm run preview',
      startServerReadyPattern: 'Local',
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'categories:accessibility': ['warn', { minScore: 0.9 }],
        'categories:seo': ['warn', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'interactive': ['error', { maxNumericValue: 3500 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};
```

Three settings that actually matter:
numberOfRuns: 3 — CI runners are noisy, and a single fluke (CPU spike, network hiccup) can tank an otherwise fine score. Three runs let you take the median and smooth out the noise. I tested 5 runs once; the extra accuracy wasn’t worth the extra pipeline time.
error vs warn — error blocks the merge; warn just shows a notification. I set Core Web Vitals to error since they directly impact rankings, and accessibility/SEO scores to warn so I see them without blocking every PR.
temporary-public-storage — Each audit report gets a shareable URL that lasts 7 days. There’s a self-hosted Lighthouse CI server if you want persistent dashboards and history, but for most teams, temporary storage is fine.
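If you do go the self-hosted route, only the `upload` block changes. A minimal sketch, assuming an LHCI server you have deployed yourself; the URL is a placeholder and the build token comes from that server’s setup:

```javascript
// lighthouserc.js -- upload section only. serverBaseUrl is a placeholder
// for your own `lhci server` deployment; keep the build token in CI secrets.
module.exports = {
  ci: {
    upload: {
      target: 'lhci',
      serverBaseUrl: 'https://lhci.example.com',
      token: process.env.LHCI_BUILD_TOKEN,
    },
  },
};
```

With this in place, every run is stored with history, so you get trend graphs instead of 7-day throwaway links.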
Reference the config in your workflow:
```yaml
- uses: treosh/lighthouse-ci-action@v12
  with:
    configPath: ./lighthouserc.js
    uploadArtifacts: true
```

Which URLs to Audit
One URL per layout type, not per page.
```js
url: [
  'http://localhost:4321/',
  'http://localhost:4321/articles/',
  'http://localhost:4321/articles/core-web-vitals-optimization-strategies/',
  'http://localhost:4321/gallery/',
],
```

Homepage, listing page, detail page, gallery. Four templates, four audits. If you try to audit every URL instead, your pipeline balloons from 2 minutes to 15, and developers stop waiting for checks to finish.
Layer 2: Performance Budgets
Budgets are hard numeric limits—they fail the build when exceeded. Think of them as test assertions for your bundle instead of your code.
budget.json:
```json
[
  {
    "path": "/*",
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "interactive", "budget": 3500 },
      { "metric": "cumulative-layout-shift", "budget": 0.1 }
    ],
    "resourceSizes": [
      { "resourceType": "script", "budget": 100 },
      { "resourceType": "image", "budget": 300 },
      { "resourceType": "total", "budget": 500 }
    ],
    "resourceCounts": [
      { "resourceType": "third-party", "budget": 5 }
    ]
  }
]
```

Timing budgets should aim for the “good” thresholds, not “poor.” LCP at 2500ms, not 4000ms. You want the pipeline to catch drift early, before your metrics slide from green to yellow.
Resource size budgets are in KB. 100KB for scripts is tight and deliberate for a content site—it forces intentional choices. An SPA with React might honestly need 200-300KB, but you should decide that explicitly, not wake up three months later wondering why your bundle got so fat.
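Budgets don’t have to be uniform, either: `budget.json` accepts multiple entries keyed by `path` pattern, so a heavier section can get its own allowance. A sketch with illustrative paths and numbers (verify the pattern-matching behavior against the Lighthouse budgets docs for your version):

```json
[
  {
    "path": "/*",
    "resourceSizes": [{ "resourceType": "script", "budget": 100 }]
  },
  {
    "path": "/gallery/*",
    "resourceSizes": [
      { "resourceType": "script", "budget": 150 },
      { "resourceType": "image", "budget": 800 }
    ]
  }
]
```

This keeps the strict default everywhere while admitting, explicitly and in version control, that the gallery is image-heavy.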
Images are typically the biggest LCP culprit you’ll catch here. If you’re not already optimizing images aggressively, see Image Optimization: Responsive, Next-Gen Formats & Automation for techniques that slash image size by 40-60%.
Third-party resource count at 5 is the sneaky one—people often skip it. But setting a hard limit forces a real conversation every time someone wants to add an external script. I know a team that had two abandoned tracking libraries and a forgotten A/B testing snippet running in production, all surfaced by a `third-party` budget of 5.
When a budget breaks, the error is crystal clear: LCP exceeded budget of 2500ms (actual: 3100ms). The developer fetches the Lighthouse artifact, spots what changed (usually a new image, dependency, or font), fixes it, and resubmits. No guessing, no hunting for the culprit.
Layer 3: Real User Monitoring
Lab data (Lighthouse) catches regressions before they ship. Field data tells you what’s really happening once users actually load your site.
Google’s web-vitals library measures all three Core Web Vitals from actual user visits. It’s tiny—about 1.5KB gzipped—so there’s no excuse not to use it.
```js
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good', 'needs-improvement', or 'poor'
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  // sendBeacon survives page unload; fetch with keepalive is the fallback
  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/vitals', body);
  } else {
    fetch('/api/vitals', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
```

If you’re already using GA4, you can skip the custom backend and send directly to Google:
```js
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToGA({ name, delta, value, id }) {
  gtag('event', name, {
    value: delta,
    metric_id: id,
    metric_value: value,
    metric_delta: delta,
  });
}

onLCP(sendToGA);
onINP(sendToGA);
onCLS(sendToGA);
```

Now you’ve got CWV data flowing into GA4 as custom events. Segment by device, country, page, connection speed, logged-in status—whatever you need to spot patterns.
For advanced RUM patterns like CDN optimization and field data segmentation strategies, see Advanced Core Web Vitals Optimization.
Why Not Just Use CrUX?
CrUX (Chrome User Experience Report) is the field data Google uses for ranking, so it matters. But it’s got real gaps:
- Chrome-only (Safari, Firefox, mobile users? Not included)
- 28-day lag (not real-time)
- Requires minimum traffic before it even reports
Your own RUM handles what CrUX can’t: all browsers, real-time feedback, and works on low-traffic pages too. Together: CrUX shows you what Google sees for ranking purposes; RUM shows you what your actual users experience. You genuinely need both.
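When you do want CrUX data without waiting on Search Console, the CrUX API exposes the same p75 values programmatically. A hedged sketch, assuming an API key created in Google Cloud; the response field names follow my reading of the `records:queryRecord` shape, so verify them against the API docs:

```javascript
// Query the CrUX API for an origin's p75 Core Web Vitals (phone traffic).
// Requires Node 18+ for global fetch; apiKey comes from Google Cloud.
async function fetchCruxP75(origin, apiKey) {
  const res = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${apiKey}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ origin, formFactor: 'PHONE' }),
    }
  );
  return extractP75(await res.json());
}

// Pull the 75th-percentile value for each vital out of the response.
// CrUX reports CLS as a string, so Number() normalizes all three.
function extractP75(record) {
  const metrics = record.record.metrics;
  const p75 = (name) => metrics[name]?.percentiles?.p75;
  return {
    lcp: Number(p75('largest_contentful_paint')),
    inp: Number(p75('interaction_to_next_paint')),
    cls: Number(p75('cumulative_layout_shift')),
  };
}
```

Dropped into a scheduled workflow, this turns the “weekly CrUX check” from the checklist into one more automated job.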
Layer 4: Playwright for Interaction Testing
Lighthouse is great at measuring initial page loads. It’s terrible at measuring what happens after: user interactions. Searches, form submissions, tab switches, dynamic filtering—Lighthouse can’t see any of that.
But your Playwright tests can. If you’re already running E2E tests, adding performance assertions costs almost nothing.
```js
const { test, expect } = require('@playwright/test');

test('article page loads within performance budget', async ({ page }) => {
  await page.goto('http://localhost:4321/articles/');
  await page.waitForLoadState('networkidle');

  const metrics = await page.evaluate(() => {
    return new Promise((resolve) => {
      let cls = 0;
      // Layout shifts are only exposed via an observer, so buffer them too
      new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) cls += entry.value;
      }).observe({ type: 'layout-shift', buffered: true });

      new PerformanceObserver((list) => {
        const entries = list.getEntries();
        const lcp = entries[entries.length - 1];
        resolve({ lcp: lcp ? lcp.startTime : null, cls });
      }).observe({ type: 'largest-contentful-paint', buffered: true });
    });
  });

  expect(metrics.lcp).toBeLessThan(2500);
  expect(metrics.cls).toBeLessThan(0.1);
});

test('search interaction responds quickly', async ({ page }) => {
  await page.goto('http://localhost:4321/');

  const startTime = Date.now();
  await page.fill('[data-testid="search-input"]', 'core web vitals');
  await page.waitForSelector('[data-testid="search-results"]');
  const responseTime = Date.now() - startTime;

  // Rough INP proxy: interaction to visible result
  expect(responseTime).toBeLessThan(500);
});
```

Same browser session, same page navigation—you’re just adding different assertions at the end.
Bonus: For the accessibility side of this setup, see Automated Accessibility Testing with axe-core, Playwright & GitHub Actions.
Putting It All Together
Here’s all four layers running in a single GitHub Actions workflow. Build once, test in parallel:
```yaml
name: Quality Gate
on:
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/

  lighthouse:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - uses: treosh/lighthouse-ci-action@v12
        with:
          configPath: ./lighthouserc.js
          uploadArtifacts: true

  playwright:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - run: npx serve dist/ &
      - run: npx playwright test
        env:
          BASE_URL: http://localhost:3000
```

The build job runs once and uploads artifacts. Lighthouse and Playwright both use that same build and run in parallel (each has `needs: build`, but they don’t depend on each other). Your total pipeline time is roughly whichever job is slower, not the sum of both.
On a typical Astro site, this entire workflow runs in under 3 minutes. Fast enough that developers won’t skip it out of impatience.
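One wiring detail: the Playwright tests earlier hardcode `localhost:4321`, while this workflow serves the build on port 3000 and passes `BASE_URL`. Reading the base URL from the environment in your Playwright config keeps both working; a sketch, where the fallback port is an assumption matching the local preview server:

```javascript
// playwright.config.js -- pick up BASE_URL in CI, fall back to local preview.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    // With baseURL set, tests can call page.goto('/articles/') directly
    baseURL: process.env.BASE_URL || 'http://localhost:4321',
  },
});
```

Then `page.goto('/articles/')` resolves against whichever server is actually running, locally or in CI.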
What Each Layer Catches
| Layer | What it catches | When |
|---|---|---|
| Lighthouse CI | Score regressions, new render-blocking resources | Pre-merge |
| Performance budgets | Bundle bloat, third-party creep, threshold violations | Pre-merge |
| CrUX | Real-world performance trends as Google sees them | Weekly (28-day lag) |
| RUM (web-vitals) | Device/geo-specific issues, all browsers, real-time | Continuous |
Lighthouse alone misses field issues. CrUX alone lags by weeks. RUM alone can’t prevent regressions from shipping. Budgets alone don’t measure user experience. No single layer is enough on its own. But all four together? You have complete coverage.
Bonus: For the Core Web Vitals thresholds these budgets are based on, see the Core Web Vitals in 2026 series. For the original Lighthouse automation approach (Mocha + Chai, circa 2019), the original guide still works.
The Actual Checklist
If you take one thing from this article, let it be the pipeline itself. Everything else is tuning.
- Lighthouse CI running on every PR (`treosh/lighthouse-ci-action@v12`)
- `lighthouserc.js` configured with `numberOfRuns: 3` and `error`-level CWV assertions
- `budget.json` with timing, resource size, and third-party count limits
- `web-vitals` wired up, sending LCP, INP, and CLS to your analytics
- Playwright performance tests for your main interactive flows
- Weekly CrUX check in Search Console (calendar reminder helps)
That’s the complete stack. Set it up once, tweak the thresholds as your site evolves, and stop stressing about when performance degraded.
Happy building.
This article complements my Core Web Vitals in 2026 series and my guide on Automating Frontend Workflows with GitHub Actions. For accessibility automation using a similar CI setup, see Automated Accessibility Testing with axe-core & Playwright.





