Automated Performance Testing: Lighthouse CI, GitHub Actions & Real User Monitoring
Catch Performance Regressions Before They Ship with Automated Pipelines

Performance regressions rarely come from one bad commit. They accumulate. A new font here, an extra third-party script there, an image somebody forgot to compress. Each one passes review because each one is small.
Over weeks, the compound effect drags your Lighthouse score from 92 to 74, and when you dig into git blame looking for the culprit, there isn’t one.
The problem isn’t carelessness. It’s that performance isn’t tested the way functionality is. Nobody ships without running their test suite. But most teams still treat Lighthouse like a manual spot-check, something you remember to run the week before a big release—if that.
This article sets up the antidote: an automated pipeline with four layers. Lighthouse CI blocks bad PRs. Performance budgets fail the build. Real User Monitoring collects field data. Playwright catches interaction slowdowns. Each layer catches what the others miss.
Sounds good? Let’s dive in.
Tip: This builds on my older guide to automating Lighthouse audits with Mocha and Chai. That approach still works, but Lighthouse CI has become the standard since then. Start here if you’re setting up from scratch.
Layer 1: Lighthouse CI in GitHub Actions
Lighthouse CI (@lhci/cli) is the Chrome team’s officially maintained tool. It’s solid, plays nicely with GitHub Actions, and the treosh/lighthouse-ci-action handles all the wiring for you.
Here’s the minimal setup that runs on every PR:
```yaml
name: Performance Check
on: [pull_request]
jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci && npm run build
      - uses: treosh/lighthouse-ci-action@v12
        with:
          urls: |
            http://localhost:4321/
            http://localhost:4321/articles/
          budgetPath: ./budget.json
          uploadArtifacts: true
```

The workflow builds your site; the action spins up a server, runs Lighthouse, and uploads the results as artifacts. If you set performance budgets (Layer 2), it’ll fail the check when thresholds are exceeded. No manual intervention needed.
Tuning with lighthouserc.js
The minimal setup works, but once you want real control, add a config file at your project root:
```js
module.exports = {
  ci: {
    collect: {
      url: [
        'http://localhost:4321/',
        'http://localhost:4321/articles/',
        'http://localhost:4321/gallery/',
      ],
      numberOfRuns: 3,
      startServerCommand: 'npm run preview',
      startServerReadyPattern: 'Local',
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'categories:accessibility': ['warn', { minScore: 0.9 }],
        'categories:seo': ['warn', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'interactive': ['error', { maxNumericValue: 3500 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};
```

Three settings that actually matter:
numberOfRuns: 3 — CI runners are noisy, and a single fluke (CPU spike, network hiccup) can tank an otherwise fine score. Three runs let you take the median and smooth out the noise. I tested 5 runs once; the extra accuracy wasn’t worth the extra pipeline time.
error vs warn — error blocks the merge; warn just shows a notification. I set Core Web Vitals to error since they directly impact rankings, and accessibility/SEO scores to warn so I see them without blocking every PR.
temporary-public-storage — Each audit report gets a shareable URL that lasts 7 days. There’s a self-hosted Lighthouse CI server if you want persistent dashboards and history, but for most teams, temporary storage is fine.
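If you do go the self-hosted route, only the `upload` block changes. A minimal sketch, assuming an LHCI server you have deployed yourself; the URL is a placeholder and the build token comes from that server’s setup:

```javascript
// lighthouserc.js -- upload section only. serverBaseUrl is a placeholder
// for your own `lhci server` deployment; keep the build token in CI secrets.
module.exports = {
  ci: {
    upload: {
      target: 'lhci',
      serverBaseUrl: 'https://lhci.example.com',
      token: process.env.LHCI_BUILD_TOKEN,
    },
  },
};
```

With this in place, every run is stored with history, so you get trend graphs instead of 7-day throwaway links.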
Reference the config in your workflow:
```yaml
- uses: treosh/lighthouse-ci-action@v12
  with:
    configPath: ./lighthouserc.js
    uploadArtifacts: true
```

Which URLs to Audit
One URL per layout type, not per page.
```js
url: [
  'http://localhost:4321/',
  'http://localhost:4321/articles/',
  'http://localhost:4321/articles/core-web-vitals-optimization-strategies/',
  'http://localhost:4321/gallery/',
],
```

Homepage, listing page, detail page, gallery. Four templates, four audits. If you try to audit every URL instead, your pipeline balloons from 2 minutes to 15, and developers stop waiting for checks to finish.
Layer 2: Performance Budgets
Budgets are hard numeric limits—they fail the build when exceeded. Think of them as test assertions for your bundle instead of your code.
budget.json:
```json
[
  {
    "path": "/*",
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "interactive", "budget": 3500 },
      { "metric": "cumulative-layout-shift", "budget": 0.1 }
    ],
    "resourceSizes": [
      { "resourceType": "script", "budget": 100 },
      { "resourceType": "image", "budget": 300 },
      { "resourceType": "total", "budget": 500 }
    ],
    "resourceCounts": [
      { "resourceType": "third-party", "budget": 5 }
    ]
  }
]
```

Timing budgets should aim for the “good” thresholds, not “poor.” LCP at 2500ms, not 4000ms. You want the pipeline to catch drift early, before your metrics slide from green to yellow.
Resource size budgets are in KB. 100KB for scripts is tight and deliberate for a content site—it forces intentional choices. An SPA with React might honestly need 200-300KB, but you should decide that explicitly, not wake up three months later wondering why your bundle got so fat.
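Budgets don’t have to be uniform, either: `budget.json` accepts multiple entries keyed by `path` pattern, so a heavier section can get its own allowance. A sketch with illustrative paths and numbers (verify the pattern-matching behavior against the Lighthouse budgets docs for your version):

```json
[
  {
    "path": "/*",
    "resourceSizes": [{ "resourceType": "script", "budget": 100 }]
  },
  {
    "path": "/gallery/*",
    "resourceSizes": [
      { "resourceType": "script", "budget": 150 },
      { "resourceType": "image", "budget": 800 }
    ]
  }
]
```

This keeps the strict default everywhere while admitting, explicitly and in version control, that the gallery is image-heavy.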
Images are typically the biggest LCP culprit you’ll catch here. If you’re not already optimizing images aggressively, see Image Optimization: Responsive, Next-Gen Formats & Automation for techniques that slash image size by 40-60%.
Third-party resource count at 5 is the sneaky one—people often skip it. But setting a hard limit forces a real conversation every time someone wants to add an external script. I know a team that had two abandoned tracking libraries and a forgotten A/B testing snippet running in production, all surfaced by a `third-party` budget of 5.
When a budget breaks, the error is crystal clear: LCP exceeded budget of 2500ms (actual: 3100ms). The developer fetches the Lighthouse artifact, spots what changed (usually a new image, dependency, or font), fixes it, and resubmits. No guessing, no hunting for the culprit.
Layer 3: Real User Monitoring
Lab data (Lighthouse) catches regressions before they ship. Field data tells you what’s really happening once users actually load your site.
Google’s web-vitals library measures all three Core Web Vitals from actual user visits. It’s tiny—about 1.5KB gzipped—so there’s no excuse not to use it.
```js
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good', 'needs-improvement', or 'poor'
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  // sendBeacon survives page unload; fetch with keepalive is the fallback
  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/vitals', body);
  } else {
    fetch('/api/vitals', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
```

If you’re already using GA4, you can skip the custom backend and send directly to Google:
```js
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToGA({ name, delta, value, id }) {
  gtag('event', name, {
    value: delta,
    metric_id: id,
    metric_value: value,
    metric_delta: delta,
  });
}

onLCP(sendToGA);
onINP(sendToGA);
onCLS(sendToGA);
```

Now you’ve got CWV data flowing into GA4 as custom events. Segment by device, country, page, connection speed, logged-in status—whatever you need to spot patterns.
For advanced RUM patterns like CDN optimization and field data segmentation strategies, see Advanced Core Web Vitals Optimization.
Why Not Just Use CrUX?
CrUX (Chrome User Experience Report) is the field data Google uses for ranking, so it matters. But it’s got real gaps:
- Chrome-only (Safari, Firefox, mobile users? Not included)
- 28-day lag (not real-time)
- Requires minimum traffic before it even reports
Your own RUM handles what CrUX can’t: all browsers, real-time feedback, and works on low-traffic pages too. Together: CrUX shows you what Google sees for ranking purposes; RUM shows you what your actual users experience. You genuinely need both.
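When you do want CrUX data without waiting on Search Console, the CrUX API exposes the same p75 values programmatically. A hedged sketch, assuming an API key created in Google Cloud; the response field names follow my reading of the `records:queryRecord` shape, so verify them against the API docs:

```javascript
// Query the CrUX API for an origin's p75 Core Web Vitals (phone traffic).
// Requires Node 18+ for global fetch; apiKey comes from Google Cloud.
async function fetchCruxP75(origin, apiKey) {
  const res = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${apiKey}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ origin, formFactor: 'PHONE' }),
    }
  );
  return extractP75(await res.json());
}

// Pull the 75th-percentile value for each vital out of the response.
// CrUX reports CLS as a string, so Number() normalizes all three.
function extractP75(record) {
  const metrics = record.record.metrics;
  const p75 = (name) => metrics[name]?.percentiles?.p75;
  return {
    lcp: Number(p75('largest_contentful_paint')),
    inp: Number(p75('interaction_to_next_paint')),
    cls: Number(p75('cumulative_layout_shift')),
  };
}
```

Dropped into a scheduled workflow, this turns the “weekly CrUX check” from the checklist into one more automated job.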
Layer 4: Playwright for Interaction Testing
Lighthouse is great at measuring initial page loads. It’s terrible at measuring what happens after: user interactions. Searches, form submissions, tab switches, dynamic filtering—Lighthouse can’t see any of that.
But your Playwright tests can. If you’re already running E2E tests, adding performance assertions costs almost nothing.
```js
const { test, expect } = require('@playwright/test');

test('article page loads within performance budget', async ({ page }) => {
  await page.goto('http://localhost:4321/articles/');
  await page.waitForLoadState('networkidle');

  const metrics = await page.evaluate(() => {
    return new Promise((resolve) => {
      let cls = 0;
      // Layout shifts are only exposed via an observer, so buffer them too
      new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) cls += entry.value;
      }).observe({ type: 'layout-shift', buffered: true });

      new PerformanceObserver((list) => {
        const entries = list.getEntries();
        const lcp = entries[entries.length - 1];
        resolve({ lcp: lcp ? lcp.startTime : null, cls });
      }).observe({ type: 'largest-contentful-paint', buffered: true });
    });
  });

  expect(metrics.lcp).toBeLessThan(2500);
  expect(metrics.cls).toBeLessThan(0.1);
});

test('search interaction responds quickly', async ({ page }) => {
  await page.goto('http://localhost:4321/');

  const startTime = Date.now();
  await page.fill('[data-testid="search-input"]', 'core web vitals');
  await page.waitForSelector('[data-testid="search-results"]');
  const responseTime = Date.now() - startTime;

  // Rough INP proxy: interaction to visible result
  expect(responseTime).toBeLessThan(500);
});
```

Same browser session, same page navigation—you’re just adding different assertions at the end.
Bonus: For the accessibility side of this setup, see Automated Accessibility Testing with axe-core, Playwright & GitHub Actions.
Putting It All Together
Here’s all four layers running in a single GitHub Actions workflow. Build once, test in parallel:
```yaml
name: Quality Gate
on:
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/

  lighthouse:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - uses: treosh/lighthouse-ci-action@v12
        with:
          configPath: ./lighthouserc.js
          uploadArtifacts: true

  playwright:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
      - run: npx serve dist/ &
      - run: npx playwright test
        env:
          BASE_URL: http://localhost:3000
```

The build job runs once and uploads artifacts. Lighthouse and Playwright both use that same build and run in parallel (each has `needs: build`, but they don’t depend on each other). Your total pipeline time is roughly whichever job is slower, not the sum of both.
On a typical Astro site, this entire workflow runs in under 3 minutes. Fast enough that developers won’t skip it out of impatience.
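One wiring detail: the Playwright tests earlier hardcode `localhost:4321`, while this workflow serves the build on port 3000 and passes `BASE_URL`. Reading the base URL from the environment in your Playwright config keeps both working; a sketch, where the fallback port is an assumption matching the local preview server:

```javascript
// playwright.config.js -- pick up BASE_URL in CI, fall back to local preview.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    // With baseURL set, tests can call page.goto('/articles/') directly
    baseURL: process.env.BASE_URL || 'http://localhost:4321',
  },
});
```

Then `page.goto('/articles/')` resolves against whichever server is actually running, locally or in CI.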
What Each Layer Catches
| Layer | What it catches | When |
|---|---|---|
| Lighthouse CI | Score regressions, new render-blocking resources | Pre-merge |
| Performance budgets | Bundle bloat, third-party creep, threshold violations | Pre-merge |
| CrUX | Real-world performance trends as Google sees them | Weekly (28-day lag) |
| RUM (web-vitals) | Device/geo-specific issues, all browsers, real-time | Continuous |
Lighthouse alone misses field issues. CrUX alone lags by weeks. RUM alone can’t prevent regressions from shipping. Budgets alone don’t measure user experience. No single layer is enough on its own. But all four together? You have complete coverage.
Bonus: For the Core Web Vitals thresholds these budgets are based on, see the Core Web Vitals in 2026 series. For the original Lighthouse automation approach (Mocha + Chai, circa 2019), the original guide still works.
The Actual Checklist
If you take one thing from this article, let it be the pipeline itself. Everything else is tuning.
- Lighthouse CI running on every PR (`treosh/lighthouse-ci-action@v12`)
- `lighthouserc.js` configured with `numberOfRuns: 3` and `error`-level CWV assertions
- `budget.json` with timing, resource size, and third-party count limits
- `web-vitals` wired up, sending LCP, INP, and CLS to your analytics
- Playwright performance tests for your main interactive flows
- Weekly CrUX check in Search Console (calendar reminder helps)
That’s the complete stack. Set it up once, tweak the thresholds as your site evolves, and stop stressing about when performance degraded.
Happy building.
This article complements my Core Web Vitals in 2026 series and my guide on Automating Frontend Workflows with GitHub Actions. For accessibility automation using a similar CI setup, see Automated Accessibility Testing with axe-core & Playwright.





