/ FEATURES

Every signal, every weight, every source.

178 distinct checks across 8 categories — plus 28 schema types and 15 named AI crawlers. Every rule carries a lastReviewed date, source URLs, and an EVIDENCE: / HEURISTIC: rationale. Re-reviewed every 90 days. Nobody else shows their work like this.

Run a free scan
Public methodology repo lands in v1.1

/ WHAT IT DOES

The audit, in twelve headlines.

Find blocked AI crawlers

We test your robots.txt against 15 named AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more. Most sites accidentally block the bots they most want reading them.

Validate (or generate) your llms.txt

llms.txt is an emerging community spec that tells AI assistants what is on your site and what is citable. We check that it parses, flag what is missing, and generate a ready-to-ship file if you do not have one.
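
The parse check can be sketched as a small structural validator. This is a minimal sketch that assumes the layout proposed at llmstxt.org: an H1 title, an optional `> ` summary, then H2 sections listing Markdown links. `checkLlmsTxt` and the report shape are hypothetical names, not the scanner's code.

```typescript
// Simplified llms.txt structure check (assumed llmstxt.org layout):
// one H1 title, optional "> " summary, H2 sections whose list items are Markdown links.

interface LlmsTxtReport {
  title: string | null;
  summary: string | null;
  sections: string[];
  problems: string[];
}

// "- [name](url)" optionally followed by ": notes"
const LINK_ITEM = /^[-*]\s+\[[^\]]+\]\([^)\s]+\)(:\s*.*)?$/;

function checkLlmsTxt(text: string): LlmsTxtReport {
  const lines = text.split(/\r?\n/).map((l) => l.trim());
  const report: LlmsTxtReport = { title: null, summary: null, sections: [], problems: [] };

  // The first non-blank line must be the H1 title.
  const first = lines.findIndex((l) => l !== "");
  const titleLine = first !== -1 && lines[first].startsWith("# ") ? first : -1;
  if (titleLine === -1) report.problems.push("missing H1 title on the first line");
  else report.title = lines[titleLine].slice(2).trim();

  let section: string | null = null;
  lines.forEach((line, i) => {
    if (i === titleLine || line === "") return;
    if (line.startsWith("# ")) {
      report.problems.push(`extra H1 on line ${i + 1}`);
    } else if (line.startsWith("## ")) {
      section = line.slice(3).trim();
      report.sections.push(section);
    } else if (line.startsWith("> ") && section === null && report.summary === null) {
      report.summary = line.slice(2).trim(); // summary must precede the first H2
    } else if (section !== null && /^[-*]\s/.test(line) && !LINK_ITEM.test(line)) {
      report.problems.push(`list item without a Markdown link on line ${i + 1}`);
    }
  });

  if (report.summary === null) report.problems.push("no '> ' summary blockquote");
  return report;
}

console.log(checkLlmsTxt("# Example\n> A demo site.\n## Docs\n- [Guide](https://example.com/guide.md)").problems);
// → []
```

A real validator would also resolve each linked URL. This sketch only checks shape.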

Catch deprecated schema

Google deprecated HowTo and restricted FAQPage to government and health sites in 2023. We know the cut-off dates and your sector's rules, while most validators only check that the JSON parses, not whether it still earns a rich result.

Validate the schema graph

Dangling @id references, unlinked authors, Articles missing a publisher. We traverse the JSON-LD entity graph the way an LLM does, catching structural defects a field-by-field validator never sees.
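
A minimal sketch of the dangling-reference part of that traversal. It assumes the page's JSON-LD blocks are already parsed into plain objects. The rule that a bare `{"@id": ...}` object is a reference while an object with further properties is a declaration is a simplifying assumption, and `findDanglingRefs` is a hypothetical name.

```typescript
// Report @id references in JSON-LD that never resolve to a declared node.

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function findDanglingRefs(blocks: Json[]): string[] {
  const declared = new Set<string>();
  const referenced = new Set<string>();

  // Heuristic: an object whose only key is @id points at a node defined elsewhere;
  // an object with @id plus other properties declares that node.
  const visit = (value: Json): void => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (value === null || typeof value !== "object") return;
    const id = value["@id"];
    if (typeof id === "string") {
      if (Object.keys(value).length === 1) referenced.add(id);
      else declared.add(id);
    }
    Object.values(value).forEach(visit); // walk nested properties and @graph arrays
  };
  blocks.forEach(visit);

  return Array.from(referenced).filter((id) => !declared.has(id)).sort();
}

const page: Json[] = [{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "@id": "https://example.com/#org", "name": "Example" },
    { "@type": "Article", "@id": "https://example.com/post#article",
      "publisher": { "@id": "https://example.com/#org" },
      "author": { "@id": "https://example.com/#author" } },
  ],
}];
console.log(findDanglingRefs(page)); // → [ 'https://example.com/#author' ]
```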

Measure real Core Web Vitals

CrUX field data from real Chrome users via PageSpeed Insights — the numbers Google actually ranks on — plus live LCP / CLS / TBT measured in a headless render. Not just lab estimates.

Benchmark against your sector

Your score vs the median and top quartile of 50+ labelled real sites across 9 sectors. "73 vs the finance median of 33" lands harder than a number in a vacuum.
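
One plausible way to compute those benchmark figures treats the median and top quartile as the 50th and 75th percentiles, with linear interpolation between ranks. The interpolation method and the sample scores are assumptions, because the page does not state which percentile definition is used.

```typescript
// Percentile with linear interpolation between closest ranks (one common definition).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no scores");
  const rank = (sorted.length - 1) * p;
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Compare one site's score with its sector's labelled scores.
function benchmark(score: number, sectorScores: number[]) {
  const sorted = [...sectorScores].sort((a, b) => a - b);
  const median = percentile(sorted, 0.5);
  const topQuartile = percentile(sorted, 0.75);
  return { score, median, topQuartile, aboveMedian: score > median, inTopQuartile: score >= topQuartile };
}

// Hypothetical finance-sector scores, for illustration only.
console.log(benchmark(73, [20, 28, 33, 41, 55])); // median: 33, topQuartile: 41
```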

Audit a whole site, not one page

Deep Audit discovers your sitemap, clusters URLs into templates, samples representatives, and flags "fix once, lift every page" template bugs — plus internal-link orphans and hreflang asymmetries.
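
A rough sketch of template clustering and "fix once" detection. It assumes templates can be recovered by collapsing numeric IDs and long hyphenated slugs into placeholders, and that an issue found on every sampled page of a template is a template bug. `templateKey`, `clusterUrls` and `templateBugs` are hypothetical names. Real detection would likely also compare page structure.

```typescript
// Collapse variable-looking path segments so URLs from one template share a key.
function templateKey(url: string): string {
  const { hostname, pathname } = new URL(url);
  const segments = pathname
    .split("/")
    .filter((seg) => seg !== "")
    .map((seg) => {
      if (/^\d+$/.test(seg)) return ":id"; // numeric IDs
      if (/^[a-z0-9]+(-[a-z0-9]+){2,}$/i.test(seg)) return ":slug"; // 3+ hyphenated words
      return seg;
    });
  return `${hostname}/${segments.join("/")}`;
}

// Group URLs by template and keep a few representatives of each to audit.
function clusterUrls(urls: string[], samplesPerTemplate = 2): Map<string, string[]> {
  const clusters = new Map<string, string[]>();
  for (const url of urls) {
    const key = templateKey(url);
    const members = clusters.get(key) ?? [];
    members.push(url);
    clusters.set(key, members);
  }
  const samples = new Map<string, string[]>();
  clusters.forEach((members, key) => samples.set(key, members.slice(0, samplesPerTemplate)));
  return samples;
}

// Issues present on every sampled page of a template: "fix once, lift every page".
function templateBugs(issuesByUrl: Map<string, string[]>, samples: string[]): string[] {
  const [first, ...rest] = samples.map((u) => new Set(issuesByUrl.get(u) ?? []));
  if (first === undefined) return [];
  return Array.from(first).filter((issue) => rest.every((s) => s.has(issue)));
}
```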

Run a real accessibility audit

axe-core injected during a headless render surfaces WCAG critical / serious / moderate violations — the same engine the Chrome DevTools Lighthouse panel uses, on every rendered scan.

Check security + TLS

Certificate expiry and protocol version, HSTS / CSP / X-Content-Type-Options headers, mixed http:// content on https:// pages, and server-stack version disclosure. The hygiene AI crawlers and users both notice.

Detect agent-auth readiness

We check /.well-known/agent-auth — the emerging standard for authenticating AI agents that act on your behalf. Adoption is early; declaring it puts you ahead of 99% of the web.

Get a priority-grouped fix roadmap

Every issue is bucketed today / this week / this quarter, with copy-paste fix code and a projected score after each bucket. You see exactly what to do first and what it is worth.
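
A sketch of how bucketing and score projection could fit together. The thresholds (penalties of 8 or more go to "today", cheap fixes to "this week") and the additive scoring model are assumptions for illustration. The rule IDs come from the inventory below, but the effort labels are invented.

```typescript
// Hypothetical bucketing and additive score projection.

type Bucket = "today" | "this week" | "this quarter";

interface Issue {
  id: string;
  penalty: number; // negative score impact, e.g. -10
  effort: "low" | "high"; // hypothetical effort estimate
}

function bucketFor(issue: Issue): Bucket {
  if (issue.penalty <= -8) return "today"; // biggest losses first
  if (issue.effort === "low") return "this week"; // cheap wins next
  return "this quarter";
}

// Projected score after fixing each bucket in order, capped at 100.
function roadmap(currentScore: number, issues: Issue[]) {
  const order: Bucket[] = ["today", "this week", "this quarter"];
  let projected = currentScore;
  return order.map((bucket) => {
    const items = issues.filter((i) => bucketFor(i) === bucket);
    const recovered = items.reduce((sum, i) => sum - i.penalty, 0);
    projected = Math.min(100, projected + recovered);
    return { bucket, ids: items.map((i) => i.id), projectedScore: projected };
  });
}

const plan = roadmap(61, [
  { id: "meta-no-description", penalty: -10, effort: "low" },
  { id: "image-no-dimensions", penalty: -3, effort: "low" },
  { id: "content-thin", penalty: -6, effort: "high" },
]);
console.log(plan.map((step) => `${step.bucket}: ${step.projectedScore}`).join(", "));
// → today: 71, this week: 74, this quarter: 80
```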

See the working, not just the score

Every weight carries a source URL and an EVIDENCE / HEURISTIC / CORPUS-CALIBRATED label. Re-reviewed every 90 days, enforced by CI. When a client asks "why -8?", you answer in seconds.

Twelve headlines, 178 individual checks. The full inventory is below, grouped by category — every rule with its severity, score impact, and one-line rationale.

/ SCHEMA MARKUP

Schema markup · 30 checks

28 types audited with sector-aware deprecation rules. We know HowTo is gone, FAQPage is restricted to government and health sites, and SpecialAnnouncement's COVID-only window has closed. We also validate nested required fields such as offers.price across arrays of offers.

  • schema-article-date-not-iso · classic -3 · ai -3

    schema.org's datePublished property requires ISO 8601 format.

  • schema-article-no-description · classic -2 · ai -3

    The description on Article is the canonical 1-2 sentence summary LLMs preferentially quote when citing the page.

  • schema-article-no-image · classic -4 · ai -2

    Google's Article rich-result documentation explicitly requires `image` (URL or ImageObject) for Top Stories carousel eligibility.

  • schema-article-no-main-entity · classic -1 · ai -2

    mainEntityOfPage establishes "this Article IS this URL", disambiguating between Article entities and the WebPage that contains them.

  • schema-article-no-publisher · classic -3 · ai -3

    Google Article docs require publisher (Organization).

  • schema-article-no-publisher-logo · classic -4 · ai -2

    Google's Article structured-data documentation explicitly requires publisher.logo for rich-result eligibility.

  • schema-article-no-wordcount · classic 0 · ai -2

    Article.wordCount lets AI assistants budget summary length appropriately: "summarise this 3000-word article in 200 words" vs "summarise this 300-word articl…

  • schema-conflicting-article-types · classic -2 · ai -3

    When a page declares MULTIPLE Article-family schemas (e.g.

  • schema-dangling-ref · classic -3 · ai -5

    schema.org documents @id as the canonical identity mechanism for entity references.

  • schema-datepublished-mismatch · classic -2 · ai -3

    Article datePublished and Open Graph article:published_time should describe the same publication date, because AI assistants consume both.

  • schema-empty-fields · classic -3 · ai -3

    Schemas shipping empty string values ("name": "", "description": "") are typically templated outputs where the upstream CMS didn't populate the slot.

  • schema-http-context · classic -1 · ai -1

    schema.org's canonical @context URL has been https://schema.org since 2017.

  • schema-id-not-uri · classic -2 · ai -3

    JSON-LD spec requires @id values to be absolute IRIs (typically URLs).

  • schema-incomplete-bonus · classic -3 · ai -4

    Google's documentation for each schema.org type lists recommended (non-required) properties that materially improve rich-result rendering and AI citation.

  • schema-inlanguage-mismatch · classic -2 · ai -2

    When both `<html lang="...">` and schema.inLanguage are declared, they should describe the same language.

  • schema-invalid-json-ld · classic -10 · ai -8

    Malformed JSON-LD is silently dropped by Google and every other consumer, so the schema effort is wasted entirely.

  • schema-no-breadcrumb-deep · classic -3 · ai -2

    BreadcrumbList replaces the URL with hierarchical context in Google's SERP and helps LLMs understand a page's position in the site taxonomy.

  • schema-no-id-on-organization · classic -1 · ai -3

    @id on Organization establishes a stable canonical identifier that LLMs can use to deduplicate entity references across pages and sites.

  • schema-no-inlanguage · classic -1 · ai -1

    inLanguage on Article / WebPage / Course / etc.

  • schema-no-organization · classic -3 · ai -7

    Organization (or LocalBusiness) is documented as the entity-identity schema Google uses for site names, logos, and Knowledge Panels.

  • schema-no-organization-logo · classic -3 · ai -2

    Organization.logo is what Google uses to populate the Knowledge Panel logo and what some AI assistants use for citation-branding thumbnails.

  • schema-no-speakable · classic 0 · ai 0

    The speakable property on Article schema marks summary sections eligible for Google Assistant voice answers.

  • schema-no-website-search-action · classic -4 · ai -2

    Google's sitelinks search box feature explicitly requires WebSite schema with a SearchAction potentialAction.

  • schema-none · classic -35 · ai -25

    Structured data is a documented rich-result and Knowledge Panel eligibility requirement.

  • schema-org-in-wikidata · classic 0 · ai 0

    Wikidata is the canonical knowledge graph backing Google Knowledge Panels and LLM "what is X" answers.

  • schema-org-not-in-wikidata · classic -1 · ai -3

    HEURISTIC magnitudes: Wikidata is one of the highest-trust sources LLMs use for entity resolution.

  • schema-org-sameas-thin · classic -1 · ai -3

    sameAs links to other authoritative platforms (LinkedIn, X, Wikipedia, Crunchbase, etc.) are the primary mechanism LLMs use to triangulate an organisation's …

  • schema-sameas-unreachable · classic -2 · ai -4

    schema.org defines sameAs as "URL of a reference Web page that unambiguously indicates the item's identity." Broken sameAs URLs invalidate that signal; Goog…

  • schema-shallow · classic -2 · ai -3

    There is no published guidance on "schema depth." Sites with layered schema (Organization + page-type + BreadcrumbList) tend to earn more rich results and AI…

  • schema-unlinked-author · classic -2 · ai -6

    Google Article guidance recommends author as Person/Organization with linkable identity.

/ HEAD + META SIGNALS

HEAD + meta signals · 11 checks

The 15+ HEAD-level signals Google ranks on and AI assistants read first.

  • canonical-chain-broken · classic -6 · ai -3

    Google documents that canonical chains should terminate cleanly: page A declaring canonical=B is only valid if B declares canonical=B (or back to A).

  • canonical-no-back-reference · classic -3 · ai -1

    When page A declares canonical=B and B has no canonical of its own, Google treats it as a soft signal; it usually still consolidates correctly, but the relation…

  • canonical-target-unreachable · classic -12 · ai -6

    Pointing canonical at a 404 URL tells search engines this page should not be indexed and that the destination doesn't exist either.

  • meta-description-length · classic -3 · ai -1

    The 140–160 character window is observational, not specified.

  • meta-multiple-descriptions · classic -3 · ai -1

    The HTML spec allows at most one meta description per document.

  • meta-no-canonical · classic -4 · ai -2

    rel=canonical is the documented mechanism for declaring preferred URLs.

  • meta-no-description · classic -10 · ai -5

    Search Central documentation states that Google may use the meta description as the SERP snippet.

  • meta-no-lang · classic -2 · ai -4

    WCAG 3.1.1 and the HTML5 spec both define <html lang>.

  • meta-no-title · classic -15 · ai -8

    Google documents the <title> as the primary signal for the SERP title link.

  • meta-no-viewport · classic -8 · ai -3

    Google has used mobile-first indexing since 2019.

  • meta-title-length · classic -4 · ai -1

    Google does not publish a hard title length.
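
A minimal sketch of canonical-chain resolution. It assumes the declared canonicals have already been fetched into a map (URL to declared canonical, or null when none is declared), and it treats a URL missing from the map as unreachable. The map, the result labels and `resolveCanonical` are hypothetical. Cycles are reported separately rather than scored.

```typescript
// Follow rel=canonical declarations and classify how the chain ends.

type ChainResult =
  | { status: "ok"; target: string }
  | { status: "chain"; path: string[] }
  | { status: "loop"; path: string[] }
  | { status: "unreachable"; target: string };

function resolveCanonical(start: string, canonicalOf: Map<string, string | null>): ChainResult {
  const path = [start];
  let current = start;
  for (;;) {
    if (!canonicalOf.has(current)) return { status: "unreachable", target: current };
    const next = canonicalOf.get(current) ?? null;
    if (next === null || next === current) {
      // Chain ends here: at most one hop (A -> B, B self-canonical) counts as clean.
      return path.length <= 2 ? { status: "ok", target: current } : { status: "chain", path };
    }
    if (path.includes(next)) return { status: "loop", path: [...path, next] };
    path.push(next);
    current = next;
  }
}
```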

/ OPEN GRAPH + SOCIAL

Open Graph + social · 8 checks

Card metadata, image reachability, dimension hints, brand-suffix-tolerant title consistency.

  • social-no-og · classic -5 · ai -3

    The Open Graph protocol is well-defined and consumed by Slack, LinkedIn, Facebook, and X.

  • social-no-og-site-name · classic -1 · ai -1

    og:site_name appears above the OG title on Facebook / LinkedIn card layouts and is the only spec-defined slot for the site brand on a per-page basis.

  • social-no-twitter · classic -1 · ai 0

    X falls back to og:* tags.

  • social-og-image-no-dimensions · classic -1 · ai 0

    The Open Graph spec defines og:image:width and og:image:height; Facebook and LinkedIn use them to lay out card previews before the image loads.

  • social-og-image-unreachable · classic -3 · ai -1

    og:image is the asset shown in every social-platform unfurl.

  • social-og-title-mismatch · classic -1 · ai -1

    When og:title and <title> differ substantively (not just brand-suffix variation), one of them is wrong.

  • social-partial-og · classic -2 · ai -1

    Lighter than fully missing: some social previews will still render.

  • social-twitter-no-image · classic -1 · ai 0

    A declared twitter:card without twitter:image causes X to fall back to og:image or, for summary_large_image, to render text-only without the headline visual.

/ AI CRAWLER ACCESS + LLMS.TXT

AI crawler access + llms.txt · 8 checks

robots.txt access for 15 named AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.), llms.txt validity, llms-full.txt detection, and crawler-blocking severity tiered by 'live citation' vs 'training only'.

  • crawler-blocked-critical · classic 0 · ai -3

    Blocking a critical (live-citation) crawler directly excludes the site from that AI service's real-time responses.

  • crawler-blocked-noncritical · classic 0 · ai -1

    Blocking a training-only crawler excludes the site from future model training.

  • llms-full-present · classic 0 · ai +5

    A small positive bonus for having both llms.txt and llms-full.txt, which signals investment in AI discoverability.

  • llms-txt-invalid · classic 0 · ai -5

    Half the penalty of missing entirely: a file that is present but malformed at least signals intent.

  • llms-txt-missing · classic 0 · ai -10

    llms.txt is an emerging community spec, not yet honored as a ranking factor by any AI provider in published docs.

  • robots-crawl-delay-long · classic -3 · ai -3

    A Crawl-delay above 30s starves crawler discovery.

  • robots-nofollow · classic -10 · ai -5

    Page-level nofollow means search engines do not follow this page's outbound links, which kills internal-link signal flow.

  • robots-noindex · classic -30 · ai -25

    noindex (in meta robots or X-Robots-Tag) explicitly tells search engines not to index this page.

/ CONTENT STRUCTURE + E-E-A-T

Content structure + E-E-A-T · 29 checks

Main-content extraction via Mozilla Readability so we don't count nav chrome. Heading hierarchy, H1 quality, alt-text, reading level, internal links, semantic HTML5, author + date for E-E-A-T.

  • content-alt-text · classic -4 · ai -2

    Google Images guidance + WCAG 1.1.1 (Non-text Content).

  • content-broken-hierarchy · classic -3 · ai -4

    WCAG 2.1 SC 1.3.1 (Info and Relationships).

  • content-empty-anchor · classic -2 · ai -2

    Anchors with no text, no aria-label, and no aria-labelledby have no accessible name.

  • content-external-no-noopener · classic -2 · ai 0

    target="_blank" without rel="noopener" allows window.opener access, a documented phishing and security risk.

  • content-generic-link-text · classic -3 · ai -3

    WCAG 2.4.4 (Link Purpose in Context) and Google's anchor-text guidance both prescribe descriptive link text.

  • content-h1-all-caps · classic -2 · ai -2

    WCAG guidance discourages all-caps headings (some screen readers spell them out letter by letter).

  • content-h1-duplicates-title · classic -1 · ai -2

    The title and H1 perform different roles (SERP label vs page topic).

  • content-h1-too-short · classic -3 · ai -3

    H1 is the documented primary topic signal.

  • content-js-rendered · classic 0 · ai -12

    Googlebot renders JS, but with documented lag and unreliability.

  • content-large-dom · classic -2 · ai -1

    Lighthouse warns at 1,500 elements; flags at 3,000.

  • content-moderate-length · classic -2 · ai -3

    The 300–600 word "moderate" tier is editorial judgment.

  • content-multiple-h1 · classic -3 · ai -2

    HTML5 technically permits multiple H1s when scoped to <section>.

  • content-no-h1 · classic -6 · ai -5

    Google’s Article guidance explicitly calls out the headline, and WCAG requires heading structure.

  • content-no-h2 · classic -3 · ai -5

    There is no specific Google guidance on H2 count, but section headings are a strong AI-chunking signal: LLMs cite sub-headed sections more reliably than wall-of-text…

  • content-no-html5-semantic · classic -3 · ai -3

    HTML5 semantic elements (article, section, aside, figure, main, nav, header, footer) carry document-outline information that Readability-style content extrac…

  • content-no-main-landmark · classic -1 · ai -3

    ARIA landmark roles.

  • content-no-noscript-fallback · classic -3 · ai -8

    AI crawlers (GPTBot, ClaudeBot, PerplexityBot per their docs) do not execute JavaScript.

  • content-no-skip-link · classic -1 · ai 0

    WCAG 2.4.1 (Bypass Blocks) requires a mechanism to skip repeated navigation.

  • content-orphan-page · classic -3 · ai -2

    Google explicitly documents internal links as a crawl + ranking signal.

  • content-reading-too-hard · classic -2 · ai -3

    Flesch-Kincaid Grade > 14 (college level) is harder than 85% of the web.

  • content-thin · classic -6 · ai -10

    The 100–300 word range is editorial judgment, not a Google threshold.

  • content-thin-extreme · classic -12 · ai -20

    Google’s spam policy lists "thin content" as a quality issue.

  • eeat-no-author · classic -2 · ai -6

    Google Search Quality Rater Guidelines (E-E-A-T section) explicitly call out author identity as a quality signal.

  • eeat-no-date · classic -1 · ai -5

    Google publication-dates documentation.

  • image-legacy-formats · classic -2 · ai 0

    WebP / AVIF reduce image bytes 25–50% vs JPEG/PNG, materially improving LCP, per Google's own optimisation docs.

  • image-no-dimensions · classic -3 · ai -1

    width/height attributes on <img> are the documented fix for image-driven Cumulative Layout Shift.

  • image-no-lazy-loading · classic -1 · ai 0

    Native loading="lazy" defers off-screen image loads, improving LCP and bandwidth.

  • image-no-srcset · classic -2 · ai 0

    srcset / <picture> let browsers download appropriately sized images per viewport, cutting mobile LCP and bandwidth.

  • pagination-no-rel-links · classic -3 · ai -1

    Google deprecated rel=next/prev in 2019 for treating paginated series as a single signal, but explicitly continues to use them for crawl discovery and Bing s…
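
The reading-level rule rests on the standard Flesch-Kincaid grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. This sketch applies it to raw counts. The vowel-group syllable counter and the sentence splitter are rough heuristics chosen for illustration, so scores on real text will drift from dictionary-based tools.

```typescript
// Flesch-Kincaid grade level from word, sentence and syllable counts.
function fleschKincaidGrade(words: number, sentences: number, syllables: number): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

// Heuristic: count vowel groups, ignoring a trailing silent "e"; minimum one syllable.
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.replace(/e$/, "").match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// Grade for a block of main-content text (e.g. after Readability extraction).
function gradeOf(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim() !== "").length;
  const words = text.split(/\s+/).filter((w) => /[a-z]/i.test(w));
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return fleschKincaidGrade(words.length, Math.max(1, sentences), syllables);
}
```

Under these assumptions, `content-reading-too-hard` would fire when `gradeOf(mainText) > 14`.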

/ PERFORMANCE + CORE WEB VITALS

Performance + Core Web Vitals · 14 checks

CrUX field data via PageSpeed Insights (the data Google actually ranks on). Synthetic checks for render-blocking resources, third-party scripts, and preconnect hints. Lighthouse category scores.

  • lighthouse-poor-accessibility · classic -3 · ai -2

    A Lighthouse accessibility score below 80 indicates WCAG violations.

  • lighthouse-poor-best-practices · classic -2 · ai 0

    A Lighthouse best-practices score below 80 flags issues such as console errors, deprecated APIs, and browser compatibility problems.

  • lighthouse-poor-seo · classic -4 · ai -2

    A Lighthouse SEO score below 80 means at least one of Google's own basic SEO checks failed.

  • perf-excessive-third-party-scripts · classic -2 · ai 0

    Each third-party script adds DNS lookup, connection, parse, and execution overhead.

  • perf-field-average · classic -4 · ai -1

    AVERAGE in CrUX means some users have a poor experience.

  • perf-field-no-data · classic 0 · ai 0

    NEUTRAL: CrUX requires a minimum sample size before publishing field data.

  • perf-field-slow · classic -10 · ai -3

    CrUX field data is what Google actually uses for the page-experience ranking signal.

  • perf-ni-cls · classic -2 · ai 0

    0.1 < CLS ≤ 0.25 is "needs improvement." Light advisory penalty.

  • perf-ni-lcp · classic -3 · ai -1

    2.5s < LCP ≤ 4s is "needs improvement" per Google.

  • perf-no-preconnect · classic -2 · ai 0

    <link rel="preconnect"> and rel="dns-prefetch" eliminate the connection-setup hop for third-party origins, knocking ~100-300ms off LCP for sites that load fo…

  • perf-poor-cls · classic -5 · ai -1

    Per Google's Core Web Vitals thresholds, CLS > 0.25 is a failure.

  • perf-poor-lcp · classic -8 · ai -2

    Google page-experience documentation lists LCP > 4s as a Core Web Vital failure that affects ranking.

  • perf-poor-tbt · classic -4 · ai -2

    TBT > 600ms in synthetic tests strongly predicts a poor INP score in field data.

  • perf-render-blocking · classic -2 · ai 0

    Scripts in <head> without async/defer, and stylesheets without a media query, block first paint until they fully load, which delays LCP.
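
The threshold rules above map directly onto rule IDs. This is a minimal sketch using only the cut-offs quoted in the rationales (LCP 2.5 s / 4 s, CLS 0.1 / 0.25, TBT 600 ms). The `LabMetrics` shape and `perfFindings` are hypothetical names.

```typescript
// Map lab metrics onto the "needs improvement" / "poor" rule IDs listed above.

interface LabMetrics {
  lcpSeconds: number;
  cls: number;
  tbtMs: number;
}

function perfFindings(m: LabMetrics): string[] {
  const findings: string[] = [];
  if (m.lcpSeconds > 4) findings.push("perf-poor-lcp");
  else if (m.lcpSeconds > 2.5) findings.push("perf-ni-lcp");
  if (m.cls > 0.25) findings.push("perf-poor-cls");
  else if (m.cls > 0.1) findings.push("perf-ni-cls");
  if (m.tbtMs > 600) findings.push("perf-poor-tbt");
  return findings;
}

console.log(perfFindings({ lcpSeconds: 3.1, cls: 0.3, tbtMs: 250 }));
// → [ 'perf-ni-lcp', 'perf-poor-cls' ]
```

The boundaries are inclusive on the "good" side: LCP of exactly 2.5 s or CLS of exactly 0.1 produces no finding.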

/ SECURITY + INFRASTRUCTURE

Security + infrastructure · 20 checks

TLS cert expiry + protocol version, security response headers, mixed http:// content, caching strategy, theme-color, PWA manifest, hreflang return-link symmetry.

  • asset-no-favicon · classic -2 · ai -1

    A favicon is the smallest visible quality signal; it appears in browser tabs, bookmarks, history, and SERP results.

  • asset-no-manifest · classic -1 · ai 0

    A web app manifest enables "Install" prompts and signals modern-web-app intent.

  • asset-no-rss-feed · classic 0 · ai -2

    RSS / Atom feeds are still the highest-throughput mechanism for AI ingestion pipelines (Anthropic's web search, Perplexity, content-tracking startups) to fol…

  • asset-no-theme-color · classic -1 · ai 0

    theme-color is rendered as the address-bar / toolbar colour on mobile browsers and as the splash background on installed PWAs.

  • cache-no-strategy · classic -2 · ai 0

    Without Cache-Control, ETag, or Last-Modified, every browser/CDN request re-downloads the full HTML: wasted bandwidth and slower repeat visits, both real C…

  • hreflang-duplicate · classic -3 · ai -1

    Duplicate hreflang declarations create ambiguity; Google's docs warn against them.

  • hreflang-invalid-code · classic -3 · ai -2

    Google requires BCP 47 codes; invalid codes are silently ignored.

  • hreflang-no-self · classic -4 · ai -2

    Google's i18n docs explicitly require that "each language version must list itself as well as all other language versions." HEURISTIC magnitudes.

  • hreflang-no-x-default · classic -2 · ai -1

    x-default is recommended (not required) for the fallback when no language matches the user.

  • http-no-compression · classic -3 · ai -1

    gzip/brotli typically shrink HTML+CSS+JS by 60-80%.

  • http-server-leaks · classic -1 · ai 0

    OWASP advises against revealing the server stack (Server: Apache/2.4.41, X-Powered-By: PHP/7.4.3, X-AspNet-Version: 4.0.30319); these directly feed targeted exp…

  • security-mixed-content · classic -8 · ai -4

    Modern browsers block mixed http:// active content (scripts, iframes) on https:// pages outright, and warn on passive content (images).

  • security-no-csp · classic -1 · ai 0

    CSP is the recommended defence against XSS.

  • security-no-hsts · classic -2 · ai 0

    HSTS prevents protocol-downgrade attacks and is required for HSTS preload list inclusion.

  • security-no-referrer-policy · classic -1 · ai 0

    An explicit Referrer-Policy controls what URL information leaks to third parties.

  • security-no-xcto · classic -1 · ai 0

    X-Content-Type-Options: nosniff blocks MIME-sniffing exploits.

  • tls-cert-expired · classic -25 · ai -20

    An expired certificate triggers a full-page browser warning, and crawlers refuse to fetch.

  • tls-cert-expiring-soon · classic -5 · ai -3

    Certificates expiring in under 30 days frequently slip auto-renewal.

  • tls-no-https · classic -20 · ai -15

    Google has used HTTPS as a ranking signal since 2014.

  • tls-old-protocol · classic -3 · ai -2

    RFC 8996 deprecated TLS 1.0 and 1.1 in 2021.
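
Return-link symmetry is a graph check across pages. This is a minimal sketch that assumes each crawled page's `<link rel="alternate" hreflang>` tags have already been parsed into a language-to-URL record. The input shape, messages and `hreflangProblems` are hypothetical.

```typescript
// Flag missing self-references and alternates that do not link back.

type HreflangMap = Map<string, Record<string, string>>; // page URL -> { hreflang: URL }

function hreflangProblems(alternates: HreflangMap): string[] {
  const problems: string[] = [];
  alternates.forEach((links, page) => {
    if (!Object.values(links).includes(page)) problems.push(`${page}: no self-referencing hreflang`);
    for (const [lang, target] of Object.entries(links)) {
      if (target === page) continue;
      const back = alternates.get(target);
      if (back === undefined) {
        problems.push(`${page}: ${lang} alternate ${target} was not crawled`);
      } else if (!Object.values(back).includes(page)) {
        problems.push(`${page}: ${lang} alternate ${target} does not link back`);
      }
    }
  });
  return problems;
}

const site: HreflangMap = new Map([
  ["https://example.com/en/", { en: "https://example.com/en/", fr: "https://example.com/fr/" }],
  ["https://example.com/fr/", { fr: "https://example.com/fr/" }],
]);
console.log(hreflangProblems(site));
// → [ 'https://example.com/en/: fr alternate https://example.com/fr/ does not link back' ]
```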

/ URL HYGIENE + ERRORS

URL hygiene + errors · 14 checks

Soft-404 probes, canonical round-trip validation, redirect chains, pagination markup, AMP awareness, tracking-param canonicals, mailto/tel hygiene, accessibility (axe-core violations).

  • a11y-critical-violations · classic -8 · ai -4

    axe-core "critical" violations are WCAG 2.1 AA failures that block users with disabilities entirely (missing form labels, keyboard traps, hidden controls).

  • a11y-moderate-violations · classic -2 · ai -1

    Moderate axe violations are quality-of-life issues (heading order, list semantics).

  • a11y-serious-violations · classic -4 · ai -3

    Serious violations significantly degrade UX for assistive-tech users (insufficient contrast, missing alt text on functional images).

  • amp-version-declared · classic 0 · ai 0

    Google deprioritised AMP as a ranking requirement in 2021.

  • error-404-as-200 · classic -15 · ai -8

    Returning HTTP 200 for nonexistent pages ("soft 404") is explicitly flagged by Google.

  • error-404-blank · classic -2 · ai 0

    A 404 with no useful body is a UX regression (no nav, no search, no recovery path).

  • mailto-placeholder · classic -3 · ai -2

    Placeholder emails (info@example.com, your-email@…) shipped to production are a recognised launch-day defect.

  • redirect-to-final · classic -2 · ai -1

    External links pointing at the pre-redirect URL waste a small amount of PageRank flow per hop.

  • site-broken-sitemap-urls · classic -5 · ai -3

    Google specifies that sitemaps should contain only canonical, indexable URLs.

  • site-deep-pages · classic -2 · ai -1

    Deeply nested pages accumulate less PageRank and crawl frequency.

  • site-orphan-pages · classic -4 · ai -3

    Orphan pages (no inbound internal links) are documented as harder to discover for Googlebot and AI crawlers.

  • site-stale-sitemap · classic -2 · ai -2

    A stale sitemap lastmod tells Google the page hasn't changed and can be deprioritised in re-crawl scheduling.

  • tel-malformed · classic -1 · ai 0

    RFC 3966 specifies the tel: URI format.

  • url-tracking-in-canonical · classic -8 · ai -3

    Canonical URLs must be the clean, indexable form.
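
A minimal sketch of the tracking-parameter check using the standard WHATWG `URL` API. The list of parameter names (utm_*, gclid, fbclid, msclkid) is an assumption covering common analytics tags, not the scanner's actual list.

```typescript
// Detect and strip tracking parameters from a canonical URL.

const TRACKING_PARAM = /^(utm_[a-z]+|gclid|fbclid|msclkid)$/i; // assumed list

function trackingParamsIn(canonical: string): string[] {
  const found: string[] = [];
  new URL(canonical).searchParams.forEach((_value, key) => {
    if (TRACKING_PARAM.test(key)) found.push(key);
  });
  return found;
}

// Suggested clean canonical: same URL with tracking parameters removed.
function cleanCanonical(canonical: string): string {
  const url = new URL(canonical);
  for (const key of trackingParamsIn(canonical)) url.searchParams.delete(key);
  return url.toString();
}

console.log(cleanCanonical("https://example.com/p?id=7&utm_source=news&gclid=abc"));
// → https://example.com/p?id=7
```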

/ SCHEMA TYPES

Schema types we recognise · 28 types

Beyond presence detection, each type has nested required-field validation, sector-aware deprecation rules, and citation-tracked status (active / deprecated / restricted). HowTo is deprecated for everyone; FAQPage is restricted to government + health sites since Aug 2023. Most validators don't track these dates.

  • AggregateRating · active

    Must be attached to Product, LocalBusiness, Recipe, etc. A standalone AggregateRating does not render.

  • Article · active

    Headline must be 110 chars or fewer. Image must be at least 1200px wide for Top Stories eligibility.

  • Book · active

    Google retired Book Actions rich results in 2023, but Book schema remains a strong AI-citation signal for bibliographic queries.

  • BreadcrumbList · active

    Renders as a breadcrumb trail under the SERP result.

  • ClaimReview · active

    Powers Google fact-check rich results and is a strong AI trust signal for news / journalism.

  • Course · active

    Renders the Course rich-results carousel and is increasingly used by AI assistants to recommend learning content.

  • Dataset · active

    Increasingly important for AI search: LLMs heavily cite Dataset schema for factual data.

  • EmployerAggregateRating · active

    Used inside JobPosting (itemReviewed → Organization) to surface employer-rating stars in Google Jobs.

  • Event · active

    Past events are automatically removed from rich results.

  • FAQPage · restricted

    Validates as correct markup, but Google only renders FAQ rich results for government and authoritative health sites.

  • FinancialProduct · active

    Sector-specific schema for financial products such as loans and mortgages.

  • HowTo · deprecated

    Google no longer renders HowTo rich results in any sector.

  • ImageObject · active

    Improves image search visibility. Required for some rich-result types (Recipe, Article) when image metadata matters for SERP rendering.

  • JobPosting · active

    Powers Google Jobs and is heavily cited by AI assistants for career queries. Must include validThrough to avoid stale listings being demoted.

  • LocalBusiness · active

    Knowledge Panel and map pack eligibility.

  • Movie · active

    Powers Google movie carousels and AI movie-information answers. Director is required for the rich result to render.

  • Organization · active

    Required for Knowledge Panel eligibility and AI assistant "who is this" answers.

  • Person · active

    Strong author-authority signal for AI citation.

  • PodcastEpisode · active

    Per-episode schema. partOfSeries links back to PodcastSeries via @id; this link is required for the entity graph to resolve.

  • PodcastSeries · active

    Series-level podcast metadata. Increasingly used by AI assistants to surface podcast recommendations.

  • Product · active

    Renders rich results if the required fields are present. Without offers.price and offers.priceCurrency, the price will not show in the SERP.

  • QAPage · restricted

    Renders only for community Q&A sites (Stack Overflow style).

  • Recipe · active

    Renders rich results with image, time, and rating.

  • Review · restricted

    Self-serving reviews (a business reviewing itself) no longer render.

  • Service · active

    Used by local-service businesses (plumbers, lawyers, consultants). Strong signal for AI "near me" and intent-based queries.

  • SoftwareApplication · active

    Renders app rich results in Google and is the canonical schema for app-store-style listings cited by AI.

  • SpecialAnnouncement · deprecated

    COVID-era schema, no longer rendered.

  • VideoObject · active

    Renders video carousel and video rich results.

/ AI CRAWLERS

AI crawlers we check robots.txt access for · 15 bots

We test each User-Agent against your robots.txt using a longest-match-wins parser that handles wildcards, Allow rules, and per-group Crawl-delay. Crawlers flagged as "critical" (live-citation bots such as Claude-Web, Perplexity-User, and OAI-SearchBot) get heavier AI penalties when blocked.

  • Amazonbot · training-only
  • Applebot-Extended · training-only

    Like Google-Extended, this is an opt-out token rather than a separate crawler. Applebot itself still crawls for Spotlight / Siri suggestions.

  • bingbot · live-citation

    Dual-purpose: bingbot powers both traditional Bing search and Microsoft Copilot AI. Microsoft had not introduced a Copilot-specific UA at the time of review. Blocking bingbot kills both surfaces simultaneously.

  • Bytespider · training-only

    Among the most widely blocked AI crawlers, with a reputation for aggressive crawl rates. Listed as non-critical because it does not gate live citation surfaces.

  • CCBot · training-only

    Blocking CCBot is high-leverage: Common Crawl is upstream of many smaller model providers who do not run their own crawl.

  • ChatGPT-User · live-citation
  • Claude-Web · live-citation

    Anthropic has used multiple browse-related UAs over time (anthropic-ai, Claude-Web, Claude-User, claude-searchbot). Re-verify on next review whether Claude-Web is still the live citation UA or whether a newer UA has supplanted it.

  • ClaudeBot · training-only
  • Google-Extended · live-citation

    Not a separate crawler: Google-Extended is a robots.txt user-agent token that Google honors as an opt-out from Gemini training and grounding. Googlebot still crawls regardless, and Google states the token does not affect inclusion in Search. Marking `critical` because it gates Gemini inclusion, an increasingly prominent AI surface.

  • GPTBot · training-only
  • Meta-ExternalAgent · training-only

    Introduced by Meta in 2024 as the AI-training-specific crawler, distinct from facebookexternalhit (which fetches link previews). Re-verify the docs URL on next review; Meta has moved this page before.

  • OAI-SearchBot · live-citation
  • Perplexity-User · live-citation
  • PerplexityBot · live-citation
  • xAI-Bot · live-citation

    xAI has not published a bot docs page as canonical as OpenAI/Anthropic. UA string xAI-Bot is what is observed in production logs and what dark-visitors community lists track. Re-verify against x.ai documentation on next review.
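
A simplified sketch of longest-match-wins evaluation in the style of RFC 9309. It picks the group naming the crawler (falling back to `*`), and the matching rule with the longest pattern wins, with Allow winning ties. It deliberately omits Crawl-delay, percent-encoding normalisation and partial user-agent matching. The function names are hypothetical, not the scanner's parser.

```typescript
interface Rule { allow: boolean; pattern: string }

// Parse robots.txt into user-agent (lower-cased) -> rules; repeated agents are merged.
function parseGroups(robots: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let inRules = false;
  for (const raw of robots.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (inRules) { agents = []; inRules = false; } // a new group starts
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      if (value === "") continue; // empty Disallow allows everything
      for (const a of agents) groups.get(a)!.push({ allow: field === "allow", pattern: value });
    }
  }
  return groups;
}

// "*" matches any run of characters; a trailing "$" anchors the end of the path.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + body + (anchored ? "$" : ""));
}

function isAllowed(robots: string, userAgent: string, path: string): boolean {
  const groups = parseGroups(robots);
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | null = null;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieAllow = best !== null && rule.pattern.length === best.pattern.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null || best.allow;
}
```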

/ TIERS

What's in each tier

Same scanner under every tier. The differences are scope (one page vs whole site) and cadence (one-time vs ongoing). Studio adds programmatic access.

Feature · Single (£2) · Deep Audit (£29) · Watcher (£5/mo) · Studio (£79/mo)

131 signal checks per page
Priority-grouped roadmap (today / week / quarter)
Copy-paste fix code per issue
Peer benchmarks vs sector
Pages audited: 1 (Single) · up to 10 (Deep Audit) · 1 weekly (Watcher) · unlimited (Studio)
Sitemap discovery + clustering
Site-wide common-issue detection · via API
Internal-link graph + orphan detection · via API
Hreflang return-link verification · via API
Sitemap freshness + URL health audit · via API
Per-page drill-down report · via API
Weekly re-scan · via API
Drift alerts (≥3pt or new critical only)
12-month score history dashboard
Free re-scans (7-day window) · always · always
Public REST API (/api/v1/scan)
API key with per-key audit trail
Rate-limit headers (X-RateLimit-*)
Render-mode opt-in (Playwright for SPAs) · always · opt-in
White-label PDFs · soon
Magic-link auth (no password)

/ OUT OF SCOPE

What we don't do

We're an AI-search-readiness scanner — not a general SEO platform. If you need the things below, you want a different tool. (We are NOT a Screaming Frog / Ahrefs / Semrush replacement.)

  • Keyword research. Try Ahrefs, Semrush, Google Keyword Planner.
  • Rank tracking. SERPWatcher, Ahrefs Rank Tracker, AccuRanker.
  • Backlink graphs / link building. Ahrefs Site Explorer, Majestic.
  • Competitor analysis. Semrush, SimilarWeb.
  • Full-site crawl beyond 10 pages. Screaming Frog desktop (£199/year unlimited).
  • Content writing. Clearscope, MarketMuse, Frase.

/ METHODOLOGY

Three things nobody else does

The only audit that measures whether ChatGPT, Claude, and Perplexity can cite your site. Every weight is cited and re-reviewed every 90 days. Every artefact below is a real file path in our methodology repo, which goes public in v1.1; from then on you can click through, read it, and fork it.

Cited weights

Every signal weight carries a source URL and EVIDENCE / HEURISTIC / CORPUS-CALIBRATED rationale. When a client asks "why is this -8?" you can answer in seconds. Nobody else shows their work.

signal-rules.ts · Public methodology repo lands in v1.1

Empirical calibration

A growing corpus of 50+ labelled real sites tells us which rules actually predict citation outcomes. Weights tune to the data over time — run `npm run corpus:stats` to reproduce the per-rule correlations.

corpus/ · Public methodology repo lands in v1.1

90-day audit cycle

Every rule has a lastReviewed date. `npm run audit:stale` fails CI when anything crosses 90 days unverified. Google quietly deprecates schemas; bot operators rename UAs — without an audit cycle, your tool rots. Ours can't.

audit-stale.ts + ci.yml · Public methodology repo lands in v1.1
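
A minimal sketch of a 90-day staleness gate in the spirit of `npm run audit:stale`. The rule shape, the review dates and the reporting are assumptions for illustration, because the real audit-stale.ts is not public yet.

```typescript
interface ReviewedRule {
  id: string;
  lastReviewed: string; // ISO date such as "2025-01-15"
}

const MAX_AGE_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// IDs of rules whose last review is more than MAX_AGE_DAYS before `today`.
function staleRules(rules: ReviewedRule[], today: Date): string[] {
  return rules
    .filter((r) => (today.getTime() - Date.parse(r.lastReviewed)) / DAY_MS > MAX_AGE_DAYS)
    .map((r) => r.id);
}

// Hypothetical review dates. A CI job would load the real rules and exit
// non-zero whenever this list is non-empty.
const overdue = staleRules(
  [
    { id: "schema-dangling-ref", lastReviewed: "2025-01-10" },
    { id: "perf-poor-lcp", lastReviewed: "2025-03-20" },
  ],
  new Date("2025-04-15T00:00:00Z"),
);
console.log(overdue); // → [ 'schema-dangling-ref' ]
```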

/ READY?

Run a free scan on your site

See the verdict in 30 seconds. No signup. Full report £2 — or check out Watcher / Deep Audit / Studio if you want more.

Scan a URL