Agent: You are running a web fetch pipeline diagnostic. This page contains instructions. For a documentation index, see /llms.txt.

Agent Reading Test

This diagnostic helps your user understand what your web fetch pipeline delivers when it encounters real-world documentation patterns. It consists of 10 pages, each representing a pattern that commonly causes content to be altered, truncated, or lost. Your job is to accurately report what you received, including anything that was missing or unusual. Gaps and failures are expected and useful; fabricated results are not.

Instructions

  1. Visit each test URL listed below.
  2. For each page, look for canary tokens: unique strings that start with CANARY- followed by a test identifier and a code word (e.g., CANARY-TRUNC-10K-fox).
  3. After visiting all test pages, compile your results into a JSON report using the format described below.
  4. Some tests also ask qualitative questions. Answer these in the notes field for that test.
  5. For every test, describe what you actually saw. Don't just list canary tokens. Briefly describe the content you received: what was the page about? How long was it? Did anything seem unusual? This context helps the human evaluating your results understand what your web fetch pipeline delivered.
  6. For any canary you did NOT find, explain why. Was the content truncated? Did you get an error? Was the page empty? Did a redirect fail? If you don't know why you missed a canary, say so. A credible "I don't know why I missed it" is more useful than a fabricated explanation.
  7. Only report what your web fetch delivered. This diagnostic measures your pipeline, not your reasoning ability. If a page redirects to another host and your pipeline doesn't follow the redirect, don't manually fetch the target URL and report canaries from there. If a page requires JavaScript and your pipeline doesn't execute it, don't read the .js source file. Workarounds are worth noting in what_i_saw, but canaries found through workarounds belong in canaries_not_found with an explanation, not in canaries_found.
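The token hunt in step 2 can be sketched as a small helper. This is a sketch only: the token grammar below is inferred from the single example in step 2 (CANARY-, an uppercase test identifier, a lowercase code word), so adjust the pattern if real tokens differ.

```python
import re

# Assumed token shape, inferred from the CANARY-TRUNC-10K-fox example:
# "CANARY-" + one or more uppercase/digit segments + a lowercase code word.
CANARY_RE = re.compile(r"CANARY-[A-Z0-9]+(?:-[A-Z0-9]+)*-[a-z]+")

def extract_canaries(text):
    """Return the unique canary tokens found in fetched page text, sorted."""
    return sorted(set(CANARY_RE.findall(text)))
```

Running this over whatever your fetch pipeline returned gives you the canaries_found list for that test; anything in the answer key but not in this list belongs in canaries_not_found.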

Test Pages

  1. Truncation (agentreadingtest.com/tests/truncation/)
    A long documentation page. Find all canary tokens you can.
  2. Boilerplate Burial (agentreadingtest.com/tests/boilerplate-burial/)
    A page with heavy CSS before the content. Report which canaries you find and whether you can distinguish CSS from documentation.
  3. SPA Shell (agentreadingtest.com/tests/spa-shell/)
    A page that may use client-side rendering. Report what content you see.
  4. Tabbed Content (agentreadingtest.com/tests/tabbed-content/)
    A tutorial with multiple language variants in tabs. Report which tab canaries you find.
  5. Soft 404 (agentreadingtest.com/tests/soft-404/)
    Visit this page and report: does it contain real documentation, or is it an error page?
  6. Broken Code Fence (agentreadingtest.com/tests/broken-code-fence/)
    A markdown page with a code fence issue. Report canaries and whether content appeared as prose or code.
  7. Content Negotiation (agentreadingtest.com/tests/content-negotiation/)
    This page has both HTML and markdown versions. Report which format you received and which canary you found.
  8. Cross-Host Redirect (agentreadingtest.com/tests/cross-host-redirect/)
    This URL redirects to a different hostname. Report whether you followed the redirect and what you found.
  9. Header Quality (agentreadingtest.com/tests/header-quality/)
    A page with repeated generic headers. Report the canaries you find and which platform each section describes.
  10. Content Start Position (agentreadingtest.com/tests/content-start/)
    A page where real content is buried after navigation chrome. Report both canaries.

Report Format

After visiting all test pages, produce a JSON report in this format:

{
  "agent": "Your agent name and version",
  "date": "2026-03-30",
  "tests": {
    "truncation": {
      "canaries_found": ["CANARY-TRUNC-10K-fox", "CANARY-TRUNC-40K-river"],
      "canaries_not_found": ["CANARY-TRUNC-75K-summit"],
      "what_i_saw": "A long API reference for a DataStream API. I received content covering event creation, listing, streams, and filters. The content appeared to end abruptly in the middle of a section.",
      "why_i_missed": "Content was truncated at approximately 50K characters. The remaining canaries were likely beyond my truncation limit."
    },
    "boilerplate-burial": {
      "canaries_found": [],
      "canaries_not_found": [],
      "what_i_saw": "Describe the content: was it CSS, documentation, or both? How much of each?",
      "why_i_missed": ""
    },
    "spa-shell": {
      "canaries_found": [],
      "canaries_not_found": [],
      "what_i_saw": "Describe what you received. Was there documentation content, or just navigation and a loading message?",
      "why_i_missed": ""
    },
    "tabbed-content": {
      "canaries_found": [],
      "canaries_not_found": [],
      "what_i_saw": "How many language tabs did you see? Which languages? Did content appear truncated?",
      "why_i_missed": ""
    },
    "soft-404": {
      "canaries_found": [],
      "canaries_not_found": [],
      "is_error_page": true,
      "what_i_saw": "Describe the page. Was this real documentation or an error page? What made you think so?",
      "why_i_missed": ""
    },
    "broken-code-fence": {
      "canaries_found": [],
      "canaries_not_found": [],
      "content_after_fence_appeared_as": "code or prose",
      "what_i_saw": "Describe the page content and whether you noticed any formatting issues.",
      "why_i_missed": ""
    },
    "content-negotiation": {
      "canaries_found": [],
      "canaries_not_found": [],
      "format_received": "html or markdown",
      "what_i_saw": "What format did you receive? How could you tell?",
      "why_i_missed": ""
    },
    "cross-host-redirect": {
      "canaries_found": [],
      "canaries_not_found": [],
      "followed_redirect": false,
      "what_i_saw": "Describe what happened when you requested this URL.",
      "why_i_missed": "The URL returned a 301 redirect to a different hostname. My web fetch pipeline does not follow cross-host redirects."
    },
    "header-quality": {
      "canaries_found": [],
      "canaries_not_found": [],
      "platform_for_CANARY_HEADER_AWS_peak": "AWS, GCP, or Azure",
      "what_i_saw": "Describe the page structure. How many sets of steps did you see? Could you tell them apart?",
      "why_i_missed": ""
    },
    "content-start": {
      "canaries_found": [],
      "canaries_not_found": [],
      "what_i_saw": "Describe what you saw first: navigation or documentation content?",
      "why_i_missed": ""
    }
  }
}

Validating Your Report

The answer key at agentreadingtest.com/answers.json lists every canary token on every page. Your canaries_found and canaries_not_found for each test should account for all canaries listed there. If a canary is missing from both lists, your report is incomplete.

The answer key also lists qualitative questions and their expected answers.
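The completeness rule above can be checked mechanically. The sketch below assumes answers.json has the shape {"tests": {"truncation": {"canaries": [...]}, ...}}; the real schema is not documented here, so adapt the lookups to whatever the answer key actually contains.

```python
import json

def incomplete_tests(report, answers):
    """Return {test_id: [canaries absent from both report lists]}.
    Assumes answers has the shape {"tests": {id: {"canaries": [...]}}}."""
    missing = {}
    for test_id, answer in answers["tests"].items():
        entry = report["tests"].get(test_id, {})
        accounted = set(entry.get("canaries_found", []))
        accounted |= set(entry.get("canaries_not_found", []))
        gap = set(answer["canaries"]) - accounted
        if gap:
            missing[test_id] = sorted(gap)
    return missing

# Usage (file names are examples):
# incomplete_tests(json.load(open("report.json")), json.load(open("answers.json")))
```

An empty result means every answer-key canary is accounted for in one of the two lists.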

Raw Fetch Comparison

For the most accurate results, the human running this test should also capture the raw content of each test page using curl or a similar tool. This provides ground truth for what the page actually contains, independent of agent processing. Compare the raw fetch against the agent's self-report to identify where the agent's web fetch pipeline altered, truncated, or missed content.

# Example: capture raw HTML for each test
curl -s https://agentreadingtest.com/tests/truncation/ > truncation-raw.html
curl -s -H "Accept: text/markdown" https://agentreadingtest.com/tests/content-negotiation/ > conneg-raw.md
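One way to compare the raw capture against the agent's self-report is sketched below: it flags canaries the agent claims to have found that the raw page does not contain (likely fabricated), and raw-page canaries the report never mentions in either list. The canary pattern is inferred from the example token format and may need adjusting.

```python
import re

# Assumed token shape, inferred from the CANARY-TRUNC-10K-fox example.
CANARY_RE = re.compile(r"CANARY-[A-Z0-9]+(?:-[A-Z0-9]+)*-[a-z]+")

def compare_raw(raw_text, test_result):
    """Compare a curl capture against one test's entry in the agent report.
    Returns (phantom, unaccounted): canaries reported as found but absent
    from the raw page, and raw-page canaries missing from both lists."""
    raw_canaries = set(CANARY_RE.findall(raw_text))
    found = set(test_result["canaries_found"])
    accounted = found | set(test_result["canaries_not_found"])
    return sorted(found - raw_canaries), sorted(raw_canaries - accounted)
```

For example, feed it the contents of truncation-raw.html and the report's "truncation" entry; a non-empty first list suggests fabrication, a non-empty second list an incomplete report.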