Start here
AI answer engines don't read your site the way Google does. They prefer clean structure, explicit facts, and predictable markup. Here's the checklist we run through on every client site.
1. Semantic HTML
Use real headings (<h1> → <h2> → <h3>), real lists (<ul>, <ol>), and real blockquotes (<blockquote>). Don't fake them with styled divs.
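A minimal sketch of the difference (class names and copy are illustrative):

```html
<!-- Faked structure: just styled divs, meaningless to a parser -->
<div class="heading-2">How pricing works</div>
<div class="list">
  <div class="item">Free tier</div>
  <div class="item">Pro tier</div>
</div>

<!-- Real structure: unambiguous to any crawler -->
<h2>How pricing works</h2>
<ul>
  <li>Free tier</li>
  <li>Pro tier</li>
</ul>
```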
2. Structured data
At minimum: Organization, WebSite with SearchAction, and Article on blog posts. Add FAQPage anywhere you have a Q&A section.
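A minimal JSON-LD sketch covering the Organization and WebSite-with-SearchAction pieces (all names and URLs are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "name": "Example Co",
      "url": "https://example.com/"
    },
    {
      "@type": "WebSite",
      "url": "https://example.com/",
      "publisher": { "@id": "https://example.com/#org" },
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://example.com/search?q={search_term_string}",
        "query-input": "required name=search_term_string"
      }
    }
  ]
}
</script>
```

Article and FAQPage follow the same pattern, one script block per page type.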
3. Concise answer paragraphs
Under every H2 that poses a question, give a direct one-sentence answer in the first paragraph; AI engines often quote these passages verbatim.
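The pattern looks like this (the question and copy are made up for illustration):

```html
<h2>How long does onboarding take?</h2>
<p>Onboarding takes about two weeks for most teams.</p>
<p>The exact timeline depends on how many integrations you need,
   whether you migrate historical data, and so on.</p>
```

The one-sentence answer stands alone; the nuance comes after it.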
4. llms.txt
A markdown file at /llms.txt that describes your site to AI crawlers — your sitemap for bots that aren't Googlebot.
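A short sketch following the llms.txt proposal's conventions (an H1 for the site name, a blockquote summary, then sections of annotated links — everything below is placeholder content):

```markdown
# Example Co

> Example Co builds billing infrastructure for SaaS teams.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Set up in ten minutes
- [API reference](https://example.com/docs/api): Full endpoint list

## Blog

- [Pricing guide](https://example.com/blog/pricing): How our plans work
```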
5. Allow AI crawlers
In robots.txt, explicitly allow the major AI user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, and CCBot.
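In robots.txt that looks like this (the sitemap URL is a placeholder):

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```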
6. Author attribution
Real names, real bios, Person schema. AI engines prefer citable sources with credibility signals.
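A minimal Person schema sketch, embedded as the author of an Article (name, title, and URLs are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Pricing guide",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Head of Engineering",
    "url": "https://example.com/authors/jane-doe",
    "sameAs": ["https://www.linkedin.com/in/janedoe"]
  }
}
</script>
```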
7. Canonical URLs
One canonical URL per page, declared with rel="canonical". No duplicates, no session-ID query strings.
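Declared in the head of every page (URL is a placeholder):

```html
<link rel="canonical" href="https://example.com/blog/pricing-guide" />
```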
8. Fast, static HTML
AI crawlers often don't execute JavaScript. If your content is only visible after hydration, it's invisible to them.
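The failure mode is the empty-shell pattern many SPAs serve; a server-rendered page puts the same content in the initial HTML (copy below is illustrative):

```html
<!-- JS-only SPA: a non-executing crawler sees an empty page -->
<body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body>

<!-- Server-rendered: the content is in the raw response itself -->
<body>
  <div id="root">
    <h1>Pricing guide</h1>
    <p>Our plans start at $10/month.</p>
  </div>
  <script src="/bundle.js"></script>
</body>
```

A quick check: view the raw page source (not the inspector's live DOM) and confirm your content is actually there.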
9. Speakable schema
Add Speakable schema to your summary content so voice assistants know which passages to read aloud.
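A sketch using CSS selectors to mark the speakable sections (the selectors and URL are placeholders for your own markup):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "url": "https://example.com/blog/pricing-guide",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["h1", ".article-summary"]
  }
}
</script>
```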
10. Fresh sitemap
Regenerate sitemap.xml on every deploy, and include an accurate lastmod timestamp for each URL.
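A minimal sitemap entry (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/pricing-guide</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```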
That's the foundation. We'll drill into each one in future posts.
