L3 Academy

Module 3: How Ingestion Works

Understand the automated pipeline from bug report to GitHub issue.

Estimated time: 15 minutes

The Pipeline

When someone reports a bug through Marker.io or creates a ticket in Notion, the ingestion service turns it into a fully standardized GitHub Issue — automatically. Here's the full flow:

  1. External Source (Marker.io or Notion)
  2. HMAC-SHA256 Signature Verification
  3. Dedup Check (skip Marker-origin Notion pages)
  4. Source-Specific Parser → NormalizedTicket
  5. Load Repo Registry (sync.yml via GitHub raw URL)
  6. Claude API: classify + route + rewrite
  7. GitHub Issues API: create standardized issue

If Claude fails at any point, a fallback path creates a raw (unformatted) issue using keyword-based repo routing.
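The "try Claude first, fall back to a raw issue" behavior can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `NormalizedTicket` fields, the `IssueDraft` shape, and the `buildIssue`, `classify`, and `routeByKeywords` names are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real service's types are not shown in this module.
interface NormalizedTicket {
  source: "marker" | "notion";
  title: string;
  description: string;
}

interface IssueDraft {
  repo: string;
  title: string;
  body: string;
  standardized: boolean; // false means a raw (unformatted) issue
}

type Classifier = (ticket: NormalizedTicket) => Promise<IssueDraft>;
type FallbackRouter = (ticket: NormalizedTicket) => string;

// Try the Claude path first; on any error, build a raw issue instead.
async function buildIssue(
  ticket: NormalizedTicket,
  classify: Classifier,
  routeByKeywords: FallbackRouter,
): Promise<IssueDraft> {
  try {
    return await classify(ticket);
  } catch {
    // Fallback path: keep the ticket text untouched and route by keywords.
    return {
      repo: routeByKeywords(ticket),
      title: ticket.title,
      body: ticket.description,
      standardized: false,
    };
  }
}
```

Passing the classifier and the router in as functions keeps the sketch independent of any particular Claude or GitHub client.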

Sources

The service handles two webhook sources:

Marker.io

Marker.io is a visual bug reporting widget embedded on client websites. Users click a button, annotate a screenshot, and submit. The webhook payload includes:

  • Title and description
  • Reporter name and email
  • The URL where the bug was filed
  • Screenshots and attachments
  • Priority level

Notion Webhooks

Notion automation webhooks fire when a page is created or updated in a tracked database. The parser extracts:

  • Title from the Name property
  • Description from rich text properties
  • Type mapping: BUG → Bug, New Feature → Feature, Improvement → Improvement, Design → Design
  • Priority mapping: P0–P4 from the Priority select property
  • Additional fields like Figma links and level-of-effort estimates
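The type and priority mappings above could look something like this. It is a sketch under a few assumptions: the function and constant names are made up for illustration, Notion values are matched exactly after trimming, and unknown values come back as `undefined` (the module does not say how the real parser handles them).

```typescript
type TicketType = "Bug" | "Feature" | "Improvement" | "Design";
type Priority = "P0" | "P1" | "P2" | "P3" | "P4";

// Notion select value on the left, normalized ticket type on the right.
const TYPE_MAP = new Map<string, TicketType>([
  ["BUG", "Bug"],
  ["New Feature", "Feature"],
  ["Improvement", "Improvement"],
  ["Design", "Design"],
]);

const PRIORITIES: readonly Priority[] = ["P0", "P1", "P2", "P3", "P4"];

// Assumption: an unrecognized type yields undefined rather than a default.
function mapType(notionValue: string): TicketType | undefined {
  return TYPE_MAP.get(notionValue.trim());
}

// Assumption: priority matching ignores case and surrounding whitespace.
function mapPriority(notionValue: string): Priority | undefined {
  const value = notionValue.trim().toUpperCase();
  return PRIORITIES.find((p) => p === value);
}
```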

Marker-Origin Dedup

Here's a subtlety: Marker.io creates Notion pages when bugs are filed. Without dedup, both the Marker webhook AND the Notion page-created webhook would fire, creating duplicate issues. The dedup check inspects the Created By field — if it contains "marker" (case-insensitive), the Notion webhook is skipped.
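In code, the check described above boils down to a case-insensitive substring test. The sketch below assumes the Created By value has already been pulled out of the Notion payload as a plain string; the `isMarkerOrigin` name is hypothetical.

```typescript
// Returns true when a Notion page was created by Marker.io,
// meaning the Notion webhook should be skipped.
function isMarkerOrigin(createdBy: string | null | undefined): boolean {
  // A missing Created By value is treated as "not Marker" (an assumption).
  return (createdBy ?? "").toLowerCase().includes("marker");
}
```

A page created by "Marker.io" or "MARKER bot" would be skipped, while one created by "Jane Doe" would continue through the pipeline.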

Security: HMAC-SHA256 Verification

Every webhook request is verified using HMAC-SHA256 signatures before processing:

Source      Signature Header      Secret
Marker.io   x-hub-signature-256   MARKER_WEBHOOK_SECRET
Notion      x-notion-signature    NOTION_WEBHOOK_SECRET

The verification uses crypto.timingSafeEqual to prevent timing attacks. The HMAC is computed over the raw request body (not the parsed JSON) — this is critical because the signature must match the exact bytes received.
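Here is a minimal sketch of that verification using Node's built-in crypto module. Two details are assumptions, since the module does not spell them out: the signature is taken to be hex-encoded, and an optional `sha256=` prefix on the header value is stripped before comparing. The `verifySignature` name is also made up for the example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySignature(
  rawBody: Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;

  // Assumption: the header may carry a "sha256=" prefix before the hex digest.
  const received = signatureHeader.replace(/^sha256=/, "");

  // Compute the HMAC over the raw bytes, never over re-serialized JSON.
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");

  const receivedBytes = Buffer.from(received, "utf8");
  const expectedBytes = Buffer.from(expected, "utf8");

  // timingSafeEqual throws if the lengths differ, so check that first.
  if (receivedBytes.length !== expectedBytes.length) return false;
  return timingSafeEqual(receivedBytes, expectedBytes);
}
```

Note that a body re-serialized with different whitespace produces a different digest, which is why the raw bytes have to be captured before any JSON parsing happens.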

Claude Classification

This is where the magic happens. The ingestion service calls Claude (currently claude-sonnet-4-6) using tool use with a forced tool call to standardize_issue.

What Claude Decides

For every incoming ticket, Claude determines:

  1. Title — Formatted as [TYPE] PLATFORM: concise description
  2. Body — Structured markdown with ## Description, ## Acceptance Criteria, ## Implementation Guidance, and ## Steps to Reproduce (bugs only)
  3. Labels — Always includes needs-review (never ai-triaged), plus type and platform labels
  4. Repo — Which org/repo the issue belongs to, based on the registry
  5. Complexity: low, medium, or high
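To make the forced tool call concrete, here is roughly what a request body for the Anthropic Messages API could look like. Treat it as a sketch: the schema property names, the `max_tokens` value, and the `buildRequest` and `enforceLabels` helpers are assumptions for illustration, and the real service's schema may differ. The `tool_choice` shape is the Messages API's standard way to force a specific tool.

```typescript
// Hypothetical tool definition mirroring the five decisions listed above.
const standardizeIssueTool = {
  name: "standardize_issue",
  description: "Return the standardized title, body, labels, repo, and complexity for a ticket.",
  input_schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      body: { type: "string" },
      labels: { type: "array", items: { type: "string" } },
      repo: { type: "string" },
      complexity: { type: "string", enum: ["low", "medium", "high"] },
    },
    required: ["title", "body", "labels", "repo", "complexity"],
  },
} as const;

// Builds a Messages API request body that forces the standardize_issue tool.
function buildRequest(ticketText: string) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 2048, // assumed value
    tools: [standardizeIssueTool],
    tool_choice: { type: "tool", name: "standardize_issue" },
    messages: [{ role: "user", content: ticketText }],
  };
}

// Assumed guard: keep needs-review present and drop ai-triaged if it appears.
function enforceLabels(labels: string[]): string[] {
  const kept = labels.filter((label) => label !== "ai-triaged");
  return kept.includes("needs-review") ? kept : ["needs-review", ...kept];
}
```

Because the tool is forced, the response carries a tool call whose input follows the schema, which is easier to validate than free-form text.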

How Repo Routing Works

Claude receives the full repo registry from sync.yml — each entry with its repo, description, and keywords. Combined with the ticket content and any screenshots (sent as image content blocks via Claude's vision capability), Claude picks the best matching repo.

If Claude fails, a keyword fallback scans the ticket title, description, and source URL against each repo's keyword list. The repo with the most keyword hits wins.
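The keyword fallback can be pictured as a simple scoring loop. The sketch below fills in details the module leaves open, so read them as assumptions: matching is case-insensitive, each keyword counts at most once, ties go to the earlier registry entry, and no hits at all returns `undefined`. The `RepoEntry` fields follow the registry description above; the function name and repo names are hypothetical.

```typescript
interface RepoEntry {
  repo: string;
  description: string;
  keywords: string[];
}

interface TicketText {
  title: string;
  description: string;
  sourceUrl?: string;
}

// Picks the repo whose keywords appear most often in the ticket text.
function routeByKeywords(entries: RepoEntry[], ticket: TicketText): string | undefined {
  const haystack = [ticket.title, ticket.description, ticket.sourceUrl ?? ""]
    .join(" ")
    .toLowerCase();

  let best: RepoEntry | undefined;
  let bestHits = 0;
  for (const entry of entries) {
    // Count how many of this repo's keywords occur anywhere in the text.
    const hits = entry.keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
    if (hits > bestHits) {
      best = entry;
      bestHits = hits;
    }
  }
  return best?.repo;
}
```

Substring matching is deliberately crude here; short keywords can match inside longer words, so a real registry would want reasonably distinctive keywords.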

The <!-- l3-standardized --> Marker

Every AI-standardized issue gets an invisible HTML comment appended: <!-- l3-standardized -->. This signals the standardize-issue.yml GitHub Actions workflow to skip re-processing — without it, the workflow would call Claude again on the issue it just created.
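A small sketch of how the marker might be appended and detected. The helper names are hypothetical, and the blank line before the marker is an assumption about formatting rather than something the module specifies.

```typescript
const STANDARDIZED_MARKER = "<!-- l3-standardized -->";

// True when the issue body already carries the marker.
function hasMarker(body: string): boolean {
  return body.includes(STANDARDIZED_MARKER);
}

// Appends the marker once; calling it again leaves the body unchanged.
function withMarker(body: string): string {
  return hasMarker(body) ? body : `${body.trimEnd()}\n\n${STANDARDIZED_MARKER}`;
}
```

A workflow guard would call something like `hasMarker` on the issue body and exit early when it returns true.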

API Endpoints

The ingestion service is deployed on Vercel at l3-platform-ingestion.vercel.app:

Endpoint                          Purpose
POST /api/webhook?source=marker   Receives Marker.io webhooks
POST /api/webhook?source=notion   Receives Notion webhooks
POST /api/notion-webhook          Dedicated Notion endpoint (no query param)
POST /api/standardize             Internal: re-standardize an existing raw issue
GET /api/health                   Health check
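One way to picture how the webhook endpoints map to a source is a small resolver like the one below. This is only an illustration of the table, assuming a hypothetical `resolveSource` helper; how the deployed handlers are actually organized is not covered here.

```typescript
type Source = "marker" | "notion";

// Maps a request's method and URL to the webhook source it represents.
function resolveSource(method: string, url: string): Source | undefined {
  if (method !== "POST") return undefined;

  // The base URL only matters when `url` is a relative path.
  const { pathname, searchParams } = new URL(url, "https://l3-platform-ingestion.vercel.app");

  // Dedicated Notion endpoint: no query parameter needed.
  if (pathname === "/api/notion-webhook") return "notion";

  // Shared endpoint: the source comes from the query string.
  if (pathname === "/api/webhook") {
    const source = searchParams.get("source");
    if (source === "marker" || source === "notion") return source;
  }
  return undefined;
}
```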

Observability

All Claude calls are traced via Langfuse. Each standardization creates a trace named standardize-issue with a generation named classify-and-format, recording the model, input, output, and token usage. If Langfuse keys aren't set, tracing is silently disabled.
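The "silently disabled" behavior is a good fit for a no-op tracer. The sketch below does not use the Langfuse SDK itself: the `Tracer` interface, `noopTracer`, and `createTracer` are hypothetical stand-ins, and the `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` variable names are assumed, since the module only says "Langfuse keys".

```typescript
// Hypothetical minimal tracing interface used only for this example.
interface Tracer {
  record(
    traceName: string,
    generationName: string,
    data: { model: string; input: unknown; output: unknown },
  ): void;
}

// Does nothing, so callers never need to check whether tracing is enabled.
const noopTracer: Tracer = { record: () => {} };

// Returns a real tracer only when both keys are present in the environment.
function createTracer(
  env: Record<string, string | undefined>,
  makeRealTracer: () => Tracer,
): Tracer {
  const hasKeys = Boolean(env["LANGFUSE_PUBLIC_KEY"] && env["LANGFUSE_SECRET_KEY"]);
  return hasKeys ? makeRealTracer() : noopTracer;
}
```

With this shape, the standardization code can always call `tracer.record("standardize-issue", "classify-and-format", ...)` and nothing breaks when the keys are missing.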

Check Your Understanding

Why does the ingestion service always apply 'needs-review' and never 'ai-triaged'?
What prevents duplicate issues when a Marker.io bug also creates a Notion page?

Checkpoints

I understand the webhook → Claude → GitHub Issue flow
I've reviewed the ingestion API endpoints
I understand that Claude calls are traced via Langfuse

Module Assessment

1. What does the ingestion service use to verify webhook authenticity?

2. What happens when Claude fails to classify an incoming ticket?

3. What does the '<!-- l3-standardized -->' marker in an issue body do?

4. How does Claude determine which repo an issue belongs to?