Work in Progress: This documentation is being actively developed. More content will be added soon!

If you run into any issues, please email rasul@readwise.io

Classifier Pipeline

Overview

The classifier pipeline automatically triages every message posted to #dev-help. It extracts structured metadata with a mechanical parser (no LLM), decides whether the message warrants classification, runs an LLM agent to determine category/response depth/enrichments, and persists the merged result to the database — all asynchronously so the Slack webhook returns immediately.

Pipeline

When a message hits the /slack/events endpoint (apps/api/views.py):

  1. Webhook receives event — Slack signature is verified, then event_callback payloads for the #dev-help channel (C05ATKFA0HW) are dispatched via django-q2 async_task. Top-level messages go to classify_message; thread replies go to reclassify_thread.
  2. Mechanical parser — parser.parse_message() extracts structured fields (priority, email, platform, doc type, message type) using regex and keyword matching. No LLM involved.
  3. Filter decision — The parser sets should_classify = False for report commands (!report), triage summaries, and button-prompt messages. If filtered, the task exits early.
  4. LLM classifier — run_classifier() invokes a PydanticAI agent that returns a ClassifierOutput: category, symptoms, response depth, and suggested enrichments.
  5. Merge and persist — Mechanical fields and LLM output are merged into a TriageClassification, then saved via SlackClassification.objects.update_or_create keyed on thread_ts.
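
The routing in step 1 can be sketched in plain Python. The channel ID and task names come from the source; the helper function and event fields are simplified assumptions, and the real view verifies the Slack signature and hands the chosen task to django-q2's async_task:

```python
# Hypothetical sketch of the dispatch in apps/api/views.py.
DEV_HELP_CHANNEL = "C05ATKFA0HW"

def route_event(event):
    """Pick the async task for a Slack message event.

    In the real view the chosen task is dispatched via django-q2's
    async_task() so the webhook can return immediately.
    """
    if event.get("channel") != DEV_HELP_CHANNEL:
        return "ignore"
    # Thread replies carry a thread_ts that differs from their own ts.
    if event.get("thread_ts") and event["thread_ts"] != event.get("ts"):
        return "reclassify_thread"
    return "classify_message"
```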

Thread replies follow a variant path:

  1. _is_substantive_reply() checks length (>20 chars) and filters acknowledgement patterns ("thanks", "on it", emoji-only replies).
  2. If substantive, the original message text is concatenated with the reply to form thread_context.
  3. The classifier re-runs with thread context. Mechanical fields from the reply are merged with existing values (reply fields take precedence when present, otherwise the original values are preserved).
  4. The existing SlackClassification record is updated in place; reclassified_at is set.

Mechanical Parser

apps/core/classifier/parser.py — pure Python, no LLM. Runs on every message before the filter decision.

Extracted fields:

  • priority (reactions first, then text regex): Looks for :parking: + number emoji reactions (:one: through :five:), then p1-p5 in a *Priority:* field, then anywhere in the text
  • user_email (regex): <mailto:...> links first, then bare email addresses
  • platform (keyword detection): First match wins. See parser.py:_PLATFORM_RULES for the full list (iOS, Android, Boox, extension, web)
  • doc_type (keyword detection): First match wins. See parser.py:_DOC_TYPE_RULES (PDF, EPUB, video, podcast, email, RSS, article)
  • is_bot_format (regex): Detects HelpScout-bot-formatted messages (:fast_forward: New *Issue* / New *Question*)
  • message_type (text analysis): "issue", "question" (bot-formatted), "report_command" (!report prefix), "triage_summary", or "other"
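
The priority and email fallbacks can be sketched as follows. The regexes are illustrative approximations; the real parser checks :parking: + number emoji reactions before any text regex, which is omitted here:

```python
import re

_PRIORITY_FIELD = re.compile(r"\*Priority:\*\s*p([1-5])", re.IGNORECASE)
_PRIORITY_ANYWHERE = re.compile(r"\bp([1-5])\b", re.IGNORECASE)
_MAILTO = re.compile(r"<mailto:([^|>]+)[|>]")
_BARE_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_priority(text):
    # Structured *Priority:* field first, then a bare p1-p5 anywhere.
    match = _PRIORITY_FIELD.search(text) or _PRIORITY_ANYWHERE.search(text)
    return int(match.group(1)) if match else None

def extract_email(text):
    # Slack mailto links first, then bare addresses.
    match = _MAILTO.search(text)
    if match:
        return match.group(1)
    match = _BARE_EMAIL.search(text)
    return match.group(0) if match else None
```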

Filter rules (should_classify = False when any match):

  • message_type is report_command or triage_summary
  • Message text contains "Use the button below" (button-prompt messages)
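
The filter decision reduces to a small predicate; a sketch, assuming the parser exposes message_type and the raw text:

```python
def should_classify(message_type, text):
    """Mirror of the filter rules above (sketch)."""
    if message_type in ("report_command", "triage_summary"):
        return False
    if "Use the button below" in text:  # button-prompt messages
        return False
    return True
```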

Classifier Agent

apps/core/agents/classifier.py — PydanticAI agent with ClassifierOutput as structured output.

The agent receives the parsed fields and raw text (plus thread context on re-classification) and determines:

  • Category: bug, question, feature_request, incident
  • Symptoms: Free-text list of observable symptoms
  • Response depth: One of three levels:
      • lookup — answer is probably known, one cheap lookup suffices (e.g., "Does Reader support OPDS?")
      • investigate — real bug or issue requiring enrichment-driven digging; enrichment list controls depth
      • human_action — requires a human decision or manual action; system surfaces context but flags it as "needs your input"
  • Enrichments: List of Enrichment(service, reason, cost) objects specifying which services to query
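
The output shape can be sketched with dataclasses as a stand-in for the real PydanticAI structured output (field names follow the list above; the exact types in ClassifierOutput are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Enrichment:
    service: str  # e.g. "sentry"
    reason: str   # why the agent wants this lookup
    cost: str     # "cheap" | "medium" | "expensive"

@dataclass
class ClassifierOutput:
    category: str        # bug | question | feature_request | incident
    symptoms: list       # free-text observable symptoms
    response_depth: str  # lookup | investigate | human_action
    enrichments: list = field(default_factory=list)  # list of Enrichment
```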

Key routing rules (baked into the agent's system prompt):

  • Linear + HelpScout are always paired — Linear shows engineering tickets, HelpScout shows recent customer reports.
  • Escalation signals ("spike", "elevated", "accelerating") → force investigate depth regardless of stated priority.
  • Regression signals ("used to work", "stopped working") → force investigate + include Linear/HelpScout/codebase.
  • Crash symptoms → both Sentry and New Relic (server traces + client patterns).
  • The agent is instructed to suggest less enrichment when in doubt, not more.

Enrichment Cost Model

Defined in the agent's system prompt (classifier.py:enrichment_cost_table). This table controls the "don't overwhelm the engineer" constraint:

  • linear (cheap): Any bug (dupe search). Always paired with helpscout.
  • canny (cheap): Feature requests, "are we planning to..." questions
  • sentry (medium): Server-side symptoms: loading failures, 5xx, missing data, task failures
  • newrelic (medium): Client-side symptoms: UI broken, crash, freeze, JS exceptions
  • helpscout (medium): Full user support thread context. Always paired with linear.
  • readwise_docs (cheap): Behavior questions, "does X support Y", "is this intentional"
  • codebase (expensive): Novel bugs, "how does X work", clear code implications
  • prod_db (medium): Data integrity issues, content/document problems, integration issues
  • full_bughunt (expensive): Truly novel bugs needing deep multi-service investigation
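
One way such a table can be kept as data and rendered into the agent's system prompt. The costs mirror the table above; render_cost_table and its exact format are invented for illustration:

```python
ENRICHMENT_COSTS = {
    "linear": "cheap",
    "canny": "cheap",
    "sentry": "medium",
    "newrelic": "medium",
    "helpscout": "medium",
    "readwise_docs": "cheap",
    "codebase": "expensive",
    "prod_db": "medium",
    "full_bughunt": "expensive",
}

def render_cost_table():
    # One "- service: cost" line per entry, for embedding in the prompt.
    return "\n".join(f"- {service}: {cost}" for service, cost in ENRICHMENT_COSTS.items())
```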

Signal-to-service mapping:

  • Client-side symptoms → newrelic + codebase
  • Server-side symptoms → sentry + codebase
  • Crash symptoms → sentry + newrelic
  • Content/document issues → codebase + prod_db
  • Integration issues → codebase + prod_db
  • "How does X work" → readwise_docs + codebase

Data Model

apps/core/classifier/models.py defines both the Pydantic transport models and the Django persistence model.

SlackClassification (Django model):

  • thread_ts (CharField(64)): Unique, indexed. Primary key for the conversation.
  • channel_id (CharField(32)): Slack channel ID
  • classification (JSONField): Full TriageClassification serialized via model_dump()
  • raw_message (JSONField, nullable): Raw Slack event payload for debugging
  • classified_at (DateTimeField): Auto-set on creation
  • reclassified_at (DateTimeField, nullable): Set on thread re-classification
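
The keyed-upsert semantics can be illustrated with an in-memory stand-in for SlackClassification.objects.update_or_create (hypothetical sketch; the real call is the Django ORM method, keyed on thread_ts):

```python
_STORE = {}  # thread_ts -> record; stands in for the database table

def update_or_create(thread_ts, **fields):
    """Create a record for a new thread, or update the existing one in place."""
    created = thread_ts not in _STORE
    record = _STORE.setdefault(thread_ts, {"thread_ts": thread_ts})
    record.update(fields)  # re-classification overwrites the same record
    return record, created
```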

The classification JSON blob contains all mechanical + LLM fields (see TriageClassification in models.py for the full schema). Downstream consumers read from this blob.

Extending

Add a new enrichment service:

  1. Add a row to the cost table in classifier.py:enrichment_cost_table.
  2. Add routing signals describing when to suggest it.
  3. The agent will start including it in ClassifierOutput.enrichments — no code changes needed beyond the prompt.
  4. Wire up the actual enrichment executor downstream (outside the classifier pipeline).

Add a new routing signal:

  1. Add the signal description and its service mapping to the enrichment_cost_table system prompt in classifier.py.
  2. If the signal requires mechanical detection (e.g., a new reaction pattern), add an extractor to parser.py and include the field in parse_message()'s return dict.
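
For example, a new mechanical extractor might look like this. The "escalation_flag" signal, its pattern, and the field name are invented for illustration and are not part of the real parser:

```python
import re

# Hypothetical signal: an :sos: emoji or the word "urgent" in the text.
_ESCALATION = re.compile(r":sos:|\burgent\b", re.IGNORECASE)

def extract_escalation_flag(text):
    return bool(_ESCALATION.search(text))

# In parse_message(), the new field would join the returned dict:
#   fields["escalation_flag"] = extract_escalation_flag(text)
```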

Add a new category or response depth:

  1. Update the category_guidance or response_depth_guidance system prompt in classifier.py.
  2. Update the ClassifierOutput model in models.py if you want type-level validation (currently free-text strings).
  3. Update any downstream consumers that switch on category or response depth values.