# Weekly traces hour for agent quality

> Review real AI traces in a standing weekly session and turn the sharpest failures and good catches into eval cases.

- Canonical HTML: https://growth.iangoh.com/growth-ideas/weekly-traces-hour-for-agent-quality/
- Source: [newsletter.posthog.com](https://newsletter.posthog.com/p/the-golden-rules-of-agent-first-product)
- GrowthDex source hub: [PostHog Newsletter](/sources/posthog-newsletter-newsletter-posthog-com-2/)
- Last checked: 2026-05-26
- Rarity: rare
- Budget: low
- Channels: Product, Retention, Support
- Stages: ai products, retention, quality, evaluation
- Key metric: PostHog reviews real rated agent sessions weekly and uses those findings to create future eval cases.

## Why this can grow

Agent failures are often too specific and situational to notice through synthetic tests alone. A recurring trace review forces the team to watch what users actually asked, where the model drifted, and which interventions felt helpful. Converting those observations into eval cases compounds the learning instead of letting each debugging session disappear into chat history.
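To make the review step concrete, here is a minimal sketch of how a team might shortlist traces for the weekly session. It assumes each session is stored with a numeric user rating; the `Trace` fields, the 1 to 5 scale, and the `pick_for_review` helper are hypothetical names for illustration, not PostHog's schema or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    trace_id: str      # hypothetical identifier
    user_input: str    # what the user actually asked
    agent_output: str  # what the agent replied
    rating: int        # assumed scale: 1 (bad) to 5 (good)


def pick_for_review(traces, k=2):
    """Return (failures, catches): the k lowest- and k highest-rated traces."""
    ranked = sorted(traces, key=lambda t: t.rating)
    failures = ranked[:k]
    # Highest-rated first; guard k == 0 because ranked[-0:] is the whole list.
    catches = ranked[-k:][::-1] if k else []
    # With very few traces the two slices can overlap, so drop duplicates.
    seen = {t.trace_id for t in failures}
    catches = [t for t in catches if t.trace_id not in seen]
    return failures, catches


# Illustrative data only.
traces = [
    Trace("t1", "Cancel my plan", "I upgraded your plan.", 1),
    Trace("t2", "Export last week's events", "Here is the export link.", 5),
    Trace("t3", "Why did signups drop?", "Signups are fine.", 2),
    Trace("t4", "Delete all data", "Are you sure? This cannot be undone.", 5),
    Trace("t5", "Show my dashboard", "Opening dashboard.", 3),
]

failures, catches = pick_for_review(traces, k=2)
for t in failures:
    print("failure:", t.trace_id, "|", t.user_input)
for t in catches:
    print("catch:  ", t.trace_id, "|", t.user_input)
```

Ratings only narrow the list. The point of the session is still for people to read the shortlisted traces and judge where the agent drifted or helped.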

## Ian's take

From scaling consumer platforms across MENA and Southeast Asia, my default is to distrust growth work that only looks good in a slide. I would treat this as a small market test first: make the audience narrow, make the promise concrete, and let the first real response decide whether it deserves more work. For retention, I would watch the second and third use, not just the first click, because a tactic is real when it changes a habit. Before putting more time or budget behind this one, I would pick a single clear growth signal and watch it.

## Action plan

1. Define one narrow startup segment where a weekly traces hour can create a measurable lift.
2. Turn the tactic into one offer, page, campaign, or workflow for the Product and Retention channels.
3. Use the evidence from newsletter.posthog.com to set the first version of the message, format, and audience.
4. Launch a small test for 7 to 14 days and judge it on a single measurable growth signal chosen in advance.
5. Review the result, keep the winning message, remove weak variants, and turn the learning into a repeatable growth playbook.

## Source-backed example

PostHog says the team runs a weekly traces hour, manually reviews real sessions with ratings, and then turns both bad failures and strong interventions into evals so future model or prompt changes do not regress those behaviors.
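The workflow described above can be sketched as a small regression suite. This is an illustration under stated assumptions, not PostHog's eval tooling: `EvalCase`, `run_evals`, and `candidate_agent` are hypothetical names, and the substring checks stand in for whatever grading a real team would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    name: str
    user_input: str
    must_contain: str           # phrase the reply must include (case-insensitive)
    must_not_contain: str = ""  # phrase the reply must avoid, if any


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> List[str]:
    """Return the names of the cases the agent fails."""
    failed = []
    for case in cases:
        reply = agent(case.user_input).lower()
        ok = case.must_contain.lower() in reply
        if case.must_not_contain and case.must_not_contain.lower() in reply:
            ok = False
        if not ok:
            failed.append(case.name)
    return failed


# One case from a reviewed failure, one from a reviewed good catch (made-up examples).
cases = [
    EvalCase("cancel-not-upgrade", "Cancel my plan",
             must_contain="cancel", must_not_contain="upgraded"),
    EvalCase("confirm-before-delete", "Delete all data",
             must_contain="are you sure"),
]


def candidate_agent(user_input: str) -> str:
    """Stand-in for a new prompt or model version under test."""
    if "delete" in user_input.lower():
        return "Done, all data deleted."  # lost the confirmation step
    return "I can cancel your plan. Confirm?"


regressions = run_evals(candidate_agent, cases)
print(regressions)
```

In this sketch the candidate handles the cancellation correctly but skips the delete confirmation, so `confirm-before-delete` is reported as a regression. Keeping good catches in the suite alongside failures is what lets a later change be flagged for breaking behavior that used to work.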

## Adjacent tactics in the same lane

- [AI response rating with follow-up context](/growth-ideas/ai-response-rating-with-follow-up-context/) - same source, 2 shared channels, 3 shared stages
- [Uncertainty, source, and progress cues in AI UI](/growth-ideas/uncertainty-source-and-progress-cues-in-ai-ui/) - same source, 2 shared channels, 2 shared stages
- [Workflow-first AI demand validation](/growth-ideas/workflow-first-ai-demand-validation/) - same source, 2 shared channels, 1 shared stage
- [Task-based model routing for AI speed](/growth-ideas/task-based-model-routing-for-ai-speed/) - same source, 2 shared channels, 1 shared stage

## Read GrowthDex essays

Browse the plain-English essay index at [GrowthDex Blog](/blog/).

## Related GrowthDex essays

- [AI products stop feeling smart when they hide their context](/blog/ai-products-stop-feeling-smart-when-they-hide-their-context/) - AI products, product-led growth, brand trust

## Advisory

If you want help turning this into a working growth system, Ian Goh offers advisory at https://iangoh.com/advisory.