
Monitoring

The Monitoring tab at /<workspace>/code is a unified operational view of every repo in the workspace. When something goes sideways, you don't have to bounce between Cloud Console, Logs Explorer, Error Reporting, and your billing dashboard to reconstruct context. Repos are grouped by domain (frontend / backend / infrastructure / …) and rendered as accordions, the same layout used by the Codebase, Tests, and Deployments tabs on the same page.

📷 Screenshot: Monitoring tab with one backend repo expanded to its Overview panel.

At-a-glance status

Each repo accordion's summary carries a status chip computed at the board level: green / warning / error / not-configured / deploying. The chip's tooltip explains why (e.g. "3 unresolved errors in last 7d", "p95 latency exceeded threshold"), so you can scan a workspace of services and triage by color before opening anything.
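
To make the chip-plus-tooltip idea concrete, here is a minimal sketch of how a board-level status could be derived from a handful of signals. The `RepoSignals` shape, the precedence order (not-configured, then deploying, then error, then warning), and the mapping of "unresolved errors" to error and "p95 over threshold" to warning are all assumptions for illustration, not the product's actual rules.

```typescript
// Hypothetical inputs; field names are illustrative, not the product's schema.
type ChipStatus = "green" | "warning" | "error" | "not-configured" | "deploying";

interface RepoSignals {
  providerConnected: boolean;
  deployInProgress: boolean;
  unresolvedErrors7d: number;
  p95LatencyMs: number;
  p95ThresholdMs: number;
}

interface Chip {
  status: ChipStatus;
  reason: string; // text shown in the chip's tooltip
}

// Assumed precedence: the first matching rule wins.
function computeChip(s: RepoSignals): Chip {
  if (!s.providerConnected) {
    return { status: "not-configured", reason: "No monitoring provider connected" };
  }
  if (s.deployInProgress) {
    return { status: "deploying", reason: "Deploy in progress" };
  }
  if (s.unresolvedErrors7d > 0) {
    return { status: "error", reason: `${s.unresolvedErrors7d} unresolved errors in last 7d` };
  }
  if (s.p95LatencyMs > s.p95ThresholdMs) {
    return { status: "warning", reason: "p95 latency exceeded threshold" };
  }
  return { status: "green", reason: "All signals within thresholds" };
}
```

Returning the reason alongside the status keeps the tooltip text and the color derived from the same rule, so they can't drift apart.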

A repo's chip shows not-configured when the repo doesn't have a monitoring provider connected yet; open the Settings panel inside that repo to connect one.

The seven panels

Expanding a repo reveals seven side-by-side panels:

  • Overview — request volume, p95 latency, 5xx rate, CPU, memory, and instance count over a sliding window (15 min by default), plotted as small sparklines next to current values.
  • Errors — unresolved exception groups with frequency, first/last seen, affected versions, and a "resolve" / "ignore" / "open agent" action row. Resolving an error here acts through the same GCP integration you configure in Settings.
  • Logs — recent log entries filtered by severity (defaults to WARNING+). Click a row for the full structured payload, jump out to Cloud Logging for the raw query, or pin the log line into chat for an agent to investigate.
  • Uptime — uptime checks and their recent results over the last 24h, plus the SLA percentage for the window. Failed regions are called out individually so you can tell whether a blip was global or local.
  • Cost — per-service cost over the selected range, sourced from your billing export. Anomalies (sustained week-over-week increases that don't match deploy traffic) are flagged so you don't find out from finance.
  • Alerts — the alert policies attached to this service, with current firing state. Open a policy to edit thresholds; from here you can also generate a new alert from observed-but-unmonitored behavior — the agent drafts the policy, you review and apply.
  • Settings — connect the monitoring provider for this repo (GCP today, with Vercel/AWS/Azure providers stubbed for future drop-in), pick which service or resource maps to this repo, and toggle which panels are active.
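
As an illustration of the Uptime panel's two numbers, the SLA percentage and the per-region callout, the sketch below computes both from raw check results. The `CheckResult` shape is hypothetical, and the global/local rule is a deliberate simplification: a failure counts as "global" only when every probing region reported at least one failed check in the window.

```typescript
// Hypothetical shape of one uptime-check result; not the product's actual schema.
interface CheckResult {
  region: string;
  timestampMs: number;
  ok: boolean;
}

interface UptimeSummary {
  slaPercent: number | null; // null when no checks ran in the window
  failedRegions: string[];
  scope: "none" | "local" | "global";
}

function summarizeUptime(results: CheckResult[], windowStartMs: number, windowEndMs: number): UptimeSummary {
  const inWindow = results.filter((r) => r.timestampMs >= windowStartMs && r.timestampMs < windowEndMs);
  if (inWindow.length === 0) {
    return { slaPercent: null, failedRegions: [], scope: "none" };
  }
  const okCount = inWindow.filter((r) => r.ok).length;
  const regions = new Set(inWindow.map((r) => r.region));
  const failed = new Set(inWindow.filter((r) => !r.ok).map((r) => r.region));

  let scope: UptimeSummary["scope"] = "none";
  if (failed.size > 0) {
    // Simplification: "global" means every probing region saw at least one failure.
    scope = failed.size === regions.size ? "global" : "local";
  }
  return {
    slaPercent: (okCount / inWindow.length) * 100,
    failedRegions: Array.from(failed).sort(),
    scope,
  };
}
```

A stricter version would also require the regional failures to overlap in time before calling an incident global.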

Each panel fetches on expand, so opening a busy workspace doesn't pre-load everything for every repo.
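
Fetch-on-expand can be sketched as a small loader that only calls the backend from an expand handler and reuses the in-flight request if the same panel is expanded again. The `Fetcher` signature and the `LazyPanelLoader` class are hypothetical, and the deduplication and retry-on-failure behavior are assumptions beyond what this page states.

```typescript
type PanelName = "overview" | "errors" | "logs" | "uptime" | "cost" | "alerts" | "settings";

// Hypothetical fetcher: loads one panel's data for one repo.
type Fetcher = (repo: string, panel: PanelName) => Promise<unknown>;

class LazyPanelLoader {
  private readonly fetcher: Fetcher;
  private readonly requests = new Map<string, Promise<unknown>>();

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Called from the accordion's expand handler; nothing is fetched before this.
  onExpand(repo: string, panel: PanelName): Promise<unknown> {
    const key = `${repo}:${panel}`;
    let pending = this.requests.get(key);
    if (!pending) {
      pending = this.fetcher(repo, panel).catch((err: unknown) => {
        // Forget failed requests so the next expand retries.
        this.requests.delete(key);
        throw err;
      });
      this.requests.set(key, pending);
    }
    return pending;
  }

  // Keys of panels that have been requested so far.
  requestedKeys(): string[] {
    return Array.from(this.requests.keys());
  }
}
```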

Multi-cloud, one UI

Under the hood, every panel speaks to a monitoring provider through a fixed interface (getOverview, getLogs, getUptime, getCost, getAlerts, getBatchStatus). The platform ships a complete GCP provider today; Vercel, AWS, and Azure adapters slot into the same interface as they're added, with no UI changes — your team sees a consistent panel layout regardless of where each service runs.
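
The fixed interface can be pictured as a TypeScript interface that every adapter implements. Only the six method names come from this page; the parameters, payload shapes, the in-memory `fakeProvider`, and `renderOverviewText` are hypothetical stand-ins meant to show why panel code doesn't change when the provider does.

```typescript
interface TimeRange { startMs: number; endMs: number; }

// Hypothetical payload shapes.
interface Overview { requests: number; p95LatencyMs: number; rate5xx: number; }
interface LogEntry { severity: string; message: string; }

interface MonitoringProvider {
  readonly id: string;
  getOverview(service: string, range: TimeRange): Promise<Overview>;
  getLogs(service: string, range: TimeRange): Promise<LogEntry[]>;
  getUptime(service: string, range: TimeRange): Promise<{ slaPercent: number }>;
  getCost(service: string, range: TimeRange): Promise<{ amount: number; currency: string }>;
  getAlerts(service: string): Promise<{ policy: string; firing: boolean }[]>;
  getBatchStatus(services: string[]): Promise<Record<string, string>>;
}

// In-memory fake standing in for a real adapter (GCP today, others later).
const fakeProvider: MonitoringProvider = {
  id: "fake",
  async getOverview() { return { requests: 1200, p95LatencyMs: 180, rate5xx: 0.002 }; },
  async getLogs() { return [{ severity: "WARNING", message: "slow query" }]; },
  async getUptime() { return { slaPercent: 99.9 }; },
  async getCost() { return { amount: 42, currency: "USD" }; },
  async getAlerts() { return [{ policy: "p95 > 500ms", firing: false }]; },
  async getBatchStatus(services) {
    const out: Record<string, string> = {};
    for (const s of services) out[s] = "green";
    return out;
  },
};

// Panel code depends only on the interface, so swapping providers needs no UI change.
async function renderOverviewText(p: MonitoringProvider, service: string, range: TimeRange): Promise<string> {
  const o = await p.getOverview(service, range);
  return `${service} [${p.id}]: ${o.requests} req, p95 ${o.p95LatencyMs}ms`;
}
```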

If the active provider doesn't implement the method a panel needs (the call reports "not implemented"), the UI renders a clean "coming soon" placeholder rather than an error.
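
One way to get that behavior is to have adapters throw a dedicated error for unsupported methods and map it to a distinct panel state. The `NotImplementedError` class, the `PanelState` union, and `loadPanel` are hypothetical names for this sketch; the check uses `err.name` rather than `instanceof` so it still works when TypeScript compiles to ES5, where subclassed `Error` instances can fail `instanceof` checks.

```typescript
// Hypothetical error an adapter throws for a method it doesn't support yet.
class NotImplementedError extends Error {
  constructor(method: string) {
    super(`${method} is not implemented by this provider`);
    this.name = "NotImplementedError";
  }
}

type PanelState<T> =
  | { kind: "ready"; data: T }
  | { kind: "coming-soon" }
  | { kind: "error"; message: string };

// Wraps any provider call and turns "not implemented" into a placeholder state.
async function loadPanel<T>(fetchData: () => Promise<T>): Promise<PanelState<T>> {
  try {
    return { kind: "ready", data: await fetchData() };
  } catch (err) {
    if (err instanceof Error && err.name === "NotImplementedError") {
      return { kind: "coming-soon" };
    }
    return { kind: "error", message: err instanceof Error ? err.message : String(err) };
  }
}
```

The renderer then switches on `kind`: data for `ready`, the "coming soon" placeholder for `coming-soon`, and a real error state only for genuine failures.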

Incident triage with agents

The agent that owns a service watches its monitoring stream and reacts:

  • New error groups → opens an Inbox notification, attaches the relevant logs as context, and (where confidence is high) drafts a fix as a Roadmap item.
  • Spiking 5xx or latency → posts a chat message in the connected channel, links the deploy that correlates with the spike, and proposes a rollback action.
  • Cost anomalies → flags the service, surfaces the responsible request path, and suggests where to dig (oversized retries, accidental fan-out, missed cache).
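
The reactions above map naturally onto a dispatch over a discriminated union of monitoring events. The event and action shapes and the `FIX_CONFIDENCE_THRESHOLD` cutoff standing in for "confidence is high" are hypothetical; the sketch only shows how each event kind fans out into the actions listed.

```typescript
type MonitoringEvent =
  | { kind: "new-error-group"; groupId: string; confidence: number }
  | { kind: "spike"; metric: "5xx" | "latency"; correlatedDeploy?: string }
  | { kind: "cost-anomaly"; service: string; requestPath: string };

type AgentAction =
  | { type: "inbox-notification"; groupId: string; attachLogs: true }
  | { type: "draft-roadmap-fix"; groupId: string }
  | { type: "chat-message"; metric: string; deployLink?: string }
  | { type: "propose-rollback"; deploy: string }
  | { type: "flag-service"; service: string; requestPath: string };

const FIX_CONFIDENCE_THRESHOLD = 0.8; // hypothetical cutoff for "confidence is high"

function react(event: MonitoringEvent): AgentAction[] {
  switch (event.kind) {
    case "new-error-group": {
      const actions: AgentAction[] = [{ type: "inbox-notification", groupId: event.groupId, attachLogs: true }];
      if (event.confidence >= FIX_CONFIDENCE_THRESHOLD) {
        actions.push({ type: "draft-roadmap-fix", groupId: event.groupId });
      }
      return actions;
    }
    case "spike": {
      const actions: AgentAction[] = [{ type: "chat-message", metric: event.metric, deployLink: event.correlatedDeploy }];
      // A rollback is only proposable when a correlated deploy was found.
      if (event.correlatedDeploy) {
        actions.push({ type: "propose-rollback", deploy: event.correlatedDeploy });
      }
      return actions;
    }
    case "cost-anomaly":
      return [{ type: "flag-service", service: event.service, requestPath: event.requestPath }];
    default: {
      // Compile-time exhaustiveness check: adding an event kind without a case fails to type-check.
      const unreachable: never = event;
      throw new Error(`unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```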

Triggering the agent manually is one click on any panel — it picks up the current view (panel, time window, selected severity) as context so you don't have to repeat what you're looking at.

How Monitoring closes the loop

  • From Deployments — every deploy emits a marker on Monitoring's timelines, so a regression is anchored to a specific release.
  • To Roadmap → Errors — actionable issues triaged here promote into Roadmap items (with the originating logs / metrics attached as agent context).
  • To Discover → Analytics — sustained user-facing degradation in Monitoring (latency, 5xx) gets cross-referenced with traffic and conversion movement on Analytics, so the conversation shifts from "the graph is red" to "we lost 4% of conversions this week — here's the change that did it."
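
Anchoring a regression to a release comes down to matching a spike against the deploy markers on the timeline. A simple heuristic, assumed here rather than taken from the product, is to pick the most recent deploy that landed at or before the spike's start within a lookback window. `DeployMarker` and `correlateDeploy` are hypothetical names.

```typescript
interface DeployMarker {
  id: string;
  atMs: number; // when the deploy marker was emitted
}

// Returns the latest deploy at or before spikeStartMs that is no older than
// lookbackMs, or undefined when no deploy qualifies.
function correlateDeploy(markers: DeployMarker[], spikeStartMs: number, lookbackMs: number): DeployMarker | undefined {
  let best: DeployMarker | undefined;
  for (const m of markers) {
    const age = spikeStartMs - m.atMs;
    if (age >= 0 && age <= lookbackMs && (best === undefined || m.atMs > best.atMs)) {
      best = m;
    }
  }
  return best;
}
```

Deploys that land after the spike starts are ignored, since they can't have caused it; an empty result is a useful signal in its own right that the regression probably isn't release-driven.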