Gooey Ops
A unified operations-monitoring platform: uptime checks, SSL certificate tracking, third-party status aggregation, and AI-assisted alert triage and incident summaries.
Highlights
- AI-assisted alert triage and automatically generated incident summaries to cut on-call noise
- Pluggable checker architecture (HTTP, TCP, DNS, SSL) behind a single dispatch interface
- Provider-based notification fan-out (email, webhook, Slack) with escalation, quiet hours, and repeat intervals
Overview
Gooey Ops is a monitoring platform I built to consolidate the things teams usually stitch together from three or four separate tools: uptime monitoring, SSL certificate expiry tracking, aggregation of third-party service status pages, and the alerting layer that ties them together. On top of that alerting layer, it uses AI to triage incoming alerts and generate human-readable incident summaries, so on-call engineers see signal instead of a wall of raw notifications. It's an npm-workspaces monorepo built around a Fastify + Prisma + TypeScript API.
The Problem
Operational signal is fragmented. Uptime lives in one tool, certificate expiry in another, upstream-vendor outages in a dozen status pages nobody watches, and alert routing in yet another product. Gooey Ops unifies these behind one org-scoped data model and one alerting engine.
My Role
Founder and sole engineer — architecture, data model, checker and worker subsystems, and notification layer.
Architecture & Approach
The system follows a clean routes → services → Prisma request flow, with services owning business logic and verifying org membership internally rather than trusting the route layer alone. Everything is org-scoped through an organization/membership model with a role hierarchy (owner > admin > member > viewer).
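The following is a minimal sketch of how a service could enforce the role hierarchy itself instead of trusting the route layer. The names ROLE_RANK, assertOrgRole and MembershipStore are hypothetical. The store is a synchronous in-memory stand-in for what would really be an async Prisma membership query.

```typescript
// Hypothetical sketch: ranks encode the owner > admin > member > viewer ordering.
type Role = "owner" | "admin" | "member" | "viewer";

const ROLE_RANK: Record<Role, number> = {
  viewer: 0,
  member: 1,
  admin: 2,
  owner: 3,
};

/** True when `actual` is at least as privileged as `required`. */
function hasAtLeast(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Assumed lookup interface; the real service would query memberships via Prisma.
interface MembershipStore {
  findRole(orgId: string, userId: string): Role | undefined;
}

class ForbiddenError extends Error {}

/** Service-level guard: throws unless the user belongs to the org with a sufficient role. */
function assertOrgRole(
  store: MembershipStore,
  orgId: string,
  userId: string,
  required: Role,
): Role {
  const role = store.findRole(orgId, userId);
  if (role === undefined || !hasAtLeast(role, required)) {
    throw new ForbiddenError(`user ${userId} lacks ${required} on org ${orgId}`);
  }
  return role;
}

// In-memory stand-in for the membership table, keyed by "orgId:userId".
const memberships = new Map<string, Role>([
  ["org-1:alice", "admin"],
  ["org-1:bob", "viewer"],
]);
const store: MembershipStore = {
  findRole: (orgId, userId) => memberships.get(`${orgId}:${userId}`),
};
```

A numeric rank turns the hierarchy into a single comparison. Each service method then declares the minimum role it needs rather than enumerating allowed roles.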
Two subsystems carry most of the interesting design:
- Checkers implement a single Checker interface, one per protocol (HTTP, TCP, DNS, SSL), dispatched through a registry so a new probe type is a single addition rather than a refactor.
- Workers are polling loops with a shared shape: a configurable poll interval and concurrency, an active-check counter, and a process-then-persist-then-alert cycle. They drive the checkers, transition entity status, and hand off to the notification layer.
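The checker registry can be sketched roughly as follows. This sketch assumes a Checker shape with a protocol key and an async check method. CheckerRegistry, CheckResult, MonitorConfig and the stub TCP checker are illustrative names, not the project's actual types.

```typescript
// Hypothetical sketch of a protocol-keyed checker registry.
type Protocol = "http" | "tcp" | "dns" | "ssl";

interface CheckResult {
  ok: boolean;
  latencyMs: number;
  message?: string;
}

interface MonitorConfig {
  protocol: Protocol;
  target: string;
}

interface Checker {
  readonly protocol: Protocol;
  check(config: MonitorConfig): Promise<CheckResult>;
}

class CheckerRegistry {
  private readonly checkers = new Map<Protocol, Checker>();

  /** Adding a probe type is one register() call; dispatch code never changes. */
  register(checker: Checker): void {
    if (this.checkers.has(checker.protocol)) {
      throw new Error(`checker already registered for ${checker.protocol}`);
    }
    this.checkers.set(checker.protocol, checker);
  }

  has(protocol: Protocol): boolean {
    return this.checkers.has(protocol);
  }

  /** Single dispatch point: look up the checker by protocol and run it. */
  dispatch(config: MonitorConfig): Promise<CheckResult> {
    const checker = this.checkers.get(config.protocol);
    if (!checker) {
      return Promise.reject(new Error(`no checker for ${config.protocol}`));
    }
    return checker.check(config);
  }
}

// Stub standing in for a real TCP checker, which would open a socket and time the connect.
const fakeTcpChecker: Checker = {
  protocol: "tcp",
  check: async (config) => ({ ok: config.target.length > 0, latencyMs: 0 }),
};

const registry = new CheckerRegistry();
registry.register(fakeTcpChecker);
```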
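The shared worker shape might look something like this sketch. PollingWorker, WorkerHooks and the fetchDue/process/persist/alert hooks are hypothetical. The sketch makes two assumptions:

- Each poll claims only as many due items as there are free concurrency slots.
- fetchDue skips items that are already in flight.

```typescript
// Hypothetical sketch of the shared polling-worker shape.
interface WorkerOptions {
  pollIntervalMs: number;
  concurrency: number;
}

interface WorkerHooks<T, R> {
  fetchDue(limit: number): Promise<T[]>;        // load items whose next check is due
  process(item: T): Promise<R>;                 // e.g. dispatch to a checker
  persist(item: T, result: R): Promise<void>;   // store result, transition status
  alert(item: T, result: R): Promise<void>;     // hand off to the notification layer
}

class PollingWorker<T, R> {
  private activeChecks = 0;
  private running = false;
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(
    private readonly opts: WorkerOptions,
    private readonly hooks: WorkerHooks<T, R>,
  ) {}

  get active(): number {
    return this.activeChecks;
  }

  /** Claim up to the number of free slots and start them; returns the in-flight runs. */
  async poll(): Promise<Promise<void>[]> {
    const free = this.opts.concurrency - this.activeChecks;
    if (free <= 0) return [];
    const items = await this.hooks.fetchDue(free);
    return items.map((item) => this.run(item));
  }

  /** process -> persist -> alert for one item; the slot is always released. */
  private async run(item: T): Promise<void> {
    this.activeChecks++;
    try {
      const result = await this.hooks.process(item);
      await this.hooks.persist(item, result);
      await this.hooks.alert(item, result);
    } catch (err) {
      console.error("worker item failed", err);
    } finally {
      this.activeChecks--;
    }
  }

  /** Poll on a fixed interval without waiting for in-flight checks to finish. */
  start(): void {
    if (this.running) return;
    this.running = true;
    const loop = async (): Promise<void> => {
      try {
        await this.poll();
      } catch (err) {
        console.error("poll failed", err);
      }
      if (this.running) this.timer = setTimeout(() => void loop(), this.opts.pollIntervalMs);
    };
    void loop();
  }

  stop(): void {
    this.running = false;
    if (this.timer !== undefined) clearTimeout(this.timer);
  }
}
```

The decrement sits in a finally block, so a checker that throws still releases its slot. Without that, a few failures would permanently starve the pool.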
Technical Highlights
- AI-assisted alerting. An AI layer sits on top of the raw check results to triage incoming alerts and generate concise, human-readable incident summaries from the underlying failures — turning a noisy stream of notifications into something an on-call engineer can act on quickly.
- Provider-based notifications. Email, webhook, and Slack providers implement a common interface and are fanned out by a notification service, layered under an alert-policy model with escalation delays, repeat intervals, and quiet hours.
- Security-first auth. Argon2id password hashing, short-lived JWT access tokens with rotating refresh tokens, account lockout on repeated failures, and dedicated audit/security-event logging.
- Strict ESM + TypeScript discipline. Native ESM with required .js import extensions and exactOptionalPropertyTypes enforced throughout: the kind of constraints that keep a growing codebase honest.
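One plausible shape for the preprocessing in front of the AI layer is sketched below. It has three steps:

1. Collapse raw failures per monitor.
2. Deduplicate repeated messages.
3. Render compact context for a model.

groupAlerts, buildTriagePrompt and the Summarizer interface are assumptions. The actual model provider, SDK and prompt wording are not specified here.

```typescript
// Hypothetical sketch: group raw failures into incidents before asking a model to summarize.
interface RawAlert {
  monitorId: string;
  monitorName: string;
  message: string;
  at: Date;
}

interface IncidentGroup {
  monitorId: string;
  monitorName: string;
  count: number;
  firstSeen: Date;
  lastSeen: Date;
  distinctMessages: string[];
}

/** Group alerts per monitor and dedupe messages so the model sees one entry per incident. */
function groupAlerts(alerts: readonly RawAlert[]): IncidentGroup[] {
  const groups = new Map<string, IncidentGroup>();
  for (const a of alerts) {
    const g = groups.get(a.monitorId);
    if (!g) {
      groups.set(a.monitorId, {
        monitorId: a.monitorId,
        monitorName: a.monitorName,
        count: 1,
        firstSeen: a.at,
        lastSeen: a.at,
        distinctMessages: [a.message],
      });
      continue;
    }
    g.count++;
    if (a.at.getTime() < g.firstSeen.getTime()) g.firstSeen = a.at;
    if (a.at.getTime() > g.lastSeen.getTime()) g.lastSeen = a.at;
    if (!g.distinctMessages.includes(a.message)) g.distinctMessages.push(a.message);
  }
  // Noisiest incidents first, so they survive any prompt-length truncation.
  return [...groups.values()].sort((x, y) => y.count - x.count);
}

/** Render grouped incidents as compact context for a summarization model. */
function buildTriagePrompt(groups: readonly IncidentGroup[]): string {
  const lines = groups.map(
    (g) =>
      `- ${g.monitorName}: ${g.count} failure(s) between ${g.firstSeen.toISOString()} ` +
      `and ${g.lastSeen.toISOString()}; ${g.distinctMessages.join(" | ")}`,
  );
  return [
    "Summarize these monitoring incidents for an on-call engineer.",
    "Rank them by likely impact and note any that look related.",
    ...lines,
  ].join("\n");
}

// Assumed model-client interface; any LLM SDK could sit behind it.
interface Summarizer {
  summarize(prompt: string): Promise<string>;
}

async function triage(alerts: readonly RawAlert[], model: Summarizer): Promise<string> {
  return model.summarize(buildTriagePrompt(groupAlerts(alerts)));
}
```

Doing the grouping deterministically before the model call keeps the prompt small. It also means a hundred identical timeouts reach the model as one line, not a hundred.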
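The alert-policy gate and provider fan-out can be sketched as follows. The sketch makes two assumptions:

- Quiet hours are a UTC hour range that may wrap midnight.
- Notifications inside quiet hours are suppressed rather than deferred.

NotificationProvider, AlertPolicy, shouldNotify and NotificationService are hypothetical names.

```typescript
// Hypothetical sketch of policy-gated, provider-based notification fan-out.
type Channel = "email" | "webhook" | "slack";

interface Notification {
  title: string;
  body: string;
}

interface NotificationProvider {
  readonly channel: Channel;
  send(n: Notification): Promise<void>;
}

interface QuietHours {
  startHour: number; // UTC, inclusive
  endHour: number;   // UTC, exclusive; may be less than startHour (wraps midnight)
}

interface AlertPolicy {
  escalationDelayMs: number; // incident must persist this long before the first notification
  repeatIntervalMs: number;  // minimum gap between repeat notifications
  quietHours?: QuietHours;
}

interface IncidentState {
  openedAt: number;
  lastNotifiedAt?: number;
}

function inQuietHours(now: Date, q: QuietHours): boolean {
  const h = now.getUTCHours();
  return q.startHour <= q.endHour
    ? h >= q.startHour && h < q.endHour
    : h >= q.startHour || h < q.endHour;
}

/** Decide whether an open incident should notify right now. */
function shouldNotify(policy: AlertPolicy, state: IncidentState, now: Date): boolean {
  const t = now.getTime();
  if (t - state.openedAt < policy.escalationDelayMs) return false;
  if (policy.quietHours && inQuietHours(now, policy.quietHours)) return false;
  if (state.lastNotifiedAt !== undefined && t - state.lastNotifiedAt < policy.repeatIntervalMs) {
    return false;
  }
  return true;
}

class NotificationService {
  constructor(private readonly providers: readonly NotificationProvider[]) {}

  /** Send to every provider; returns the channels that failed. */
  async fanOut(n: Notification): Promise<Channel[]> {
    const results = await Promise.allSettled(this.providers.map((p) => p.send(n)));
    return results.flatMap((r, i) => (r.status === "rejected" ? [this.providers[i]!.channel] : []));
  }
}
```

Promise.allSettled keeps one failing channel, such as a Slack outage, from blocking email or webhook delivery. The failed channels are returned so they can be retried or logged.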
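Account lockout reduces to a small per-account counter. The following is a minimal sketch that assumes a fixed failure threshold within a fixed time window and a fixed lock duration. LoginLockout and its policy fields are hypothetical. Real state would be persisted rather than held in memory, and Argon2id hashing and token handling are out of scope here.

```typescript
// Hypothetical sketch of threshold-based account lockout.
interface LockoutPolicy {
  maxFailures: number; // failures within the window that trigger a lock
  windowMs: number;    // counting window length
  lockMs: number;      // how long the account stays locked
}

interface AttemptState {
  failures: number;
  windowStart: number;
  lockedUntil?: number;
}

class LoginLockout {
  private readonly state = new Map<string, AttemptState>();

  constructor(private readonly policy: LockoutPolicy) {}

  isLocked(account: string, now: number): boolean {
    const until = this.state.get(account)?.lockedUntil;
    return until !== undefined && now < until;
  }

  /** Record a failed attempt; returns true if this failure triggered a lock. */
  recordFailure(account: string, now: number): boolean {
    if (this.isLocked(account, now)) return false; // already locked; do not extend
    let s = this.state.get(account);
    if (!s || now - s.windowStart > this.policy.windowMs) {
      s = { failures: 0, windowStart: now }; // start a fresh counting window
      this.state.set(account, s);
    }
    s.failures++;
    if (s.failures >= this.policy.maxFailures) {
      s.lockedUntil = now + this.policy.lockMs;
      s.failures = 0;
      s.windowStart = now;
      return true;
    }
    return false;
  }

  /** A successful login clears the account's failure history. */
  recordSuccess(account: string): void {
    this.state.delete(account);
  }
}
```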
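Here is a small illustration of what exactOptionalPropertyTypes changes. An optional property no longer accepts an explicit undefined, so optional fields are omitted with a conditional spread instead. CheckSummary and toSummary are hypothetical names. The .js extension rule, by contrast, applies to relative import specifiers across files under Node's ESM resolution, so it does not show up in a single-file sketch.

```typescript
// Illustrative only: with exactOptionalPropertyTypes, `message?: string` means
// "absent or a string", so an explicit `undefined` value is a type error.
interface CheckSummary {
  monitorId: string;
  message?: string;
}

function toSummary(monitorId: string, message: string | undefined): CheckSummary {
  // Writing `{ monitorId, message }` here would not compile under the flag,
  // because `message` may be undefined. The spread omits the key instead.
  return { monitorId, ...(message !== undefined ? { message } : {}) };
}
```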
Skills Demonstrated
AI/LLM integration for alert triage and incident summarization, backend systems design, concurrency and worker-pool patterns, pluggable/extensible architecture, multi-tenant authorization, and disciplined strict-TypeScript engineering.