

Client
Personal
Timeline
2025 – Present
My Role
Sole Developer & Architect
Category
OTHERS
ObservabilityOS is a production-grade, AI-native observability and incident response platform built as a Turborepo monorepo. It ships a publishable TypeScript SDK (@observability-os/sdk) with batched async ingestion and retry logic, and a statistical anomaly engine that measures Z-score deviation across 12 rolling time windows. Diagnostics run through a multi-provider LLM failover pipeline (AICredits → Claude → GPT-4o-mini → Mock heuristics) with circuit breakers and per-token cost tracking. The platform also provides real-time log streaming via Redis Pub/Sub, multi-channel alerting (Slack, Discord, Microsoft Teams) built on the Adapter pattern, BullMQ job queues for async anomaly processing, and dual-layer caching (Redis + in-memory). On the SaaS side it covers billing via Stripe and Razorpay with plan-gated quota enforcement, GitHub OAuth authentication, PII scrubbing before data is persisted, and automatic generation of structured Markdown post-mortem reports.
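As a rough illustration of how a failover pipeline with per-provider circuit breakers can be put together, here is a minimal sketch. It is not the project's actual code: the names CircuitBreaker, DiagnosisProvider, and diagnoseWithFailover, the failure threshold, the cooldown, and the timeout are all assumptions made for the example.

```typescript
// Illustrative sketch only: every name and default value here is an assumption.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  /** Returns true if a call may be attempted at time `now` (milliseconds). */
  canRequest(now: number): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let trial calls through; the next result decides.
      this.state = "HALF_OPEN";
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // A failed trial call, or too many failures in a row, opens the breaker.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = now;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

interface DiagnosisProvider {
  name: string;
  breaker: CircuitBreaker;
  // The provider is expected to honor `signal` so a timeout can cancel the call.
  diagnose(prompt: string, signal: AbortSignal): Promise<string>;
}

/** Tries providers in order, skipping any whose breaker is currently open. */
async function diagnoseWithFailover(
  providers: DiagnosisProvider[],
  prompt: string,
  timeoutMs = 10000,
): Promise<{ provider: string; text: string }> {
  for (const provider of providers) {
    if (!provider.breaker.canRequest(Date.now())) continue;
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const text = await provider.diagnose(prompt, controller.signal);
      provider.breaker.recordSuccess();
      return { provider: provider.name, text };
    } catch {
      provider.breaker.recordFailure(Date.now());
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("All diagnosis providers are unavailable");
}
```

In a chain like the one described above, the last entry would be a local heuristic provider that never calls the network, so the loop should only reach the final throw if that fallback is missing or fails.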
Publishable TypeScript SDK with batched async ingestion, configurable flush intervals, and concurrency-safe retry logic
Statistical Z-score anomaly detection engine across 12 rolling 5-minute windows for error rate, CPU, and latency
Multi-provider LLM failover pipeline (AICredits → Anthropic Claude → OpenAI GPT-4o-mini → Mock heuristics) with per-provider circuit breakers
Real-time log streaming via Redis Pub/Sub channels
Multi-channel alerting (Slack Block Kit, Discord Embeds, MS Teams MessageCard) with Adapter pattern and circuit breakers
BullMQ async job queue for anomaly processing with graceful Redis-absent fallback
Redis sliding-window rate limiting using sorted-set atomic pipelines
Dual-layer caching (in-memory L1 + Redis L2) with auto-backfill
Plan-gated multi-tenant ingest API with Redis-tracked byte quotas per billing tier
PII scrubbing at ingestion boundary (emails, JWTs, credit cards, auth headers, DB URIs)
Multi-provider SaaS billing (Stripe + Razorpay) with subscription lifecycle management
Auto-generated post-mortem reports with TTD/TTR metrics, AI narrative, and SRE checklists
Morning digest email system with per-project AI summary and health stats
Chaos simulator app for end-to-end failure scenario testing
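To make the PII scrubbing item above more concrete, here is a small sketch of regex-based redaction applied to a log line before it is stored. The patterns, placeholder strings, and the scrubPii name are assumptions for illustration, not the project's actual rules, and patterns this simple will miss some cases and over-match others.

```typescript
// Illustrative patterns only; a real scrubber needs broader coverage.
const DB_URI = /\b(mongodb(?:\+srv)?|postgres(?:ql)?|mysql|redis):\/\/[^\s"']+/gi;
const AUTH_HEADER = /\b(authorization\s*[:=]\s*)(?:Bearer|Basic)\s+[^\s"',]+/gi;
const JWT = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const CARD = /\b\d(?:[ -]?\d){12,18}\b/g; // 13 to 19 digits, optional separators

/** Luhn checksum, used to avoid redacting arbitrary long digit runs. */
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

/**
 * Redacts common secrets from a log line. Order matters: connection strings
 * and auth headers go first because they can contain substrings that look
 * like emails or JWTs.
 */
function scrubPii(text: string): string {
  return text
    .replace(DB_URI, "$1://[REDACTED_DB_URI]")
    .replace(AUTH_HEADER, "$1[REDACTED_AUTH]")
    .replace(JWT, "[REDACTED_JWT]")
    .replace(EMAIL, "[REDACTED_EMAIL]")
    .replace(CARD, (match) =>
      luhnValid(match.replace(/\D/g, "")) ? "[REDACTED_CARD]" : match,
    );
}
```

Running the scrubber at the ingestion boundary, before validation output or errors are logged, keeps raw values out of every downstream store rather than relying on each consumer to redact.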
The Challenge
Most observability tools are glorified log viewers: they lack intelligent anomaly detection, provide no automated root-cause analysis, and cost thousands per month. An observability SaaS that competes with Datadog needs real-time log ingestion, statistical anomaly detection, AI-powered diagnostics, and multi-channel alerting, all while keeping latency under 100 ms and degrading gracefully when dependencies fail.
The Solution
Architected a full observability platform in a Turborepo monorepo with 4 layers:
(1) Ingestion: Next.js API routes with SHA-256 auth, Zod validation, PII scrubbing, Redis rate limiting, and MongoDB bulk insert
(2) Anomaly: Z-score engine computing rolling standard deviations over 12×5-min windows with configurable thresholds and deploy correlation
(3) AI: 4-tier LLM failover chain with per-provider circuit breakers (CLOSED/OPEN/HALF_OPEN), AbortController timeouts, and plan-aware routing
(4) Notification: Adapter-pattern dispatcher for Slack, Discord, Teams with circuit-breaker-protected webhook delivery
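The anomaly layer's Z-score check can be sketched roughly as follows. This is a minimal illustration that assumes the newest 5-minute window is scored against the windows before it, uses the population standard deviation, and defaults to a threshold of 3. The names scoreWindow and RollingBaseline are hypothetical, and the project's actual engine may differ on each of these points.

```typescript
interface AnomalyScore {
  mean: number;
  stdDev: number;
  zScore: number;
  isAnomaly: boolean;
}

/** Scores `current` against a baseline of earlier window values. */
function scoreWindow(baseline: number[], current: number, threshold = 3): AnomalyScore {
  if (baseline.length === 0) {
    throw new Error("baseline must contain at least one window");
  }
  const mean = baseline.reduce((acc, x) => acc + x, 0) / baseline.length;
  const variance =
    baseline.reduce((acc, x) => acc + (x - mean) ** 2, 0) / baseline.length;
  const stdDev = Math.sqrt(variance);

  let zScore: number;
  if (stdDev === 0) {
    // Flat baseline: any deviation at all is infinitely many sigmas away.
    zScore = current === mean ? 0 : current > mean ? Infinity : -Infinity;
  } else {
    zScore = (current - mean) / stdDev;
  }
  return { mean, stdDev, zScore, isAnomaly: Math.abs(zScore) >= threshold };
}

/** Keeps the most recent `size` window values (12 in the setup described above). */
class RollingBaseline {
  private readonly values: number[] = [];
  private readonly size: number;

  constructor(size = 12) {
    this.size = size;
  }

  /** Scores `value` against the stored windows, then adds it to the baseline. */
  observe(value: number, threshold = 3): AnomalyScore | null {
    const score =
      this.values.length >= 2 ? scoreWindow(this.values, value, threshold) : null;
    this.values.push(value);
    if (this.values.length > this.size) this.values.shift();
    return score;
  }
}
```

One instance of RollingBaseline per metric (error rate, CPU, latency) would be fed one aggregated value every five minutes. Note that this sketch adds anomalous values to the baseline too, which widens the standard deviation after an incident. Excluding flagged windows is a reasonable alternative.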
20 Technologies
Built and published an npm-installable TypeScript SDK (@observability-os/sdk) with concurrency-safe buffering and automatic retry
Z-score anomaly engine distinguishes genuine incidents from normal traffic fluctuations, cutting down on false pager noise
Multi-provider AI failover ensures zero-downtime diagnostics even when individual LLM providers are unavailable
Redis sliding-window rate limiting with atomic sorted-set pipelines, rather than a naive 'reset every minute' fixed-window counter
Dual payment provider integration (Stripe + Razorpay) with plan-gated quota enforcement
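The difference between a sliding window and a 'reset every minute' counter is easiest to see in code. The sketch below models the sorted-set approach in memory so it can run without Redis. The SlidingWindowLimiter class is hypothetical, and the comments name the Redis commands (ZREMRANGEBYSCORE, ZCARD, ZADD) that each step would typically map to when run as one atomic pipeline. The project's real pipeline may order or combine these steps differently.

```typescript
// In-memory stand-in for one Redis sorted set per key, scored by timestamp.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request at time `now` (ms) is within the limit. */
  allow(key: string, now: number): boolean {
    const cutoff = now - this.windowMs;
    // ZREMRANGEBYSCORE key 0 cutoff: drop entries that slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    // ZCARD key: count what is left inside the window.
    const allowed = recent.length < this.limit;
    // ZADD key now member: this sketch records only admitted requests.
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Usage: at most 3 requests per rolling 60 seconds for one API key.
const limiter = new SlidingWindowLimiter(3, 60000);
limiter.allow("api-key-1", Date.now());
```

Because the window is measured back from each request rather than from a fixed clock boundary, a client cannot send a full quota just before a reset and another full quota just after it.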







