The quality layer your AI is missing
33 auto-routing skills. Multi-agent pipelines. Code quality gates. Test quality gates. Stack-aware rules. Two commands to install.
Every feature goes through a 10-agent pipeline
Not prompts. Not templates. Agents that read your code, verify their claims, and refuse to proceed without evidence.
zuvo:brainstorm
Three agents explore your codebase, research best practices, and analyze the problem before you write a single line.
zuvo:plan
Four agents analyze architecture, select patterns, review testability, and decompose into bite-sized TDD tasks.
zuvo:execute
Each task: RED test, GREEN code, spec review, quality review. Two independent reviewers verify every change.
33 skills. Zero manual routing.
Say what you need. Zuvo's meta-skill router matches your intent to the right skill automatically.
Three agents (Code Explorer, Domain Researcher, Business Analyst) run in parallel to understand your codebase, research best practices, and map the problem space. Produces an approved design specification before any code is written.
Four agents (Architect, Tech Lead, QA Engineer, Team Lead) decompose an approved spec into ordered TDD tasks with exact code targets and verification commands. Each task follows RED-GREEN-REFACTOR.
Implements plan tasks sequentially. Each task: Implementer writes failing test then code, Spec Reviewer checks alignment with spec, Quality Reviewer scores CQ1-CQ22 and Q1-Q17 with evidence. Critical gate = 0 sends the task back for correction.
Creates an isolated git worktree for branch-safe development. CREATE mode sets up worktree with smart directory selection. FINISH mode offers merge, PR, or cleanup options with safety verification.
Six-step protocol for processing code review feedback. Steps include understanding the comment, verifying it against the current code, deciding between fix and pushback, and implementing with evidence. Prevents blind agreement with reviewer suggestions.
Runs blast radius and duplication analysis in parallel, then TDD implementation with CQ/Q quality gates. Designed for small features with clear scope that don't need the full pipeline.
Structured code review with parallel audit agents. Examines uncommitted changes, staged diffs, commit ranges, or specific paths. Produces a tiered report (MUST-FIX / RECOMMENDED / NIT) with confidence scores, then optionally applies fixes with verification.
Four-phase workflow: Evaluate (analyze scope), Test (verify baseline), Act (make changes), Prove (verify nothing broke). Resumable CONTRACT tracks state across sessions. Batch mode processes queued files.
Systematic bug investigation: reproduce, narrow, diagnose, fix, verify. Produces a structured report with root cause analysis and regression test. Optional --regression flag triggers git bisect to find the breaking commit.
Scans coverage gaps, classifies production code into 11 categories (VALIDATOR, SERVICE, CONTROLLER, HOOK, PURE, COMPONENT, GUARD, API-CALL, ORCHESTRATOR, STATE-MACHINE, ORM/DB), selects test patterns per type, writes tests with Q1-Q17 quality gates enforced.
Discovers routes in your app, scores user flows by criticality (auth, CRUD, payment), generates .spec.ts files with page objects and quality gates. Code-first with optional live browser validation.
Detects systematic anti-patterns across your test suite, then fixes one pattern at a time with full production context. Avoids scattered one-off fixes by targeting patterns holistically.
Batch audit of test files against Q1-Q17 quality gates and AP1-AP26 anti-patterns. Detects orphan tests, phantom mocks, untested public methods. Tiered output (A/B/C/D) with critical gate enforcement.
Measures baseline timing, audits runner configuration against TP1-TP17 checklist, identifies the slowest tests, and produces an impact-ranked action plan. Verify mode compares against saved baseline.
Application security audit covering OWASP Top 10, injection, XSS, SSRF, auth/authz, multi-tenant isolation, secrets, headers, dependencies, business logic, and infrastructure. Uses Sentry 3-tier confidence model for finding severity. Dual scoring: static posture + runtime exploitability.
Hybrid penetration testing across 7 dimensions. Source-to-sink tracing for injection paths, exploit verification against running targets, CMS-specific checks, runtime configuration analysis. Uses Shannon methodology with Sentry confidence filtering.
Batch audit of production files against CQ1-CQ22 quality gates and CAP1-CAP14 anti-patterns. Tiered output (A/B/C/D grades), critical gate enforcement, evidence-backed scoring, cross-file pattern analysis, and a prioritized execution plan.
API endpoint integrity across 10 dimensions (D1-D10): input validation, payload design, pagination, error handling, caching, HTTP semantics, API waterfalls, rate limiting, auth patterns, and documentation. Optional GET probing on non-production targets.
Database performance and safety audit: query patterns, indexes, schema design, connection management, transactions, migrations, caching strategy, query optimization, ORM anti-patterns, observability, data lifecycle, and DB security. Code-level checks for all ORMs with optional live analysis.
Dependency health and internal coupling audit across 10 dimensions: supply chain vulnerabilities, version freshness, dead dependencies, license compliance, bundle weight, circular dependencies, coupling metrics, architecture boundary violations, barrel file health, and change coupling.
Full-stack performance health check: rendering, bundles, assets, API/network, algorithms, memory, database, caching, Web Vitals, backend runtime, concurrency, and framework-specific pathologies. Evidence-based Impact Models with confidence tiers and a prioritized optimization roadmap.
Codebase organization across 13 dimensions: directory consistency, naming conventions, folder depth, colocation, barrel exports, separation of concerns, file size distribution, dead code, complexity distribution, duplication, root organization, documentation, and git churn hotspots.
CI/CD pipeline optimization across 10 dimensions: caching strategy, parallelism, conditional execution, artifact management, secret handling, action pinning, timeouts, Docker layer optimization, test integration, and pipeline speed benchmarks. Primary support: GitHub Actions.
Environment config across 8 dimensions: variable completeness, unused vars, startup validation, secret exposure, environment parity (dev/staging/prod), type safety, default values, and documentation. Supports .env, process.env, import.meta.env, and framework-specific patterns.
SEO/GEO site audit covering 200+ checks: meta tags, structured data, AI crawler readiness, content quality, GEO (Generative Engine Optimization), performance, mobile, images, canonical URLs, sitemaps, i18n. Framework-aware: Astro, Next.js, Hugo, WordPress, React.
Design with conscious, traceable decisions. Persists design system in .interface-design/ (system.md + system.json) for cross-session consistency. Domain exploration, component construction with mandatory checkpoints, and craft validation tests.
UI/UX consistency audit with DX1-DX20 checklist covering states, consistency, accessibility, responsive behavior, and interaction patterns. Optional visual audit via chrome-devtools screenshots and automated WCAG accessibility via axe-core. DAP1-DAP12 anti-pattern detection.
Multi-agent UI review with 4 specialist perspectives: UX Researcher (flows, friction), Visual Designer (hierarchy, spacing, typography), i18n/Multilingual QA (text overflow, RTL, locale), and Accessibility/Performance Auditor (WCAG, contrast, loading). Lead Designer synthesizes into prioritized fixes with exact code.
Three modes: review existing codebase architecture (A1-A9 dimensions), create Architecture Decision Records, or design new systems from requirements. Uses CodeSift for module discovery, dependency mapping, structural metrics, and temporal coupling detection.
Write and update technical documentation from actual codebase analysis. Generates README, API reference, runbook, onboarding guide, or changelog. Update mode patches stale sections without rewriting from scratch.
Generate PowerPoint (PPTX) presentations using python-pptx. Professional slides with consistent theming, speaker notes, and visual variety. Can generate from a topic, from a markdown file, or as outline-only.
Manage the project's tech debt backlog. Used by audit and review skills to persist findings via fingerprint-based deduplication. Browse, fix, dismiss, prioritize, and get batch suggestions for accumulated issues.
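The fingerprint-based deduplication mentioned above can be sketched as follows. This is a minimal illustration, not Zuvo's actual implementation: the fields that feed the fingerprint (file, rule ID, normalized message), the 12-character hash length, and the `Backlog` class are all assumptions made for the example.

```python
import hashlib


def fingerprint(file: str, rule: str, message: str) -> str:
    """Build a stable ID for a finding (hypothetical choice of fields)."""
    # Normalize case and whitespace so cosmetic differences do not create duplicates.
    normalized = " ".join(message.lower().split())
    raw = f"{file}|{rule}|{normalized}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


class Backlog:
    """In-memory stand-in for a persisted backlog file."""

    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def add(self, file: str, rule: str, message: str) -> bool:
        """Store a finding; return False if an equivalent one already exists."""
        key = fingerprint(file, rule, message)
        if key in self.items:
            self.items[key]["seen"] += 1
            return False
        self.items[key] = {"file": file, "rule": rule, "message": message, "seen": 1}
        return True


backlog = Backlog()
backlog.add("src/user.ts", "CQ8", "Example finding")
backlog.add("src/user.ts", "CQ8", "example   finding")  # same finding, different spacing
print(len(backlog.items))  # 1
```

Keying on a hash of normalized fields means a repeated audit reports the same issue once, while the `seen` counter still records how often it recurred.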
39 gates stand between your AI and bad code
Every production file is scored against CQ1-CQ22. Every test file against Q1-Q17. Critical gate = 0 means the agent stops and fixes before proceeding. No exceptions.
Code Quality
CQ1-CQ22: Production code gates with evidence requirements
Test Quality
Q1-Q17: Test file gates with scoring threshold
// Evidence standard: file:function:line for each critical gate scored as 1
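As an illustration of the file:function:line evidence standard, the sketch below checks that an evidence string has that shape. The exact syntax Zuvo accepts is not specified here, so the regular expression, the `Evidence` type, and the sample path are assumptions.

```python
import re
from typing import NamedTuple, Optional


class Evidence(NamedTuple):
    file: str
    function: str
    line: int


# Assumed shape: <path>:<function name>:<positive line number>
_EVIDENCE_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<function>[A-Za-z_$][\w$.]*):(?P<line>[1-9]\d*)$"
)


def parse_evidence(text: str) -> Optional[Evidence]:
    """Return parsed evidence, or None if the string is not file:function:line."""
    match = _EVIDENCE_RE.match(text.strip())
    if match is None:
        return None
    return Evidence(match["file"], match["function"], int(match["line"]))


print(parse_evidence("src/services/order.ts:createOrder:42"))
print(parse_evidence("looks fine to me"))  # None
```

A reviewer that rejects any passing score lacking a parseable evidence string cannot wave a gate through on assertion alone. Note that this simple pattern does not handle paths containing colons.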
Depth, not breadth
Other tools give you a workflow. Zuvo gives you a workflow with 39 quality gates, evidence requirements, and zero trust in agent claims.
| Feature | Zuvo | Superpowers | gstack | Compound |
|---|---|---|---|---|
| Total skills | 33 | 14 | 28 | 6 |
| Auto-routing | ✓ | ✓ | — | — |
| Multi-agent pipeline | 10 agents | basic | — | — |
| Code quality gates | CQ1-CQ22 | — | — | — |
| Test quality gates | Q1-Q17 | — | — | — |
| Evidence requirements | ✓ | — | — | — |
| Security audit (OWASP) | S1-S14 | — | 1 command | — |
| Performance audit | 12 dimensions | — | — | — |
| DB audit | 60+ checks | — | — | — |
| Stack-specific rules | 5 stacks | — | — | — |
| TDD enforcement | ✓ | ✓ | — | — |
| Verification protocol | ✓ | ✓ | — | — |
| CodeSift integration | ✓ | — | — | — |
| Backlog persistence | ✓ | — | — | ✓ |
What developers say
“Before Zuvo, my AI would hallucinate tests passing. Now it can't claim success without running the actual command. The verification protocol alone is worth it.”
“33 skills sounds like marketing fluff until you use the security audit. S1-S14 with Sentry confidence scoring caught three auth bypass patterns our manual review missed.”
“The pipeline changed how I work. Brainstorm explores the codebase before I even describe the problem. Half the time it finds relevant existing code I didn't know about.”
Frequently asked questions
Everything you need to know about Zuvo — modes, tokens, stacks, and how things work under the hood.
What is auto-routing?
You describe what you want in plain language — "review my changes", "add a notification feature", "audit security" — and Zuvo's meta-skill router automatically picks the right skill. No slash commands to memorize. The router is injected at session start via a SessionStart hook and matches your intent to one of 33 skills.
What do the skill modes mean — deep, quick, auto, full?
full: the default mode, comprehensive analysis across all dimensions.
--quick: faster scan, fewer dimensions, good for iterative checks.
--deep: maximum thoroughness, more files analyzed, cross-file pattern detection.
--auto: skip plan approval and human interaction, run end-to-end.
--dry-run: preview what would happen without writing files.
Each skill documents the modes it supports.
What is CodeSift and do I need it?
CodeSift is an MCP server for semantic code search, call chain tracing, complexity analysis, and module detection. Zuvo uses it for deep code exploration — tracing how a function flows through your codebase, finding duplicates, detecting architectural boundaries. It's optional. Zuvo works without it in degraded mode (falls back to grep/read), but CodeSift reduces token usage by 15-30% and enables features like trace_route, detect_communities, and find_clones.
How many tokens does a full pipeline run cost?
For a medium-complexity feature (5-10 files): Brainstorm ~30-50K, Plan ~40-60K, Execute ~15-25K per task (8 tasks typical). Total: 200-300K tokens. Smaller features (3-4 tasks) run 100-150K. CodeSift reduces usage by 15-30% vs degraded mode. Individual skills like zuvo:review or zuvo:code-audit are much cheaper — typically 10-30K.
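The arithmetic behind these estimates can be reproduced with a short calculation. The per-stage ranges are the figures quoted above; the function name and the assumption that stages simply add up are illustrative.

```python
# Per-stage token ranges in thousands, taken from the figures quoted above.
BRAINSTORM = (30, 50)
PLAN = (40, 60)
EXECUTE_PER_TASK = (15, 25)


def estimate_pipeline_tokens(tasks: int) -> tuple[int, int]:
    """Return (low, high) token estimates in thousands for a full pipeline run."""
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    low = BRAINSTORM[0] + PLAN[0] + EXECUTE_PER_TASK[0] * tasks
    high = BRAINSTORM[1] + PLAN[1] + EXECUTE_PER_TASK[1] * tasks
    return low, high


low, high = estimate_pipeline_tokens(8)
print(f"{low}-{high}K tokens")  # 190-310K tokens
```

For 8 tasks the sum is roughly 190-310K, in line with the quoted 200-300K. The lower figure quoted for smaller features presumably also reflects cheaper Brainstorm and Plan stages, which this simple sum does not model.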
What environments does Zuvo support?
Claude Code — full support: parallel agents via Task tool, model routing, user interaction via AskUserQuestion. Codex — parallel execution with TOML agents, capped at 6 threads, no user interaction. Cursor — sequential execution only, no agent spawning. All environments produce identical output — execution strategy adapts but quality gates remain the same.
What stacks and frameworks are supported?
Stack detection is automatic from your config files. Currently supported: TypeScript (tsconfig.json), React / Next.js (package.json deps), NestJS (@nestjs/core), Python (pyproject.toml), PHP / Yii2 (composer.json). Test runners detected: Vitest, Jest, PHPUnit, Codeception. ORMs: Prisma. Stack-specific rules load automatically — you don't configure anything.
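A rough sketch of config-file-based detection, using the marker files named above. How Zuvo actually detects stacks is not documented here; the `detect_stacks` function, the returned labels, and the dependency names checked in package.json (`react`, `next`) are assumptions.

```python
import json
from pathlib import Path


def detect_stacks(root: str) -> list[str]:
    """Guess stacks from config files in a project root (illustrative only)."""
    base = Path(root)
    stacks: list[str] = []

    if (base / "tsconfig.json").is_file():
        stacks.append("typescript")

    package_json = base / "package.json"
    if package_json.is_file():
        try:
            manifest = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            manifest = {}
        if not isinstance(manifest, dict):
            manifest = {}
        # Merge runtime and dev dependencies before looking for framework packages.
        deps = {
            **(manifest.get("dependencies") or {}),
            **(manifest.get("devDependencies") or {}),
        }
        if "react" in deps or "next" in deps:
            stacks.append("react")
        if "@nestjs/core" in deps:
            stacks.append("nestjs")

    if (base / "pyproject.toml").is_file():
        stacks.append("python")
    if (base / "composer.json").is_file():
        stacks.append("php")

    return stacks


print(detect_stacks("."))
```

Because a project can match several markers at once, the function returns a list, so a Next.js front end and a Python service in the same repository would both be reported.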
Does Zuvo modify my project files?
During installation: nothing is modified. At runtime: pipeline specs go to docs/specs/, tech debt backlog to memory/backlog.md, design systems to .interface-design/. All paths are deterministic and gitignore-friendly. You can delete these directories anytime without affecting Zuvo.
What's the difference between zuvo:build and the full pipeline?
zuvo:build is for scoped work — 1-5 files, clear scope, no design decisions needed. It runs blast radius + duplication scan, then TDD with quality gates. The full pipeline (brainstorm → plan → execute) is for features touching 5+ files or requiring design exploration. It adds 10 agents, spec/plan documents, and multi-stage review. If you're unsure, the router asks which approach fits.
What happens when a quality gate fails?
CQ1-CQ22 has 6 critical gates (CQ3, CQ4, CQ5, CQ6, CQ8, CQ14) that block immediately if scored 0 — the agent must fix before proceeding. Q1-Q17 has 5 critical gates (Q7, Q11, Q13, Q15, Q17). Non-critical failures are scored and tracked. If a fix takes under 5 minutes, the agent fixes it now. Otherwise, it's persisted to the backlog with a confidence score.
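The blocking rule can be expressed in a few lines. The critical gate IDs are those listed above; the score dictionary format, the function name, and treating an unscored gate as non-blocking are assumptions made for this sketch.

```python
# Critical gate IDs as listed above.
CRITICAL_CQ = {"CQ3", "CQ4", "CQ5", "CQ6", "CQ8", "CQ14"}
CRITICAL_Q = {"Q7", "Q11", "Q13", "Q15", "Q17"}


def blocking_gates(scores: dict[str, int]) -> list[str]:
    """Return critical gates scored 0; a non-empty result means stop and fix."""
    critical = CRITICAL_CQ | CRITICAL_Q
    failed = [gate for gate, score in scores.items() if gate in critical and score == 0]
    # Sort by prefix, then numerically, so CQ3 comes before CQ14.
    return sorted(failed, key=lambda g: (g.rstrip("0123456789"), int(g.lstrip("CQ"))))


scores = {"CQ1": 0, "CQ3": 1, "CQ14": 0, "Q7": 0, "Q2": 0}
print(blocking_gates(scores))  # ['CQ14', 'Q7']
```

In this example CQ1 and Q2 also score 0, but they are not critical, so they would be tracked rather than block the task.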
Can I use specific skills without auto-routing?
Yes. Invoke directly: zuvo:review, zuvo:code-audit src/services/, zuvo:security-audit --live-url http://localhost:3000. Slash commands also work: /review, /build, /refactor. For tasks that don't need a skill, state your intent clearly ("just change the port to 3001") and the router won't activate.
Is Zuvo free?
Zuvo is open source under the MIT license. All 33 skills, all quality gates, all agent definitions are included. Install from the marketplace and use everything. You pay for the Claude API tokens consumed — Zuvo itself has no license fee.
How do I update Zuvo?
Enable auto-updates: /plugin → Select zuvo-marketplace → Enable auto-update. Or update manually: claude plugin marketplace update greglas75/zuvo-marketplace followed by claude plugin update zuvo.
Stop trusting.
Start verifying.
One command. 33 skills. Every line of code understood before it's written, verified after it's written, tracked if it has issues.