📣 XDevOps will open for public access soon! Join the launch list: feedback@xdevops.ai

Natural-Language Infrastructure, Ontology‑Driven

XDevOps is a cognitive agent that turns plain English into safe, auditable cloud & on‑prem operations. Every action is validated against a knowledge graph (SHACL), executed via mTLS backends, and traced end‑to‑end.

Do more with: diagnose incidents

🤖 Cognitive Agent 🧭 Ontology + SHACL 🔐 mTLS Connectors ⚡ Event-Driven 🔎 Explainable Plans
Built on RDF/SHACL
OpenTelemetry-native
Milvus‑powered RAG
Redis Streams
All mutating commands automatically carry a unique correlationId tag for lineage and audits.

At a glance

Start with a goal. The Task engine (an LLM) infers intent and emits a structured Task JSON; XDevOps validates it with SHACL and routes it to the right cognitive agents (Scenario Engine, Shell Coach, Provisioning, Compliance). Execution runs over mTLS and is fully auditable.

  • 🧭 Intent Entry Point (Task): LLM infers mode/stream/owners → Task(JSON) → routed to agents
  • 🧯 Real-time Shell Troubleshooting: diagnose failures, explain root cause, propose safe fixes
  • 🚦 Scenario Engine: runbooks with pre-change policy gates & automatic re-planning
  • 📈 NLP Observability: ask Prometheus/Loki in English; get charts, trends, anomaly alerts
  • ⚙️ Event-Driven Provisioning: materialise infra when events fire—no polling
  • 🧪 Autonomous Diagnostics (ADO): multi‑agent triage with CID, hypotheses → verification → fix

🧠 Explainable fixes
🔏 Policy-first execution
🕵️ Full audit trail

Overview

Natural language → plan → policy check → execution → events

🎯 The Objective of XDevOps

Natural language is the ultimate interface between humans and cognitive agents—rich with nuance, intent, and context.

  • No more YAML wrestling.
  • No rigid forms or brittle scripts.
  • Say what you want; the agent does the rest—safely.

⚡ Why now?

Breakthroughs in LLMs and knowledge graphs make this practical today—what was once sci‑fi is now operational reality.

  • Reasoning agents that understand policies & context.
  • RDF/SHACL graphs to enforce standards before change.
  • Event‑driven execution for real‑time reconciliation.

🚀 Our mission

Make natural language the fastest, safest way to run infrastructure—from design to troubleshooting.

  • Human‑centric, policy‑first automation.
  • Audit‑ready by design.
  • Portable across PaaS • Hybrid • On‑Prem.

Metrics & KPIs — Product-Aligned SRE/DevOps Subset

We focus on the subset of SRE/DevOps metrics that proves value for platform and infra teams.

① Flow unlocks value ⚡

Shift from project outputs to product value streams. Optimize how fast value flows from intent to production.

  • Flow Velocity ↑ (features/week), in %
  • Lead Time ↓ (idea → prod), in %
  • Flow Efficiency ↑ (active / wait), in %
Flow Metrics: Velocity • Time • Efficiency • Load • Distribution

② Make work visible 🔎

Bottlenecks hide in handoffs. Use end-to-end telemetry and the knowledge graph to surface constraints early.

  • MTTR (median), in minutes
  • Blocked Work Detected, as a multiple (×)
  • Scenario Pass Rate, in %
  • Observability Coverage, in %
Signals: WIP • Blockers • Queue time • Policy failures • Coverage

③ Govern by outcomes 🎯

Budget and governance follow product lines. Enforce policy pre-change and track customer impact.

  • Policy Compliance (pre-flight), in %
  • Change Failure Rate ↓, in %
  • Cost per Change ↓, in %
  • Error Budget Burn (rate), as a multiple (×)
Outcomes: Compliance • Reliability • Unit economics • SLOs

Cognitive Agentic KPIs 🧠

Quantify autonomy, safety, and learning velocity of XDevOps agents.

  • Autonomy Depth Index (ADI)
  • Event→Goal (p50), in minutes
  • Graph-Validated Safety Coverage (GVSC), in %
  • Zero-Touch Rate (ZTR), in %
  • Preventive Opportunity Capture (POCR), in %

Key Capabilities

Each box is an agent skill with guard‑rails, explanations, and full lineage.

🧭 Intent Entry Point — Task Intelligence

Capture a goal in natural language—an LLM infers the intent and emits a structured Task (JSON) that safely routes to the right cognitive agents (Provisioning, Scenario Engine, Shell Coach, Compliance). Tasks = intent, Agents = action.

⚡ !task create micro "Enable canary for checkout"
→ infers Feature • micro → Scenario Engine

🧯 "Follow up on incident #1423"
→ infers Bug Fix • lite → checklist & owners → Shell Coach

📈 "Migrate our SLOs to 99.9%"
→ infers Tech Debt/Risk • full → dependencies & observability → Provisioning & Observability agents
LLM intent inference • Agent routing • JSON Schema • SHACL pre-flight • Milvus memory • Redis events • FastAPI engine
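
The three examples above can be pictured as a small routing table. This is only a sketch under stated assumptions, not the XDevOps Task schema: the field names (`goal`, `stream`, `mode`, `owners`), the `ROUTES` table, and the `route_task` helper are hypothetical. Only the stream, mode, and agent pairings are taken from the examples.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical routing table, built only from the three examples above.
ROUTES = {
    ("Feature", "micro"): ["Scenario Engine"],
    ("Bug Fix", "lite"): ["Shell Coach"],
    ("Tech Debt/Risk", "full"): ["Provisioning", "Observability"],
}


@dataclass
class Task:
    """Assumed shape of a structured Task; the real schema is not published here."""
    goal: str
    stream: str
    mode: str
    owners: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize with sorted keys so the output is stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


def route_task(task: Task) -> list:
    """Return the agents that should receive this task."""
    key = (task.stream, task.mode)
    if key not in ROUTES:
        raise ValueError(f"no route for {key}")
    return ROUTES[key]


task = Task(goal="Enable canary for checkout", stream="Feature", mode="micro")
print(task.to_json())
print(route_task(task))
```

Keeping the Task as plain data and the routing as a lookup mirrors the "Tasks = intent, Agents = action" split: the LLM only has to produce the data, and the routing stays deterministic.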

⚙️ Event‑Driven Provisioning

Autonomously creates & reconciles infra the moment a cloud or on‑prem event fires—zero polling with full audit.

ResourceCreated/Deleted • Idempotent plans • Graph lineage
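
To illustrate what "idempotent" means for event-driven reconciliation, here is a minimal in-memory sketch. The event shape (`id`, `type`, `resource`) and the `Reconciler` class are assumptions made for the example; a real deployment would presumably consume events from Redis Streams and persist its state.

```python
class Reconciler:
    """Applies Created/Deleted events at most once and keeps an audit trail."""

    KNOWN_TYPES = {"ResourceCreated", "ResourceDeleted"}

    def __init__(self):
        self.seen = set()        # ids of events already applied
        self.inventory = set()   # resources currently believed to exist
        self.audit = []          # append-only record of applied events

    def handle(self, event: dict) -> bool:
        """Apply an event once; return False if it was a duplicate delivery."""
        if event["type"] not in self.KNOWN_TYPES:
            raise ValueError(f"unknown event type: {event['type']}")
        if event["id"] in self.seen:
            return False  # replayed event: no state change, no audit entry
        self.seen.add(event["id"])
        if event["type"] == "ResourceCreated":
            self.inventory.add(event["resource"])
        else:
            self.inventory.discard(event["resource"])
        self.audit.append((event["id"], event["type"], event["resource"]))
        return True


reconciler = Reconciler()
created = {"id": "evt-1", "type": "ResourceCreated", "resource": "vm-1"}
print(reconciler.handle(created))   # first delivery is applied
print(reconciler.handle(created))   # redelivery is ignored
```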

🚦 Scenario Engine

Design repeatable runbooks; the agent executes and explains each step, holds changes until policy passes, and re‑plans on failure.

Policy gates (SHACL) • Rollback paths • Explainable steps
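
The gate-then-re-plan behaviour can be sketched as a bounded loop. Everything named here (`run_scenario`, `check_policy`, `replan`, the `owner` tag rule) is hypothetical and stands in for the Scenario Engine's real policy gates, which the page describes as SHACL-based.

```python
def run_scenario(plan, check_policy, replan, max_attempts=3):
    """Gate a plan on policy before any change; re-plan when the gate fails."""
    log = []
    for attempt in range(1, max_attempts + 1):
        violations = check_policy(plan)
        if not violations:
            log.append(f"attempt {attempt}: policy passed, {len(plan)} step(s) cleared")
            return plan, log
        log.append(f"attempt {attempt}: blocked by {violations}")
        plan = replan(plan, violations)
    raise RuntimeError("policy gate still failing after re-planning")


# Hypothetical policy: every step must carry an "owner" tag.
def missing_owner(plan):
    return [step["cmd"] for step in plan if "owner" not in step.get("tags", {})]


# Hypothetical re-planner: add the missing tag to every step.
def add_owner(plan, violations):
    return [dict(step, tags={**step.get("tags", {}), "owner": "platform"}) for step in plan]


final_plan, log = run_scenario(
    [{"cmd": "scale checkout --replicas 3"}], missing_owner, add_owner
)
for line in log:
    print(line)
```

The attempt limit is a design choice in this sketch: it keeps a re-planner that cannot satisfy the policy from looping forever.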

🖥️ Interactive Shell Coach

Run commands in your own terminal—the agent annotates, fixes & learns in real time. Safer changes, faster outcomes.

Command fixer • Correlation tags • Run‑command hygiene

📈 Observability via NLP

Query Prometheus & Loki in English—get instant charts, trends & anomaly alerts without memorising DSLs.

Time‑series insights • Anomaly alerts • Root‑cause prompts

🎓 Certification Learning Support

Accelerate certifications—the agent crawls fresh docs, builds adaptive study plans & quizzes you to mastery.

Adaptive quizzes • Doc crawling • Weak‑spot drills

🧠 Multi‑RAG Personalisation

Capabilities, Knowledge & Story corpora tailor every answer to your standards, repos & runbooks.

Org‑specific answers • Vector search • Continuous learning

🔁 Knowledge Transfer

All chats & shell sessions are vectorised, searchable & replayable—perfect for onboarding & audits.

Session memory • Replay & share • Evidence packs

🧩 Git & IaC Intelligence

PRs, commits, Terraform & Helm live in vectors—ask for diffs, impact & drift instantly.

IaC parsing • Impact analysis • Drift checks

🛡️ Policy‑First Automation

A SHACL‑validated knowledge graph enforces tags, budgets & security before every change.

Pre‑flight checks • Standards & tags • Budget guard‑rails
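
As a rough picture of a pre-flight check, the sketch below tests a proposed resource against tag, budget, and security rules in plain Python. It is a stand-in for SHACL shapes evaluated over the knowledge graph, not an example of SHACL itself, and the rule values (`REQUIRED_TAGS`, `MONTHLY_BUDGET_USD`, the `public_ingress` flag) are invented for illustration.

```python
REQUIRED_TAGS = {"owner", "costCenter", "environment"}  # hypothetical tag standard
MONTHLY_BUDGET_USD = 500.0                              # hypothetical budget guard-rail


def preflight(resource: dict) -> list:
    """Return a list of violations; an empty list means the change may proceed."""
    violations = []
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))
    if resource.get("estimated_monthly_usd", 0.0) > MONTHLY_BUDGET_USD:
        violations.append("estimated cost exceeds budget")
    if resource.get("public_ingress", False):
        violations.append("public ingress is not allowed")
    return violations


proposed = {
    "name": "checkout-db",
    "tags": {"owner": "payments"},
    "estimated_monthly_usd": 650.0,
    "public_ingress": False,
}
for problem in preflight(proposed):
    print(problem)
```

Returning every violation at once, rather than stopping at the first, lets a fixer loop correct the whole plan in one pass.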

Autonomous Diagnostics Orchestrator (ADO)

Multi‑agent diagnostics for SRE/Platform teams. Hypotheses are generated, verified with data, and summarized with evidence.

How it works (at a glance)

  1. You launch: !diag checkout 5xx spike endpoint_id=42 window=45m
  2. Orchestrator normalizes context and issues tasks with a fresh CID.
  3. KB Agent enriches app context (owners, similar incidents, suspected patterns).
  4. Hypothesis Agent drafts 2–5 likely causes + test plan.
  5. Verification Agent runs PromQL/LogQL/K8s/Git checks and returns verdicts.
  6. Fix‑Proposal Agent synthesizes safe remediation steps with blast‑radius notes.
  7. Orchestrator streams progress and posts a final Markdown summary with confidence.
Traffic is coordinated via Redis Streams; agents resolve credentials locally (secrets masked as <MASKED>).
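
Steps 1 and 2, plus the masking note, can be sketched as follows. The parsing rules are assumptions made for the example: the first word after `!diag` is treated as the application, the remaining bare words as the symptom, and `key=value` tokens as parameters. The `SECRET_PATTERN` key names are likewise hypothetical.

```python
import re
import uuid

# Hypothetical list of key names whose values should never be logged.
SECRET_PATTERN = re.compile(r"(?i)\b(password|token|secret)=\S+")


def parse_diag(command: str) -> dict:
    """Normalize a !diag command into a context dict with a fresh CID."""
    tokens = command.split()
    if not tokens or tokens[0] != "!diag":
        raise ValueError("expected a command starting with !diag")
    words = [t for t in tokens[1:] if "=" not in t]
    params = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
    if not words:
        raise ValueError("missing application name")
    return {
        "cid": uuid.uuid4().hex,        # fresh correlation id per investigation
        "app": words[0],
        "symptom": " ".join(words[1:]),
        "params": params,
    }


def mask_secrets(text: str) -> str:
    """Replace secret values with <MASKED> before text leaves the agent."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=<MASKED>", text)


context = parse_diag("!diag checkout 5xx spike endpoint_id=42 window=45m")
print(context["app"], "|", context["symptom"], "|", context["params"])
print(mask_secrets("connect host=db1 token=abc123"))
```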

    Provisioning & Requests — Ontology‑Driven

    Cognitive planning • SHACL validation • mTLS execution • event emission

    How a request flows

    1. Intent capture: You describe the outcome in natural language.
    2. Plan synthesis: Agent generates an ordered CLI plan with dependency checks.
    3. Ontology validation: Plan is validated in RDF via SHACL (tags, budgets, security).
    4. mTLS execution: Commands run via agent backends with no shell substitution.
    5. Event emission: Created/Deleted events flow to the graph for lineage & dashboards.

    Safety & governance

    • Run‑command hygiene: one script line per --scripts, no chaining (;/&&/|).
    • SSH key policy: provide --ssh-key-values or auto‑generate securely.
    • Non‑mutating ops: strip tags automatically to keep reads pure.
    • Fixer loop: if a step fails, the agent proposes corrected steps—no repetition of failures.
    GraphDB (RDF/SHACL) • Milvus (RAG) • Redis Streams • mTLS Agent
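
The hygiene and tagging rules above lend themselves to a short sketch. The helper names and the command dict shape (`mutating`, `tags`) are hypothetical, and the chaining check is deliberately naive: it rejects `;`, `&&`, and `|` anywhere in the line, including inside quoted strings.

```python
import re
import uuid

CHAINING = re.compile(r";|&&|\|")


def check_script_line(line: str) -> None:
    """Raise ValueError unless the line is a single, unchained command."""
    if "\n" in line.strip():
        raise ValueError("only one script line is allowed per --scripts")
    if CHAINING.search(line):
        raise ValueError("command chaining (; && |) is not allowed")
    if "$(" in line or "`" in line:
        raise ValueError("shell substitution is not allowed")


def apply_tag_policy(command: dict) -> dict:
    """Tag mutating commands with a correlationId; strip tags from reads."""
    result = dict(command)
    if command["mutating"]:
        tags = dict(command.get("tags", {}))
        tags["correlationId"] = uuid.uuid4().hex
        result["tags"] = tags
    else:
        result.pop("tags", None)  # keep non-mutating operations pure
    return result


check_script_line("systemctl restart nginx")  # passes silently
write = apply_tag_policy({"cmd": "vm create", "mutating": True, "tags": {"owner": "sre"}})
read = apply_tag_policy({"cmd": "vm list", "mutating": False, "tags": {"owner": "sre"}})
print(sorted(write["tags"]))
print("tags" in read)
```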

    Product-Aligned SRE/DevOps Metrics Subset

    A subset of SRE/DevOps metrics tailored to infra & platform teams, organized by Flow Streams: Feature, Bug Fix, Risk, and Technical Debt.

    Flow Streams & Value Mapping

    Each stream is described by its purpose, leading indicators, SRE/DORA metrics, cognitive KPIs, safeguards, and economics.

    Stream-to-metrics value mapping

    🧩 Feature: Deliver new user value
    • Leading indicators: PR cycle time ↓, feature throughput ↑, review latency ↓
    • SRE/DORA: Deploy freq ↑, Lead time ↓, CFR stable, SLO impact ≤ 0
    • Cognitive KPIs: ADI ↑, E2GT ↓, GVSC ≥95%, ZTR ↑
    • Safeguards: Pre‑flight policy, canary, cost/tag gates
    • Economics: $/feature ↓, NPS ↑

    🛠️ Bug Fix: Restore reliability fast
    • Leading indicators: MTTA ↓, bug deflection ↑, duplicate pattern match
    • SRE/DORA: MTTR ↓, CFR ↓, incident count ↓
    • Cognitive KPIs: Root‑cause precision ↑, ADI(runbooks) ↑, E2GT ↓
    • Safeguards: Safe rollback, change windows, postmortem required
    • Economics: Incident minutes avoided ↑, cost‑of‑quality ↓

    🛡️ Risk: Reduce exposure proactively
    • Leading indicators: Open risks ↓, policy failures ↓, patch lead time ↓
    • SRE/DORA: Error budget burn ↓, CFR —, compliance pass ↑
    • Cognitive KPIs: GVSC ≥99%, POCR ↑, ZTR ↑
    • Safeguards: SHACL policy gates, mandatory controls
    • Economics: Risk $ avoided ↑, audit findings ↓

    🧱 Tech Debt: Pay down toil & complexity
    • Leading indicators: Toil hours ↓, hotspot churn ↓, flaky tests ↓
    • SRE/DORA: Lead time ↑ short‑term, then ↓; CFR stable
    • Cognitive KPIs: Self‑Improvement Rate ↑, ACS ↑
    • Safeguards: Contract tests, perf gates, backward‑compat
    • Economics: Cost‑to‑serve ↓, infra efficiency ↑

    1) Discover → Frame

    • Map top user intents & compliance policies.
    • Connect observability & IaC repos to vectors.
    • Define ontology classes for your domain.
    • Tag Streams: Feature • Bug Fix • Risk • Tech Debt.

    2) Pilot → Govern

    • Enable SHACL policies for tags, budget, security.
    • Shadow run vs manual; diff plans & outcomes.
    • Instrument correlation IDs & event lineage.
    • Baseline SRE: Deploy Freq, Lead Time, MTTR, CFR, SLO burn.

    3) Operate → Scale

    • Promote runbooks to Scenario Engine.
    • Shell coaching by default for risky ops.
    • Onboard via replayable sessions & evidence.
    • Add Cognitive KPIs: ADI, E2GT, GVSC, ZTR, POCR.

    KPI Helper

    Quick glossary for Cognitive Agentic KPIs referenced above.

    ADI

    Autonomy Depth Index — how many steps agents complete without intervention.

    E2GT

    Event‑to‑Goal Time — median time from event ingestion to verified outcome.

    GVSC

    Graph‑Validated Safety Coverage — % actions passing SHACL gates.

    ZTR

    Zero‑Touch Rate — share of requests completed with no manual edits.

    POCR

    Preventive Opportunity Capture Rate — % predicted issues acted on early.

    ACS

    Agent Correctness Score — judged accuracy of agent decisions.
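
Three of these KPIs follow directly from their glossary definitions and can be computed from per-request records. The record fields used below (`manual_edits`, `actions`, `actions_passed_shacl`, `event_to_goal_s`) are assumptions for the sketch; ADI, POCR, and ACS are left out because the glossary does not pin down how they are measured.

```python
from statistics import median


def cognitive_kpis(requests: list) -> dict:
    """Compute ZTR, GVSC, and E2GT from a list of per-request records."""
    total = len(requests)
    if total == 0:
        raise ValueError("need at least one request")
    actions = sum(r["actions"] for r in requests)
    passed = sum(r["actions_passed_shacl"] for r in requests)
    return {
        # Zero-Touch Rate: share of requests completed with no manual edits.
        "ZTR": sum(1 for r in requests if r["manual_edits"] == 0) / total,
        # Graph-Validated Safety Coverage: share of actions passing SHACL gates.
        "GVSC": passed / actions if actions else 0.0,
        # Event-to-Goal Time: median seconds from event to verified outcome.
        "E2GT_seconds": median(r["event_to_goal_s"] for r in requests),
    }


sample = [
    {"manual_edits": 0, "actions": 4, "actions_passed_shacl": 4, "event_to_goal_s": 90},
    {"manual_edits": 2, "actions": 6, "actions_passed_shacl": 5, "event_to_goal_s": 300},
    {"manual_edits": 0, "actions": 10, "actions_passed_shacl": 10, "event_to_goal_s": 120},
]
print(cognitive_kpis(sample))
```

For the sample above, two of three requests needed no manual edits, 19 of 20 actions passed their gates, and the median event-to-goal time is 120 seconds.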

    Deployment Targets

    Same cognitive core, different footprints — PaaS • Hybrid • On‑Prem.

    ☁️ PaaS

    Managed multi‑tenant control‑plane. Fastest start, zero infra to manage.

    Free Tier • Multi‑tenant • SLA
    • Org‑scoped tenant & keys.
    • mTLS agent connectors.
    • SLA & security hardening.
    Free Tier (PaaS):
    • Up to 2 agent connectors & 1 environment
    • 300 actions/month, 7‑day event retention
    • 3 seats (SSO ready), community support
    • No credit card during beta

    🏗️ Hybrid

    Cloud control‑plane + on‑prem agents for restricted or air‑gapped workloads.

    • Outbound‑only agents.
    • Private RAG stores.
    • Bring‑your‑KMS.

    🔒 On‑Prem

    Single‑tenant, fully isolated. All data & models within your perimeter.

    • Self‑host GraphDB/Milvus.
    • Offline updates.
    • Custom compliance packs.

    Talk to us

    We’re opening soon. Free Tier available for PaaS. Get early access or schedule time with the team.