Natural-Language Infrastructure, Ontology‑Driven
XDevOps is a cognitive agent that turns plain English into safe, auditable cloud & on‑prem operations. Every action is validated against a knowledge graph (SHACL), executed via mTLS backends, and traced end‑to‑end.
At a glance
Start with a goal. The Task engine (LLM) infers intent and emits a structured Task JSON; XDevOps validates it with SHACL and routes it to the right cognitive agents (Scenario Engine, Shell Coach, Provisioning, Compliance). Execution runs over mTLS and is fully auditable.
- 🧭 Intent Entry Point (Task): LLM infers mode/stream/owners → Task(JSON) → routed to agents
- 🧯 Real-time Shell Troubleshooting: diagnose failures, explain root cause, propose safe fixes
- 🚦 Scenario Engine: runbooks with pre-change policy gates & automatic re-planning
- 📈 NLP Observability: ask Prometheus/Loki in English; get charts, trends, anomaly alerts
- ⚙️ Event-Driven Provisioning: materialise infra when events fire—no polling
- 🧪 Autonomous Diagnostics (ADO): multi‑agent triage with CID, hypotheses → verification → fix
Overview
Natural language → plan → policy check → execution → events
🎯 The Objective of XDevOps
Natural language is the ultimate interface between humans and cognitive agents—rich with nuance, intent, and context.
- No more YAML wrestling.
- No rigid forms or brittle scripts.
- Say what you want; the agent does the rest—safely.
⚡ Why now?
Breakthroughs in LLMs and knowledge graphs make this practical today—what was once sci‑fi is now operational reality.
- Reasoning agents that understand policies & context.
- RDF/SHACL graphs to enforce standards before change.
- Event‑driven execution for real‑time reconciliation.
🚀 Our mission
Make natural language the fastest, safest way to run infrastructure—from design to troubleshooting.
- Human‑centric, policy‑first automation.
- Audit‑ready by design.
- Portable across PaaS • Hybrid • On‑Prem.
Metrics & KPIs — Product-Aligned SRE/DevOps Subset
We focus on the subset of SRE/DevOps metrics that proves value for platform and infra teams.
① Flow unlocks value ⚡
Shift from project outputs to product value streams. Optimize how fast value flows from intent to production.
② Make work visible 🔎
Bottlenecks hide in handoffs. Use end-to-end telemetry and the knowledge graph to surface constraints early.
③ Govern by outcomes 🎯
Budget and governance follow product lines. Enforce policy pre-change and track customer impact.
Cognitive Agentic KPIs 🧠
Quantify autonomy, safety, and learning velocity of XDevOps agents.
Key Capabilities
Each box is an agent skill with guard‑rails, explanations, and full lineage.
🧭 Intent Entry Point — Task Intelligence
Capture a goal in natural language—an LLM infers the intent and emits a structured Task (JSON) that safely routes to the right cognitive agents (Provisioning, Scenario Engine, Shell Coach, Compliance). Tasks = intent, Agents = action.
⚡ !task create micro "Enable canary for checkout"
→ infers Feature • micro → Scenario Engine
🧯 "Follow up on incident #1423"
→ infers Bug Fix • lite → checklist & owners → Shell Coach
📈 "Migrate our SLOs to 99.9%"
→ infers Tech Debt/Risk • full → dependencies & observability → Provisioning & Observability agents
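The handoff from inferred intent to a routed Task can be sketched as follows. This is a minimal illustration, not the product's schema: the field names (`goal`, `stream`, `mode`, `agents`) and the `ROUTES` table are assumptions derived only from the three examples above, and the LLM inference step itself is not shown.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical routing table: (stream, mode) -> agents, mirroring the examples above.
ROUTES = {
    ("Feature", "micro"): ["Scenario Engine"],
    ("Bug Fix", "lite"): ["Shell Coach"],
    ("Tech Debt/Risk", "full"): ["Provisioning", "Observability"],
}

@dataclass
class Task:
    goal: str
    stream: str
    mode: str
    agents: list

def build_task(goal: str, stream: str, mode: str) -> Task:
    """Build a Task from an intent that has already been inferred (inference not shown)."""
    key = (stream, mode)
    if key not in ROUTES:
        # Refuse to route anything the table does not explicitly cover.
        raise ValueError(f"no route for stream={stream!r}, mode={mode!r}")
    return Task(goal=goal, stream=stream, mode=mode, agents=list(ROUTES[key]))

task = build_task("Enable canary for checkout", "Feature", "micro")
task_json = json.dumps(asdict(task), indent=2)
print(task_json)
```

Keeping the Task as plain JSON means it can be validated and logged before any agent acts on it.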
⚙️ Event‑Driven Provisioning
Autonomously creates & reconciles infra the moment a cloud or on‑prem event fires—zero polling with full audit.
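To illustrate the push-based model, here is a small sketch of an in-process event bus with an idempotent handler and an audit trail. The event type `namespace.requested`, the `inventory` set, and the decorator API are all hypothetical; a real deployment would presumably sit behind a message broker rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical in-process event bus (stand-in for a real broker).
handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
audit_log: list[dict] = []
inventory: set[str] = set()   # stand-in for the actual infrastructure state

def on(event_type: str):
    """Register a handler for an event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def emit(event: dict) -> None:
    """Push an event to its handlers; nothing polls for changes."""
    for fn in handlers[event["type"]]:
        fn(event)

@on("namespace.requested")
def provision_namespace(event: dict) -> None:
    name = event["name"]
    created = name not in inventory      # reconcile: create only if missing
    inventory.add(name)
    # Every handled event leaves an audit record, whether or not it changed anything.
    audit_log.append({"event": event["type"], "resource": name, "created": created})

emit({"type": "namespace.requested", "name": "checkout"})
emit({"type": "namespace.requested", "name": "checkout"})  # duplicate event changes nothing
```

Because the handler reconciles against current state, replayed or duplicated events are safe to process.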
🚦 Scenario Engine
Design repeatable runbooks; the agent executes, adapts & explains each step until policy passes—then re‑plans on failure.
🖥️ Interactive Shell Coach
Run commands in your own terminal—the agent annotates, fixes & learns in real time. Safer changes, faster outcomes.
📈 Observability via NLP
Query Prometheus & Loki in English—get instant charts, trends & anomaly alerts without memorising DSLs.
🎓 Certification Learning Support
Accelerate certifications—the agent crawls fresh docs, builds adaptive study plans & quizzes you to mastery.
🧠 Multi‑RAG Personalisation
Capabilities, Knowledge & Story corpora tailor every answer to your standards, repos & runbooks.
🔁 Knowledge Transfer
All chats & shell sessions are vectorised, searchable & replayable—perfect for onboarding & audits.
🧩 Git & IaC Intelligence
PRs, commits, Terraform & Helm are indexed as vector embeddings; ask for diffs, impact & drift instantly.
🛡️ Policy‑First Automation
A SHACL‑validated knowledge graph enforces tags, budgets & security before every change.
Autonomous Diagnostics Orchestrator (ADO)
Multi‑agent diagnostics for SRE/Platform teams. Hypotheses are generated, verified with data, and summarized with evidence.
How it works (at a glance)
- You launch: !diag checkout 5xx spike endpoint_id=42 window=45m
- Orchestrator normalizes context and issues tasks with a fresh CID.
- KB Agent enriches app context (owners, similar incidents, suspected patterns).
- Hypothesis Agent drafts 2–5 likely causes + test plan.
- Verification Agent runs PromQL/LogQL/K8s/Git checks and returns verdicts.
- Fix‑Proposal Agent synthesizes safe remediation steps with blast‑radius notes.
- Orchestrator streams progress and posts a final Markdown summary with confidence.
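The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: CID is taken to mean a correlation ID, the hypotheses are supplied by the caller instead of being drafted by an agent, and each check is a placeholder callable standing in for a PromQL/LogQL/K8s/Git query.

```python
import uuid

def run_diagnosis(symptom: str, checks: dict) -> dict:
    """Run a simplified hypotheses -> verification -> fix loop under one correlation ID."""
    cid = uuid.uuid4().hex[:12]          # assumption: CID is a fresh correlation ID
    # Hypothesis stage: candidate causes (supplied here; an agent would draft these).
    hypotheses = list(checks)
    # Verification stage: each callable stands in for a real data query.
    verdicts = {h: checks[h]() for h in hypotheses}
    confirmed = [h for h, ok in verdicts.items() if ok]
    # Fix-proposal stage: only confirmed causes get a remediation note.
    fixes = [f"Investigate and remediate: {h}" for h in confirmed]
    return {"cid": cid, "symptom": symptom, "verdicts": verdicts, "fixes": fixes}

report = run_diagnosis(
    "checkout 5xx spike",
    {
        "recent deploy regression": lambda: True,    # placeholder verdicts
        "database saturation": lambda: False,
    },
)
print(report["cid"], report["fixes"])
```

Carrying one CID through every stage is what lets the final summary link each verdict back to its evidence.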
<MASKED>
).🧪 Diagnostics — Simulation
Pick an example and watch the orchestrator run a simulated investigation with a generated CID.
Provisioning & Requests — Ontology‑Driven
Cognitive planning • SHACL validation • mTLS execution • event emission
How a request flows
- Intent capture: You describe the outcome in natural language.
- Plan synthesis: Agent generates an ordered CLI plan with dependency checks.
- Ontology validation: Plan is validated in RDF via SHACL (tags, budgets, security).
- mTLS execution: Commands run via agent backends with no shell substitution.
- Event emission: Created/Deleted events flow to the graph for lineage & dashboards.
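The flow above can be sketched as a validate-then-execute pipeline. Everything here is illustrative: plain Python checks stand in for SHACL validation, the required tags and budget limit are invented, and `execute` is any callable instead of an mTLS backend.

```python
from dataclasses import dataclass, field

REQUIRED_TAGS = {"owner", "cost-center"}   # hypothetical tag policy
BUDGET_LIMIT = 500.0                       # hypothetical budget ceiling

@dataclass
class Plan:
    steps: list
    tags: dict
    estimated_cost: float
    events: list = field(default_factory=list)

def validate(plan: Plan) -> list:
    """Stand-in for SHACL validation: return violations (empty list means pass)."""
    violations = [f"missing tag: {t}" for t in sorted(REQUIRED_TAGS - set(plan.tags))]
    if plan.estimated_cost > BUDGET_LIMIT:
        violations.append("budget exceeded")
    return violations

def run(plan: Plan, execute) -> list:
    """Validate first; execute and emit events only if the plan passes."""
    violations = validate(plan)
    if violations:
        return violations                  # nothing is executed on failure
    for step in plan.steps:
        execute(step)                      # real execution would go over mTLS
        plan.events.append({"type": "Created", "step": step})
    return []

executed = []
ok_plan = Plan(["create vnet", "create vm"], {"owner": "platform", "cost-center": "42"}, 120.0)
bad_plan = Plan(["create vm"], {"owner": "platform"}, 900.0)
ok_result = run(ok_plan, executed.append)
bad_result = run(bad_plan, executed.append)
```

The key property is ordering: a plan that fails validation never reaches the execution step, so no partial changes are made.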
Safety & governance
- Run‑command hygiene: one script line per --scripts, no chaining (;/&&/|).
- SSH key policy: provide --ssh-key-values or auto‑generate securely.
- Non‑mutating ops: tags are stripped automatically from read‑only commands so reads stay pure.
- Fixer loop: if a step fails, the agent proposes corrected steps instead of repeating the failed one.
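The run-command hygiene rule lends itself to a simple pre-flight check. The sketch below is an assumption about how such a check might look, not the product's implementation: it rejects multi-line input, the chaining operators listed above, and shell substitution. A plain regex like this is deliberately strict and would also reject these characters inside quoted strings.

```python
import re

# Constructs treated as unsafe: chaining (; && |) and substitution ($(...) or backticks).
FORBIDDEN = re.compile(r"(;|&&|\||\$\(|`)")

def check_script_line(line: str) -> None:
    """Raise ValueError if the line violates the hygiene rules; otherwise return None."""
    if "\n" in line:
        raise ValueError("one script line per --scripts argument")
    match = FORBIDDEN.search(line)
    if match:
        raise ValueError(f"forbidden shell construct {match.group()!r} in: {line!r}")

def is_clean(line: str) -> bool:
    """Boolean convenience wrapper around check_script_line."""
    try:
        check_script_line(line)
        return True
    except ValueError:
        return False

for candidate in ["systemctl restart nginx", "apt update && apt upgrade", "echo $(whoami)"]:
    print(candidate, "->", "ok" if is_clean(candidate) else "rejected")
```

Rejecting instead of sanitizing keeps the check easy to audit: a line either passes unchanged or is never run.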
Product-Aligned SRE/DevOps Metrics Subset
A subset of SRE/DevOps metrics tailored to infra & platform teams, organized by Flow Streams: Feature, Bug Fix, Risk, and Technical Debt.
Flow Streams & Value Mapping
Pick a stream to highlight its purpose, leading indicators, SRE/DORA metrics, cognitive KPIs, safeguards, and economics.
Stream | Purpose | Leading Indicators | SRE/DORA | Cognitive KPIs | Safeguards | Economics
---|---|---|---|---|---|---
🧩 Feature | Deliver new user value | PR cycle time ↓, feature throughput ↑, review latency ↓ | Deploy freq ↑, Lead time ↓, CFR stable, SLO impact ≤ 0 | ADI ↑, E2GT ↓, GVSC ≥95%, ZTR ↑ | Pre‑flight policy, canary, cost/tag gates | $/feature ↓, NPS ↑ |
🛠️ Bug Fix | Restore reliability fast | MTTA ↓, bug deflection ↑, duplicate pattern match | MTTR ↓, CFR ↓, incident count ↓ | Root‑cause precision ↑, ADI(runbooks) ↑, E2GT ↓ | Safe rollback, change windows, postmortem required | Incident minutes avoided ↑, cost‑of‑quality ↓ |
🛡️ Risk | Reduce exposure proactively | Open risks ↓, policy failures ↓, patch lead time ↓ | Error budget burn ↓, CFR —, compliance pass ↑ | GVSC ≥99%, POCR ↑, ZTR ↑ | SHACL policy gates, mandatory controls | Risk $ avoided ↑, audit findings ↓ |
🧱 Tech Debt | Pay down toil & complexity | Toil hours ↓, hotspot churn ↓, flaky tests ↓ | Lead time ↑ short‑term, then ↓; CFR stable | Self‑Improvement Rate ↑, ACS ↑ | Contract tests, perf gates, backward‑compat | Cost‑to‑serve ↓, infra efficiency ↑ |
1) Discover → Frame
- Map top user intents & compliance policies.
- Connect observability & IaC repos to vectors.
- Define ontology classes for your domain.
- Tag Streams: Feature • Bug Fix • Risk • Tech Debt.
2) Pilot → Govern
- Enable SHACL policies for tags, budget, security.
- Shadow run vs manual; diff plans & outcomes.
- Instrument correlation IDs & event lineage.
- Baseline SRE: Deploy Freq, Lead Time, MTTR, CFR, SLO burn.
3) Operate → Scale
- Promote runbooks to Scenario Engine.
- Shell coaching by default for risky ops.
- Onboard via replayable sessions & evidence.
- Add Cognitive KPIs: ADI, E2GT, GVSC, ZTR, POCR.
KPI Helper
Quick glossary for Cognitive Agentic KPIs referenced above.
ADI
Autonomy Depth Index — how many steps agents complete without intervention.
E2GT
Event‑to‑Goal Time — median time from event ingestion to verified outcome.
GVSC
Graph‑Validated Safety Coverage — % of actions passing SHACL gates.
ZTR
Zero‑Touch Rate — share of requests completed with no manual edits.
POCR
Preventive Opportunity Capture Rate — % of predicted issues acted on early.
ACS
Agent Correctness Score — judged accuracy of agent decisions.
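As a worked example, three of these KPIs can be computed directly from request records. The record fields and sample values below are invented for illustration; only the definitions (share of zero-edit requests, share of actions passing gates, median event-to-goal time) come from the glossary.

```python
from statistics import median

# Hypothetical request records; field names and values are illustrative only.
requests = [
    {"manual_edits": 0, "actions": 4, "actions_passed": 4, "event_to_goal_s": 30},
    {"manual_edits": 2, "actions": 5, "actions_passed": 4, "event_to_goal_s": 90},
    {"manual_edits": 0, "actions": 1, "actions_passed": 1, "event_to_goal_s": 45},
    {"manual_edits": 0, "actions": 10, "actions_passed": 10, "event_to_goal_s": 60},
]

def ztr(rows) -> float:
    """Zero-Touch Rate: share of requests completed with no manual edits."""
    return sum(1 for r in rows if r["manual_edits"] == 0) / len(rows)

def gvsc(rows) -> float:
    """Graph-Validated Safety Coverage: share of actions that passed the policy gates."""
    return sum(r["actions_passed"] for r in rows) / sum(r["actions"] for r in rows)

def e2gt(rows) -> float:
    """Event-to-Goal Time: median seconds from event ingestion to verified outcome."""
    return median(r["event_to_goal_s"] for r in rows)

print(f"ZTR={ztr(requests):.0%}  GVSC={gvsc(requests):.0%}  E2GT={e2gt(requests)}s")
```

Note that GVSC is weighted by action count, not averaged per request, so one large request can dominate the figure.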
Deployment Targets
Same cognitive core, different footprints — PaaS • Hybrid • On‑Prem.
☁️ PaaS
Managed multi‑tenant control‑plane. Fastest start, zero infra to manage.
- Org‑scoped tenant & keys.
- mTLS agent connectors.
- SLA & security hardening.
- Up to 2 agent connectors & 1 environment
- 300 actions/month, 7‑day event retention
- 3 seats (SSO ready), community support
- No credit card during beta
🏗️ Hybrid
Cloud control‑plane + on‑prem agents for restricted or air‑gapped workloads.
- Outbound‑only agents.
- Private RAG stores.
- Bring‑your‑KMS.
🔒 On‑Prem
Single‑tenant, fully isolated. All data & models within your perimeter.
- Self‑host GraphDB/Milvus.
- Offline updates.
- Custom compliance packs.
Talk to us
We’re opening soon. Free Tier available for PaaS. Get early access or schedule time with the team.