An engineer describes what should happen; the stack writes the code, validates it in a sandbox, executes under two safety gates, and signs the audit log. Below is how it's built — and what one run looks like end to end on a CNC mill.
The engineer types what they want. No G-code, no Python, no SDK calls — just a sentence.
$ agent.prompt: "Run a 4-corner probing routine on the reference fixture, compute XY offset and tool-length offset, and write the corrected origin to G54. Halt and alert me if any probe deflection exceeds 0.2 mm."
The agent queries the vector store for documents that match the intent: vendor manuals, internal SOPs, fixture drawings, and prior calibration logs. Grounding is explicit: every retrieved source is listed, so each generated step can be traced back to a document.
$ rag.retrieve("probing + tool-length offset + 3-axis VMC")
loaded 8 contexts in 2.4 s · embed dim 1024 · top-k 8
  ✓ G-Code Programming Manual: Probing Cycles [vendor]
  ✓ Tool-Length Offset Procedure [SOP]
  ✓ Reference Fixture Drawing #RF-204 [CAD]
  ✓ Machine Safety Boundaries: Spindle Region [safety]
  ✓ Calibration Run History (last 6 mo) [runs]
  + 3 more
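The retrieval step can be pictured as a nearest-neighbour search over embedded document chunks. The sketch below is a minimal illustration, assuming cosine similarity and an in-memory list standing in for the vector store. The `Chunk` type, the `top_k` function, and the toy 3-dimensional vectors are hypothetical and are not the stack's actual API.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """One indexed document fragment (hypothetical structure)."""
    title: str
    source: str              # e.g. "vendor", "SOP", "CAD"
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[Chunk], k: int = 8) -> list[Chunk]:
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c.embedding), reverse=True)
    return ranked[:k]


# Toy vectors for illustration only; real embeddings would have 1024 dimensions.
store = [
    Chunk("Probing Cycles", "vendor", [0.9, 0.1, 0.0]),
    Chunk("Tool-Length Offset Procedure", "SOP", [0.7, 0.6, 0.1]),
    Chunk("Cafeteria Menu", "misc", [0.0, 0.1, 0.9]),
]
hits = top_k([1.0, 0.2, 0.0], store, k=2)
print([f"{c.title} [{c.source}]" for c in hits])
```

With these toy vectors the two calibration documents outrank the unrelated one. A production vector store would typically use an approximate index rather than a full sort.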
Output is real code, not pseudocode — typed against the retrieved manual, parameterised from the prompt, runnable by the executor in the next stage.
    # auto-generated · agent run f4a1 · 14:23:01
    from cnc import Controller
    from calib import compute_offset, write_g54

    # KEY is the controller credential; it is assumed to be supplied by the executor environment
    mill = Controller(host="10.0.4.12", auth=KEY)

    # safe approach
    mill.send("G00 G54 X0 Y0 Z25")
    mill.send("G31 Z-20 F500")  # probe down (tool-length ref)
    z_ref = mill.read_probe()
    mill.send("G00 Z25")

    # 4-corner XY probing
    probes = []
    for x, y in [(-50, -50), (50, -50), (50, 50), (-50, 50)]:
        mill.send(f"G00 X{x*0.9} Y{y*0.9}")   # rapid to 90% of the corner position
        mill.send(f"G31 X{x} Y{y} F200")      # probe toward the corner
        probes.append(mill.read_probe())

    mill.send("G00 Z25")
    dx, dy = compute_offset(probes)
    write_g54(mill, dx, dy)
    print(f"applied G54: dX={dx:+.3f} dY={dy:+.3f}")
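The prompt also asks the run to halt if any probe deflection exceeds 0.2 mm, a check the excerpt above does not show. One way such a guard could look is sketched below. It assumes that "deflection" means the absolute difference between the nominal and the measured trigger position along the probing axis; `check_deflection` and `ProbeDeflectionError` are hypothetical names, not part of the `cnc` or `calib` modules.

```python
DEFLECTION_LIMIT_MM = 0.2  # threshold taken from the engineer's prompt


class ProbeDeflectionError(RuntimeError):
    """Raised when a probe reading deviates too far from its nominal position."""


def check_deflection(nominal_mm: float, measured_mm: float,
                     limit_mm: float = DEFLECTION_LIMIT_MM) -> float:
    """Return the deflection in mm, or raise if it exceeds the limit.

    Assumption: deflection = |measured - nominal| along the probing axis.
    """
    deflection = abs(measured_mm - nominal_mm)
    if deflection > limit_mm:
        raise ProbeDeflectionError(
            f"deflection {deflection:.3f} mm exceeds limit {limit_mm:.3f} mm"
        )
    return deflection


# A reading 0.08 mm off nominal passes; one 0.35 mm off would halt the run.
ok = check_deflection(-50.0, -49.92)
print(f"deflection {ok:.2f} mm: within limit")
```

Raising an exception rather than returning a flag means the script stops before the next motion command unless the caller handles the error deliberately.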
Before a single command leaves the workstation, the script runs against a virtual mill. Each command is checked against the equipment's allowlist and the operating envelope.
$ sandbox.run(script)
→ simulator boot ............................ 0.4 s
→ allowlist check:
  ✓ G00 (rapid): allowed
  ✓ G31 (probe): allowed
  ✓ G54 origin-write: allowed
  ✓ all Z values > -25 mm: within safety envelope
  ✓ spindle-on commands: none requested (correct for probing)
→ virtual run:
  ✓ 4 / 4 probe points succeeded
  ✓ no collision predicted
  ✓ computed offset: dX=+0.140 mm dY=-0.080 mm
  ✓ estimated runtime: 2 m 47 s
PASS · ready for hardware
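The static part of that check (allowlist and Z envelope) can be sketched in a few lines. This is an illustrative sketch rather than the sandbox's actual implementation: `ALLOWED_CODES`, `Z_MIN_MM`, and `validate` are hypothetical names, with values chosen to mirror the log above.

```python
import re

ALLOWED_CODES = {"G00", "G31", "G54"}  # assumption: mirrors the allowlist in the log
Z_MIN_MM = -25.0                       # assumption: every Z must stay above this floor

# Matches G-code words such as G00, X-45.0, Z25, F500.
_WORD = re.compile(r"([A-Z])(-?\d+(?:\.\d+)?)")


def validate(command: str) -> list[str]:
    """Return the violations found in one G-code line (empty list means pass)."""
    problems = []
    for letter, value in _WORD.findall(command.upper()):
        if letter == "G":
            # Normalise G0 -> G00; leave decimal codes such as G54.1 untouched.
            code = f"G{value}" if "." in value else f"G{int(value):02d}"
            if code not in ALLOWED_CODES:
                problems.append(f"{code} not on allowlist")
        elif letter == "M":
            # No M-codes (e.g. spindle-on) are allowed in a probing run.
            problems.append(f"M{value} not on allowlist")
        elif letter == "Z" and float(value) <= Z_MIN_MM:
            problems.append(f"Z{value} outside envelope")
    return problems


script = ["G00 G54 X0 Y0 Z25", "G31 Z-20 F500", "G00 Z25"]
report = {cmd: validate(cmd) for cmd in script}
passed = all(not issues for issues in report.values())
print("PASS" if passed else report)
```

A check like this covers only what can be read off the text of each command. Collision prediction and runtime estimation need the simulator itself.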
The operator confirms in person. Once approved, the script flows down through the runtime stack, and each layer signs off before any axis moves.
The agent renders a self-contained report — chart, table, log location — and writes it to the validation team's shared store. No screenshotting, no spreadsheet wrangling.
$ dashboard.generate(run="2026-05-23_calibration")

CALIBRATION REPORT · 2026-05-23 14:27
─────────────────────────────────────
Fixture          RF-204 (reference)
Probe points     4 / 4 successful
Max deflection   0.08 mm (threshold 0.2 mm · headroom 60%)
Computed offset  dX = +0.140 mm   dY = -0.080 mm
Z reference      +0.000 mm (unchanged)
Applied to       G54 (active WCS)

⬢ visualisation  [4-corner offset plot, deflection bars]
⬢ audit log      audit-log/2026-05-23_1424.signed
⬢ stored at      results/2026-05-23_calibration.json

DONE · ready for next shift
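The report points to a signed audit log. The signing scheme is not described here, so the following is only a sketch of one common approach, assuming an HMAC-SHA256 over a canonical JSON encoding of each entry. The functions `sign_entry` and `verify_entry`, the entry fields, and the demo key are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_entry(entry: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_entry(entry: dict, signature: str, key: bytes) -> bool:
    """Check a signature in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign_entry(entry, key), signature)


key = b"demo-key-not-for-production"  # assumption: real keys come from a secret store
entry = {"run": "2026-05-23_calibration", "dx_mm": 0.140, "dy_mm": -0.080, "wcs": "G54"}
signature = sign_entry(entry, key)

# Any change to the recorded offsets invalidates the signature.
tampered = {**entry, "dx_mm": 0.500}
print(verify_entry(entry, signature, key), verify_entry(tampered, signature, key))
```

Sorting keys before serialising keeps the signature stable regardless of the order in which fields were written.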
| Sym | Parameter | Typ | Unit |
|---|---|---|---|
| t_man | time, manual baseline | ~3 | weeks |
| t_agt | time, agent end-to-end | ~12 (▼ ~99%) | min |
| N_prm | prompts to working run | 3 | n/a |
| L_gen | code generated (Python + G-code) | ~120 | LOC |
| C_inf | inference cost per run · cloud / local | 0.85 / 0 | USD |
| N_gate | safety gates between code and hardware | 2 | per cmd |
| δ_max | max probe deflection observed | 0.08 | mm |
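Two of the figures above are derived quantities and can be checked with simple arithmetic. The sketch below assumes that "~3 weeks" means 15 working days of 8 hours, which is an assumption and not something the table states. Under calendar time the reduction would be larger still.

```python
MIN_PER_WORKDAY = 8 * 60

# Assumption: ~3 weeks of manual effort = 3 weeks x 5 working days x 8 h.
manual_min = 3 * 5 * MIN_PER_WORKDAY
agent_min = 12

reduction = 1 - agent_min / manual_min   # fraction of time saved
headroom = 1 - 0.08 / 0.2                # margin between observed deflection and threshold

print(f"time reduction: {reduction:.1%}")
print(f"deflection headroom: {headroom:.0%}")
```

Both results are consistent with the "~99%" and "headroom 60%" figures quoted in the run.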
Python 3.12 → OPC UA · Modbus → vLLM · Llama 3 70B → Milvus → FastAPI
When generated code fails — a parameter out of range, a protocol mismatch, a simulator exception — the failure is fed back into the next iteration. The agent re-plans, re-emits, and converges on a working result.
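The loop described above can be sketched as a bounded retry in which each failure message becomes input to the next generation attempt. This is a minimal illustration: `converge`, the callable signatures, and the stub generator and validator are hypothetical, and the attempt limit of 3 is arbitrary.

```python
from typing import Callable, Optional


def converge(
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Regenerate until validate() reports no error, or give up after max_attempts."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        script = generate(prompt, feedback)   # previous failure informs this attempt
        feedback = validate(script)           # None means the script passed
        if feedback is None:
            return script, attempt
    raise RuntimeError(f"no valid script after {max_attempts} attempts: {feedback}")


# Stubs standing in for the model and the sandbox.
def fake_generate(prompt: str, feedback: Optional[str]) -> str:
    return "G31 Z-20 F500" if feedback else "G31 Z-40 F500"


def fake_validate(script: str) -> Optional[str]:
    return "Z-40 outside envelope" if "Z-40" in script else None


script, attempts = converge(fake_generate, fake_validate, "probe tool length")
print(script, attempts)
```

Bounding the number of attempts keeps a persistently failing prompt from looping forever; the final error carries the last failure message for a human to inspect.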
Every answer is retrieved from your datasheets, vendor manuals, runbooks, and tribal-knowledge captures. The agent cites your documents instead of a generic best practice from the open web.
Before a single command reaches real equipment, the script is executed against a virtual instance or simulator. Behaviour is observed, errors are caught, and only validated runs are released to the floor.
Bringing the agent online for a new instrument is a documentation upload — datasheet, programmer's manual, SOP. The RAG layer indexes it; the protocol adapter is wired in; no model retraining.
The agent is not allowed to act freely. Three layers of guardrails stand between generated code and your equipment: a sandbox before release, then two runtime gates, the allowlist and a human in the loop.
Every generated script runs against a virtual instance or simulator before any command leaves the workstation. Allowlist coverage, motion envelope, and estimated runtime are all verified ahead of real hardware.
At runtime, the protocol adapter re-checks every command against the equipment's allowlist. Anything off-list is blocked at the wire — no exceptions, no overrides.
Even with sandbox and allowlist passing, a real-equipment run requires a typed operator confirmation. The agent never moves hardware on its own authority.
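The two runtime gates can be pictured as a thin wrapper around whatever function actually sends commands to the machine. The sketch below is an assumption-laden illustration, not the protocol adapter itself: `GatedAdapter`, `BlockedCommand`, and the rule that the operator types the run ID are all hypothetical.

```python
from typing import Callable, Iterable


class BlockedCommand(RuntimeError):
    """Raised when a command is stopped by one of the runtime gates."""


class GatedAdapter:
    """Wraps a send function with an allowlist gate and an operator-approval gate."""

    def __init__(self, send: Callable[[str], None], allowlist: Iterable[str],
                 confirm: Callable[[str], str]) -> None:
        self._send = send
        self._allowlist = frozenset(allowlist)
        self._confirm = confirm      # e.g. the built-in input() on a real console
        self._approved = False

    def approve(self, run_id: str) -> bool:
        """Operator gate: approval requires typing the run ID exactly."""
        typed = self._confirm(f"Type {run_id} to approve this run: ")
        self._approved = typed.strip() == run_id
        return self._approved

    def send(self, command: str) -> None:
        """Allowlist gate: every G/M word must be on the list, checked per command."""
        if not self._approved:
            raise BlockedCommand("operator has not approved this run")
        codes = [w for w in command.upper().split() if w[0] in "GM"]
        off_list = [c for c in codes if c not in self._allowlist]
        if not codes or off_list:
            raise BlockedCommand(f"blocked: {command!r}")
        self._send(command)


ALLOWLIST = {"G00", "G31", "G54"}
sent: list[str] = []

# The lambda stands in for an operator typing the run ID at the console.
adapter = GatedAdapter(sent.append, ALLOWLIST, confirm=lambda _prompt: "f4a1")
adapter.approve("f4a1")
adapter.send("G00 G54 X0 Y0 Z25")
print(sent)
```

Because the approval flag lives on the adapter and defaults to `False`, a script that never obtains confirmation cannot reach the send function at all.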
A short conversation is the fastest way to see whether this approach fits your team. No commitment — just a scoped discussion of what your floor looks like.