§00 · A worked example

From plain English to validated equipment code.

An engineer describes what should happen; the stack writes the code, validates it in a sandbox, executes under two safety gates, and signs the audit log. Below is how it's built — and what one run looks like end to end on a CNC mill.

[Figure: before/after. Manual, ~3 weeks: handwritten G-code, vendor manuals and calipers on a drafting desk. With agent, ~12 min: a prompt, generated G-code and a checked run on the CNC mill.]

FIG. 01 · Scenario · CNC mill calibration
Equipment: 3-axis vertical milling center · touch probe · spindle off · reference fixture mounted
Task: 4-corner probing + offset correction; measure XY origin drift, write G54, halt if probe deflection > 0.2 mm
Operator: validation engineer; no G-code authoring experience required
When this matters: a firmware update silently breaks test scripts; the agent reads the release notes and rewrites them
How it works

The engineer types what they want. No G-code, no Python, no SDK calls — just a sentence.

$ agent.prompt:

"Run a 4-corner probing routine on the reference fixture,
 compute XY offset and tool-length offset, and write the
 corrected origin to G54. Halt and alert me if any probe
 deflection exceeds 0.2 mm."

The agent queries the vector store for documents that match the intent: vendor manuals, internal SOPs, fixture drawings, prior calibration logs. Grounding is explicit: the retrieved sources are listed, so each generated step can be checked against them.

$ rag.retrieve("probing + tool-length offset + 3-axis VMC")
  loaded 8 contexts in 2.4 s ·  embed dim 1024 · top-k 8

   G-Code Programming Manual — Probing Cycles      [vendor]
   Tool-Length Offset Procedure                    [SOP]
   Reference Fixture Drawing #RF-204               [CAD]
   Machine Safety Boundaries — Spindle Region      [safety]
   Calibration Run History (last 6 mo)             [runs]
  + 3 more
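The retrieval step amounts to nearest-neighbour search over embedded document chunks. A minimal sketch, assuming every chunk has already been embedded and tagged with its source type; the `Chunk` type, the `retrieve` function and the toy 3-dimensional vectors and document titles are hypothetical (the run above uses 1024-dimensional embeddings in a vector store, which would normally perform this search itself).

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    title: str            # document title shown in the retrieval log
    source: str           # source tag, e.g. "vendor", "SOP", "CAD"
    vector: list[float]   # precomputed embedding (toy values below)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[Chunk], k: int = 8) -> list[tuple[str, str, float]]:
    """Return the k most similar chunks as (title, source, score), best first.

    Keeping the source tag with every hit is what lets generated code
    cite the document each step came from.
    """
    scored = [(c.title, c.source, cosine(query_vec, c.vector)) for c in chunks]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


# Toy corpus: titles and vectors are illustrative only.
corpus = [
    Chunk("G-Code Programming Manual - Probing Cycles", "vendor", [0.9, 0.1, 0.0]),
    Chunk("Tool-Length Offset Procedure", "SOP", [0.7, 0.6, 0.1]),
    Chunk("Spindle Warm-up Checklist", "SOP", [0.0, 0.2, 0.9]),
]
for title, source, score in retrieve([1.0, 0.3, 0.0], corpus, k=2):
    print(f"{score:.2f}  {title}  [{source}]")
```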

Output is real code, not pseudocode: written against the retrieved manual, parameterised from the prompt, and runnable by the executor in the next stage.

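# auto-generated · agent run f4a1 · 14:23:01
import os

from cnc import Controller
from calib import compute_offset, write_g54

KEY = os.environ["MILL_API_KEY"]   # credential from the environment (variable name assumed)
MAX_DEFLECTION_MM = 0.2            # from the prompt: halt if any probe exceeds this
PROBE_Z = -5                       # assumed probing height inside the fixture pocket (mm, G54)

mill = Controller(host="10.0.4.12", auth=KEY)

# safe approach, then probe down for the tool-length reference
mill.send("G00 G54 X0 Y0 Z25")
mill.send("G31 Z-20 F500")         # probe down (tool-length ref)
z_ref = mill.read_probe()
mill.send("G00 Z25")

# 4-corner XY probing: start 10% inside each corner, drop to probing height, probe outward
probes = []
for x, y in [(-50, -50), (50, -50), (50, 50), (-50, 50)]:
    mill.send(f"G00 X{x*0.9} Y{y*0.9}")
    mill.send(f"G00 Z{PROBE_Z}")         # descend first; probing at Z25 would only touch air
    mill.send(f"G31 X{x} Y{y} F200")     # probe toward the corner, stops on contact
    hit = mill.read_probe()
    mill.send(f"G00 X{x*0.9} Y{y*0.9}")  # back off the wall before retracting
    mill.send("G00 Z25")
    # halt on excessive deflection (assumes the reading exposes .deflection in mm)
    if hit.deflection > MAX_DEFLECTION_MM:
        raise RuntimeError(f"probe deflection {hit.deflection:.3f} mm > {MAX_DEFLECTION_MM} mm, halting")
    probes.append(hit)

dx, dy = compute_offset(probes)
write_g54(mill, dx, dy)
print(f"applied G54: dX={dx:+.3f} dY={dy:+.3f}")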
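The source does not show how `compute_offset` works. One plausible approach, sketched here as an assumption: compare each measured corner with its nominal position and average the deviations, so a fixture shifted by (dX, dY) yields exactly that shift. Probe-tip radius compensation and fixture rotation are ignored, and `compute_offset_xy` with its (x, y) tuple inputs is hypothetical.

```python
from collections.abc import Sequence
from statistics import fmean

# Nominal corner positions in G54 (mm), in the order the script probes them.
NOMINAL = ((-50.0, -50.0), (50.0, -50.0), (50.0, 50.0), (-50.0, 50.0))


def compute_offset_xy(measured: Sequence[tuple[float, float]],
                      nominal: Sequence[tuple[float, float]] = NOMINAL) -> tuple[float, float]:
    """Average XY deviation of measured corners from nominal corners.

    Assumes readings are already tip-compensated and the fixture is only
    translated; a rotated fixture would need a best-fit rigid transform.
    """
    if len(measured) != len(nominal):
        raise ValueError("need one measurement per nominal corner")
    dx = fmean(m[0] - n[0] for m, n in zip(measured, nominal))
    dy = fmean(m[1] - n[1] for m, n in zip(measured, nominal))
    return dx, dy


# A fixture shifted by +0.140 mm in X and -0.080 mm in Y:
measured = [(x + 0.140, y - 0.080) for x, y in NOMINAL]
dx, dy = compute_offset_xy(measured)
print(f"dX={dx:+.3f} dY={dy:+.3f}")   # dX=+0.140 dY=-0.080
```

Averaging over all four corners means a single noisy touch shifts the result by only a quarter of its error.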

Before a single command leaves the workstation, the script runs against a virtual mill. Each command is checked against the equipment's allowlist and the operating envelope.

$ sandbox.run(script)

→ simulator boot ............................ 0.4 s
→ allowlist check:
    G00 (rapid)                — allowed
    G31 (probe)                — allowed
    G54 origin-write          — allowed
    all Z values > -25 mm     — within safety envelope
    spindle-on commands       — none requested  (correct for probing)

→ virtual run:
    4 / 4 probe points succeeded
    no collision predicted
    computed offset: dX=+0.140 mm  dY=-0.080 mm
    estimated runtime: 2 m 47 s

PASS — ready for hardware
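The allowlist and envelope checks in the log can be approximated by a static pass over the generated commands before any simulation runs. A minimal sketch, assuming commands are plain G-code strings; the allowlist contents, the -25 mm Z floor and the spindle-on words (M03/M04) mirror the log above, but `check_script` and its rules are illustrative, not the product's validator.

```python
import re

ALLOWED = {"G00", "G31", "G54"}   # per-device allowlist (illustrative)
SPINDLE_ON = {"M03", "M04"}       # spindle start words, not expected during probing
Z_FLOOR_MM = -25.0                # every Z target must stay above this

WORD = re.compile(r"([GMZ])\s*(-?\d+(?:\.\d+)?)")


def check_script(lines: list[str]) -> list[str]:
    """Return human-readable violations; an empty list means PASS."""
    problems = []
    for n, line in enumerate(lines, start=1):
        for letter, value in WORD.findall(line.upper()):
            if letter in "GM":
                code = f"{letter}{int(float(value)):02d}"   # normalise G0 -> G00
                if code in SPINDLE_ON:
                    problems.append(f"line {n}: spindle-on command {code}")
                elif code not in ALLOWED:
                    problems.append(f"line {n}: {code} not on allowlist")
            elif float(value) <= Z_FLOOR_MM:                # letter == "Z"
                problems.append(f"line {n}: Z{value} outside envelope (floor {Z_FLOOR_MM} mm)")
    return problems


script = ["G00 G54 X0 Y0 Z25", "G31 Z-20 F500", "G00 Z25", "M03 S8000", "G01 Z-30"]
for problem in check_script(script):
    print(problem)
```

The first three lines come from the generated script and pass; the last two are deliberately bad and produce three violations.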

The operator confirms in person. Once approved, the script flows down through the runtime stack, and each layer signs off before any axis moves.

probe ref fixture · write G54 offset?
[Enter] confirm · [Esc] abort
confirmed by m.pletner @ 14:24:18
Agent              submits signed script
Allowlist gate     runtime re-check · 41 cmds                          PASS
Operator gate      human confirms in person                            CONFIRMED
Protocol adapter   TCP/IP · 10.0.4.12 · controller LAN                 SENT
Mill controller    probe cycle running · 2 m 47 s                      PROBING
Result             dX = +0.140 mm · dY = -0.080 mm · audit log signed  DONE
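The last stage reports that the audit log is signed; the source does not say how. One common way to make a log tamper-evident is to sign each entry with an HMAC that also covers the previous entry's signature, so altering an entry breaks the chain. A minimal sketch using Python's standard `hmac` and `hashlib` modules; the key handling and entry fields are assumptions.

```python
import hashlib
import hmac
import json


def _sign(key: bytes, event: dict, prev_sig: str) -> str:
    payload = json.dumps({"event": event, "prev": prev_sig}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], key: bytes, event: dict) -> dict:
    """Append an event whose signature covers the event and the previous signature."""
    prev_sig = log[-1]["sig"] if log else ""
    entry = {"event": event, "prev": prev_sig, "sig": _sign(key, event, prev_sig)}
    log.append(entry)
    return entry


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute signatures in order; an edited, reordered or removed
    entry (other than a truncated tail) makes verification fail."""
    prev_sig = ""
    for entry in log:
        if entry["prev"] != prev_sig:
            return False
        if not hmac.compare_digest(_sign(key, entry["event"], prev_sig), entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True


key = b"demo-key"   # hypothetical; load from a secrets store in practice
log: list[dict] = []
append_entry(log, key, {"gate": "allowlist", "status": "PASS", "cmds": 41})
append_entry(log, key, {"gate": "operator", "status": "CONFIRMED", "by": "m.pletner"})
append_entry(log, key, {"stage": "result", "dX": 0.140, "dY": -0.080})
print(verify(log, key))            # True
log[1]["event"]["by"] = "someone-else"
print(verify(log, key))            # False
```

Chaining detects edits and gaps but not a cut-off tail; anchoring the latest signature somewhere external (for example in the run report) closes that gap.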

The agent renders a self-contained report — chart, table, log location — and writes it to the validation team's shared store. No screenshotting, no spreadsheet wrangling.

$ dashboard.generate(run="2026-05-23_calibration")

CALIBRATION REPORT · 2026-05-23 14:27
─────────────────────────────────────
  Fixture          RF-204 (reference)
  Probe points     4 / 4 successful
  Max deflection   0.08 mm  (threshold 0.2 mm · headroom 60%)
  Computed offset  dX = +0.140 mm   dY = -0.080 mm
  Z reference      +0.000 mm  (unchanged)
  Applied to       G54  (active WCS)

  ⬢ visualisation     [4-corner offset plot, deflection bars]
  ⬢ audit log         audit-log/2026-05-23_1424.signed
  ⬢ stored at         results/2026-05-23_calibration.json

DONE  ·  ready for next shift
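The headroom figure is simple arithmetic: (threshold - max deflection) / threshold, so (0.2 - 0.08) / 0.2 = 60%. A minimal sketch of assembling such a summary and serialising it as JSON, assuming per-probe deflections are available as a list; `build_report`, its field names and the per-probe sample values are illustrative, not the product's report schema.

```python
import json


def build_report(fixture: str, deflections_mm: list[float], threshold_mm: float,
                 dx: float, dy: float) -> dict:
    """Summarise one calibration run; headroom is how far the worst probe stayed below the limit."""
    worst = max(deflections_mm)
    headroom_pct = (threshold_mm - worst) / threshold_mm * 100
    return {
        "fixture": fixture,
        "probe_points": len(deflections_mm),
        "max_deflection_mm": worst,
        "threshold_mm": threshold_mm,
        "headroom_pct": round(headroom_pct, 1),
        "offset_mm": {"dX": dx, "dY": dy},
        "applied_to": "G54",
    }


# Per-probe values are made up; only their maximum (0.08 mm) comes from the run above.
report = build_report("RF-204", [0.05, 0.08, 0.06, 0.07], 0.2, 0.140, -0.080)
print(json.dumps(report, indent=2))
```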
[Figure: FIG. 01 · Operator workflow on a CNC mill, six panels: 01 Prompt (operator request) · 02 Retrieve (RAG pulls context and machining procedures) · 03 Generate (machine program) · 04 Validate (simulation and constraint checks against a virtual mill) · 05 Execute (closed-loop probing cycle on the real machine) · 06 Report (offset table, tolerance, checksum, timestamp).]
Performance log · calibration run
2026-05-23 · 14:24 UTC · op m.pletner · run f4a1

Sym      Parameter                                   Typ        Unit
t_man    time, manual baseline                       ~3         weeks
t_agt    time, agent end-to-end (▼ ~99% vs manual)   ~12        min
N_prm    prompts to working run                      3
L_gen    code generated (Python + G-code)            ~120       LOC
C_inf    inference cost per run, cloud / local       0.85 / 0   USD
N_gate   safety gates between code and hardware      2          per cmd
δ_max    max probe deflection observed               0.08       mm
Stack · runtime, per stage: Python 3.12 · OPC UA / Modbus · vLLM · Llama 3 70B · Milvus · FastAPI
How it wins against general AI tools

When generated code fails — a parameter out of range, a protocol mismatch, a simulator exception — the failure is fed back into the next iteration. The agent re-plans, re-emits, and converges on a working result.

General AI · Hands you an answer once. You catch the bugs and re-prompt manually.
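The closed loop above can be sketched as: generate, validate, and feed the error text back into the next attempt until validation passes or an attempt budget runs out. A minimal sketch; `generate` and `validate` stand in for the model call and the sandbox (both hypothetical callables here), and the three-attempt default is an assumption.

```python
from typing import Callable, Optional


def refine(task: str,
           generate: Callable[[str, Optional[str]], str],
           validate: Callable[[str], Optional[str]],
           max_attempts: int = 3) -> tuple[str, int]:
    """Return (script, attempts) for the first script that validates.

    generate(task, feedback) produces a script; feedback is None on the first
    try and the previous validation error afterwards. validate(script) returns
    None on success or an error message on failure.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        script = generate(task, feedback)
        error = validate(script)
        if error is None:
            return script, attempt
        feedback = error   # the failure becomes context for the next attempt
    raise RuntimeError(f"no valid script after {max_attempts} attempts; last error: {feedback}")


# Toy stand-ins: the first draft probes too deep, the second applies the feedback.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "G31 Z-30 F500" if feedback is None else "G31 Z-20 F500"


def fake_validate(script: str) -> Optional[str]:
    return "Z-30 below -25 mm envelope" if "Z-30" in script else None


script, attempts = refine("probe tool-length reference", fake_generate, fake_validate)
print(attempts, script)   # 2 G31 Z-20 F500
```

Capping attempts keeps a persistently failing task from looping forever and surfaces the last error to the engineer instead.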

Every answer is retrieved from your datasheets, vendor manuals, runbooks, and tribal-knowledge captures. The agent cites your documents instead of a generic best practice from the open web.

General AI · Trained on whatever was crawled. Quietly fabricates the rest.

Before a single command reaches real equipment, the script is executed against a virtual instance or simulator. Behaviour is observed, errors are caught, and only validated runs are released to the floor.

General AI · Returns text. Whether it actually runs is your problem.

Bringing the agent online for a new instrument is mainly a documentation upload (datasheet, programmer's manual, SOP): the RAG layer indexes it and a protocol adapter is wired in. No model retraining is needed.

General AI · Bounded by its training corpus. New gear → new prompt engineering or a fine-tune.
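Indexing a new instrument's documentation typically starts by splitting each document into overlapping chunks that carry their source metadata, so a retrieved chunk can be cited back to the manual it came from. A minimal sketch; the chunk size, overlap, character-based splitting, the `DocChunk` fields and the `PM-001` identifier are assumptions (real pipelines usually split on tokens or section boundaries), and embedding plus insertion into the vector store are left out.

```python
from dataclasses import dataclass


@dataclass
class DocChunk:
    source: str    # e.g. "programmer's manual"
    doc_id: str    # document identifier used for citations
    start: int     # character offset of the chunk in the original text
    text: str


def chunk_document(text: str, doc_id: str, source: str,
                   size: int = 800, overlap: int = 100) -> list[DocChunk]:
    """Split text into windows of `size` characters that overlap by `overlap`.

    The last window always reaches the end of the text, so nothing is dropped.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [DocChunk(source, doc_id, start, text[start:start + size])
            for start in range(0, max(len(text) - overlap, 1), step)]


manual = "Probing cycle G31 moves the axis until the probe triggers. " * 40
chunks = chunk_document(manual, doc_id="PM-001", source="programmer's manual")
print(len(chunks), [c.start for c in chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.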
Safety model

Built for production floors, not demos.

The agent is not allowed to act freely. Three layers of guardrails stand between generated code and your equipment — sandbox, allowlist, and a human in the loop.

01 · Sandbox —

Validated before touching hardware

Every generated script runs against a virtual instance or simulator before any command leaves the workstation. Allowlist coverage, motion envelope, and estimated runtime are all verified ahead of real hardware.

02 · Allowlist —

Per-device command allowlist

At runtime, the protocol adapter re-checks every command against the equipment's allowlist. Anything off-list is blocked at the wire — no exceptions, no overrides.
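Enforcing the allowlist inside the protocol adapter puts the check on the only path to the controller, so no upstream component can bypass it. A minimal sketch, assuming the adapter wraps a transport that exposes a `send(line)` method; `GuardedAdapter`, `CommandBlocked` and the list-backed stand-in transport are hypothetical, and the allowlist is fixed at construction with no override parameter.

```python
import re


class CommandBlocked(Exception):
    """Raised when a command is not on the device allowlist."""


class GuardedAdapter:
    """Forwards G-code lines to a transport only if every G/M word is allowlisted."""

    # No word boundary on purpose: "G00M03" must still expose M03.
    _WORD = re.compile(r"([GM])(\d+)")

    def __init__(self, transport, allowlist: frozenset[str]):
        self._transport = transport
        self._allowlist = allowlist   # immutable; there is no override path

    def send(self, line: str) -> None:
        for letter, number in self._WORD.findall(line.upper()):
            code = f"{letter}{int(number):02d}"
            if code not in self._allowlist:
                raise CommandBlocked(f"{code} is not allowlisted: {line!r}")
        self._transport.send(line)


class ListTransport:
    """Stand-in for the controller link; records what would go over the wire."""

    def __init__(self):
        self.sent: list[str] = []

    def send(self, line: str) -> None:
        self.sent.append(line)


wire = ListTransport()
adapter = GuardedAdapter(wire, frozenset({"G00", "G31", "G54"}))
adapter.send("G00 G54 X0 Y0 Z25")
try:
    adapter.send("M03 S8000")
except CommandBlocked as exc:
    print("blocked:", exc)
print(wire.sent)
```

A blocked line raises before the transport is touched, so nothing off-list ever reaches the controller.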

03 · Operator gate —

Engineer confirms execution

Even with sandbox and allowlist passing, a real-equipment run requires a typed operator confirmation. The agent never moves hardware on its own authority.

Want to see this on your own equipment?

A short conversation is the fastest way to see whether this approach fits your team. No commitment — just a scoped discussion of what your floor looks like.