How it works — Maxim Pletner

FIG. 01 · Scenario · CNC mill calibration

chk · 0x4e1c

Equipment

3-axis vertical milling center

touch probe · spindle off · reference fixture mounted

Task

4-corner probing + offset correction

measure XY origin drift, write G54, halt if probe deflection > 0.2 mm

Operator

Validation engineer

no G-code authoring experience required

When this matters

A firmware update silently breaks test scripts

the agent reads the release notes and rewrites them

How it works

The engineer types what they want. No G-code, no Python, no SDK calls — just a sentence.

$ agent.prompt:

"Run a 4-corner probing routine on the reference fixture,
 compute XY offset and tool-length offset, and write the
 corrected origin to G54. Halt and alert me if any probe
 deflection exceeds 0.2 mm."

The agent queries the vector store for documents that match the intent: vendor manuals, internal SOPs, fixture drawings, prior calibration logs. Grounding is explicit: the retrieved sources are listed by name and type, so each generated step can be traced back to a document.

$ rag.retrieve("probing + tool-length offset + 3-axis VMC")
  loaded 8 contexts in 2.4 s ·  embed dim 1024 · top-k 8

  ✓ G-Code Programming Manual — Probing Cycles      [vendor]
  ✓ Tool-Length Offset Procedure                    [SOP]
  ✓ Reference Fixture Drawing #RF-204               [CAD]
  ✓ Machine Safety Boundaries — Spindle Region      [safety]
  ✓ Calibration Run History (last 6 mo)             [runs]
  + 3 more
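
The retrieval step above can be pictured as a nearest-neighbour search over embeddings. What follows is a minimal sketch, not the product's implementation: it assumes cosine similarity as the ranking metric, uses toy 3-dimensional vectors in place of the 1024-dimensional embeddings from the log, and the names `cosine`, `top_k`, and `store` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=8):
    """Return the k (title, score) pairs most similar to query_vec."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (assumption, for illustration).
store = {
    "Probing Cycles": [0.9, 0.1, 0.0],
    "Tool-Length Offset Procedure": [0.7, 0.6, 0.1],
    "Cafeteria Menu": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], store, k=2)
titles = [title for title, _ in hits]
```

A production vector store would replace the linear scan with an approximate index, but the ranking idea is the same.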

Output is real code, not pseudocode — typed against the retrieved manual, parameterised from the prompt, runnable by the executor in the next stage.

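# auto-generated · agent run f4a1 · 14:23:01
import os

from cnc import Controller
from calib import compute_offset, write_g54

# Credential comes from the environment; the variable name is an assumption.
KEY = os.environ["CNC_AUTH_KEY"]

mill = Controller(host="10.0.4.12", auth=KEY)

# safe approach
mill.send("G00 G54 X0 Y0 Z25")    # rapid to the G54 origin at clearance height
mill.send("G31 Z-20 F500")        # probe down (tool-length ref)
z_ref = mill.read_probe()         # Z reference reading
mill.send("G00 Z25")              # retract to clearance height

# 4-corner XY probing
probes = []
for x, y in [(-50,-50),(50,-50),(50,50),(-50,50)]:
    mill.send(f"G00 X{x*0.9} Y{y*0.9}")   # rapid to 90% of the corner coordinates
    mill.send(f"G31 X{x} Y{y} F200")      # probe toward the corner at reduced feed
    probes.append(mill.read_probe())
    mill.send("G00 Z25")                  # retract before the next corner

dx, dy = compute_offset(probes)
write_g54(mill, dx, dy)
print(f"applied G54: dX={dx:+.3f} dY={dy:+.3f}")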
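
The prompt also asks for a halt when probe deflection exceeds 0.2 mm, and the listing leaves the offset arithmetic inside `compute_offset`. The sketch below shows one way both could work. It is an assumption-heavy illustration: the real `calib` module is not shown in the source, the offset is taken here as the mean of measured minus nominal corner positions, and `check_deflection`, `mean_xy_offset`, and the sample readings are hypothetical.

```python
from statistics import mean

MAX_DEFLECTION_MM = 0.2  # halt threshold quoted in the prompt


class ProbeDeflectionError(RuntimeError):
    """Raised when a probe reading exceeds the allowed deflection."""


def check_deflection(deflections_mm, limit_mm=MAX_DEFLECTION_MM):
    """Return the worst deflection, or raise if it exceeds the limit."""
    worst = max(deflections_mm)
    if worst > limit_mm:
        raise ProbeDeflectionError(
            f"deflection {worst:.3f} mm exceeds limit {limit_mm} mm"
        )
    return worst


def mean_xy_offset(nominal, measured):
    """Mean (dx, dy) of measured minus nominal corner positions."""
    dx = mean(m[0] - n[0] for n, m in zip(nominal, measured))
    dy = mean(m[1] - n[1] for n, m in zip(nominal, measured))
    return dx, dy


nominal = [(-50, -50), (50, -50), (50, 50), (-50, 50)]
# Illustrative readings only, not data from the run described above.
measured = [(-49.9, -50.1), (50.1, -50.1), (50.1, 49.9), (-49.9, 49.9)]

worst = check_deflection([0.05, 0.08, 0.03, 0.06])
dx, dy = mean_xy_offset(nominal, measured)
```

Calling `check_deflection` after every `read_probe()` rather than once at the end would stop the cycle at the first bad reading, which is closer to what the prompt asks for.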

Before a single command leaves the workstation, the script runs against a virtual mill. Each command is checked against the equipment's allowlist and the operating envelope.

$ sandbox.run(script)

→ simulator boot ............................ 0.4 s
→ allowlist check:
   ✓ G00 (rapid)                — allowed
   ✓ G31 (probe)                — allowed
   ✓ G54 origin-write          — allowed
   ✓ all Z values > -25 mm     — within safety envelope
   ✓ spindle-on commands       — none requested  (correct for probing)

→ virtual run:
   ✓ 4 / 4 probe points succeeded
   ✓ no collision predicted
   ✓ computed offset: dX=+0.140 mm  dY=-0.080 mm
   ✓ estimated runtime: 2 m 47 s

PASS — ready for hardware
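
The allowlist and envelope checks in the log can be approximated with a few lines of parsing. This is a minimal sketch under stated assumptions: the allowed words and the -25 mm Z floor are taken from the log above, while `check_line`, the regular expressions, and the exact-match comparison of G-words are hypothetical simplifications (a real checker would normalise `G0` and `G00`, handle modal state, and so on).

```python
import re

ALLOWED_WORDS = {"G00", "G31", "G54"}  # commands the sandbox log marks as allowed
Z_FLOOR_MM = -25.0                     # lower Z bound from the sandbox log

WORD_RE = re.compile(r"\b([GM]\d+)\b")
Z_RE = re.compile(r"\bZ(-?\d+(?:\.\d+)?)")


def check_line(line):
    """Return a list of violations for one G-code line (empty list means OK)."""
    problems = []
    for word in WORD_RE.findall(line):
        if word not in ALLOWED_WORDS:
            problems.append(f"{word} not in allowlist")
    for z in Z_RE.findall(line):
        if float(z) <= Z_FLOOR_MM:
            problems.append(f"Z{z} at or below envelope floor {Z_FLOOR_MM}")
    return problems


script = ["G00 G54 X0 Y0 Z25", "G31 Z-20 F500", "G00 Z25"]
violations = [p for line in script for p in check_line(line)]
```

An empty `violations` list corresponds to the PASS verdict; a spindle command such as `M03` or a move to `Z-30` would each produce one entry.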

The operator confirms in person. Once approved, the script flows down through the runtime stack, and each layer signs off before any axis moves.

probe ref fixture · write G54 offset?

[Enter] confirm · [Esc] abort

→ confirmed by m.pletner @ 14:24:18

→ Agent submits signed script
→ Allowlist gate · runtime re-check · 41 cmds ................. PASS
→ Operator gate · human confirms in person ................... CONFIRMED
→ Protocol adapter · TCP/IP · 10.0.4.12 · controller LAN ..... SENT
→ Mill controller · probe cycle running · 2 m 47 s ........... PROBING
→ Result · dX = +0.140 mm · dY = -0.080 mm · audit log signed  DONE
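
The source does not say how the script or the audit log is signed. As one possible reading, the sketch below uses an HMAC-SHA256 tag from the Python standard library so that a downstream gate can detect a script modified after approval. The shared-key scheme and the names `sign_script` and `verify_script` are assumptions; an asymmetric signature would be an equally plausible choice.

```python
import hashlib
import hmac


def sign_script(script: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the script text."""
    return hmac.new(key, script.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_script(script: str, signature: str, key: bytes) -> bool:
    """Check the tag in constant time; False means the script was altered."""
    expected = sign_script(script, key)
    return hmac.compare_digest(expected, signature)


key = b"demo-key-not-for-production"   # placeholder key for the example
approved = "G00 G54 X0 Y0 Z25\nG31 Z-20 F500"
tag = sign_script(approved, key)

ok = verify_script(approved, tag, key)
tampered = verify_script(approved.replace("Z-20", "Z-40"), tag, key)
```

Here `ok` is true and `tampered` is false, so a gate that re-verifies the tag would reject the edited script before it reaches the protocol adapter.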

The agent renders a self-contained report — chart, table, log location — and writes it to the validation team's shared store. No screenshotting, no spreadsheet wrangling.

$ dashboard.generate(run="2026-05-23_calibration")

CALIBRATION REPORT · 2026-05-23 14:27
─────────────────────────────────────
  Fixture          RF-204 (reference)
  Probe points     4 / 4 successful
  Max deflection   0.08 mm  (threshold 0.2 mm · headroom 60%)
  Computed offset  dX = +0.140 mm   dY = -0.080 mm
  Z reference      +0.000 mm  (unchanged)
  Applied to       G54  (active WCS)

  ⬢ visualisation     [4-corner offset plot, deflection bars]
  ⬢ audit log         audit-log/2026-05-23_1424.signed
  ⬢ stored at         results/2026-05-23_calibration.json

DONE  ·  ready for next shift
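
The 60% headroom figure in the report follows from the two deflection numbers: (0.2 - 0.08) / 0.2 = 0.6. The sketch below reproduces that arithmetic and serialises the report fields to JSON. The field names and the `deflection_headroom` helper are hypothetical; the actual schema of `results/2026-05-23_calibration.json` is not shown in the source.

```python
import json


def deflection_headroom(max_deflection_mm, threshold_mm):
    """Fraction of the threshold still unused, e.g. 0.6 for 60%."""
    return (threshold_mm - max_deflection_mm) / threshold_mm


# Values copied from the report above; key names are assumptions.
report = {
    "fixture": "RF-204",
    "probe_points": {"ok": 4, "total": 4},
    "max_deflection_mm": 0.08,
    "threshold_mm": 0.2,
    "headroom_pct": round(100 * deflection_headroom(0.08, 0.2)),
    "offset_mm": {"dx": 0.140, "dy": -0.080},
    "applied_to": "G54",
}
payload = json.dumps(report, indent=2)
```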

Six-panel engineering diagram of the operator workflow on a CNC mill: 01 PROMPT (operator request as a speech bubble), 02 RETRIEVE (RAG pulls context and machining procedures), 03 GENERATE (machine program with G31/G54/G43/M06), 04 VALIDATE (simulation and constraint checks against a virtual mill), 05 EXECUTE (closed-loop probing cycle on the real machine), 06 REPORT (measurement traceability with offset table, tolerance, checksum and timestamp); FIG. 01 · Operator workflow · REV. 01 · 2026

▣ Performance log · calibration run

2026-05-23 · 14:24 UTC · op m.pletner · run f4a1

Sym	Parameter	Typ	Unit
`t_man`	time, manual baseline	~3	weeks
`t_agt`	time, agent end-to-end	~12 ▼ ~99%	min
`N_prm`	prompts to working run	3	—
`L_gen`	code generated (Python + G-code)	~120	LOC
`C_inf`	inference cost per run · cloud / local	0.85 / 0	USD
`N_gate`	safety gates between code and hardware	2	per cmd
`δ_max`	max probe deflection observed	0.08	mm
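
The "~99%" reduction next to `t_agt` can be checked with a short calculation. The sketch assumes that "~3 weeks" means three 5-day weeks of 8-hour working days; the table does not say whether the baseline is working time or calendar time, and calendar time would make the reduction larger still.

```python
MIN_PER_WORKDAY = 8 * 60                 # assumption: 8-hour working days
manual_min = 3 * 5 * MIN_PER_WORKDAY     # ~3 weeks of 5 working days
agent_min = 12                           # t_agt from the table

reduction = 1 - agent_min / manual_min
summary = f"{manual_min} min -> {agent_min} min ({reduction:.1%} less)"
```

Under these assumptions the baseline is 7200 minutes and the reduction is about 99.8%, so the table's "~99%" is on the conservative side.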

Stack runtime · per stage

Python 3.12 → OPC UA · Modbus → vLLM · Llama 3 70B → Milvus → FastAPI

How it wins against general AI tools

When generated code fails — a parameter out of range, a protocol mismatch, a simulator exception — the failure is fed back into the next iteration. The agent re-plans, re-emits, and converges on a working result.

General AI: Hands you an answer once. You catch the bugs and re-prompt manually.
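
The feedback loop described above can be sketched as a bounded retry in which each validation error becomes input to the next generation attempt. This is an illustration only: `converge`, `fake_generate`, and `fake_validate` are hypothetical stand-ins for the agent's generator and the sandbox, and the cap of three attempts is an assumption.

```python
def converge(generate, validate, max_attempts=3):
    """Regenerate until validate() returns None or the attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        script = generate(feedback)      # previous error (if any) steers the rewrite
        error = validate(script)
        if error is None:
            return script, attempt
        feedback = error
    raise RuntimeError(f"no valid script after {max_attempts} attempts: {feedback}")


# Stubs: the first draft probes too deep, the second one is corrected.
def fake_generate(feedback):
    return "G31 Z-20 F500" if feedback else "G31 Z-40 F500"


def fake_validate(script):
    return "Z below -25 mm envelope" if "Z-40" in script else None


script, attempts = converge(fake_generate, fake_validate)
```

With these stubs the loop succeeds on the second attempt; a hard attempt limit keeps a non-converging run from looping indefinitely.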

Every answer is retrieved from your datasheets, vendor manuals, runbooks, and tribal-knowledge captures. The agent cites your documents instead of a generic best practice from the open web.

General AI: Trained on whatever was crawled. Quietly fabricates the rest.

Before a single command reaches real equipment, the script is executed against a virtual instance or simulator. Behaviour is observed, errors are caught, and only validated runs are released to the floor.

General AI: Returns text. Whether it actually runs is your problem.

Bringing the agent online for a new instrument is a documentation upload — datasheet, programmer's manual, SOP. The RAG layer indexes it; the protocol adapter is wired in; no model retraining.

General AI: Bounded by its training corpus. New gear → new prompt-engineering or a fine-tune.
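
Indexing an uploaded manual usually starts by splitting it into overlapping chunks that can be embedded one by one. The sketch below shows only that splitting step, under assumptions: character-based chunks, a 200-character window with 40 characters of overlap, and the hypothetical names `chunk_text` and `index`. The source does not describe the actual chunking strategy.

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


# Hypothetical upload: document name mapped to its chunks, ready for embedding.
manual = "G31 moves the probe until the skip signal triggers. " * 10
index = {"Programmer's Manual": chunk_text(manual)}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.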

From plain English to validated equipment code.

The agent reads its own errors

Grounded in your equipment docs, not the internet

Every script proves itself in simulation

Add a machine by adding its manual

Built for production floors, not demos.

Validated before touching hardware

Per-device command allowlist

Engineer confirms execution

Want to see this on your own equipment?