The layer between a working prompt and a production deployment is what passes or fails security review: credential brokers, scoped tool surfaces, policy-gated approvals, and the audit layer that compliance, incident response, and post-mortems all read from. This is the deep work.
A deployment with no evaluation surface, no audit log, no rollback path. Add structured evaluation, decision logging, and reversibility before the next incident — not after.
You have agents acting against real systems. If one starts taking actions it shouldn't, what's the blast radius? Map it before the post-mortem does — and add the policy gates that bound it.
Two hundred registered applications, half of them with permissions nobody can explain. A workload identity audit establishes what's needed, what's stale, and what's a credential compromise waiting to happen, followed by a remediation plan that doesn't break production.
Workspace provisioning requires four IT tickets and two weeks. Automate compliant workspace creation with DLP, naming, retention, and lifecycle policies enforced by design — not by help desk review.
You built the center of excellence. Nobody uses the standards. A governance automation layer enforces DLP, naming conventions, environment policies, and lifecycle stages at the platform level, not by gatekeeping.
You want staff to have an AI assistant that only knows what it's allowed to know — not a generic ChatGPT wrapper that might leak or invent. Scoped retrieval, source attribution, and the audit trail to prove it on every response.
Six-figure contractor spend processing forms that haven't changed in a decade. A multi-stage extraction pipeline with consensus voting, deterministic validators, and Raw → Suggested → Final audit layering. Eighty percent automated, twenty percent for human judgment. See the extraction pipeline case study for the pattern.
Your legal team won't sign off until they can audit who decided what. Build orchestration where every agent decision is logged, attributable, and reversible — and where the governance dashboard is the same surface incident response uses.
At 500 seats, M365 Copilot is $180K a year. Before the next renewal, measure whether it's actually generating return — by team, by use case, by surface — and cut what isn't working.
Copilot Studio, Power Automate, Azure AI Foundry, GitHub Copilot — all licensed, all running as isolated pilots. Wire them into a coherent agent architecture with shared identity, audit, and tool boundaries — instead of four parallel governance problems.
One to two weeks, fixed-fee. Written report covering identity, credential surface, tool scope, approval flow, audit, and the specific compliance gaps to close. Threat model, prioritized remediation list, one-page InfoSec summary. The right starting point when an agent project hasn't passed security review yet.
A focused review of a specific agent deployment — MCP server boundaries, credential exposure, blast-radius mapping, and the gaps the next audit will find. Same rigor as Discovery, narrower scope. See the methodology essay for what's actually involved.
Eight to twelve weeks, fixed-fee. End-to-end build on your stack: identity, credential broker, scoped tools, policy gating, and audit, wired into your existing auth and observability. Ships behind a real approval flow, with IaC, runbooks, and architecture docs, and is handed off to your engineers.
Monthly retainer with flexible hours. Weekly review of active agent work, async PR and design review, and on-demand escalation for incidents. For teams whose engineers want experienced eyes on the agentic side without hiring senior headcount.
Agent governance — security review, identity architecture, audit-trail design, blast-radius bounding — is becoming a named vertical. Microsoft shipped the Agent Governance Toolkit in April 2026. Google launched Agent Identity at Cloud Next. Deloitte stood up a formal practice around API governance for agentic AI. Organizations that deployed agents in 2024–2025 are discovering they can't monitor, audit, or halt them — and the cost of getting that wrong scales with deployment size.
I'm Samuel S. I've built agent infrastructure for a global enterprise (see /work), and I run a 70-container AI stack at home as a working lab, which is how I keep current on what's actually production-ready and what's still a vendor pitch deck. The writing in /writing covers the patterns: agent security review, MCP server boundaries, Entra workload identities, and MCP in Copilot Studio.