Engineering, on contract.
Hardcore Engineering Services
Hands-on engineering for teams whose AI agents have to work in production, not just in the demo. Audits, cost rescue, evals, builds, red-teaming. Indicative price bands below. Every engagement is scoped before invoice.
Tokenmaxxing: spend less on models, not less on quality.
Most LLM bills run 3–10× larger than they need to be: wrong models for the job, no caching, no observability, lock-in to one vendor, bloated context. Every engagement here ships the same cost-and-context pass I run on my own gateway daily.
#1 Agent Audit
from £6,000
Your agent behaves differently on identical inputs and nobody can explain why. In two weeks you get the architecture mapped, the failure modes catalogued, a cost and latency baseline, and a written 90-day roadmap your team can act on.
- Architecture & failure-mode review
- Cost, latency & token-spend baseline
- Eval-readiness assessment
- Written 90-day roadmap
A support-triage agent files the same ticket under three different labels in one afternoon, and the debugging thread is all screenshots, no repro. The usual culprits: an unpinned model version drifting silently, and no eval baseline to catch it. Pinning the version and building a golden set from real traffic turns "nobody can say why" into a ranked failure-mode list your team can start fixing on Monday.
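A minimal sketch of the first step towards a repro: replay one ticket through the classifier several times and count the labels that come back. The names `label_stability`, `is_stable` and the `classify` callable are hypothetical stand-ins for whatever wraps your agent; nothing here is a specific vendor API.

```python
from collections import Counter
from typing import Callable


def label_stability(classify: Callable[[str], str], ticket: str, runs: int = 10) -> Counter:
    """Send the same ticket `runs` times and tally the labels returned."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return Counter(classify(ticket) for _ in range(runs))


def is_stable(counts: Counter) -> bool:
    """Stable means every run produced the same single label."""
    return len(counts) == 1
```

Tickets that fail `is_stable` are natural first candidates for the golden set, since they are the ones where drift shows up.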
I run multi-model routing, evals and OpenTelemetry tracing on my own production gateway every day. The same discipline lands in your report, not textbook theory.
#2 LLM Cost Rescue
from £4,000
Your inference bill scales faster than your user base. Right-sized models per task, caching, prompt diet and open-source routing: the tokenmaxxing pass as a stand-alone rescue.
- Per-route model right-sizing (incl. OSS candidates)
- Caching & context-window diet
- Spend observability dashboard
- Vendor lock-in exit options, costed
The bill doubles month on month while usage stays flat, because every query, password resets and FAQ lookups included, goes through a frontier model at premium per-token rates. Deflecting cheap queries to a small open-source model behind a routing layer, plus a semantic cache for repeats, commonly halves the bill before any deeper work starts.
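A rough sketch of that routing layer, under loud assumptions: the `small` and `frontier` callables are placeholders for your own model clients, `CHEAP_PATTERNS` is an invented keyword list, and the cache is an exact-match lookup on normalised text standing in for a real semantic cache (which would compare embeddings instead).

```python
import re
from typing import Callable, Dict

# Hypothetical keyword list; a real router would be tuned on your own traffic.
CHEAP_PATTERNS = re.compile(r"\b(password reset|reset my password|opening hours|faq)\b", re.I)


def normalise(query: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share a cache key."""
    return " ".join(query.lower().split())


class Router:
    def __init__(self, small: Callable[[str], str], frontier: Callable[[str], str]):
        self.small = small
        self.frontier = frontier
        self.cache: Dict[str, str] = {}
        # Per-route counters: the raw material for a spend dashboard.
        self.calls = {"cache": 0, "small": 0, "frontier": 0}

    def answer(self, query: str) -> str:
        key = normalise(query)
        if key in self.cache:
            # Repeat query: no model call at all.
            self.calls["cache"] += 1
            return self.cache[key]
        if CHEAP_PATTERNS.search(key):
            # Cheap, well-understood query: deflect to the small model.
            self.calls["small"] += 1
            result = self.small(query)
        else:
            self.calls["frontier"] += 1
            result = self.frontier(query)
        self.cache[key] = result
        return result
```

The `calls` counters are the point as much as the routing: without them you cannot show which share of traffic ever needed the frontier model.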
Tokenmaxxing is my daily practice. I don't just recommend OSS model routing with per-team budgets and spend observability; I run it on my own stack.
#3 Agent Evals & Reliability
from £8,000
You change a prompt and have no idea if you made it better or worse. I build the eval harness, regression gates and guardrails that let you ship changes weekly with evidence, not vibes.
- Eval harness wired into CI
- Golden datasets from your real traffic
- Regression gates & release checklist
- Guardrails for the failure modes that matter
A one-line prompt tweak ships on Tuesday. The regression surfaces on Friday, in a customer complaint, and the rollback debate takes longer than the change did. A CI eval harness with golden sets drawn from real traffic moves that discovery to before merge: the regression fails a gate instead of paging support. Releases go from monthly-and-nervous to weekly.
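A minimal sketch of such a gate, assuming exact-match scoring on a small golden set. `GoldenCase`, `pass_rate`, `gate` and the 2% tolerance are illustrative choices, not a fixed recipe; real harnesses usually need fuzzier scoring than string equality.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    prompt: str
    expected: str


def pass_rate(agent: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Fraction of golden cases where the agent's output matches exactly."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(1 for case in cases if agent(case.prompt).strip() == case.expected)
    return passed / len(cases)


def gate(candidate_rate: float, baseline_rate: float, tolerance: float = 0.02) -> bool:
    """Return True when the candidate is no worse than baseline minus tolerance."""
    return candidate_rate >= baseline_rate - tolerance
```

In CI, the job would compute `pass_rate` for the candidate prompt, compare it with the stored baseline, and exit non-zero when `gate` returns False so the merge is blocked.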
Eval-first is how I build my own agents. Harnesses, guardrails and failure-mode catalogues are standing tooling in my stack, not a research project.
#4 Agentic Workflow Build
from £15,000
You prototyped an agent in a notebook but nobody is comfortable putting it in front of customers. I take it to production: tracing, evals and human-in-the-loop fallback baked in. You own the code; scope changes are quoted, never silently absorbed.
- End-to-end production agent or workflow
- Observability, evals & guardrails from day one
- Deploy + handover documentation
- 2-week warranty: bug fixes & stability; new features scoped separately
A contract-summarisation prototype wins every internal demo and ships to zero customers, because nobody will sign off on running it unsupervised. Productionising it means structured outputs with schema validation, tracing on every tool call, eval-gated deploys, and a human-in-the-loop fallback for low-confidence output. That is the difference between a demo and a system you can put in front of customers.
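A sketch of the validation-plus-fallback step, assuming the model is asked to return JSON with a `summary` string and a self-reported `confidence` between 0 and 1. The field names, the `Summary` and `NeedsReview` types and the 0.8 threshold are all assumptions for illustration; self-reported confidence is a weak signal on its own and would normally be paired with eval data.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Summary:
    summary: str
    confidence: float


@dataclass
class NeedsReview:
    reason: str
    raw: str


def parse_output(raw: str, min_confidence: float = 0.8) -> Union[Summary, NeedsReview]:
    """Validate model output; anything malformed or unsure goes to a human."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return NeedsReview("not valid JSON", raw)
    if not isinstance(data, dict):
        return NeedsReview("expected a JSON object", raw)
    summary = data.get("summary")
    confidence = data.get("confidence")
    if not isinstance(summary, str) or not summary.strip():
        return NeedsReview("missing or empty 'summary'", raw)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return NeedsReview("missing or non-numeric 'confidence'", raw)
    if not 0.0 <= confidence <= 1.0:
        return NeedsReview("'confidence' outside [0, 1]", raw)
    if confidence < min_confidence:
        return NeedsReview("confidence below threshold", raw)
    return Summary(summary, float(confidence))
```

Returning `NeedsReview` rather than raising keeps the review queue as an ordinary code path: every rejected output carries its reason and the raw text a reviewer needs.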
17 years shipping production systems: apprentice, then Staff Platform Engineer at Tractable AI, then founding engineer at Intropy. Taking prototypes to production is the job I have done for a decade.
#5 Red-Team & Injection-Proofing
from £7,000
Your agent has tools that touch real systems: databases, email, payments. And you have never tested what a malicious input makes it do. I attack it the way an adversary would, then close the paths I find.
- Indirect prompt-injection & tool-abuse testing
- Attack-vector map across your tools & MCP surface
- Provenance, allow-listing & approval-gating fixes
- Written findings + severity ranking
An agent with database and email tools obeys whatever it reads, and one poisoned document in the knowledge base is enough to steer it into mailing out what it can see. A structured red-team pass against the OWASP LLM Top 10 reliably surfaces exploit paths like these; the fixes are tool-call approval gates, input provenance tagging, and an action allow-list. Close the paths before launch, not after the incident.
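A minimal sketch of how those three fixes can combine into one policy check that runs before every tool call. The tool names, the two provenance levels and the `authorise` function are hypothetical; a real deployment would track provenance per message or per retrieved chunk rather than per call.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    USER = "user"            # typed by the authenticated user
    RETRIEVED = "retrieved"  # pulled from documents, web pages, email bodies


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical tool names for illustration only.
ALLOWED_TOOLS = {"search_kb", "read_ticket", "send_email"}
SENSITIVE_TOOLS = {"send_email"}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    triggered_by: Provenance


def authorise(call: ToolCall) -> Decision:
    # Allow-list: anything not explicitly listed is refused.
    if call.tool not in ALLOWED_TOOLS:
        return Decision.DENY
    if call.tool in SENSITIVE_TOOLS:
        # Provenance: retrieved content never gets to trigger a sensitive action.
        if call.triggered_by is Provenance.RETRIEVED:
            return Decision.DENY
        # Approval gate: even user-initiated sensitive actions wait for a human.
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

With this in place, the poisoned-document scenario ends at `DENY`: the instruction to send mail arrived via retrieved content, so it never reaches the email tool.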
SRE and reverse-engineering background. I think about attack surface and blast radius by instinct, which is exactly the lens agent tooling needs and rarely gets.
AI Platform Setups
Repeatable, fixed-price installs that drop into your stack. Tooling only: if you need the diagnosis and the golden datasets too, that is an engagement above. DevOps and platform engineering is my home turf: these are the foundations that make everything above cheaper and safer to run.
Recruiter? I'm not taking permanent roles. But your client with the stuck AI project? That contract Hardcore Engineering will take.
Hardcore Engineering supplies services company-to-company: statement-of-work, deliverable-based engagements structured for outside-IR35 working (status determination sits with the client). If you have a client with a stuck AI project, a build to ship, or a team that needs senior AI capacity, that is a contract for the company. Refer one that closes and there's a 10–15% referral fee in it for you.
Here's a pitch you can copy-paste to a client:
Not sure which of these fits?
Describe what is stuck in three sentences. Worst case you get a pointer back; best case it becomes a scoped plan.