AI Agent Reliability Engineer - Chaps

Craft Docs Limited, Inc.
London

About Craft & Chaps

At Craft, we rethink productivity from first principles. Our products disappear into the background so people can do their life's work: fast, joyfully, and without friction.

Chaps is our new AI-first product, focused on turning a constellation of large-language-model agents into a seamless personal productivity assistant.

About the role

Our AI Product team is looking for an engineer who obsesses over making multi-agent systems robust, observable, and continuously improving. You'll build the test harnesses, evaluation pipelines, and monitoring layers that keep dozens of collaborating agents on-task, on-budget, and on-time.

In practice, that means:
  • Designing automated evals that exercise complete agent workflows, catching regressions before they reach users.
  • Instrumenting every prompt, tool-call, and model hop with rich telemetry so we can trace root causes in minutes, not days.
  • Creating feedback loops that turn logs, user ratings, and synthetic tests into better prompts and safer behaviors.
  • Future-proofing agentic systems so that quality keeps improving as the underlying LLMs become more capable.

You will partner with product, research, and infra to ship an AI assistant users can trust: no surprises, no downtime.

What we're looking for

You must have:
  • Hands-on experience with LLM evaluation frameworks (e.g., OpenAI Evals, LangSmith, LLM-Harness) and a track record of turning eval results into product-ready gating.
  • Observability chops: you've wired up tracing/metrics for distributed systems (OpenTelemetry, Prometheus, Grafana) and know how to set SLOs that actually matter.
  • Prompt-engineering fluency (few-shot, function calling, RAG orchestration) and an instinct for spotting ambiguity or jailbreak vectors.
  • Production-grade Python/TypeScript skills and comfort shipping through CI/CD (GitHub Actions, Terraform, Docker/K8s).
  • A bias for experimentation: you automate A/B tests, cost-latency trade-off studies, and rollback safeguards as part of the dev cycle.

It would be great if you have:
  • Experience scaling multi-agent planners or tool-using agents in real products.
  • Familiarity with vector databases, semantic diff tooling, or RLHF/RLAIF pipelines.
  • A knack for weaving human feedback (support tickets, thumbs-downs) into automated regression tests.

Our Culture

  • Think differently. We value novel ideas over legacy playbooks, and we give you room to explore.
  • People first. You instrument systems so users never feel the bumps; you collaborate so teammates never feel stuck.
  • Pragmatic craftsmanship. We ship fast, but we measure twice: data accuracy, latency budgets, and reliability all matter.
  • Clear communicators. You translate metrics into stories that product managers and designers understand, sparking better decisions.

Join us if you want to make AI that works: every request, every time.
Posted 2025-07-15
