Work with us

We're a small team doing real work with enterprise clients. Fully remote, no fluff.

Open positions

AI Developer in Test

Fully Remote · Full-time · Immediate start

Open

We're looking for experienced AI Developers in Test to work on validating LLM-based applications, multi-agent systems, and RAG pipelines. You'll make sure AI outputs are safe, reliable, and actually useful for end users.

AI System Validation & Testing

Design and implement testing frameworks for LLM-based applications, multi-agent systems and RAG pipelines
Evaluate AI outputs for accuracy, factual correctness, contextual relevance, bias, safety and compliance
Implement governance workflows, risk scoring and compliance reporting
Set up continuous monitoring for production AI systems with automated alerts for anomalies

Testing Infrastructure & Automation

Validate LLM APIs across multiple providers using Postman, REST Assured and pytest
Integrate AI testing suites into CI/CD pipelines for regression testing, benchmarking and deployment gates
Version prompts and test cases with Git, and track experiments with MLOps tools such as MLflow or Weights & Biases

Data Engineering & Evaluation

Work with JSON, JSONL, CSV and Parquet data, and build evaluation datasets from real-world scenarios
Design automated metrics and human-in-the-loop workflows including inter-annotator agreement
Create and maintain domain-specific evaluation benchmarks and ground truth datasets

Leadership & Client Work

Build and lead a testing team, and be the go-to expert on AI quality
Own client relationships, including retention and growth
Share your knowledge within the B-Sure Digital consulting community

What you need

Hands-on experience with GenAI apps — prompt engineering, API integrations, output workflows
Familiarity with RAG, vector databases, embedding strategies and multi-agent systems
Understanding of GenAI challenges: hallucinations, prompt sensitivity, output variability
Solid testing background in enterprise environments
Proficiency with API testing frameworks, CI/CD and Git
Data handling skills and knowledge of AI evaluation methodologies
Comfortable working in ambiguous environments: this is a new field, and you'll help define it
Strong communication skills — you'll talk to both engineers and stakeholders

Nice to have

Experience using LLMs (ChatGPT, Claude, etc.) for test generation
Knowledge of computer vision for visual testing
Experience with AWS, Azure, or GCP
ISTQB or similar certifications

What you get

Competitive salary
Fully remote — work from wherever
Budget for training and conferences
Access to the latest AI tools and platforms
Diverse international client projects


Don't see your role?

We're always open to hearing from good engineers. Drop us a message and we'll talk.
