Work with us
We're a small team doing real work with enterprise clients. Fully remote, no fluff.
Open positions
AI Developer in Test
Fully Remote · Full-time · Immediate start
We're looking for experienced AI Developers in Test to validate LLM-based applications, multi-agent systems, and RAG pipelines. You'll make sure AI outputs are safe, reliable, and actually useful for end users.
AI System Validation & Testing
- Design and implement testing frameworks for LLM-based applications, multi-agent systems and RAG pipelines
- Evaluate AI outputs for accuracy, factual correctness, contextual relevance, bias, safety and compliance
- Implement governance workflows, risk scoring and compliance reporting
- Set up continuous monitoring for production AI systems with automated alerts for anomalies
Testing Infrastructure & Automation
- Validate LLM APIs across multiple providers using tools such as Postman, REST Assured and pytest
- Integrate AI testing suites into CI/CD pipelines for regression testing, benchmarking and deployment gates
- Version prompts and test cases with Git and with MLOps tools such as MLflow or Weights & Biases
Data Engineering & Evaluation
- Build evaluation datasets from real-world scenarios, working with JSON, JSONL, CSV and Parquet
- Design automated metrics and human-in-the-loop review workflows, including inter-annotator agreement checks
- Create and maintain domain-specific evaluation benchmarks and ground truth datasets
Leadership & Client Work
- Build and lead a testing team and be the go-to expert on AI quality
- Own client relationships, driving retention and growth
- Share your knowledge within the B-Sure Digital consulting community
What you need
- Hands-on experience with GenAI apps — prompt engineering, API integrations, output workflows
- Familiarity with RAG, vector databases, embedding strategies and multi-agent systems
- Understanding of GenAI challenges: hallucinations, prompt sensitivity, output variability
- Solid testing background in enterprise environments
- Proficiency with API testing frameworks, CI/CD and Git
- Data handling skills and knowledge of AI evaluation methodologies
- Comfortable working in ambiguous environments — this is a new field, you'll help define it
- Strong communication skills — you'll talk to both engineers and stakeholders
Nice to have
- Experience using LLMs (ChatGPT, Claude, etc.) for test generation
- Knowledge of computer vision for visual testing
- Experience with AWS, Azure, or GCP
- ISTQB or similar certifications
What you get
- Competitive salary
- Fully remote — work from wherever
- Budget for training and conferences
- Access to the latest AI tools and platforms
- Diverse international client projects
Don't see your role?
We're always open to hearing from good engineers. Drop us a message and we'll talk.