DevOps/Cloud Engineer
Responsibilities:
Manage and operate Google Kubernetes Engine (GKE) clusters at platform level, including upgrades, dependency management, resource optimisation, and failure handling across multi-tenant environments.
Own and scale Weaviate vector database infrastructure, including cluster tuning, replication, sharding, and long-term operational stability in support of RAG-based AI systems.
Design and enforce security across the platform: GCP IAM, Workload Identity, VPC configuration, Private Service Connect, and Secret Manager.
Build and maintain end-to-end observability for LLM-based systems using Grafana, LangFuse, and LangSmith, covering performance, latency, token usage, and alerting.
Own infrastructure as code using Terraform, and manage GitOps workflows via ArgoCD or equivalent.
Take production ownership of the platform: not just initial deployment, but ongoing health, incident response, upgrades, and long-term system reliability.
Participate in customer-facing engagements including architecture reviews, technical discussions, and stakeholder meetings.
Support and mentor junior engineers on platform standards and best practices.
Requirements – Must have:
5+ years of hands-on experience in infrastructure or platform engineering.
Deep GKE and Kubernetes expertise: multi-cluster operations, RBAC, Workload Identity, node pool management, networking, and complex upgrades, at platform level rather than in single-application setups.
Proven experience operating multi-tenant shared infrastructure across teams or use cases.
Experience running Weaviate or comparable vector databases (e.g. Elasticsearch, OpenSearch) in production, with an understanding of retrieval systems and RAG architectures.
Strong GCP skills: IAM, VPC, Secret Manager, Cloud Monitoring, and GCP-native security practices.
Infrastructure-as-code experience with Terraform, plus GitOps delivery (ArgoCD, Flux, or equivalent).
Demonstrated production ownership: someone who has dealt with systems breaking, not just built them.
Requirements – Nice to have:
Experience with LangFuse or LangSmith for LLM observability.
Background in regulated industries.
Customer-facing experience in a consulting or professional services context.
Experience in the life sciences.
Languages: English (B2+).
