

Site Reliability Engineer (SRE)

🌍 Fully Remote (offices in Limassol, Kyiv, London, Tbilisi)

🕓 Shifts: 17:00–01:00 or 00:00–08:00 CET

Our client is one of the fastest-growing B2B iGaming solution providers in Europe, with over 100 remote team members across the continent. They deliver high-quality software platforms, payment integrations, marketing tools, and technical support for online casino and betting operators.

As the company continues to expand, we’re looking for an experienced Site Reliability Engineer who will help strengthen infrastructure reliability and scalability, ensuring top performance under high transaction volumes and strict regulatory demands.

Key Requirements:
• 2–5 years in SRE / Infrastructure / Platform / Production DevOps
• Strong Linux experience in production
• Networking: TCP/IP, DNS, HTTP, load balancers, TLS
• Kubernetes in production (cluster ops, networking, ingress)
• AWS experience (EC2, ALB/NLB, RDS, S3, IAM, EKS or self-managed K8s)
• Terraform, Ansible (IaC), Helm (optional)
• Observability tools: Prometheus, Alertmanager, Grafana, ELK, Loki
• Containers and image lifecycle (Docker)
• Troubleshooting across application, network, and infrastructure layers
• CI/CD pipelines: Jenkins, GitLab CI, GitHub Actions, ArgoCD
• Incident response experience and participation in post-incident reviews
• Availability for late-evening and night shifts (17:00–01:00 or 00:00–08:00 CET)

Bonus Skills:
• Experience with high-load or real-time systems
• CDNs, log aggregation, real-time analytics
• Scripting: Python, Bash, Go
• Knowledge of Java/PHP ecosystems
• Databases: PostgreSQL, MySQL, MongoDB
• Message systems: Kafka, Redis, RabbitMQ
• External API integrations

Key Responsibilities:
• Ensure reliability, scalability, and performance of distributed services
• Operate and improve Kubernetes clusters
• Manage AWS-based infrastructure
• Build and maintain IaC with Terraform and Ansible
• Enhance monitoring, logging, and alerting stacks
• Handle production incidents end-to-end and reduce MTTR
• Maintain SLOs, SLIs, and error budgets for critical systems
• Automate operations and reduce manual toil
• Collaborate with engineering teams to embed SRE practices

Success Metrics:
• < 1% downtime for critical services
• SLO: 99.85–99.95% availability
• 90–95% of infrastructure managed as code
• Consistent reduction of MTTR
• Completed post-incident actions and improved system resilience

Why you’ll love it:
✅ International, respectful, and goal-driven team
✅ Competitive salary based on experience
✅ Fully remote work with optional office access (Limassol, Kyiv, London, Tbilisi)
✅ Flexible schedule: performance matters more than hours
✅ Unlimited paid time off and sick leave
✅ Private medical insurance
✅ Wellness & learning compensation (gym, courses, Netflix, spa days, etc.)
✅ Career development opportunities and biannual learning raffles
✅ Modern tech stack and challenging high-load projects
✅ Supportive culture with team-building events and referral bonuses

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development

OnHires is hiring a Site Reliability Engineer.
