Are you looking for a space for innovation, diversity, and self-development? Then Tazetec has the right challenge for you!

Tazetec is looking for a **Site Reliability Engineer (SRE)**

Your profile:
* Experience with AWS or hybrid data center setups;
* Reading logs and stack traces to determine the root cause of an incident;
* Infrastructure as Code: Terraform, Helm, Ansible, (optional) Werf;
* Linux administration and container orchestration (K8s) skills;
* Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.;
* Strong understanding of TCP/IP, DNS and load balancers;
* Familiarity with incident response, postmortems, and blameless culture;
* Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.

Bonus points for:
* Background in high-throughput environments (e.g., financial, trading, etc.);
* Experience with CDNs, and real-time log aggregation;
* Proficiency in one or more scripting languages (Python, Bash, Go);
* Knowledge of Java or PHP and their respective web-development frameworks;
* Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.;
* Exposure to Kafka, Redis or other event-driven systems.

Your day-to-day contribution:
* Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g. KYC, payments);
* Manage and support highly available, scalable infrastructure (K8s, cloud and bare metal);
* Implement and manage monitoring, logging, and alerting (e.g., Prometheus, Grafana, Loki, ELK);
* Automate deployments and operations using CI/CD pipelines (Jenkins, ArgoCD, Helm, etc.);
* Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR);
* Participate in on-call rotation to ensure 24/7 system reliability;
* Secure infrastructure in line with regulations (e.g., data integrity, jurisdictional compliance);
* Collaborate with Dev, QA, DevOps and Ops to improve service stability and uptime.

Success Metrics:
* < 1% downtime for any user-/partner-facing services;
* SLO 99.95%;
* 95% of infrastructure managed via code and automation;
* Documented runbooks and alert playbooks per service group.

What's next:
* I step: HR interview (30 minutes);
* II step: technical interview (1 hour);
* III step: introduction to the team (1 hour);
* IV step: final decision.

Our benefits package:
* Inspiring and diverse culture surrounded by experienced and enthusiastic colleagues;
* Flexible working hours to ensure your work-life balance;
* Ability to choose to work remotely or at the office;
* Star players/Top Salary policy;
* Paid vacation days;
* An attractive package of medical insurance;
* Opportunities for personal and professional development: monthly compensation for your leisure activities, fully paid courses and workshops, etc.;
* An active corporate life: team-buildings, sports activities, corporate parties, etc.;
* A collaborative and welcoming environment for your initiatives.

Are you ready to take on this challenge? Send your English CV to recruitment@tazetec.com
