Are you looking for a space for innovation, diversity, and self-development? Then Tazetec has the right challenge for you!
Tazetec is looking for a **Site Reliability Engineer (SRE)**
Your profile:
* Experience with AWS or hybrid data center setups;
* Reading logs and stacktraces to determine the root cause of the incident;
* Infrastructure as Code: Terraform, Helm, Ansible, (optional) Werf;
* Linux administration and container orchestration (K8s) skills;
* Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.;
* Strong understanding of TCP/IP, DNS and load balancers;
* Familiarity with incident response, postmortems, and blameless culture;
* Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.
Bonus points for:
* Background in high-throughput environments (e.g., financial, trading, etc.);
* Experience with CDNs and real-time log aggregation;
* Proficiency in one or more scripting languages (Python, Bash, Go);
* Knowledge of Java or PHP and their respective web-development frameworks;
* Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.;
* Exposure to Kafka, Redis or other event-driven systems.
Your day-to-day contribution:
* Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g. KYC, payments);
* Manage and support highly available, scalable infrastructure (K8s, cloud and bare metal);
* Implement and manage monitoring, logging, and alerting (e.g., Prometheus, Grafana, Loki, ELK);
* Automate deployments and operations using CI/CD pipelines (Jenkins, ArgoCD, Helm, etc.);
* Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR);
* Participate in on-call rotation to ensure 24/7 system reliability;
* Secure infrastructure in line with regulations (e.g., data integrity, jurisdictional compliance);
* Collaborate with Dev, QA, DevOps, and Ops to improve service stability and uptime.
Success Metrics:
* < 1% downtime for any user-/partner-facing services;
* SLO 99.95%;
* 95% of infrastructure managed via code and automation;
* Documented runbooks and alert playbooks per service group.
What's next:
* I step: HR interview (30 minutes);
* II step: technical interview (1 hour);
* III step: introduction to the team (1 hour);
* IV step: final decision.
Our benefits package:
* Inspiring and diverse culture surrounded by experienced and enthusiastic colleagues;
* Flexible working hours to ensure your work-life balance;
* Ability to choose to work remotely or at the office;
* Star players/Top Salary policy;
* Paid vacation days;
* An attractive package of medical insurance;
* Opportunities for personal and professional development: monthly compensation for your leisure activities, fully paid courses and workshops, etc.;
* An active corporate life: team-building events, sports activities, corporate parties, etc.;
* A collaborative and welcoming environment for your initiatives.
Are you ready to take on this challenge? Send your English CV to recruitment@tazetec.com