Senior Site Reliability Engineer
We are seeking a Senior Site Reliability Engineer to drive the deployment, monitoring and automation of critical infrastructure, ensuring high system reliability and performance for our customers.
Responsibilities:
* Deploy and manage Prometheus for real-time data monitoring and collection
* Deploy and manage Prometheus Exporters to gather metrics from diverse sources
* Develop custom Prometheus Exporters in Python for systems lacking native Prometheus support
* Integrate Grafana for visualization and develop insightful dashboards to facilitate data-driven decision-making
* Configure alerts in Grafana for rapid notification of system outages
* Configure GitLab CI pipelines to support continuous integration workflows
* Manage service configuration and infrastructure automation with Ansible and Terraform
* Oversee containerized services and deployments in Docker and Kubernetes
* Collaborate on network establishment and change requests with relevant stakeholders
* Maintain direct communication channels with customer teams
* Work with other team members to ensure system scalability, reliability and efficiency
* Assist in troubleshooting and resolving system-related issues to maintain high system uptime and performance
Requirements:
* 3+ years of work experience in Site Reliability Engineering or related roles
* Expertise in DevOps, Grafana, Grafana Mimir and Prometheus
* Proficiency in deploying and managing Prometheus Exporters and developing custom exporters in Python
* Skills in integrating and configuring Grafana dashboards and alerts
* Competency in configuring GitLab CI pipelines
* Background in infrastructure automation using Ansible and Terraform
* Knowledge of managing containerized services with Docker and Kubernetes
* Capability to collaborate effectively with customer teams and internal stakeholders
* Understanding of troubleshooting methodologies for system-related issues
* Flexibility to work on network establishment and change requests
* Upper-Intermediate English language proficiency (B2)
We offer:
* We connect like-minded people
  * Delivering innovative solutions to industry leaders, making a global impact
  * Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
  * Opportunity to work abroad for up to two months per year
  * Relocation opportunities within our offices in 55+ countries
  * Corporate and social events
* We invest in your growth
  * Leadership development, career advising, soft skills and well-being programs
  * Certifications, including GCP, Azure and AWS
  * Unlimited access to LinkedIn Learning and Udemy
  * Free English classes with certified teachers
* We cover it all
  * Participation in the Employee Stock Purchase Plan
  * Monetary bonuses for engaging in the referral program
  * Comprehensive medical & family care package
  * Five trust days per year (sick leave without a medical certificate)
  * Benefits package (sports activities, a variety of stores and services)
EPAM Georgia is a team of innovators united by a passion for technology. Our dynamic and inclusive culture helps us make a positive impact on our communities, clients, and employees. Here you will collaborate with multinational teams, contribute to numerous cutting-edge projects, deliver creative solutions, and have the opportunity to learn. Our people are at the heart of our success, and we are proud to give them solid ground on which to develop and grow.
