Senior Database Reliability Engineer (DBRE) & Architect
This position is open at a global product-led IT company specializing in infrastructure stability and security solutions. Their products are recognized as the industry standard in the Hosting and Enterprise segments, powering over 500,000 servers worldwide.
In 2025, the company is evolving its data management strategy, shifting from traditional database administration to an Internal Database-as-a-Service (DBaaS) model. This role requires a visionary engineer to design resilient distributed systems, automate infrastructure through code, and transform databases into a reliable service for product teams. This is an ideal opportunity for those ready to handle petabytes of data and build high-scale platform solutions.
Key Challenges & Responsibilities:
Designing and implementing a self-service platform (Terraform + Ansible) for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) in a heterogeneous environment (Bare Metal, OpenNebula, K8s, Public Clouds)
Managing rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes), focusing on sharding, ReplicatedMergeTree tables, and reliable S3 backup pipelines under high load
Maintaining and scaling infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools
Implementing SRE practices in data management: replacing manual incident response with automated self-healing mechanisms and defining SLOs/SLIs
Migrating legacy solutions to modern cloud patterns and implementing Kubernetes operators for stateful workloads
Serving as a technical authority for product teams to optimize data schemas and SQL queries for high-load systems
Tech Stack:
DB: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka
Data & Analytics: Apache Airflow, Redash
Infrastructure: Hybrid Cloud (3+ private DCs, OpenNebula, K8s, Bare Metal, AWS, GCP, Azure, DO)
IaC & CI/CD: Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit
Observability: VictoriaMetrics, Grafana, Loki
Requirements
Must have:
5+ years of PostgreSQL expertise: deep knowledge of MVCC, locking mechanics, expert-level Patroni/PgBouncer configuration, and experience with seamless major version upgrades under load
ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and performance diagnostics at the data-part level
Engineering mindset (SRE/DevOps): experience writing complex Terraform modules and Ansible roles; proficiency in Python or Go for automation is a major asset
Hybrid environment experience: understanding the nuances of running DBs on Bare Metal vs. Kubernetes vs. Public Cloud, with the ability to optimize TCO and disk subsystem performance (NVMe, Network Storage)
Systems approach: understanding the full stack from network packets to business logic, including security standards (FIPS, Audit logs) and Disaster Recovery
Nice to have:
Experience building an Internal Developer Platform (IDP)
Experience operating databases in Kubernetes via operators (CloudNativePG, Altinity Operator)
Background working with Cloud or Hosting providers on similar services
Benefits
Fully remote work from any location worldwide and flexible working hours
Opportunity to impact architectural decisions for services used by thousands of companies globally
24 days of vacation, 10 national holidays, and unlimited paid sick leave
Compensation for private medical insurance
Reimbursement for co-working spaces and gym/sports activities
Dedicated budget for education, training, and conferences
Reward program for innovative ideas that lead to company patents
