Company Description:
We are TBC, a technology company where a bold and determined team creates customer-oriented services, products, and opportunities. Through innovation and technology, we fulfill our mission and make people's lives easier. We support your growth, help you achieve your goals, and empower you to create your own success story; all that matters is that you believe.
Job Description:
We are looking for a DevOps Engineer to join our Data Department, which is building Big Data and AI Platforms from scratch. You will play a key role in designing, deploying, and scaling the core infrastructure that powers big data processing, machine learning pipelines, and enterprise analytics. This position involves working across Microsoft Azure and collaborating with data engineers, ML engineers, and analytics teams to build a secure, automated, and cost-efficient foundation for the organization's future Data & AI ecosystem.
Key Responsibilities:
* Design, deploy, and operate core infrastructure for a new Data and AI Platform, covering data ingestion, transformation, ML model training, and analytical workloads.
* Architect and manage Azure resources (subscriptions, IAM, networking, monitoring, and FinOps governance) to support large-scale data and ML environments.
* Build and operate Kubernetes-based platforms (AKS) to orchestrate microservices across data, ML, and analytics layers.
* Implement and maintain microservices and event-driven architectures, leveraging Ingress controllers, service meshes, and distributed load balancing.
* Develop and maintain Infrastructure as Code (IaC) using Terraform and Terragrunt, building modular, reusable, and environment-specific components that follow the DRY principle.
* Establish GitOps workflows with Argo CD and Azure DevOps, ensuring fully automated, auditable, and consistent deployments across all environments.
* Implement monitoring and observability stacks (Prometheus, Grafana, Azure Monitor) for end-to-end visibility into data, compute, and network layers.
* Apply FinOps principles: perform cost analysis, tagging, budgeting, and optimization (a minimal tagging-audit sketch follows this list).
* Collaborate with Data and ML teams to deploy and manage core platforms and tools such as Databricks, MLflow, and vector-enabled databases for AI workloads.
* Manage high-performance load balancers for real-time ML inference and large-scale data services.
* Ensure secure network architecture across hybrid environments, managing VNETs, subnets, private endpoints, DNS (Azure Private Resolver), and VPN routing.
* Contribute to the design of scalable, cost-effective, and reliable data infrastructure, from concept to production.
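
To give candidates a concrete flavor of the FinOps responsibility above, here is a minimal, illustrative Python sketch of a tag-compliance audit using the Azure SDK. The required tag keys and the AZURE_SUBSCRIPTION_ID environment variable are assumptions made for the example, not a description of our actual policy.

```python
# Illustrative sketch: audit Azure resources for missing FinOps tags.
# Assumes the azure-identity and azure-mgmt-resource packages and an
# already-authenticated environment (e.g., `az login`).
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

REQUIRED_TAGS = {"cost-center", "environment", "owner"}  # hypothetical policy


def untagged_resources(subscription_id: str):
    """Yield (resource id, missing tag keys) for non-compliant resources."""
    client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)
    for resource in client.resources.list():
        missing = REQUIRED_TAGS - set((resource.tags or {}).keys())
        if missing:
            yield resource.id, sorted(missing)


if __name__ == "__main__":
    sub = os.environ["AZURE_SUBSCRIPTION_ID"]  # hypothetical env var
    for resource_id, missing in untagged_resources(sub):
        print(f"{resource_id}: missing {', '.join(missing)}")
```

In practice, a check like this would run in a scheduled pipeline and feed tagging reports into cost analysis and budgeting reviews.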
Qualifications:
* 3+ years of hands-on DevOps experience, preferably in data or AI-focused environments.
* Proven experience building or scaling cloud-native data and ML platforms from the ground up.
* Deep knowledge of cloud infrastructure concepts, governance, and automation.
* Strong background in Linux administration, containerization, and Kubernetes orchestration (AKS, on-prem, or hybrid).
* Expertise with Terraform and Terragrunt, building modular IaC for multi-environment deployments.
* Proficiency with CI/CD and GitOps automation using Argo CD, Flux, or Azure DevOps.
* Hands-on experience with observability tools (Prometheus, Grafana, Azure Monitor); see the query sketch after this list.
* Solid understanding of networking principles (VNETs, DNS, routing, VPNs, firewalls).
* Excellent problem-solving, automation, and documentation skills.
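
As an illustration of the observability expectations above, the following Python sketch runs an instant query against Prometheus's HTTP API. The endpoint URL and the http_requests_total metric names are placeholders assumed for the example.

```python
# Minimal sketch: query Prometheus's HTTP API for an error-rate signal.
import requests

PROM_URL = "http://prometheus.example.internal:9090"  # hypothetical endpoint


def instant_query(expr: str) -> list:
    """Run a PromQL instant query and return the result vector."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query", params={"query": expr}, timeout=10
    )
    resp.raise_for_status()
    payload = resp.json()
    if payload["status"] != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Example: 5xx error ratio over the last 5 minutes (metric names are assumptions).
expr = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)
for sample in instant_query(expr):
    print(sample["metric"], sample["value"])
```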
Additional Information:
Nice to Have:
* Experience with Databricks for data engineering, analytics, or ML workflows.
* Exposure to data lakehouse architectures (Delta Lake, Parquet, Synapse).
* Experience with FinOps automation tools (Kubecost, CloudHealth, Azure Cost Management).
* Familiarity with policy-as-code frameworks (OPA, Azure Policy, Conftest).
* Experience with service mesh and API gateway technologies (Istio, Linkerd, Kong).
* Familiarity with autoscaling frameworks (Karpenter, Cluster Autoscaler) for dynamic data workloads.
* Experience implementing FinOps practices: cost monitoring, usage optimization, and budgeting.
* Experience with message brokers and streaming platforms (Kafka, NATS, Azure Service Bus); a minimal producer sketch follows this list.
* Collaboration experience with Data Engineering, Analytics, or MLOps teams.
* Exposure to LLM-based or AI-driven workloads and model deployment strategies.
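
For the streaming item above, here is a minimal sketch using the kafka-python client; the broker address, topic name, and event payload are illustrative assumptions.

```python
# Minimal sketch: publish a JSON event to Kafka with kafka-python.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka.example.internal:9092"],  # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send a sample ingestion event and block until the broker acknowledges it.
future = producer.send("data-ingestion-events", {"source": "crm", "rows": 1024})
record_metadata = future.get(timeout=10)
print(f"delivered to {record_metadata.topic}[{record_metadata.partition}]")
producer.flush()
```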
