Generative AI Operations Engineer (GenAI Ops)
We are seeking a highly motivated and experienced **Generative AI Operations (GenAI Ops) Engineer** to join our innovative team.
In this role, you will be at the forefront of the AI revolution, responsible for building, deploying, and maintaining the operational infrastructure for our cutting-edge generative AI models and services. You will work closely with data scientists, machine learning engineers, and software developers to ensure our GenAI applications—especially complex, multi-agent systems—are scalable, reliable, and efficient across major cloud platforms. If you are passionate about operationalizing large-scale AI systems and want to make a significant impact, this is the role for you.
*Experience the freedom of remote work from anywhere in Georgia, whether from the comfort of your home, our modern offices in Tbilisi and Batumi, or a coworking space in Kutaisi.*
**Responsibilities**
* Build and Manage CI/CD Pipelines: Design, implement, and maintain robust, automated CI/CD pipelines for training, evaluating, and deploying large language models (LLMs) and AI agents
* Orchestrate Agentic AI Workflows: Design, deploy, and manage sophisticated, multi-agent systems. Ensure seamless Agent-to-Agent (A2A) communication and collaboration between specialized agents to automate complex business processes
* Manage Tool Integration: Implement and manage secure, scalable integrations between AI agents and external tools/APIs, leveraging open standards like the Model Context Protocol (MCP) to ensure interoperability
* Leverage AI-Powered Development: Utilize AI-powered development tools to accelerate the entire software development lifecycle, from writing infrastructure code and tests to troubleshooting operational issues in cloud environments
* Infrastructure as Code (IaC): Utilize cloud-native IaC services or cloud-agnostic tools like Terraform to define and manage the infrastructure required for GenAI workloads
* Model Monitoring and Observability: Implement comprehensive monitoring and logging solutions to track model and agent performance, resource utilization, and system health. For agentic systems, this includes tracing the agent’s actions and logging the multi-step conversational flow
* Scalability and Performance Optimization: Design and implement scalable architectures for model serving and inference. Continuously optimize the performance and cost-effectiveness of our GenAI services
* Security and Compliance: Implement and enforce security best practices for our GenAI infrastructure and data. Ensure compliance with industry standards and regulations
**Requirements**
* Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
* 3+ years of experience in a DevOps, SRE, or MLOps role with a focus on cloud infrastructure
* Proven experience with cloud services from major providers like AWS, Google Cloud, or Azure
* Strong experience building and managing CI/CD pipelines using tools like Jenkins, GitLab CI, or cloud-native services
* Proficiency in at least one scripting language (e.g., Python, Bash)
* Hands-on experience with Infrastructure as Code (IaC) tools such as AWS CDK, CloudFormation, or Terraform
* Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
* Strong English communication skills (B2+ level)
**Nice to have**
* Master’s degree or PhD in Computer Science, AI, Machine Learning, or a related field
* Experience with cloud-native GenAI services like AWS Bedrock, Azure AI Foundry, or Google Vertex AI
* Familiarity with the architecture and operational challenges of Large Language Models (LLMs)
* Experience designing or managing multi-agent systems or complex, orchestrated workflows
* Knowledge of monitoring and observability tools like Prometheus, Grafana, or Datadog
* Relevant cloud or DevOps certifications
* Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment
**We offer**
* **We connect like-minded people**
  * Delivering innovative solutions to industry leaders, making a global impact
  * Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
  * Opportunity to work abroad for up to two months per year
  * Relocation opportunities across our offices in 55+ countries
  * Corporate and social events
* **We invest in your growth**
  * Leadership development, career advising, soft skills and well-being programs
  * Certifications, including GCP, Azure and AWS
  * Unlimited access to LinkedIn Learning and getAbstract
  * Free English classes with certified teachers
* **We cover it all**
  * Participation in the Employee Stock Purchase Plan
  * Monetary bonuses for engaging in the referral program
  * Comprehensive medical & family care package
  * Five trust days per year (sick leave without a medical certificate)
  * Benefits package (sports activities, a variety of stores and services)
EPAM Georgia is a team of innovators united by a passion for technology. Our dynamic and inclusive culture helps us make a positive impact on our communities, clients, and employees. Here you will collaborate with multinational teams, contribute to a wide range of cutting-edge projects, deliver creative solutions, and have the opportunity to learn. Our people are at the heart of our success, and we are proud to give them solid ground on which to develop and grow.