Lead Site Reliability Engineer (Azure) Pune, India

Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a highly skilled and experienced Lead Site Reliability Engineer with a focus on Azure environments to join our team.

In this crucial role, you will leverage your expertise to enhance the reliability and scalability of our cloud-based platforms, ensuring efficient operation and optimal performance. This position involves collaborating closely with cross-functional teams to migrate existing services to the OpenShift platform and make our infrastructure cloud-agnostic. As a leader, you will guide your team in creating resilient systems and processes that support both internal and external customers relying on our desktop applications and services.


Responsibilities

  • Oversee the migration of services to OpenShift and work towards making our infrastructure cloud-agnostic
  • Run pipelines using Azure DevOps for environment configuration and application deployment
  • Leverage Python, Bash, and PowerShell to automate routine and complex tasks
  • Implement and manage Kubernetes and container-based environments
  • Monitor cloud resources efficiently and improve system performance in line with SLI metrics
  • Debug and resolve operational issues swiftly and effectively
  • Collaborate with development and operations teams to ensure system reliability and security
  • Mentor team members and lead by example in maintaining best practices for site reliability
  • Continuously assess, improve, and optimize existing system architecture and applications
  • Stay up to date with technological advancements and integrate innovative tools and techniques

Requirements

  • 5+ years of experience as a Systems Engineer with a development background
  • 1+ years of relevant leadership experience
  • Proficiency in Linux and Docker with hands-on experience in Kubernetes
  • Capability to use at least one of the following scripting languages: Python, Bash, PowerShell
  • Background in infrastructure management including networking and operating systems
  • Familiarity with monitoring tools in cloud environments and understanding of SLI concepts
  • Familiarity with Azure and/or GCP as cloud service providers

Nice to have

  • Experience working with Windows
  • Knowledge of CI/CD pipelines, particularly Azure DevOps
  • Understanding of Istio and GitOps tools like ArgoCD

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short- and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
