DevOps/SRE Lead - Bash/Python/Go, Docker, Kubernetes, Helm, Cloud, Prometheus/Grafana
UST
Bengaluru, Karnataka, India
Full-Time
On-site
Posted 4 months ago
Apply by May 4, 2026
Job Description
Job Summary:
We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services. You will play a critical role in ensuring high availability, performance, scalability, and security of our production systems, while enabling continuous deployment and rapid delivery of features to our customers.
Key Responsibilities
Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
Develop and improve observability using monitoring, alerting, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).
Maintain disaster recovery plans, backups, and high-availability strategies.
Qualifications
Required:
7+ years of experience as an SRE or DevOps Engineer, or in a similar role.
Proficiency in scripting and automation (Bash, Python, Go, etc.).
Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
Solid understanding of Linux systems administration and networking fundamentals.
Experience with cloud platforms (AWS, Azure, or GCP).
Experience with IaC tools like Terraform or CloudFormation.
Familiarity with GitOps and modern deployment practices.
Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).
Strong troubleshooting and incident response skills.
Preferred:
Experience in a high-traffic, microservices-based architecture.
Exposure to service meshes (Istio, Linkerd).
Certifications (AWS Certified DevOps Engineer, CKA, etc.).
Experience with security automation and compliance (e.g., SOC2, ISO27001).
Soft Skills
Strong communication and collaboration abilities.
Ability to thrive in a fast-paced, agile environment.
Analytical mindset and proactive approach to problem-solving.
A passion for automation, performance, and system design.
Skills
DevOps/SRE, Bash/Python/Go, Docker, Kubernetes, Helm, Cloud, Prometheus/Grafana