Role Summary
The Infrastructure & DevOps Team Lead is responsible for designing, operating, and continuously improving the company’s infrastructure and platform reliability. This role ensures high availability, scalability, security, and operational excellence of production systems.
You will lead the Infrastructure / DevOps team, define reliability standards (SLAs and SLOs), manage incidents, and work closely with Engineering, Security, and Business teams to support mission-critical systems.
Key Responsibilities
Infrastructure & Platform Leadership
• Design and operate on-prem, cloud, or hybrid infrastructure environments
• Lead and mentor Infrastructure / DevOps / Cloud Engineering team members
• Define architecture standards for high availability, disaster recovery, and business continuity
• Plan capacity, scalability, and the long-term infrastructure roadmap
Kubernetes & Container Platform
• Operate and manage Kubernetes clusters (RKE2, on-prem, or cloud)
• Manage ingress controllers, networking, and storage
• Establish environment isolation (DEV / UAT / PROD)
• Define deployment and scaling best practices
CI/CD, Automation & Infrastructure as Code
• Improve CI/CD pipelines with reliability, rollback, and observability in mind
• Reduce manual operations through automation, Infrastructure as Code, and standardization
Security & Compliance
• Collaborate with Security teams on:
o Patch management and vulnerability remediation
o Identity and access management (IAM) and access control
o Logging, auditing, and compliance requirements
• Support compliance with standards and frameworks such as ISO 27001 and NIST
Incident Management & Production Support
• Act as an escalation point for production incidents
• Define incident response processes and on-call rotations
• Coordinate with internal teams and external vendors during outages
People, Process & Documentation
• Mentor team members and support their skill development
• Create and maintain SOPs, runbooks, and operational playbooks
• Continuously improve operational processes and team effectiveness