Chennai, Tamil Nadu
Site Reliability Engineer #1059280
Job Description:
- Employees in this job function are responsible for ensuring the availability, reliability, and performance of cloud and network systems and services by automating routine manual tasks.
Key Responsibilities:
- Collaborate with infrastructure teams to implement critical solutions by automating routine tasks.
- Monitor and manage production environments, proactively identifying and resolving issues.
- Participate in building advanced tooling for system access monitoring, log session recording, and administration of reliability across multiple geographically distributed data centers.
- Engage with engineering teams to improve on-call efficiencies, drive incident management and post-mortem analysis.
- Perform capacity planning and optimization to support growing demands and traffic patterns.
- Maintain monitoring and alerting systems for proactive system health checks.
- Continuously improve system performance, stability, and security through data-driven analysis and optimization.
- Facilitate knowledge sharing by creating and maintaining comprehensive documentation and diagrams.
Skills Required:
- Dynatrace, Python, Full Stack Java Developer, API, GCP, Front End (Software Engineering)
Skills Preferred:
- DevOps, System Design, AI, Application Support
Experience Required:
- Engineer 3 experience: practical in 2 coding languages or advanced practical in 1 language
- 6+ years in IT; 4+ years in development
Experience Preferred:
- AI Ops, Observability, Cloud Engineering, DevOps
Education Required:
- Bachelor's Degree
Additional Information:
- Experience: 7-10 years in the IT industry with excellent software engineering experience.
- Education: B.E. or B.Tech. in CSE or IT only.
- Software engineering development experience in Python; hands-on coding expertise is mandatory.
- Secondary: DevOps exposure with any one cloud (GCP, Azure, or AWS).
- Experience with monitoring tools such as Dynatrace, Kibana, Splunk, or Grafana.
- Exposure to SRE implementation.