Sholinganallur, Chennai
Specialty Development Consultant #1021739
Job Description:
- As a Site Reliability Engineer at Company, you will play a pivotal role in elevating the performance and dependability of GDI&A platforms and applications.
- Participate in 24x7 on-call production support rotations and handle incident response to minimize disruptions.
- Continuously monitor the availability, reliability, and performance of systems, platforms, and applications, maintaining a holistic view of system health.
- Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
- Provide primary operational and engineering support for multiple large, distributed software applications.
- Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans.
- Use automation tools, scripts, and processes to reduce or eliminate repetitive tasks, thereby improving the support provided by Site Reliability Engineering.
- Create or modify Terraform files, following the established formats, to develop new monitoring dashboards and alert policies.
Skills Required:
- Python, Java, C/C++, Ruby, and JavaScript
- J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes
- RESTful APIs and microservices platforms
- Experience with any APM or other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, DataDog, PagerDuty
- Strong experience working with product and development teams to establish error budgets by identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), and effectively driving the use of the budget to maximize domain availability/uptime
Experience Required:
- 5-6 years
Education Required:
- Bachelor’s degree (or equivalent) in computer science or related discipline