Chennai, Tamil Nadu
Site Reliability Engineering Engineer 3 #1043101
Job Description:
- Employees in this job function are responsible for ensuring availability, reliability, and performance of cloud and network systems and services by automating routine manual tasks.
Key Responsibilities:
- Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
- Provide helpful and actionable feedback and review for code or production changes.
- Drive repair and optimization of complex systems, taking a wide range of contributing factors into account.
- Lead debugging, troubleshooting, and analysis of service architecture and design.
- Participate in on-call rotation.
- Write documentation: design, system analysis, runbooks, playbooks.
- Provide design feedback and help raise the design skills of others.
- Implement and manage SRE monitoring application backends using Golang, Postgres, and OpenTelemetry.
- Develop tooling using Terraform and other IaC tools to ensure visibility and proactive issue detection across our platforms.
- Work within GCP infrastructure, optimizing performance and cost, and scaling resources to meet demand.
- Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
- Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
- Troubleshoot and resolve issues in our dev, test, and production environments.
- Participate in postmortem analysis and create preventative measures for future incidents.
- Implement and maintain security best practices across our infrastructure, ensuring compliance with industry standards and internal policies.
- Participate in security audits and vulnerability assessments.
- Identify and address performance bottlenecks through code profiling, system analysis, and configuration tuning.
- Implement and monitor performance metrics to proactively identify and resolve issues.
- Contribute to internal knowledge bases and documentation.
Skills Required:
- Go, API
Skills Preferred:
- Dynatrace, GCP
Experience Required:
- Engineer 3 Exp: Prac. in 2 coding lang. or adv. prac. in 1 lang.; 6+ years in IT; 4+ years in development
Education Required:
- Bachelor's Degree