(352) FASTTEK | (352) 327-8835
FASTTEK GLOBAL powered by Fast Switch - Great Lakes
info@fasttek.com
Role Description
Employees in this job function are responsible for ensuring the availability, reliability, and performance of cloud and network systems and services by automating routine manual tasks.
Key Responsibilities:
- Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
- Review code and production changes, providing helpful and actionable feedback.
- Drive the repair and optimization of complex systems, taking into account a wide range of contributing factors.
- Lead debugging, troubleshooting, and analysis of service architecture and design.
- Participate in on-call rotation.
- Write documentation, including design documents, system analyses, runbooks, and playbooks. Provide design feedback and help others improve their design skills.
- Implement and manage SRE monitoring application backends using Golang, Postgres, and OpenTelemetry. Develop tooling using Terraform and other infrastructure-as-code (IaC) tools to ensure visibility and proactive issue detection across our platforms.
- Work within GCP infrastructure, optimizing performance and cost, and scaling resources to meet demand.
- Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
- Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
- Troubleshoot and resolve issues in our dev, test, and production environments.
- Participate in postmortem analysis and create preventative measures for future incidents.
- Implement and maintain security best practices across our infrastructure, ensuring compliance with industry standards and internal policies. Participate in security audits and vulnerability assessments.
- Identify and address performance bottlenecks through code profiling, system analysis, and configuration tuning. Implement and monitor performance metrics to proactively identify and resolve issues.
- Contribute to internal knowledge bases and documentation.
Skills Required
GCP, Go, Dynatrace