We are looking for an experienced Site Reliability Engineer (SRE) with strong English skills to join our client's team.
Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
Responsibilities:
- Monitor the health of production systems and ensure high availability.
- Improve reliability, quality and time-to-market of our software solutions.
- Measure and optimize system performance and resource usage.
- Provide primary operational support and engineering for multiple large distributed software applications.
- Gather and analyze metrics to enhance performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and automated releases.
- Participate in system design consulting, platform management, and capacity planning.
- Create immutable systems and services through automation.
- Respond to incidents, support our engineering teams with customer incidents, complete Root Cause Analysis (RCA) investigations, and improve team practices for handing off work and incidents.
- Set up monitoring and alerting based on symptoms rather than outages.
- Introduce automation and optimize tools/applications to reduce incidents.
- Debug production issues across different applications, services and levels of the stack.
- Guide teams in defining service-level agreements, objectives and indicators (SLAs/SLOs/SLIs), balancing performance, resilience and cost-effectiveness.
- Promote cloud-native principles and onboard systems to infrastructure-as-code and DevSecOps best practices.
Requirements:
- Broad experience with AWS.
- Familiarity with Disaster Recovery and High Availability strategies.
- Experience with infrastructure-as-code tools (Terraform, AWS CloudFormation).
- Experience containerizing applications for production and local development.
- Experience with CI/CD tools and scripting.
Nice to have:
- Familiarity with log aggregation and monitoring tools.
- Experience in cybersecurity/networking.
- Experience with Windows and Linux system administration.
What we offer:
- Global Career Opportunities.
- Tuition fee reimbursement for higher education.
- English Live training.
- Global Management Training.
- StrengthsFinder training.