Hola Cash is empowering the LatAm public with the convenience, speed and reliability of cutting-edge payments technology that they can trust.
Through a friendly and intuitive interface, we are empowering our merchant customers to participate in the Digital Economy through a one-stop shop for payments acceptance.
With customers at the heart of our decisions, Hola Cash is removing barriers, building better user experiences, and providing greater access to financial services in Latin America, beginning with Mexico.
About the role:
We are looking for an experienced Site Reliability Engineer to work closely with our product, engineering and infrastructure teams. The Site Reliability Engineer will perform a mix of hands-on development, QA, security, and monitoring work, and will collaborate with other teams and stakeholders to help bring Hola Cash engineering systems and culture to the next level.
In this role, you’ll have the opportunity to drive reliability and availability across all Hola Cash systems and help make them more autonomous. As an SRE, you’ll drive maturity across microservices and across the company, helping engineering teams set effective SLIs/SLOs/SLAs, cultivate a blameless and diligent postmortem culture, strategize for disaster recovery, and focus intensively on monitoring systems and keeping them performant. Risk can’t be avoided, but it can be managed and mitigated, and that’s what the SRE team is here to do.
What You’ll Be Doing:
- Building automation, metrics collection, performance testing, and monitoring to directly improve the reliability, resiliency, and scalability of Hola Cash services and APIs.
- Researching and contributing to end-to-end monitoring systems that improve the performance, troubleshooting, and maintainability of our systems (Grafana, Elasticsearch, CloudWatch, Sentry).
- Building alerting, scaling and remediation plans/tools for any unexpected scenarios.
- Collaborating with teams on researching, setting, and tuning their SLIs/SLOs/SLAs to drive the best outcomes for customers.
- Enabling teams and services to test at the next level, including regression, load, and performance testing (Taurus, Gatling, etc.).
- Coaching teams and individuals on SRE culture, tooling, approaches, etc.
- Working with peer SREs/DevOps and engineering leaders to define the architectures and practices that should be adopted in order to deliver the best operational performance.
- Establishing best practices for development, architecture, deployment, and operations.
- Improving services and processes (including architecture reviews, incident response, monitoring) in a cross-functional manner throughout the engineering organization.
- Empowering systems to become more autonomous rather than merely automatic.
What We’ll Expect From You:
- Software engineering experience is a must.
- Flexibility to get up to speed with a variety of product-focused teams.
- Clear communication skills (both written and verbal) to document processes and architectures.
- Experience designing/implementing disaster recovery best practices.
- Knowledge of CI/CD and Infrastructure as Code tools (preferably GitHub Actions and Terraform).
Nice to have:
- Candidates should feel comfortable writing code in any language of their choice. Knowledge of architecture patterns and principles is a plus (SOLID principles, AWS Well-Architected Framework).