Our client, a regional bank, is looking for an SRE Lead to drive the development of resilience solutions across the Asia Pacific region. In this role, you will strengthen the reliability, observability, operability, and scalability of the bank's applications and business flows, and you will be responsible for the uptime, health, and performance of its services. This involves resolving service-impacting issues, proactively identifying and addressing emerging problems, and providing feedback that improves the long-term reliability of the platform. You should have strong knowledge of SRE practices and be skilled in planning, execution, and reporting.
Job Responsibilities:
Lead SRE and QA efforts across the Asia Pacific to align with priorities and the bank's strategy.
Collaborate with teammates and foster a strong team culture.
Define and enforce SLAs, SLOs, and SLIs for each country.
Manage stakeholder demands while maintaining quality and timely delivery.
Collaborate with application developers and architects to ensure scalability and performance.
Develop monitoring solutions on existing observability platforms.
Communicate effectively with Engineering and Product teams about system performance and reliability.
Create and execute test plans/strategies for performance and reliability.
Improve product reliability through monitoring and the application of best practices.
Identify defects and validate system functionality.
Refine development, build, and deployment processes.
Bridge communication between application and infrastructure teams.
Provide expertise in capacity management for robust and cost-effective deployments.
Implement best practices for environment management.
Act as a quality and reliability ambassador in an Agile software development team.
Maintain and communicate testing timelines and status reports.