Site Reliability Engineer, Group Consumer Banking and Big Data Analytics Technology, Technology & Operations
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Job Purpose
- Work with our business units, development teams, and other units to help maintain the high quality and service level objectives of our systems.
- Optimize the supportability of systems by applying basic SRE principles such as blameless post-mortems, error budgets, and automation.
- Provide production support for the application domain when applicable.
- Manage, monitor and operate the system to ensure all business functions are running smoothly.
- Work across teams to continually review systems, provide feedback, and implement best practices to improve the efficiency of the systems and drive future innovation.
- Manage on-going changes while retaining high levels of service availability to our customer base.
- Pragmatically identify the root cause of production incidents and lead the implementation of the actions needed to prevent recurrence.
- Drive incident management process and support a blameless post-mortem culture.
- Automate system operations to reduce toil and attain a high level of efficiency.
- Participate in platform operations management and capacity management.
- Coordinate and implement platform/infrastructure upgrades and releases with technical and business teams.
- React to critical issues immediately: troubleshoot, investigate and apply appropriate solutions to normalise system operations.
- Provide off-hours and weekend support to ensure production system stability.
- Troubleshoot problems across a wide range of technical areas (development, CI/CD, infrastructure, etc.).
- Maintain awareness of relevant technical and product trends through self-learning and job shadowing.
- Create and maintain the operational documents to reflect system changes and upgrades.
- Ability to communicate effectively, professionally and comfortably, both verbally and in writing across all levels.
- Bachelor's Degree/Diploma in Computer Science, Computer Engineering, or Computer Application. Equivalent experience may be considered.
- 3+ years of experience supporting critical applications using API-driven technologies.
- 2+ years of hands-on experience in Python development (preferably with RESTful APIs).
- 2+ years of working with a modern stack (AWS, PCF, containers, or Kubernetes).
- 1+ years of Continuous Integration and Continuous Delivery experience through Jenkins or equivalent.
- Experience with modern observability tools such as Grafana, Kibana, or Prometheus preferred.
- Experience working in an Agile (SAFe or Kanban) environment preferred.
- Knowledge of and/or experience using SQL and Linux shell scripting.
- Basic understanding of firewalls, load balancers, and networking concepts.
- Communication skills at all levels and team spirit are essential.
- Proactive, with good analytical and organizational skills.
- Ability to work independently, multi-task, prioritize and deliver in a time-pressured environment.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.