Site Reliability Engineer
The Chief Technology Office (CTO) provides solutions to guide technology across the firm globally, removing inefficiencies and streamlining how we deliver quality solutions. We're continuing to evolve from building next-gen platforms to guiding architectures that unlock their capabilities, automating how we take code from inception to production. We're focused on optimizing how apps are designed for the future, targeting solutions that are portable across multi-cloud platforms to stay resilient, scalable and maintainable. The CTO will drive Modern Engineering Practices across the firm and provide the pathway for technologists to improve their speed, quality and application development practices.
Global Technology's Cloud Transformation Program defines and implements JPMorgan Chase's enterprise cloud strategy. This strategy enables a shift from specialized, dedicated infrastructure to elastic, self-provisioned public and private cloud infrastructure. We have an opening in our Cloud Architecture team to help JPMC accelerate our cloud journey.
In this role, you will not only design and build a "best in class" global cloud for JPMC; you'll also stay current with emerging technologies and public cloud developments, be the "go to" for public cloud, and partner closely with the business community to ensure the application landscape is designed to take full advantage of JPMC's global cloud offerings and that the offerings are accelerating JPMC's goals and objectives.
This role requires a wide variety of strengths and capabilities, including:
- Mastery of application, data and infrastructure architecture disciplines
- Command of architecture, design and business processes
- Keen understanding of financial control and budget management
- Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
- Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation.
- Hands-on experience managing operations of large-scale, internet-centric production environments for application or infrastructure services serving tens to millions of end users.
- Site reliability engineering experience in at least one of the following: C, C++, the Java/J2EE technology stack, or web technologies and scripting languages (Python, Go, Perl, Ruby or shell scripting)
- Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring and operations, such as Kubernetes, Prometheus, Fluentd, Ansible, Elasticsearch, Grafana, Kibana, etc.
- Experience developing monitoring tools and log analysis tools to manage operations
- Experience managing and/or influencing infrastructure services to ensure application service uptime and user experience
- Experience developing and managing operations leveraging key event streaming, messaging and DB services such as Cassandra, MQ/JMS/Kafka, Aurora, RDS, Cloud SQL, Bigtable, DynamoDB, Cloud Spanner, Kinesis, Cloud Pub/Sub, etc.
- Prior experience at large-scale internet companies or with large-scale internet technologies, where uptime and continuous availability were core to the business.
- Work with Architecture to design reusable patterns to deploy to applications, provide governance around adoption, and influence application development teams on roadmaps and designs.
- Identify and partner with Infrastructure teams and AD teams to implement automation opportunities to drive down toil and reduce technical debt.
- Apply standards of cloud compliance to application design to achieve reliability
- Understanding of networking and cloud technologies, for example security, load balancing and network routing protocols.
- Implement SRE frameworks to support global multi-cloud environments, and ensure the highest level of SLA through operational excellence
- Provides failure analysis / root cause analysis when required
- Provides support to develop & improve the quality of technical engineering documentation
- Provides support to drive the maturity of the software development lifecycle
- Provides quality control of engineering deliverables
- Provides technical consultation to product management
- Performs deployment, administration, management, configuration, testing, and integration tasks related to the cloud platforms
- Helps develop new cloud engineering strategies and implementations for the firm
- Champion a DevOps model so that services are automated and elastic across all platforms
- Writes operation documentation and knowledge base of known issues with solutions
- Participates in 24x7 SRE on-call rotations and escalation workflows.
- Bachelor's degree in Computer Science, Information Technology, or equivalent technical field
- Enterprise cloud infrastructure experience (AWS, Azure, GCP) in a mission-critical environment
- In-depth OS experience (RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills
- Strong programming skills in Bash, Python, Java, PowerShell or Go
- Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Terraform and Jenkins.
- Strong working knowledge of Internet protocols and web services technologies such as HTTP, DNS, TCP/UDP, SOAP, JSON and REST
- Good understanding of networking protocols and cybersecurity best practices / operations in the cloud
- Certification in one or more cloud technologies (AWS, Azure, GCP or Red Hat) is a big plus