Site Reliability Engineer
- Work closely with solution architects and application development teams to ensure that design and coding follow best practices aligned with SRE principles.
- Monitor, troubleshoot, and analyse performance issues in applications and the underlying infrastructure as part of performance engineering exercises, and derive gold-configuration parameters.
- Drive thorough performance analysis of microservices code using single-user code profiling techniques.
- Assist the development team in tuning applications and configurations for critical systems so that they comply with the NFRs (non-functional requirements) before going live in production, and ensure that performance recommendations are included in the change request process.
- Ensure appropriate governance of framework usage across multiple delivery streams, and enhance the framework's capabilities to meet upcoming requirements.
- Participate in and contribute to resiliency validation exercises, and produce clear reports for stakeholders.
- Define critical performance KPIs, set alert rules, and roll out monitoring dashboards for production, with timely reporting to stakeholders.
- Automate manual tasks in performance monitoring, alerting, analysis, reporting, capacity planning, etc. to improve application observability, resiliency, and operational efficiency.
- Bachelor's degree in Computer Science with 6+ years of equivalent work experience.
- Minimum 2 years of hands-on experience with technologies such as Red Hat OpenShift/Kubernetes, Docker, Kafka, ELK, and Redis, and with DevOps tools such as Jenkins, Bitbucket, and JIRA.
- Hands-on experience in application monitoring with Grafana, Kibana, Prometheus, AppDynamics, or Dynatrace is a plus.
- Hands-on experience with Chaos Engineering is a plus.
- Strong analytical and problem-solving skills.
- Strong interpersonal and communication skills.
- Positive attitude towards continuous learning.