Service_resiliency( ):

  • Working with the Service Health team to investigate vulnerabilities within the product, and partnering closely with service teams to reduce the errors these cause.
  • Establishing a team process for root cause analysis of the product's most error-prone services, with the aim of improving the overall customer experience.

SRE_Bot( ):

  • Developing a team-centric bot to ensure consistent interaction between the SRE team and external service teams.
  • Instilling the SRE mindset during product development, while also strengthening the team's proficiency in agile methodology.
  • The ultimate goal is to expand the product's operational capabilities with features such as initial incident triage, automated callouts, and JIRA queue management, among others.

Service_Improvement( ):

  • Tasked with identifying areas for improvement for an internal application service team at a previous organisation.
  • Helped set up monitoring and logging dashboards covering cloud resource usage, service-specific metrics, and alerting. Also advocated for in-code documentation and deployment runbooks, empowering the team to handle its production changes independently of the infrastructure team.

Client_Metrics( ):

  • Account- and client-specific information was collated and maintained in Metabase at a previous organisation.
  • The tool presented information on the users consuming cloud resources at a specific time as well as over an input duration.
  • This helped uncover multiple discrepancies in the data maintained in the cloud database, which were mitigated with better querying techniques and backend process changes.
  • These dashboards also helped in understanding product resource cost at a higher level.