Technology - Incident Management Process

INTRODUCTION

Document Definition

The purpose of this document is to define the RFU Incident Management Process, covering Service Operations, Application Management and Infrastructure Management. The content within this document is based on the best practices of the ITIL® framework.

PURPOSE

The aim of this document is to ensure that our Incident Management process delivers swift identification, communication, and resolution of incidents.

The goals for the Incident Management process are to:

Restore normal service operation as quickly as possible.
Minimize the adverse impact on business operations.
Ensure that agreed levels of service quality are maintained.

Incident Management Guidelines

All incidents must be logged in SCRUM.
All incidents must be prioritised according to their urgency and impact.
The Service Desk is responsible for initial diagnosis and triage.
Incidents must only be closed with confirmation that service has been restored, or that the user is happy with any workaround.

SCOPE

Incident

Please see definitions.

Applicability to employees

RFU refers to the Rugby Football Union as well as its majority-owned subsidiaries and joint ventures (if applicable). This Policy applies to all employees, officers, members of the Board of Directors, consultants and contractors.

Applicability to External Parties

Relevant Policy statements will apply to any external party and be included in contractual obligations on a case-by-case basis.

Applicability to Assets

This Policy applies to all information assets globally owned by RFU, or where RFU has custodial responsibilities.

DEFINITIONS

Incident

An IT Incident is any disruption to an organization's IT services, affecting anything from a single user to the entire business. In short, an incident is anything that interrupts business continuity.

Application Priority

See the Application Priority reference: https://scrum.rfu.com/a/solutions/articles/10000076697

ROLES & RESPONSIBILITIES

Role: Process Owner
RFU Role: Incident Manager
Current incumbent: Dan Hart (SOM)
Responsibilities: Define the process. Work with stakeholders to ensure process is used by all of the Rugby Football Union Technology Department. Promote process and maintain and update process. Report, understand and communicate effectiveness of process.

Role: Incident Manager
Current incumbent: Dan Hart (SOM)
Responsibilities: Incident co-ordination. Oversee day to day process execution. Often the Service Desk Manager. Manages major incidents until the appropriate situation manager is identified.

Role: Service Request Manager
RFU Role: Service Operations Manager
Current incumbent: Dan Hart (SOM)

Role: Service Desk Manager
RFU Role: Service Operations Manager, Business Application Manager, Infrastructure Manager
Current incumbent: Dan Hart, Simon Jones, Paul Harvey
Responsibilities: Manages the service desk function, including staffing management activities. Provides guidance to Service Desk Analysts. Responsible for Communication and Incident co-ordination during a High priority ticket within their Queue.

Role: Technology Service Desk (Tier 1) (SD)
Current incumbent: Tom Feasey, David Hull, TBC
Responsibilities: Logging incidents. Flagging up high priority incidents. Escalating incidents where appropriate.

Role: Technology Service Support (Tier 2) (SS)
Current incumbent: Stuart Wright, Theresa Grant, Ravi Shah
Responsibilities: Resolving incidents. Escalating incidents where appropriate. Problem management.

Role: Agent
Responsibilities: Any member of Technology, or beyond, who can be assigned responsibility for investigating and resolving an incident.

Role: ITSM Technical Specialist
Current incumbent: Thomas Feasey, Theresa Grant

Role: Requester
Responsibilities: Any member of the business who can log an incident.

PROCESS

Incident Priority

A Requester or Agent logs a ticket via SCRUM (link); it automatically goes to the Service Desk queue.

If it is a High priority ticket, the Requester should notify the SD or escalate to the Service Operations Manager immediately.

Tickets may also be logged as follows:

Logged by the Technology Team directly in SCRUM
Detected by Event Management
Reported and/or logged by Suppliers
Reported on Event Days via the IT Support phone

SCRUM Team: Technology
Status: The current status of the ticket
Subject: Summary of the issue
Third Party: As required, if we are awaiting a response from a Third Party
Third Party Reference: If a ticket is logged with a Third Party, capture the reference number here
Group: The SCRUM Group working on the ticket
Agent: The specific person working on the ticket. This should be selected as soon as a ticket is picked up
Description: Detailed description of the issue
Category / Subcategory / Item: Selected based on the impacted Service/Application

Status

The Status of the ticket should be one of the following:

Status: Open
Description: Tickets that immediately need the attention of your support agents. When a new ticket is created its status is always Open at first. And when a customer replies to any ticket, its status always moves back to Open.

Status: Awaiting Third Party
Description: Ticket is with a Third Party
Timer on/off: Off

Status: Awaiting Customer
Description: Question or note has been sent to requester and we are awaiting a response
Timer on/off: Off

Status: Awaiting Internal Review
Description: Ticket is awaiting someone in technology to review or assist
Timer on/off: Off

Status: Resolved
Description: When agents are reasonably sure that they have provided the customer with a solution to their issue, they can move the ticket's status to Resolved. The customer can then confirm the resolution and ticket is Closed.
Timer on/off: Off

Status: Closed
Description: Automatic process after being in Resolved state for 3 days.
Timer on/off: Off

Status: Pending
Description: Sometimes, the agent might need additional information or might need some time to reproduce an issue and confirm the solution. In these times, they can set the status of the ticket as "Pending"
Timer on/off: None
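The status rules above can be sketched in Python as a lookup plus an auto-close check. This is only an illustration, not how SCRUM is configured: the names `SLA_TIMER_RUNNING` and `should_auto_close` are hypothetical, the timer value for Open is an assumption because the table does not list one, and the 3 days are treated as calendar days.

```python
from datetime import datetime, timedelta

# Timer behaviour per status, as listed in the status table.
# Assumption: the table gives no timer value for "Open"; it is treated as running.
# "Pending" is listed as "None" in the table and is represented here as None.
SLA_TIMER_RUNNING = {
    "Open": True,  # assumption, not stated in the table
    "Awaiting Third Party": False,
    "Awaiting Customer": False,
    "Awaiting Internal Review": False,
    "Resolved": False,
    "Closed": False,
    "Pending": None,
}

# Assumption: "3 days" means calendar days.
AUTO_CLOSE_AFTER = timedelta(days=3)


def should_auto_close(status: str, resolved_at: datetime, now: datetime) -> bool:
    """Return True once a ticket has been in Resolved for 3 days or more."""
    return status == "Resolved" and now - resolved_at >= AUTO_CLOSE_AFTER
```

For example, a ticket resolved at 09:00 on a Monday would be eligible for closure from 09:00 on the Thursday under these assumptions.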

Incident assigned

Tickets can be escalated to various groups, depending on the Service Transition Documentation. The general rules are:

Unassigned - Tickets which have not been acknowledged by the SD will remain in Unassigned.

Technology Service Support - Any tickets which require additional capability.

Technology Applications – Tickets escalated to the Application team for specific applications that are not used by all corporate services.

Technology Freshservice/Freshdesk – Tickets which are specific to the operation of the ITSM.

Technology DWH – Tickets which are for the Data Warehouse (Data Platform)

Technology Starters/Leavers – A specific queue for Leavers, Starters or Movers.

Technology Infrastructure – Tickets escalated to the Infrastructure team relating to Network or Platform

Escalation & Communication

The Incident Manager is responsible for communication to the Business for any High or Urgent priority tickets.
The Head of Service Delivery should be informed of the severity and details.
Communications to the Business should come from the “IT Service Desk” email address, using the correct template.
The Incident Manager, Infrastructure Manager or Applications Manager is responsible for coordinating any Third Parties on incidents in SCRUM within their “group”.
For Cyber Security Incidents, refer to the P1 Cyber Security Incident Process.

Ticket SLA

[Figure: Ticket SLA screenshot]

Automated Communications

P1 Incident (Urgent or High)

When a Ticket priority is set to “high” or “urgent”, an email alert is triggered and sent to the Service Team, Infrastructure and the Head of Service Delivery

Ticket Resolved

An email is sent to the Requester to inform them of resolution

Customer Responds

When a note is added to a ticket via a requester response, the ticket status automatically changes from “Awaiting Customer” to “Open”
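The three automations above can be sketched in Python as simple event handlers. This is a minimal illustration under stated assumptions: the `Ticket` class and the handler names are hypothetical, the recipient labels are copied from the text rather than from real mailboxes, and the actual rules are configured inside SCRUM rather than written in code.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    priority: str = "Low"
    status: str = "Open"
    notifications: list = field(default_factory=list)  # who has been emailed


# Recipient labels taken from the text; the real distribution lists are not specified.
P1_RECIPIENTS = ["Service Team", "Infrastructure", "Head of Service Delivery"]


def on_priority_set(ticket: Ticket, priority: str) -> None:
    """P1 Incident: alert the listed groups when priority becomes high or urgent."""
    ticket.priority = priority
    if priority.lower() in {"high", "urgent"}:
        ticket.notifications.extend(P1_RECIPIENTS)


def on_resolved(ticket: Ticket) -> None:
    """Ticket Resolved: email the Requester to inform them of the resolution."""
    ticket.status = "Resolved"
    ticket.notifications.append("Requester")


def on_requester_reply(ticket: Ticket) -> None:
    """Customer Responds: move the ticket from Awaiting Customer back to Open."""
    if ticket.status == "Awaiting Customer":
        ticket.status = "Open"
```

The sketch models only the rules listed in this section; it does not cover any other workflow that SCRUM may run.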

PROCESS FLOW

[Figure: Incident Management process flow diagram]

High Priority Tickets (P1)

Please refer to the Application Prioritisation Document.

Major events at the Rugby Football Union are reported on an increasing scale of impact and urgency. The Priority 1 (P1) process is the first stage of major event management. It applies to high-impact, high-urgency incidents and ensures that the best people are working on the incident within 15 minutes. Following a P1 incident, Crisis Management, Disaster Recovery and Business Continuity may be invoked.

Data Breach

In the event of a data breach or malicious attack, the Head of Service Delivery and Legal must be contacted to determine next actions, and a SCRUM ticket must be raised. Certain data breaches may need to be reported to overseeing authorities such as the ICO within 72 hours, so it is important that the Legal & Governance team is made aware of these incidents as soon as possible.
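As an illustration of the 72-hour window, the following Python sketch works out the latest reporting time and the hours remaining. The function names are hypothetical, and the sketch assumes the clock starts when RFU becomes aware of the breach; Legal should confirm the actual starting point for any given incident.

```python
from datetime import datetime, timedelta, timezone

REPORTING_WINDOW = timedelta(hours=72)


def reporting_deadline(aware_at: datetime) -> datetime:
    """Latest time to report, assuming the window starts at awareness."""
    return aware_at + REPORTING_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (reporting_deadline(aware_at) - now).total_seconds() / 3600
```

For a breach identified at 10:00 UTC on 1 March, `reporting_deadline` returns 10:00 UTC on 4 March.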

High Priority Process

Once the P1 incident has been created, an email will automatically be sent to the P1 Notification group, which includes the IT SLT. Depending on the impacted Service or Application, the management of the Incident should be handed over to the relevant Technology Manager (Service Operations, Infrastructure or Applications). Handover must be completed in a phone or face-to-face conversation, followed up with an email. All details must be included in the incident record in SCRUM (Link).

The Service Operations Manager will ensure that the Head of Service Delivery or the Technology Director is contacted, and will advise on the key people involved in the management of the incident. A communication should be sent to the Business informing them of the issue (Link to templates).

The Service Operations Manager will determine whether a quick fix can likely be applied to restore the service within 15 minutes. If so, the relevant support team(s) will be contacted to work on the incident, which will be assigned to them so that they can add updates and record any action taken. Many quick fixes can be carried out by the Technology Service Desk as 1st line support, and in this case the Technology Service Desk becomes the support team and is assigned the incident. The Technology Service Desk will also be considered the support team if contact is needed directly with a supplier without another support team’s involvement. Should a quick fix be implemented, the quick-fix resolution email should be sent.

If there is no obvious quick fix, or the actions taken by the resolving team do not resolve the incident, the Service Operations Manager should, where possible, contact a support team and assign the Incident to it. The Service Operations Manager will then send a communication to the Technology Management Team and the business.

A bridge will then be set up via Teams, and all relevant people invited, with confirmation sought via email, phone or IM.

The group involved in the bridge will determine the next actions to take to resolve the issue, likely to be investigation and resolution by support teams, with support from the Service Operations Manager to contact 3rd parties and suppliers. The incident itself will be updated with this plan and, if not done already, assigned to the support team that will be working on the incident or liaising with a 3rd party.

Communications And Updates

All communication to the business and the internal Technology team following the initial logging of the incident must only be sent by the Incident Manager for the first 2 hours, and must use the templates (Link to templates).

Initial communication to Business and Technology Management Team – first 15 minutes
Update communication to Business and Technology Management Team – Every 30 minutes
After 4 hours further communication may be sent on behalf of the Technology Director.
The support team that has been assigned the incident will be responsible for updating the incident on a regular basis, not less than once per hour.
The Incident Manager is responsible for checking the updates and chasing the support teams if these are not done. It is important that these updates are of high quality to assist with post incident review, incident escalation and Problem Management.
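The timings above can be sketched in Python as a schedule of due times for business communications. This is an illustration only: the function name is hypothetical, and it assumes the 30-minute update cycle is counted from the time the initial communication is due.

```python
from datetime import datetime, timedelta

INITIAL_COMMS_WITHIN = timedelta(minutes=15)
UPDATE_INTERVAL = timedelta(minutes=30)
DIRECTOR_COMMS_AFTER = timedelta(hours=4)


def business_comms_schedule(logged_at: datetime, until: datetime) -> list:
    """Return (due_time, label) pairs for business communications up to `until`."""
    schedule = [(logged_at + INITIAL_COMMS_WITHIN, "Initial communication")]
    # Assumption: updates are counted from the initial communication's due time.
    due = schedule[0][0] + UPDATE_INTERVAL
    while due <= until:
        label = "Update"
        if due - logged_at > DIRECTOR_COMMS_AFTER:
            label = "Update (may be sent on behalf of the Technology Director)"
        schedule.append((due, label))
        due += UPDATE_INTERVAL
    return schedule
```

For an incident logged at 09:00, this gives an initial communication due by 09:15 and updates due at 09:45, 10:15 and so on. The hourly incident-record updates by the support team are a separate obligation and are not modelled here.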

Each update should be in the following format:

State the steps that have been taken since the last update in a chronological order
State the next actions planned
State when these actions will be carried out and the time that the next update is due
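As an illustration, the update format can be sketched as a small Python structure that renders the three parts in order. The class and field names are hypothetical, and the layout of the rendered text is an assumption rather than an RFU template.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentUpdate:
    steps_taken: list          # (datetime, description) pairs, in any order
    next_actions: list         # descriptions of the planned actions
    actions_due: datetime      # when the next actions will be carried out
    next_update_due: datetime  # when the next update is due

    def render(self) -> str:
        lines = ["Steps taken since the last update:"]
        # Sort so the steps always appear in chronological order.
        for when, what in sorted(self.steps_taken, key=lambda step: step[0]):
            lines.append(f"  {when:%H:%M} {what}")
        lines.append("Next actions planned:")
        lines.extend(f"  {action}" for action in self.next_actions)
        lines.append(f"Actions due: {self.actions_due:%H:%M}")
        lines.append(f"Next update due: {self.next_update_due:%H:%M}")
        return "\n".join(lines)
```

Sorting inside `render` means an agent can record steps in any order and the update will still read chronologically.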

Where email communication is not possible due to an email outage, the SMS system should be used to communicate to the business via mobile phone text message.

Escalation

Initial identification of a High (P1) incident: inform the Head of Service Delivery.

After the initial 15 minutes, if there is no fix or workaround: inform the Technology Director.

Bridge call established: inform the Head of Service Delivery and provide access to the call.

The Service Operations Manager is responsible for contacting and updating the Technology Management team.

The Technology Director or Head of Service Delivery will communicate to the Exec Committee as required.
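The escalation triggers above can be sketched in Python as a function that lists who should have been informed at a given point. The function and parameter names are hypothetical, and the sketch covers only the first three triggers; it is an aid to reading the rules, not a replacement for them.

```python
def escalation_contacts(minutes_elapsed: int, fix_or_workaround: bool, bridge_open: bool) -> list:
    """Return who should have been informed for a High (P1) incident so far."""
    # Initial identification always goes to the Head of Service Delivery.
    contacts = ["Head of Service Delivery"]
    # After the initial 15 minutes with no fix or workaround, add the Technology Director.
    if minutes_elapsed >= 15 and not fix_or_workaround:
        contacts.append("Technology Director")
    # Once a bridge call exists, the Head of Service Delivery is told and given access.
    if bridge_open:
        contacts.append("Head of Service Delivery (bridge call access)")
    return contacts
```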

Resolution

The Service Operations Manager will advise the Technology Service Desk when a High (P1) incident can be resolved. The incident must be assigned back to the Technology Service Desk to resolve the Incident record, and an Incident Report must be completed (link).

The Service Operations Manager must send the Resolution communication to the business. The Incident Manager is responsible for carrying this out; however, for incidents that have been in progress for 4 hours or longer and where Technology Director communication has previously been sent, the resolution communication will be sent from the Technology Director.

The Service Operations Manager will also determine whether further investigation into the root cause of the incident is required. This conversation should also be used to determine any knowledge articles that should be created by the support team involved in the incident resolution.

Handover

The Service Operations Manager will manage the incident from identification through to resolution. However, there will be circumstances where the incident needs to be handed over. For example, if the Incident is raised close to the closing time of the Technology Service Desk it was logged in, the Service Operations Manager may decide it is best to hand over to the other Service Desk member and Incident Manager to run locally.

Any handover must be carried out over the phone between Incident Managers and backed up with an email with the following information as a minimum:

Current situation
Support teams involved
Details of those involved in the bridge
Details of communications that have been sent.
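The minimum handover content can be sketched in Python as a simple completeness check. The field names and the function name are hypothetical; they only mirror the four items listed above.

```python
# Hypothetical field names mirroring the four minimum items for a handover email.
REQUIRED_HANDOVER_FIELDS = (
    "current_situation",
    "support_teams_involved",
    "bridge_participants",
    "communications_sent",
)


def missing_handover_fields(handover: dict) -> list:
    """Return the required fields that are absent or empty in a handover record."""
    return [name for name in REQUIRED_HANDOVER_FIELDS if not handover.get(name)]
```

An empty result means the handover email covers the minimum; anything returned should be filled in before the email is sent.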

There must also be a communication to everyone involved in the incident to advise that the incident is being handed over. Any further communication to the business from this point must include the new Incident Manager’s signature.

Downgrade Of P1

In some circumstances a High priority (P1) incident will need to be downgraded to Medium or Low. The decision to do this can only be taken by the Service Operations Manager in consultation with the Head of Service Delivery. In this circumstance, a communication must be sent to inform the business that the incident is still being worked on as per the High priority. The incident itself can then be downgraded; however, the priority level should not be communicated to the business unless requested, and the specific downgrade communication template should be sent instead.

The downgrade will then be carried out in SCRUM, and the process will revert to the standard Incident Management process.

MORE INFORMATION

Rugby Football Union Technology department maintains a full list of policies, standards and processes, available here: https://scrum.rfu.com

APPENDIX A

Here are our IT Service Desk and IT Governance contact details:

Location: IT Service Desk
Telephone Number: +44208 831 6767
Email Address: https://scrum.rfu.com

Location: GMS Service Desk
Telephone Number: N/A
Email Address: https://help.rfu.com

Location: Legal Office
Telephone Number: N/A
Email Address: legal@rfu.com

APPENDIX B

Jonathan Conn, Technology Director: 07801560090
Daniella Moses, Head of Service Delivery: 07731988770
Paul Harvey, Infrastructure Manager: 07547 561801
Simon Jones, Business Applications Manager: 07395375195
Dan Hart, Incident Manager: 07514734128
