INTRODUCTION
Document Definition
The purpose of this document is to define the RFU Incident Management Process, covering Service Operations, Application Management and Infrastructure Management. The content within this document is based on the best practices of the ITIL® framework.
PURPOSE
The aim of this document is to ensure that our Incident Management process delivers swift identification, communication, and resolution of incidents.
The goals for the Incident Management process are to:
- Restore normal service operation as quickly as possible.
- Minimize the adverse impact on business operations.
- Ensure that agreed levels of service quality are maintained.
Incident Management Guidelines
- All incidents must be logged in SCRUM.
- All incidents must be prioritised according to their urgency and impact.
- The Service Desk is responsible for initial diagnosis and triage.
- Incidents must only be closed with confirmation that service has been restored, or that the user is happy with any workaround.
SCOPE
Incident
Please see definitions.
Applicability to employees
RFU refers to the Rugby Football Union as well as its majority-owned subsidiaries and joint ventures (if applicable). This Policy applies to all employees, officers, members of the Board of Directors, consultants, and contractors.
Applicability to External Parties
Relevant Policy statements will apply to any external party and be included in contractual obligations on a case-by-case basis.
Applicability to Assets
This Policy applies to all information assets globally owned by RFU, or where RFU has custodial responsibilities.
Definitions
Incident
An IT Incident is any disruption to an organization's IT services that affects anything from a single user to the entire business. In short, an incident is anything that interrupts business continuity.
Application Priority
Refer to Application Priority: https://scrum.rfu.com/a/solutions/articles/10000076697
ROLES & RESPONSIBILITIES
Role | RFU Role | Current incumbent | Responsibilities
Process Owner | Incident Manager | Dan Hart (SOM) | Define the process. Work with stakeholders to ensure process is used by all of the Rugby Football Union Technology Department. Promote process and maintain and update process. Report, understand and communicate effectiveness of process.
Incident Manager | Incident Manager | Dan Hart (SOM) | Incident co-ordination. Oversee day to day process execution. Often the Service Desk Manager. Manages major incidents until the appropriate situation manager is identified.
Service Request Manager | Service Operations Manager | Dan Hart (SOM) |
Service Desk Manager | Service Operations Manager; Business Application Manager; Infrastructure Manager | Dan Hart; Simon Jones; Paul Harvey | Manages the service desk function, including staffing management activities. Provides guidance to Service Desk Analysts. Responsible for Communication and Incident co-ordination during a High priority ticket within their Queue.
Technology Service Desk (Tier 1) (SD) | | Tom Feasey; David Hull; TBC | Logging incidents. Flagging up high priority incidents. Escalating incidents where appropriate.
Technology Service Support (Tier 2) (SS) | | Stuart Wright; Theresa Grant; Ravi Shah | Resolving incidents. Escalating incidents where appropriate. Problem management.
Agent | | | Any member of Technology, or beyond, who can be assigned responsibility for investigating and resolving an incident.
ITSM Technical Specialist | | Thomas Feasey; Theresa Grant |
Requester | Requester | | Any member of the business who can log an incident.
Process
Incident Priority
A Requester or Agent logs a ticket via SCRUM (link); it will automatically go to the Service Desk queue.
If it is a High priority ticket, the Requester should notify the SD or escalate to the Service Operations Manager immediately.
Tickets may also be logged as follows:
- By the Technology Team directly in SCRUM
- Detected by Event Management
- Reported and/or logged by Suppliers.
- Event Days via the IT Support phone
Field | Description
SCRUM Team | Technology
Status | The current status of the ticket
Subject | Summary of issue
Third Party | As required, if we are awaiting a response from a Third Party
Third Party Reference | If a ticket is logged with a Third Party, capture the reference number here
Group | The SCRUM Group working on the ticket
Agent | Specific person working on the ticket; this should be selected as soon as a ticket is picked up
Description | Detailed description of the issue
Category / Subcategory / Item | Selection based on the impacted Service/Application
Status
The status of the ticket should be one of the following:
Status | Description | Timer on/off
Open | Tickets that immediately need the attention of your support agents. When a new ticket is created, its status is always Open at first. When a customer replies to any ticket, its status always moves back to Open. | On
Awaiting Third Party | Ticket is with a Third Party. | Off
Awaiting Customer | A question or note has been sent to the requester and we are awaiting a response. | Off
Awaiting Internal Review | Ticket is awaiting someone in Technology to review or assist. | Off
Resolved | When agents are reasonably sure that they have provided the customer with a solution to their issue, they can move the ticket's status to Resolved. The customer can then confirm the resolution and the ticket is Closed. | Off
Closed | The customer has confirmed the resolution, or the ticket has been in the Resolved state for 3 days, after which it is closed automatically. | Off
Pending | Sometimes the agent might need additional information, or might need some time to reproduce an issue and confirm the solution. In these cases, they can set the status of the ticket to "Pending". | None
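The status rules above can be sketched as a small lookup. This is only an illustration, not how SCRUM is configured: the names `SLA_TIMER_RUNNING`, `timer_is_running` and `should_auto_close` are hypothetical, and treating "None" for Pending as "timer not running" is an assumption.

```python
from datetime import datetime, timedelta

# Timer state per status, copied from the status table.
# None mirrors the "None" entry listed for Pending.
SLA_TIMER_RUNNING = {
    "Open": True,
    "Awaiting Third Party": False,
    "Awaiting Customer": False,
    "Awaiting Internal Review": False,
    "Resolved": False,
    "Closed": False,
    "Pending": None,
}

# Resolved tickets are closed automatically after 3 days.
AUTO_CLOSE_AFTER = timedelta(days=3)


def timer_is_running(status: str) -> bool:
    """Return True only when the table marks the timer as On."""
    return SLA_TIMER_RUNNING[status] is True


def should_auto_close(status: str, resolved_at: datetime, now: datetime) -> bool:
    """Hypothetical helper: True once a Resolved ticket is at least 3 days old."""
    return status == "Resolved" and now - resolved_at >= AUTO_CLOSE_AFTER
```

An unknown status raises `KeyError` rather than silently defaulting, which makes a mistyped status easy to spot.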
Incident assigned
Tickets can be escalated to various groups, depending on the Service Transition Documentation. General rules:
- Unassigned: Tickets which have not been acknowledged by the SD will remain in Unassigned.
- Technology Service Support: Any tickets which require additional capability.
- Technology Applications: Tickets escalated to the Application team for specific applications, not used by all corporate services.
- Technology Freshservice/Freshdesk: Tickets which are specific to the operation of the ITSM.
- Technology DWH: Tickets which are for the Data Warehouse (Data Platform).
- Technology Starters/Leavers: A specific queue for Leavers, Starters or Movers.
- Technology Infrastructure: Tickets escalated to the Infrastructure team relating to Network or Platform.
Escalation & Communication
- The Incident Manager is responsible for communication to the Business for any High or Urgent priority tickets.
- The Head of Service Delivery should be informed of the severity and details.
- Communications to the Business should come from the “IT Service Desk” email address on the correct template.
- The Incident Manager, Infrastructure Manager or Applications Manager is responsible for coordinating any Third Parties involved in an incident in SCRUM within their “group”.
- For Cyber Security Incidents, refer to the P1 Cyber Security Incident Process.
Ticket SLA
Automated Communications
P1 Incident (Urgent or High)
When a ticket's priority is set to “high” or “urgent”, an email alert is triggered to the Service Team, Infrastructure and the Head of Service Delivery.
Ticket Resolved
An email is sent to the Requester to inform them of resolution
Customer Responds
When a note is added to a ticket via a requester response, the ticket is automatically moved from “Awaiting Customer” to “Open”.
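The three automations above can be read as simple event rules. The sketch below is a hedged illustration only: the `Ticket` class, the handler names and the message strings are hypothetical, and the real rules live in the SCRUM configuration. Only the recipients and status names come from the text.

```python
from dataclasses import dataclass, field

# Recipients named in the text for High/Urgent alerts.
P1_ALERT_RECIPIENTS = ["Service Team", "Infrastructure", "Head of Service Delivery"]


@dataclass
class Ticket:
    priority: str
    status: str
    requester: str
    outbox: list = field(default_factory=list)  # (recipient, message) pairs


def on_priority_set(ticket: Ticket) -> None:
    """P1 Incident: alert the named recipients when priority is High or Urgent."""
    if ticket.priority.lower() in {"high", "urgent"}:
        for recipient in P1_ALERT_RECIPIENTS:
            ticket.outbox.append((recipient, "P1 alert"))


def on_resolved(ticket: Ticket) -> None:
    """Ticket Resolved: tell the requester the ticket has been resolved."""
    ticket.status = "Resolved"
    ticket.outbox.append((ticket.requester, "Ticket resolved"))


def on_requester_reply(ticket: Ticket) -> None:
    """Customer Responds: move an Awaiting Customer ticket back to Open."""
    if ticket.status == "Awaiting Customer":
        ticket.status = "Open"
```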
PROCESS FLOW
High Priority Tickets (P1)
Please refer to the Application Prioritisation Document.
Major events at the Rugby Football Union are reported on an increasing scale of impact and urgency. The Priority 1 (P1) process is the first stage of major event management. It relates to high-impact, high-urgency incidents, and ensures the best people are working on the incident within 15 minutes. Following a P1 incident, Crisis Management, Disaster Recovery and Business Continuity may be invoked.
Data Breach
In the event of a data breach or malicious attack, the Head of Service Delivery and Legal must be contacted to determine next actions, and a SCRUM ticket must be raised. Certain data breaches may need to be reported to overseeing authorities such as the ICO within 72 hours, so it is important that the Legal & Governance team are made aware of these incidents as soon as possible.
High Priority Process
Once the P1 incident has been created, an email will automatically be sent to the P1 Notification group, which includes the IT SLT. Depending on the impacted Service or Application, the management of the Incident should be handed over to the relevant Technology Manager (Service Operations, Infrastructure or Applications). Handover must be completed in a phone or face-to-face conversation, followed up with an email. All details must be included in the incident record in SCRUM. Link
The Service Operations Manager will ensure that the Head of Service Delivery or the Technology Director is contacted and advised of the key people involved in the management of the incident. A communication should be sent to the Business informing them of the issue. Link to templates
The Service Operations Manager will determine whether a quick fix can likely be applied to restore the service within 15 minutes. If this is the case, the relevant support team(s) will be contacted to work on the incident, which will be assigned to them so that they can add updates and record any action taken. Many quick fixes can be carried out by the Technology Service Desk as 1st line support, and in this case the Technology Service Desk becomes the support team and is assigned the incident. The Technology Service Desk will also be considered the support team if contact is needed directly with a supplier without another support team’s involvement. Should a quick fix be implemented, the resolution email for a quick fix should be sent.
In the event that there is no obvious quick fix, or the actions taken by the resolving team do not result in resolution of the incident, the Service Operations Manager should, where possible, contact a support team and assign the Incident to them to work on. The Service Operations Manager will then send a communication to the Technology Management Team and the business.
A bridge will then be set up via Teams, and all relevant people invited, with confirmation sought via email, phone or IM.
The group involved in the bridge will determine the next actions to take to resolve the issue. These are likely to be investigation and resolution by support teams, with support from the Service Operations Manager to contact 3rd parties and suppliers. The incident itself will be updated with this plan and, if not done already, assigned to the support team that will be working on the incident or liaising with a 3rd party.
Communications And Updates
All communication to the business and the internal Technology team following the initial logging of the incident must only be sent by the Incident Manager for the first 2 hours, and will be sent via templates. Link to templates.
- Initial communication to Business and Technology Management Team – first 15 minutes
- Update communication to Business and Technology Management Team – Every 30 minutes
- After 4 hours further communication may be sent on behalf of the Technology Director.
- The support team that has been assigned the incident will be responsible for updating the incident on a regular basis, not less than once per hour.
- The Incident Manager is responsible for checking the updates and chasing the support teams if these are not done. It is important that these updates are of high quality to assist with post incident review, incident escalation and Problem Management.
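The timings in this list can be turned into a simple schedule. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the 30-minute update cadence is counted from the initial 15-minute deadline, and that "after 4 hours" includes the 4-hour mark itself. The text confirms neither assumption.

```python
from datetime import datetime, timedelta

INITIAL_COMM_WITHIN = timedelta(minutes=15)      # initial communication
UPDATE_EVERY = timedelta(minutes=30)             # update communications
DIRECTOR_COMMS_AFTER = timedelta(hours=4)        # Technology Director comms
INCIDENT_UPDATE_AT_LEAST_EVERY = timedelta(hours=1)  # support team ticket updates


def business_comm_deadlines(logged_at: datetime, until: datetime):
    """Yield the initial deadline, then one deadline every 30 minutes up to `until`."""
    due = logged_at + INITIAL_COMM_WITHIN
    while due <= until:
        yield due
        due += UPDATE_EVERY


def director_comms_may_apply(logged_at: datetime, now: datetime) -> bool:
    """True once the incident has been open for 4 hours (assumed inclusive)."""
    return now - logged_at >= DIRECTOR_COMMS_AFTER


def incident_update_overdue(last_update_at: datetime, now: datetime) -> bool:
    """True when the support team has not updated the incident for over an hour."""
    return now - last_update_at > INCIDENT_UPDATE_AT_LEAST_EVERY
```

For an incident logged at 09:00, `business_comm_deadlines` yields 09:15 and 09:45 within the first hour.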
Each update should be in the following format:
- State the steps that have been taken since the last update in a chronological order
- State the next actions planned
- State when these actions will be carried out and the time that the next update is due
Where email communication is not possible due to an email outage, the SMS system should be used to communicate to the business via mobile phone text message.
Escalation
- Initial identification of a High (P1) incident: inform the Head of Service Delivery.
- After the initial 15 minutes, with no fix or workaround: inform the Technology Director.
- Bridge call established: inform the Head of Service Delivery and provide access to the call.
- The Service Operations Manager is responsible for contacting and updating the Technology Management Team.
- The Technology Director or Head of Service Delivery will communicate to the Exec Committee as required.
Resolution
The Service Operations Manager will advise the Technology Service Desk when a High (P1) incident can be resolved. The incident must be assigned back to the Technology Service Desk to resolve the Incident record, and an Incident Report completed, link.
The Service Operations Manager must send a Resolution communication to the business. The Incident Manager is responsible for carrying this out; however, for incidents that have been in progress for 4 hours or longer, where a Technology Director communication has previously been sent, the resolution communication will be sent from the Technology Director.
The Service Operations Manager will also discuss whether further investigation into the root cause of the incident is required. This conversation should also be used to determine any knowledge articles that should be created by the support team involved in the incident resolution.
Handover
The Service Operations Manager will manage the incident from identification through to resolution. However, there will be circumstances where the incident needs to be handed over. For example, if the Incident is raised close to the closing time of the Technology Service Desk in which it was logged, the Service Operations Manager may decide it is best to hand over to the other Service Desk member and Incident Manager to run locally.
Any handover must be carried out over the phone between Incident Managers and backed up with an email with the following information as a minimum:
- Current situation
- Support teams involved
- Details of those involved in the bridge
- Details of communications that have been sent.
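The minimum handover content above lends itself to a quick completeness check before the email is sent. This is a minimal sketch under stated assumptions: the field names and the `missing_handover_fields` helper are hypothetical and are not part of SCRUM or any RFU template.

```python
# Hypothetical field names, one per item in the minimum handover list.
REQUIRED_HANDOVER_FIELDS = (
    "current_situation",
    "support_teams_involved",
    "bridge_participants",
    "communications_sent",
)


def missing_handover_fields(handover: dict) -> list:
    """Return the required fields that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_HANDOVER_FIELDS if not handover.get(name)]


draft = {
    "current_situation": "Service degraded, supplier engaged",
    "support_teams_involved": ["Technology Infrastructure"],
}
print(missing_handover_fields(draft))
```

An empty list means the handover email covers every item on the minimum list.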
There must also be a communication to everyone involved in the incident to advise that the incident is being handed over. Any further communication to the business must include the new Incident Manager’s signature.
Downgrade Of P1
In some circumstances a High priority (P1) incident will need to be downgraded from High to Medium or Low. The decision to do this can only be taken by the Service Operations Manager in consultation with the Head of Service Delivery. In this circumstance, a communication must be sent to inform the business that the incident is still being worked on as per the High priority. The incident itself can then be downgraded; however, the priority level should not be communicated to the business unless requested, and the specific downgrade communication template should be sent instead.
The downgrade will then be carried out in SCRUM, and the process will revert to the standard Incident Management process.
MORE INFORMATION
Rugby Football Union Technology department maintains a full list of policies, standards and processes, available here: https://scrum.rfu.com
APPENDIX A
Here are our IT Service Desk and IT Governance contact details:
Location | Telephone Number | Email Address
IT Service Desk | +44208 831 6767 | https://scrum.rfu.com
GMS Service Desk | N/A | https://help.rfu.com
Legal Office | N/A | legal@rfu.com
APPENDIX B
Jonathan Conn | Technology Director | 07801560090
Daniella Moses | Head of Service Delivery | 07731988770
Paul Harvey | Infrastructure Manager | 07547 561801
Simon Jones | Business Applications Manager | 07395375195
Dan Hart | Incident Manager | 07514734128
Link to Document