Definition of Incident Management and objectives
What is Incident Management and its main activities according to ITIL v3
Incident Management is an ITIL process framed in the Service Operation phase.
An incident is any unplanned interruption to a service or reduction in its quality. Incidents can be failures or queries reported by users, by the service team, or by an event monitoring tool.
The main objective of incident management is to restore normal service operations as soon as possible while minimizing the negative impact on business operations.
Normal operation is understood as that which is within the limits of the SLA.
3 Basic concepts on Incident Management
The SLA establishes the maximum times within which incidents must be responded to and resolved.
Management tools should be used to calculate and assign these time scales, and to raise alerts and escalations that help the team respond to and resolve incidents within the defined maximum times.
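As a minimal sketch of how a tool might derive these time scales, the Python example below computes response and resolution deadlines from a priority and reports when an alert is due. The priority labels, target times, 75% alert threshold, and function names are illustrative assumptions, not values taken from ITIL or from any particular SLA. The sketch counts calendar time and ignores business hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical SLA targets per priority: (response time, resolution time).
SLA_TARGETS = {
    "P1": (timedelta(minutes=15), timedelta(hours=4)),
    "P2": (timedelta(hours=1), timedelta(hours=8)),
    "P3": (timedelta(hours=4), timedelta(hours=24)),
}


@dataclass(frozen=True)
class Deadlines:
    respond_by: datetime
    resolve_by: datetime
    alert_at: datetime  # moment at which the team should be warned


def compute_deadlines(opened_at, priority, alert_fraction=0.75):
    """Derive deadlines from the SLA targets for the given priority."""
    response, resolution = SLA_TARGETS[priority]
    return Deadlines(
        respond_by=opened_at + response,
        resolve_by=opened_at + resolution,
        # Alert once the assumed fraction of the resolution time has elapsed.
        alert_at=opened_at + resolution * alert_fraction,
    )


def needs_alert(deadlines, now, resolved=False):
    """True when an unresolved incident has reached its alert time."""
    return not resolved and now >= deadlines.alert_at
```

For example, a "P1" incident opened at 09:00 would, under these assumed targets, need a response by 09:15, a resolution by 13:00, and would trigger an alert at 12:00 if still unresolved.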
Incident models help optimize the resolution process.
Some incidents are not new: they have occurred before and will occur again. Many companies therefore find it useful to define incident models that can be applied to recurring service incidents.
An incident model should include:
- The steps to follow for the resolution of the incident.
- The chronological order of these steps and their dependencies if any.
- Responsibilities: who should do what.
- Deadlines for completion of activities.
- Escalation procedures: who should be contacted and when.
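The elements listed above can be captured in a small data structure. The following Python sketch is one possible representation, not a prescribed format: the class names, field names, and the sample "mail queue stuck" model are all hypothetical. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an order of steps that respects their dependencies.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from graphlib import TopologicalSorter
from typing import List, Tuple


@dataclass
class Step:
    name: str
    owner: str                # responsibility: who performs the step
    deadline: timedelta       # time allowed, measured from incident start
    depends_on: List[str] = field(default_factory=list)


@dataclass
class IncidentModel:
    name: str
    steps: List[Step]
    # Escalation procedure: (elapsed time, contact to notify).
    escalation_contacts: List[Tuple[timedelta, str]]

    def execution_order(self) -> List[str]:
        """Return step names in an order that satisfies every dependency."""
        graph = {s.name: set(s.depends_on) for s in self.steps}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical model for a recurring incident.
mail_queue_model = IncidentModel(
    name="mail queue stuck",
    steps=[
        Step("verify with user", "Service Desk", timedelta(hours=2),
             depends_on=["restart service"]),
        Step("confirm symptoms", "Service Desk", timedelta(minutes=15)),
        Step("restart service", "Mail team", timedelta(hours=1),
             depends_on=["confirm symptoms"]),
    ],
    escalation_contacts=[(timedelta(hours=1), "mail service owner")],
)
```

Calling `mail_queue_model.execution_order()` yields the steps in dependency order even though they were declared out of order.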
Each service must define the criteria under which an incident is considered serious.
Serious incidents must have their own resolution and escalation procedures, with shorter time scales than other incidents. The prioritization activity, described later, must take these criteria into account.
Main activities of Incident Management according to ITIL v3
The sooner an incident is detected, the less impact it has on the business.
Therefore, it is important to monitor resources so that potential incidents are detected and service is restored before business processes are affected or, if that is not possible, so that the impact is kept to a minimum.
All service incidents must be recorded, and each incident must be recorded separately.
The information to be registered generally includes:
- Unique identifier.
- Urgency, impact and priority.
- Date and time.
- Person/group registering the incident.
- Input channel.
- User’s data.
- Associated CIs (Configuration Items).
- Person/group assigned for resolution.
- Associated problem/Known error.
- Activities performed for the resolution.
- Date and time of the resolution.
- Category of the closure.
- Date and time of closure.
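As an illustration, the fields above could map onto a record type like the one below. This is a sketch under assumptions: the field names, the use of a random UUID as the unique identifier, and the integer priority are choices made for the example, not requirements of ITIL or of any specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from uuid import uuid4


@dataclass
class IncidentRecord:
    reported_at: datetime
    registered_by: str            # person/group registering the incident
    input_channel: str            # e.g. "phone", "email", "monitoring"
    user: str
    urgency: str
    impact: str
    priority: int
    # Unique identifier, generated automatically (assumed scheme).
    incident_id: str = field(default_factory=lambda: uuid4().hex)
    configuration_items: List[str] = field(default_factory=list)
    assigned_to: Optional[str] = None
    related_problem: Optional[str] = None   # problem or known error
    activities: List[str] = field(default_factory=list)
    resolved_at: Optional[datetime] = None
    closure_category: Optional[str] = None
    closed_at: Optional[datetime] = None

    def log(self, activity: str) -> None:
        """Record an action performed while resolving the incident."""
        self.activities.append(activity)

    @property
    def is_closed(self) -> bool:
        return self.closed_at is not None
```

Fields that are only known later in the lifecycle (resolution, closure) default to `None`, so a record can be created at registration time and completed as the incident progresses.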
In this activity, the exact type of incident is established.
Generally, a multilevel categorization with dependencies between levels is established. The number of levels will depend on the granularity with which we need to classify the incidents.
Sometimes an incident is not properly categorized at the time of registration. If this happens, we must make sure the categorization is corrected when the incident is closed.
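A multilevel categorization with dependencies between levels can be modeled as a tree in which each level restricts the options of the next. The Python sketch below shows one way to validate a category path against such a tree; the category names and the function name are hypothetical examples.

```python
# Hypothetical three-level category tree: each key lists its valid children.
CATEGORY_TREE = {
    "Hardware": {
        "Laptop": {"Battery": {}, "Screen": {}},
        "Printer": {},
    },
    "Software": {
        "Email": {"Cannot send": {}, "Cannot receive": {}},
    },
}


def is_valid_category(path, tree=CATEGORY_TREE):
    """Check that every level in `path` is a valid child of the previous one."""
    if not path:
        return False
    node = tree
    for level in path:
        if level not in node:
            return False
        node = node[level]
    return True
```

With this tree, `["Hardware", "Laptop", "Battery"]` is accepted, while `["Hardware", "Email"]` is rejected because "Email" is not a child of "Hardware". The same check could be run again at closure to confirm the final categorization.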
Generally, the priority of the incident determines how it is to be handled.
The priority of the incident usually depends on:
- The urgency: how quickly the issue needs to be resolved.
- The impact: usually measured by the number of users affected, although what really matters is how critical those users are to the business. Ultimately, impact is determined by the adverse effects the incident has on the business.
In addition to the urgency and impact, the priority may also depend on other factors such as whether the user is a VIP, the user’s department, etc.
It is very useful for the support tool to be able to calculate the priority from rules. Even so, the support team must know these rules in order to prioritize properly.
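A rule-based priority calculation is often expressed as an impact/urgency matrix with additional adjustments. The Python sketch below assumes three levels for each factor, a priority scale from 1 (highest) to 5 (lowest), and a one-level bump for VIP users; all of these values are illustrative assumptions that each organization would define for itself.

```python
# Hypothetical matrix: (impact, urgency) -> priority, 1 = highest.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


def calculate_priority(impact, urgency, is_vip=False):
    """Look up the base priority and apply the assumed VIP rule."""
    priority = PRIORITY_MATRIX[(impact, urgency)]
    if is_vip:
        # Raise the priority by one level, never beyond the highest (1).
        priority = max(1, priority - 1)
    return priority
```

For instance, a medium-impact, low-urgency incident gets priority 4 under these rules, or priority 3 if the affected user is a VIP.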
When first-level support staff receive an incident, they diagnose it based on its symptoms and, if they are trained to do so, resolve it.
There are two types of escalation:
1. Functional: first-level support is unable to resolve the incident and assigns it to the appropriate resolver group.
2. Hierarchical: under certain circumstances (serious or critical incidents, risk of non-compliance with the SLA), those responsible for the corresponding service must be notified.
Even after an escalation, the incident still belongs to the Service Desk team, and it is this team that is responsible for following up on the incident and keeping users informed until it is closed.
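The two escalation types are independent of each other: an incident may need either, both, or neither. The following Python sketch makes that explicit. The function name, its parameters, and the `Escalation` type are hypothetical and only illustrate the decision described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escalation:
    functional_group: Optional[str]   # resolver group, if reassigned
    notify_management: bool           # hierarchical escalation
    owner: str = "Service Desk"       # ownership does not change


def decide_escalation(first_level_can_resolve, resolver_group,
                      is_serious, sla_at_risk):
    """Decide functional and hierarchical escalation independently."""
    # Functional: hand over only when first-level support cannot resolve it.
    functional = None if first_level_can_resolve else resolver_group
    # Hierarchical: notify those responsible for the service when the
    # incident is serious or the SLA is at risk of being breached.
    hierarchical = is_serious or sla_at_risk
    return Escalation(functional, hierarchical)
```

Whatever the outcome, the `owner` field stays with the Service Desk, reflecting that follow-up and user communication remain its responsibility.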
Investigation and diagnosis
If the incident involves a system failure, the cause of the failure will most likely need to be investigated.
The most common tasks within this activity are the following:
- Establish exactly what is not working properly and which sequences of user actions trigger it (use cases).
- Establish the potential impact of the incident.
- Determine whether the incident was caused by the implementation of a change.
- Search the knowledge base (known error database, incident records, etc.) for possible solutions and/or workarounds.
When a potential solution is identified, it should be applied and tested. Once the resolution has been tested, the incident is considered resolved and is assigned to the Service Desk team for closure.
All actions taken to resolve the incident should also be recorded in the incident history.
Before closing the incident, the Service Desk team should check:
- Whether the user is satisfied with the resolution.
- Whether the closure has been categorized.
- Whether all the necessary data has been filled in.
- Whether the incident is recurring. If so, a problem record should be raised.
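The closure checks can be sketched as a simple validation routine. In the Python example below, the class, field names, and messages are hypothetical. The function returns the list of conditions that still block closure, together with a flag indicating whether a problem record should be raised.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClosureCheck:
    user_satisfied: bool
    closure_category: Optional[str]
    missing_fields: List[str] = field(default_factory=list)
    is_recurring: bool = False


def review_closure(check: ClosureCheck) -> Tuple[List[str], bool]:
    """Return (blockers, raise_problem) for an incident awaiting closure."""
    blockers = []
    if not check.user_satisfied:
        blockers.append("user has not confirmed the resolution")
    if not check.closure_category:
        blockers.append("closure category is missing")
    if check.missing_fields:
        blockers.append("missing data: " + ", ".join(check.missing_fields))
    # Recurrence does not block closure; it triggers a problem record.
    return blockers, check.is_recurring
```

An empty list of blockers means the incident can be closed. Treating recurrence as a separate flag rather than a blocker is a design choice of this sketch.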
Optionally, a user satisfaction survey can also be sent.
Why Incident Management
As we have seen, every service company needs Incident Management to prevent interruptions or unplanned reductions in the quality of its service or, when they do occur, to restore service as soon as possible.
However, we must be aware of the challenges and risks of Incident Management in order to ensure the best service operation.