Define the problem, manage and learn

Introduction

Basics

Problem: The unknown cause of one or more incidents.
Problem Management: The process responsible for managing the life cycle of a problem.
Known Error: A problem with a diagnosed root cause.
Workaround: A temporary solution that restores the affected service as soon as possible but does not solve the underlying problem.

Goals

The main goals of Problem Management are:
– Prevent problems and the incidents that arise from them.
– Eliminate recurring incidents.
– Minimize the impact of incidents that cannot be prevented.
Problem Management works together with Incident Management and Change Management to increase service quality and availability.

Problem Management Scope

– Diagnose the root cause of incidents and determine the solution to the associated problems.
– Provide workarounds to Incident Management so that the impact of incidents on the service is minimized.
– Ensure that the provided solution is implemented through the defined control procedures.
– Perform Post Implementation Review (PIR) to ensure that the changes have solved the problems without introducing new problems.
– Keep the information associated with problems, including the workarounds and solutions provided.
– Update the knowledge base (KB), so that knowledge is available to all service support personnel.
– Align with Incident Management by using the same categorization, which eases communication between the two processes.

Management Types

– Proactive: Guided by the continuous improvement process, monitors the service to detect problems before they cause incidents.
– Reactive: Analyzes incidents and provides solutions and workarounds.

Problem Life-Cycle

The statuses in the life cycle of a problem are:
– Detected.
– Recorded.
– Categorized, prioritized and assigned.
– Diagnosed.
– Solved.
– Reviewed.
– Closed.
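
The life cycle above can be read as a simple linear workflow. The following is a minimal Python sketch, assuming that each status can only advance to the next one in the list. The names ProblemStatus and advance are illustrative and do not come from any particular service management tool.

```python
from enum import Enum


class ProblemStatus(Enum):
    """Life-cycle statuses, in the order listed above."""
    DETECTED = 1
    RECORDED = 2
    CATEGORIZED = 3  # categorized, prioritized and assigned
    DIAGNOSED = 4
    SOLVED = 5
    REVIEWED = 6
    CLOSED = 7


def advance(status: ProblemStatus) -> ProblemStatus:
    """Return the next status (assumption: strictly linear workflow)."""
    if status is ProblemStatus.CLOSED:
        raise ValueError("a closed problem has no next status")
    return ProblemStatus(status.value + 1)
```

A real tool may allow other transitions, for example skipping the review when no change was needed; the strictly linear order is only an assumption of this sketch.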

Problem Life-Cycle Status

Detected

The most common ways to detect problems are:
– Incident Management:

  • Recurring Incidents.
  • Major Incidents.
  • Incidents where the cause is unknown.

– Proactive Problem Management.
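
Recurring incidents can be spotted by counting incidents that share the same category and CI. The following is a minimal Python sketch, assuming that each incident is represented as a (category, ci) tuple and that three occurrences are enough to raise a problem candidate. The function name and the threshold are hypothetical.

```python
from collections import Counter


def recurring_candidates(incidents, threshold=3):
    """Return the (category, ci) pairs seen at least `threshold` times.

    `incidents` is an iterable of (category, ci) tuples; the threshold
    of 3 is an illustrative assumption, not a prescribed value.
    """
    counts = Counter(incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

For example, three incidents on ("email", "srv01") and one on ("vpn", "gw01") would yield only the email pair as a problem candidate.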

Recorded

Information to be recorded:

– Identifier.
– Date of creation.
– Created by.
– Detection type: reactive, proactive.
– Source (recurring incident, major incident, incident with unknown cause, proactive problem management).
– User.
– Related CIs.
– Description (inherited when coming from an incident).
– Applicable SLA.
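
The fields above map naturally onto a record type. The following is a minimal Python sketch; the class name, field names and field types are assumptions chosen for illustration, not the schema of any specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProblemRecord:
    identifier: str
    created_by: str
    detection_type: str   # "reactive" or "proactive"
    source: str           # e.g. "recurring incident"
    user: str
    description: str      # inherited when coming from an incident
    applicable_sla: str
    related_cis: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        # Reject detection types other than the two listed above.
        if self.detection_type not in ("reactive", "proactive"):
            raise ValueError("detection_type must be 'reactive' or 'proactive'")
```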

Categorized, prioritized and assigned

The categorization must be identical to the one used for incidents, so that problems and incidents can be related and analyzed easily.
The prioritization criteria should be similar to those used in Incident Management, but should also consider, where applicable, the frequency and impact of the related incidents.
The assignment will consider the prioritization and categorization.
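
The exact prioritization criteria depend on each organization. The following is a minimal Python sketch of one possible rule, assuming an incident-style matrix where impact and urgency range from 1 (high) to 3 (low), and assuming that ten or more related incidents raise the priority by one level. The scales, the threshold and the function name are all hypothetical.

```python
def problem_priority(impact: int, urgency: int, related_incidents: int) -> int:
    """Return a priority from 1 (highest) to 5 (lowest).

    impact and urgency use an assumed 1 (high) to 3 (low) scale.
    """
    if not (1 <= impact <= 3 and 1 <= urgency <= 3):
        raise ValueError("impact and urgency must be between 1 and 3")
    priority = impact + urgency - 1  # incident-style base priority, 1..5
    # Hypothetical frequency rule: many related incidents raise the priority.
    if related_incidents >= 10:
        priority -= 1
    return max(1, priority)
```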

Diagnosed

The main goals of this phase are:
– Diagnose the root cause of the problem.
– Convert the problem into a known error.
– Update the Known Error Database (KEDB), which may be part of the KB.
– Provide a workaround for related incidents. These workarounds should be added to the related incidents.

Useful for problem diagnosis:
– Problem-solving techniques (brainstorming, Ishikawa diagrams, Pareto analysis …).
– KEDB.

In this phase, it is also evaluated whether solving the problem is worth the effort. For example, if low-impact incidents recur and a workaround exists, fixing the problem is not worthwhile when the cost of the fix is high.
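
The cost evaluation can be approximated by comparing the cost of the fix with the expected cost of living with the incidents. The following is a minimal Python sketch, assuming that costs are expressed in the same currency and that a twelve-month horizon is used. The function name, the parameters and the horizon are hypothetical.

```python
def worth_fixing(fix_cost: float,
                 incidents_per_month: float,
                 cost_per_incident: float,
                 horizon_months: int = 12) -> bool:
    """Return True when the expected incident cost exceeds the fix cost.

    cost_per_incident is the cost of handling one incident with the
    workaround in place; the 12-month horizon is an assumption.
    """
    expected_incident_cost = incidents_per_month * cost_per_incident * horizon_months
    return expected_incident_cost > fix_cost
```

With two low-impact incidents per month costing 50 each, the expected yearly cost is 1200, so a fix costing 10000 would not be justified under these assumptions.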

Solved

The main goals of this phase are:
– Determine the most efficient solution to apply.
– Generate an RFC (Request for Change) if necessary.

Reviewed

If the problem is solved through an RFC, then before closing the problem it is necessary to verify that the change has solved the problem and will not cause new incidents.

Closed

The problem is closed.

Interaction with other processes

Incident Management

– Generate a problem from an incident.
– Transfer the problem workaround to the related incidents. Incidents are closed.

Change Management

– Generate a change from a problem.
– Notify the problem assignee when the change is implemented.

Knowledge Management 

– Generate a topic based on a workaround.
– Generate a topic based on a known error.

Challenges and Risks

Problem Critical Success Factors

A major dependency for Problem Management is the establishment of an effective Incident Management process and tools.

Problem Management needs to identify problems based on incidents (incident trends, major incidents, etc.) and link the problems to them. It is therefore critical that the two processes have formal interfaces and common working practices (e.g. the same categorization of items).

Other critical success factors are:

– Support personnel have a good understanding of a problem's business impact in order to prioritize effectively.
– First, second and third line staff all have a good working relationship.
– Knowledge Management and Configuration Management information is available to help improve the process.