April 2016 ~
Creating a Computer Incident Response Team
- Case Study
- Creating a Project Team
- Developing an Incident Response Plan
- Implementing the Incident Response Team
- Benefits
- Resources (links, books, articles, the lighter side)
What happens in your company when a critical computer system
goes down? What happens when your customer-facing website goes down
unexpectedly? What happens when your network crashes?
With technology woven through so much of our everyday business
activities, any of those situations can cripple a business for minutes, hours,
days or weeks if not dealt with properly.
If you don’t have a good plan in place for handling outages,
this article may spark some discussions about how to handle them better.
Case Study
An organization I had recently joined found itself
in deep trouble when a software patch was applied and the updated software would
not work at all. The problem was discovered right at quitting time on
a Monday. The person who first learned about the problem went home
without reporting it to management or doing anything else about it.
The next day, internal customers were livid when none of their
office programs worked. It took several days before the problem was
identified and fixed. We later learned that the patch
caused the software to contact a server that was no longer in service.
That incident and the lack of any organized plan for handling
computer system outages became a rallying cry for a better business process to
avoid such severe customer pain. Management agreed that we needed to do a
better job and supported the effort to figure out how.
Creating a Project Team
I will use the process we developed as an example. Readers
can adapt or adjust as appropriate for their company or organization.
A team was created to come up with a better business model for
handling computer outages. In our organization, we gathered a
representative from each area of the information services department:
- Help desk
- Applications development
- Network support services
- Mainframe operations
- Desktop support services
- Security
- Website support
Developing an Incident Response Team Plan
Over the course of a couple of months, we put together a plan that
included:
- Purpose, Scope and Approach
- Definitions
  - What is an "Incident" in our organization?
  - What triggers activation of the Incident Response Team?
  - Post Incident Review process
- Roles and Responsibilities
  - Incident Command Leader
  - Communication Leader
  - Documentation Leader
  - Help Desk Coordinator
  - Incident Response Team Members
  - Management Representatives
- Communication Plan
  - Communication within the Incident Response Team
  - Communication from the Incident Response Team to Impacted Stakeholders
  - Post-Incident Review Communication and Incident Tracking
- Escalation Path
  - Incident Response – Severity Level 1 – Low
  - Incident Response – Severity Level 2 – Medium
  - Incident Response – Severity Level 3 – High
While the organizing team was doing its work, we had a few
unexpected outages that allowed us to practice what we were developing and
fine-tune our process. Over a few years, it became a very well-run process
that our internal and external customers came to appreciate very much.
Implementing the Incident Response Team
We developed a standing interdisciplinary Incident Response Team
(IRT) to lead problem identification, communication, documentation and
resolution activities once an "Incident" was determined to have
occurred. The team had pre-defined levels of authority appropriate for
incident resolution. We assigned primary members and backup members for
each role. The IRT designation was not the person's only job; their
function on the IRT was directly related to the job they already
had.
Whenever a potential issue was identified, the IRT met
quickly to determine whether it was urgent enough to activate the IRT
plan. If it was, we assigned a severity level (1 – low impact,
2 – medium impact, 3 – high impact) and quickly assigned roles: Incident
Command Leader, Communication Leader, Documentation Leader and Help Desk
Coordinator. We also assigned other staff and duties as appropriate for the
incident.
If the incident lasted more than 2 hours or was designated
Level 3 (High Impact), the manager responsible for the affected area was
assigned as the Management Representative. For example, if it was a
network issue, the Network Manager became the Management
Representative for the IRT. If it was a virus, the Security
Manager became the Management Representative. If it was a software
application issue, the Software Applications Manager became the Management
Representative, and so on.
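For teams that log incidents in a ticketing tool or a script, the escalation rule above can be expressed as a small decision function. The following is a minimal Python sketch under stated assumptions, not part of our original process documentation: the `Incident` fields, the `AREA_MANAGERS` mapping, the `management_representative` function and the fallback label are hypothetical names chosen for illustration, with the manager titles taken from the examples in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the area of the outage to the responsible manager,
# using the examples from the article; adapt it to your own organization.
AREA_MANAGERS = {
    "network": "Network Manager",
    "virus": "Security Manager",
    "software application": "Software Applications Manager",
}

# Escalation threshold from the article: more than 2 hours triggers a manager.
ESCALATION_HOURS = 2


@dataclass
class Incident:
    area: str          # hypothetical field: area of the outage, e.g. "network"
    level: int         # severity: 1 = low, 2 = medium, 3 = high impact
    hours_open: float  # hours elapsed since the Incident was called


def management_representative(incident: Incident) -> Optional[str]:
    """Return the manager to assign, or None if no escalation is needed yet."""
    if incident.level == 3 or incident.hours_open > ESCALATION_HOURS:
        # Fall back to a generic label when the area is not in the mapping.
        return AREA_MANAGERS.get(incident.area, "Manager of the affected area")
    return None


if __name__ == "__main__":
    print(management_representative(Incident("network", 1, 0.5)))  # None
    print(management_representative(Incident("virus", 2, 3.0)))    # Security Manager
```

The strict "more than 2 hours" comparison mirrors the wording of the rule, so a Level 1 or Level 2 incident that has been open exactly 2 hours would not yet be escalated.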
We developed a 1-page Checklist covering Who,
What, Where, Why, When and How, which helped us quickly gather the appropriate
resources to deal with an issue. (See the example
Checklist.)
The Communication Leader would then start the notification
process to management and our internal customers, giving them details about the
incident: what happened, what we were doing about it, and when they could expect
a resolution and/or the next update. The Communication Leader also issued the
"Incident Resolved" notice when the problem was fixed.
We learned very early that when a technical problem happens, the
technical people who can fix it should focus all their time and energy on fixing
the problem, not dealing with customer communications and a barrage of
questions. People who can communicate with customers should be assigned
the communication role so the technical people can concentrate on what they do
best.
We developed communication templates that could be quickly
updated with the current incident’s information. We also developed a
process for notifying internal customers if the network or phone system went
down (e.g., walking the floors of offices, using the phone system instead of
email, using walkie-talkies, using cell phones, etc.). Our process also
specified when to post outage notices on our intranet, our external
customer-facing websites and social media.
We developed a step-by-step plan for each severity level and
each role: who did what, when and how. When an
"Incident" was called, it became the IRT’s highest
priority, and everyone went to work to resolve the issue as quickly as
possible.
Each business group that had a representative on the IRT
also developed a written process for what it would do when there was an
Incident in its focus area. Each IRT member and backup person had a
3-ring notebook with the overall IRT plan and their focus area's written plans. That
gave them something they could grab quickly to check off what needed to be
done. If, for any reason, someone else needed to step in for a team member,
everyone knew where to look, so the handoff was easy. We even developed
telephone scripts that could be used to broadcast a voicemail message to
appropriate people in addition to email notifications.
After an Incident was designated, the IRT members met
as needed for status updates (usually about every 2 hours) so that we
could keep impacted stakeholders regularly updated on our progress.
Some incidents were resolved quickly; others went on for several hours or days
before they could be resolved.
After an Incident was resolved and customers had been notified of the
"all clear," we held a Post Incident Review meeting within a week to
review what happened, what went well, what didn’t go well and what
improvements we could make, either in the IRT process or in any other area
affected. Over time, those review meetings produced a large number of
process improvements that prevented other failures.
Benefits
The IRT members were committed to the process, were given the
authority to act quickly, and came to be highly respected by management
and customers. Everyone in the organization learned that if anyone at all
noticed a computer problem, there was a clear path to let the right
people know about it so it could be resolved.
One of the side benefits of our Incident Response Team process
was that it raised the IT staff's awareness of the customer impact
of a network change that went wrong, a new application that failed, a technology
change that caused unexpected customer disruption, or the business impact of any
other type of technology outage. It also provided a clearly identified
business process, with multiple people aware of and responsible for handling any
possible service disruption.
Computer failures happen all the time, for a wide variety of
reasons. While we often can’t predict sudden outages, as IT
professionals, we can do a much better job of helping customers know what to expect,
and we can do our very best to fix a problem as quickly as possible. We certainly owe our customers the simple courtesy of letting them know when the
technology they use every day isn’t working correctly.
Customers, internal and external, expect to be treated
with respect. Having an effective computer Incident Response Plan in place
is simply good business. Not having one can lead to serious customer
dissatisfaction and wasted time trying to identify and fix problems, and it can cost a business a great deal of money and unnecessary
public relations headaches.
Books
- Incident Management for I.T. Departments. Darren O'Toole. CreateSpace, 2015. ISBN 978-1511631747
- Incident Response & Computer Forensics, Third Edition. Jason T. Luttgens, Matthew Pepe, Kevin Mandia. McGraw-Hill, 2014. ISBN 978-0071798686
- The Computer Incident Response Planning Handbook: Executable Plans for Protecting Information at Risk, 1st Edition. N. K. McCarthy, Matthew Todd, Jeff Klaben. McGraw-Hill, 2012. ISBN 978-0071790390
Related newsletter articles:
- June 2001 - Successful Project Management
- November 2006 - Project Management - Early Warning Signs
- December 2000 - Sponsoring Successful Projects
- May 2010 - The 5 Goals of a Project Manager
- November 1996 - Management vs. Leadership
- April 2001 - Consulting Skills for Managers
- June 2004 - Successful Stakeholdering
- August 2008 - Secrets of New Project Success
Page updated: October 16, 2023
This page is http://www.itstime.com/apr2016.htm