April 2016 ~ Creating a Computer Incident Response Team

Case Study

Creating a Project Team

Developing an Incident Response Plan

Implementing the Incident Response Team

Benefits

Resources (links, books, articles, the lighter side)

What happens in your company when a critical computer system goes down? What happens when your customer-facing website goes down unexpectedly? What happens when your network crashes?

With technology woven through so much of our everyday business activities, any of those situations can cripple a business for minutes, hours, days or weeks if not dealt with properly.

If you don’t have a good plan in place for handling outages, this article may spark some discussions about how to handle those better.

Case Study

An organization I had recently joined found itself in deep trouble when a software patch was applied and the updated software would not work at all. The problem was discovered just at quitting time on a Monday. The person who first learned about the problem went home without reporting it to management or focusing any attention on it.

The next day, internal customers were livid when none of their office programs worked. It took several days before the problem was identified and eventually fixed. It was later learned that the patch caused the software to contact a server that was no longer in service.

That incident and the lack of any organized plan for handling computer system outages became a rallying cry for a better business process to avoid such severe customer pain. Management agreed that we needed to do a better job and supported figuring out how to do that.

Creating a Project Team

I will use the process we developed as an example. Readers can adapt or adjust as appropriate for their company or organization.

A team was created to come up with a better business model for handling computer outages. In our organization, we gathered a representative from each area of the information services department:

Help desk
Applications development
Network support services
Mainframe operations
Desktop support services
Security
Website support

Developing an Incident Response Team Plan

Over the course of a couple of months, we put together a plan that included:

Purpose, Scope and Approach
Definitions

What is an "Incident" in our organization?
What triggers activation of the Incident Response Team?
Post Incident Review process

Roles and Responsibilities

Incident Command Leader
Communication Leader
Documentation Leader
Help Desk Coordinator
Incident Response Team Members
Management Representatives

Communication Plan

Communication within the Incident Response Team
Communication from the Incident Response Team to Impacted Stakeholders
Post-Incident Review Communication and Incident Tracking

Escalation Path

Incident Response – Severity Level 1 – Low
Incident Response – Severity Level 2 – Medium
Incident Response – Severity Level 3 – High

While the organizing team was doing their job, we had a few unexpected outages that allowed us to practice what we were working on and fine-tune our process. Over a few years, it became a very well-run process that our internal and external customers came to appreciate very much.

Implementing the Incident Response Team

We developed a standing interdisciplinary Incident Response Team (IRT) to lead problem identification, communication, documentation and resolution activities once an "Incident" was determined to have occurred. The team had pre-defined levels of authority appropriate for incident resolution. We assigned primary members and backup members for each role. The IRT designation was not the person's only job; their function on the IRT was directly related to the job they already had.

Whenever a potential issue was identified, the IRT met quickly to determine whether the issue was urgent enough to activate the IRT plan. If it was, we assigned a level (1 – low impact, 2 – medium impact, 3 – high impact) and quickly assigned roles: Incident Command Leader, Communication Leader, Documentation Leader and Help Desk Coordinator. We also assigned other staff and duties as appropriate for the incident.
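For readers who track incidents in software, the level and role assignments described above might be sketched as follows. This is a minimal Python illustration only. The severity levels and role names come from this article; the `Incident` class, its `assign` method and the staff names are hypothetical, and the fallback from primary to backup member is one assumed way to model the primary/backup pairing.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as named in the article."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Role names from the article.
ROLES = (
    "Incident Command Leader",
    "Communication Leader",
    "Documentation Leader",
    "Help Desk Coordinator",
)


@dataclass
class Incident:
    """Hypothetical record of one declared incident."""
    summary: str
    severity: Severity
    assignments: dict = field(default_factory=dict)

    def assign(self, role, primary, backup, available):
        """Fill a role with the primary member, or the backup if the primary is out."""
        if role not in ROLES:
            raise ValueError(f"Unknown role: {role}")
        if primary in available:
            self.assignments[role] = primary
        elif backup in available:
            self.assignments[role] = backup
        else:
            raise LookupError(f"No one available for role: {role}")

    def unfilled_roles(self):
        """Roles that still need someone assigned."""
        return [role for role in ROLES if role not in self.assignments]


# Hypothetical example: the primary Communication Leader is unavailable.
incident = Incident("Email server not responding", Severity.MEDIUM)
incident.assign("Communication Leader", primary="Pat", backup="Lee",
                available={"Lee", "Sam"})
print(incident.assignments)
print(incident.unfilled_roles())
```

Listing the unfilled roles gives the team a quick check that every role is covered before work begins.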

If the incident lasted more than 2 hours or was designated a Level 3 (High Impact), the manager responsible for the affected area was assigned as Management Representative. For example, if it was a network issue, the Network Manager became the Management Representative for the IRT. If it was a virus, the Security Manager became the Management Representative. If it was a software application issue, the Software Applications Manager became the Management Representative, etc.
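The escalation rule above can be written down as a small function. The sketch below is an illustration under stated assumptions, not a prescribed implementation: the function name, the area keys and the fallback title for unlisted areas are hypothetical, while the 2-hour threshold, the Level 3 trigger and the three manager titles are taken from the paragraph above.

```python
from datetime import timedelta

# Area-to-manager pairs from the article's examples.
AREA_MANAGERS = {
    "network": "Network Manager",
    "virus": "Security Manager",
    "software application": "Software Applications Manager",
}

ESCALATION_THRESHOLD = timedelta(hours=2)  # "lasted more than 2 hours"
HIGH_SEVERITY = 3                          # Level 3 (High Impact)


def management_representative(area, severity, elapsed):
    """Return the manager title to bring in, or None if no escalation is due yet."""
    if severity != HIGH_SEVERITY and elapsed <= ESCALATION_THRESHOLD:
        return None
    # Assumption: areas not listed above fall back to a generic placeholder.
    return AREA_MANAGERS.get(area, "Manager of the affected area")


print(management_representative("network", 1, timedelta(hours=1)))
print(management_representative("network", 1, timedelta(hours=3)))
print(management_representative("virus", 3, timedelta(minutes=5)))
```

A Level 3 incident escalates immediately, while lower levels escalate only once the 2-hour mark has passed.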

We developed a 1-page Checklist covering Who, What, Where, Why, When and How, which helped us quickly gather the appropriate resources to deal with an issue. (See example of Checklist)

The Communication Leader would then start the notification process to management and our internal customers, giving them details about the incident: what happened, what we were doing about it, and when they would see resolution and/or when we would provide an update. The Communication Leader also issued the "Incident Resolved" notice when the problem was fixed.

We learned very early that when a technical problem happens, the technical people who can fix it should focus all their time and energy on fixing the problem, not dealing with customer communications and a barrage of questions. People who can communicate with customers should be assigned the communication role so the technical people can concentrate on what they do best.

We developed communications templates that could be used and updated quickly with the current incident’s information. We developed a process for how to notify internal customers if the network or phone system went down (e.g., walking the floors of offices, using the phone system instead of email, using walkie-talkies, using cell phones, etc.). Our process also included when to post notices about outages on our intranet as well as on external customer-facing websites and social media.
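A communications template of the kind described above can be as simple as a block of text with named blanks. The following Python sketch uses the standard library's `string.Template`. The wording of the two notices, the field names and the sample incident details are all hypothetical; only the items each notice covers (what happened, what is being done, when the next update will come, and the "Incident Resolved" notice) follow the article.

```python
from string import Template

# Hypothetical wording; the fields follow what the article says a notice covered.
INCIDENT_NOTICE = Template(
    "INCIDENT NOTICE (Severity $severity)\n"
    "What happened: $what_happened\n"
    "What we are doing: $action\n"
    "Next update: $next_update\n"
)

RESOLVED_NOTICE = Template(
    "INCIDENT RESOLVED\n"
    "What happened: $what_happened\n"
    "Resolution: $resolution\n"
)


def render(template, **details):
    """Fill in a notice; substitute() raises KeyError if any field is missing."""
    return template.substitute(**details)


notice = render(
    INCIDENT_NOTICE,
    severity=2,
    what_happened="Office programs fail to start after a software patch.",
    action="The patch is being investigated by the applications team.",
    next_update="3:00 PM",
)
print(notice)
```

Because `substitute()` refuses to produce output when a field is missing, a half-completed notice cannot be sent by accident.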

We developed a step-by-step plan for each severity level for each role – who did what, when and how. When an "Incident" was called, that incident became the IRT’s highest priority. Everyone went to work to get the issue resolved as quickly as possible.

Each business group that had a representative on the IRT also developed a written process for what they would do when there was an Incident in their focus area. Each IRT member and backup person had a 3-ring notebook with their overall IRT and focus area written plans. That gave them something they could grab quickly and check off what needed to be done. If, for any reason, someone else needed to step in for a team member, everyone knew where to look to take over easily. We even developed telephone scripts that could be used to broadcast a voicemail message to appropriate people in addition to email notifications.

After an Incident was designated, the IRT members met quickly as needed for status updates (usually about every 2 hours) so that we could keep impacted stakeholders updated with regular progress reports. Some incidents were resolved quickly; some went on for several hours or days before they could be resolved.

After an Incident was resolved and customers notified of "all clear," we held a Post Incident Review meeting within a week to review what happened, what went well, what didn’t go well and what improvements we could make, either in the IRT process or in any other area affected. Over time, those review meetings provided a large number of process improvements that prevented other failures.
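The timing described above (status updates about every 2 hours, and a Post Incident Review within a week of resolution) lends itself to a small date calculation. The Python sketch below is illustrative only: the function names and the sample dates are hypothetical, and the fixed 2-hour interval is a simplification of "usually, about every 2 hours."

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(hours=2)  # simplification of "about every 2 hours"
REVIEW_WINDOW = timedelta(days=7)     # review held "within a week"


def update_times(declared_at, count, interval=UPDATE_INTERVAL):
    """Return the next `count` status-update times after an incident is declared."""
    return [declared_at + interval * n for n in range(1, count + 1)]


def review_due_by(resolved_at, window=REVIEW_WINDOW):
    """Return the latest time the Post Incident Review should be held."""
    return resolved_at + window


# Hypothetical incident: declared late in the day, resolved the next morning.
declared = datetime(2016, 4, 4, 17, 0)
resolved = datetime(2016, 4, 5, 9, 30)

for due in update_times(declared, count=3):
    print("Status update due:", due.strftime("%Y-%m-%d %H:%M"))
print("Post Incident Review due by:", review_due_by(resolved).date())
```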

Benefits

The IRT members were committed to the process, were given the authority to take action quickly and became highly respected by management and customers. Everyone in the organization learned that if a computer problem was noticed by anyone at all, there was a clear path to let the right people know about it so it could get resolved.

One of the side benefits of our Incident Response Team process was that it helped to raise awareness in the IT staff about the customer impact of a network change that went wrong, a new application that failed, a technology change that caused unexpected customer disruption or the business impact of any other type of technology outage. It also provided a clearly identified business process, with multiple people aware of and responsible for handling any possible service disruptions.

Computer failures happen all the time, for a wide variety of reasons. While we can rarely predict sudden outages, as IT professionals we can do a much better job of helping customers know what to expect, and we can do our very best to fix a problem as quickly as possible. We certainly owe our customers the simple courtesy of letting them know when the technology they use every day isn’t working correctly.

Customers, internal and external, expect to be treated with respect. Having an effective computer Incident Response Plan in place is simply good business. Not having one can lead to great customer dissatisfaction, time wasted trying to identify and fix problems, a great deal of lost money and unnecessary public relations headaches.

Internet Resources

RBS fined £56m over 'unacceptable' computer failure http://www.bbc.com/news/business-30125728
Computer failure hits BA flights http://www.dailymail.co.uk/news/article-195165/Computer-failure-hits-BA-flights.html
Computer failure leads to flight chaos http://www.dailymail.co.uk/news/article-109001/Computer-failure-leads-flight-chaos.html
Computer Incident Response Plan http://isowiki.tulane.edu/Tulane_Information_Security_Policies/Tulane_University_Computer_Incident_Response_Plan
Incident Response Plan Template – The Essential Elements http://www.acunetix.com/blog/articles/incident-response-plan-template/
Responding to IT Security Incidents https://technet.microsoft.com/en-us/library/cc700825.aspx
How to Design a Useful Incident Response Policy http://www.symantec.com/connect/articles/how-design-useful-incident-response-policy
The incident response plan you never knew you had http://www.csoonline.com/article/2975277/business-continuity/the-incident-response-plan-you-never-knew-you-had.html
Guidance for Incident Response Plans https://www.dorsey.com/newsresources/publications/client-alerts/2015/05/guidance-for-incident-response-plans
Incident Response Plan Example (under Incident Management) http://www.cio.ca.gov/OIS/Government/library/samples.asp#Incident-Mgtm
Incident Response Planning Guideline https://security.berkeley.edu/incident-response-planning-guideline
Incident Response Plan Template http://www.oregon.gov/DAS/CIO/ESO/Pages/SIRT.aspx
Are you prepared for a Cybersecurity Attack? http://blog.aicpa.org/2015/09/are-you-prepared-for-a-cybersecurity-attack.html
Incident Response Plan http://admin.utep.edu/Default.aspx?tabid=63604
Tips for Starting a Security Incident Response Program https://zeltser.com/security-incident-response-program-tips/
How-to for Dealing with Computer Security Incidents http://www.nist.gov/itl/csd/sp800-080812.cfm
Data Security Breach Incident Response Plan www.wou.edu/ucs/policy/WOU_Incident_Resp_Plan.pdf (PDF file)
CIRT Sample Policies http://csirt.org/sample_policies/index.html (scroll down to "Incident Handling Procedure")
How good is your cyberincident-response plan? http://www.mckinsey.com/business-functions/business-technology/our-insights/how-good-is-your-cyberincident-response-plan
Information Services Security Incident Response Policy http://www.upenn.edu/almanac/volumes/v53/n18/or.html
Incident Response Plan http://www.comptechdoc.org/independent/security/policies/incident-response-plan.html
Creating an Incident Response Plan https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/Security_Guide/s1-response-plan.html

Books

Disclosure: We get a small commission for purchases made via links to Amazon.

Incident Management for I. T. Departments. Darren O'Toole. CreateSpace, 2015. ISBN 978-1511631747

Incident Response & Computer Forensics, Third Edition. Jason T. Luttgens, Matthew Pepe, Kevin Mandia. McGraw-Hill, 2014. ISBN 978-0071798686

The Computer Incident Response Planning Handbook: Executable Plans for Protecting Information at Risk, 1st Edition. N. K. McCarthy, Matthew Todd, Jeff Klaben. McGraw-Hill, 2012. ISBN 978-0071790390

Articles

Related newsletter articles:
    June 2001 - Successful Project Management
    November 2006 - Project Management - Early Warning Signs
    December 2000 - Sponsoring Successful Projects
    May 2010 - The 5 Goals of a Project Manager
    November 1996 - Management vs. Leadership
    April 2001 - Consulting Skills for Managers
    June 2004 - Successful Stakeholdering
    August 2008 - Secrets of New Project Success

The Lighter Side

40+ (Funny) Error Messages You've Never Seen Before http://www.hongkiat.com/blog/40-funny-error-messages-youve-never-seen-before/

About our resource links: We do not endorse or agree with all the beliefs in these links. We do keep an open mind about different viewpoints and respect the ability of our readers to decide for themselves what is useful.

Page updated: October 16, 2023

This page is http://www.itstime.com/apr2016.htm
