Disaster Recovery – Datacenters, the Cloud, and Winning Over the Business
Disaster Recovery Planning – Where to Start?
Whether you’re a newly hired IT leader or recently promoted, understanding the organization’s business continuity plan is critical to sleeping well at night and building a platform for future success. No matter the challenge or issue, the CIO is expected to prepare, test, and execute their way through any event involving IT systems to keep the business going. As IT transitions from a necessary support system into an age of business enablement, having robust and tested disaster recovery (DR) capabilities is critical to future scalability and reliability.
So, you’ve been charged with reviewing, revamping, or refreshing the “BCDR Plan”. Where do you start?
Documentation and Planning
At Thrive, our customers often look to us to assist them with planning, but planning really starts with documentation. We need to gather as much information about the business as possible before we can start talking “continuity”. We start by assessing the various functional units of the business. For any new IT leader, this is a great introduction to the who, what, and why of the company, and it helps build strong cross-functional relationships. We work with the various departments of the client firm to understand what they do, how they do it, and, most importantly for this exercise, which IT software systems they use to do it. This is a helpful gauge of who the data owners are, what data they have, and how it is stored and accessed. I find this is the ideal time to create a ‘data map’ of all functional units, their applications, what formats the data is stored in, how it’s accessed, etc. This can be as simple as an Excel spreadsheet. Having this reference sheet to go back to is very important, not only for understanding the full landscape but also for your business continuity planning.
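For readers who prefer to keep the data map in a script rather than build it by hand, here is a minimal sketch. The column names, the `DataMapEntry` type, and the sample rows are illustrative assumptions, not a Thrive template; the output is plain CSV text that opens in Excel.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataMapEntry:
    """One row of the data map. All field names are hypothetical."""
    department: str      # functional unit that uses the data
    application: str     # IT system the unit works in
    data_owner: str      # role accountable for the data
    data_format: str     # how the data is stored
    access_method: str   # how users reach it
    hosting: str         # "on-premises", "cloud", or "SaaS"


# Placeholder rows for illustration only; replace with your interview findings.
DATA_MAP = [
    DataMapEntry("Finance", "ERP", "Controller", "SQL database", "Desktop client", "on-premises"),
    DataMapEntry("Sales", "CRM", "VP Sales", "Vendor-managed", "Web browser", "SaaS"),
    DataMapEntry("All", "Email", "IT", "Mailboxes", "Mail client / web", "cloud"),
]


def to_csv(entries):
    """Render the data map as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataMapEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def applications_by_hosting(entries, hosting):
    """List the applications hosted in a given way, e.g. every SaaS system."""
    return sorted(e.application for e in entries if e.hosting == hosting)


print(to_csv(DATA_MAP))
```

Keeping the hosting model in its own column makes it easy to see later which systems fall outside an on-premises backup platform.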
Once you have your ‘data map’ it’s time to meet with the functional department leaders and pressure test your opinions on how valuable this data is. We need to ask them two critical questions:
- How long can you live without access to your department’s applications or shared services like email?
- This is the “downtime” conversation. Be sure to review every application you’ve uncovered during your mapping exercise.
- How much data loss is tolerable, at what point does it damage our reputation or revenue, and what’s the magnitude of the loss?
- This is the “acceptable data loss” conversation – again review all applications individually, as losing a day of email isn’t always the same as losing a day of customer orders.
Recovery Time Objective, Recovery Point Objective, and the Business
In business continuity planning there are two key terms to know, and you likely already have a good grasp of the concepts from your previous work with leadership. Here at Thrive, we constantly discuss these two terms with our customers:
- Recovery Time Objective (RTO) – The maximum acceptable time for systems to fully recover, rebuild, and reload after any outage or event
- Recovery Point Objective (RPO) – The amount of data loss that is acceptable to the business, typically expressed as the time between backup jobs or replication snapshots
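To make the two terms concrete, the sketch below measures a single incident against both objectives. The timestamps, the targets, and the `measure_incident` helper are hypothetical values chosen for illustration: a nightly backup at 01:00, a failure at 15:30, and service restored at 19:30.

```python
from datetime import datetime, timedelta


def measure_incident(last_backup, failure, restored):
    """Return (data_loss_window, downtime) for one incident."""
    data_loss_window = failure - last_backup   # compared against the recovery point target
    downtime = restored - failure              # compared against the recovery time target
    return data_loss_window, downtime


# Hypothetical incident, for illustration only.
last_backup = datetime(2024, 1, 10, 1, 0)
failure = datetime(2024, 1, 10, 15, 30)
restored = datetime(2024, 1, 10, 19, 30)

# Hypothetical targets agreed with the business.
recovery_point_target = timedelta(hours=24)
recovery_time_target = timedelta(hours=2)

loss, down = measure_incident(last_backup, failure, restored)
rpo_met = loss <= recovery_point_target   # 14.5 hours of data lost is within 24 hours
rto_met = down <= recovery_time_target    # 4 hours of downtime exceeds 2 hours

print(f"Data loss window: {loss}, target met: {rpo_met}")
print(f"Downtime: {down}, target met: {rto_met}")
```

In this example the nightly backup satisfies the data loss target, but the restore takes twice as long as the business can tolerate, so the two objectives need to be assessed separately.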
As we discussed in the previous section, it’s critical to start with a frank conversation on these topics with every functional team leader or business stakeholder. Stakeholders can sometimes include suppliers, vendors, or even customers that have integrations with your systems for just-in-time shipping or other logistics. This allows us to form a good understanding of the systems involved and the tolerance for risk.
Creating the RPO/RTO Analysis
Based on your conversations, sort the applications and datasets into groups according to how critical they are to the business. These groups are roughly categorized by ‘Tier’, where Tier 1 is most critical and Tier 3 is non-critical. This is a subjective exercise, but it helps to organize the information you’ve gathered. From there, we advise our clients to build a table that lists the Recovery Time Objective and Recovery Point Objective gathered from the interviews for each application (see Figure 1).
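As a rough sketch of what such a table might look like when generated from interview notes, the snippet below sorts applications by tier and prints their objectives. The applications, tier assignments, and hour values are made-up placeholders and are not the contents of Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    application: str
    tier: int          # 1 = most critical, 3 = non-critical
    rto_hours: float   # Recovery Time Objective
    rpo_hours: float   # Recovery Point Objective


# Hypothetical interview results, for illustration only.
OBJECTIVES = [
    Objective("File archive", 3, 72, 24),
    Objective("Order system", 1, 4, 1),
    Objective("Email", 2, 8, 24),
]


def build_table(objectives):
    """Return a plain-text table sorted by tier, then by RTO."""
    rows = sorted(objectives, key=lambda o: (o.tier, o.rto_hours))
    lines = [f"{'Tier':<6}{'Application':<16}{'RTO (h)':>8}{'RPO (h)':>9}"]
    for o in rows:
        lines.append(f"{o.tier:<6}{o.application:<16}{o.rto_hours:>8g}{o.rpo_hours:>9g}")
    return "\n".join(lines)


print(build_table(OBJECTIVES))
```

Sorting by tier first keeps the most critical systems at the top of the table, which is usually where the conversation with management starts.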
Planning the Disaster Recovery Solution
If you’re like most businesses, you’ll have discovered a handful of on-premises systems, cloud-based systems, and SaaS applications, all with different backup platforms. From here, the Thrive technical team typically helps our clients review the current backup systems to see what they’re capable of. Can they meet the requirements from the business as described above? During our many engagements with customers, Thrive consultants frequently diagram the ‘Optimal Solution Curve’, helping customers understand that as we improve our time to recovery, the solution cost increases exponentially (see Figure 2).
Figure 2: DR Optimal Solution Curve – This curve illustrates that lower cost solutions often require much longer recovery timelines, versus increasingly expensive solutions that might yield only minimal improvement in RTO.
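The question of whether current backup systems can meet the business requirements can be framed as a simple gap analysis. The sketch below assumes you have recorded, for each application, the required RTO and RPO in hours and what the current platform can actually deliver; the `find_gaps` helper and all figures are hypothetical.

```python
def find_gaps(requirements, capabilities):
    """Compare required (rto_hours, rpo_hours) per application against
    what the current backup platform delivers. Returns (app, issue) pairs."""
    gaps = []
    for app, (need_rto, need_rpo) in requirements.items():
        if app not in capabilities:
            gaps.append((app, "no backup coverage documented"))
            continue
        have_rto, have_rpo = capabilities[app]
        if have_rto > need_rto:
            gaps.append((app, f"RTO {have_rto}h exceeds target {need_rto}h"))
        if have_rpo > need_rpo:
            gaps.append((app, f"RPO {have_rpo}h exceeds target {need_rpo}h"))
    return gaps


# Hypothetical figures, for illustration only: (rto_hours, rpo_hours).
REQUIRED = {"Order system": (4, 1), "Email": (8, 24), "HRIS": (24, 24)}
CURRENT = {"Order system": (12, 24), "Email": (6, 12)}

for app, issue in find_gaps(REQUIRED, CURRENT):
    print(f"{app}: {issue}")
```

An application with no entry at all is reported as a gap in its own right, which is often how forgotten SaaS platforms surface during a review.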
The fastest recovery times often involve data and systems replication on an alternate site with live or ‘hot’ servers running and ready to pick up the load if there is an incident. Keep in mind that what may seem expensive is relative to the potential downtime cost or operational risk to the business. It is important that the solution is architected to encompass all datasets and systems and meets the business requirements outlined during your discussions.
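One way to put solution cost and downtime cost on the same scale is to add each option’s annual cost to the downtime loss you would expect with that option’s recovery time. The sketch below is a deliberately simplified model: the option names, prices, hourly downtime cost, and incident frequency are all assumptions for illustration, and it treats every incident as lasting exactly the option’s RTO.

```python
def annual_exposure(option, hourly_downtime_cost, incidents_per_year):
    """Annual solution cost plus expected annual downtime loss."""
    expected_loss = option["rto_hours"] * hourly_downtime_cost * incidents_per_year
    return option["annual_cost"] + expected_loss


# Hypothetical options and prices, for illustration only.
OPTIONS = [
    {"name": "Offsite backup restore", "annual_cost": 10_000, "rto_hours": 48},
    {"name": "Warm standby", "annual_cost": 40_000, "rto_hours": 8},
    {"name": "Hot site replication", "annual_cost": 120_000, "rto_hours": 1},
]

HOURLY_DOWNTIME_COST = 5_000   # assumed cost to the business per hour of outage
INCIDENTS_PER_YEAR = 0.5       # assumed: one major incident every two years

for option in OPTIONS:
    total = annual_exposure(option, HOURLY_DOWNTIME_COST, INCIDENTS_PER_YEAR)
    print(f"{option['name']}: {total:,.0f} per year")

best = min(OPTIONS, key=lambda o: annual_exposure(o, HOURLY_DOWNTIME_COST, INCIDENTS_PER_YEAR))
print(f"Lowest total exposure: {best['name']}")
```

With these made-up numbers the middle option comes out ahead, but a higher hourly downtime cost would shift the result toward the hot site, which is why the downtime conversation with each department matters.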
Are there compliance or regulatory requirements that management care deeply about?
In the first section, you hopefully met with the executive team and the functional group managers. This should help you understand the regulatory or compliance landscape you’re operating in. If your firm has a fiduciary responsibility to customers or operates in a regulated market you may have a requirement to test your disaster recovery strategy annually (or even more frequently). You may even be compelled by customer demands to share with them the DR strategy. Keep this in mind while creating your architecture plans and proposals to management.
Building a Compelling Business Case
Lead with the need
To grab the attention of your audience from the outset, immediately identify the business need you are trying to address. Begin by asking yourself, “What is the message that I’m trying to get across?” The message is even more compelling when there is a compliance-driven need, so be sure to call out the findings from your interviews.
What metrics do management care about?
The management team of your company likely has a handful of metrics they use to manage their teams and business units. What metrics do they care about most? What metrics will you use to measure success of the DR implementation? Do they align with management’s KPIs? If so, socialize the KPIs you will measure or how you plan to mitigate risk in a measurable way. Quantifiable metrics help add weight to the story you’re trying to tell.
Be clear, concise, and consistent
No matter how much time you’re allotted to present, you won’t know until you walk into the room whether you’ll actually have 5 minutes or 50. It’s critical to have a short elevator pitch ready in the event your time is short. Take some time to review which one or two slides can be pulled out and still have the same effect. By the same token, you may be asked to do a deeper dive into one facet of your case in the middle of the presentation. That’s when having some appendix slides can be helpful, so that you can expand on certain elements of your case. You don’t need to have every data point memorized.
Make sure the message is consistent across all functional groups. Ensure all teams know how their systems are supported or impacted. Many executives won’t know, for example, that their HRIS platform is SaaS-based and outside the scope of an on-premises DR plan. Keep the message consistent that the plan has incorporated all functional teams and that you’ve met with their representatives.
Lastly, try to minimize jargon to the greatest extent possible. Executives don’t need the full technical details, but they do need to know what is going on. Minimize use of acronyms, and if you must use acronyms, define them so that the team is operating/deciding with a shared understanding.
Sometimes you can put together a compelling business case, with important KPIs and measurables the executive team should care about, but they hear “IT project” and don’t want to spend the money. This is often where Thrive comes in to validate your disaster recovery plan and help present it to management. This gives the plan a second set of eyes and adds external validation from trusted experts that what you’re saying is correct.
If you’re looking for a valuable partner to begin developing your disaster recovery plan with, contact Thrive or call us at 866-205-2810 today.