Big data is a relatively new discipline. In its early days, only a handful of large-scale IT enterprises had the infrastructure and resources to support big data initiatives. But the scenario is changing rapidly. With big data finally becoming mainstream, quite a lot of IT companies are now exploring this avenue to provide their clients with insightful business intelligence.
The bigger concern, however, is whether the companies that deal with big data have a proper disaster recovery plan in place to recover from a disastrous outage.
Let's face it. The majority of small to mid-sized businesses can't afford to build their own DC-DR infrastructure. Building DR capability has always been an expensive proposition. The funds required to maintain a secondary data center often rule that option out immediately. Opting for a cloud-based disaster recovery environment seems to be the most viable alternative for those companies.
While cloud-based disaster recovery is gaining widespread acceptance among big data initiatives, it has quite a few detractors as well. You need to carefully evaluate the pros and cons of disaster recovery in the cloud before making the final decision.
In the name of optimism, let's start with the pros. Listed below are the major benefits of using the cloud for big data disaster recovery:
Reduction in Cost
The biggest advantage of the cloud is cost. You don't need to make any initial investment in DR infrastructure. The entire setup, including the servers, storage devices, and network equipment, is owned and maintained by the cloud service provider. You simply pay a recurring fee to use their infrastructure.
In theory, a cluster of cloud servers can scale almost without limit, which eliminates the physical capacity issues that plague traditional systems. As your data grows, additional storage capacity is allocated automatically. You are also billed strictly on a consumption basis, so you only pay for the resources that you are actually using. This pay-as-you-go model can significantly reduce your procurement and provisioning headaches.
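To make the pay-as-you-go arithmetic concrete, here is a minimal sketch in Python. The usage figures and the price of 2 cents per GB per month are made-up numbers for illustration, not any provider's real rates, and the "fixed capacity" comparison assumes you would otherwise have to provision for your peak volume from day one.

```python
def pay_as_you_go_cost_cents(usage_gb_by_month, price_cents_per_gb):
    """Total bill when each month is charged only for the storage actually used."""
    return sum(gb * price_cents_per_gb for gb in usage_gb_by_month)


def fixed_capacity_cost_cents(usage_gb_by_month, price_cents_per_gb):
    """Total bill when peak capacity is paid for in every month (assumed baseline)."""
    peak_gb = max(usage_gb_by_month)
    return peak_gb * price_cents_per_gb * len(usage_gb_by_month)


# Hypothetical growth: data volume rises from 1 TB to 4 TB over four months.
usage = [1000, 2000, 3000, 4000]
price = 2  # hypothetical price in cents per GB per month

print(pay_as_you_go_cost_cents(usage, price))   # 20000 cents, i.e. $200
print(fixed_capacity_cost_cents(usage, price))  # 32000 cents, i.e. $320
```

With these assumed numbers, paying for consumption costs $200 over the four months, while paying for peak capacity throughout costs $320. The gap shrinks as your usage approaches its peak, so the savings depend on how quickly your data grows.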
Cloud-based disaster recovery solutions offer the ability to replicate data automatically. With synchronous replication, real-time copies of your data can be created and stored on the cloud server. This ensures that you'll be able to recover the most up-to-date information in case of an emergency. You can even implement automatic failover, switching over to the cloud-based system when your primary data center goes down due to planned maintenance or an unscheduled outage.
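The failover idea can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular DR product works: the class name, the site labels and the threshold of three consecutive failed health checks are all assumptions, and a real setup would also have to handle the actual health probing and traffic redirection.

```python
class FailoverMonitor:
    """Tracks health checks and decides which site should serve traffic."""

    def __init__(self, failure_threshold=3):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active_site = "primary"

    def record_health_check(self, primary_healthy):
        """Record one health check result and return the currently active site."""
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Switch to the cloud replica. Failing back to the primary
                # is left as a manual step in this sketch.
                self.active_site = "cloud-dr"
        return self.active_site


monitor = FailoverMonitor(failure_threshold=3)
for healthy in [True, False, False, False]:
    site = monitor.record_health_check(healthy)
print(site)  # cloud-dr
```

Requiring several consecutive failures before switching is a common way to avoid failing over on a single dropped health check.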
In a traditional offsite replication setup, you may need to reload and reconfigure your applications before restoring your data, which can unnecessarily delay the restoration process. Cloud-based DR can replicate your entire environment, including your data, applications and their configurations. In the event of a disaster, this allows you to keep downtime to an absolute minimum.
While all those pros may make cloud-based big data disaster recovery plans seem like a no-brainer, every benefit has its downside. Check out the cons below to get a better idea of what you're in for:
Data integrity is a fundamental component of information security compliance. Retaining the accuracy and consistency of data is of the highest importance, no matter whether you are looking for a simple hard disk recovery or a cloud-based disaster recovery solution. Cloud environments are prone to security breaches related to broken authentication tokens, cross-site scripting attacks, sensitive data exposure vulnerabilities and brute-force attacks, which may result in data manipulation or data loss. Make sure that the access gateways and authorization functions are thoroughly protected to prevent information security and data privacy violations. Moreover, additional security measures must be implemented to ensure that the data is kept encrypted both at rest (stored on server disks) and in transit.
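One simple way to catch data manipulation or corruption is to record a checksum before the data leaves your data center and compare it after a restore. The sketch below uses SHA-256 from Python's standard library; the function names and the sample payload are hypothetical, and a real pipeline would hash large files in chunks and store the digests separately from the backup itself.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(restored: bytes, expected_digest: str) -> bool:
    """Check that restored data matches the digest recorded before upload."""
    return hmac.compare_digest(sha256_digest(restored), expected_digest)


# Hypothetical payload standing in for a backup file.
original = b"customer_id,total\n1001,250\n1002,975\n"
recorded = sha256_digest(original)  # saved at backup time

print(is_intact(original, recorded))                       # True
print(is_intact(original.replace(b"250", b"999"), recorded))  # False
```

A checksum only tells you that something changed, not who changed it or why, so it complements encryption and access control rather than replacing them.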
When you store data in the cloud, you must trust a third party to keep it safe. The involvement of a third-party service provider may not go down well with your clients. Companies that deal with stringent compliance regulations may not agree to place their sensitive data in the cloud. It's a genuine concern, as the third party could potentially access your clients' sensitive business data with malicious intent.
Dependency on Network Latency and Bandwidth
When the internet is used for data transport, bandwidth and latency constraints must be taken into consideration. The overall performance of a cloud-based DR setup depends heavily on the provider’s network connectivity. As big data projects deal with massive data volumes, a slow network connection can adversely impact the restoration process. Bandwidth limitations may result in missed recovery point objectives (RPO) and recovery time objectives (RTO).
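A quick back-of-the-envelope calculation shows how much bandwidth matters. The Python sketch below estimates how long a restore would take and whether it fits a given recovery time objective. The 50 TB volume, the 1 Gbps link and the 80% link efficiency are illustrative assumptions, and the estimate ignores latency, compression and any provider-side throttling.

```python
def estimated_restore_hours(data_tb, bandwidth_mbps, efficiency=0.8):
    """Hours needed to pull data_tb terabytes over a bandwidth_mbps link."""
    if data_tb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, bandwidth or efficiency")
    # 1 TB = 10^12 bytes = 8 * 10^6 megabits (decimal units).
    data_megabits = data_tb * 8_000_000
    seconds = data_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def meets_rto(data_tb, bandwidth_mbps, rto_hours, efficiency=0.8):
    """True if the estimated restore time fits within the RTO."""
    return estimated_restore_hours(data_tb, bandwidth_mbps, efficiency) <= rto_hours


# Hypothetical scenario: 50 TB restored over a 1 Gbps (1000 Mbps) link.
hours = estimated_restore_hours(50, 1000)
print(round(hours, 1))           # 138.9
print(meets_rto(50, 1000, 24))   # False
```

Under these assumptions, restoring 50 TB takes close to six days, far beyond a 24-hour recovery time objective. Numbers like these are worth running before you commit to a cloud-only DR plan.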