Before You Pull the Plug: Understand What it Takes to Get the Most From Your Cloud Provider
Article originally published on MSDynamicsWorld.com.
As organizations shift their applications and infrastructure to the cloud, they may encounter challenges along the way. Any change increases an organization's risk of issues such as outages or a slower user experience. When those issues appear, you may wonder whether they stem from the cloud or from the application, and whether this is the best performance you can expect.
In this article, I'll discuss how to investigate the source of many issues you may encounter when launching your cloud-based solutions, how to establish an ongoing dialogue about service-level agreements (SLAs), and how to ensure your organization is getting the most from its cloud provider. You can overcome concerns about your cloud deployment by knowing which questions to ask and when, and by not waiting until there's a problem to seek clarity on the terms of an SLA.
Let's take "Company X" as an example. This company recently moved its Dynamics ERP to the cloud. If it begins experiencing slow speeds, drops in system accessibility, or user issues such as missing reports or printers that won't print, the first reaction may be regret. The company may second-guess the move to the cloud, or its choice of cloud or managed hosting provider. Before the company draws any major conclusions, a few specific insights can help it track down the problem and identify next steps.
Some of the most common issues affecting application performance include:
- Latency (the delay from input into a system to the desired outcome)
- Noisy neighbor (a co-tenant whose heavy use of bandwidth or system resources can degrade other users' cloud performance)
- Cumbersome experience that is not as seamless as anticipated
- Not being able to locate a file, or to find or connect to a printer
- Undersized (and therefore underperforming) deployment
- Lost reports
A logical first step in the investigation is to examine whether the issues lie with the cloud platform or with the application. If the application is at fault, changing to a new cloud provider or reverting to an on-premises deployment will only carry the application issues along and may exacerbate them.
The next step is to troubleshoot whether a system connection is at fault. The wide area network (WAN) may be the cause. If the WAN has been ruled out and questions remain about the Remote Desktop Services (RDS) server as the cause of slow speeds, try running the client from another server, or directly on the SQL server. If the slowness continues, make sure a database administrator is available to help run SQL traces. These traces can determine whether there is a disk constraint, such as insufficient I/O to execute the query.
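The elimination sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three vantage point names, the timing samples, and the threshold are hypothetical, and the timings are assumed to come from running the same operation (for example, opening the same report) from each location.

```python
from statistics import median

# Hypothetical vantage points, ordered from farthest from the database to closest.
VANTAGE_ORDER = ["over_wan", "on_rds_server", "on_sql_server"]


def localize_slowness(samples, threshold_s):
    """Suggest where to look next, given timings of the same operation.

    samples maps each vantage point name to a list of durations in seconds.
    threshold_s is the duration (in seconds) that users consider acceptable.
    """
    medians = {name: median(samples[name]) for name in VANTAGE_ORDER}

    if medians["over_wan"] <= threshold_s:
        return "no slowness observed"
    if medians["on_rds_server"] <= threshold_s:
        # Fast inside the data center, slow for remote users.
        return "suspect the WAN"
    if medians["on_sql_server"] <= threshold_s:
        # Fast directly on the SQL server, slow through the RDS server.
        return "suspect the RDS server"
    # Slow even next to the database: time for SQL traces.
    return "suspect the database tier; ask a DBA to run SQL traces"


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    timings = {
        "over_wan": [9.8, 10.4, 11.1],
        "on_rds_server": [9.5, 10.0, 10.2],
        "on_sql_server": [1.9, 2.1, 2.0],
    }
    print(localize_slowness(timings, threshold_s=3.0))
```

Using the median rather than a single measurement keeps one unusually slow or fast run from pointing the investigation in the wrong direction.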
Once application issues and system connection issues have been ruled out, it is time to discuss performance with your cloud provider. Your organization more than likely invested considerable time and energy in finding and selecting a cloud services provider. That relationship should allow for clear dialogue between the parties to find a solution that minimizes disruption and, if possible, avoids migration. You also need to know the right questions to ask in order to collaborate toward improving your system performance. These questions include:
- Are we operating in a single-tenant environment? Remember that some cloud providers use a multi-tenant deployment model. This is a good option in theory, as it may help to keep costs low, but there may well be shared components that are affected if another client is that "noisy neighbor," running a long report or another large workload.
- How do you handle CPU ready / wait time? In asking this question, bear in mind a specific hypervisor behavior. When a virtual machine is assigned four processors, the hypervisor may have to wait for four cores to become available before it can process the request. This means that in highly dense virtual environments, fewer processors can actually result in better performance, a fact that seems counterintuitive.
- What are the requirements for availability groups (in terms of either zones or sets) to ensure the highest percentage of uptime? Sometimes an organization believes it has an availability SLA in place and deploys with a cloud provider, only to find out after the fact that a standard deployment doesn't include any uptime guarantee. Or the cloud provider requires workloads to be deployed in availability groups in order to qualify for an uptime guarantee or, more importantly, simply to avoid an outage.
- Does your provider offer a Service Health Dashboard to determine when specific services are unavailable? This can be invaluable when trying to identify the specific source and nature of an issue as an outage scenario begins to unfold. With a Service Health Dashboard, it may be possible to assess whether all services have been interrupted, whether the issue is with the local network, and what degree of functionality is still accessible to clients.
- When will planned maintenance occur? The answer to this question can affect some organizations profoundly. For example, organizations with offshore operations may be affected more by cloud providers that tend to do maintenance at 1:00 or 2:00 a.m. While this window may be convenient for some companies, others have international customers for whom those hours fall during critical business times.
- Can you spell out the details of your disaster recovery plan (DRP)? Asking this question will help your organization consider the implications of its recovery point objectives (RPOs) and recovery time objectives (RTOs). It is often assumed that disaster recovery is part of the standard offering of every cloud services provider. Rather than make this assumption, only to find a different reality when your organization needs disaster support, communicate clearly now so that you understand what you need to do to prepare for any such event.
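When discussing uptime guarantees, it can help to translate a percentage into the downtime it actually permits. The following is a minimal sketch, not any provider's official formula. The function names are hypothetical, a 30-day period is assumed, and the redundancy calculation assumes that instances fail independently of one another, which real deployments only approximate.

```python
def downtime_budget_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted per period at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return (1 - sla_percent / 100) * total_minutes


def combined_availability(single: float, instances: int) -> float:
    """Availability of redundant instances when one surviving instance is enough.

    single is the availability of one instance as a fraction (0.99 for 99%).
    Assumes independent failures, which is an idealization.
    """
    return 1 - (1 - single) ** instances


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95):
        minutes = downtime_budget_minutes(sla)
        print(f"{sla}% uptime permits about {minutes:.1f} minutes of downtime per 30 days")

    # Why deploying across an availability set or zone can matter.
    print(f"Two instances at 99% each: {combined_availability(0.99, 2):.4%}")
```

Running the numbers before the conversation makes it easier to compare what the provider guarantees for a standard deployment with what it guarantees for a redundant one.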
If it becomes clear that a migration from your current cloud deployment is necessary, there are steps you can take to ensure successful migration.
- First, choose how the migration will be completed. Options may include:
  - Data backup and restore - the existing data is taken into a new environment.
  - Replication of VMs - the whole environment is replicated, through close collaboration with the existing provider.
  - Fresh start - begin with all new data and leave the historical information at the old location.
- Recognize that collaboration is key, and that you need your existing partner to continue to provide service. You may need to ask your existing partner to provide services to help you leave their environment and complete the migration. In effect, you may be asking them to help you take your business elsewhere.
- Throughout the migration, if you have already gone live in your deployment and cannot interrupt your production users, you will need to coordinate the timing of when the change will take place.
Building a mutually rewarding relationship with your cloud provider should be an ongoing process, from evaluation and selection through maintaining that relationship once it has been established. Throughout that evolving relationship, it's important to be clear about how the provider operates, what the details of the service-level agreement are, and how those terms affect the cloud-based needs of your organization. Given the intricacies and interdependence that define this relationship, it is critically important to communicate clearly, understand which questions to ask and when, and know how to help your cloud services provider protect the integrity of your organization's systems over time.
For more on optimizing your cloud deployment, check out our free white paper: "Top 10 Cloud Myths Debunked," a Guide to Navigating to the Cloud - Maximize Operational Efficiencies and Minimize by Avoiding Common Cloud Myths.