June 18, 2019 By Rahul Kulkarni 4 min read

Today’s business requires 24×7 availability for its critical applications. For most organizations, unexpected application downtime translates directly into loss of revenue, loss of business or loss of reputation. Companies want reliable hardware for their IT infrastructure, but regardless of how reliable your hardware and software are, you cannot eliminate server or application downtime completely. So, you need a high availability (HA) solution in place to minimize downtime.

The goal of high availability is to eliminate single points of failure (SPOF) in your IT environment. Eliminating SPOFs requires a careful study of each hardware and software component involved in your IT setup and necessitates providing redundancy at every layer. Organizations whose infrastructure is built on IBM Power Systems servers can use redundant network adapters, redundant host bus adapters (HBAs), redundant virtual input/output (VIO) servers, and so forth, to eliminate SPOFs at the hardware level within a physical server (frame). But there is always a chance that the entire frame may go down. For such scenarios, IBM Power Systems clients can use local HA solutions like IBM PowerHA, Oracle RAC (for Oracle database servers), SUSE-HA (for SUSE Linux) and the like. To handle entire data center failure, clients use their disaster recovery solutions.

Redundant servers for HA mean increased hardware, software and maintenance costs. The goal of this blog post is to discuss how we can minimize the cost of local HA in a PowerHA scenario without compromising on HA capabilities.

Most common implementation scenarios of IBM PowerHA

Figure 1: PowerHA active-passive configuration

In a PowerHA active-passive configuration, shown in figure 1, the application is running on LPAR-1 on frame-1 and is moved to LPAR-2 on frame-2 when LPAR-1 or frame-1 fails. So, the application is highly available.

Figure 2: PowerHA mutual takeover configuration

In a PowerHA mutual takeover configuration, shown in figure 2, there are two applications, application A and B. A is active on LPAR-1 on frame-1 and B is active on LPAR-2 on frame-2. If LPAR-1 or frame-1 fails, application A will move to LPAR-2 on frame-2, and if LPAR-2 or frame-2 fails, application B will move to LPAR-1 on frame-1.

In both scenarios, you need hardware resources like CPU and memory for both the active and the standby nodes. For example, if LPAR-1 has 20 CPU cores and 400 GB of memory, LPAR-2 needs the same amount of CPU and memory if you want full performance after failover in the active-passive setup. In a mutual takeover scenario, if application A requires 20 cores/400 GB and application B also requires 20 cores/400 GB, then both LPAR-1 and LPAR-2 need 40 cores/800 GB so that if one LPAR fails, the other LPAR can handle the load of both applications.
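The sizing rule above can be written out as a small calculation. This is only an illustrative sketch: the `Demand` type and the two helper functions are hypothetical names made up for this example, and the figures are the ones used in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demand:
    """CPU and memory an application needs at full performance (illustrative type)."""
    cores: int
    memory_gb: int

    def __add__(self, other):
        return Demand(self.cores + other.cores, self.memory_gb + other.memory_gb)


def active_passive_sizing(app):
    # The standby LPAR must match the active LPAR to run the application at full performance.
    return {"LPAR-1": app, "LPAR-2": app}


def mutual_takeover_sizing(app_a, app_b):
    # Either LPAR may end up hosting both applications, so each is sized for the sum.
    combined = app_a + app_b
    return {"LPAR-1": combined, "LPAR-2": combined}


app_a = Demand(cores=20, memory_gb=400)
app_b = Demand(cores=20, memory_gb=400)

print(active_passive_sizing(app_a))
print(mutual_takeover_sizing(app_a, app_b))
```

In both cases, half of the provisioned capacity sits idle while both frames are healthy.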

The scenarios shown in figure 1 and figure 2 are for a single cluster, but the same applies to multiple clusters on the same set of frames. For example, frame-1 could have 10 LPARs and frame-2 could have 10 LPARs, and each LPAR may be acting as either active or standby. In summary, only 50 percent of resources are used, and 50 percent of resources are kept idle for HA standby. Depending on the applications used, you may also need double the application licenses for this kind of HA setup, increasing cost further.

In large Power Systems environments, some clients have hundreds of Power servers and thousands of LPARs. If this is your organization’s environment, how can you optimize your HA design to reduce hardware and application licensing costs? The following PowerHA design can help in such a case.

Many-to-one PowerHA design to save on hardware and software costs

Figure 3: Many-to-one PowerHA configuration

In the design shown in figure 3, a large number of LPARs on hundreds of frames act as cluster active nodes, and one or two frames are dedicated to all cluster standby LPARs. Optionally, you may want to use the idle CPU/memory of the standby frames to run some non-critical workloads while no failover is in progress.

If any cluster active node fails, its standby node on a standby frame is activated, and production can continue. Because a large number of standby nodes are configured on a standby frame, each standby LPAR starts with only a small CPU/memory allocation, and PowerHA uses the Dynamic LPAR (DLPAR) feature of the Power server to dynamically increase the CPU/memory to the desired value during failover. Thus, you can replace 50 percent standby capacity for HA with around 10 percent, without losing any HA capabilities. If multiple frames fail at the same time, the standby frame won’t have adequate resources to start all LPARs, and the event has to be treated as a data center failure. In this case, you need to switch to DR, just like with a regular PowerHA solution in the active-passive or mutual-takeover configurations.
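The capacity reasoning behind the standby frame can be sketched as a simple check. This is not PowerHA code and does not call any PowerHA or HMC interface; `StandbyLpar`, `can_absorb` and all of the numbers below are hypothetical and only illustrate the idea that standby LPARs idle at a small initial allocation and grow to their desired size only when their active partner fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyLpar:
    """A standby LPAR with a small idle footprint and a full failover size (illustrative)."""
    name: str
    initial_cores: int
    desired_cores: int
    initial_mem_gb: int
    desired_mem_gb: int


def can_absorb(standbys, failed, frame_cores, frame_mem_gb):
    """Return True if the standby frame can hold every standby LPAR, with the
    LPARs named in `failed` grown to their desired size and the rest left at
    their initial size."""
    cores = sum(s.desired_cores if s.name in failed else s.initial_cores for s in standbys)
    mem = sum(s.desired_mem_gb if s.name in failed else s.initial_mem_gb for s in standbys)
    return cores <= frame_cores and mem <= frame_mem_gb


# Assumed example: 10 standby LPARs idling at 1 core/16 GB, each needing
# 20 cores/400 GB after failover, on a standby frame with 64 cores/1,536 GB.
standbys = [StandbyLpar(f"standby-{i}", 1, 20, 16, 400) for i in range(10)]

print(can_absorb(standbys, {"standby-0"}, 64, 1536))
print(can_absorb(standbys, {f"standby-{i}" for i in range(5)}, 64, 1536))
```

A check like this makes the trade-off explicit: the standby frame is sized for a small number of simultaneous failovers, and anything beyond that has to be handled by DR.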

So, this design can help you bring down the standby capacity needed for local HA from 50 percent of total hardware requirements to around 10 percent, saving around 40 percent of infrastructure and software licensing costs.
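The arithmetic behind these percentages can be checked with a short calculation. It rests on one assumption that the text does not spell out: the standby share is measured against the total of the conventional one-to-one design (active capacity plus an equal amount of standby capacity). The function names are illustrative only.

```python
def total_capacity(active, standby_share_of_baseline):
    """Total capacity when standby is a given share of the one-to-one baseline total."""
    baseline = 2 * active  # one-to-one design: equal active and standby capacity
    return active + standby_share_of_baseline * baseline


def saving_vs_one_to_one(standby_share_of_baseline):
    """Fraction of the one-to-one total that is no longer needed."""
    active = 1.0  # normalized; the result does not depend on this value
    baseline = 2 * active
    return 1 - total_capacity(active, standby_share_of_baseline) / baseline


print(f"{saving_vs_one_to_one(0.50):.0%}")  # conventional one-to-one standby
print(f"{saving_vs_one_to_one(0.10):.0%}")  # many-to-one standby
```

If the 10 percent is instead read as a share of the new, smaller total, the saving works out slightly higher, so "around 40 percent" is a reasonable round figure under either reading.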

Need help on planning and designing your Power server environment?

If you’re due for a server refresh, talk to IBM Systems Lab Services. We can provide consolidation design and HA planning to help you to optimize the required hardware and software while keeping availability and performance at the best possible level. If you want to engage IBM Systems Lab Services, you can contact your IBM client rep or reach out to IBM Systems Lab Services directly.
