August 27, 2008

Breaking Through the Confusion about Disaster Recovery and High Availability

Virtually every company we talk to needs both disaster recovery (DR), to recover systems and data after a major disruption, and high availability (HA), to keep key applications running at all times. In my discussions with companies considering our everRun software, I’ve heard many of them say they are confused by vendors’ claims and counter-claims about DR and HA. One of the biggest sources of confusion is that some vendors with solid DR products are trying to pass those products off as reliable HA solutions. If the feedback I’m getting is any indication, these DR solutions posing as HA solutions just don’t work.

It’s not hard to see why a DR solution doesn’t make a good HA solution. With a product that is good at DR, getting the data across to the other location is usually straightforward. But when you try to use the same solution to move both the application and the data across for HA, that’s where it breaks down. Let’s look at why.

A good DR product is usually fairly easy to set up for data replication to another site. But setting up the same product to restart everything, application and data, when a failover occurs is complex and error-prone. You have to script every piece of the process: fault detection, client redirection to the DR site, application reset, and the list goes on. No wonder we so often hear that scripted DR-for-HA doesn’t work consistently; there are too many moving parts that have to be managed and monitored. In addition, no matter how minor a failure is, it requires a failover to the remote site. Not every failure you face is a disaster, so not every failure should be treated as one. Based on these horror stories, we thought it was a good idea to put together this webinar, Breaking Through the Confusion about DR and HA. I hope to help you better understand when, how, and why DR is the best fit for your requirements, when to use an HA solution, and how to combine the two for optimal protection.
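To make the “not every failure is a disaster” point concrete, here is a minimal Python sketch of the decision a scripted DR-for-HA setup effectively hard-codes. The failure categories, step names and the `recovery_plan` function are hypothetical illustrations, not taken from everRun or any DR product: with a DR-only setup every failure, even a single disk, triggers the full remote-site sequence, while an HA layer can handle component failures locally and reserve site failover for real disasters.

```python
from enum import Enum, auto


class Failure(Enum):
    """Hypothetical failure categories, from minor to site-wide."""
    DISK = auto()
    NIC = auto()
    HOST = auto()
    SITE = auto()


# Hypothetical steps a scripted DR-for-HA setup runs for every failover
# to the remote site; each one is a script that must be maintained.
REMOTE_FAILOVER_STEPS = [
    "detect fault",
    "promote replicated data at DR site",
    "reset application at DR site",
    "redirect clients to DR site",
]

# Hypothetical local recovery actions for failures that are not disasters.
LOCAL_RECOVERY = {
    Failure.DISK: ["redirect I/O to surviving disk"],
    Failure.NIC: ["redirect traffic to surviving network link"],
    Failure.HOST: ["move VM to partner host"],
}


def recovery_plan(failure: Failure, dr_only: bool) -> list[str]:
    """Return the recovery steps for a failure.

    With dr_only=True every failure, however minor, triggers the full
    remote-site failover; otherwise only a site-level failure does.
    """
    if dr_only or failure is Failure.SITE:
        return list(REMOTE_FAILOVER_STEPS)
    return list(LOCAL_RECOVERY[failure])


print(recovery_plan(Failure.DISK, dr_only=True))
print(recovery_plan(Failure.DISK, dr_only=False))
```

The asymmetry is the point: in the DR-only path a disk failure costs four scripted steps and a site switch, while the local path needs a single action.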

Interested? You can register here.

June 30, 2008

Virtualization and Availability Webinar Q&A Continued

Following last week’s discussion, event attendees had additional questions that we didn’t get to answer, even though we went ten minutes over. We wanted to continue the discussion here on our blog, so we are posting the remaining questions and answers for everyone to see. As we mentioned before, if you would like to view the presentation delivered last week by John Humphreys (IDC), Simon Crosby (Citrix) and Jerry Melnick (Marathon), download the presentation here.

Are there any performance limitations with everRun VM?

everRun VM supports any guest environment created by XenServer, including multi-CPU VMs.

What is the effect of losing the inter-server link?

As a best practice, we recommend two Availability Links for redundancy. If one is lost, we continue to operate unaffected on the remaining link. If both are lost, we take action to prevent a complete loss of the VM or a split-brain condition.
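The link-redundancy rule above can be sketched as a small decision function. This is a hypothetical Python illustration, not everRun’s actual logic: it assumes the host can see which of its Availability Links are up, and the separate “degraded” state (still running, but with redundancy lost) is our own assumption. The specific protective action taken when both links fail is not described in the answer, so it is left abstract here.

```python
from enum import Enum


class LinkAction(Enum):
    CONTINUE = "continue normally"
    CONTINUE_DEGRADED = "continue on remaining link; redundancy lost"
    PROTECT = "take protective action against VM loss or split-brain"


def on_link_change(links_up: list[bool]) -> LinkAction:
    """Decide what to do given the state of the redundant Availability Links.

    Hypothetical sketch: links_up holds one flag per link (two per the
    best practice). Any surviving link keeps the VM running; losing all
    links triggers protection, because each host can no longer tell
    whether its partner is down or merely unreachable (split-brain risk).
    """
    if not links_up:
        raise ValueError("at least one Availability Link must be configured")
    up = sum(links_up)  # True counts as 1
    if up == len(links_up):
        return LinkAction.CONTINUE
    if up >= 1:
        return LinkAction.CONTINUE_DEGRADED
    return LinkAction.PROTECT


print(on_link_change([True, False]).value)
```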

How far apart can the two machines be – i.e. is there a propagation delay issue?

Host separation is limited by network latency, which must be under 10 ms round trip. Current deployments have exceeded 100 miles.
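A quick back-of-the-envelope calculation shows why the answer is phrased in terms of latency rather than miles. The Python sketch below assumes light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) over a straight fiber run, and ignores switching, routing and protocol overhead, all of which consume part of the budget in a real network.

```python
FIBER_KM_PER_MS = 200.0  # assumption: ~2/3 of c in optical fiber
KM_PER_MILE = 1.609344
RTT_BUDGET_MS = 10.0     # round-trip latency requirement quoted above


def propagation_rtt_ms(distance_miles: float) -> float:
    """Round-trip propagation delay over a straight fiber run,
    ignoring switching, routing and protocol overhead."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_MS


def max_distance_miles(rtt_budget_ms: float = RTT_BUDGET_MS) -> float:
    """Upper bound on host separation if propagation were the only delay."""
    one_way_km = rtt_budget_ms * FIBER_KM_PER_MS / 2
    return one_way_km / KM_PER_MILE


print(f"{propagation_rtt_ms(100):.2f} ms")  # 1.61 ms
print(f"{max_distance_miles():.0f} miles")  # 621 miles
```

At 100 miles, propagation alone uses only about 1.6 ms of the 10 ms budget; how far apart hosts can actually be depends on how much of the remainder the network equipment and routing consume.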

In case of a disk failure, does everRun rebuild the disk from the good physical host to the bad one?

Correct. Recovery of storage is handled as a background task so as not to require downtime or otherwise impact the running VM and application.

When will level 3 of everRun VM be available?

Level 3 (system-level) fault tolerance is scheduled for later this year.

What requirements are associated with the everRun Level 3 Protection? (Bandwidth, latency, etc.)

Network and configuration requirements are the same for Level 2 and Level 3 protection.

Is StorServer a similar or competitive product to everRun?

StorServer is a backup appliance, not a fault-tolerant availability solution, and it addresses very different requirements. It would be more complementary than competitive.

What virtual machines (VMware, Parallels, etc.) are supported by Marathon?

Currently only Citrix XenServer is supported; however, we plan to expand support in the future.

Are there certain applications that are not suited for everRun, such as I/O- or compute-intensive apps? How do DR configurations affect performance?

This is very dependent on the configuration of the server, the VM, the storage and all other components. Appropriate best practices should be followed to ensure optimal performance for all applications.

Can Marathon support physical-to-VM HA? Does Marathon’s product fully support FC/iSCSI SAN shared storage between protected physical and/or VM pairs? Does Marathon’s product support a local-site HA server pair with a third node at a remote site in the event of a site failure? Does Marathon’s product have latency limitations?

Marathon offers solutions for both physical and virtual servers. These solutions use the same proven fault-tolerant technologies but are independent of each other. everRun VM supports any type of storage that XenServer supports. Fault tolerance is configured using two VMs; however, we will soon release an asynchronous solution that allows a third, replicated system at a local or remote site. Because everRun VM is a synchronous solution, it has a round-trip latency requirement of 10 ms between hosts. Our asynchronous solution will not have any latency requirements.

What is the pricing of everRun VM?

everRun VM lists at $4,500 when bundled with XenServer Enterprise, and $2,000 if you already have XenServer.

Thanks for all of your interest and questions.

June 26, 2008

IDC, Citrix and Marathon Discuss The “Best of VMworld Approach” to Virtualization and Availability

Posted by: Brian Mullins

There was a great turnout for the joint Citrix and Marathon webinar today, The “Best of VMworld Approach” to Virtualization and Availability. Thanks to everyone for attending. If you missed it or want more information, you can download the presentation here.

There were a lot of great questions for Simon Crosby and Jerry Melnick, which we have captured below. If anyone has any additional questions, feel free to leave a comment here on the blog or contact us directly.

Simon: Do you need 64-bit hardware to try out the express edition?

All modern server hardware is 64-bit enabled. Xen uses the modern features of Intel VT or AMD-V to perform hardware virtualization of Windows, so the answer is yes, but if you have a modern server you’re in good shape.

Jerry: How does everRun VM’s second level of availability differ from VMware HA?

One of the key pieces is that we compute through any I/O fault or failure and then automatically redirect I/O to the device that survives it. In VMware HA, the failure of an I/O device isn’t necessarily detected or managed; it handles only host failure. We manage failures of virtual machines and their related I/O devices.

The second piece is that we actively validate all the devices, so we know at all times whether all the resources are available and can actually be used for recovery. Without active validation, as with VMware HA, you can fail over your VM and get to the other side, only to find that the device that handles the disk isn’t actually operational, because of either a hardware failure or an administrative problem with how you configured it.

Simon: Is there an extra cost associated with XenCenter?

No, it’s included with the product. Our architecture does not require something like VirtualCenter, because every server in the resource pool redundantly holds every piece of information for the entire pool. Should any server fail, we automatically elect a pool leader from the remaining servers, so all management information remains highly available.

XenCenter itself is purely a thin-client UI that interfaces with as many resource pools as you want to run. It is stateless: all of the state related to managing the infrastructure lives in the infrastructure itself, which is what allows us to really scale this architecture.

Jerry: In the demonstration you gave, are users hitting both hosts in the Exchange application being protected?

In the Level 3 fault tolerance configuration, we run both hosts redundantly, which is what you need for full system-level fault tolerance. In Level 2, fewer resources are used, because the virtual machine runs on only one of the hosts while I/O runs on both. Level 1 is the next level down: a single VM is allocated, with no preallocation on the secondary side and all I/O processing on that one side, so there is no active redundancy. That’s why we provide the different levels: so you can choose which virtual machines really need those resources and that capability, and for which ones you want to trade off availability against resource utilization.
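The three levels described above can be summarized as a small data model. This Python sketch is purely illustrative: the class and field names are hypothetical and do not correspond to any everRun configuration setting, and the “host slots” figure is a crude relative proxy (VM instances plus I/O paths), not a measured resource cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionLevel:
    """Hypothetical model of one everRun VM protection level.

    Encodes only where the VM and its I/O run, per the answer above.
    """
    level: int
    hosts_running_vm: int
    hosts_processing_io: int

    @property
    def has_active_redundancy(self) -> bool:
        # Level 1 keeps the VM and its I/O on one host: no active redundancy.
        return self.hosts_running_vm > 1 or self.hosts_processing_io > 1

    @property
    def host_slots_used(self) -> int:
        # Crude relative resource proxy: VM instances plus I/O paths.
        return self.hosts_running_vm + self.hosts_processing_io


LEVELS = [
    ProtectionLevel(1, hosts_running_vm=1, hosts_processing_io=1),
    ProtectionLevel(2, hosts_running_vm=1, hosts_processing_io=2),
    ProtectionLevel(3, hosts_running_vm=2, hosts_processing_io=2),
]

for p in LEVELS:
    print(p.level, p.has_active_redundancy, p.host_slots_used)
```

Reading the output top to bottom shows the trade-off Jerry describes: each step up buys more redundancy at the cost of more host resources.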

Simon: What does it mean to have a 64-bit hypervisor and why is that better?

If you have a 64-bit hypervisor, you can host both 32-bit and 64-bit guests, and you don’t really have any address-space conversion problems. It’s a cleaner architecture: the memory architecture scales up to four terabytes (not that you can buy a server with four terabytes of DIMM slots), which lets us massively scale the memory and CPU of the system. As a result, we support up to 32 physical CPUs in a box, and we have an architecture that is going to scale superbly for us.

Jerry: Do you need a dedicated LAN to run everRun VM?

The only dedicated LAN we use is what we call the Availability Link, which is part of our best practices, and even that can be shared thanks to the flexibility of XenServer itself. Otherwise, it’s all the standard LAN configuration you would have in the XenServer pool.

Jerry: Are there certain applications that are not suited for everRun?

Our technology is completely transparent to the application itself. Any Windows application that you run on a Windows VM can be protected by our technology.

Jerry: Is it possible to combine XenMotion with everRun VM?

The ability to move a VM from one host to another is integral to our capability. It gives you recovery from failures, and it also lets you handle planned downtime by migrating your VMs when you need to do a repair. It’s an integral part of the product, and we use XenMotion as its backbone. One difference with everRun VM is that we provide this motion capability without requiring a shared-LUN (SAN) storage subsystem.

Simon: How would a current ESX 3.x customer migrate to a Xen environment and why should they do that?

There are free tools available to do this which can be downloaded off our forums and indeed Microsoft has similar free tools available. Here’s why you would do it: we guarantee that Citrix XenServer VMs are literally compatible with Microsoft Hyper-V. They’re also compatible with every other Xen implementation. What I see emerging is essentially two camps: A camp in which there is an open architecture (Microsoft storage architecture is very similar to XenServer, it’s also an open architecture) where you’ll have a bunch of virtual infrastructures out there from different vendors all of which are interoperable; and then a camp where there’s VMware.

The reason to move to XenServer is that we are fundamentally focused on a rich ecosystem of value added providers. We are diametrically opposed to an architecture which presumes that everything comes from one vendor, and where the entire architecture is dictated to you. The moment you invest in an architecture which is one size fits all (cost aside) you will find that it has limitations.

I am starting to see that the one-size-fits-all architecture, which served VMware well for its first 10-15% of the market, is starting to show signs of age as we look at new use cases. For example, you can’t do desktop virtualization or high availability with that architecture, and it’s no surprise that at VMworld the awards for innovation went to open-architecture, best-of-breed vendors, with Marathon winning the award for fault tolerance. We are dedicated to an open architecture and best of breed.

Jerry: Is Marathon planning to protect Linux based VMs in the near future?

Our road map will extend over the next year to protect all the guest operating systems supported by XenServer.

Simon: Can you give a rough idea of the performance overhead of a virtual server vs. a real server?

It’s highly dependent on the workload. Typically we see between 0.5% and 2% overhead, even for very I/O-intensive workloads. For Windows it’s notionally higher. The great thing is that we rely on hardware virtualization, unlike my friends at VMware, who are still tied to a software implementation of virtualization in which they have to patch the binary of a running guest operating system. We ride the hardware improvement curve of Intel and AMD, and what we’ve seen there is roughly a three-fold performance increase per year. Typical overhead for virtualizing Windows guests is around 3-5%. The most intensive workload I have ever seen is Windows Terminal Services, or our own Citrix Presentation Server, where we currently stand at about 8% overhead.

Jerry: Does everRun VM support shared storage?

Yes. We support any kind of storage. Whatever kind of LUN you can present to XenServer and carve up into a storage repository or a VHD, we will support. That includes local disks, low-end RAID storage, or just a bunch of disks, as well as high-end SAN storage. One advantage of the product is that we support local storage, even for very small, low-end environments.

Jerry: How far can the servers be separated?

It is not a matter of actual distance but rather a matter of network connectivity between the two hosts. We have systems currently deployed with separation of greater than 100 miles.

November 09, 2007

In case you missed it…

Posted by: admin

Last week our own Jerry Melnick sat down alongside Chris Wolf, an analyst for the Burton Group, and Simon Crosby, CTO of XenSource, for a Webinar to discuss the new technology bringing fault tolerant-class availability to virtual environments. Overall feedback was positive, with comments that the session was informative and thought provoking. For those of you that may have missed it (or those that just can’t get enough) we posted a recording of it here at the bottom of the page.

Listen, Share, Enjoy!

October 29, 2007

Webinar - The “Best of VMworld” Approach to Protecting Virtual Machines

Posted by: admin

Presenters:
Chris Wolf, Analyst - Burton Group
Simon Crosby, CTO – XenSource
Jerry Melnick, CTO – Marathon Technologies

At VMworld 2007, Marathon won Best of VMworld – New Technology for bringing fault tolerant-class availability to virtual environments for the first time. Attend this webinar to learn from Chris Wolf of Burton Group how to get the most out of your virtual environment. Simon Crosby of XenSource will explain how XenEnterprise v4 simplifies virtualized DR and availability, and Jerry Melnick of Marathon will demonstrate how you will soon be able to protect business-critical applications with fault-tolerant virtual machines.

November 1st, 2007 - 11:30AM-12:30PM Eastern Daylight Time (GMT -04:00, New York)

Register for this Webinar

June 20, 2007

Webcast: everRun – It’s What’s Next After Clusters for Application Availability

Posted by: admin

Last week we had the opportunity to speak with the folks at TechTarget about how large and small companies are keeping their Windows Server-based applications up and running without the hassle of clustering. Michael Bilancieri, our Director of Products, took time to address the market confusion and noise around the topic of availability.

For example, he uses the illustration below to describe various levels of availability. As noted before, there are many misconceptions regarding “availability,” so we’ve created this diagram to illustrate the different layers and help explain where everRun fits in the mix.
[Figure: levels of availability diagram]

We encourage you to listen to the webcast which can be found here and send us any questions or comments you may have.