Tuesday, August 7, 2012


VMware Site Recovery Manager 5 & NetApp SRA 2.0 Failure

(Unable to export the NAS device. Ensure that the correct export rules are specified in the ontap_config.txt file.)


VMware Site Recovery Manager and the SRA were installed and configured, site and resource mappings were configured correctly, the NetApp protected-site volumes were mirroring correctly, the SRA 2.0 array manager was installed and configured correctly, and the protection groups were defined and configured correctly.

When I attempted to run a recovery plan for any of our sites, I would receive an error on the “recovery steps” tab which stated: “Error - Failed to recover datastore 'Vol1'. Failed to create snapshots of replica devices. Failed to create snapshot of replica device /vol/Vol1m. SRA command 'testFailoverStart' failed for device '/vol/Vol1m'. Unable to export the NAS device. Ensure that the correct export rules are specified in the ontap_config.txt file.”

I checked the contents of the ontap_config.txt file. This file is used to define the R/W and root hosts for accessing the cloned export of the mirrored production volume. I confirmed that the VMkernel IPs used for NFS were listed.

I reset and reran the SRM test and examined the VMware-DR-XXX.log file.
Here I could see that the cloned export of the paging volume came online:

“--> 07-08-2012T10:34:35  Export /vol/testfailoverClone_nss_v10745371_volpagem has root & r/w IP=10.10.10.1”

But the production volumes failed:

--> 07-08-2012T10:34:35  Checking existence of storage device /vol/Vol1m
--> 07-08-2012T10:34:35  Storage device /vol/Vol1m is a NFS export
--> 07-08-2012T10:34:35  Creating test Clone volume testfailoverClone_nss_v10745371_Vol1m
--> 07-08-2012T10:34:59  Mapping Export /vol/testfailoverClone_nss_v10745371_Vol1m
--> 07-08-2012T10:34:59  Modify the exportfs for path /vol/testfailoverClone_nss_v10745371_volm2
--> 07-08-2012T10:34:59  Modify failed with error: No such file or directory
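When the log is long, the failing step can be picked out with a few lines of Python. This is only a rough sketch: the function name find_failed_steps is made up for illustration, and it assumes every log line keeps the "--> timestamp  message" shape seen in the excerpt above.

```python
import re

# Assumed line shape, taken from the excerpt above: "--> <timestamp>  <message>"
LINE_RE = re.compile(r"^-->\s+(\S+)\s+(.*)$")


def find_failed_steps(log_text):
    """Return (timestamp, previous_message, error_message) for each failure line."""
    failures = []
    previous = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not an SRA step line
        timestamp, message = match.groups()
        if "failed with error" in message:
            # The step logged just before the error is usually the one that failed
            failures.append((timestamp, previous, message))
        previous = message
    return failures


# Sample input copied from the log excerpt above
sample = """\
--> 07-08-2012T10:34:59  Mapping Export /vol/testfailoverClone_nss_v10745371_Vol1m
--> 07-08-2012T10:34:59  Modify the exportfs for path /vol/testfailoverClone_nss_v10745371_volm2
--> 07-08-2012T10:34:59  Modify failed with error: No such file or directory
"""

for timestamp, step, error in find_failed_steps(sample):
    print(timestamp, "|", step, "|", error)
```

Pairing each error with the step logged immediately before it is a guess at how the SRA orders its messages, but it matches the excerpt: the "Modify the exportfs" step is the one that fails.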
  

I logged on to the recovery site filer and had a look at the exports file.
In there I could see that volpagem had the correct VMkernel IPs listed as R/W and root hosts, but for the other two production volumes the default “All hosts” was listed for both R/W and root hosts. After much searching I found this post:

http://communities.vmware.com/message/2051567

The key point being:
“This error is caused by a flaw in the NetApp SRA 2.0. If you have an "-actual" statement in the /etc/exports file on the snapmirror destination filer the SRA will fail to create the flexvol-sharename. So if you are carefull and only use sharenames that equals volumenames for all shares (!) then you avoid the "-actual" statement and the SRA seems to work.

NetApp has confirmed this to be a bug in the SRA.”

I read this to mean that if a production export contains -actual, it will cause the SRA to fail. I confirmed that none of the production exports were actual-path exports. I thought I had hit a brick wall until a colleague noticed that one of the unrelated exports contained an “-actual” statement. I confirmed the export was not in use and removed it.

I reran the test and it succeeded.
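
Since the offending entry was on an export unrelated to the protected volumes, it is worth checking every line of the exports file on the recovery filer, not just the production ones. Below is a rough Python sketch of that check, run against a copy of the file contents. It assumes the usual "path -option=value,option,..." layout of /etc/exports; the function name find_actual_exports and the /vol/oldshare entry are hypothetical and only there for illustration.

```python
def find_actual_exports(exports_text):
    """Return (line_number, export_path, line) for every entry that uses -actual."""
    hits = []
    for number, line in enumerate(exports_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split(None, 1)
        options = fields[1] if len(fields) > 1 else ""
        # Assumed option layout: "-actual=/vol/x,sec=sys,rw=host1:host2"
        names = [opt.split("=", 1)[0].lstrip("-") for opt in options.split(",")]
        if "actual" in names:
            hits.append((number, fields[0], stripped))
    return hits


# Hypothetical exports content; only the volume names Vol1m and volpagem
# and the IP 10.10.10.1 come from the logs in this post.
sample_exports = """\
#Auto-generated by setup
/vol/volpagem -sec=sys,rw=10.10.10.1,root=10.10.10.1
/vol/Vol1m -sec=sys,rw
/vol/oldshare -actual=/vol/vol0/oldshare,sec=sys,rw
"""

for number, path, line in find_actual_exports(sample_exports):
    print("line %d: %s uses -actual -> %s" % (number, path, line))
```

Any entry it reports is a candidate for the SRA problem described in the forum post, whether or not the export belongs to a protected volume.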