Wednesday, July 25, 2012

PowerShell script to alert for missing snapshots

We had a problem with SMVI not taking backups and, to make matters worse, not alerting us to the fact that it was not taking backups. The software can only alert if the backup job fails or if it generates warnings, which is fine.
But what happens when the job doesn't start, the SMVI service doesn't stop and nothing alerts you to the fact that no backups are being taken? This was the second time this happened. The first time could be attributed to a random occurrence, but the second time it happens... well, then you're waiting to get caught with your pants around your ankles.
I made a short script, scheduled to run each evening, which sends a mail if no snapshots have been written since midnight.

#gets today's date and stores it as a string in the format day-month-year (e.g. 25-07-12) in the variable $NowDate
$NowDate = get-date -uformat "%d-%m-%y"

#Loops through c:\snapshotsourcelist.txt , assigning each line in turn to the variable $snappath
foreach ($snappath in Get-Content "c:\snapshotsourcelist.txt")
{
#recursively scan through all subdirectories of $snappath
$Snapshotlist = get-childitem $snappath -recurse |
#we are only interested in files and not directories
where-object {$_.mode -notmatch "d"} |
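#can't scan this for some reason so excluding it from the search criteria
Where-object {$_.name -notlike "*iegwydc01*"} |
#returns files written since midnight today ; the date string is parsed with an explicit format so the result does not depend on the regional settings
where-object {$_.lastwritetime -gt [datetime]::ParseExact($nowDate, "dd-MM-yy", [System.Globalization.CultureInfo]::InvariantCulture)}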

#if the variable is empty then send a mail
if (!$Snapshotlist) {

write-host "Snapshot Alert , No Snapshots for $snappath were taken since 00:00 last night"

        $SmtpClient = new-object system.net.mail.smtpClient
        $SmtpServer = "mailserver.domain.com"
        $SmtpClient.host = $SmtpServer

        $From = "ie-dlitalerts@domain.com"
        $To = "Darragh@domain.com"
        $Title = 'Snapshot Alert , No Snapshots for ' + $snappath
        $Body = 'Snapshot Alert , No Snapshots for ' + $snappath
        $SmtpClient.Send($From,$To,$Title,$Body)
}

}

Storage basics, IOPS penalty and RAID

Post from Yellow Bricks showing the write penalty for various RAID implementations:
http://www.yellow-bricks.com/2009/12/23/iops/
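
To make the write penalty concrete, here is a rough sketch of the usual arithmetic: every front-end write costs several back-end disk operations, depending on the RAID level. The penalty values below are the commonly quoted rules of thumb, not figures copied from the linked post, and the names `$writePenalty` and `Get-BackendIops` are made up for illustration.

```powershell
# Commonly quoted write penalties per RAID level (rules of thumb, treat as assumptions)
$writePenalty = @{ 'RAID0' = 1; 'RAID1' = 2; 'RAID10' = 2; 'RAID5' = 4; 'RAID6' = 6 }

# Hypothetical helper: back-end IOPS the disks must deliver for a given front-end load
function Get-BackendIops {
    param([double]$FrontendIops, [double]$WriteFraction, [string]$RaidLevel)
    # reads cost one back-end operation each
    $reads = $FrontendIops * (1 - $WriteFraction)
    # each write is multiplied by the RAID write penalty
    $writes = $FrontendIops * $WriteFraction * $writePenalty[$RaidLevel]
    $reads + $writes
}

# 1000 front-end IOPS, half of them writes, on RAID 5: 500 reads + (500 x 4) writes
Get-BackendIops -FrontendIops 1000 -WriteFraction 0.5 -RaidLevel 'RAID5'   # 2500
```

The same load on RAID 10 would need 1500 back-end IOPS, which is why the write ratio matters so much when sizing.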

Nice series of blog posts to cement the basics... again: http://vmtoday.com/2009/12/storage-basics-part-i-intro/

High %costop values - no CPU contention - Poor performance


This presented itself as general performance problems in a regional office. Specifically, one of the application servers was performing very poorly, with frequent application timeouts; Exchange was going offline and VMs were becoming orphaned in vSphere. I logged in to a Windows server and saw that the CPU was operating at 100%, yet in the performance view in vSphere the server was consuming approximately 300MHz. This behaviour was repeated on all other servers in the cluster.

In the DRS view I could see that the servers were receiving approx. 10% of their entitled resources. There were no limits or reservations set on any of the VMs. On examining the CPU counters in ESXTOP I found that all of the servers had extremely high %costop values (~80% - 90%). This would normally be indicative of overcommitted CPU resources on SMP VMs, as ESX throttles individual CPUs to prevent skew when some CPUs make progress and others are unable to because the physical CPUs are scheduled to other VMs. In our case this could not have been the cause, as we had more physical CPUs than vCPUs.

During the troubleshooting I noticed that we periodically had huge latencies on the storage system, sometimes spiking to 6 seconds. The first strange thing was that the latencies were within acceptable limits until the IOPS rose above 600; the second was that the combined total of CIFS and NFS IOPS was rarely sustained above 400.

This had me stumped until I discovered that the storage array had been populated with 7.2K SATA disks instead of the 15K disks I expected. Immediately I saw why we weren't seeing at least double the number of IOPS before latencies ramped up: with 8 usable disks we should see 600 IOPS, instead of the 1200 we expected.
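
The arithmetic behind those two numbers, as a quick sketch. The per-disk figures are simply what the totals above imply (600 / 8 and 1200 / 8); real drives vary, so treat them as rules of thumb. `$iopsPerDisk` and `Get-EstimatedIops` are names invented for this example.

```powershell
# Rule-of-thumb IOPS per spindle, implied by the totals above (assumptions, not vendor figures)
$iopsPerDisk = @{ '7.2K SATA' = 75; '15K' = 150 }

# Hypothetical helper: rough aggregate IOPS before latency starts to climb
function Get-EstimatedIops {
    param([int]$DiskCount, [string]$DiskType)
    $DiskCount * $iopsPerDisk[$DiskType]
}

Get-EstimatedIops -DiskCount 8 -DiskType '7.2K SATA'   # 600
Get-EstimatedIops -DiskCount 8 -DiskType '15K'         # 1200
```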

The second question was where the mysterious extra IOPS were coming from. After more investigation we found that the NetApp 2020 has hidden aggregate-level snapshots, which were tipping us well over the 600 IOPS threshold. These hidden jobs were set to run every 3 hours; we rescheduled them to run outside of production hours.

The high %costop value can be attributed to the fact that the vCPU has to wait for IO completion, and as IO completion was taking an extended period of time, ESX was co-stopping the CPUs, leading to extremely poor performance.

Friday, July 6, 2012

PS Script to move users to an OU

AD functional account cleanup: I created this script to move a list of users from multiple OUs to a single OU where they will be mass disabled.

import-CSV c:\testdisableduseraccounts.csv -Header @("Name") | foreach-object { Get-ADUser $_.Name | Move-ADObject -Targetpath "ou=temp disabled,ou=disabled user accounts,dc=DC,dc=DC,dc=domain,dc=com" }
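
Before running a bulk move like this for real, a dry run can be handy. The sketch below is one possible way to do it, assuming the ActiveDirectory module is loaded. `Get-AccountName` and `Move-AccountToOu` are hypothetical helper names, not part of any module, and `-WhatIf` makes `Move-ADObject` report what it would do without changing anything.

```powershell
# Hypothetical helper: returns the non-empty account names from a headerless one-column CSV
function Get-AccountName {
    param([Parameter(Mandatory = $true)][string]$Path)
    Import-Csv -Path $Path -Header 'Name' |
        ForEach-Object { "$($_.Name)".Trim() } |
        Where-Object { $_ -ne '' }
}

# Hypothetical wrapper around the one-liner above ; needs the ActiveDirectory module
function Move-AccountToOu {
    param(
        [Parameter(Mandatory = $true)][string]$CsvPath,
        [Parameter(Mandatory = $true)][string]$TargetOu,
        [switch]$DryRun
    )
    foreach ($name in Get-AccountName -Path $CsvPath) {
        # with -DryRun set, -WhatIf only prints the move that would be performed
        Get-ADUser -Identity $name | Move-ADObject -TargetPath $TargetOu -WhatIf:$DryRun
    }
}
```

A dry run would then look like `Move-AccountToOu -CsvPath c:\testdisableduseraccounts.csv -TargetOu "ou=temp disabled,ou=disabled user accounts,dc=DC,dc=DC,dc=domain,dc=com" -DryRun`; drop `-DryRun` once the output looks right.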