Dell OpenManage Server Administrator 7.3.0.1 fault

Hmmm… I can’t guarantee this is 100% correct, but it looks like there’s a problem with the above software when running against a Windows Server 2008 SP2 cluster. The storage part of OMSA doesn’t seem to get on very well with Windows Failover Clustering.

Up until recently, our cluster had been behaving fine: reboot one node and the other took over, and so on. However, yesterday I installed OMSA 7.3.0.1, and this morning our SQL cluster fell over, badly. Not only that, but the nodes were taking forever to come back from a reboot.

We spent a good two hours getting it working: the nodes would take ages to come back, and then wouldn’t even rejoin the cluster properly until we restarted the Cluster service, which isn’t ideal.

The Windows server faults were:

“FailoverClustering 1146: The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.”

and

“FailoverClustering 1230: Cluster resource ‘Cluster Disk 1’ (resource type ‘’, DLL ‘clusres.dll’) either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.”
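If you want to check whether a node has been logging these two events, something like the following should pull them out of the System log. It’s a sketch, assuming PowerShell 2.0 is installed on the Server 2008 SP2 nodes (it’s an optional feature there) and that the events are logged under the Microsoft-Windows-FailoverClustering provider; the limit of 50 events is arbitrary.

```powershell
# Most recent FailoverClustering 1146 (RHS stopped) and 1230 (resource crashed/deadlocked) events.
# Assumption: the provider name is Microsoft-Windows-FailoverClustering.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 1146, 1230
} -MaxEvents 50 -ErrorAction SilentlyContinue |
    Format-Table TimeCreated, Id, Message -AutoSize -Wrap
```

No output means nothing matched: Get-WinEvent raises an error when no events are found, and -ErrorAction SilentlyContinue hides it.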

There were a few bits and bobs on the web about third-party software and these errors, so I disabled all the OMSA services on the inactive node and rebooted. It came back vastly quicker and didn’t log these errors. Not having OMSA at all seemed extreme, so I removed just the storage component and rebooted again: it came back much quicker, didn’t throw these errors, and joined the cluster without the Cluster service needing a restart. Note that we run the full OMSA on a Server 2012 Core cluster and it seems fine, so at the moment this seems to point at 2008.
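Disabling the OMSA services on the inactive node can be scripted. This is a minimal sketch, assuming the OMSA services all have display names starting with “DSM SA” (check the listing it prints before trusting that pattern); it doesn’t touch the storage component install itself.

```powershell
# Assumption: OMSA service display names start with 'DSM SA'. Check this list first.
$omsa = Get-Service -DisplayName 'DSM SA*'
$omsa | Format-Table Name, DisplayName, Status -AutoSize

# Stop each service and stop it starting at boot (run this on the INACTIVE node only)
foreach ($svc in $omsa) {
    Stop-Service -Name $svc.Name -Force
    Set-Service -Name $svc.Name -StartupType Disabled
}
```

To undo it, set the same services back to -StartupType Automatic and start them again.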


Windows Server 2003 Wuauclt command line

Finally, after a lot of misses, I think I’ve got a list of command-line switches for wuauclt.exe, thanks to the Sysinternals procexp.exe tool (although I don’t use all of them, the Sysinternals suite is just great…). Here goes:

/DetectNow
/ReportNow
/RunHandlerComServer
/RunStoreAsComServer
/ShowSettingsDialog
/ResetAuthorization
/ResetEulas
/ShowWU
/ShowWindowsUpdate
/CloseWindowsUpdate
/SelfUpdateManaged
/SelfUpdateUnmanaged
/UpdateNow
/ShowWUAutoScan
/ShowFeaturedUpdates
/ShowOptions
/ShowFeaturedOptInDialog
/DemoUI

If you’re wondering why I needed this, we still have a few Server 2003 boxes kicking around that don’t have the nice Windows Update UI that appeared with Server 2008. It’s funny: all I needed was for the server to tell me it needed a reboot to finish installing updates, but I had to try a lot of the above switches before I got the right little yellow shield.
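For the reboot question specifically, it may be quicker to ask the Windows Update Agent directly than to hunt for the right switch. A sketch, assuming PowerShell 2.0 has been installed on the 2003 box; Microsoft.Update.SystemInfo is the Windows Update Agent’s COM object and RebootRequired is its pending-reboot flag.

```powershell
# Ask the Automatic Updates client to detect, then report, straight away
wuauclt.exe /DetectNow
wuauclt.exe /ReportNow

# Ask the Windows Update Agent whether a reboot is pending to finish installing updates
$sysInfo = New-Object -ComObject Microsoft.Update.SystemInfo
if ($sysInfo.RebootRequired) {
    'A reboot is needed to finish installing updates'
}
else {
    'No reboot pending'
}
```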

Rescue LogMeIn

On the 21st of January 2014, LogMeIn sent me an email saying that LogMeIn would be permanently unavailable… from the 21st of January. In case you missed that, they sent me the email on the same day the service became unavailable. No prior warning; it’s just gone.

When I actually logged on, it seemed the “LogMeIn team” had made a mistake. That’s right, the generous people at LogMeIn had given me until the 4th of February: a massive two weeks to sort out the computers I connect to. Thanks, LogMeIn. But there’s always a silver lining: for only £29, I can buy LogMeIn Pro… for two computers. I need five, which means three lots (3 × £29 = £87), so all of a sudden, from free, LogMeIn is costing me about £90 a year.

Are they joking? Not only have they withdrawn the service I was using with next to no notice (by far the worst bit), but they expect me to fork out the best part of £100 a year for a service I didn’t really need. Just to put this into perspective: when Microsoft pulled Forefront TMG, they might have done so quietly, but at least they gave two years’ warning.

Did I jump for the credit card? Nope. I’ve deleted all LogMeIn accounts from my subscription and gone straight to TeamViewer. If the worst comes to the worst, I’ll set up SSH tunnels to the PCs I need and use VNC, which gives me all the remote control functions I need.

“Error: 0x800f0906 The source files could not be downloaded” when installing the GUI on Windows Server 2012 Core

You’ve installed Windows Server 2012 Core and then decide you need the GUI. With “Features on Demand”, this is easy! All you have to do is open PowerShell and type Install-WindowsFeature Server-Gui-Mgmt-Infra,Server-Gui-Shell.
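Before running it, Get-WindowsFeature shows whether the payload is actually on disk: an Install State of Removed means the files have to be fetched from somewhere. A sketch; -Restart is there because the GUI only appears after a reboot.

```powershell
# Check whether the GUI features' files are on disk ('Removed' = payload must be fetched)
Get-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell

# Install both features and reboot; the GUI only appears after the restart
Install-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell -Restart
```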

Easy, that is, unless you’ve patched Windows Server in any way, shape or form. As soon as you update Windows Server, the install media is seen as out of date, and there’s a good chance something will prevent the server from contacting Microsoft Update (i.e. the internet site, NOT WSUS) directly. This is important, because the above command can ONLY use Microsoft Update; it won’t pay any attention to in-house WSUS installs.

Trying to resolve this issue got so annoying that I now have a case open with MS to try and fix it. I tried the method on this site, but it started crashing halfway through applying the updates to the WIM file (I’d written a batch file to apply each update sequentially; there was no way I was going to install each update on the WIM manually).
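The batch file isn’t in the post, but applying updates one at a time to an offline WIM generally follows the mount, add-package, commit pattern below. It’s a sketch, not the original script: every path and the image index are hypothetical, and it assumes the updates have been downloaded as .msu or .cab files into one folder (DISM on Server 2012 accepts both for /Add-Package, as far as I know).

```powershell
# Hypothetical paths and index: adjust to suit
$wimFile  = 'C:\wim\install.wim'   # writable copy of the build media's install.wim
$index    = 2                      # image index (check with Get-WindowsImage)
$mountDir = 'C:\wim\mount'
$updates  = 'C:\wim\updates'       # folder of downloaded .msu/.cab files

New-Item -ItemType Directory -Path $mountDir -Force | Out-Null
dism.exe /Mount-Wim /WimFile:$wimFile /Index:$index /MountDir:$mountDir

# Apply each update in turn (sorted by name) and warn about any that fail
Get-ChildItem -Path $updates -Include *.msu, *.cab -Recurse | Sort-Object Name | ForEach-Object {
    dism.exe /Image:$mountDir /Add-Package /PackagePath:$($_.FullName)
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "Failed: $($_.Name) (exit code $LASTEXITCODE)"
    }
}

# Write the changes back into the WIM and unmount it
dism.exe /Unmount-Wim /MountDir:$mountDir /Commit
```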

My colleague suggested (and then found) the process outlined on this site. This didn’t work either: the (test) server loses its WSUS settings fine, but then, I presume, either can’t find its way out through the core switch (by design, obviously, as nothing is allowed out directly through the firewall) or can’t use a proxy server. With the firewall admins unavailable, I can’t reconfigure the firewall to see what happens if the server is allowed directly through. I tried running Wireshark on this box to see what was happening, but all I got was vast amounts of ARP traffic (I didn’t filter it because I didn’t know where this traffic might be going or even what protocol it was using). Anyway, that’s beside the point: if the above command can’t cope with a proxy server, then we’d have to allow each affected server out directly through the firewall AFTER applying the right Group Policy to it and running gpupdate /force for the changes to take effect. That’s a lot of effort to go to just to achieve something that’s supposedly “built-in”.
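Two quick checks that might help narrow down where it’s stuck: whether the Group Policy actually landed after gpupdate /force, and which proxy the machine-level WinHTTP stack is set to use (which, as far as I know, is what the Windows Update service goes through, rather than the user’s IE setting). The registry key is an assumption about where the “Specify settings for optional component installation and component repair” policy is stored.

```powershell
# Assumption: the component-repair policy is stored under this key;
# nothing is returned if the policy hasn't been applied.
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Servicing' `
    -ErrorAction SilentlyContinue

# Machine-wide WinHTTP proxy ("Direct access (no proxy server)" if none is set)
netsh.exe winhttp show proxy
```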

At the moment, I really don’t know what MS’s answer is going to be. I’ve run a lot of tests, which have only proven that the behaviour above is 100% reproducible: you can install the GUI from the build media only if the server hasn’t been updated. The moment you patch the server, this command stops working, assuming your servers don’t have direct access to the internet. If MS could somehow get our internal WSUS server to patch the WIM file, as per the first link above, that would be something, as at least we’d maintain a consistent set of patches. I’ve seen suggestions that you always install the GUI version of Windows to start with and then omit the -Remove switch when going back to Core, so that the GUI installation files remain on the server and get patched. But this seems daft when the whole point of Server Core is that the installation payloads aren’t left lying around for a “hacker” to install.
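For reference, pointing the install at the build media, and the two ways of going back to Core, look something like this. They are alternative steps, not one script to run top to bottom. D: as the DVD drive and image index 2 are assumptions (Get-WindowsImage lists the indexes), and as described above, the -Source route only works while the server is unpatched.

```powershell
# Assumption: build media in D:. List the images to find the matching full-GUI edition's index.
Get-WindowsImage -ImagePath D:\sources\install.wim

# Install the GUI from the WIM instead of Microsoft Update (index 2 is an assumption)
Install-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell `
    -Source wim:D:\sources\install.wim:2 -Restart

# Back to Core, keeping the payload on disk (Install State 'Available' afterwards)
Uninstall-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell -Restart

# Back to Core AND delete the payload (Install State 'Removed' afterwards)
# Uninstall-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell -Remove -Restart
```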

Tesco value network monitor #2

Slightly modified version of the Test-Connection script. After a weekend-long server outage which resulted in a few thousand emails, I’ve built some logic in to count the number of failures a server has, and only send emails periodically. In testing, this script would generate an email about every 3 minutes against a list of 50-odd devices. This bit of code, $numberDetector = $serverArray[$rewriteValue,1] / 10, and the If… -ge 5 test that follows it can be modified to increase or decrease the email frequency.
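To see exactly which failure counts trip an email under this throttling rule, here’s a small standalone sketch. It assumes a server’s failure counter goes up by one per failed pass; $triggers and $triggersMod are illustrative names. It relies on PowerShell’s / operator returning an Int32 when the division is exact and a Double otherwise, and shows the same rule written with the % (modulo) operator.

```powershell
# Which consecutive-failure counts satisfy "-ge 5 and (failures / 10) is an Int32"?
$triggers = foreach ($failures in 1..35) {
    $numberDetector = $failures / 10
    if (($failures -ge 5) -and ($numberDetector.GetType().Name -eq 'Int32')) { $failures }
}
"Emails at failure counts: $($triggers -join ', ')"
# -> Emails at failure counts: 10, 20, 30

# The same rule written with the modulo operator
$triggersMod = 1..35 | Where-Object { $_ -ge 5 -and $_ % 10 -eq 0 }
"Modulo version: $($triggersMod -join ', ')"
```

Either way, emails go out on the 10th, 20th, 30th… consecutive failure. With a divisor of 10 the -ge 5 test never changes the result; it only starts to matter if the divisor drops below 5.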

It took me ages to work out the logic, but eventually I figured out how to create and manipulate the array at the core of the script. It’s pretty well documented in terms of explanation (that’s why it’s so long, really). Again, apologies for the non-tabbing of the code.
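Here’s a cut-down illustration of the (number of servers) x 3 array idea: one row per server, with columns for index, failure counter and name. The server names are hypothetical. It finds a server’s row by comparing the name column with -eq, because -match does a regex partial match (“server1” would also match “server10”).

```powershell
# A 3 x 3 object array: rows are servers; columns are index, failures, name
$names = @('server01', 'server02', 'server03')   # hypothetical server names
$grid = New-Object 'object[,]' $names.Count, 3

for ($row = 0; $row -lt $names.Count; $row++) {
    $grid[$row, 0] = $row           # column 0: row index
    $grid[$row, 1] = 0              # column 1: failure counter
    $grid[$row, 2] = $names[$row]   # column 2: server name
}

# Record two failures for server02 (row 1)
$grid[1, 1] = $grid[1, 1] + 1
$grid[1, 1] = $grid[1, 1] + 1

# Look the row up again by exact name
$found = -1
for ($row = 0; $row -lt $names.Count; $row++) {
    if ($grid[$row, 2] -eq 'server02') { $found = $row; break }
}
"server02 is row $found with $($grid[$found, 1]) failures"
# -> server02 is row 1 with 2 failures
```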

Of course, I could just use System Center OpsMan…

======================================================================================================================

#Tesco value server monitor 🙂
#Repeats a Test-Connection sequence until 01:00, then stops.
#It then needs to be started again at 03:00; the script does not restart itself.
#This enables changes to the server list to be picked up,
#but also ensures false failures aren't reported while the
#servers are being automatically rebooted (01:00-03:00).

#Initialise variables

#This is the list of servers, one name per line of the text file.
#It must be [string[]] (an array), not [string]: a [string] cast joins every line into one string.
[string[]]$serverList = Get-Content c:\Users\xxxx\desktop\pingList.txt

#This will be used to provide the index to each server in the array incrementally
[int]$index = 0

#This is an array of servers, comprising cells in a (number of servers) x 3 matrix.
#Each server has: an index (column 0), a failure counter (column 1) and a name (column 2).
#The array takes its row count from the number of items in $serverList, so it will expand as needed.
$serverArray = New-Object 'object[,]' $serverList.Count,3

#This loop takes each server from $serverList and populates its row of the array in sequence.
#Arrays always count from 0, so a 5-row array will contain rows 0-4.
#In $serverArray[$index,x], the $index part refers to the "row" and x refers to the "column".
#This means the first server will have column 0 (index) set to 0, column 1 (counter) to 0 and column 2 (name) to the server name.
foreach ($server in $serverList)
{
$serverArray[$index,0] = $index
$serverArray[$index,1] = 0
$serverArray[$index,2] = $server

#Increments the $index variable by 1
$index++
}

#Entering Do loop
do {

#Sets $now to the current hour (an integer from 0 to 23)
$now = (Get-Date).Hour

#Works through the array row by row, so $rewriteValue is always the row
#of the server being tested and no name lookup is needed.
for ($rewriteValue = 0; $rewriteValue -lt $serverList.Count; $rewriteValue++)
{
#Sets $computerName to the server name stored in this row
$computerName = $serverArray[$rewriteValue,2]

#Sets $Result to $true if the server responds to Test-Connection, $false if not
$Result = Test-Connection $computerName -Quiet

If ($Result)
{
#These lines are commented out, for testing only
#Write-Host $computerName "The server is fine"

#The server is responding, so reset its failure counter to 0.
#Without this, a server would gradually build up enough failed Test-Connections to trigger
#the alert emails, even though it had only been through a few routine reboots, for example.
$serverArray[$rewriteValue,1] = 0
}
Else
{
#These lines are commented out, for testing only
#Write-Host $computerName "Failed"

#The server failed, so add 1 to its failure counter.
$serverArray[$rewriteValue,1] = $serverArray[$rewriteValue,1] + 1

#This variable is purely to slow down the number of emails sent.
#It divides the number of failures by 10. PowerShell returns an Int32 (whole number) when the
#division is exact and a Double (number with decimal places) otherwise. This is used by the If below.
$numberDetector = $serverArray[$rewriteValue,1] / 10

#These lines just output stuff to screen for testing, so are commented out.
#$serverArray[$rewriteValue,1]
#$numberDetector.GetType()

#Sends an email, but only if the server has failed at least 5 times and $numberDetector is an Int32.
#That only happens on every 10th consecutive failure, so it slows the generation of emails down significantly.
If (($serverArray[$rewriteValue,1] -ge 5) -and ($numberDetector.GetType().Name -eq 'Int32'))
{
#Define hub transport server
$smtp_server = "yourmailserver.yourdomain.com"

#Define email sender and recipient
$sender = "GI Joe <g.i.joe@yourdomain.com>"
$recipient = "A N Other <a.n.other@yourdomain.com>"

#Define email subject and body
$msg_subject = "Important! Server $computerName is not responding to ping requests"
$msg_body_text = "Server $computerName has failed to respond to numerous ping requests. Please investigate urgently. The script that generated this alert is \\someserver\support\scripts\ping.ps1"

#Send it
Send-MailMessage -To $recipient -From $sender -Subject $msg_subject -Body $msg_body_text -SmtpServer $smtp_server
}
}
}

#As soon as the hour reaches 1 (01:00), this script will stop.
} While ($now -ne 1)
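The header comments have the script stopping at 01:00 and starting again at 03:00, but nothing in the script restarts it, so something has to launch it. One way is a daily scheduled task. This sketch assumes the script lives at the path quoted in the alert email, and the task name is made up. As written, schtasks creates the task under the current user, so it only runs while that user is logged on; add /RU and /RP to run it under a service account instead, which will need to be able to read pingList.txt and send mail.

```powershell
# Hypothetical task name; the script path is the one quoted in the alert email text
schtasks.exe /Create /TN "Tesco value monitor" /SC DAILY /ST 03:00 `
    /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File \\someserver\support\scripts\ping.ps1"
```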

Windows Server 2012 Hyper-V guest clustering

Right. Having previously said how wonderful Microsoft clustering was, I did hit a bit of a wall trying to cluster a couple of guests outside of the clustered roles on the hosts, i.e. my guests were not clustered virtual machines on the Server 2012 Hyper-V hosts; they were just normal Hyper-V guests.

The cluster, with node1 on host1, would create fine. Try adding node2 on host2… and the join would fail. Try creating the cluster with node1 on host1 and node2 on host2, and the cluster wouldn’t even create. Both scenarios reported a timeout. There are very few solutions on the web for this, but eventually a TechNet social post pointed to this article: http://support.microsoft.com/kb/2872325, which works a treat. This took a good week or two to find, but once I’d unbound the filtering protocol, my cluster would create quite happily.
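The unbinding can be done from PowerShell on each host. This is a sketch, assuming the filter in question is the one listed as “Microsoft Failover Cluster Virtual Adapter Performance Filter” (my reading of the KB article; compare with the listing the first command prints) and that the host’s virtual switch adapters are named vEthernet (…).

```powershell
# Run on each Hyper-V HOST. First, list what is bound to the vEthernet adapters.
Get-NetAdapterBinding -Name 'vEthernet*' |
    Format-Table Name, DisplayName, ComponentID, Enabled -AutoSize

# Assumption: this is the filter the KB describes; confirm against the listing above.
$filter = 'Microsoft Failover Cluster Virtual Adapter Performance Filter'
Disable-NetAdapterBinding -Name 'vEthernet*' -DisplayName $filter

# To re-bind it later:
# Enable-NetAdapterBinding -Name 'vEthernet*' -DisplayName $filter
```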

Out of interest, I re-bound the filtering protocol on the vEthernet ports on both hosts and tried re-creating the cluster with rebuilt guests running as Hyper-V clustered roles (still on different hosts), and this also worked instantly. I’d left the “Add all available storage” box ticked, and because it was the only iSCSI disk, the wizard picked it up and turned it into the quorum as part of the install.
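Creating the guest cluster from PowerShell rather than the wizard looks roughly like this. The node names match the post, but the cluster name and IP address are hypothetical. New-Cluster adds eligible shared storage unless -NoStorage is given, which is the scripted equivalent of leaving “Add all available storage” ticked.

```powershell
# Hypothetical cluster name and address; run from one of the guest nodes
$nodes = 'node1', 'node2'

# Validate first; the report flags networking and storage problems before creation
Test-Cluster -Node $nodes

# Create the cluster; eligible shared storage is added unless -NoStorage is specified
New-Cluster -Name 'GuestCluster' -Node $nodes -StaticAddress '192.168.1.50'
```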

I guess the answer is, just make all your VMs highly available…