This was really stupid (and easily fixed). I couldn’t move the resources for a clustered SQL server named instance. Turns out, I’d installed the default instance onto the node in question but not the named instance I needed. Re-run setup on the node, install the named instance and, obviously, everything then worked.
Microsoft have had fun putting Easter eggs into lots of their products recently, tho’ they aren’t the type I’d normally want.
The latest is to do with SQL Server 2012 SP1. I had a 3-node Hyper-V cluster up and running quite happily, it was failing over as expected and so on. Then I read up on installing a new SQL cluster- seemed easy enough, prepare all your nodes and then finish the cluster (http://technet.microsoft.com/en-us/library/ms179530.aspx).
Until it gets to the SqlEngineDBStartConfigAction_completefailovercluster_configrc_Cpu64 process, at which point the whole thing fails and you get a nice little error which reads “The following error has occurred: The transaction log of database ‘tempdb’ is full due to ‘NOTHING’.”
I found this page: http://www.sqlservercentral.com/Forums/Topic1463273-2799-1.aspx but it didn’t resolve the issue. What I did notice, however, was that at some point the installer decided to move ownership of the SQL Server role over to node y, even though the cluster completion process was running on node x. Weird.
As is getting increasingly normal, no answer as of yet but I’ll keep this updated.
UPDATE 12/MAR/2014: I found this link- http://blogs.msdn.com/b/psssql/archive/2013/10/26/10453038.aspx- and this worked perfectly. So it had nothing to do with SQL after all- it was an AD issue in SQL’s clothing.
Right. Having previously said how wonderful Microsoft clustering was, I did hit a bit of a wall in trying to cluster a couple of guests outside of the clustered roles on the hosts- i.e. my guests were not clustered virtual machines on the Server 2012 Hyper-V hosts, they were just normal Hyper-V guests.
The cluster- with node1 on host1– would create fine. Try adding node2 on host2… and the join would fail. Try creating the cluster with node1 on host1 and node2 on host2 and the cluster wouldn’t even create. Both scenarios reported a timeout. There’re very few solutions on the web about this, but eventually a TechNet social post pointed to this article: http://support.microsoft.com/kb/2872325 which works a treat. This took a good week or two to find, but once I’d un-bound the filtering protocol my cluster would create quite happily.
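For reference, the unbinding step from that KB article can also be done from PowerShell on each host. This is a sketch only: the filter’s DisplayName below is an assumption based on my reading of the KB, so list the bindings first and confirm the exact name on your own hosts before disabling anything.

```powershell
# List current bindings on the Hyper-V virtual switch adapters,
# so you can confirm the exact DisplayName before touching anything
Get-NetAdapterBinding -Name "vEthernet*" | Format-Table Name, DisplayName, Enabled

# Unbind the filtering protocol from every vEthernet adapter
# (the DisplayName here is an assumption- substitute whatever your hosts report)
Get-NetAdapter -Name "vEthernet*" |
    Disable-NetAdapterBinding -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter"

# Re-binding later is the same with Enable-NetAdapterBinding:
# Get-NetAdapter -Name "vEthernet*" |
#     Enable-NetAdapterBinding -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter"
```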
Out of interest, I re-bound the filtering protocol on the vEthernet ports on both hosts and tried re-creating the cluster with re-built guests running as Hyper-V clustered roles- still on different hosts- and this also worked instantly (I’d left the “Add all available storage” box ticked and- because it was the only iSCSI disk- picked it up and turned it into the quorum as part of the install).
I guess the answer is, just make all your VMs highly available…
Right. First off, thanks to Microsoft for making clustering way easier. However, there’s very little info about Server 2012- most of what you can easily find is all about Server 2008/ 2008 R2, which is (obviously) quite outdated.
Anyway, start at the start. I’m not saying any of this is right- or recommended, by the book etc- but it works:
- When thinking about clustering Hyper-V, don’t get swayed by the Hyper-V side. You can’t just start clustering Hyper-V virtual machines. You need to start by clustering the Hyper-V hosts;
- To do this successfully, you need at least 4 networks: CSV (Cluster Shared Volume), Live Migration, LAN, & iSCSI (ideally paired against different network cards for resilience);
- You’ll need to install both the Hyper-V & clustering roles on each host, AND the associated PowerShell cmdlet sets (for ease of use);
- This is important: I’m basing this on EqualLogic SANs with the Dell Host Integration Tools (HIT) kit. This matters because if you use Microsoft’s own iSCSI software, they recommend using multiple iSCSI subnets. After completely rebuilding the network infrastructure to accommodate this, it turns out the HIT kit really doesn’t like that set up, so I’ve reverted to a single iSCSI VLAN + subnet. Complete pain, but at least it’s working again. With the HIT kit installed, run iscsicpl.exe and switch to the “Dell EqualLogic MPIO” tab. You should have 4 connections against every disk; the exception is the quorum disk, which only shows 4 connections on the host that currently owns the “Cluster Group” group and 2 connections on every other host (it will jump to 4 as soon as that host takes ownership of the quorum);
- To install clustering, from PowerShell run “Install-WindowsFeature Failover-Clustering” and “Install-WindowsFeature RSAT-Clustering-PowerShell” (I know you can chain these together);
- To create the cluster, open up “Failover Cluster Manager”, right-click “Failover Cluster Manager” and choose “Create Cluster…”;
- Follow the wizard through, for the time being only tick and assign an IP address to your preferred CSV/ Cluster and LAN networks;
- Do not tick “Add all available storage to the cluster”;
- Run the validation tool- hopefully everything is green-ticked;
- Finish creating the cluster;
- Once your cluster is created, right-click on “Disks” and choose “Add Storage”. At the next screen, pick just your Quorum disk;
- Once your Quorum disk shows up as “Available Storage”, right-click the cluster itself, choose “More Actions”, pick “Configure Cluster Quorum Settings…” and then at the next screen choose “Add or change the quorum witness”, then choose your Quorum disk;
- Now you can add additional iSCSI LUNs as disks, right-click on them and “add to Cluster Shared Volumes” (or something similar);
- At this point rename all the disks and networks to something useful, otherwise you’ll never know what you’re pointing at;
- To configure the networks, make sure that:
- Your preferred CSV/ cluster network is accessible only to the cluster, not clients;
- Your preferred LAN network is accessible to the cluster AND clients;
- Your preferred LiveMigration network is accessible only to the cluster, not clients;
- Your preferred iSCSI network isn’t available to the cluster at all;
- At this point, right-click on the “Networks” group and choose “Live Migration Settings…”;
- De-select all networks apart from your preferred LiveMigration network;
- The network metrics are assigned- allegedly- according to this article. My experience is that this is right. With a bit of luck, if you now PowerShell “Get-ClusterNetwork -cluster XXXX | Sort-Object Metric | FT Name, Metric, AutoMetric” you’ll see that the CSV/ Cluster network has the lowest metric, followed by the LiveMigration network, then iSCSI, then LAN.
- This is the point at which you can start creating Hyper-V machines on those shared LUNs. So… instead of creating the machine on C: or E:, you create it on \\host\c$\ClusterStorage\VolumeX. This looks like local storage, but of course it’s just a mount point to an iSCSI LUN. Which is why it’s so easy to move virtual machines, because they don’t exist on any host;
- If you need your VMs to move with the cluster nodes, you need to add each VM as a resilient VM inside the clustering tool. This then renders the Hyper-V management tools ineffective against the chosen machines as they are now a cluster resource, not a Hyper-V machine;
- Also… you may get a lot of errors (event ID 1196) about failing to register the DNS name on some networks. I don’t know this for definite, but if you look at ipconfig all the private IP addresses have DNS servers set pointing to presumably imaginary IPv6 DNS addresses. I’m going to remove all the DNS servers from private ranges and see what happens.
- Okay… I think (hope) the way around the 1196 errors is to use “Set-DnsClient -InterfaceAlias X -RegisterThisConnectionsAddress $False”. I’m hoping this stops it trying to register with the false DNS servers, as there appears to be no way of setting DNS servers to be blank.
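Pulling the PowerShell bits of the list above together into one place- a sketch only, with made-up placeholder names (CLUSTER1, HOST1/HOST2, the IP address, and the size filter used to pick out the small quorum disk are all assumptions for illustration):

```powershell
# Install Hyper-V, clustering, and the cmdlets in one chained command
Install-WindowsFeature Hyper-V, Failover-Clustering, RSAT-Clustering-PowerShell -Restart

# Create the cluster WITHOUT grabbing every disk it can see
# (-NoStorage is the equivalent of un-ticking "Add all available storage to the cluster")
New-Cluster -Name CLUSTER1 -Node HOST1, HOST2 -StaticAddress 192.168.1.50 -NoStorage

# Add just the quorum disk, then make it the disk witness
$quorum = Get-ClusterAvailableDisk -Cluster CLUSTER1 |
    Where-Object Size -lt 2GB |        # placeholder filter: pick your small quorum LUN
    Add-ClusterDisk
Set-ClusterQuorum -Cluster CLUSTER1 -NodeAndDiskMajority $quorum.Name

# Add the remaining iSCSI LUNs and turn them into Cluster Shared Volumes
Get-ClusterAvailableDisk -Cluster CLUSTER1 | Add-ClusterDisk | Add-ClusterSharedVolume

# Make an existing guest a clustered role so it moves with the nodes
Add-ClusterVirtualMachineRole -Cluster CLUSTER1 -VMName "VM1"
```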
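And the network side of the list can be scripted too- again a sketch with assumed network names (rename yours to match, or adjust). The Role values map onto the access rules above: 0 = not used by the cluster, 1 = cluster only, 3 = cluster and clients.

```powershell
# Rename an auto-detected network to something useful (address is a placeholder)
(Get-ClusterNetwork -Cluster CLUSTER1 | Where-Object Address -eq "10.0.1.0").Name = "CSV"

# Set the role on each network (names assume you've renamed them as above)
(Get-ClusterNetwork -Cluster CLUSTER1 -Name "CSV").Role = 1            # cluster only
(Get-ClusterNetwork -Cluster CLUSTER1 -Name "LiveMigration").Role = 1  # cluster only
(Get-ClusterNetwork -Cluster CLUSTER1 -Name "LAN").Role = 3            # cluster AND clients
(Get-ClusterNetwork -Cluster CLUSTER1 -Name "iSCSI").Role = 0          # not used by cluster

# Check the metric ordering: CSV lowest, then LiveMigration, then iSCSI, then LAN
Get-ClusterNetwork -Cluster CLUSTER1 | Sort-Object Metric |
    Format-Table Name, Metric, AutoMetric

# Stop the private interfaces registering in DNS (the 1196 workaround above)
Set-DnsClient -InterfaceAlias "CSV", "LiveMigration", "iSCSI" `
    -RegisterThisConnectionsAddress $false
```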
Try starting up clussvc- it fails in seconds, not the 30,000 milliseconds specified. Try running “clussvc.exe”- fails, and says (if you look in the event log) that the account is missing required privileges. Check secpol.msc, everything looks fine. Check this link:
About mid-way down, it specifies that the account MUST be part of the local Administrators group. Check this. If the account isn’t there, re-add it and the cluster service should re-start.
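A quick way to check and fix this from an elevated prompt- “DOMAIN\svc-cluster” below is a placeholder for whatever your cluster service account actually is:

```powershell
# Is the service account already in the local Administrators group?
net localgroup Administrators

# If not, re-add it and restart the cluster service
net localgroup Administrators "DOMAIN\svc-cluster" /add
net start clussvc
```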