I read that in addition to local storage for the OS each node requires local storage for tempdb. Is that correct? (So tempdb can't be installed on a SAN?)
TIA,
edm2
When setting up two nodes with SQL Server installed, we want them to be identical initially.
What is the "best" way to achieve this at the OS level? (If the servers are VMs, I think we could use a template.)
What is the "best" way to achieve this from the SQL perspective? (I know that if I install SQL Server manually it will generate a script capturing my install settings. I could try to apply that script to all nodes, or I wonder if a PowerShell script is better.)
TIA,
edm2
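For what it's worth, a minimal sketch of the unattended approach, assuming the ConfigurationFile.ini produced by a manual install has been copied to each node (the path and password placeholders below are hypothetical):

# Run from the SQL Server media folder on each node; the configuration file path is a placeholder.
# Service account passwords are not stored in the generated file, so they are passed on the command line.
.\setup.exe /ConfigurationFile="D:\Setup\ConfigurationFile.ini" /Q /IAcceptSQLServerLicenseTerms /SQLSVCPASSWORD="<sql service password>" /AGTSVCPASSWORD="<agent service password>"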
I've read that most apps are probably not cluster aware. What makes an app cluster aware? In other words, what new properties does it possess after it is made "cluster aware"? (A URL for developers would be useful.)
TIA,
edm2
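As background, an app that is not cluster aware can still be made highly available by wrapping it in the built-in Generic Service or Generic Application resource types; a quick sketch, where the service name and static address are made up:

# Host an ordinary (non-cluster-aware) Windows service as a clustered role.
# "MyService" and the static address are hypothetical.
Add-ClusterGenericServiceRole -ServiceName "MyService" -Name "MyServiceGroup" -StaticAddress 192.168.1.50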
(Newbie) Setup: two-node cluster, active/passive.
After installing software on the active node, will it (or any portion of it, e.g. config files) be automatically pushed to the passive node and kept in sync? Or am I required to install the same software on the passive node myself?
TIA,
edm2
Hi,
New to 2012 and implementing a clustered environment for our File Services role. Have got to a point where I have successfully configured the Shadow copy settings.
Have a large (15 TB) disk, S:.
Have a VSS drive (volume shadow copy drive), V:.
Have successfully configured the Shadow Copy settings through Windows Explorer.
Created dependencies in the Failover Cluster Manager console whereby S: depends on V:.
However, when I failover the resource and browse the Client Access Point share there are no entries under the "Previous Versions" tab.
When I visit the S: drive in Windows Explorer and open the Shadow Copy dialog box, there are entries showing the times and dates of the shadow copies that ran on the original node. So the disk knows about the shadow copies that ran on the original node, but the "Previous Versions" tab has no entries to display.
This is in a 2012 server (NOT R2 version).
Can anyone explain what might be the reason? Do I have an "issue" or is this by design?
All help appreciated!
Kathy
Kathleen Hayhurst Senior IT Support Analyst
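One quick check in this situation is to confirm whether the shadow copies themselves are present on whichever node currently owns the disk (drive letter as in the post):

# Run on the node that currently owns S:; lists the shadow copies stored for that volume.
vssadmin list shadows /for=S: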
Hi, we recently experienced the above issue, and after looking for explanations I haven't been able to find any satisfying answers in other posts describing this issue.
Our problem is as follows:
2-node 2008 R2 cluster running SQL Server 2012
Each node is an HP BL460c running in an HP C7000 blade chassis.
We were updating the FlexFabric cards on one of the chassis. The other chassis had been patched the previous week with no problems.
During the update process the FlexFabric cards, which hold the Ethernet and FC connections, reboot, so before work began all active cluster services had been failed over to the node in the chassis not being worked on. Despite this, the cluster service shut down on this one particular cluster. All other clusters running across these two chassis continued to run as expected.
As other people have posted before we saw the following errors in the system log.
1564: File share witness resource 'File Share Witness' failed to arbitrate for the file share
1069: Cluster resource 'File Share Witness' in clustered service or application 'Cluster Group' failed.
1172: The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
However, we can't understand what could cause this to happen when the service was running on the node in the chassis not being updated, especially when the same update was performed the week before with no issues. How can both nodes lose connectivity to the File Share Witness at the same time?
Cluster Validation tests run fine and don't highlight any issues. The file share witness is accessible from both servers.
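When investigating events like these, it usually helps to pull the cluster debug log from every node around the failure time and re-check the witness configuration; a sketch, with a hypothetical witness share path:

# Collect the cluster debug log (last 60 minutes) from every node into C:\Temp for offline analysis.
Get-ClusterLog -Destination C:\Temp -TimeSpan 60
# Show the current quorum model and witness resource.
Get-ClusterQuorum
# Confirm the witness share is reachable from each node (path is hypothetical).
Test-Path \\fileserver\ClusterWitness$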
Hi all;
Please read the following question:
Your network contains an Active Directory domain named contoso.com. The domain contains two member servers named Server1 and Server2. All servers run Windows Server 2012 R2.
Server1 and Server2 have the Failover Clustering feature installed. The servers are configured as nodes in a failover cluster named Cluster1. You configure File Services and DHCP as clustered resources for Cluster1. Server1 is the active node for
both clustered resources.
You need to ensure that if two consecutive heartbeat messages are missed between Server1 and Server2, Server2 will begin responding to DHCP requests. The solution must ensure that Server1 remains the active node for the File Services clustered resource
for up to five missed heartbeat messages. What should you configure?
Thanks
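For context, the heartbeat delay and threshold are cluster-wide properties; they can be inspected as below, which helps when reasoning about questions like this:

# Heartbeat tuning properties of the cluster (delays in milliseconds, thresholds in missed heartbeats).
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold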
I'm having an issue with Windows Failover Cluster on a Windows Server 2012 R2 machine. I have two cluster nodes (nodeA and nodeB). My issue is that when nodeA is the owner node and I open Failover Cluster Manager <clusterName> >> Roles >> <fileserver role> >> Shares tab, it hangs and says that it is loading, and this goes on indefinitely. However, when I go to nodeB (not the owner node) and open the Shares tab, it shows me all of the shares that I have. Next, when I go to <clusterName> >> Nodes >> Roles tab, the information says "There were errors retrieving file shares."
Now when I switch nodeB to be the owner node, I cannot view the shares on that machine but can now view them on nodeA.
We also have a test network where I have recreated the machines, environment and failover cluster as close to the production network as I can, except everything works great in the test network.
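As a cross-check outside Failover Cluster Manager, the shares can be enumerated directly on each node to see which one actually fails to answer; node names as in the post:

# Enumerate SMB shares (including cluster-scoped ones) on both nodes.
Invoke-Command -ComputerName nodeA, nodeB -ScriptBlock { Get-SmbShare | Select-Object Name, Path, ScopeName }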
We have a two-node cluster.
We have volumes configured on the cluster; we are using EVA SAN storage.
Data(C:\ClusterStorage\Volume1)
Logs(C:\ClusterStorage\Volume2)
I am able to change/move the clustered disk drives from owner node A to owner node B. I cannot see the clustered drives on both the active and passive nodes.
I am also not able to view the volumes on node B in Windows Explorer. Can someone please tell me how to view the volumes once we move the drives between nodes on the cluster?
Thank you
lucky
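For reference, Cluster Shared Volumes do not appear as ordinary drive letters; every node sees them under C:\ClusterStorage regardless of which node currently owns them. Ownership can be checked or moved like this (the disk name below is a placeholder):

# List each CSV with its current owner node and state.
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State
# Move ownership of a CSV to the other node ("Cluster Disk 1" is a placeholder name).
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node NodeB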
Hi,
After renaming the file share cluster from HKNASS to NASTEST, browsing NASTEST shows it as empty.
I have updated the DNS record and it is possible to ping it.
Have I missed anything?
Thanks
Hi *.*
I'm experiencing a weird problem.
I have a Fujitsu CX420 S1, a sort of two-blade server with a shared SAS controller, four 900 GB SAS disks and two 200 GB SSD disks.
Installed 2012 R2 Std (the server is certified for 2012 R2) on both blades, enabled the Hyper-V role, configured Storage Spaces, created a quorum volume without tiering, created a cluster, created a tiered volume, added it to CSV, and created a VM on it.
If the CSV is, for example, on node1 and the VM is on the same node, everything works at full speed (200 MB/sec write & 300 MB/sec read).
If I move the CSV to the opposite node, the speed drops to near zero (600 bytes/sec write & 20 MB/sec read).
It looks like the CSV is always working in redirected mode and passing traffic over the heartbeat network, but not even at 1 Gbit/sec.
Please help!
I'm available for further info; I'm just running out of time to solve the problem (I have to deliver this cluster) before I fall back to the old method of one volume per VM (no CSV).
Thanks,
Alessio
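On 2012 R2 it is possible to see whether a CSV is really running in redirected mode rather than direct I/O, which would fit the symptoms described; a sketch:

# Shows, per node, whether I/O to each CSV is direct or redirected and why.
Get-ClusterSharedVolumeState | Format-List VolumeName, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason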
Hello,
Regarding VMQ, I read the link below:
http://technet.microsoft.com/en-us/library/gg162704(v=ws.10).aspx
I have some questions:
Why would we use VMQ?
How can I know whether my hardware NIC supports VMQ?
Thanks
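A quick way to check this on Windows Server 2012 or later, assuming the Hyper-V and networking PowerShell modules are available:

# Lists each physical adapter's VMQ capability and whether it is enabled.
Get-NetAdapterVmq
# Shows the VMQ weight assigned to each VM's network adapter (0 means VMQ is not used for that VM).
Get-VMNetworkAdapter -VMName * | Select-Object VMName, VmqWeight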
A Windows Cluster employs a "quorum" disk which is essentially a log file used to record any changes made to the active node (so they can be pushed to the passive node if required). But I also read that the "quorum" can cast a vote to determine if the cluster remains running. A log file casting a vote? Can you please clarify this?
TIA,
edm2
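For reference, the witness (disk or file share) is a resource separate from the nodes and contributes its own vote; on Windows Server 2012 and later the quorum model and each node's vote can be inspected like this:

# Show the quorum configuration and witness resource.
Get-ClusterQuorum
# Show whether each node currently holds a vote (NodeWeight = 1 means it votes).
Get-ClusterNode | Select-Object Name, State, NodeWeight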
Been having some issues with nodes basically dropping out of the cluster configuration.
The error showing was:
"Cluster Shared Volume 'Volume1' ('Data') has entered a paused state because of '(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished."
All nodes (PowerEdge 420) are connected to a Dell MD3200 shared SAS storage array.
The nodes point to virtual 2012 R2 DCs.
Upon running validation with just two nodes, I get the same errors over and over again.
Bemused!
----------------
List Software Updates
Description: List software updates that have been applied on each node.
An error occurred while executing the test.
An error occurred while getting information about the software updates installed on the nodes.
One or more errors occurred.
Creating an instance of the COM component with CLSID {4142DD5D-3472-4370-8641-D
and
List Disks
Description: List all disks visible to one or more nodes. If a subset of disks is specified for validation, list only disks in the subset.
An error occurred while executing the test.
Storage cannot be validated at this time. Node 'zhyperv2.KISLNET.LOCAL' could not be initialized for validation testing. Possible causes for this are that another validation test is being run from another management client, or a previous validation test was
unexpectedly terminated. If a previous validation test was unexpectedly terminated, the best corrective action is to restart the node and try again.
Access is denied
-----------
The event viewer on one of the hosts shows
-------------
Cluster node 'zhyperv2' lost communication with cluster node 'zhyperv1'. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists
and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related
to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected
such as hubs, switches, or bridges.
The only other warning is because the 4 NIC ports in each node server are teamed on one IP address split over two switches. I am not concerned about this and could, if required, split them into pairs; I think this is a red herring.
I had an issue sometime back where I was running validations on my Server 2008 R2 (RTM) Hyper-V cluster with an EMC Fibre Channel SAN. While running validations, one of my CSV LUNs came back with a damaged MBR and all data was seemingly lost. A call into Microsoft got the MBR restored, and all was fine, but I was down for the majority of a day.
My question is simply: was this a fluke experience, or is there a proper procedure for validating a Hyper-V cluster? I am now running SP1 on all nodes, and I am preparing to join another node to my cluster. I would like to perform all validation tests before I proceed.
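One commonly used precaution when validating a cluster that is already carrying workloads is to scope the run so the disruptive storage tests are skipped; a sketch with hypothetical node names:

# Run all validation tests except the Storage category against the existing nodes plus the new one.
Test-Cluster -Node Node1, Node2, NewNode -Ignore "Storage"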
We have a 2-node MSSQL 2012 failover cluster.
After rebooting all cluster nodes, the cluster disks are in a "reserved" state.
I can bring a cluster disk online manually, but I want it to happen automatically.
Is it safe to switch the SAN policy from "Offline Shared" to "OnlineAll"?
And the same question for a Windows 2008 R2 Hyper-V cluster (with Cluster Shared Volumes): we want to change the SAN policy to "OnlineAll".
Best regards, Alexander
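For reference, the SAN policy can be checked and changed with PowerShell on 2012/2012 R2, or with diskpart on 2008 R2; a sketch:

# Windows Server 2012 / 2012 R2: view and change the SAN policy.
Get-StorageSetting | Select-Object NewDiskPolicy
Set-StorageSetting -NewDiskPolicy OnlineAll
# Windows Server 2008 R2: the equivalent change from diskpart.
#   DISKPART> san policy=OnlineAll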
SQL clustering
Trying to troubleshoot the sequence of events of an outage on a 2-node 2008 R2 MSCS-based cluster (we have an IP address and a SQL Server instance clustered). I will refer to the nodes as NODE05 and NODE06.
Both nodes are running on VMware ESXi 5.x with their database and quorum disks attached via VMware RDM to an IBM XIV via fiber channel. NODE06's RDM is set up to use fixed-path addressing while NODE05's RDM is (incorrectly) set to use round-robin multipath (working to correct this). Each node is running on a different blade center. There is a private heartbeat network.
At the beginning of this event, NODE05 is primary.
At 22:08:20, both nodes report that NODE05 was removed from active failover cluster membership. 23 seconds later, both nodes report that disks were 'unexpectedly lost' by the respective node. These errors continue on for more than two minutes
before things seem to come back online.
Our investigation shows that, at least from the ESX perspective, connectivity to the SAN LUNs was not lost. SAN monitoring also shows nothing "dropping". In addition, I'm not seeing anything in the OS/System event logs indicating the
storage was lost -- the disk errors show up only in the cluster logs. So I'm not thinking that some sort of SAN disruption was the trigger for this event, but want to ensure that theory fits with how a 2-node MSCS cluster functions.
I'm theorizing that the node remove event (possibly triggered by a network disruption) that occurred first may have triggered SCSI-3 based "fencing", which would have resulted in the disks appearing unavailable on both nodes even though the SAN was still up. However, my understanding is that the SCSI reservation requests subsequent to the SCSI reset that occurs in a "split" like this happen at staggered intervals (three seconds for the "primary" node and seven seconds for "challenger" nodes), which really should be resolved fairly quickly, not the 2+ minutes we saw.
Can someone confirm that I'm on the right track with my thinking? Or possibly describe how a typical failure scenario would play out if the heartbeat network was disrupted for a period of time?
I have a new 2 Node Hyper-V Cluster setup. Cluster Aware Updating has been setup as well.
For another reason I needed to reboot one of the nodes and performed a Pause > Drain Roles.
On this node there is 1 VM that cannot be migrated. This is due to the 2nd disk on that VM not being on shared storage (it is not currently possible to move it to shared storage; the disk is 6 TB).
Is there a way to properly exclude the VM and have it shut down/save instead? Or do I need to manually shut down the VM and then perform Drain Roles?
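One possible approach, assuming the hypothetical VM name below, is to save that one VM first and then drain the node; the remaining roles are still live-migrated as usual:

# Save the VM whose second disk sits on local storage (the name is a placeholder).
Save-VM -Name "VMWithLocalDisk"
# Then pause the node and drain the remaining roles before rebooting.
Suspend-ClusterNode -Name NodeA -Drain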
Hi everyone!
There is a 5-node SQL Server 2012 failover cluster based on Windows Server 2012 Datacenter and built on IBM BladeCenter HS23 type 7875. Cluster nodes are using SAN boot from an IBM Storwize V3700 and LUNs from an IBM Storwize V7000.
Periodically, on different nodes of the cluster, the following errors appear: Event ID 1073 "The Cluster service was halted to prevent an inconsistency within the failover cluster. The error code was '668'", Event ID 7031 "The Cluster Service service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service", and Event ID 7024 "The Cluster Service service terminated with the following service-specific error: An assertion failure has occurred." After these errors appear, the cluster node hangs in the "joining" state, and the same happens to any node that is rebooted or turned off; all operations I try to perform on the cluster (stopping the cluster service, pause, evict, etc.) fail. The cluster returns to a normal state only after all of its nodes are rebooted. Here is the piece of the cluster log from the time the error occurred:
00000b4c.00000c7c::2014/04/21-03:32:25.939 INFO [VSS] Backing up part of the system state [VSS] OnPrepareBackup: starting new session dfb4fbf0-db28-40d2-af3a-82e66a271267
00000b4c.00000c7c::2014/04/21-03:32:25.939 INFO [VSS] OnPrepareBackup returning - true
00000b4c.00001194::2014/04/21-03:32:26.704 INFO [GUM] Node 7: Processing RequestLock 4:4744
00000b4c.00001198::2014/04/21-03:32:26.704 INFO [GUM] Node 7: Processing GrantLock to 4 (sent by 3 gumid: 11271)
00000b4c.00000e2c::2014/04/21-03:32:26.704 ERR mscs::GumAgent::ExecuteQueuedUpdate: TransactionInProgress(5918)' because of 'Cannot restart an in-progress transaction'
00000b4c.00001194::2014/04/21-03:32:26.719 ERR Failed type check .?AUBoxedNodeSet@mscs@@
00000b4c.00001194::2014/04/21-03:32:26.719 ERR [CORE] mscs::ClusterCore::DeliverMessage: TypeMismatch(1629)' because of 'failed type check'
00000b4c.00000e2c::2014/04/21-03:32:26.750 INFO [VSS] HandleBackupGum - Initiating the backup
00000b4c.00000e2c::2014/04/21-03:32:26.750 INFO [VSS] HandleOnFreezeGum - Stopping the Death Timer
00000b4c.00000e2c::2014/04/21-03:32:26.750 INFO [VSS] HandleBackupGum - Completed the backup Request
00000b4c.00000e2c::2014/04/21-03:32:26.750 ERR [GUM] Node 7: sequenceNumber + 1 == payload->GumId (5129, 11272)
00000b4c.00000e2c::2014/04/21-03:32:26.750 ERR mscs::GumAgent::ExecuteQueuedUpdate: AssertionFailed(668)' because of 'failed assertion'(sequenceNumber + 1 == payload->GumId is false)
00000b4c.00000e2c::2014/04/21-03:32:26.750 ERR GumHandler failed (status = 668)
00000b4c.00000e2c::2014/04/21-03:32:26.750 ERR GumHandler failed (status = 668), executing OnStop
00000b4c.00000e2c::2014/04/21-03:32:26.750 INFO [DM]: Shutting down, so unloading the cluster database.
00000b4c.00000e2c::2014/04/21-03:32:26.750 INFO [DM] Shutting down, so unloading the cluster database (waitForLock: false).
00000b4c.00000e2c::2014/04/21-03:32:26.813 ERR FatalError is Calling Exit Process.
00000b4c.00000b50::2014/04/21-03:32:26.813 INFO [CS] About to exit process...
000015d0.000015d4::2014/04/21-03:32:26.828 WARN [RHS] Cluster service has terminated.
00001618.0000161c::2014/04/21-03:32:26.828 WARN [RHS] Cluster service has terminated.
00001588.0000158c::2014/04/21-03:32:26.828 WARN [RHS] Cluster service has terminated.
000015f4.000015f8::2014/04/21-03:32:26.828 WARN [RHS] Cluster service has terminated.
All of the recommended failover cluster updates and hotfixes are installed and the cluster is validated.