Channel: Kevin Holman's System Center Blog

Agents on Windows 2012 R2 Domain Controllers can stop responding or heart-beating


This is an issue I have been tracking for some time.  When you deploy SCOM 2012 agents on Windows Server 2012 R2 domain controllers, the agents can stop responding or stop sending heartbeats.  The agent services will still be running, but you will see events in the OpsMgr event log stop appearing, and you might see heartbeat failures as well.  I have personally seen this on both of my Windows Server 2012 R2 domain controllers, which also run DNS and DHCP, and have the AD, DNS, and DHCP management packs imported.

This is caused by an issue in the Server OS (Windows Server 2012 R2), which is outlined at http://support.microsoft.com/kb/2923126

A hotfix that addresses the issue is included in the Feb 2014 update rollup:  http://support.microsoft.com/kb/2919394

I recommend you consider deploying this update if you are deploying DCs on Windows Server 2012 R2.  This issue could potentially affect any server running the Windows Server 2012 R2 operating system; I have just experienced it only on DCs thus far.

If you use Windows Update – this hotfix is listed as “Optional”:

image


Create a script based monitor for the existence of a file, with recovery to copy file


This is going to be an example of making a two state monitor to check for the existence of a file on an agent managed server.

If the file does not exist, we can run a recovery to copy the file there from a network location.

 

The context of this example is a scenario where we expect the OOMADS.MSI file (Active Directory Helper Objects) to be placed in a specific directory on the agent:   C:\Program Files\Microsoft Monitoring Agent\Agent\HelperObjects

However, if the agent is manually installed, we do not copy this file; it is copied only if the agent is pushed from the SCOM console.  This might leave several Active Directory domain controllers without these necessary scripting objects deployed.  Marnix wrote about this here:  http://thoughtsonopsmgr.blogspot.com/2010/10/eventid-10-active-directory-helper.html

The AD management pack will automatically deploy this MSI if and when it is needed, but the MP expects the OOMADS.MSI to be in the specific directory above.  Hence, this example.

We will start with a simple monitor example, which will use the Microsoft.Windows.TimedScript.TwoStateMonitorType.  This monitor will run a VBScript, which we adapted from Pete Zerger’s script at http://www.systemcentercentral.com/opsmgr-creating-a-monitor-to-determine-if-a-file-exists-sample-script-and-tutorial/

Here is my customized version of Pete’s script:

'==========================================================================
'
' VBScript Source File -- Created with SAPIEN Technologies PrimalScript 2009
'
' NAME: DoesFileExist
'
' AUTHOR: Pete Zerger, MVP (Cloud and Datacenter Admin)
' DATE  : 3/12/2012
'
' COMMENT: Verifies a target file (including path) exists.
'          Intended for use with OpsMgr two state script monitor.
'
'==========================================================================
Option Explicit

Call Main

Sub Main()
    'Declare Variables
    'File-related variables
    Dim fso, folder, file, FilePath
    'OpsMgr related variables
    Dim oArgs, oAPI, oBag

    Set oArgs = WScript.Arguments

    ' Retrieve parameters
    folder = CStr(oArgs.Item(0))
    file = CStr(oArgs.Item(1))
    FilePath = folder & "\" & file

    WScript.Echo folder
    WScript.Echo file
    WScript.Echo FilePath

    ' Instantiate File System Object
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Instantiate MOM API
    Set oAPI = CreateObject("MOM.ScriptAPI")
    Set oBag = oAPI.CreatePropertyBag()

    ' Verify the path to the file exists
    If (fso.FolderExists(folder)) Then
        'Folder exists, submit property bag and continue
        Call oBag.AddValue("FolderExists","Yes")
        WScript.Echo "Folder exists"
    Else
        'Folder does not exist, submit property bag and exit
        Call oBag.AddValue("FolderExists","No")
        Call oBag.AddValue("FileExists","No")
        WScript.Echo "Folder doesn't exist"
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
        Exit Sub
    End If

    ' Verify the file exists
    If (fso.FileExists(FilePath)) Then
        'File exists, submit property bag and exit
        Call oBag.AddValue("FileExists","Yes")
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
    Else
        'File does not exist, submit property bag and exit
        Call oBag.AddValue("FileExists","No")
        WScript.Echo "File doesn't exist"
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
        Exit Sub
    End If
End Sub

I will pass two parameters to this script: the directory and the filename I am looking for.  This makes the monitor very reusable for other purposes:

image

 

I will create an expression for unhealthy, stating that if either the folder or file is missing, this is bad:

image

And the expression for healthy requires BOTH to exist:

image
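The two expressions can also be read as plain boolean logic over the property bag that the script returns. The following is a minimal Python sketch of that logic only, not of how the SCOM agent evaluates it; the function names and the dictionary standing in for the property bag are hypothetical.

```python
# Hypothetical stand-in for the property bag returned by the monitor script.
# Keys and values mirror the FolderExists / FileExists properties in this post.

def is_unhealthy(bag):
    """Unhealthy expression: the folder OR the file is reported missing."""
    return bag.get("FolderExists") == "No" or bag.get("FileExists") == "No"


def is_healthy(bag):
    """Healthy expression: the folder AND the file are both reported present."""
    return bag.get("FolderExists") == "Yes" and bag.get("FileExists") == "Yes"


if __name__ == "__main__":
    samples = [
        {"FolderExists": "Yes", "FileExists": "Yes"},
        {"FolderExists": "Yes", "FileExists": "No"},
        {"FolderExists": "No", "FileExists": "No"},
    ]
    for bag in samples:
        print(bag, "healthy:", is_healthy(bag), "unhealthy:", is_unhealthy(bag))
```

Because the script always writes both properties before returning, every result should satisfy exactly one of the two expressions.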

That part is quite simple – and will generate an alert with context:

image

Next, I will add a recovery task.  This recovery will do two things.  FIRST, it will include a condition detection, to ensure that for all the “unhealthy” monitors, we will only attempt a recovery action IF the required folder is present and only the file is missing.  Then, it will call another VBScript which will copy the file locally for us.

The condition detection is a simple expression:

image

I could not find any good examples on the web for how to pass a property from a monitor to a condition detection in a recovery.  There are lots of examples of passing the monitor property directly to a recovery script as a parameter, but none for a System.ExpressionFilter in a condition detection.  The hardest part of this was figuring out the syntax for referencing the property output by the monitor:

StateChange/DataItem/Context/DataItem/Property[@Name='FolderExists']
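To make the path easier to read, here is a rough Python sketch that walks the same path over a sample document. The sample XML is an assumption: its nesting is inferred only from the XPath above, and a real SCOM data item carries more elements and attributes. The function name `should_run_recovery` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data item. The nesting is inferred only from the XPath in this
# post; it is not a captured SCOM data item.
SAMPLE = """
<DataItem>
  <StateChange>
    <DataItem>
      <Context>
        <DataItem>
          <Property Name="FolderExists">Yes</Property>
          <Property Name="FileExists">No</Property>
        </DataItem>
      </Context>
    </DataItem>
  </StateChange>
</DataItem>
"""

XPATH = "StateChange/DataItem/Context/DataItem/Property[@Name='FolderExists']"


def should_run_recovery(data_item_xml):
    """Mimic the condition detection: continue only if FolderExists is 'Yes'."""
    root = ET.fromstring(data_item_xml)
    node = root.find(XPATH)  # path is evaluated relative to the outer DataItem
    return node is not None and node.text == "Yes"


if __name__ == "__main__":
    print(should_run_recovery(SAMPLE))
```

The point of the sketch is the shape of the path: the monitor's property bag is nested inside the state change's context, so the property is reached through two DataItem levels rather than directly.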

Here is my recovery script… which is VERY basic:

Dim oAPI, fso
Set oAPI = CreateObject("MOM.ScriptAPI")
Call oAPI.LogScriptEvent("CopyFile.vbs",6002,0,"Starting copyfile script")
Set fso = CreateObject("Scripting.FileSystemObject")
fso.CopyFile "\\scom01\AgentStuff\amd64\oomads.msi", "C:\Program Files\Microsoft Monitoring Agent\Agent\HelperObjects\"
WScript.Quit

Now that we have that, putting it all together in a management pack is quite simple.

Here is the entire XML….

<?xml version="1.0" encoding="utf-8"?>
<ManagementPack ContentReadable="true" SchemaVersion="2.0" OriginalSchemaVersion="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>Demo.FileExists.Monitor</ID>
      <Version>1.0.0.3</Version>
    </Identity>
    <Name>Demo.FileExists.Monitor</Name>
    <References>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>7.5.8501.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>7.5.8501.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>7.0.8433.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>7.0.8433.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Monitors>
      <UnitMonitor ID="Demo.FileExists.Monitor.FileExists" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.ConfigurationState" Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.TimedScript.TwoStateMonitorType" ConfirmDelivery="false">
        <Category>AvailabilityHealth</Category>
        <AlertSettings AlertMessage="Demo.FileExists.Monitor.FileExists_AlertMessageResourceID">
          <AlertOnState>Warning</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Information</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/Property[@Name='FolderExists']$</AlertParameter1>
            <AlertParameter2>$Data/Context/Property[@Name='FileExists']$</AlertParameter2>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="Success" MonitorTypeStateID="Success" HealthState="Success"/>
          <OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Warning"/>
        </OperationalStates>
        <Configuration>
          <IntervalSeconds>60</IntervalSeconds>
          <SyncTime />
          <ScriptName>fileexists.vbs</ScriptName>
          <Arguments>"C:\Program Files\Microsoft Monitoring Agent\Agent\HelperObjects" "OomADs.msi"</Arguments>
          <ScriptBody>
'==========================================================================
'
' VBScript Source File -- Created with SAPIEN Technologies PrimalScript 2009
'
' NAME: DoesFileExist
'
' AUTHOR: Pete Zerger, MVP (Cloud and Datacenter Admin)
' DATE  : 3/12/2012
'
' COMMENT: Verifies a target file (including path) exists.
'          Intended for use with OpsMgr two state script monitor.
'
'==========================================================================
Option Explicit

Call Main

Sub Main()
    'Declare Variables
    'File-related variables
    Dim fso, folder, file, FilePath
    'OpsMgr related variables
    Dim oArgs, oAPI, oBag

    Set oArgs = WScript.Arguments

    ' Retrieve parameters
    folder = CStr(oArgs.Item(0))
    file = CStr(oArgs.Item(1))
    FilePath = folder &amp; "\" &amp; file

    WScript.Echo folder
    WScript.Echo file
    WScript.Echo FilePath

    ' Instantiate File System Object
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Instantiate MOM API
    Set oAPI = CreateObject("MOM.ScriptAPI")
    Set oBag = oAPI.CreatePropertyBag()

    ' Verify the path to the file exists
    If (fso.FolderExists(folder)) Then
        'Folder exists, submit property bag and continue
        Call oBag.AddValue("FolderExists","Yes")
        WScript.Echo "Folder exists"
    Else
        'Folder does not exist, submit property bag and exit
        Call oBag.AddValue("FolderExists","No")
        Call oBag.AddValue("FileExists","No")
        WScript.Echo "Folder doesn't exist"
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
        Exit Sub
    End If

    ' Verify the file exists
    If (fso.FileExists(FilePath)) Then
        'File exists, submit property bag and exit
        Call oBag.AddValue("FileExists","Yes")
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
    Else
        'File does not exist, submit property bag and exit
        Call oBag.AddValue("FileExists","No")
        WScript.Echo "File doesn't exist"
        oAPI.AddItem(oBag)
        Call oAPI.ReturnItems
        Exit Sub
    End If
End Sub
          </ScriptBody>
          <TimeoutSeconds>30</TimeoutSeconds>
          <ErrorExpression>
            <Or>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='FolderExists']</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">No</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='FileExists']</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">No</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </Or>
          </ErrorExpression>
          <SuccessExpression>
            <And>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='FolderExists']</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">Yes</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">Property[@Name='FileExists']</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">Yes</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </And>
          </SuccessExpression>
        </Configuration>
      </UnitMonitor>
    </Monitors>
    <Recoveries>
      <Recovery ID="Demo.FileExists.Monitor.CopyFile" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" Monitor="Demo.FileExists.Monitor.FileExists" ResetMonitor="false" ExecuteOnState="Warning" Remotable="true" Timeout="300">
        <Category>Custom</Category>
        <ConditionDetection ID="CD" TypeID="System!System.ExpressionFilter">
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">StateChange/DataItem/Context/DataItem/Property[@Name='FolderExists']</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">Yes</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </ConditionDetection>
        <WriteAction ID="SWA" TypeID="Windows!Microsoft.Windows.ScriptWriteAction">
          <ScriptName>CopyFile.vbs</ScriptName>
          <Arguments />
          <ScriptBody>
Dim oAPI, fso
Set oAPI = CreateObject("MOM.ScriptAPI")
Call oAPI.LogScriptEvent("CopyFile.vbs",6002,0,"Starting copyfile script")
Set fso = CreateObject("Scripting.FileSystemObject")
fso.CopyFile "\\scom01\AgentStuff\amd64\oomads.msi", "C:\Program Files\Microsoft Monitoring Agent\Agent\HelperObjects\"
WScript.Quit
          </ScriptBody>
          <TimeoutSeconds>60</TimeoutSeconds>
        </WriteAction>
      </Recovery>
    </Recoveries>
  </Monitoring>
  <Presentation>
    <StringResources>
      <StringResource ID="Demo.FileExists.Monitor.FileExists_AlertMessageResourceID"/>
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Demo.FileExists.Monitor">
          <Name>Demo File Exists Monitor MP</Name>
        </DisplayString>
        <DisplayString ElementID="Demo.FileExists.Monitor.CopyFile">
          <Name>Copy File</Name>
        </DisplayString>
        <DisplayString ElementID="Demo.FileExists.Monitor.FileExists">
          <Name>Demo File Exists Monitor</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Demo.FileExists.Monitor.FileExists" SubElementID="Error">
          <Name>Error</Name>
        </DisplayString>
        <DisplayString ElementID="Demo.FileExists.Monitor.FileExists" SubElementID="Success">
          <Name>Success</Name>
        </DisplayString>
        <DisplayString ElementID="Demo.FileExists.Monitor.FileExists_AlertMessageResourceID">
          <Name>Demo File Exists Monitor</Name>
          <Description>The expected file or folder is missing. FolderExists: {0} FileExists: {1}</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>

We will detect all the agents that are missing this file:

image

Give context:

image

Run a recovery:

image

And finally, redetect that the file exists, set the monitor to healthy, and close any associated alerts:

image

 

This is a VERY simple example, and this specific example should be modified before real use, as it will copy the AMD64 version of OOMADS.MSI to all agents missing it, even if they are 32-bit.  If you wanted to actually use this, I’d recommend changing the target to your domain controller class; that still assumes all your DCs are 64-bit (they sure should be!)

Regardless, this should be a fairly useful example for monitoring for the existence of a file, and how to pass a property to a condition detection in a recovery.

I will attach the demo MP below.

OpsMgr 2012: Management Servers might not reconnect to SQL after a SQL outage


I wrote about this new feature we added to control this in SCOM 2007 R2 with the advent of CU4:  http://blogs.technet.com/b/kevinholman/archive/2011/02/07/a-new-feature-in-r2-cu4-reconnecting-to-sql-server-after-a-sql-outage.aspx

A scenario might occur where a SQL outage exists, and then the management servers will not automatically connect to SQL again once it comes back up on the network.  If you have experienced this you should consider applying this resolution.

However, those registry locations have changed in OM 2012.  This was done to sync the Data Access layer between Service Manager and SCOM.  There has been some confusion about whether these configuration settings are supported and/or work in SCOM 2012.

The new locations are documented at:  http://support.microsoft.com/kb/2913046/en-us

Start the Registry Editor.

Locate and then click the following registry subkey:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\DAL

Create the following two new DWORD values:

  • DALInitiateClearPool
    Type: DWORD
    Decimal value: 1
  • DALInitiateClearPoolSeconds
    Type: DWORD
    Decimal Value: 60

image

 

DALInitiateClearPool should be set to Decimal value “1” to enable it.

DALInitiateClearPoolSeconds should be set to Decimal value “60” to represent 60 second retry interval.  The DALInitiateClearPoolSeconds setting controls when the management server drops the current connection pool and when the management server tries to reestablish an SQL connection. We recommend that you set this setting to 60 seconds or more to avoid performance issues. 
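If you would rather script the change than click through the Registry Editor, the same two values can be written as a .reg file. Below is a minimal Python sketch that only builds the text of such a file from the key and values listed above; the function name `build_reg_file` is hypothetical, and the output should be reviewed before it is imported on a management server.

```python
# Registry key from the KB article referenced in this post.
DAL_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\DAL"


def build_reg_file(values):
    """Return .reg file text that sets each name in `values` as a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + DAL_KEY + "]"]
    for name, number in values.items():
        # .reg files store DWORDs as 8 hexadecimal digits, so decimal 60 becomes 0000003c.
        lines.append('"%s"=dword:%08x' % (name, number))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file({
        "DALInitiateClearPool": 1,          # 1 enables the behavior
        "DALInitiateClearPoolSeconds": 60,  # retry interval in seconds
    }))
```

Note the decimal-to-hex conversion: the post gives decimal values, while a .reg file expects hexadecimal.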

Restart the System Center Data Access Service, and the System Center Management/Microsoft Monitoring Agent service on the management server for these settings to take effect.

 

This solution applies to SCOM 2012, 2012 SP1, and 2012 R2, along with the same versions of Service Manager 2012.

Modifying access in SCOM user roles – without the console


In general, the *supported* method to add users and groups to user roles is using the console.  This article will demonstrate an alternative method that might be needed in cases where security got totally messed up, or a critical admin group got deleted.

The idea came from Michel Kamp’s article:  http://michelkamp.wordpress.com/2012/05/05/audit-scom-sdk-usage-operations/

The Authorization Manager (AzMan) store was moved from a file in SCOM 2007 to a SQL database store in SCOM 2012.  It was possible in SCOM 2007 to accidentally delete the domain group used for SCOM admins, and lock out access.  To read about how to recover from this scenario in SCOM 2007 see:  http://support.microsoft.com/kb/2640222

In SCOM 2012, you can load up Authorization Manager from SQL.  Here is how.

On your SCOM management server, open a MMC, and load the Authorization Manager snap in.

image

 

Once you load that, right click Authorization Manager in the left pane and choose “Open Authorization Store”

image

 

Choose Microsoft SQL and input the properly formatted connect string.  Here is an example:

mssql://Driver={SQL Server};Server={SERVERNAME\INSTANCE};/OperationsManager/AzmanStore

Replace SERVERNAME\INSTANCE with your SCOM SQL server name (and named instance if needed) and change “OperationsManager” to whatever your SCOM OpsDB is named.  Here is mine:

mssql://Driver={SQL Server};Server={DB01};/OperationsManager/AzmanStore
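The connect string is easy to mistype because of the braces and the extra slash before the database name. As an illustration only, here is a small Python helper that assembles it in the same format as the examples above; the function name `azman_store_url` is hypothetical.

```python
def azman_store_url(sql_server, database="OperationsManager"):
    """Build the AzMan SQL store string in the format shown in this post.

    sql_server is the SQL server name, optionally with a named instance
    (SERVERNAME\\INSTANCE); database is the name of the SCOM operational DB.
    """
    return "mssql://Driver={SQL Server};Server={%s};/%s/AzmanStore" % (sql_server, database)


if __name__ == "__main__":
    print(azman_store_url("DB01"))
    print(azman_store_url(r"SERVERNAME\INSTANCE", "OperationsManager"))
```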

When this opens up, you can see a list of GUIDs.  Each represents a built-in user role or custom scoped user role.  Expand 597f9d98-356f-4186-8712-4f020f2d98b4 and look at the Role Assignments:

image

 

We can see that this GUID belongs to the Operations Manager Administrators role.

Right click the top level GUID 597f9d98-356f-4186-8712-4f020f2d98b4 in the left hand side, and choose Properties:

image

On the security tab – you can add new groups here, or even individual users.

image

 

The above should only be used in a recovery scenario; otherwise, use the console to directly administer membership of user roles.

SQL MP 6.4.1.0 – SQL 2012 DB Engine group does not contain all SQL servers


Ian wrote about this issue back in September:  http://ianblythmanagement.wordpress.com/2013/09/12/sql-server-2012-db-engine-group-problem/

Essentially, SCOM discovers the SQL version from a registry key that SQL places.  The problem is that the version written to that key differs from what is considered the typical SQL build version.

SQL build versions are visible here:  http://sqlserverbuilds.blogspot.com/

When you apply SQL 2012 SP1 to SQL 2012, this updates the registry from 11.0.xxxx.x to 11.1.xxx.x as seen below in SCOM Discovered inventory:

image

 

The issue is that the group “SQL Server 2012 DB Engine Group” is hard coded to 11.0.* as seen below:

image

 

I wrote a quick MP that contains a new group population discovery set to “11.*” along with an override to disable the built in group.  The XML is visible below:

 

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>Microsoft.SQLServer.2012.Discovery.Addendum</ID>
      <Version>6.4.1.0</Version>
    </Identity>
    <Name>Microsoft.SQLServer.2012.Discovery.Addendum</Name>
    <References>
      <Reference Alias="SQL2012Disc">
        <ID>Microsoft.SQLServer.2012.Discovery</ID>
        <Version>6.4.1.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SQL">
        <ID>Microsoft.SQLServer.Library</ID>
        <Version>6.4.1.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Discoveries>
      <Discovery ID="Microsoft.SQLServer.2012.Discovery.Addendum.PopulateSQL2012EngineGroup" Enabled="true" Target="SQL2012Disc!Microsoft.SQLServer.2012.InstanceGroup" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryClass TypeID="SQL2012Disc!Microsoft.SQLServer.2012.InstanceGroup"/>
        </DiscoveryTypes>
        <DataSource ID="DS" TypeID="SC!Microsoft.SystemCenter.GroupPopulator">
          <RuleId>$MPElement$</RuleId>
          <GroupInstanceId>$Target/Id$</GroupInstanceId>
          <MembershipRules>
            <MembershipRule>
              <MonitoringClass>$MPElement[Name="SQL2012Disc!Microsoft.SQLServer.2012.DBEngine"]$</MonitoringClass>
              <RelationshipClass>$MPElement[Name="SQL2012Disc!Microsoft.SQLServer.2012.InstanceGroupContainsDBEngine"]$</RelationshipClass>
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <Property>$MPElement[Name="SQL!Microsoft.SQLServer.DBEngine"]/Version$</Property>
                  </ValueExpression>
                  <Operator>MatchesWildcard</Operator>
                  <Pattern>11.*</Pattern>
                </RegExExpression>
              </Expression>
            </MembershipRule>
          </MembershipRules>
        </DataSource>
      </Discovery>
    </Discoveries>
    <Overrides>
      <DiscoveryPropertyOverride ID="Microsoft.SQLServer.2012.Discovery.Addendum.DisableSQL2012GP" Context="SQL2012Disc!Microsoft.SQLServer.2012.InstanceGroup" Enforced="false" Discovery="SQL2012Disc!Microsoft.SQLServer.2012.PopulateSQLServersInstanceGroup" Property="Enabled">
        <Value>false</Value>
      </DiscoveryPropertyOverride>
    </Overrides>
  </Monitoring>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Microsoft.SQLServer.2012.Discovery.Addendum">
          <Name>SQL Server 2012 (Discovery) Addendum</Name>
          <Description>This management pack addresses a specific issue in the SQL MP 6.4.1.0 where the SQL 2012 DB engine instance group does not populate correctly due to a version mismatch. This MP ONLY applies to that specific version.</Description>
        </DisplayString>
        <DisplayString ElementID="Microsoft.SQLServer.2012.Discovery.Addendum.DisableSQL2012GP">
          <Name>Disable default SQL 2012 Group Population </Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SQLServer.2012.Discovery.Addendum.PopulateSQL2012EngineGroup">
          <Name>Populate Microsoft SQL Server 2012 Instance Group (fixed)</Name>
          <Description>This discovery fixes an issue in the group discover in SQL MP version 6.4.1.0</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>
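The effect of widening the pattern can be checked with a few lines of Python. This sketch assumes the MatchesWildcard operator behaves like a simple glob, which is an approximation and not a statement about SCOM's implementation; the version strings are illustrative values only, and `in_group` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def in_group(version, pattern):
    """Approximate a wildcard membership test with a shell-style glob."""
    return fnmatchcase(version, pattern)


if __name__ == "__main__":
    # Illustrative version strings: one 11.0.x value and one 11.1.x value.
    for version in ("11.0.2100.60", "11.1.3000.0"):
        print(version,
              "| 11.0.* ->", in_group(version, "11.0.*"),
              "| 11.* ->", in_group(version, "11.*"))
```

Under that assumption, an 11.1.x version fails the original "11.0.*" pattern but is picked up by "11.*", which is the whole change in the addendum MP.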

I will also attach the MP as a zip file to this post below.

Now my groups populate as expected:

 

image

Windows 8 Client OS MP doesn’t discover Windows 8.1


This article is based on version 6.0.7024.0 of the Windows 8 Client MP available here:   http://www.microsoft.com/en-us/download/details.aspx?id=38434

Microsoft released a management pack to discover and monitor the Windows 8 client OS in SCOM.  This is useful for mission critical desktops, kiosks, ATMs, etc.

However, the management pack was not updated for Windows 8.1, and does not discover or monitor Windows 8.1.

 

I am attaching a simple management pack which contains two new discoveries which support Windows 8.1 and Windows 8 RTM.  Additionally, I included two overrides to disable the original MP discoveries. 

The key modification is a simple expression to allow detection of Windows 8 (version 6.2) and Windows 8.1 (version 6.3) as seen in the expression below:

<Expression>
  <Or>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">Values/WindowsCurrentVersion</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">6.2</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">Values/WindowsCurrentVersion</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">6.3</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </Or>
</Expression>
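Read as plain logic, the expression is a string comparison against two accepted values. Here is a minimal Python sketch of that test; the function name is hypothetical, and it covers only this one expression, not any other filters the discovery may apply.

```python
# Version strings accepted by the expression in this post.
ACCEPTED_VERSIONS = {"6.2": "Windows 8", "6.3": "Windows 8.1"}


def matches_version_filter(windows_current_version):
    """True when the version string equals 6.2 or 6.3.

    The comparison is on strings, as in the XML (Type="String").
    """
    return windows_current_version in ACCEPTED_VERSIONS


if __name__ == "__main__":
    for value in ("6.1", "6.2", "6.3"):
        print(value, matches_version_filter(value))
```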

 

I am attaching the MP below.

Creating Groups of Health Service Watcher Objects based on other Groups


It has been a well-known requirement for most customers to be able to create groups of Windows Computers that also contain the corresponding Health Service Watcher objects.  This was needed for alert notification subscriptions, so that different teams could receive alert notifications filtered by groups, but also include alerts from the Watcher, such as Heartbeat failure and Computer Unreachable.  There are several articles on this but I will reference a very popular one, on Tim’s site:

http://www.scom2k7.com/dynamic-computer-groups-that-send-heartbeat-alerts/

Essentially, we needed to add an extra membership rule, to the XML, that would also add any Health Service Watcher objects that have a relationship to the Windows Computer objects already in the group.  We did this with the following XML:

<MembershipRule>
  <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
  <RelationshipClass>$MPElement[Name="MicrosoftSystemCenterInstanceGroupLibrary!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
  <Expression>
    <Contains>
      <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
      <Expression>
        <Contained>
          <MonitoringClass>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</MonitoringClass>
          <Expression>
            <Contained>
              <MonitoringClass>$Target/Id$</MonitoringClass>
            </Contained>
          </Expression>
        </Contained>
      </Expression>
    </Contains>
  </Expression>
</MembershipRule>

However, what if we ONLY want a group of Health Service Watcher objects, and NOT the Windows Computers, BUT we wish to base the HSW membership on another group of Windows Computers?  This is useful if we want to create availability reports for a group of Windows Computers, but need to base the report on the availability of a specific up/down monitor, and not anything related to Windows Computer objects.

Here is a code example of exactly that:

In this sample, we will create a simple group of Windows Computers whose names start with “DB”.  Then we will create another group containing only HSW objects, corresponding to the SQL computers group.

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>grouptest</ID>
      <Version>1.0.0.8</Version>
    </Identity>
    <Name>grouptest</Name>
    <References>
      <Reference Alias="MSCIGL">
        <ID>Microsoft.SystemCenter.InstanceGroup.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <TypeDefinitions>
    <EntityTypes>
      <ClassTypes>
        <ClassType ID="grouptest.compgroup" Accessibility="Internal" Abstract="false" Base="SC!Microsoft.SystemCenter.ComputerGroup" Hosted="false" Singleton="true"/>
        <ClassType ID="grouptest.SQLWatchers" Accessibility="Internal" Abstract="false" Base="MSCIGL!Microsoft.SystemCenter.InstanceGroup" Hosted="false" Singleton="true"/>
      </ClassTypes>
    </EntityTypes>
  </TypeDefinitions>
  <Monitoring>
    <Discoveries>
      <Discovery ID="grouptest.DiscoverSQLServersComputerGroup" Enabled="true" Target="grouptest.compgroup" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryRelationship TypeID="SC!Microsoft.SystemCenter.ComputerGroupContainsComputer"/>
        </DiscoveryTypes>
        <DataSource ID="GP" TypeID="SC!Microsoft.SystemCenter.GroupPopulator">
          <RuleId>$MPElement$</RuleId>
          <GroupInstanceId>$MPElement[Name="grouptest.compgroup"]$</GroupInstanceId>
          <MembershipRules>
            <MembershipRule>
              <MonitoringClass>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</MonitoringClass>
              <RelationshipClass>$MPElement[Name="SC!Microsoft.SystemCenter.ComputerGroupContainsComputer"]$</RelationshipClass>
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <Property>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Property>
                  </ValueExpression>
                  <Operator>MatchesWildcard</Operator>
                  <Pattern>DB*</Pattern>
                </RegExExpression>
              </Expression>
            </MembershipRule>
          </MembershipRules>
        </DataSource>
      </Discovery>
      <Discovery ID="grouptest.DiscoverSQLWatchers" Enabled="true" Target="grouptest.SQLWatchers" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryRelationship TypeID="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"/>
        </DiscoveryTypes>
        <DataSource ID="GP" TypeID="SC!Microsoft.SystemCenter.GroupPopulator">
          <RuleId>$MPElement$</RuleId>
          <GroupInstanceId>$MPElement[Name="grouptest.SQLWatchers"]$</GroupInstanceId>
          <MembershipRules>
            <MembershipRule>
              <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
              <RelationshipClass>$MPElement[Name="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
              <Expression>
                <Contains>
                  <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
                  <Expression>
                    <Contained>
                      <MonitoringClass>$MPElement[Name="grouptest.compgroup"]$</MonitoringClass>
                    </Contained>
                  </Expression>
                </Contains>
              </Expression>
            </MembershipRule>
          </MembershipRules>
        </DataSource>
      </Discovery>
    </Discoveries>
  </Monitoring>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="grouptest">
          <Name>Group Test</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="grouptest.compgroup">
          <Name>SQL Servers Computer Group</Name>
        </DisplayString>
        <DisplayString ElementID="grouptest.DiscoverSQLServersComputerGroup">
          <Name>Discovery for SQL Servers Computer Group</Name>
        </DisplayString>
        <DisplayString ElementID="grouptest.DiscoverSQLWatchers">
          <Name>Discovery for SQL Health Service Watchers Group</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="grouptest.SQLWatchers">
          <Name>SQL Health Service Watchers Group</Name>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>

 

The key to this is the specific reference of the other group – shown here:

<MembershipRules>
  <MembershipRule>
    <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthServiceWatcher"]$</MonitoringClass>
    <RelationshipClass>$MPElement[Name="MSCIGL!Microsoft.SystemCenter.InstanceGroupContainsEntities"]$</RelationshipClass>
    <Expression>
      <Contains>
        <MonitoringClass>$MPElement[Name="SC!Microsoft.SystemCenter.HealthService"]$</MonitoringClass>
        <Expression>
          <Contained>
            <MonitoringClass>$MPElement[Name="grouptest.compgroup"]$</MonitoringClass>
          </Contained>
        </Expression>
      </Contains>
    </Expression>
  </MembershipRule>
</MembershipRules>
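One way to read this rule: pick every watcher that contains a health service which is itself contained, through its computer, in the referenced group. The Python sketch below models that reading over a made-up inventory. It is a simplification for illustration, not how the group populator works internally, and every object name in it is hypothetical.

```python
# Hypothetical inventory; all names are made up for illustration.
GROUP_COMPUTERS = {"DB01", "DB02"}        # members of the computer group

HEALTH_SERVICE_ON = {                     # health service -> computer it runs on
    "HealthService/DB01": "DB01",
    "HealthService/DB02": "DB02",
    "HealthService/WEB01": "WEB01",
}

WATCHER_FOR = {                           # watcher -> health service it contains
    "Watcher/DB01": "HealthService/DB01",
    "Watcher/DB02": "HealthService/DB02",
    "Watcher/WEB01": "HealthService/WEB01",
}


def watchers_for_group(group_computers, health_service_on, watcher_for):
    """Return watchers whose health service sits on a computer in the group."""
    # "Contained": health services that live under a computer in the group.
    contained = {hs for hs, computer in health_service_on.items()
                 if computer in group_computers}
    # "Contains": watchers that point at one of those health services.
    return sorted(w for w, hs in watcher_for.items() if hs in contained)


if __name__ == "__main__":
    print(watchers_for_group(GROUP_COMPUTERS, HEALTH_SERVICE_ON, WATCHER_FOR))
```

The resulting group holds only watcher objects, while its membership still follows whatever the computer group contains.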

Introducing Thing 1 and Thing 2


 

image

 

My blog has been silent for a bit lately.  This is because of the birth of my first children.  SCOM world - Meet Logan and Lexi Holman.  I’ll be taking some time off work to spend with them during the next few weeks as well.

Logan has already been discussing a management pack to monitor his diaper status; Lexi is gathering the requirements and having the necessary customer meetings.  We aren't in full agreement yet on what constitutes warning versus critical, so I'll keep you up to date on the status.  :-)


UR2 for SCOM 2012 R2 – Step by Step


 

Sorry I am a bit behind in publishing this post.  :-)

image

 

KB Article:   http://support.microsoft.com/kb/2929891

Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=2929891

 

Key fixes:

Issue 1 - This update rollup makes the stored procedure performance aggregate more robust against out-of-range values.
Issue 2 - Adding multiple regular expressions (RegEx) to a group definition causes an SQL exception when the group is added or run.
Issue 3 - Web applications fail when they are monitored by the System Center Operations Manager 2012 R2 APM agent.
Issue 4 - Service Level Objectives (SLO) dashboards sometimes load in several seconds and sometimes take minutes to load. Additionally, the dashboard is empty after it loads in some cases.
Issue 5 - Operations Manager Console crashes when you try to override the scope in the Authoring pane.
Issue 6 - The System Center Operations Manager console is slow to load views if you are a member of a custom Operator role.
Issue 7 - This update rollup includes a fix for the dashboard issue that was introduced in Update Rollup 1.
Issue 8 - SQL Time Out Exceptions for State data (31552 events) occur when you create Data Warehouse workflows.
Issue 9 - This update rollup includes a fix for the Event Data source.

Xplat updates:
Issue 1 - All IBM WebSphere application servers that run on Linux or AIX computers are not automatically discovered by the Management Pack for Java Enterprise Edition (JEE) if multiple application servers are defined in a single WebSphere profile.

 

Let's get started.

From reading the KB article – the order of operations is:

 

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
  2. Apply SQL scripts (see installation information).
  3. Manually import the management packs.
  4. Update Agents

Now, we need to add another step:  if we are using Xplat monitoring, we also need to update the Linux/Unix MP’s and agents.

       5.  Update Unix/Linux MP’s and Agents.

 

1.  Management Servers

   image

Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with whichever server holds the RMS Emulator (RMSe) role.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

image

 

Then extract the contents:

image

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server with an account that is a Local Administrator and a SCOM Admin, AND that account must also hold the System Administrator (SA) role on the database instances that host your OpsMgr databases.

My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

image
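As a rough illustration of the order I apply the three role updates on this server (server first, then web console, then console), here is a small Python sketch that sorts a set of MSP files into that order and builds a command line for each.  The file names are hypothetical placeholders rather than a guaranteed match for what the catalog download contains, and wrapping each file in `msiexec.exe /update` is my assumption for the sketch; simply running each MSP from the elevated prompt is what I did above.

```python
# Order in which the role updates are applied on a multi-role server.
ROLE_ORDER = {"server": 0, "webconsole": 1, "console": 2}


def role_of(filename):
    """Guess the role from the end of the file name (hypothetical naming)."""
    stem = filename.lower().rsplit(".", 1)[0]
    # Check "webconsole" before "console" so it is not misclassified.
    for role in ("webconsole", "console", "server"):
        if stem.endswith(role):
            return role
    raise ValueError("cannot tell which role this update is for: " + filename)


def build_update_commands(msp_files):
    """Return one command line per MSP, ordered server -> web console -> console."""
    ordered = sorted(msp_files, key=lambda name: ROLE_ORDER[role_of(name)])
    return ['msiexec.exe /update "{}"'.format(name) for name in ordered]


if __name__ == "__main__":
    # Placeholder names for illustration only.
    files = [
        "KB2929891-AMD64-ENU-Console.msp",
        "KB2929891-AMD64-Server.msp",
        "KB2929891-AMD64-WebConsole.msp",
    ]
    for command in build_update_commands(files):
        print(command)
```

The commands still need to be run from an elevated prompt, one server at a time.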

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update does not provide any feedback that it had success or failure.  You can check the application log for the MsiInstaller events for that:

Log Name:      Application
Source:        MsiInstaller
Date:          6/2/2014 1:58:33 PM
Event ID:      1035
Task Category: None
Level:         Information
Keywords:      Classic
User:          OPSMGR\kevinhol
Computer:      SCOM01.opsmgr.net
Description:
Windows Installer reconfigured the product. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.1015. Product Language: 1033. Manufacturer: Microsoft Corporation. Reconfiguration success or error status: 0.
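If you want to check many servers, the description text of that event is easy to parse.  Below is a minimal Python sketch (the function name `parse_reconfig_event` is just something I made up) that pulls the product name, version, and status code out of a description like the one above.  It assumes you have already exported the event text by whatever means you prefer, and it treats a status of 0 as success, which is the standard Windows Installer success code.

```python
import re

# Description text copied from the event above.
EVENT_TEXT = (
    "Windows Installer reconfigured the product. "
    "Product Name: System Center Operations Manager 2012 Server. "
    "Product Version: 7.1.10226.1015. Product Language: 1033. "
    "Manufacturer: Microsoft Corporation. "
    "Reconfiguration success or error status: 0."
)


def parse_reconfig_event(description):
    """Extract product, version and status from an MsiInstaller 1035 description."""
    product = re.search(r"Product Name: (.+?)\. Product Version:", description)
    version = re.search(r"Product Version: (\d+(?:\.\d+)*)", description)
    status = re.search(r"success or error status: (\d+)", description)
    if not (product and version and status):
        raise ValueError("not a recognizable reconfiguration event")
    code = int(status.group(1))
    return {
        "product": product.group(1),
        "version": version.group(1),
        "status": code,
        "succeeded": code == 0,
    }


if __name__ == "__main__":
    print(parse_reconfig_event(EVENT_TEXT))
```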

 

You can also spot check a couple DLL files for the file version attribute. 

image

 

Next up – run the Web Console update:

image

This runs much faster.   A quick file spot check:

image

Lastly – install the console update (make sure your console is closed):

image

A quick file spot check:

image

 

Secondary Management Servers:

image

I now move on to my secondary management servers, applying the server update, then the console update. 

On this next management server, I will use Windows Update.  I check online, and make sure that I have configured Windows Update to give me updates for additional products:

image29

This shows me two applicable updates for this server:

image

I apply these updates (along with some additional Windows Server Updates I was missing), and reboot each management server, until all management servers are updated.

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

Then I will spot check the DLL’s:

image

That said – there is a long running bug in the gateway update.  The gateway update is NOT placing a very important file here – for agents.

BUG:  In the \Program Files\System Center Operations Manager\Gateway\AgentManagement\ directories – we should be dropping an agent update MSP file for updating agents behind gateways, for x86 and amd64 agents.  However, the GW update does not include this.  If you want to push-deploy agents behind gateways, and need them to be fully up to date, you should copy the correct files from your updated management servers directories.
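The copy itself is simple enough to script.  Here is a minimal Python sketch, assuming the management server keeps its agent patch files in per-architecture subfolders named amd64 and x86 under its own AgentManagement directory (that layout, and the function name `copy_agent_patches`, are assumptions on my part).  Both directory paths are passed in, so nothing here is tied to a specific install path.

```python
import shutil
from pathlib import Path

ARCHITECTURES = ("amd64", "x86")


def copy_agent_patches(ms_agent_mgmt, gw_agent_mgmt):
    """Copy *.msp files per architecture from a management server's
    AgentManagement folder to a gateway's AgentManagement folder.

    Returns a list of (architecture, file name) pairs that were copied.
    """
    copied = []
    for arch in ARCHITECTURES:
        src_dir = Path(ms_agent_mgmt) / arch
        dst_dir = Path(gw_agent_mgmt) / arch
        if not src_dir.is_dir():
            continue  # nothing to copy for this architecture
        dst_dir.mkdir(parents=True, exist_ok=True)
        for msp in sorted(src_dir.glob("*.msp")):
            shutil.copy2(msp, dst_dir / msp.name)
            copied.append((arch, msp.name))
    return copied
```

You would call it with the AgentManagement path on an updated management server (for example over an admin share) as the first argument and the gateway's AgentManagement path as the second.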

 

 

2. Apply the SQL Script

 

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

image

First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

image44

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image47  

or

 image

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a large environment, you might have to run this several times, or even potentially shut down the services on your management servers, to break their connection to the databases, to get a successful run.
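If you prefer to automate the "keep trying until it is clean" part, the logic is just a bounded retry loop.  The Python sketch below shows the idea only: `run_script` is a hypothetical callable you would supply (for example, something that shells out to your SQL tooling and raises on any error), and `SqlScriptError`, the attempt count, and the wait time are all assumptions, not values from the KB.

```python
import time


class SqlScriptError(Exception):
    """Hypothetical error raised by the caller-supplied runner on any script error."""


def run_until_clean(run_script, max_attempts=5, wait_seconds=30, sleep=time.sleep):
    """Call run_script() until it finishes without raising SqlScriptError.

    Returns the attempt number that succeeded; re-raises the last error
    if every attempt fails, so you know to STOP and investigate.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            run_script()
            return attempt
        except SqlScriptError:
            if attempt == max_attempts:
                raise
            # Give the management servers a moment before trying again.
            sleep(wait_seconds)
```

If the loop gives up, that is the point to consider stopping the management server services, as described above, before the next run.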

Technical tidbit:  If you previously ran this script while applying SCOM 2012 R2 UR1, note that the script is unchanged in UR2, so it does not strictly have to be executed again during the UR2 deployment.  There is no harm in running it again, and that is the best choice if you are not 100% sure it ran successfully during the UR1 deployment.  However, if you have a large environment and it is difficult to get the script to execute successfully, you might skip this step, but only if you already applied UR1 and you are 100% sure the script ran successfully then.

 

image

Next, we have a new script in UR2 to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

%SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

If you see a warning about line endings, choose Yes to continue.

  image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image

 

3. Manually import the management packs?

image

We have five updated MP’s to import  (MAYBE!).

image

The TFS MP bundles are only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

The Advisor MP’s are only needed if you are using System Center Advisor services.

However, the Image and Visualization libraries deal with Dashboard updates, and these need to be updated.

I import all of these without issue.

 

 

4.  Update Agents

image

There is a known issue in UR2 for agents – read carefully below:

Agents should be placed into pending actions by this update (mine worked great):

image

If your agents are not placed into pending management, this is generally caused by not running the update from an elevated command prompt, or by having manually installed agents, which will not be placed into pending.

You can approve these – which will result in a success message:

image

HOWEVER – this didn’t actually do any update.  You can see from the system event logs that MOMAgentinstaller did run, but when we check the DLL versions, we can see they are not updated.

What you need to do is REJECT any pending updates in the SCOM console – then run a REPAIR on your agents to get them to apply the update.  Alternatively – use a software distribution tool like Configuration Manager to apply agent updates where applicable.  Any agents that are manually installed (Remotely Manageable = No) will not be available for a repair, as always. 

You can track running repairs in Pending Management:

image

 

Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:

image

 

 

 

5.  Update Unix/Linux MPs and Agents

image

Next up – I download and extract the updated Linux MP’s for SCOM 2012 R2 UR2

http://www.microsoft.com/en-us/download/details.aspx?id=29696

7.5.1021.0 is current at this time for SCOM 2012 R2 UR2. 

****Note – take GREAT care when downloading – that you select the correct download for R2.  You must scroll down in the list and select the MSI for 2012 R2:

image50

 

Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

Update any MP’s you are already using.

image

You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

Next up – you would upgrade the agents on your monitored Unix/Linux computers.  You can now do this straight from the console:

image

image

You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

 

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on the workstations of all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the UR2 update.

 

 

Review:

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

image

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop the services on the management servers, or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

------------------------------------------------------
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
--------------------------------------------------------

2.  Gateway Servers don’t get agent patch update files.  See body of this blog article for more details.

3.  Agents don’t go into pending, or go into pending but the agent update doesn’t actually work.  This is a known issue and will be addressed in UR3.  For this release, simply use a “repair” to repair the agents that need the update, or use a software distribution mechanism to deploy the update.

WMI Leaks Memory on Windows Server 2012 R2 Domain Controller / DNS server roles – Hotfix available


 

There was an issue when you monitored DNS server roles on Windows Server 2012 R2 servers.  The DNS PowerShell WMI provider would leak memory each time it was called.  When you monitor DNS, and leverage this WMI provider, you would see an aggressive memory leak occur in ONE of the WmiPrvSE.exe processes on the server. 

This leak would continue until the WMI process reached around 500 to 600 MB of private bytes, at which point the process would eventually become unresponsive and crash:

Log Name:      Application
Source:        Application Error
Date:          6/2/2014 4:15:39 PM
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DC01.opsmgr.net
Description:
Faulting application name: wmiprvse.exe, version: 6.3.9600.16384, time stamp: 0x5215f9c9
Faulting module name: DnsServerPsProvider.dll, version: 6.3.9600.16384, time stamp: 0x5215e759
Exception code: 0xc0000005
Fault offset: 0x00000000000ef9d1
Faulting process id: 0x16b4
Faulting application start time: 0x01cf7c789301e26b
Faulting application path: C:\Windows\system32\wbem\wmiprvse.exe
Faulting module path: C:\Windows\System32\wbem\DnsServerPsProvider.dll
Report Id: 0b622ace-ea9b-11e3-80ce-00155d0ad51b
Faulting package full name:
Faulting package-relative application ID:

During this time just before the crash, SCOM management packs querying WMI might generate alerts, such as:

Script Based Test Failed to Complete. 

The error returned was: 'Object required' (0x1A8)

Failed to convert to UTC time.
The error returned was: 'No more threads can be created in the system.' (0x800700A4)

Operations Manager failed to run a WMI query

HRESULT: 0x800700a4
Details: No more threads can be created in the system.

Windows DNS - WMI Validation Failed

Testing the WMI namespace root\MicrosoftDNS has failed twice in a row.

HRESULT: 0x8004101d
Details: Unexpected error

If you monitor the WMI process private bytes memory utilization, you will see the leak quite clearly:

image
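If you collect the private bytes counter for the WmiPrvSE.exe processes, a very simple check is enough to flag this pattern.  The Python sketch below is only an illustration: the function names, the 50 MB growth threshold, and the sample values are my assumptions, and the 500 MB "crash zone" is taken from the rough range I observed above rather than any documented limit.

```python
def looks_like_leak(samples_mb, min_growth_mb=50):
    """True if private bytes never drop across the samples and grow overall.

    samples_mb: private bytes readings in MB, oldest first.
    """
    if len(samples_mb) < 2:
        return False
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples_mb, samples_mb[1:]))
    growth = samples_mb[-1] - samples_mb[0]
    return never_drops and growth >= min_growth_mb


def headroom_mb(samples_mb, crash_zone_mb=500):
    """MB left before the process reaches the range where I saw it crash."""
    return crash_zone_mb - samples_mb[-1]


if __name__ == "__main__":
    readings = [120, 180, 260, 340, 450]  # made-up sample data
    print(looks_like_leak(readings), headroom_mb(readings))
```

A healthy provider host process tends to rise and fall, so a series that only ever climbs is the thing to watch for.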

 

There is now a hotfix to address this issue!

I recommend applying this hotfix as soon as possible to any DNS server or Domain Controller running the DNS server role.

 

The hotfix/KB article for this specific issue is located at:

http://support.microsoft.com/kb/2954185

 

You can apply the hotfix in one of two very specific ways:

Option 1:  Apply the May 2014 Windows Server Hotfix Rollup for WS2012R2 (2955164) which includes this fix: 

http://support.microsoft.com/kb/2955164

Option 2:  Apply the April 2014 Windows Server Hotfix Rollup for WS2012R2 (2919355) *and* then the specific hotfix for the issue (2954185)

http://support.microsoft.com/kb/2919355

http://support.microsoft.com/kb/2954185

 

And remember – I also recommend the following hotfix in addition – to resolve a problem with the agents failing on Windows Server 2012 R2 Domain Controllers:  http://blogs.technet.com/b/kevinholman/archive/2014/03/03/agents-on-windows-2012-r2-domain-controllers-can-stop-responding-or-heart-beating.aspx

 

I have added both of these to my recommended SCOM Hotfix list:

http://blogs.technet.com/b/kevinholman/archive/2009/01/27/which-hotfixes-should-i-apply.aspx

Tweaking SCOM 2012 Management Servers for large environments


 

There are many articles on tweaking certain registry settings for SCOM agents, Gateways, and Management servers, for many reasons:  large deployments, custom 3rd party MP’s, and monitoring Exchange 2010, to name a few.  Matt Goedtel has a good list on his blog:  http://blogs.technet.com/b/mgoedtel/archive/2010/08/24/performance-optimizations-for-operations-manager-2007-r2.aspx

 

The default settings in SCOM 2012 work for MOST environments, out of the box.  It is fairly rare to have to change these settings, and changes should only be made with an understanding of each setting and why you’d be adjusting it.

 

Below – I’d like to post some settings that I change on Management Servers, when monitoring very large environments.  What does “very large” mean?  Well, I’d characterize that as a management group with a very large agent count (>5000), or a very large instance space (lots of Management Packs deployed, both MS and 3rd party, and custom MP’s which don’t always behave well).  Perhaps you have a very large number of groups, or groups with complex expressions.  It could be that you are monitoring a large number of “agentless” items, such as Linux servers, Network Devices, or URLs.

I stress – these settings are NOT designed to be changed for all SCOM deployments.  These will not make your SCOM deployment “run better” or “faster”.  These are simply commonly required changes for large scale deployments under specific scenarios.

 

All management servers that host a large number of agentless objects, which results in the MS running a large number of workflows (network/URL/Linux/3rd party/VEEAM):  This is an ESE DB setting which controls how often ESE writes to disk.  A larger value will decrease disk IO caused by the SCOM healthservice but increase ESE recovery time in the case of a healthservice crash. 
Key: 
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value: 
    Persistence Checkpoint Depth Maximum = 104857600
SCOM 2012 default existing registry value = 20971520

All management servers in a large management group:  This sets the maximum size of healthservice internal state queue.  It should be equal or larger than the number of monitor based workflows running in a healthservice.  Too small of a value, or too many workflows will cause state change loss.  http://blogs.msdn.com/b/rslaten/archive/2008/08/27/event-5206.aspx
Key: 
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
REG_DWORD Decimal Value: 
    State Queue Items = 20480
SCOM 2012 default existing registry value: not present.  Value must be created.  Default code value = 10240

All management servers, that participate in any resource pools, that run a large number of workflows:
Key:
    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
REG_DWORD Decimal Value: 
    PoolLeaseRequestPeriodSeconds = 600
    PoolNetworkLatencySeconds = 120
SCOM 2012 existing registry value:  not present (must create PoolManager key and both values)  Default code value =  120/30 seconds

All management servers that participate in the All Management Servers resource pool, that have a large agent count or large number of groups:  This setting will slow down how often group calculation runs to find changes in group memberships.  Group calculation can be very expensive, especially with a large number of groups, large agent count, or complex group membership expressions.  Slowing this down will help keep groupcalc from consuming all the healthservice and database I/O.
Key: 
    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\
REG_DWORD Decimal Value: 
    GroupCalcPollingIntervalMilliseconds = 900000
SCOM 2012 existing registry value:  not present (must create value).  Default code value = 30000 (30 seconds)

All management servers in a management group, this helps with dataset maintenance as the default timeout of 10 minutes is often too short.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance.  This is a very common issue.   http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx
Key:
    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
REG_DWORD Decimal Value:
    Command Timeout Seconds = 1200
SCOM 2012 existing registry value: not present (must create "Data Warehouse" key and value)  Default in code value = 300

All management servers in ANY management group:  This setting configures the SDK service to attempt a reconnection to SQL server on a regular basis after a disconnection.  Without these settings, an extended SQL outage can leave a management server unable to reconnect when SQL comes back online.  Per http://support.microsoft.com/kb/2913046/en-us, all management servers in a management group should get the following:
Key:
    HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\
REG_DWORD Decimal Value:
    DALInitiateClearPool = 1
    DALInitiateClearPoolSeconds = 60
SCOM 2012 existing registry value:   not present - code default - 30 seconds?

To summarize:

Registry Key | Reg DWORD Value Name | Reg DWORD Decimal Value
HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\ | Persistence Checkpoint Depth Maximum | 104857600
HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\ | State Queue Items | 20480
HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\ | PoolLeaseRequestPeriodSeconds | 600
HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\ | PoolNetworkLatencySeconds | 120
HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\ | GroupCalcPollingIntervalMilliseconds | 900000
HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\ | Command Timeout Seconds | 1200
HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\ | DALInitiateClearPool | 1
HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\ | DALInitiateClearPoolSeconds | 60

 

 

Below are some simple reg add statements you can run to make setting these easy:

reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager" /v "PoolLeaseRequestPeriodSeconds" /t REG_DWORD /d 600 /f
reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager" /v "PoolNetworkLatencySeconds" /t REG_DWORD /d 120 /f
reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "State Queue Items" /t REG_DWORD /d 20480 /f
reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "Persistence Checkpoint Depth Maximum" /t REG_DWORD /d 104857600 /f
reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0" /v "GroupCalcPollingIntervalMilliseconds" /t REG_DWORD /d 900000 /f
reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Command Timeout Seconds" /t REG_DWORD /d 1200 /f
reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f
reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f
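If you would rather keep these settings in one place and generate the commands (for example, to apply only a subset on a given management server), a small table-driven script does the job.  The Python sketch below rebuilds the same reg add lines shown above from a list of (key, value name, decimal value) entries; the names `SETTINGS` and `reg_add_command` are my own, and the script only prints commands, it does not touch the registry.

```python
# (registry key, DWORD value name, decimal value), as listed in the summary.
SETTINGS = [
    (r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager",
     "PoolLeaseRequestPeriodSeconds", 600),
    (r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager",
     "PoolNetworkLatencySeconds", 120),
    (r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters",
     "State Queue Items", 20480),
    (r"HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters",
     "Persistence Checkpoint Depth Maximum", 104857600),
    (r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0",
     "GroupCalcPollingIntervalMilliseconds", 900000),
    (r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse",
     "Command Timeout Seconds", 1200),
    (r"HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL",
     "DALInitiateClearPool", 1),
    (r"HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL",
     "DALInitiateClearPoolSeconds", 60),
]


def reg_add_command(key, name, value):
    """Build one 'reg add' line for a REG_DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("{} does not fit in a DWORD: {}".format(name, value))
    return 'reg add "{}" /v "{}" /t REG_DWORD /d {} /f'.format(key, name, value)


if __name__ == "__main__":
    for key, name, value in SETTINGS:
        print(reg_add_command(key, name, value))
```

As stressed earlier, pick only the settings that match your scenario rather than applying the whole list everywhere.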

The case of the Dell (Detailed) MP – beware of large environments


 

This article is not just a warning about the Dell (Detailed) MP, but the danger of importing ANY management pack into your environment without fully understanding the intended scope, scalability, and any known/common issues.

I recently worked with a customer who had an interesting issue.  They had a very large agent based monitoring environment (greater than 10,000 agents).  While performing a supportability review, we noticed that Config generation was failing.  This was evidenced by the Config monitors showing red on the console, alerts generated, events logged in the Management Server SCOM event logs, and most notably by the fact that agents were not getting updated config in a timely fashion.

Events were similar to:

Log Name:      Operations Manager
Source:        OpsMgr Management Configuration
Event ID:      29181
Computer:      managementserver.domain.com
Description:
OpsMgr Management Configuration Service failed to execute 'SnapshotSynchronization' engine work item due to the following exception

Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Data access operation failed
   at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperation.ExecuteSynchronously(Int32 timeoutSeconds, WaitHandle stopWaitHandle)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.ExecuteOperationSynchronously(IDataAccessConnectedOperation operation, String operationName)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.EndSnapshot(String deltaWatermark)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.EndSnapshot(String deltaWatermark)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.SnapshotSynchronizationWorkItem.ExecuteSharedWorkItem()
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.SharedWorkItem.ExecuteWorkItem()
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.ConfigServiceEngineWorkItem.Execute()
-----------------------------------
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.SqlCommand.InternalEndExecuteReader(IAsyncResult asyncResult, String endMethod)
   at System.Data.SqlClient.SqlCommand.EndExecuteReaderInternal(IAsyncResult asyncResult)
   at System.Data.SqlClient.SqlCommand.EndExecuteReader(IAsyncResult asyncResult)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.ReaderSqlCommandOperation.SqlCommandCompleted(IAsyncResult asyncResult)
ClientConnectionId:724196c1-d9ec-4f29-8807-b16cab05fcc6

 

Our initial issue was due to the fact that the management servers were running Windows 2012 RTM, with .NET 4.5.  There is an issue here and we needed to install .NET 4.5.1 to resolve these timeouts.  This got us past the initial Snapshot Config failures.

Next – we saw that Delta Config started failing:

Log Name:      Operations Manager
Source:        OpsMgr Management Configuration
Event ID:      29181
Computer:      managementserver.domain.com
Description:
OpsMgr Management Configuration Service failed to execute 'DeltaSynchronization' engine work item due to the following exception

Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessException: Data access operation failed
   at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperation.ExecuteSynchronously(Int32 timeoutSeconds, WaitHandle stopWaitHandle)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.CmdbOperations.CmdbDataProvider.GetConfigurationDelta(String watermark)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.TracingConfigurationDataProvider.GetConfigurationDelta(String watermark)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.DeltaSynchronizationWorkItem.TransferData(String watermark)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.DeltaSynchronizationWorkItem.ExecuteSharedWorkItem()
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.SharedWorkItem.ExecuteWorkItem()
   at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.ConfigServiceEngineWorkItem.Execute()
-----------------------------------
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
   at System.Data.SqlClient.SqlDataReader.Read()
   at Microsoft.EnterpriseManagement.ManagementConfiguration.CmdbOperations.EntityChangeDeltaReadOperation.ReadManagedEntitiesProperties(SqlDataReader reader)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.CmdbOperations.EntityChangeDeltaReadOperation.ReadData(SqlDataReader reader)
   at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.ReaderSqlCommandOperation.SqlCommandCompleted(IAsyncResult asyncResult)
ClientConnectionId:9d9ec759-e9bf-4c1e-a958-581377c630b3

We run a snapshot config every 24 hours by default.  We run a delta config every 30 seconds by default.  These are controlled via the ConfigService.config file located in the \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\ directory.  Delta config timing out was odd.  There can be many reasons for this, so the next step was to take a SQL trace and see what expensive queries were running.

If you want to see these in more clarity – the Config service logs these jobs to the CS.WorkItem table:

SELECT * FROM cs.workitem
ORDER BY WorkItemRowId DESC

You can filter these by Delta Sync or the daily Snapshot sync as well:

SELECT * FROM cs.workitem
WHERE WorkItemName like '%delta%'
ORDER BY WorkItemRowId DESC

SELECT * FROM cs.workitem
WHERE WorkItemName like '%snap%'
ORDER BY WorkItemRowId DESC

WorkItemStateId is the value of success or fail for the job.  It is normal to see some failures, for instance when multiple management servers try and execute the same job, some of those will fail, by design.

1    Running
10    Failed
12    Abandoned
15    Timed out
20    Succeeded
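Once you have the rows from those queries, it helps to roll them up by job and state.  Here is a minimal Python sketch that applies the state IDs listed above to (WorkItemName, WorkItemStateId) pairs and counts them.  The sample rows are made up for illustration, and how you get the rows out of SQL is left to you.

```python
from collections import Counter

# State IDs as listed above.
WORK_ITEM_STATES = {
    1: "Running",
    10: "Failed",
    12: "Abandoned",
    15: "Timed out",
    20: "Succeeded",
}


def summarize_work_items(rows):
    """Count work items per name and state.

    rows: iterable of (WorkItemName, WorkItemStateId) pairs.
    Returns {name: Counter({state label: count})}.
    """
    summary = {}
    for name, state_id in rows:
        label = WORK_ITEM_STATES.get(state_id, "Unknown ({})".format(state_id))
        summary.setdefault(name, Counter())[label] += 1
    return summary


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        ("DeltaSynchronization", 20),
        ("DeltaSynchronization", 15),
        ("DeltaSynchronization", 15),
        ("SnapshotSynchronization", 10),
    ]
    for name, counts in summarize_work_items(sample).items():
        print(name, dict(counts))
```

A job that shows mostly "Timed out" rather than the occasional "Failed" is the pattern we were chasing here.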

What we found was that one of the MP’s, the Dell Hardware MP, was consuming a large amount of SQL server CPU time just to query some standard Managed Type views in the database, with many of these queries lasting over 10 minutes.

When we researched further, we found that the “Dell Windows Server (Detailed Edition)” management pack had been imported, and in the documentation there was no mention of scalability limitations.  However, we found in a much older (4.x) version of the documentation, Dell specifically states that they recommend the Detailed MP only for small environments, when the monitored server count is less than 300 agents!!!!  We had already discovered and were monitoring over 5000 Dell servers.

This massive discovery data influx was also causing Config Churn – and binding showing up as 2115 errors for discovery data:

Log Name:      Operations Manager
Source:        HealthService
Event ID:      2115
Computer:      managementserver.domain.com
Description:
A Bind Data Source in Management Group Production has posted items to the workflow, but has not received a response in 1510 seconds.  This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.CollectDiscoveryData
Instance    : managementserver.domain.com
Instance Id : {B3FA7F2F-3D4A-236D-D3FD-119B3E01C3E3}

So, just delete the MP, right?

Well, let’s talk about what must happen when we delete an MP.  When you right click an MP in the console to delete it, we must first delete any discovered instances of any classes defined in that MP.  (Such as an instance of “Dell Server BIOS”.)  In order to delete an instance of a class, we must first also delete ALL monitoring data associated with that instance.  And I don’t mean just simply mark it as “deleted” in the database.  It must actually be deleted transactionally from the tables.  This means all alerts, all monitor based state changes, all events, all performance data, etc.  This can be MASSIVE overhead.

What we actually experienced was the console locking up.  We could track the SQL statements trying to delete the management pack and all the instance data, however this would eventually time out and never return anything to the console.  The operation would just go away, and all the while our MP still existed.

So what can we do?

Well, we do have a possible solution…. in the Remove-SCOMDisabledClassInstance PowerShell cmdlet.  This cmdlet allows us to delete the discovered instance data methodically, and slowly.  It deletes any discovered instances in the management group where that instance’s discovery is explicitly disabled via override.

So – we find all the discoveries in the Dell Detailed MP, and we create a new Override MP, to store a disable override for each discovery in.  Then, we run Remove-SCOMDisabledClassInstance.  This will run and run and run…. seemingly forever, until it returns with no errors.  In many cases, even this cmdlet will time out or crash with an exception, which can be normal when deleting a massive amount of data.

One trick to help with this process – is to set your state, performance, and event retention in the OpsDB to ONE day, then run grooming.  This will greatly reduce the amount of data we must delete transactionally.

Then – just keep running Remove-SCOMDisabledClassInstance.  In this specific case, because the amount of data was so large, it actually took over a day and probably over 100 executions, before the instances were all removed.  You can track the instances being removed, by creating a query that counts the records in the Managed Type tables you are deleting from.  Here is part of the one I crafted for this MP:

select sum(TCount) As TotalCount
from
(
select count (*) as Tcount
from MT_Dell$WindowsServer$Server
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$BIOS
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$Detailed$MemoryUnit
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$Detailed$ProcUnit
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$Detailed$PSUnit
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$EnclosurePhysicalDisk
union all
select count (*) as Tcount
from MT_Dell$WindowsServer$ControllerConnector
) as T
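If the list of Managed Type tables is long, the counting query can be generated instead of typed by hand. Here is a small Python sketch that builds the same UNION ALL shape as the query above. The function name is hypothetical, and the name check is only an assumption about which characters are safe to paste into the query text.

```python
import re

# Assumption: Managed Type table names use only letters, digits, "_" and "$".
SAFE_TABLE_NAME = re.compile(r"^[A-Za-z0-9_$]+$")


def build_count_query(tables):
    """Build one query that sums the row counts of the given tables."""
    if not tables:
        raise ValueError("at least one table name is required")
    parts = []
    for table in tables:
        if not SAFE_TABLE_NAME.match(table):
            raise ValueError(f"unexpected characters in table name: {table!r}")
        parts.append(f"select count (*) as Tcount\nfrom {table}")
    body = "\nunion all\n".join(parts)
    return f"select sum(TCount) As TotalCount\nfrom\n(\n{body}\n) as T"


print(build_count_query([
    "MT_Dell$WindowsServer$Server",
    "MT_Dell$WindowsServer$BIOS",
]))
```

The printed text can be pasted into a query window and re-run between cmdlet executions to watch the total drop.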

As you run the Remove-SCOMDisabledClassInstance command, you will see these instance counts slowly eroding.  You just have to keep running it until it completes without a timeout or an exception.
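The "keep running it until it completes" routine is a plain retry loop. The Python sketch below shows only that control flow. The `run_once` callable is a hypothetical stand-in for whatever launches Remove-SCOMDisabledClassInstance in your environment, and the fake used in the demo simply fails three times before succeeding.

```python
import time


def run_until_clean(run_once, max_attempts=200, pause_seconds=0.0):
    """Call run_once() until it returns without raising.

    Returns the number of attempts that were needed. Raises RuntimeError
    if every attempt up to max_attempts fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            run_once()
            return attempt
        except Exception as exc:  # timeouts and crashes are expected here
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(pause_seconds)
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Hypothetical stand-in for the real cmdlet call: fails 3 times, then succeeds.
remaining_failures = [3]


def fake_removal():
    if remaining_failures[0] > 0:
        remaining_failures[0] -= 1
        raise TimeoutError("simulated timeout")


print("succeeded on attempt", run_until_clean(fake_removal))
```

Capping the attempts is a design choice for the sketch; in the case described above it took well over 100 executions, so a low cap would give up too early.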

Once the instance count gets to zero…. you can delete the MP.  We found this time the MP deleted in seconds!

Now that this MP was gone, the expensive query was over… and we saw the binding on Discovery Data go back to a more reasonable occurrence count and time value.

 

The lesson to learn here is – be careful when importing MP’s.  A badly written MP, or an MP designed for small environments, might wreak havoc in larger ones.  Sometimes the recovery from this can be long and quite painful.   An MP that tests out fine in your Dev SCOM environment might have issues that won’t be seen until it moves into production.  You should always monitor for changes to a production SCOM deployment after a new MP is brought in, to ensure that you don’t see a negative impact.  Check the management server event logs, MS CPU performance, database size, and disk/CPU performance to see if there is a big change from your established baselines.

If you are designing a large agent deployment that nears our maximum scalability (currently 15,000 agents) great consideration must go into the management packs in scope.  If you require management packs that discover a large instance space per agent, and/or have a large number of workflows, you might find that you cannot achieve the maximum scale.

Monitor for file size with SCOM – Using script and WMI examples


 

SCOM has many different ways to monitor for a file size.  Here are some simple examples using script and WMI monitor types.

In this specific example – this will be a monitor to look for Windows Server Registry Bloat.  The monitor will inspect the registry hives for the registry file size, and alarm when the size is over a set threshold.

In the console, under Authoring, create a new Unit Monitor.  Choose a Timed Script Two State Monitor and choose an appropriate management pack.

image

 

Provide a display name for the monitor, and choose “Windows Server Operating System” as that is the BEST generic targeting class.  I will place the monitor under “Availability” as that is most applicable for what I am trying to impact:  If the registry file grows too large, the availability of the server might become impacted.

image

Set a schedule that makes sense for your monitor.  Remember script based monitors consume the most resources, especially depending on the complexity of the script, so don’t try and run it too frequently.

image

Next, give your script a name that it will be compiled in XML as, and paste in the body of your script.  Here is my script below.  It accepts two parameters:  the full path to the file we wish to monitor, and the size threshold.

Option Explicit
Dim oAPI, oBag, objFSO, objFile, varSize, oArgs, filepath, threshold

Set oArgs = Wscript.Arguments
filepath = oArgs(0)
threshold = int(oArgs(1))

Set oAPI = CreateObject("MOM.ScriptAPI")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.GetFile(filepath)
varSize = objFile.Size

If varSize > threshold Then
  Set oBag = oAPI.CreatePropertyBag()
  Call oBag.AddValue("Status","Bad")
  Call oBag.AddValue("Size", varSize)
  Call oBag.AddValue("Threshold", threshold)
  Call oAPI.Return(oBag)
  Call oAPI.LogScriptEvent("regfilesize.vbs", 160, 0, "The registry file size of HKLM\SOFTWARE is greater than the threshold of "& threshold &" bytes. The current size is: "& varSize &" bytes")
Else
  Set oBag = oAPI.CreatePropertyBag()
  Call oBag.AddValue("Status","Ok")
  Call oBag.AddValue("Size", varSize)
  Call oBag.AddValue("Threshold", threshold)
  Call oAPI.Return(oBag)
  Call oAPI.LogScriptEvent("regfilesize.vbs", 160, 0, "The registry file size of HKLM\SOFTWARE is less than the threshold of "& threshold &" bytes. The current size is: "& varSize &" bytes")
End If
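To sanity-check the threshold logic away from an agent, the same decision can be mirrored in a few lines of Python. This is only a sketch of the comparison the script makes: MOM.ScriptAPI is not involved, the function name is hypothetical, and the returned dictionary merely imitates the three property bag values (Status, Size, Threshold).

```python
import os
import tempfile


def check_file_size(filepath, threshold):
    """Return a dict shaped like the property bag the VBScript builds."""
    size = os.path.getsize(filepath)
    # Same rule as the script: strictly greater than the threshold is "Bad".
    status = "Bad" if size > threshold else "Ok"
    return {"Status": status, "Size": size, "Threshold": threshold}


# Demo against a throwaway 2048-byte file instead of a real registry hive.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\0" * 2048)
    demo_path = handle.name

print(check_file_size(demo_path, 1024))  # file is larger than the threshold
print(check_file_size(demo_path, 4096))  # file is smaller than the threshold
os.remove(demo_path)
```

Note that a file exactly at the threshold is reported as "Ok", which matches the `>` comparison in the script.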

Then select the “parameters” button, and provide the params:

image

 

Next – we must provide the “Unhealthy” expression.  We are returning a PropertyBag from the script as “Status” which will either be “Bad” or “Ok”.  The parameter name here is in the format:  Property[@Name='Status']

image

Repeat for Healthy expression:

image

Configure the health status you are looking to drive:

image

 

And alerting.  Note:  to make the value of the alert higher, you can include data from the propertybags returned in the script, into the alert context.  See the examples below for Size and Threshold, along with the computer name:

image

 

Here is the finished result of the alert:

image

 

And Health Explorer output is also very useful:

image

 

If you need to tune the monitor for specific systems – the script arguments are automatically exposed in Overrides:

image

 

Additional reading and examples on using script based monitors:

http://technet.microsoft.com/en-us/library/ff629453.aspx

http://blogs.technet.com/b/kevinholman/archive/2014/03/06/create-a-script-based-monitor-for-the-existence-of-a-file-with-recovery-to-copy-file.aspx

http://blogs.technet.com/b/kevinholman/archive/2014/02/11/opsmgr-simple-example-script-based-monitor-with-script-based-recovery.aspx

http://blogs.technet.com/b/kevinholman/archive/2009/07/22/101-using-custom-scripts-to-write-events-to-the-opsmgr-event-log-with-momscriptapi-logscriptevent.aspx

http://blogs.technet.com/b/kevinholman/archive/2011/03/02/how-to-collect-performance-data-from-a-script-example-network-adapter-utilization.aspx

http://contoso.se/blog/?p=1367

 

You can make this even more sexy, by creating a composite datasource for the script.  Then create a Monitortype to call the datasource, and then create Monitors to pass the necessary data.  Then you can also create a script based performance collection rule to use the same datasource.

 

 

Ok, that’s pretty cool.  But – what about another way? 

 

SCOM also has a built in WMI based monitor, which will accept WMI queries that you can map as performance type data with thresholds.  I previously wrote examples of this (see the additional reading at the end of this post).

Let’s create another new Unit Monitor, WMI Performance Counters, Static Threshold, Simple Threshold:

image

Give it a name, choose Windows Server Operating System as that is the preferred generic target of choice, and choose Availability.

image

 

We will connect to root\cimv2.  The query we will use is:

select filesize from cim_datafile where name='c:\\windows\\system32\\config\\software'
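The doubled backslashes in the query are WQL string escaping: each backslash in the real path has to be written twice. If you need the same query for other files, a tiny helper avoids typos. This Python sketch uses a hypothetical function name and, as a simplifying assumption, refuses paths containing a single quote rather than trying to escape them.

```python
def build_filesize_query(path):
    """Build the cim_datafile query for a Windows path, doubling backslashes."""
    if "'" in path:
        raise ValueError("paths containing a single quote are not handled here")
    escaped = path.replace("\\", "\\\\")
    return f"select filesize from cim_datafile where name='{escaped}'"


print(build_filesize_query(r"c:\windows\system32\config\software"))
```

The printed query is the same text used in the monitor above.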

image

 

The Performance Mapper screen might be the most confusing.  We simply just need to make up the data as to how we’d like to see it inserted in SCOM. 

image

I used “FileSize” for the counter, since that is what I am querying from WMI.  Then I need to make sure that Value matches the counter name I used, and in the format of:  $Data/Property[@Name='QueryObject']$

Next I set my threshold value:

image

Configure health according to what you desire:

image

Configure alerting:

image

The subsequent alert:

image

And Health Explorer:

image

 

Now, we can also create a rule – to collect this value, and have a report for which servers have the biggest registry:

Create a new rule, collection, performance based, WMI:

image

Provide a name and target:

image

Provide the same query, and set a frequency that you need for reporting on changes.

image

 

Fill out the performance mapper just as we did above:

image

 

Now – create a performance view to examine the data:

image

image

 

image

 

And even a cool dashboard to show off all of it:

image

 

For additional reading on using WMI counters in SCOM:

http://blogs.technet.com/b/kevinholman/archive/2008/07/02/collecting-and-monitoring-information-from-wmi-as-performance-data.aspx

http://blogs.msdn.com/b/steverac/archive/2009/08/30/monitoring-file-size-with-custom-wmi-performance-counter.aspx

Operations Manager 2012 R2 now supports SQL 2012 SP2


 

I didn’t see any announcements on this – but several customers have been asking. 

From the SQL Requirements for System Center 2012 R2, which looks like it was updated on July 9th:

http://technet.microsoft.com/library/dn281933.aspx

 

System Center 2012 R2 component | SQL Server 2008 R2 SP1 Standard, Datacenter | SQL Server 2008 R2 SP2 Standard, Datacenter | SQL Server 2012 Enterprise, Standard (64-bit) | SQL Server 2012 SP1 Enterprise, Standard (64-bit) | SQL Server 2012 SP2
App Controller Server  
Data Protection Manager (DPM) Database Server 
Operations Manager Data Warehouse
Operations Manager Operational Database
Operations Manager Reporting Server
Orchestrator Management Server 
Service Manager Database or Data Warehouse Database 
Service Provider Foundation     
Virtual Machine Manager Database Server 
 

 

UR3 for SCOM 2012 R2 – Step by Step


 

 

KB Article for OpsMgr:  http://support.microsoft.com/kb/2965445

KB Article for all System Center components:  http://support.microsoft.com/kb/2965090

Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=2965445

 

Key fixes:

 

  • Reliability fix:  A deadlock condition occurs when a database is connected after an outage. You may experience this issue when one or more HealthServices services in the environment are listed as Unavailable after a database goes offline and then comes back online.  Management servers cannot reconnect to SQL after a SQL outage because of thread exhaustion.
  • The Desktop console crashes after exception TargetInvocationException occurs when the TilesContainer is updated. You may experience this issue after you leave the console open on a Dashboard view for a long time.
  • The Password expiration monitor is fixed for logged events. To make troubleshooting easier, this fix adds more detail to Event IDs 7019 and 7020 when they occur.
  • The Health service bounces because of high memory usage in the instance MonitoringHost: leak MOMModules!CMOMClusterResource::InitializeInstance. This issue may be seen as high memory usage if you examine monitoringhost.exe in Performance Monitor. Or, the Health service may restart every couple of days, depending on the load on the server.
  • The Health service crashes in Windows HTTP Services (WinHTTP) if the RunAs account is not read correctly.
  • Windows PowerShell stops working with System.Management.Automation.PSSnapInReader.ReadEnginePSSnapIns. You may see this issue as Event ID 22400 together with a description of "Failed to run the Powershell script."
  • The PropertyValue column in the contextual details widget is unreadable in smaller widget sizes because the PropertyName column uses too much space.
  • The update threshold for monitor "Health Service Handle Count Threshold" is reset to 30,000. You can see this issue in the environment, and the Health Service Handle Count Threshold monitor is listed in the critical state.
  • An acknowledgement (ACK) is delayed by write collisions in MS queue when lots of data is sent from 1,000 agents.
  • The execution of the Export-SCOMEffectiveMonitoringConfiguration cmdlet fails with the error "Subquery returned more than 1 value."
  • The MOMScriptAPI.ReturnItems method can be slow because a process race condition may occur when many items are returned, and the method may take two seconds between items. Scripts may run slowly in the System Center Operations Manager environment.
  • When you are in the console and click Authoring, click Management Pack, click Objects, and then click Attributes to perform a Find operation, the Find operation seems unexpectedly slow. Additionally, the Momcache.mdb file grows very large.
  • A delta synchronization times out on SQL operations with Event ID 29181.
  • Operations Manager grooms out the alert history before an alert is closed.
  • The time-zone settings are not added to a subscription when non-English display languages are set. Additionally, time stamps on alert notifications are inaccurate for the time zone.
  • Web Browser widget requires the protocol (http or https) to be included in the URL.
  • You cannot access MonitoringHost's TemporaryStoragePath within the PowerShell Module.
  • The TopNEntitiesByPerfGet stored procedure may cause an Operations Manager dashboard performance issue. This issue may occur when a dashboard is run together with multiple widgets. Additionally, you may receive the following error message after a time-out occurs:

[Error] :DataProviderCommandMethod.Invoke{dataprovidercommandmethod_cs370}( 000000000371AA78 )
An unknown exception was caught during invocation and will be re-wrapped in a DataAccessException. System.TimeoutException: The operation has timed out.  at Microsoft.EnterpriseManagement.Monitoring.DataProviders.RetryCommandExecutionStrategy.Invoke(IDataProviderCommandMethodInvoker invoker) at Microsoft.EnterpriseManagement.Presentation.DataAccess.DataProviderCommandMethod.Invoke(CoreDataGateway gateWay, DataCommand command)

 
Xplat updates:
  • Slow results are returned when you run the Get-SCXAgent cmdlet or view UNIX/Linux computers in the administration pane for lots of managed UNIX/Linux computers.
    Note To apply this hotfix, you must have version 7.5.1025.0 or later of the UNIX/Linux Process Monitoring, UNIX/Linux Log File Monitoring, and UNIX/Linux Shell Command Template management pack bundles.
  • Accessing the UNIX/Linux computers view in the administration pane can sometimes trigger the following exception message:

    Microsoft.SystemCenter.CrossPlatform.ClientLibrary.Common.SDKAbstraction.ManagedObjectNotFoundException

 

Let’s get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Update Agents

Now, we need to add another step – if we are using Xplat monitoring, we need to update the Linux/Unix MP’s and agents.

       5.  Update Unix/Linux MP’s and Agents.

 

 

1.  Management Servers

image

Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with whichever server holds the RMSe role.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

image

Then extract the contents:

image

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator (SA) role to the database instances that host your OpsMgr databases.

My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

image

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update does not provide any feedback that it had success or failure.  You can check the application log for the MsiInstaller events for that:

Log Name:      Application
Source:        MsiInstaller
Date:          8/6/2014 3:00:46 PM
Event ID:      1022
Task Category: None
Level:         Information
Keywords:      Classic
User:          OPSMGR\kevinhol
Computer:      SCOM01.opsmgr.net
Description:
Product: System Center Operations Manager 2012 Server - Update 'System Center 2012 R2 Operations Manager UR3 Update Patch' installed successfully.

You can also spot check a couple DLL files for the file version attribute. 

image

Next up – run the Web Console update:

image

This runs much faster.   A quick file spot check:

image

Lastly – install the console update (make sure your console is closed):

image

A quick file spot check:

image

 

 

Secondary Management Servers:

image

I now move on to my secondary management servers, applying the server update, then the console update. 

On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products:

image29

This shows me two applicable updates for this server:

image

I apply these updates (along with some additional Windows Server Updates I was missing), and reboot each management server, until all management servers are updated.

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

Then I will spot check the DLL’s:

image

I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

image

 

 

2. Apply the SQL Scripts

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

image

First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image47

or

image

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a large environment, you might have to run this several times, or even potentially shut down the services on your management servers, to break their connection to the databases, to get a successful run.

Technical tidbit:  This script has been updated in UR3.  Even if you previously ran this script in UR1 or UR2, you must run this again.

 

image

Next, we have a script in UR3 to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

%SystemDrive%\Program Files\System Center 2012\Operations Manager\Server\SQL Script for Update Rollups

Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

If you see a warning about line endings, choose Yes to continue.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image

 

 

3. Manually import the management packs?

image

We have 6 updated MP’s to import  (MAYBE!).

image

The TFS MP bundles are only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

The Advisor MP’s are only needed if you are using System Center Advisor services.

However, the Image and Visualization libraries deal with Dashboard updates, and these need to be updated.

I import all of these without issue.

image

 

 

4.  Update Agents

image

Agents should be placed into pending actions by this update (mine worked great) for any agent that was not manually installed (remotely manageable = yes):

 image

If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or by having manually installed agents, which will not be placed into pending.

You can approve these – which will result in a success message once complete:

image

 

Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:

image

 

 

5.  Update Unix/Linux MPs and Agents

image

Next up – I download and extract the updated Linux MP’s for SCOM 2012 R2 UR3

http://www.microsoft.com/en-us/download/details.aspx?id=29696

7.5.1025.0 is current at this time for SCOM 2012 R2 UR2. 

****Note – take GREAT care when downloading – that you select the correct download for R2.  You must scroll down in the list and select the MSI for 2012 R2:

image

Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

Update any MP’s you are already using.   These are mine for RHEL, SUSE, and the universal Linux libraries:

image

You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

Next up – you would upgrade your agents on the Unix/Linux monitored agents.  You can now do this straight from the console:

image

image

You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

image

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on the workstations of all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the UR3 update.

 

 

 

Review:

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

image

 

 

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop the services on the management servers, or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

------------------------------------------------------
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
--------------------------------------------------------
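When the script output is long, it is easy to miss an error in the Messages pane. The Python sketch below scans saved output text for "Msg ..., Level ..." lines and flags anything with a severity above 10, which is where SQL Server errors start (10 and below is informational). The function name is hypothetical and the sample text is the error output shown above.

```python
import re

# Matches lines such as: Msg 1205, Level 13, State 56, Line 1
ERROR_LINE = re.compile(r"^Msg (\d+), Level (\d+), State (\d+)", re.MULTILINE)


def find_sql_errors(output):
    """Return (message_number, severity) for each Msg line with severity > 10."""
    errors = []
    for match in ERROR_LINE.finditer(output):
        number, severity = int(match.group(1)), int(match.group(2))
        if severity > 10:
            errors.append((number, severity))
    return errors


sample_output = """(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
"""

errors = find_sql_errors(sample_output)
needs_rerun = bool(errors)
was_deadlock = any(number == 1205 for number, _ in errors)
print(errors, needs_rerun, was_deadlock)
```

Any non-empty result means the script did not complete cleanly and should be run again before moving on.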


Silect MPAuthor Service Pack 2 released

FAQ: How do I get a simple list of specific computers based on simple criteria?


 

I probably get about 5 to 10 questions a day on SCOM, from all kinds of sources.  I am going to start a FAQ series of blog posts based on some of these questions just so I know there are examples of these solutions documented.

 

Q:  I want a list of all my Windows 2008 Computers in SCOM.  How can I achieve this easily?

A:  In My Workspace, create a state view.  Give it a name, and for the Object class, choose Windows Server 2008 Computer:

image

Right click the view – and personalize it to only show the data you want to see.  Then perform a CTRL+A and a CTRL+C to copy the data to your clipboard.

image

 

Paste this in Excel.

 

image

 

Quick.  Dirty.  No PowerShell required.

 

Oh, did you mention dirty PowerShell?  Here is a one-liner to accomplish the same result, and output to a CSV:


get-scomclass -name Microsoft.Windows.Server.2008.Computer | Get-SCOMClassInstance | select Displayname, ``[Microsoft.Windows.Computer`].NetBIOSCOmputername, ``[Microsoft.Windows.Computer`].NetBIOSDomainName, ``[Microsoft.Windows.Computer`].OrganizationalUnit | Export-csv c:\bin\output.csv


FAQ: How can I tell which servers are physical or virtual in SCOM?


 

This comes up all the time in conversations I have. 

Q:  How can I tell which monitored servers are a VM or a Physical Machine, including VMWare virtual machines?

A:  We need to disable a discovery, and add a new customized one in a management pack to accomplish this:

 

There is a property of the Windows Computer class, called “Virtual Machine”.  However, out of the box, we only discover this value and populate it IF the system is running on Hyper-V.  In my screenshot below you can easily see which are definitely VM’s (true) and which are not populated, and likely physical:

 

image

 

This is populated by a discovery, in the Microsoft.SystemCenter.Internal.mp, named “Discover if Windows Computer is a Virtual Machine” with the ID of Microsoft.SystemCenter.DiscoverIsVirtualMachineTrue.    Our built in MP simply has a discovery, that populates this value to true, IF the following WMI query returns valid results: 

SELECT Name FROM Win32_BaseBoard WHERE Manufacturer = "Microsoft Corporation"

This is great if you are running Hyper-V, but has two shortcomings:

1.  It does not populate the value if the monitored OS is running on some other hypervisor, such as VMware.

2.  It does not discover a “false” value.  This is needed, because any machine that never runs the query, or has busted WMI, will show up as “NULL”, which isn't accurate: it makes them all look like physical machines.

 

 

Ages ago – Pete Zerger posted a nice fix for this, in the form of a very popular community MP:

http://www.systemcentercentral.com/pack-catalog/virtual-machine-discovery-mp-for-operations-manager-2007/

This MP does three things:

1.  Contains a discovery that sets the value to TRUE if we detect the VM runs on VMware via WMI query:  SELECT * FROM Win32_ComputerSystem WHERE Manufacturer = "VMware, Inc."

2.  Contains another discovery that sets the value to FALSE if SELECT * FROM Win32_BaseBoard WHERE Manufacturer <> "Microsoft Corporation" OR Manufacturer <> "VMware, Inc."

3.  Sets an override on the built in discovery to disabled.

 

The above solution is simple and effective, but not perfect.  There are many comments there about things to change to work better.  The second discovery – which has an OR statement – causes EVERYTHING to get set to false in my environment, because every manufacturer string is different from at least one of the two values.  This should be set to an AND statement.

Additionally, the “true” value and “false” values come from different places in WMI, which probably isn't the best way to ensure we don’t get any flip/flop.

Lastly, the MP doesn’t work well out of the box for Hybrid environments either, where you have VM’s from Hyper-V and VMware, because the “true” is only being set for VMware systems.

 

A better solution would be something like this:

1.  Discovery that sets the value to “true” if we detect the Manufacturer is “Microsoft Corporation” OR “VMware, Inc.”

SELECT * FROM Win32_ComputerSystem WHERE Manufacturer = "VMware, Inc." OR Manufacturer = "Microsoft Corporation"

2.  Discovery that sets the value to “false” if we detect the Manufacturer is NOT “Microsoft Corporation” AND ALSO NOT “VMware, Inc.”

SELECT * FROM Win32_ComputerSystem WHERE Manufacturer <> "VMware, Inc." AND Manufacturer <> "Microsoft Corporation"

3.  Override to disable the built-in discovery for Virtual Machine attribute.
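The two queries above are complements of each other, so any given Manufacturer value should satisfy exactly one of them. The following Python sketch checks that property on a made-up inventory; the host names, the "HP" value, and the function names are hypothetical, and the code only models the comparison logic rather than running any WMI query.

```python
# Manufacturer values treated as virtual, taken from the two queries above.
VIRTUAL_MANUFACTURERS = {"VMware, Inc.", "Microsoft Corporation"}


def true_discovery_matches(manufacturer: str) -> bool:
    # Mirrors: Manufacturer = "VMware, Inc." OR Manufacturer = "Microsoft Corporation"
    return manufacturer in VIRTUAL_MANUFACTURERS


def false_discovery_matches(manufacturer: str) -> bool:
    # Mirrors: Manufacturer <> "VMware, Inc." AND Manufacturer <> "Microsoft Corporation"
    return all(manufacturer != name for name in VIRTUAL_MANUFACTURERS)


def is_virtual_machine(manufacturer: str) -> bool:
    """Return the value the pair of discoveries would set in this model."""
    hits = (true_discovery_matches(manufacturer), false_discovery_matches(manufacturer))
    if hits.count(True) != 1:
        # Would indicate overlapping or missing coverage between the two queries.
        raise ValueError(f"ambiguous result for {manufacturer!r}")
    return hits[0]


# Hypothetical inventory: host name -> Win32_ComputerSystem Manufacturer value.
hosts = {
    "dc01": "Microsoft Corporation",
    "sql01": "VMware, Inc.",
    "file01": "HP",
}
classified = {name: is_virtual_machine(m) for name, m in hosts.items()}
print(classified)
```

Because both predicates read the same property, a machine cannot match both discoveries, which is the point of taking “true” and “false” from the same WMI class.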

 

Here is an example of the tweaks:

 

<Discoveries>
  <Discovery ID="Virtual.Machine.Discovery.Custom.IsVirtualMachineVMware" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal">
    <Category>Custom</Category>
    <DiscoveryTypes>
      <DiscoveryClass TypeID="Windows!Microsoft.Windows.Computer">
        <Property TypeID="Windows!Microsoft.Windows.Computer" PropertyID="IsVirtualMachine"/>
      </DiscoveryClass>
    </DiscoveryTypes>
    <DataSource ID="Discover.IsVirtualMachine.VMware" TypeID="Windows!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper">
      <NameSpace>\\$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$\ROOT\CIMV2</NameSpace>
      <Query><![CDATA[SELECT * FROM Win32_ComputerSystem WHERE Manufacturer ="VMware, Inc." OR Manufacturer ="Microsoft Corporation"]]></Query>
      <Frequency>86400</Frequency>
      <ClassId>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</ClassId>
      <InstanceSettings>
        <Settings>
          <Setting>
            <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Name>
            <Value>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
          </Setting>
          <Setting>
            <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/IsVirtualMachine$</Name>
            <Value>true</Value>
          </Setting>
        </Settings>
      </InstanceSettings>
    </DataSource>
  </Discovery>
  <Discovery ID="Virtual.Machine.Discovery.Custom.DiscoverIsVirtualMachineFalse" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal">
    <Category>Custom</Category>
    <DiscoveryTypes>
      <DiscoveryClass TypeID="Windows!Microsoft.Windows.Computer">
        <Property TypeID="Windows!Microsoft.Windows.Computer" PropertyID="IsVirtualMachine"/>
      </DiscoveryClass>
    </DiscoveryTypes>
    <DataSource ID="Virtual.Machine.DiscoverIsVirtulMachineFalse" TypeID="Windows!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper">
      <NameSpace>\\$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$\ROOT\CIMV2</NameSpace>
      <Query><![CDATA[SELECT * FROM Win32_ComputerSystem WHERE Manufacturer <>"VMware, Inc." AND Manufacturer <>"Microsoft Corporation"]]></Query>
      <Frequency>86400</Frequency>
      <ClassId>$MPElement[Name="Windows!Microsoft.Windows.Computer"]$</ClassId>
      <InstanceSettings>
        <Settings>
          <Setting>
            <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Name>
            <Value>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
          </Setting>
          <Setting>
            <Name>$MPElement[Name="Windows!Microsoft.Windows.Computer"]/IsVirtualMachine$</Name>
            <Value>false</Value>
          </Setting>
        </Settings>
      </InstanceSettings>
    </DataSource>
  </Discovery>
</Discoveries>

I am attaching the modified MP to this blog post as well.  If you have different WMI values in your own environment, you can easily tweak these two discoveries to make it work well for you.  I’d recommend you seal this MP with your own key, because you will likely use this value in computer groups in other management packs.

APM’s Application Advisor link tries to connect to retired Web Console server?


 

When you are in AppDiagnostics, there is a link to the Application Advisor reporting page:

image

 

This URL is set in the OperationsManager database when the first web console server is installed.  If you retire that web console server and want to update the URL, it is stored in the apm.CONFIG table in the OperationsManager database:

image

 

You can edit this table and change the URL to point at your current web console server, which fixes the broken link.

Scheduled Maintenance mode tool updated to V4


 

There is an updated version of the popular Scheduled Maintenance Mode tool available at:

http://www.scom2k7.com/scom-2012-maintenance-mode-scheduler-4/

Here are the new features in V4:

  • Multi-select Computers:  This has been the most requested feature, as end users often want to schedule multiple computers at a time without having to create groups.
  • Multi-select Computers in Integrated dashboard:  Now you can select multiple computers in the dashboards without getting an error.
  • New Search in Computers and Groups:  Now instead of scrolling up and down the list, you can just start typing the name of the group or computer and the results will be filtered.
  • One Click MM now accepts parameters:  In some environments One Click MM was not working because client security was high or there were DNS issues.  Now you can just add the computer as a parameter.
  • New Configuration backup tool for easier upgrades:  Now you can back up your configuration and upgrade to the latest version of SCOM 2012 Maintenance Mode Scheduler without having to reconfigure all of your settings.