# Friday, 10 January 2014

At one of my customers, the deployment of some new test-servers uncovered a nice chain of issues. Some Windows Server 2012 R2 machines were deployed but the application they were intended for, could not connect to some of the SQL Servers (on a cluster). However, when trying to reproduce the problem for a specific case made of this problem, the application did work as expected. Then after Windows updates were applied to the cluster, the case was reopened again and now reproduction succeeded (as in the application indeed failed to connect).

Troubleshooting at that stage narrowed it down to the fact that Windows 2012 R2 servers could not connect due to authentication issues, whereas Windows Server 2008 R2 machines with the same application had no issues. Must be Windows 2012 R2, or a policy setting that is different. It turned out to be the latter of the two and it was concluded that the policy setting “Network Security: Restrict NTLM: Outgoing NTLM traffic to remote servers” to Deny all, caused the behavior. Bottom line, Kerberos authentication didn’t work, NTLM did.

Now with Kerberos authentication to SQL Server being an issue, the case followed its way to me. Well, Kerberos authentication should work. All clustered SQL Servers were configured and tested for Kerberos. But a little more about this particular cluster;

  • 4 Nodes (Windows Server 2008 R2). Originally a 2-node cluster, extended about a year ago.
  • 8 Virtual SQL Servers (2008 R2). Install dates of those virtual servers varying between 3 years to 1 month ago.
  • Each virtual server is installed using a domain account for that particular server.
  • Install is pretty much default, so the installer determines the dynamic tcp-port.
  • After installation, the required SPNs are created.
  • The state for this cluster is TEST/ACCEPTANCE.

This particular case included two clustered SQL Servers having the problem;

  • CL2_SQL_INS2\INS2
  • CL2_SQL_INS5\INS5

To verify the situation, I connected to the node for INS2 and confirmed there were SPNs registered for the service accounts for INS2 and INS5 (SetSPN –L “domain_name\svc_acnt_name”) which returns results like

C:\Users\admin_075>SetSPN -L "organization\srv_sql_ins2"
Registered ServicePrincipalNames for CN=SRV_SQL_INS2,OU=CL2_SQL,OU=SQL_Clusters,OU=Servers,DC=organization,DC=local:
        MSSQLSvc/srv_sql_ins2:12345
        MSSQLSvc/srv_sql_ins2.organization.local:12345

In other words, it looked just like it should, the SPN for INS2 was 12345 and for INS5 was 54321. Next I opened the SQL Server Configuration Manager and retrieved the TCP Port for IPAll for both INS2 and INS5. INS5 was at TCP Dynamic Ports = 54321 (as expected), but INS2 was at TCP Port = 1433! Now this explains why INS2 isn’t working on Kerberos, it’s not registered for 1433. Of course there now is the question why INS2 is forced to listen on 1433 instead of 54321, but at that stage I was more concerned with why INS5 wasn’t authenticating on Kerberos despite the port configuration. Since the Windows Server I currently was looking at, was the active host for INS2 but not for INS5, I connected to the host for INS5 and checked the TCP Port settings. There is was in for yet another surprise: INS2 = tcp-port 12345 and INS5 = tcp-port 1433!

This is the situation that is not supposed to be!!! The active hosts for the respective virtual servers listen at 1433, but the passive hosts are correct (well, turned out INS2 had 2 hosts configured for 1433). I learned these were probably remnants from earlier troubleshooting activities. However I’m still surprised as in a cluster, when one edits the tcp-port number on the active host, it should propagate with a failover. So I edited the port on the active host for INS2 to be 12345 instead of 1433 and brought the resource offline and online again, port remained at 12345 (= OK). Next I did a failover to the other host that still had 1433 listed, expected that the port would become 12345. It didn’t, it remained at 1433. What is broken here (or has never worked correctly at all)?

The broken, or better said, missing part turned out to be checkpoints. Not that this is the first case of missing checkpoints SQL Server as it seems, the way to fix them is hidden inside the kb-article How to change SQL Server parameters in a clustered environment when SQL Server is not online. The title seems to suggest that this only applies when SQL is not online, but the described method 2 has no restriction on SQL Server being online or not. After checking with cluster . resource /checkpoints it turned out that none of the SQL Server Services had a checkpoint. And since the path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.INS2\MSSQLServer\SuperSocketNetLib\Tcp\IPAll wasn’t covered by the another checkpoint, it is obvious why tcp-port numbers were inconsistent throughout the nodes. After adding a checkpoint for INS2 (from the host where INS2 was running on 1433) and editing the tcp-port number to become a static (not dynamic as on the other host) 12345. Next I conducted a failover to the previous host and verified that it too now had a static IPAll for port 12345. Issue of the missing checkpoints solved. The other 7 instances went the same way.

CAUTION: At step 4 for Method 2, for SQL Server 2008  the syntax is listed as cluster . resource "SQL Server (<InstanceName>)" /addcheckpoints:"HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\MSSQL.x\MSSQLSERVER". Be sure to leave out the HKEY_LOCAL_MACHINE\ part as this results in invalid syntax and take a good look how to replace the MSSQL.x as this would apply to SQL Server 2005, whereas MSSQL10.<InstanceName> would apply to SQL 2008 and MSSQL10_50.<InstanceName> to SQL Server 2008 R2. Hopefully this will be updated based on the feedback I submitted.

For more background on the checkpoints, I found the Windows 2008 (R2) documentation on clustering is rather limited to the “What’s new” stuff and to be honest, even the Windows 2003 documentation is not as clear on the subject as Exchange 2003 on Windows Cluster Architecture (sections Checkpoint Manager and Database Manager under Cluster Service Components).

Friday, 10 January 2014 22:51:14 (W. Europe Standard Time, UTC+01:00)
# Sunday, 25 August 2013

After replacing a Hyper-V (windows Server 2012) host, DHCP started to have issues. Unpredictable results so to say.

The old host had a teamed dual-port Intel PT1000. At the network side, a Link Aggregation Group was defined on the Cisco SG300 switch.The new host is equipped with an on-board Broadcom NetXtreme controller and an additional dual-port NetXtreme II adapter. The onboard controller was configured as IP-interface for the host, for the dual-port adapter I installed the Broadcom Advanced Control Suite 4 (BACS4, Version 15.6.31.0) and an additional LAG was defined on the SG300. Well, it wasn’t a clear victory! When pinging between hosts, packets between physical and virtual hosts seemed to be dropped, DHCP wasn’t renewing and RDP-sessions to virtual systems were having response issues (can happen when packets are dropped). The main difference to me was the Hyper-V host, where in the old situation the Intel PT1000 was shared between the host OS and the guests (as external virtual switch). In the new situation the dual port Broadcom was dedicated for the virtual switch.I must admit knowing little more than the basics on Hyper-V thus one Googles in all directions. And thus I stumbled upon SR-IOV. It’s late, SR-IOV made sense to my new situation thus I gave it a try by replacing the old virtual switch with a new SR-IOV based virtual switch. Hurrah, everything seemed to work now; ping, stable RDP sessions and DHCP renewals.

However after a couple of days, I wanted to use my Android tablet, but it did not receive an IP-address. Pressed for time, I switched to my laptop. A few days later, I turned on a Windows 8 system that had been unused for the last couple of weeks and it too did not get an IP-address… so I still did have a DHCP issue! I blamed the switch configuration for a while, as well as the firewalls on the Windows systems. Wireshark showed the DHCP Discover leave at Windows 8 hosts and never arrive at the virtual DHCP server. But other DHCP packets (renewals) were observed, both at the unwilling client and at the server. When I put a static IP on the Windows 8 host, it still couldn’t ping the DHCP Server (or any other virtual system). The MAC of the Windows 8 host did however show up in the MAC table of the switch (but so did the MAC addresses of the virtual servers). I must have reconfigured the switch 13 times, disabled the firewalls and still no result. Oh my… do I now have one of those nasty interoperability issues between switches of different vendors (Cisco and Microsoft)… I needed to learn more about virtual switches in Hyper-V and luckily one of the first links I hit was Hyper-V and VLAN’s by Aidan Finn. I especially want to quote him on:

“I must warn anyone reading this that I’ve worked with a Cisco CCIE while working on Hyper-V and previously with another senior Cisco guy while working on VMware ESX and neither of them could really get their heads around this stuff.  Is it too complicated for them?  Hardly.  I think the problem was that it was too simple!  Seriously!”

That was comforting, it must therefore be that the solution should be simple. At the bottom of Aidan’s post was yet another link that seemed useful Windows Server 2012 NIC Teaming Part 6 – NIC Teaming In The Virtual Machine, and it was! The final part of a series useful articles, which led me to the Microsoft un-support statement of third party NIC teaming: Microsoft Support Policy for NIC Teaming with Hyper-V

Bottom line: the Broadcom Advanced Control Suite could be the source of my problem. I uninstalled it from the Hyper-V host and then configured NIC Teaming from Hyper-V. And the result of ipconfig /renew on the Windows 8 computer: Discover, Offer, Request, ACK. Just the way it should be.

Sunday, 25 August 2013 22:08:07 (W. Europe Daylight Time, UTC+02:00)
# Friday, 03 February 2012

When performing maintenance to a cluster, as in adding an SQL Server virtual to a node or removing it from a node, setup may fail for a number of reasons. And once something went wrong, things keep getting wrong, even if you verified the procedure and you corrected the cause of the error.

Well, the procedure didn't tell you everything. First of all, you should only be logged in to the node where you are running the setup. Logoff from all other nodes. The reason is that SQL Server setup will create a Windows Scheduled task on the other nodes of the cluster for you, under your account. Having an interactive session (you being logged in) and logging in as a batch job (the setup task) with the same account, may lead to unexpected events (and thus setup failing). Obviously Task Scheduler should be running to ;-). To be honest, Microsoft does tell you about the scheduler and the logging off in a KB article (let's be nice and add some community content).

Where it does get mean in the KB article, is when setup keeps failing (and it most likely will); restart all remote nodes!!! The most likely reason for the setup failing this time, is a running setup.exe from a previous failed attempt. This is what may cause the access denied for the scheduled task (or check that your account is local admin on every node and has the login as a service permission). Here is how to avoid having to reboot;

  • Exit the setup on the local node.
  • Check the scheduled tasks on the remote nodes. Some are running: stop them, then delete them. Some have failed, delete them
  • Next, check the processes on all nodes, there may be one or two setup.exe processes from SQL Setup active on the nodes, kill those processes.
  • Don’t forget to log off from the remote nodes.
Friday, 03 February 2012 12:38:29 (W. Europe Standard Time, UTC+01:00)
# Tuesday, 25 October 2011

Character based data types, especially the var types varchar and nvarchar allow to declare a really big amount and just use a tiny bit and it doesn’t even hurt in SQL Server. However, it may in the application. Today I was working with a customer for the implementation of some third party software that would be doing analysis on the data. It would load the really over-dimensioned (n)varchar types in something like a (n)char.... you get the picture, huge amounts of wasted space. So from the databases side, a bit of restructuring was desirable. To see how much we can actually save, I created query to report on the character usage;

CREATE TABLE ##char_columns (
[SCHEMA]
VARCHAR(128) NULL,
[TABLE]
VARCHAR(128) NULL,

[COLUMN]
VARCHAR(128) NULL,

[data_type]
VARCHAR(128) NULL,

[max_length]
INT NULL,

[is_replicated]
BIT NULL,

[actual_max_length]
INT NULL,

[row_count]
BIGINT NULL,

[not_null_count]
BIGINT NULL)

INSERT
##char_columns

SELECT
s.name AS 'schema'
,
t
.name AS
'table'
,
c
.name AS
'column'
,
y
.name AS
'data_type'
,
c
.max_length
,
c
.is_replicated
,
NULL
,
NULL
,
NULL

FROM
sys.tables t

INNER
JOIN sys.schemas s ON t.schema_id = s.schema_id

INNER
JOIN sys.columns c ON t.object_id = c.object_id

INNER
JOIN sys.types y ON c.system_type_id = y.system_type_id

WHERE
t.type = 'U'
AND y.name IN ('varchar','char','nchar','nvarchar')
WHILE
(SELECT COUNT(*) FROM ##char_columns WHERE actual_max_length IS NULL) > 0

BEGIN
TRY

DECLARE
@sql NVARCHAR(4000)
, @actual_max_length INT , @column VARCHAR(128) , @schema VARCHAR(128) , @table VARCHAR(128)
SELECT
TOP 1 @schema = [SCHEMA], @table = [TABLE], @column = [COLUMN]
FROM ##char_columns

WHERE
actual_max_length IS NULL

-- Set the actual_max_length to -1 if a column has only null values or no rows.

SELECT @sql = 'UPDATE ##char_columns
SET [actual_max_length] = (SELECT ISNULL(MAX(LEN([' + @column + '])),-1) FROM [' + @schema + '].[' + @table + '])
,
      [row_count] = (SELECT COUNT_BIG(*) FROM [' + @schema + '].[' + @table + '])
,
      [not_null_count] = (SELECT COUNT_BIG([' + @column + ']) FROM [' + @schema + '].[' + @table + ']) 
WHERE [schema] = ''' + @schema + '''
 
      AND [table] = ''' + @table + '''
 
      AND [column] = ''' + @column + ''''
EXEC
(@sql)

END
TRY

BEGIN
CATCH

DECLARE
@err_num NVARCHAR(20), @err_msg NVARCHAR(2048), @full_msg NVARCHAR(2048)

SELECT
@err_num = ERROR_NUMBER(), @err_msg = ERROR_MESSAGE()
SELECT @full_msg = 'ERROR: ' + @err_num + ' DESCR: ' + @err_msg + ' QUERY: ' + @sql

RAISERROR
(@full_msg,16,0)

BREAK

END
CATCH

SELECT
* FROM ##char_columns

DROP
TABLE ##char_columns

Be aware that varchar(max) and nvarchar(max) will give a max_length of –1 and that actual_max_length is –1 if no rows exist in the table or the column only contains NULLs (refer to the row_count and not_null_count to determine the situation).

Tuesday, 25 October 2011 22:37:16 (W. Europe Daylight Time, UTC+02:00)
# Friday, 15 April 2011

In a clustered environment, it may not always be obvious what node SQL Server has been running on. Most logging is done based on the network name of the resource group, or the instance name of the virtual server. Still it is logged once in the Windows Application Event Log and the SQL Server Error Log each, though the “memory” of those sources is likely limited by configuration; size and overwrite settings of the application log and number of error logs (maximum 99) for the SQL Server Error log.

In the Windows Application Event Log, the EventID 17664 is of interest (verify the source is an SQL Server instance). Event Log filtering is relative straight forward and turns up an event like this.

EventMessageNodeSQLServer

Figure Event Properties; 1 = Instance name, 2 = Network name of the resource group, 3 = Computer name of the cluster node, 4 = Binary code containing network name\instance name

In case of the SQL Server Error Logs, the thing of interest is the message;

The NETBIOS name of the local node that is running the server is 'computername'. This is an informational message only; no user action is required.

Parsing the files might be the easiest from SQL Server with the xp_readerrorlog. Unfortunately error handling surrounding eXtended stored Procedures is limited. Using the script below, the last result set shows the node where a clustered SQL Server was started on.


SET
NOCOUNT ON
DECLARE
@iteration
int
DECLARE
@log_events table ([LogDate] datetime, [ProcessInfo] nvarchar(20), [Text] nvarchar(max), [Iteration] int
)
SELECT @iteration = 0
-- Set to current error log
WHILE @iteration <
100
    BEGIN
    BEGIN
TRY
        -- EXECUTE the XP without INSERT to trip the error.
        -- XP error is handled poorly, but it is even worse
        -- when it occurs on the insert from executing an XP.
        EXEC xp_readerrorlog @p1 = @iteration, @p2 = 1, @p3 = 'NETBIOS name'
;
        INSERT @log_events ([LogDate], [ProcessInfo], [Text]
)
            EXEC xp_readerrorlog @p1 = @iteration, @p2 = 1, @p3 = 'NETBIOS name'
;
        UPDATE
@log_events
            SET [Iteration] =
@iteration
       
    WHERE [Iteration] IS
NULL;
        SELECT @iteration = @iteration + 1
;
   
END
TRY
    BEGIN
CATCH
        SELECT
@iteration = @iteration + 1
;
    END
CATCH
    END
SELECT
[Iteration],
[LogDate]
   
    , CAST(SUBSTRING(

           
SUBSTRING([Text]

            ,
LEN('The NETBIOS name of the local node that is running the server is ''') + 1
           
,16
-- Max length NETBIOS name is 16 char (15 actually).
   
    ), 1, CHARINDEX('''', SUBSTRING
(
            [Text]
           
,LEN('The NETBIOS name of the local node that is running the server is ''') +
1
           
,16
-- Max length NETBIOS name is 16 char (15 actually).
   
    )
    )
- 1) AS varchar(16)) AS
'NodeName'
FROM
@log_events
WHERE [Text] LIKE
'The NETBIOS name of the local node that is running the server is %'

Friday, 15 April 2011 12:56:11 (W. Europe Daylight Time, UTC+02:00)
# Friday, 03 December 2010

Service Pack 2 for SQL Server 2008 annoyed me on one of my machines (development system, SQL Server doesn’t autostart). It came via Windows Update started with offering the service pack without .NET Framework 4 installed… then even after .NET Framework was installed, it kept failing. So f..k Windows Update and download SQL2k8SP2 and install it.

A couple of days later I needed SQL Server on that machine, I did the net start mssqlserver trick and tried to connect using Management Studio directly after that. So management studio threw me the

Login failed for user 'domain\user'. Reason: Server is in script upgrade mode. Only administrator can connect at this time. (Microsoft SQL Server, Error: 18401)

The very “well” documented error about the fact that SQL Server is running some scripts and rather has no-one mesh around (unless you’re on a DAC). The strange thing is that SQL Server seemed to have stopped after; that, something that isn’t supposed to happen. I restarted the service and tried again… same result, so it’s Google-time; best hit: Weird SQL Server Error: Unable to Connect to Server. Reason: Server is in script upgrade mode. With that being the best answer, the behavior I witnessed isn’t supposed to happen and thus I turned to the SQL Error Logs to find some info;

2010-12-03 21:02:04.28 spid7s      Performing replication job security meta-data upgrades...
2010-12-03 21:02:07.34 spid7s      Error: 824, Severity: 24, State: 2.
2010-12-03 21:02:07.34 spid7s      SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:2064; actual 0:0). It occurred during a read of page (1:2064) in database ID 21 at offset 0x00000001020000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\some_database.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
2010-12-03 21:02:07.34 spid7s      Error: 912, Severity: 21, State: 2.
2010-12-03 21:02:07.34 spid7s      Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 824, state 2, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2010-12-03 21:02:07.35 spid7s      Error: 3417, Severity: 21, State: 3.
2010-12-03 21:02:07.35 spid7s      Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
2010-12-03 21:02:07.86 spid7s      SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
2010-12-03 21:02:07.99 Server      The SQL Server Network Interface library successfully deregistered the Service Principal Name (SPN) [ MSSQLSvc/host.domain.local ] for the SQL Server service.

I had been working on a corrupted database called some_database (name changed for obvious reasons) for a customer a few weeks ago, but hadn’t removed that database. After removing the offending database files, SQL Server wasn’t tripped by the exception while opening the database.

To me, this does smell a like a bug (though I would not recommend running corrupt databases on a production system), so I filed it on connect; Finishing service pack installation prohibited by corrupt database.

Friday, 03 December 2010 22:04:20 (W. Europe Standard Time, UTC+01:00)
# Tuesday, 30 November 2010

There is nothing fancy about this post… just a little frustration. I wanted to compress files after they were processed with SSIS 2005. And with compression, I mean NTFS-compression, not creating a .zip or .gz (because that is what I mostly found through the search engines).

My first path of research was to do it in a Script Task, but it turns out, invoking the compression attribute of files on NTFS requires C# and can’t be done with VB.NET. So it’s a no go in SSIS 2005.

During the search, somewhere the compact.exe program in Windows was suggested as an alternative. Very useful from an SSIS perspective, it means the Execute Process Task is required. To make it all happen, three things are needed;

  • A Foreach Loop container
  • A string variable (either in the scope of the Foreach Loop Container or in the Package-scope), named filename in this example.
  • An Execute Process Task in the Foreach Loop container

clip_image002

Configure (Collection Page) the Foreach Loop Container as Foreach File enumerator, point to the folder where the files reside and retrieve the file names as Fully qualified. Depending your requirements, you may also specify an additional mask and include subfolders.

clip_image004

Next visit the Variable Mappings tab and point to the previously created variable listed as User::filename.

Next is the Execute Process Task. But first a little on compact.exe, to compress a file, compact.exe should be called with the /C flag and the name of the file to be compressed. Something like C:\WINOWS\System32\compact.exe /C "D:\ftp\archive\dump_001208.csv"

In terms of what the Execute Process Task expects, the Executable should be set to compact.exe. The arguments should be /C "D:\ftp\archive\dump_001208.csv" and since the filename is supplied by the variable, an expression should be used. On the Expressions tab, create an expression for Arguments. With the expression designer, create the following expression

"/C \"" + @[User::filename] + "\""

The important part to recognize is that expressions are largely built on C-style syntax (download the PragmaticWorks white paper  SSIS Accelerator Series: Working with SSIS Expressions to learn more on SSIS expressions). The arguments expression should return a string, hence the opening and closing double-quotes. The filename, which is supplied by the @[User::filename] variable (drag and drop the variable in the expression designer) should be enclosed within double-quotes too. For SSIS to recognize that these double-quotes are part of the string it should return, the double-quotes need to be escaped (using the backslash). To confirm the expression is correct, use the Evaluate Expression button. The evaluated value should return /C "" (slash + C + space + double-quote + double-quote), once the filename variable is filled, it will appear between the two double-quotes.

clip_image006

Enclosing the filename in double quotes prevents files with spaces in the name from causing errors. There could be other errors though, like a file being locked. With current configuration, the package would stop and throw an error on the first occurrence of this condition. To prevent the package from failing in that event, set the FailTaskIfReturnCodeIsNotSuccessValue property of the Execute Process Task to false.

Tuesday, 30 November 2010 21:49:56 (W. Europe Standard Time, UTC+01:00)
# Tuesday, 19 October 2010

Some time ago I was working with performance data collected by a customer to look for I/O-patterns and make recommendations for the I/O-subsystem for a new SQL Server 2008 machine that would replace a couple of SQL Server 2000 servers. Unfortunately I ran into issues with the values for Avg. Disk sec/Read and Avg. Disk sec/Write, which were ridiculous with average values around 100-million seconds per read or write. Those values occurred on two virtualized Windows 2000 Servers.


Today, I was working for another customer. Physical Windows Server 2003 boxes (both nodes in the same cluster). I imported the .blg files into a database and was looking at very high values (up to 100 million) for Avg. Disk sec/Read and Avg. Disk sec/Write. I consulted the staff-DBA, who referred me to one off the Windows administrators. So the Windows administrator logged in, opened performance monitor and… got reasonable values for those counters (hovering at about 0.006, I’d even say those values are good). To prove I wasn’t making a fool of myself, we walked over to my desk and I showed the results I had… same servers, same counters, very different values. Next we tried using the account of the staff-DBA, which showed the “normal” values.


Simply eliminating, the Win-admin said the only difference is the account, so he went and checked whether the account I used had full administrator permissions. It had. Next he turned his attention to the user profile for the account I used (which happened to be quite big). The user profile belonging to the account I used was deleted and I logged in again, this time to be shown the correct values. RESOLVED!!!


Unfortunately the collected data still was useless. I started the counter log again, let it run for about 10 minutes and examined it… sky high values!!! In disbelief I switched to current activity, to be shown normal values!!! Next I created a new counter log definition, let it run for a couple of minutes and this time I got normal values.


Bottom line, to resolve insanely high values for Avg. Disk sec/Read and Avg. Disk sec/Write (it might apply for other counters of the type PERF_AVERAGE_TIMER too), use a fresh user profile and a fresh counter log definition.


Thanks Ruud.

Tuesday, 19 October 2010 21:44:49 (W. Europe Daylight Time, UTC+02:00)
# Thursday, 19 August 2010

I am working on an MS Access database that needs to be integrated in a CRM system. Since the backend of the CRM is SQL Server (why else would I be involved), I migrated the Access database to SQL Server using the SQL Server Migration Assistant for Access v4.2 (SSMA for Access). Next I turned my attention to translating attributes between the systems and unfortunately, a free text notes field in the Access database is used for a lot of things that need to be translated to attributes. Some things I know how to translate, but some things have to be determined by the administrators for the CRM System and Access application. So I built a query in SQL on the tables formerly stored in the Access database and used Excel (with MS Query) to create a report for the administrators, so they can sort out how customer statuses should be translated.

Extremely simplified, the query looked like this;

SELECT tlbCustomer.CUST_CODE
  , tlbCustomer.
PAY_REF
  , COALESCE(UPPER(REPLACE(tblNotes.NOTE_TXT,'.','')),'') AS
NOTE_TXT
FROM tlbCustomer LEFT OUTER JOIN
tblNotes
      ON tlbCustomer.CUST_ID = tblNotes.
CUST_ID
GROUP BY tlbCustomer.
CUST_CODE
  , tlbCustomer.
PAY_REF
  , UPPER(REPLACE(tblNotes.NOTE_TXT,'.',''
))
ORDER BY COUNT(*)
DESC

The query worked great on SQL, but did not return the text (NOTE_TXT) to Excel.

To cut the long story short, the original tblNotes.NOTE_TXT field in the Access database was of type MEMO. SSMA 2008 for Access converted this to nvarchar(max) and somehow MS Query (in Office 2007) does not return, nor throw an error on, large data types (varchar(max), nvarchar(max)). A quick test showed that MAX was not really needed;

SELECT MAX(LEN(tblNotes.NOTE_TXT)) FROM tblNotes

So modify the query to return NOTE_TXT as nvarchar(256) does the trick;

SELECT tlbCustomer.CUST_CODE
  , tlbCustomer.
PAY_REF
  , CAST
(
      COALESCE(UPPER(REPLACE(tblNotes.NOTE_TXT,'.','')),'')

    AS nvarchar(256)) AS
NOTE_TXT
FROM tlbCustomer LEFT OUTER JOIN
tblNotes
      ON tlbCustomer.CUST_ID = tblNotes.
CUST_ID
GROUP BY tlbCustomer.
CUST_CODE
  , tlbCustomer.
PAY_REF
  , UPPER(REPLACE(tblNotes.NOTE_TXT,'.',''
))
ORDER BY COUNT(*)
DESC
Thursday, 19 August 2010 12:26:16 (W. Europe Daylight Time, UTC+02:00)
# Saturday, 24 July 2010

Sometimes one is confronted with forgetting an option... It all started with a database that had a single data file of 1.8 GB and a transaction log more than 6 times the size of that. Usually that would be a database in full recovery model and no transaction log backups. This database was in simple recovery model though and according to the Disk Usage report, 99.99% of the transaction log was free. So a DBCC SHRINKFILE should solve the problem, however DBCC SHRINKFILE solved nothing, the log remained at about 11 GB.

A query on sys.databases learned that this particular database had ACTIVE_TRANSACTION for log_reuse_wait_desc. A peek in Activity Monitor then showed that there was an open transaction for the database; a DELETE task with a huge wait time and the BROKER_RECEIVE_WAITFOR wait type.

Then it occurred to me this is the database that used to have a deadlock problem. After implementing some procedure fixes, there haven’t been any deadlocks for over a half year and neither has the server (or service) been shutdown in between.

The stored procedure on the queue does a receive, but no timeout associated to it. Effectively this means that the receive will wait inside the transaction, until a messages comes into the queue. That hasn’t happened in the last six months. The resolution is simple, add the TIMEOUT option to the RECEIVE and the procedure will stop when there wasn’t any work to be done for the specified amount of time (in milliseconds). For the code, see the post “Deadlock Alerts Through Event Notification”, which has been updated with the TIMEOUT option.

After altering the procedure, kill the spid. Service Broker will restart, but now with the TIMEOUT option in the procedure and stop after the specified amount of time has elapsed without messages arriving in the queue.

Saturday, 24 July 2010 18:48:00 (W. Europe Daylight Time, UTC+02:00)