Troubleshooting SQL Server Statistics

Statistics contain statistical information about the distribution of values in one or more columns. SQL Server Query Optimizer uses statistics to create an optimized and cost-effective execution plan that improve query performance. statistics help SQL server to estimate the number of rows (also known as cardinality).
By default, SQL Server maintain create and update statistics automatically for your databases however you do have option to manually disable these features. Disabling auto create and update statistics should done under a valid reason.

There are three ways to create SQL Server statistics:

  • If the auto_create_statistics option is enabled (enabled by default).
  • Manually create statistics.
  • When a new index is created.

What is auto_create_statistics?
when auto_create_statistics option is enabled, before executing any queries SQL server query Optimizer first validate if all the column(s) within the query predicate(s) had statistics created then go ahead and create statistics for all the column(s) did not have statistics created already. Keep in mind, auto create statistics is not free.

What is auto_update_statistics?
when auto_update_statistics option is enabled, SQL Server Query Optimizer automatically updates the statistics if it is outdated. Statistics is considered to be outdated whenever column modification counter is changes more than 20% + 500 of the number of rows in the table. Remember there is a cost for that as it is not free.

Let’s use below queries to create DemoStatistics database then check if auto create and update statistics are turned on by default in order to proceed with our demo. Same thing can be applied using SSMS however for this article I will stick with T-SQL scripts.

USE master
GO
IF DB_ID(‘DemoStatistics’) IS NOT NULL
DROP DATABASE [DemoStatistics];
GO
USE master
GO
CREATE DATABASE
[DemoStatistics]
GO
Run below query to verify if auto create and update statistics are enabled
SELECT name
,is_auto_create_stats_on
,is_auto_update_stats_on
FROM sys.databases
WHERE name = ‘DemoStatistics’
Below output shows that both options are enabled

Run below script to create address table and insert data into it.
USE DemoStatistics
GO
IF OBJECT_ID(‘dbo.Address’) IS NOT NULL
DROP TABLE [dbo].[Address];
GO
CREATE TABLE
[dbo].[Address](
[AddressLine1] [VARCHAR](512) NOT NULL,
[City] [VARCHAR](50) NOT NULL,
[StateProvinceID] [INT] NOT NULL,
[PostalCode] [VARCHAR](20) NOT NULL
)
GO
INSERT INTO
[dbo].[Address]
([AddressLine1],[City],[StateProvinceID],[PostalCode])
SELECT ’15 Pear Dr.’ AS [AddressLine1],’New York’ AS [City],10 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘150 Pear Dr.’ AS [AddressLine1],’New York’ AS [City],10 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4460 Newport Center Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘8238 D Crane Ct.’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4915 Pear Dr.’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘9106 Edwards Ave.’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘2061 Matchstick Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘6920 Merriewood Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘1113 Catherine Way’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘9089 San Jose Ave’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘1144 Paradise Ct.’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘5256 Chickpea Ct.’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘5642 La Orinda Pl’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘2748 Logan Court’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4588 Morgan Territory Road’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘6289 Duck Horn Court’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘6991 Mauna Kea Court’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘8521 Knewal Rd’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4628 Mountain View Place’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘906 Cloudview Dr’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4211 Eastgate Avenue’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘4798 Macaroon Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘6939 E. 7th Street’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘9324 Youngsdale Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘6396 Market Place’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘9211 Holiday Hills Drive’ AS [AddressLine1],’Newport Beach’ AS [City],9 AS [StateProvinceID],’92625′ AS [PostalCode]
GO
Now let’ us query Address table without any predicates then check if statistics will automatically get created.
SELECT * FROM [dbo].[Address]
GO

Now run SP_HELPSTATS ‘[dbo].[Address]’ to check if statistics created.

No statistics created, as we can see from above screenshot. Why? Remember auto create and update statistics will only take place whenever there is predicate like where, on, equal, and, or, in whiting the query otherwise there will be absolutely no action taken by SQL Server Query Optimizer. Now run below query and check if auto statistics taken place
SELECT *
FROM [dbo].[Address]
WHERE [City] =’New York’
AND [StateProvinceID] = 10
GO

Now run SP_HELPSTATS ‘[dbo].[Address]’ again to check if statistics created.

As you can see in the image above, SQL Server Query Optimizer automatically creates statistics on City and StateProvinceID columns.
Now, how can we find out when was the last time statistics updated, the density and the number of rows. We can used any of below steps to find out information related to statistics:

  • DBCC SHOW_STATISTICS
  • DMVs
  • SSMS

Let’ us start with DBCC SHOW_STATISTICS. It will provide you lot of useful information like statistics header, density vector and histogram but we cannot rely on it to assume statistics are updated. Let’ us analyze statistics for City column.
DBCC SHOW_STATISTICS(N'[dbo].[Address]’, _WA_Sys_00000002_22AA2996)
GO
SELECT COUNT(1) AS TotalRows FROM [dbo].[Address]
GO
below image shows that statistics are updated. The count rows returned from address table matched with the number of rows, rows_sample statistics header information provided.

I added below script just in case if you do want to analyse header or density vector or histogram separately.
DBCC SHOW_STATISTICS(Address,yourstatname)WITH STAT_HEADER
DBCC SHOW_STATISTICS(Address,yourstatname)WITH DENSITY_VECTOR
DBCC SHOW_STATISTICS(Address,yourstatname) WITH HISTOGRAM
Now let’s just insert few more records and check if statistics will be aware new rows have been inserted.

INSERT INTO [dbo].[Address]
([AddressLine1],[City],[StateProvinceID],[PostalCode])
SELECT ’15 ave.’ AS [AddressLine1],’New York’ AS [City],10 AS [StateProvinceID],’92625′ AS [PostalCode] UNION ALL
SELECT ‘150 Pear Dr.’ AS [AddressLine1],’New Jersey’ AS [City],10 AS [StateProvinceID],’92725′ AS [PostalCode] UNION ALL
SELECT ‘200 Pear Dr.’ AS [AddressLine1],’New Jersey’ AS [City],10 AS [StateProvinceID],’92725′ AS [PostalCode]
GO
DBCC SHOW_STATISTICS(N'[dbo].[Address]’, _WA_Sys_00000002_22AA2996)
GO
SELECT COUNT(1) AS [TotalRows] FROM [dbo].[Address]
GO

If you analyze above result set, you will quickly realize that statistics are outdated. The count rows returned from address table does not matched with any of information related to number of rows statistics provided. With that said, you cannot rely on Updated column information from statistics header to assume statistics are updated.
The best way to assume that statistics are either updated or outdated is to use sys.dm_db_stats_properties and sys.tats dmvs.

Run below scripts then analyze both outputs. Remember we only inserted three records.
SELECT
OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY [sys].[dm_db_stats_properties] ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
SELECT
COUNT(1)AS [TotalRows] FROM [dbo].[Address]
GO

From above image, we can see that we do have a new column called Modifications. Within Modification column, we can quickly see our three records we just inserted. That said statistics are outdated. The number of rows, rows_sample from statistics header information do not matched with the number of rows returned from select count from address table.
There are two ways statistics can be updated, manually or automatically. To manually update statistics, we can issue update statistics command. For instance you can issue UPDATE STATISTICS [schemaName].[tableName] to manually update all statistics within a table or UPDATE STATISTICS [schemaName].[tableName] [statName] to update a single stat within a table.

Run below scripts to manually update statistics then let’s analyze the result set.
UPDATE STATISTICS [dbo].[Address]
GO
DBCC SHOW_STATISTICS
(N'[dbo].[Address]’, _WA_Sys_00000002_22AA2996)
GO
SELE
CT
OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
SELECT
COUNT(1)AS [TotalRows] FROM [dbo].[Address]
GO

Now statistics are updated. Modification column values are 0 and rows, rows_sample from statistics header information matched with the select count from address table.

Auto Update Statistics keep track of the underline data changed and automatically trigger whenever column modification counter is changes more than 20% + 500 of the number of rows in the table. Please keep in mind this is not free which means auto statistics may impact your OLTP performance.

let’s generate more data by running below scripts and verify if auto statistics will take place.
INSERT INTO [dbo].[Address]
SELECT * FROM [dbo].[Address] WITH (NOLOCK);
GO 10
SELECT * FROM [dbo].[Address]
GO
SELECT
OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
SELECT COUNT(1)AS [TotalRows] FROM [dbo].[Address]
GO
SELECT
name
,is_auto_create_stats_on
,is_auto_update_stats_on
FROM sys.databases
WHERE name = ‘DemoStatistics’
GO

If you add together rows and modifications columns values returned from statistics header information, you will see that it will equal to TotalRows column value returned from the select count from address table. We can clearly see from above image auto_update_statistics is enabled but did not evoke. You probably wondering “why SQL Server Query Optimizer did not evoke auto update statistics?” Now you realized that it is not recommended to rely on auto update statistics. You have to maintain statistics as well by creating SQL job to update statistics regularly within your environments.

Auto update statistics take place only while querying the data and SQL Server Query Optimizer realized that the column modification counter is changes more than 20% + 500 of the number of rows in the table. The Query Optimizer will pause the query execution then update statistics and uses latest statistics information to create an optimized and cost-effective execution plan then proceed with the query execution. Again, there is a cost for that and performance will slow down while statistics updating.
Let’ us go a bit deeper. Note, Auto update statistics will not take place if there are more than one statistics have to be updated during the query execution.
Now run below scripts and check if statistics will take place.
SELECT *
FROM [dbo].[Address] WHERE [City] =‘New York’ AND [StateProvinceID] = 10
GO
SELECT

OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
SELECT
COUNT(1) AS TotalRows FROM [dbo].[Address]
GO
SELECT
name
,is_auto_create_stats_on
,is_auto_update_stats_on
FROM sys.databases
WHERE name = ‘DemoStatistics’
GO

The Query Optimizer did not take place as you can see from above image more than one statistics have to be updated. That can impact your production environments forever if there is no databases maintenance job in place to update statistics. Please do not rely on auto update statistics. The only way we have to update statistics now is to manually or run the maintenance job to update statistics. Let’s manually update statistics.
UPDATE STATISTICS [dbo].[Address]
GO
SELECT
OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
SELECT COUNT(1)AS [TotalRows] FROM [dbo].[Address]
GO

Great, above image prove that statistics are updated.
We already know the formula((20%) +500), correct? Now let’ use the formula to simulate one example where auto update statistics will automatically take place.
Execute below script then analyze each result set.

UPDATE AD SET City =’New Jersey’
FROM [dbo].[Address] AD
WHERE AddressLine1 IN(‘4460 Newport Center Drive’,‘8238 D Crane Ct.’,’15 Pear Dr.’,‘150 Pear Dr.’,‘906 Cloudview Dr’);
GO
SELECT

OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO
— This SELECT statement will force the optimizer to Update Statistics
SELECT
* FROM [dbo].[Address] WHERE City = ‘New Jersey’
GO
SELECT

OBJECT_NAME([sp].[object_id]) AS “Table”,
[sp].[stats_id] AS “Statistic ID”,
[s].[name] AS “Statistic”,
[sp].[last_updated] AS “Last Updated”,
[sp].[rows],
[sp].[rows_sampled],
[sp].[unfiltered_rows],
[sp].[modification_counter] AS “Modifications”
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'[dbo].[Address]’);
GO

From above image we can quickly see that statistics was not updated after we issued the update statement however SQL Server Query Optimizer evoke Auto Update Statistics during the query execution. Keep in mind that there is a cost for that.

Summary
Statistics can absolutely improve performance that is the reason Microsoft enabled Auto Create and Update Statistics by default. Depends on the workload, creating and updating statistics during query execution time can impact performance as well. As best practices, you should create databases maintenance job to update statistics regularly, Manually create statistics based on critical queries patterns. Do not rely on Updated column from statistics header information to believe that statistics are updated.

Published by Jean Joseph

Jean Joseph is a Database, Big Data, Data Warehouse Platform, Data Pipeline, Database Architecture Solutions Provider as well as a Data Engineer enthusiast among other disciplines.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: