Statistics are vital for query performance. In this post, I want to show you how query performance changes when automatic create or update statistics occurs during query execution. Remember, the Query Optimizer in SQL Server is driven by statistical analysis of the data your queries touch, and by default, SQL Server creates and updates statistics automatically for your databases.
Note: the auto create and auto update statistics options rely on the query predicate. A predicate is a conditional expression that, combined with the logical operators AND and OR, makes up the set of conditions in a WHERE, HAVING, or ON clause.
What is auto_create_statistics?
When the auto_create_statistics option is enabled, before executing a query the SQL Server Query Optimizer first checks whether all the columns in the query predicate already have statistics, then creates statistics for any column that does not have them yet. Keep in mind, auto create statistics is not free.
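Both options are set at the database level. A quick sketch of enabling and verifying them (the database name DemoStatistics is the one used later in this post):

```sql
-- Enable automatic creation and updating of statistics.
ALTER DATABASE DemoStatistics SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE DemoStatistics SET AUTO_UPDATE_STATISTICS ON;

-- Verify the current settings.
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases
WHERE name = 'DemoStatistics';
```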
First, let's set up our environment. The scripts can be found here.
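For readers who cannot follow the link, here is a minimal sketch of what the setup might look like; the table definition (dbo.Address with a City column) is an assumption for illustration:

```sql
CREATE DATABASE DemoStatistics;
GO
USE DemoStatistics;
GO
CREATE TABLE dbo.Address
(
    AddressID INT IDENTITY(1,1) PRIMARY KEY,
    City      VARCHAR(50) NOT NULL
);
GO
-- Insert 120 sample rows.
INSERT INTO dbo.Address (City)
SELECT TOP (120)
       'City' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10))
FROM sys.all_objects;
GO
```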
Our environment is ready: 120 rows have been inserted into the address table, and auto create and auto update statistics are enabled. Now let's check whether any statistics have been created.
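One way to check is to query the catalog views, assuming the table name from the setup sketch above:

```sql
-- List statistics on the Address table (none are expected yet).
SELECT s.name AS stats_name,
       c.name AS column_name,
       s.auto_created
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
  ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
  ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('dbo.Address');
```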
From the image above, we can clearly see that no statistics have been created. Remember, the SQL Server Query Optimizer will invoke auto create statistics at query execution time whenever there is a conditional expression within the query. Let's flush the plan cache and buffer cache for the DemoStatistics database, then SET STATISTICS IO and TIME ON. This is a tiny table that has only 120 rows in it. Let's query the address table with a conditional expression and analyze the CPU time, elapsed time, logical and physical reads, and scan count.
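The steps above can be sketched as follows; the specific predicate value is an assumption, and the DBCC commands should only ever be run on a test system, never in production:

```sql
USE DemoStatistics;
GO
-- Clear cached plans and clean buffer pages (test systems only!).
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- The predicate on City triggers auto create statistics on first execution.
SELECT *
FROM dbo.Address
WHERE City = 'City42';
```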
Let's verify whether statistics were created.
One row is returned in the image above, which means the Query Optimizer invoked auto create statistics at query execution time.
Now, let's flush the buffer cache, execute the same query again, and analyze the cost.
The second execution is much cheaper than the first. The reason is that during the first execution, the predicate within the query forced the SQL Server Query Optimizer to check for statistics, create them because they did not exist, and use that statistical information to build an optimized, cost-effective execution plan before proceeding with the query execution.
For the first query in the images above, CPU time was 21ms and elapsed time was 344ms, whereas for the second query CPU time was 3ms and elapsed time was 4ms. Both have the same logical reads, physical reads, and scan count. If auto create statistics can impact performance like that for a tiny table with only 120 rows, how much impact would it have if our environment were large?
What is auto_update_statistics?
When the auto_update_statistics option is enabled, the SQL Server Query Optimizer automatically updates statistics when they are outdated. Statistics are considered outdated when the column modification counter exceeds 500 plus 20% of the number of rows in the table (for small tables of 500 rows or fewer, the threshold is simply 500 modifications). Remember, there is a cost for this as well; it is not free.
Let's insert more rows into the address table, run a SELECT COUNT(*) against it, and then check whether the statistics are up to date or outdated. We can clearly see from the image below that the statistics are outdated after the insert completed; the Modifications column is greater than zero, correct?
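The modification counter can be inspected with the sys.dm_db_stats_properties dynamic management function, again assuming the dbo.Address table name from earlier:

```sql
-- Show row count and pending modifications for each statistic on Address.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Address');
```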
Now, let us query the data again and save the statistics IO and time output to compare later, after the statistics are updated. Remember, statistics are considered outdated when the column modification counter exceeds 500 plus 20% of the number of rows in the table.
The Modifications column in the image below shows that the statistics were updated, meaning the SQL Server Query Optimizer invoked auto update statistics, used the fresh statistical information to build an optimized, cost-effective execution plan, and then proceeded with the query execution.
Query the same set of data from the address table again, and compare the statistics IO and time with the execution where the statistics were outdated.
When the statistics were outdated, the first execution's CPU time was 1971ms and elapsed time was 1971ms, whereas after the statistics were updated, the second query's CPU time was 3ms and elapsed time was 4ms. Both returned the same number of rows and had the same logical reads, physical reads, and scan count.
Auto create or update statistics can reduce performance because they are resource-intensive operations that will impact other transactions. Similarly, if the application executes a bulk insert or update operation, updating statistics can impact performance.
The question is: should we keep auto create and auto update statistics enabled? I would say it depends; however, I would prefer to keep them enabled and, in addition, have a database maintenance job that updates only the statistics whose Modifications column is greater than zero, running during less critical business hours.
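A minimal sketch of such a maintenance job, built on sys.dm_db_stats_properties, could look like this; filtering and scheduling details would of course depend on your environment:

```sql
-- Build and run UPDATE STATISTICS only for statistics with pending modifications.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += N'UPDATE STATISTICS '
            + QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.' + QUOTENAME(o.name)
            + N' ' + QUOTENAME(s.name) + N';' + CHAR(13)
FROM sys.stats AS s
JOIN sys.objects AS o
  ON s.object_id = o.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE o.is_ms_shipped = 0
  AND sp.modification_counter > 0;

EXEC sys.sp_executesql @sql;
```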