Daily Techno Diary

SQL Server Performance Tuning

Reasons for bad performance:

1. Insufficient and inaccurate indexes.

2. Inaccurate statistics

3. Improper query design

4. Poorly generated execution plan

5. Excessive blocking and deadlocks

6. Non set based operations like – Cursors

7. Inappropriate database design

8. Recompiling execution plans

9. Frequent recompilation of queries

Excessive Index fragmentation.

Indexes:

Indexes are used by queries to find data from tables quickly.
Indexes are created on tables and views.
Index on a table or a view is very similar to an index that we find in a book.
The existence of the right indexes can drastically improve the performance of the query. If there is no index to help the query, then the query engine checks every row in the table from the beginning to the end. This is called Table Scan. Table scan is bad for performance.

Types of indexes in SQL Server

1. Clustered

2. Nonclustered

3. Unique

4. Filtered

5. XML

6. Full Text

7. Spatial

8. Columnstore

9. Index with included columns

10. Index on computed columns

Clustered Index:

A clustered index determines the physical order of data in a table. For this reason, a table can have only one clustered index.
Primary key constraints create clustered indexes automatically if no clustered index already exists on the table.
SQL server doesn't allow us to create more than one clustered index per table.

Non Clustered Index:

A nonclustered index is analogous to an index in a textbook. The data is stored in one place, the index in another place.
The index will have pointers to the storage location of the data. Since, the nonclustered index is stored separately from the actual data, a table can have more than one non clustered index, just like how a book can have an index by Chapters at the beginning and another index by common terms at the end.
In the index itself, the data is stored in an ascending or descending order of the index key, which doesn't in any way influence the storage of data in the table.

Clustered vs Non Clustered Index:

Only one clustered index per table, whereas you can have more than one non clustered index
Clustered index is faster than a non-clustered index, because the non-clustered index must refer back to the table, if the selected column is not present in the index.
Clustered index determines the storage order of rows in the table, and hence doesn't require additional disk space, but whereas a Non Clustered index is stored separately from the table, additional storage space is required.

Index Storage:

In order to understand the index storage first we should understand how data is physically stored in SQL Server.
At logical level data is stored in the rows and columns.
At physical level data is stored in data pages. A data page is the fundamental unit of data storage in SQL Server and it is 8KB in size. When we insert any data into a SQL Server database table, it saves that data to a series of 8 KB data pages.
Data is stored in a series of data pages in a tree-like structure called the B-Tree.
The nodes that are at bottom of the tree are called data pages or leaf nodes of the tree that contain the date. The size of the data page is 8KB. The number of rows that are stored in each data page really depends on the size of each row.
The node at the top of the tree is called the Root node( index row).
The nodes between the root node and the leaf nodes are called intermediate levels( index rows).
Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf node.
Tree like structure has a series of pointers that helps the database engine to find the data quickly.

By default, the index will be stored in the same filegroup as the underlying table on which the index is created.

If you design a Non-Clustered index to be stored in a data file different from the underlying table data file and located in a separate disk drive, or horizontally partition the index to span multiple filegroups, the performance of the queries that are reading from the index will be improved, due to the I/O performance enhancement resulted from hitting on different data files and disk drives at the same time.

Fill factor:

How much space is reserved for empty space?

The initial storage of the index can be also optimized by setting the Fill factor option to a value different from the default value of 0 or 100.
FILLFACTOR is the value that determines the percentage of space on each leaf-level page to be filled with data.
Setting the FILLFACTOR to 90% when you create or rebuild an index, SQL Server will try to leave 10% of each leaf page empty, reserving the remainder on each page as free space for future growth to prevent the page splitting and index fragmentation performance problems.
Page splitting is a CPU consuming task for database engines.

Pad index:

Fillfactor applies to the bottom layer.
PAD_INDEX applies to all layers. This means PAD_INDEX is only useful if FILLFACTOR is set.
ON: The % of free space that is specified by fillfactor is applied to the intermediate-level pages of the index.
OFF: The intermediate-level pages are filled to near capacity

https://stackoverflow.com/questions/6857007/what-is-the-purpose-of-pad-index-in-this-sql-server-constraint/53013749

Sort in tempdb:

When it is On, we direct the SQL server database engine to use tempdb to store the intermediate sort results that are used to build the index.
Although this option increases the amount of temporary disk space that is used to create an index, the option could reduce the time that is required to create or rebuild an index when tempdb is on a set of disks different from that of the user database.
Using sort in tempdb increases your IO load drastically, since instead of Writing the index one time to the datafile, you now write it one time to the tempdb and then write it to the data file. So that is not always ideal.

https://dba.stackexchange.com/questions/108704/when-to-use-sort-in-tempdb-when-rebuilding-indexes

Locks:

Allow row locks: Determines whether row locks are allowed to access the index data.
Allow page locks: Determines whether page locks are allowed to access the index data.

General:

Auto compute statistics:

It determines if the outdated index distribution statistics will be recomputed automatically.
ON: Out-of-date statistics are not automatically recomputed.

Ignore duplicate values:

Specifies if an error message will be shown when duplicate key values are inserted into the unique index.

Operations:

Maximum degree of parallelism:

It is used to limit the number of processors used in a parallel plan execution. The maximum is 64 processors.

Optimize for Sequential key:

Unique Index:

The unique index is used to enforce the uniqueness of values and primary key constraint.
Uniqueness is a property of an Index, and both clustered and non-clustered indexes can be unique.
when we add a unique constraint, a unique index gets created behind the scenes.
By default, a primary key constraint creates a unique clustered index, whereas a unique constraint creates a unique nonclustered index.
A unique constraint or a unique index cannot be created on an existing table, if the table contains duplicate values in the key columns.

Filtered Index:

A filtered index is a non clustered index along with where condition.
Due to where condition, indexing will be on portions of records.
It will improve the query performance.
It will reduce the disk storage space.
It can be used only in non clustered indexes.

Example - if we have a table with 10,000 records and we want to retrieve 1000 records. We can do this in three ways.

Select query on table without index.
Select query on table with index created on table.
Select query on table with filtered index created on t table.

https://www.sqlshack.com/introduction-to-sql-server-filtered-indexes/

Full Text Indexes:

In most cases, we use clustered and non-clustered indexes to help a query go faster, but these kinds of indexes cannot be used for fast text lookup. For instance, a LIKE operator will lead SQL Server to scan the whole table in order to pick up values that meet the expression next to this operator. This means it won’t be fast in every case, even if an index is created for a considered column.

Microsoft SQL Server comes up with an answer to part of this issue with a Full-Text Search feature.

Full-text indexes can be created on LOB datatype columns like TEXT, VARCHAR(MAX), IMAGE, VARBINARY(MAX) (it can also index CHAR and VARCHAR column types). Without this functionality any query that referenced a column defined with a LOB data type would require a full scan.

A full-text index breaks the column into tokens and these tokens make up the index data. Before you can create a full-text index you must create a full text catalog and this catalog is where the full-text index data is stored. A full-text catalog can contain many indexes but a full-text index can only be part of one catalog. Also, only one full-text index can be created on each table.

For running full text queries we use either

Full text predicates CONTAINS and FREETEXT or
Full text functions CONTAINSTABLE and FREETEXTTABLE

Full text predicates are used in where clause of the query and returns true or false whereas full text functions return tables and hence are used in from clause of the query.

Example:

https://www.mssqltips.com/sqlservertutorial/9136/sql-server-full-text-indexes/

https://www.sqlshack.com/hands-full-text-search-sql-server/

Columnstore Indexes:

By default, SQL Server stores data logically in the tables as rows and columns, which appear in the result grid while retrieving data from any table and physically in the disk in the row-store format inside the data pages.

A new data store mechanism introduced in SQL Server 2012, based on xVelocity in-memory technology, in which the data is stored in the column-store data format. This data store mechanism is called the Columnstore index.

The standard row-store index is useful when searching for a particular value on small tables and small ranges of values, which is suitable for the transactional workload. But for analytical workloads like the data warehousing and business intelligence (BI) in which the queries will perform full scans for huge amounts of data on very large tables such as fact tables, the Columnstore index is better here.

In the Columnstore index, the data is organized as individual columns that form together the index structure.

Each column is located in its own highly compressed data container, called a Segment. This segment will contain values from this column only, and for large tables it will be distributed to multiple segments, as the segment can contain only one million rows which is called a Rowgroup. The segment consists of one or multiple pages. The data will be transferred between the memory and the disk as segments.

A columnstore index improves the performance of the data warehousing queries up to 10x the normal execution speed.

Index with included columns:

If all the columns requested in the select command in SQL query are available in non clustered index, then the index is called covering index as the query engine doesn’t have to lookup the table again.

An index with included columns can greatly improve query performance because all columns in the query are included in the index; The query optimizer can locate all column values within the index without accessing table or clustered index resulting in fewer disk I/O operations.

https://www.sqlservertutorial.net/sql-server-indexes/sql-server-indexes-with-included-columns/

Index on Computed Column:

Computed Column:

A computed column is a virtual column whose value is calculated from other values in the table.

By default, the expression’s output value is not physically stored. Instead, SQL Server runs the expression when the column is queried and returns the value as part of the result set.

Computed columns provide a handy tool for building expressions into table definitions.

Here total_cost is a computed column.

In many cases, computed columns put too much burden on the processor, resulting in slower queries and unresponsive applications.

Fortunately, SQL Server provides several strategies for improving computed column performance.

We can create

persisted computed columns,
index the computed columns,
or do both.

Persisted Computed Column:

To create a persisted computed column, we need to add the PERSISTED keyword to your column definition.

Persisted computed columns are used to address performance issues. With this approach, the expression is precalculated and its outputed values stored with the rest of the table data, consuming more space from the table. We can see the space used by the table with the following system stored procedure.

Sp_spaceused ‘table_name’

Looking at the other side of the performance equation, calculating the values of the virtual Computed Column at the runtime will have an extra performance overhead.

If we don’t create the persisted table then an extra step is performed on the table to calculate the computed column.

Querying the table with computed columns will take more time than querying the table with persisted computed columns.

Index on computed columns:

We use create index command to make index on computed column

https://www.sqlshack.com/how-to-create-indexes-on-sql-server-computed-columns/

Daily Techno Diary

Wednesday, March 9, 2022

Views

Wednesday, February 16, 2022

Knowledge Sharing

Popular Posts