Indexes:
- Indexes are used by queries to find
data from tables quickly.
- Indexes are created on tables and
views.
- Index on a table or a view is very
similar to an index that we find in a book.
- The existence of the right indexes can
drastically improve the performance of the query. If there is no index to
help the query, then the query engine checks every row in the table from
the beginning to the end. This is called Table Scan. Table scan is bad for
performance.
Types of indexes in SQL Server
1. Clustered
2. Nonclustered
3. Unique
4. Filtered
5. XML
6. Full Text
7. Spatial
8. Columnstore
9. Index with included columns
10. Index on computed columns
Clustered Index:
- A clustered index determines the
physical order of data in a table. For this reason, a table can have only
one clustered index.
- Primary key constraints create
clustered indexes automatically if no clustered index already exists on
the table.
- SQL server doesn't allow us to create
more than one clustered index per table.
Non Clustered Index:
- A nonclustered index is analogous to
an index in a textbook. The data is stored in one place, the index in
another place.
- The index will have pointers to the
storage location of the data. Since, the nonclustered index is stored
separately from the actual data, a table can have more than one non
clustered index, just like how a book can have an index by Chapters at the
beginning and another index by common terms at the end.
- In the index itself, the data is
stored in an ascending or descending order of the index key, which doesn't
in any way influence the storage of data in the table.
Clustered vs Non Clustered Index:
- Only one clustered index per table,
whereas you can have more than one non clustered index
- Clustered index is faster than a
non-clustered index, because the non-clustered index must refer back to
the table, if the selected column is not present in the index.
- Clustered index determines the storage
order of rows in the table, and hence doesn't require additional disk
space, but whereas a Non Clustered index is stored separately from the
table, additional storage space is required.
Index Storage:
- In order to understand the index
storage first we should understand how data is physically stored in SQL
Server.
- At logical level data is stored in the
rows and columns.
- At physical level data is stored in
data pages. A data page is the fundamental unit of data storage in SQL
Server and it is 8KB in size. When we insert any data into a SQL Server
database table, it saves that data to a series of 8 KB data pages.
- Data is stored in a series of data
pages in a tree-like structure called the B-Tree.
- The nodes that are at bottom of the
tree are called data pages or leaf nodes of the tree that contain the
date. The size of the data page is 8KB. The number of rows that are stored
in each data page really depends on the size of each row.
- The node at the top of the tree is
called the Root node( index row).
- The nodes between the root node and
the leaf nodes are called intermediate levels( index rows).
- Each index row contains a key value
and a pointer to either an intermediate level page in the B-tree, or a data
row in the leaf node.
- Tree like structure has a series of
pointers that helps the database engine to find the data quickly.
By default, the index will be stored in the
same filegroup as the underlying table on which the index is created.
If you design a Non-Clustered index to be
stored in a data file different from the underlying table data file and located
in a separate disk drive, or horizontally partition the index to span multiple
filegroups, the performance of the queries that are reading from the index will
be improved, due to the I/O performance enhancement resulted from hitting on
different data files and disk drives at the same time.
Fill factor:
How much space is reserved
for empty space?
- The initial storage of the index can
be also optimized by setting the Fill factor option to a value different
from the default value of 0 or 100.
- FILLFACTOR is the value that
determines the percentage of space on each leaf-level page to be filled
with data.
- Setting the FILLFACTOR to 90% when you
create or rebuild an index, SQL Server will try to leave 10% of each leaf
page empty, reserving the remainder on each page as free space for future
growth to prevent the page splitting and index fragmentation performance
problems.
- Page splitting is a CPU consuming task
for database engines.
Pad index:
- Fillfactor applies to the bottom
layer.
- PAD_INDEX applies to all layers. This
means PAD_INDEX is only useful if FILLFACTOR is set.
- ON: The % of free space that is specified
by fillfactor is applied to the intermediate-level pages of the index.
- OFF: The intermediate-level pages are
filled to near capacity
https://stackoverflow.com/questions/6857007/what-is-the-purpose-of-pad-index-in-this-sql-server-constraint/53013749
Sort in tempdb:
- When it is On, we direct the SQL
server database engine to use tempdb to store the intermediate sort
results that are used to build the index.
- Although this option increases the
amount of temporary disk space that is used to create an index, the option
could reduce the time that is required to create or rebuild an index when
tempdb is on a set of disks different from that of the user
database.
- Using sort in tempdb increases your IO
load drastically, since instead of Writing the index one time to the
datafile, you now write it one time to the tempdb and then write it to the
data file. So that is not always ideal.
https://dba.stackexchange.com/questions/108704/when-to-use-sort-in-tempdb-when-rebuilding-indexes
Locks:
- Allow row locks: Determines whether row locks are allowed to access the
index data.
- Allow page locks: Determines whether page locks are
allowed to access the index data.
General:
- Auto compute statistics:
- It determines if the outdated index
distribution statistics will be recomputed automatically.
- ON: Out-of-date statistics are not
automatically recomputed.
- Ignore duplicate values:
- Specifies if
an error message will be shown when duplicate key values are inserted
into the unique index.
Operations:
- Maximum degree of parallelism:
- It is used to limit the number of
processors used in a parallel plan execution. The maximum is 64 processors.
- Optimize for Sequential key:
Unique Index:
- The unique index is used to enforce
the uniqueness of values and primary key constraint.
- Uniqueness is a property of an Index,
and both clustered and non-clustered indexes can be unique.
- when we add a unique constraint, a
unique index gets created behind the scenes.
- By default, a primary key constraint
creates a unique clustered index, whereas a unique constraint creates a
unique nonclustered index.
- A unique constraint or a unique index
cannot be created on an existing table, if the table contains duplicate
values in the key columns.
Filtered Index:
- A filtered index is a non clustered
index along with where condition.
- Due to where condition, indexing will
be on portions of records.
- It will improve the query performance.
- It will reduce the disk storage space.
- It can be used only in non clustered
indexes.
Example - if we have a table
with 10,000 records and we want to retrieve 1000 records. We can do this in
three ways.
- Select query on table without index.
- Select query on table with index
created on table.
- Select query on table with filtered
index created on t table.
https://www.sqlshack.com/introduction-to-sql-server-filtered-indexes/
Full Text Indexes:
In most cases, we use
clustered and non-clustered indexes to help a query go faster, but these kinds
of indexes cannot be used for fast text lookup. For instance, a LIKE operator
will lead SQL Server to scan the whole table in order to pick up values that
meet the expression next to this operator. This means it won’t be fast in every
case, even if an index is created for a considered column.
Microsoft SQL Server comes up with an answer to
part of this issue with a Full-Text Search feature.
Full-text indexes can be created on LOB
datatype columns like TEXT, VARCHAR(MAX), IMAGE, VARBINARY(MAX) (it can
also index CHAR and VARCHAR column types). Without this functionality any query that
referenced a column defined with a LOB data type would require a full scan.
A full-text index breaks the column into tokens and these tokens
make up the index data. Before you can create a full-text index you must create a full text catalog and this catalog is
where the full-text index data is stored. A full-text catalog can contain
many indexes but a full-text index can only be part of one catalog. Also,
only one full-text index can be created on each table.
For running full text queries we use
either
- Full text
predicates CONTAINS and FREETEXT or
- Full text functions CONTAINSTABLE and
FREETEXTTABLE
Full text predicates are used in where clause
of the query and returns true or false whereas full text functions return
tables and hence are used in from clause of the query.
Example:
https://www.mssqltips.com/sqlservertutorial/9136/sql-server-full-text-indexes/
https://www.sqlshack.com/hands-full-text-search-sql-server/
Columnstore Indexes:
By default, SQL Server stores
data logically in the tables as rows and columns, which appear in the result
grid while retrieving data from any table and physically in the disk in the
row-store format inside the data pages.
A new data store mechanism
introduced in SQL Server 2012, based on xVelocity in-memory technology, in
which the data is stored in the column-store data format. This data store
mechanism is called the Columnstore index.
The standard row-store index
is useful when searching for a particular value on small tables and small
ranges of values, which is suitable for the transactional workload. But for
analytical workloads like the data warehousing and business intelligence (BI)
in which the queries will perform full scans for huge amounts of data on very
large tables such as fact tables, the Columnstore index is better here.
In the Columnstore index, the
data is organized as individual columns that form together the index
structure.
Each column is located in its
own highly compressed data container, called a Segment. This segment will
contain values from this column only, and for large tables it will be
distributed to multiple segments, as the segment can contain only one million
rows which is called a Rowgroup. The segment consists of one or multiple pages.
The data will be transferred between the memory and the disk as segments.
A columnstore index improves
the performance of the data warehousing queries up to 10x the normal execution
speed.
Index
with included columns:
If all the columns requested in the select
command in SQL query are available in non clustered index, then the index is
called covering index as the query engine doesn’t have to lookup the table
again.
An index with included columns can greatly
improve query performance because all columns in the query are included in the
index; The query optimizer can locate all column values within the index
without accessing table or clustered index resulting in fewer disk I/O
operations.
https://www.sqlservertutorial.net/sql-server-indexes/sql-server-indexes-with-included-columns/
Index on Computed
Column:
Computed Column:
A computed column is a
virtual column whose value is calculated from other values in the table.
By default, the expression’s output value is
not physically stored. Instead, SQL Server runs the expression when the column
is queried and returns the value as part of the result set.
Computed columns provide a handy tool for
building expressions into table definitions.
Here total_cost is a computed column.
In many cases, computed columns put too much
burden on the processor, resulting in slower queries and unresponsive
applications.
Fortunately, SQL Server provides several
strategies for improving computed column performance.
We can create
- persisted computed columns,
- index the computed columns,
- or do both.
Persisted Computed Column:
To create a persisted
computed column, we need to add the PERSISTED keyword to your column
definition.
Persisted computed columns are used to address
performance issues. With this approach, the expression is precalculated and its
outputed values stored with the rest of the table data, consuming more space
from the table. We can see the space used by the table with the following
system stored procedure.
Sp_spaceused
‘table_name’
Looking at the other side of the performance
equation, calculating the values of the virtual Computed Column at the runtime
will have an extra performance overhead.
If we don’t create the persisted table then an
extra step is performed on the table to calculate the computed column.
Querying the table with computed columns will
take more time than querying the table with persisted computed columns.
Index on computed columns:
We use create index command
to make index on computed column
https://www.sqlshack.com/how-to-create-indexes-on-sql-server-computed-columns/