0
SQL Performance Tuning using Indexes
Effective indexes are one of the
best ways to improve performance in a database application. Without an index,
the SQL Server engine is like a reader trying to find a word in a book by
examining each page. By using the index in the back of a book, a reader can
complete the task in a much shorter time. In database terms, a table scan
happens when there is no index available to help a query. In a table scan SQL
Server examines every row in the table to satisfy the query results. Table
scans are sometimes unavoidable, but on large tables, scans have a terrific
impact on performance.
One of the most important jobs
for the database is finding the best index to use when generating an execution
plan. Most major databases ship with tools to show you execution plans for a
query and help in optimizing and tuning indexes. This article outlines several
good rules of thumb to apply when creating and modifying indexes for your
database. First, let’s cover the scenarios where indexes help performance, and
when indexes can hurt performance.
Useful Index Queries
Just like the reader searching
for a word in a book, an index helps when you are looking for a specific record
or set of records with a WHERE clause. This includes queries looking for a
range of values, queries designed to match a specific value, and queries
performing a join on two tables. For example, both of the queries against the
Northwind database below will benefit from an index on the UnitPrice column.
DELETE FROM Products WHERE UnitPrice = 1
SELECT * FROM PRODUCTS
WHERE UnitPrice BETWEEN 14 AND 16
Since index entries are stored in
sorted order, indexes also help when processing ORDER BY clauses. Without an
index the database has to load the records and sort them during execution. An
index on UnitPrice will allow the database to process the following query by
simply scanning the index and fetching rows as they are referenced. To order
the records in descending order, the database can simply scan the index in
reverse.
SELECT * FROM Products ORDER BY UnitPrice ASC
Grouping records with a GROUP BY
clause will often require sorting, so a UnitPrice index will also help the
following query to count the number of products at each price.
SELECT Count(*), UnitPrice FROM Products
GROUP BY UnitPrice
By retrieving the records in
sorted order through the UnitPrice index, the database sees matching prices
appear in consecutive index entries, and can easily keep a count of products at
each price. Indexes are also useful for maintaining unique values in a column,
since the database can easily search the index to see if an incoming value
already exists. Primary keys are always indexed for this reason.
Index Drawbacks
Indexes are a performance drag
when the time comes to modify records. Any time a query modifies the data in a
table the indexes on the data must change also. Achieving the right number of
indexes will require testing and monitoring of your database to see where the
best balance lies. Static systems, where databases are used heavily for
reporting, can afford more indexes to support the read only queries. A database
with a heavy number of transactions to modify data will need fewer indexes to
allow for higher throughput. Indexes also use disk space. The exact size will
depends on the number of records in the table as well as the number and size of
the columns in the index. Generally this is not a major concern as disk space
is easy to trade for better performance.
Building the Best Index
There are a number of guidelines
to building the most effective indexes for your application. From the columns
you select to the data values inside them, consider the following points when
selecting the indexes for your tables.
Short Keys
Having short index is beneficial
for two reasons. First, database work is inherently disk intensive. Larger
index keys will cause the database to perform more disk reads, which limits
throughput. Secondly, since index entries are often involved in comparisons,
smaller entries are easier to compare. A single integer column makes the
absolute best index key because an integer is small and easy for the database
to compare. Character strings, on the other hand, require a character by
character comparison and attention to collation settings.
Distinct Keys
The most effective indexes are
the indexes with a small percentage of duplicated values. As an analogy, think
of a phone book for a town where almost everyone has the last name of Smith. A
phone book in this town is not very useful if sorted in order of last name,
because you can only discount a small number of records when you are looking
for a Smith.
An index with a high percentage
of unique values is a selective index. Obviously, a unique index is highly
selective since there are no duplicate entries. Many databases will track
statistics about each index so they know how selective each index is. The
database uses these statistics when generating an execution plan for a query.
Covering Queries
Indexes generally contain only
the data values for the columns they index and a pointer back to the row with
the rest of the data. This is similar to the index in a book: the index
contains only the key word and then a page reference you can turn to for the rest
of the information. Generally the database will have to follow pointers from an
index back to a row to gather all the information required for a query.
However, if the index contains all of the columns needed for a query, the
database can save a disk read by not returning to the table for more
information.
Take the index on UnitPrice we
discussed earlier. The database could use just the index entries to satisfy the
following query.
SELECT Count(*), UnitPrice FROM Products
GROUP BY UnitPrice
We call these types of queries
covered queries, because all of the columns requested in the output are covered
by a single index. For your most crucial queries, you might consider creating a
covering index to give the query the best performance possible. Such an index would
probably be a composite index (using more than one column), which appears to go
against our first guideline of keeping index entries as short as possible.
Obviously this is another tradeoff you can only evaluate with performance
testing and monitoring.
Clustered Indexes (IOT in oracle )
Many databases have one special
index per table where all of the data from a row exists in the index. SQL
Server calls this index a clustered index. Instead of an index at the back of a
book, a clustered index is closer in similarity to a phone book because each
index entry contains all the information you need, there are no references to
follow to pick up additional data values.
As a general rule of thumb, every
non-trivial table should have a clustered index. If you only create one index
for a table, make the index a clustered index. In SQL Server, creating a
primary key will automatically create a clustered index (if none exists) using
the primary key column as the index key. Clustered indexes are the most
effective indexes (when used, they always cover a query), and in many databases
systems will help the database efficiently manage the space required to store
the table.
When choosing the column or
columns for a clustered index, be careful to choose a column with static data.
If you modify a record and change the value of a column in a clustered index,
the database might need to move the index entry (to keep the entries in sorted
order). Remember, index entries for a clustered index contain all of the column
values, so moving an entry is comparable to executing a DELETE statement
followed by an INSERT, which can obviously cause performance problems if done
often. For this reason, clustered indexes are often found on primary or foreign
key columns. Key values will rarely, if ever, change.
Conclusion
Determining the correct indexes
to use in a database requires careful analysis, benchmarking, and testing. The
rules of thumb presented in this article are general guidelines. After applying
these principals you need to retest your specific application in your specific
environment of hardware, memory, and concurrent activity
0Awesome Comments!