This is the second (and last) post in a pico-series about using Oracle’s data dictionary as a transparent excuse to post something written in Groovy.
Last time we looked at dependencies; today, at indices. The primary intent is to add semi-automated schema checks to the development lifecycle, particularly for larger schemas that have grown organically over time. With the aid of a script we’ll flag indices that may require a closer look because of their performance or applicability. There is no illusion of replacing a DBA or an intimate knowledge of the application; rather, it’s a tool to serve both.
Before the details obscure the big picture, a necessary word of caution: like any statistics, the results are only as good as the underlying data. Running the script against a small database when the production data set will be 1000 times larger is not going to provide meaningful indications in some cases.
Checks like those covered below are most likely present in any respectable performance tool. However, since the supporting data is the same and it’s available for free, targeted checks are easier to implement with a few lines of code, leaving the commercial tools for full-blown performance analyses.
Background (feel free to skip)
Indices work by reducing the amount of I/O required to fetch the data. Although programmers tend to think in terms of retrieving rows, the unit of I/O in Oracle is not a row but a block (with many rows in a block, or a row spanning several blocks, depending on the block size and row size).
Using an index is not a free lunch, nor does it automatically bring performance improvements. Maintaining the index itself adds time to INSERTs. Even for the primary use case, SELECTs, if the queries are not properly written the index may not be used. There is much material written about tuning SQL queries, so I won’t dwell on it here. Even when the indices are used, there is a less obvious factor to consider: the I/O speed of a full table scan versus that of an indexed read.
When Oracle is performing a full table scan, it reads multiple blocks at once. How many is a matter of configuration (the db_file_multiblock_read_count parameter). To give a sense of the numbers, on my development machine, the block size is 8K and db_file_multiblock_read_count is 16.
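To put rough numbers on that difference, here is a back-of-the-envelope comparison using the configuration above (the table size and rows-visited figures are my own illustrative assumptions; real I/O counts also depend on caching and the optimizer’s costing):

```groovy
int blocks = 10000               // table size in blocks (an assumed example)
int multiblockReadCount = 16     // db_file_multiblock_read_count from above

// A full table scan reads up to multiblockReadCount blocks per I/O request:
int fullScanReads = (int) Math.ceil(blocks / (double) multiblockReadCount)

// A poorly clustered indexed read can cost roughly one single-block read
// per row visited; fetching 5000 scattered rows means about 5000 reads:
int indexedReads = 5000

println "full scan: $fullScanReads reads, indexed: $indexedReads reads"
assert fullScanReads < indexedReads   // here the full scan wins
```

With these numbers the full scan needs 625 multiblock reads, an order of magnitude fewer I/O requests than the indexed path.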
An indexed read, however, works with single-block reads, going back and forth between the index and the table: fetch the index entry, then fetch the block it points to (the picture is a bit simplified). Go back to the index for the next value, fetch the next block. If the table is not well ordered with regard to the index, reading subsequent values from the latter will hit a totally different block each time (Oracle keeps track of how spread out such reads would be in the clustering_factor column for an index). Even if the table data is ordered, if the index doesn’t identify a small subset of the data but points to many blocks, going through the index becomes counter-productive (a million records with only 4 possible values for a field would be a good example); a full table scan may be faster. It’s not always this straightforward: if a query only uses indexed fields, there is no need to visit the table at all and the index will still help, even if it is not selective.
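The “small subset of data” intuition above is what the selectivity check later in the post measures. A minimal sketch (the 0.01 threshold is my own illustrative default, not necessarily the script’s):

```groovy
// selectivity = distinct keys in the index / rows in the table
double selectivity(long distinctKeys, long numRows) {
    numRows == 0 ? 0 : distinctKeys / (double) numRows
}

def threshold = 0.01   // tunable; an assumed default for illustration

// A million-row table with only 4 distinct values is a poor b-tree candidate:
assert selectivity(4, 1000000) < threshold

// A nearly unique column is a good one:
assert selectivity(999000, 1000000) > threshold
```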
I’ll stop repeating what’s found in any Oracle documentation. One note: the script is written with regular b-tree indices in mind (function-based indices are also supported). The results are less applicable to other types, such as bitmap indices.
The raw data
The script collects information from several dictionary views (USER_INDEXES, USER_TABLES and USER_IND_COLUMNS) as well as some system parameters from V$PARAMETER, for which it requires elevated privileges. The latter are displayed for informational purposes so it’s possible to remove that part if SYSDBA privileges are not available.
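The kind of query involved looks roughly like this (the column names are the standard dictionary columns, but the join and aliases are my own illustration; the attached script may shape it differently):

```groovy
// Joins per-index statistics with per-table statistics.
// USER_INDEXES: INDEX_NAME, TABLE_NAME, DISTINCT_KEYS, UNIQUENESS, NUM_ROWS
// USER_TABLES:  TABLE_NAME, NUM_ROWS, BLOCKS
def query = '''
    select i.index_name, i.table_name,
           i.distinct_keys, i.uniqueness,
           t.num_rows, t.blocks
      from user_indexes i
      join user_tables t on t.table_name = i.table_name
'''

// Against a live database it would be consumed with groovy.sql.Sql
// (the connection details below are placeholders):
// def sql = groovy.sql.Sql.newInstance(url, user, pass, 'oracle.jdbc.OracleDriver')
// def rows = sql.rows(query)
println query
```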
It’s easy to fall into a “more statistics is better” trap and generate pages of reports. However, the purpose of the script is not to replace a commercial tool but to provide an aid to developers – the script is intended to “catch” possible problems rather than fill binders. The attached version picks a few low-hanging fruit – more can be easily added.
- Low-selectivity Indices: flagged when selectivity (= # of distinct keys in the index / # of rows in the table) falls below a threshold that can be set in the script.
- Small Tables Which Are Also Indexed: tables occupying a number of blocks comparable with db_file_multiblock_read_count and for which a full table scan would be very fast anyway.
- Possibly Unique Indices: indices not marked as UNIQUE but having unique values. Perhaps the UNIQUE constraint was accidentally omitted.
- Possibly Overindexed Tables: will list the tables with more than (by default) 5 indices. This is less of a judgement on the indexing strategy and more a way to catch indices that are obsolete and should have been dropped.
- A caveat: just because an index exists does not mean Oracle will use it. The optimizer can and will use the same metadata as the script, and it may decide to go for a full table scan anyway. Nonetheless, it is useful to get a sense of whether the index is selective at all.
- Remember the caveat about having relevant data: in production, a small table may no longer be small, and a possibly unique index may turn out not to be unique.
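Once the dictionary rows are in memory, most of these checks are one-liners with findAll. A sketch over simulated rows (the thresholds and map keys are illustrative, not necessarily the attached script’s):

```groovy
def multiblockReadCount = 16

// Each map stands in for one row of the index/table join:
def rows = [
    [index: 'IDX_STATUS', table: 'ORDERS', distinctKeys: 4L,  numRows: 1000000L, blocks: 20000L, unique: false],
    [index: 'IDX_CODE',   table: 'LOOKUP', distinctKeys: 50L, numRows: 50L,      blocks: 4L,     unique: false],
]

// Low selectivity: few distinct keys relative to the row count.
def lowSelectivity = rows.findAll { it.numRows > 0 && it.distinctKeys / (double) it.numRows < 0.01 }

// Small tables: a full scan costs about one multiblock read anyway.
def smallTables = rows.findAll { it.blocks <= multiblockReadCount }

// Possibly unique: every key distinct, yet not declared UNIQUE.
def maybeUnique = rows.findAll { !it.unique && it.distinctKeys == it.numRows }

// Possibly overindexed: more than 5 indices on one table.
def overIndexed = rows.countBy { it.table }.findAll { it.value > 5 }*.key

assert lowSelectivity*.index == ['IDX_STATUS']
assert smallTables*.index    == ['IDX_CODE']
assert maybeUnique*.index    == ['IDX_CODE']
assert overIndexed           == []
```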
Of course, there are more checks possible. I left out checking blevel and clustering_factor because they lean more toward the day-to-day administration of the database and are less likely to be useful during development.
The source (tested with Oracle 10g) is attached. Before running it, edit it and set the username/password for the user whose indices are evaluated, as well as the password for ‘sys’. The latter is used only to read db_file_multiblock_read_count and db_block_size.
To invoke it, adjust the classpath in the line below to point to your Oracle driver:
groovy --classpath C:\oracle\product\10.1.0\Db_2\jdbc\lib\ojdbc14.jar IndexAnalyzer
An HTML report named index-analysis.html is created in the current directory.
The source should still be quite readable for those less familiar with Groovy. Apart from some syntactic sugar, the code makes use of the very convenient Collection methods such as findAll() and collect(). The report generation relies on the very compact builder pattern (markup nodes are generated dynamically by invoking pretend methods).
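To give a flavour of that builder pattern, here is a sketch of what such report generation looks like (the markup and data are my own illustration, not the attached script’s exact output):

```groovy
import groovy.xml.MarkupBuilder

// Illustrative finding; the real script feeds in the dictionary query results.
def findings = [[index: 'IDX_STATUS', table: 'ORDERS', selectivity: '0.000004']]

def writer = new StringWriter()
new MarkupBuilder(writer).html {
    body {
        // h2, table, tr, td are not real methods: MarkupBuilder intercepts
        // each call dynamically and emits a tag of the same name.
        h2 'Low-selectivity indices'
        table {
            findings.each { f ->
                tr {
                    td f.index
                    td f.table
                    td f.selectivity
                }
            }
        }
    }
}
println writer
```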