Home » RDBMS Server » Performance Tuning » How to approach this
How to approach this [message #219139] Tue, 13 February 2007 02:47 Go to next message
laiko
Messages: 10
Registered: February 2007
Junior Member
Say, I have a TableA which has 20M rows and has a foreign key that references TableB. Let’s call this column as B_ID. TableB has 500000 records.

What would be the best approach so that I can join TableA and TableB on column B_ID? Should I index TableA.BID to avoid full scan? If so, what kind of index? Please note that I cannot tell upfront the cardinality of TableA.B_ID since it can grow unpredictably, so making it as Bitmap indexed is just a guess.

[Updated on: Tue, 13 February 2007 02:47]

Report message to a moderator

Re: How to approach this [message #219253 is a reply to message #219139] Tue, 13 February 2007 12:34 Go to previous messageGo to next message
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
In general yes, you want indexes on your foreign key columns. But like everything else it depends. Is this 20M row table transactional, or in a more DSS environment with regular loads? Can it be disabled and then loaded into and then rebuilt (if so, a bitmap may be more appropriate).

But you should be able to test both kinds of indexes to find out which is better, if one should be there at all.

And keep in mind that indexes may help on retrievals, but will hurt on data manipulation. So it depends if the benefit outweighs the cost in your environment.

Your 20M row table is a likely candidate for table partitioning.
Re: How to approach this [message #219288 is a reply to message #219253] Tue, 13 February 2007 20:03 Go to previous message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
If you want to join every row (or a significant subset) in the 20M row table then an index is (probably) not going to help; if you read every row in the table, why would you want to read every row in the index as well? If you only want to join a small subset of the rows then index the column that will restrict the subset.

If you do want to join every row in both tables, you will run up against a problem with Hash Joins whereby it starts spilling to disk if the smaller table exceeds your Hash Area Size. When this happens, BOTH tables need to be partitioned in Temp space, so the performance implications for a 20M row table are huge.

There are 4 ways to avoid this problem.
  1. Fast: If you only reference a few columns in one of the tables, create an index that includes the join key as the first column(s) and also includes any other filter or selected columns. This permits a Nested Loops join without a table lookup, and 20M table lookups are VERY costly.
  2. Fairly Fast: partition both tables on the join key, with equivalent partitions in each table. This is equivalent to the partitioning operation performed by Hash Joins when the Hash Area Size is too small. If each partition does not exceed the Hash Area Size then the join will be pretty efficient.
  3. Very Fast: If appropriate, the smaller table may be rebuilt in a single-table Hash Cluster, clustered on the join key. This also permits a nested loops join, except the 20M lookups are single-block hash-probes into the cluster. This is functionally equivalent to a hash join with a limitless Hash Area Size. There are lots of reasons NOT to hash cluster a table. Make sure understand them by reading the Concepts and Performance Tuning manuals before you attempt this. Also speak to the DBA. Unless you really understand your data, I would definitely avoid this method.
  4. Warp-speed: The absolute fastest way to join two tables - bar none - is a cluster join. This is where both tables reside in the same cluster (either index cluster or hash). This is only appropriate when the two tables are inseperable: any SQL that accesses one will almost surely access the other. A good example is INVOICE and INVOICE_LINE. A bad example is INVOICE and CUSTOMER. There are even more reasons NOT to cluster multiple tables. Avoid this method unless you really understand clusters and your data.


Ross Leishman
Previous Topic: Load on Oracle server
Next Topic: Data file init write
Goto Forum:
  


Current Time: Thu May 16 20:48:36 CDT 2024