Home » RDBMS Server » Performance Tuning » In Vs Like
In Vs Like [message #157833] Mon, 06 February 2006 08:56 Go to next message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Dear All,

Env: Oracle 9.2.0.6 (Linux 3AS)
TAB1 = 2.5 M rows
TAB2 = 200 rows

The two SQLs are as follows

=========
SQL 1
=========
SELECT * FROM TAB1
WHERE TAB1_col1 IN (SELECT TAB2_col1
FROM TAB2
WHERE TAB2_col1 IN ('YY'))

Operation Object Name Rows Bytes Cost

SELECT STATEMENT Optimizer Mode=CHOOSE 14 K 488
NESTED LOOPS 14 K 499 K 488
INDEX UNIQUE SCAN PK_TAB2 1 4
TABLE ACCESS FULL TAB1 14 K 442 K 487

======
SQL 2
======
SELECT * FROM TAB1
WHERE TAB1_col1 IN (SELECT TAB2_col1
FROM TAB2
WHERE TAB2_col1 LIKE ('YY'))

Operation Object Name Rows Bytes Cost
SELECT STATEMENT Optimizer Mode=CHOOSE 102 31
NESTED LOOPS 102 3 K 31
INDEX UNIQUE SCAN PK_TAB2 1 4
TABLE ACCESS BY INDEX ROWID TAB1 102 3 K 30
INDEX RANGE SCAN FK_TAB1 102 21

Schema stats are gathered everyday once. Can anyone help me understandthe reason for why would first execution plan do a full table scan and second sql wont?

Many Thanks
Regards
Re: In Vs Like [message #157838 is a reply to message #157833] Mon, 06 February 2006 09:49 Go to previous messageGo to next message
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
I would think that neither one of those plans are going to be good. How exactly did you gather statistics? You probably want a hash join not the nested loop.

Why aren't you using an = sign to compare to the YY code? LIKE and IN are two very different operators. IN is for a list of separate items, a shortcut for OR. But for that to make sense, you need to have separate values you are comparing against.

Like is for a comparison against a single thing, but it is intended for when you want to use things like the wildcard % sign to match against things which are not necessarily equal, but similar to something else.

In your case though it looks like you just want to use = 'YY'.
Re: In Vs Like [message #157856 is a reply to message #157838] Mon, 06 February 2006 14:05 Go to previous messageGo to next message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Thanks for your response.

I agree with your comments. I should have specified that I was using 'YY' just to represent set of values and not just one. So ideally I should be using 'IN' but since it was doing full table scan (2.5m rows) I tried experimenting hence used 'LIKE'. And was bit surprised to see that I can avoid a full table scan if I use LIKE instead of IN (atleast for just one value). Since LIKE wont help me as I might have multiple values in the value set, I was hoping, if someone can suggest me, is there anything I can do to avoid full table scan and keep using IN.

Regarding your other query, I collect stats using the statement

DBMS_STATS.GATHER_SCHEMA_STATS('SCHEMA NAME');

which is scheduled to be executed once everyday.

Once again thanks, Looking forward to the response.
Re: In Vs Like [message #157950 is a reply to message #157833] Tue, 07 February 2006 08:02 Go to previous messageGo to next message
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
Oh, I get you. Well the problem isn't necessarily that you are full table scanning a 2.5 million row table. The problem is that you are doing the FTS 200 times, once for each row in the outer portion of your nested loop. So while one FTS may be fine, 200 of them likely isn't.

Your memory and instance and cbo parameters may have something to do with this, as to why the CBO is choosing nested loops instead of hash joins. I would be randomly guessing though to try to throw things out. Suggest you read / skim through the sticky in this forum and the 10gR1 performance tuning guide (even though you are running 9.2, the 10g guide is very good).

Also for stats, you probably should get statistics on your indexes, and capture histograms. The sticky gives the command to do this for a single table at a time, you can lookup the gather_schema_stats function to see how to do that for a whole schema.

Give the CBO as much info as you can about your data and about your goals in the query. That also includes having all necessary constraints, foreign key, primary key, not null, check, etc.
Re: In Vs Like [message #158177 is a reply to message #157833] Wed, 08 February 2006 17:03 Go to previous messageGo to next message
chetwyn
Messages: 73
Registered: December 2005
Member
What does this give you???

SELECT *
  FROM tab1
 WHERE EXISTS (SELECT 1
                 FROM tab2
                WHERE tab1col = tab2col AND tabscol = 'YY')
Re: In Vs Like [message #158216 is a reply to message #158177] Thu, 09 February 2006 03:31 Go to previous messageGo to next message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Thanks for the responses.

I changed the way I collect the stats as following,

exec DBMS_Stats.Gather_Schema_Stats(
ownname => 'SCHEMA NAME',
estimate_percent => dbms_stats.auto_sample_size,
method_opt=>'for all columns size auto',
cascade=>true
);

After this change, all 3 SQLs (including One from chetwyn) have same execution steps BUT with different "Rows" & "Cost".

For SQL with "IN"

Operation Object Name Rows Bytes Cost

SELECT STATEMENT Optimizer Mode=CHOOSE 16 K 512
NESTED LOOPS 16 K 573 K 512
INDEX UNIQUE SCAN PK_TAB2 1 4
TABLE ACCESS FULL TAB1 16 K 508 K 511

For SQLs with "LIKE" OR "EXISTS"

Operation Object Name Rows Bytes Cost

SELECT STATEMENT Optimizer Mode=CHOOSE 6 K 516
NESTED LOOPS 6 K 231 K 516
INDEX UNIQUE SCAN PK_TAB2 1 4
TABLE ACCESS FULL TAB1 6 K 204 K 511

It is bit difficult for me to understand the different behavior.

Also somehow I cannot get rid of the issue (as pointed out by smartin) of "NESTED LOOPS".

Appreciate your input.

Regards
MS
Re: In Vs Like [message #158225 is a reply to message #158216] Thu, 09 February 2006 04:52 Go to previous messageGo to next message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Just an update,

I issued the "analyze table compute statistics" against the two tables. The explain plan changed for SQLs using "LIKE" or "EXISTS" and it now same as I posted in my fist post (with total cost of 31).

I am totally confused now, does dbms_stats miss out something which analyze puts back?

Regards
MS
Re: In Vs Like [message #158253 is a reply to message #157833] Thu, 09 February 2006 07:46 Go to previous messageGo to next message
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
The analyze command is different from dbms_stats, but I can't explain exactly how. But one big difference is in making sure you compare apples to apples. When you used the analyze command, you likely were not estimating stats, but instead computing them for all rows. When you did the dbms_stats call, you did an estimate of size auto. I'd recommend sticking with dbms_stats, and doing a compute not an estimate of size auto.

You can also capture stats in parallel using the degree parameter to speed the statistics collection up. And while you are at it, might as well use size 254 in the method_opt instead of size auto. I'm a bit leary of size auto, especially in 9i.

But after that, you need to figure out how to turn it into a hash join if you can. Often it is because there is little or no memory allocated to the hash_area_size. Are you using pga_agg_target with workarea_size_policy set to auto?

Also, can you rewrite the query to just do a join? Would it be logically equivilent in your situation?

select tab1.*'
from tab1, tab2
where tab1.col = tab2.col
and tab2.col in ('a','b','etc');
Re: In Vs Like [message #158290 is a reply to message #158253] Thu, 09 February 2006 14:37 Go to previous messageGo to next message
rkl1
Messages: 97
Registered: June 2005
Member
I will take the suggestion of Mr.Martin and correct me please:

I will put the smaller table in cache:

Alter table small_table in cache;

and make index on the joining column of the big table and run an explain plan to confirm that index will be utilized during query execution.

However another question comes to mind what to do when there are numerous arguments to put inside the IN subquery. I think oracle has a limitation upto 1000 parameters to evaluate in IN subquery (coreect me please). One time I was advised to create a temp table of 1 column only to populate the distinct IN parameters and use join rather than go with IN since join is more preferred in index utilization.

mad scientist approach:let's say x5=big table x6=small table. they have identical column names and data types:

with view1 as (select distinct col2 from x6) select x5.* from x5,x6,view1 where x5.col1=x6.col1 and x6.col2=view1.col2;

thanks.
Re: In Vs Like [message #158371 is a reply to message #158253] Fri, 10 February 2006 04:10 Go to previous messageGo to next message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Thanks all for your input.

I deleted all table & index stats in schema (using "analyze ... delete statistics" for each table/index) and then, as suggested by Martin, changed the dbmas_stats collection statement (for 4 cpu host) as

exec DBMS_Stats.Gather_Schema_Stats(
ownname => 'SCHEMA NAME',
options => 'GATHER AUTO',
degree => 3
method_opt=>'for all columns size 254',
cascade=>true
);

With the new stats, execution plans were not really changed but cost certainly did (IN still the highest). After rewriting the query to just do a join, the explain plan is same as SQL "IN" i.e. with FTS. I think at the end of the day I have to fix DB to avoid "Nested Loops" as pointed out by Martin. Regarding the init params query,

hash_area_size = 1048576
pga_agg_target = 704643072
workarea_size_policy = AUTO

Is there anything you guys can suggest to avoid "Nested Loop"?

Once again thanks for your input, really appreciate.

Regards
MS
Re: In Vs Like [message #158514 is a reply to message #158371] Sun, 12 February 2006 17:44 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I've been encouraged to join in, so here goes...

Oracle can use an index to evaluate an IN clause. There's two IN clauses at work here: an IN sub-query, and an IN-list.

First the IN-list.
Oracle can use the INLIST ITERATOR operator to repeatedly scan the index for each element in the IN list. If CBO is choosing a FTS, then it thinks that the index will return too many rows. If there is an index on the inlist column, then it is almost certainly not unique (or the leading column of a multi-column index). Furthermore it probably has poor selectivity.
If you know something Oracle doesn't, you can probably encourage it with a simple INDEX hint.eg.
SELECT * FROM TAB1
WHERE TAB1_col1 IN (
  SELECT /*+ INDEX(tab2)*/ TAB2_col1
  FROM TAB2
  WHERE TAB2_col1 IN ('YY', 'XX'))

In your case, Oracle is actually using the index. You don't see the INLIST ITERATOR opteration in the Explain Plan because your in-list contained only one element, and Oracle transformed it to an "equals".

As for the IN sub-query:
With an uncorrelated IN sub-query, that returns few rows, Oracle can drive off the sub-query, and then fetch matching rows from the outer query. If CBO chooses not to do so, then it believes that the sub-query will return too many rows, or there is no index on the outer query IN column, or the index has poor selectivity.

In your case, it IS driving off the sub-query, but is choosing to FTS the outer-query table. It does this because it thinks that TAB1_col1 is either not indexed, or has poor selectivity.

First, ensure that TAB1.TAB1_col1 is indexed.

If it is still not using the index in the outer-query, then try:
SELECT /*+ORDERED USE_NL(tab1) INDEX(tab1)*/ * FROM TAB1
WHERE TAB1_col1 IN (
  SELECT /*+ INDEX(tab2)*/ TAB2_col1
  FROM TAB2
  WHERE tab2.TAB2_col1 IN ('YY'))


If this doesn't work, try different combinations of the three hints in the outer query by removing one or two of them.

_____________
Ross Leishman
Re: In Vs Like [message #158557 is a reply to message #157833] Mon, 13 February 2006 03:15 Go to previous messageGo to next message
chetwyn
Messages: 73
Registered: December 2005
Member
No Message Body

[Updated on: Mon, 13 February 2006 03:16]

Report message to a moderator

Re: In Vs Like [message #158610 is a reply to message #158514] Mon, 13 February 2006 07:47 Go to previous message
meguesswho
Messages: 7
Registered: February 2006
Junior Member
Thanks all for your input.

As suggested by Ross Leishman, I have used hints and it seems to use index now. He was quite correct as the column was the leading column of a multi-column index and had poor selectivity (avg_data_blocks_per_key was 253).

Here is what I did,

SELECT /*+ORDERED USE_NL(tab1) INDEX(tab1)*/ * FROM tab1 WHERE tab1_col1 IN (
SELECT /*+ INDEX(tab2)*/ tab2_col1
FROM tab2
WHERE tab2.tab2_col1 IN ('X1','X2'))


Operation Object Name Rows Bytes Cost
SELECT STATEMENT Optimizer Mode=CHOOSE 448 117
TABLE ACCESS BY INDEX ROWID tab1 224 6 K 58
NESTED LOOPS 448 15 K 117
INLIST ITERATOR
INDEX RANGE SCAN PK_tab2 2 8 1
INDEX RANGE SCAN FK_tab1 354 30


Once again I would like to thank everyone for their input.

Regards
MS
Previous Topic: plan stability:outlines
Next Topic: Delete Rows
Goto Forum:
  


Current Time: Fri Mar 29 04:20:43 CDT 2024