OraFAQ Forum: Performance Tuning » Improving a "DISTINCT" query on a column with skewed data

Home » RDBMS Server » Performance Tuning » Improving a "DISTINCT" query on a column with skewed data

Show: Today's Messages :: Polls :: Message Navigator
E-mail to friend

Improving a "DISTINCT" query on a column with skewed data [message #65778]

Wed, 22 December 2004 08:36

Shane Kaszyca
Messages: 16
Registered: May 2003

Junior Member

I am trying to perform a query for distinct values on a column in one of my tables. There are about 16000 rows in the table and about 30 distinct values total. The data distribution appears at the bottom of this message. The query is performing a full table scan. I have tried putting an index on the column but that doesn't seem to work. Basically, I am doing a SELECT DISTINCT value FROM table. Any idea how I can prevent a Full Table Scan? It's costing me 4 seconds - which I can't afford. The data is skewed to one of the values - It appears about 84 percent of the time. Regards,

Shane Kaszyca

-value-��-count-
value�1��38
value�2��113
value�3��70
value�4��436
value�5��172
value�6��48
value�7��1
value�8��1
value�9��117
value�10��1
value�11��129
value�12��13728
value�13��75
value�14��70
value�15��21
value�16��109
value�17��348
value�18��71
value�19��44
value�20��158
value�21��119
value�22��1
value�23��69
value�24��130
value�25��58
value�26��194
value�27��130
value�28��4
value�29��1
value�30��50

Report message to a moderator

Re: Improving a "DISTINCT" query on a column with skewed data [message #65779 is a reply to message #65778]

Wed, 22 December 2004 12:15

Art Metzer
Messages: 2480
Registered: December 2002

Senior Member

----------------------------------------------------------------------
SQL> SELECT��description
��2��,��qty
��3��,��LPAD(TO_CHAR(RATIO_TO_REPORT(qty)
��4��OVER () *�100
��5��,��'fm90.00')
��6��,��5)��"%AGE"
��7��FROM��(SELECT��description
��8��,��COUNT(*)��qty
��9��FROM��t
�10��GROUP BY description)
�11��ORDER BY TO_NUMBER(SUBSTR(description,INSTR(description,' ') +�1))
�12��/
�
DESCRIPTION��QTY %AGE
--------------- ---------- -----
value 1��38��0.23
value 2��113��0.69
value 3��36��0.22
value 4��436��2.67
value 5��172��1.05
value 6��48��0.29
value 7��1��0.01
value 8��1��0.01
value 9��117��0.72
value 10��2��0.01
value 11��129��0.79
value 12��13728 84.18
value 13��75��0.46
value 14��34��0.21
value 15��21��0.13
value 16��109��0.67
value 17��348��2.13
value 18��71��0.44
value 19��44��0.27
value 20��158��0.97
value 21��119��0.73
value 22��1��0.01
value 23��69��0.42
value 24��66��0.40
value 25��58��0.36
value 26��194��1.19
value 27��64��0.39
value 28��4��0.02
value 29��1��0.01
value 30��50��0.31
�
30 rows selected.
�
SQL>
----------------------------------------------------------------------
Doing this DISTINCT should not take four seconds. In my testing, I had no indexes, and I was able to get a response in about 60 milliseconds.

That said, I don't see how you can do a DISTINCT without doing a full table scan. I mean, think about it: if you're not doing a FTS on a DISTINCT operation, then the row(s) that you hadn't table-scanned might have held that additional, unique value (and you would get back a wrong answer). And full-scanning an index wouldn't give that much performance boost over full-scanning the table, especially if the table is skinny.

I ran tkprof on three approaches I could think of to this problem. Those results are listed below.

********************************************************************************
 
<font color=red>SELECT DISTINCT description
FROM
 t</font>
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.06         43         46         12          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.06       0.06         43         46         12          30
 
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 217
 
Rows     Row Source Operation
-------  ---------------------------------------------------
     30  SORT UNIQUE
  16307   TABLE ACCESS FULL T
 
********************************************************************************

********************************************************************************
 
<font color=#663366>SELECT description
FROM
 (SELECT description, COUNT(*) FROM t GROUP BY description)</font>
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.06          0         46         12          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.06       0.06          0         46         12          30
 
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 217
 
Rows     Row Source Operation
-------  ---------------------------------------------------
     30  SORT GROUP BY
  16307   TABLE ACCESS FULL T

********************************************************************************

********************************************************************************
 
<font color=blue>SELECT description
FROM
(
SELECT description
,      ROW_NUMBER()
       OVER (PARTITION BY description
             ORDER BY     NULL) r
FROM   t
)
WHERE  r = 1</font>
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.37       0.37         66         46         18          30
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.37       0.37         66         46         18          30
 
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 217
 
Rows     Row Source Operation
-------  ---------------------------------------------------
     30  VIEW
  16307   WINDOW SORT
  16307    TABLE ACCESS FULL T
 
********************************************************************************

As you can see, the first two approaches perform similarly in 0.06 seconds, but the third one is a throwaway.

What kind of tkprof results do you get using your data?

And remember, full table scans are not always evil, indexes are not always good.

Report message to a moderator

Re: Improving a "DISTINCT" query on a column with skewed data [message #65784 is a reply to message #65779]

Thu, 23 December 2004 04:29

Shane Kaszyca
Messages: 16
Registered: May 2003

Junior Member

I tried tracing, it does not appear that my trace file is being created...I looked at what could be the problem, and information says that the oracle instance must have write access to the dump destination directory. Permissions are drwxrwxrwt on the dest dir, and still nothing. Any ideas?

Shane

Report message to a moderator

Previous Topic:	Do Remote Table use Remote indexes ?
Next Topic:	procedure or function which is best related to tuning

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Thu Apr 18 20:55:14 CDT 2024