Home » RDBMS Server » Performance Tuning » Proving OTT undo reads (In post)
Proving OTT undo reads [message #528664] Wed, 26 October 2011 05:59 Go to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
Version:
mkr@PRD> select * from v$version
  2  /

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for Solaris: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production



I've had to chop/modify the output a little for linesizes, but the key detail is here, not much I could do with the xplan however.

Xplan:
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 |  Name                         | Rows  | Bytes | Cost (%CPU)|  TQ    |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                               |    31 |  7471 |   766K  (1)|        |      |            |
|   1 |  SORT GROUP BY            |                               |    31 |  7471 |   766K  (1)| 69,08  | P->S | QC (RAND)  |
|*  2 |   HASH JOIN               |                               |    31 |  7471 |   766K  (1)| 69,07  | P->P | HASH       |
|*  3 |    HASH JOIN              |                               |    31 |  7254 |   766K  (1)| 69,07  | PCWP |            |
|*  4 |     FILTER                |                               |       |       |            | 69,07  | PCWC |            |
|*  5 |      HASH JOIN OUTER      |                               |       |       |            | 69,05  | P->P | BROADCAST  |
|*  6 |       HASH JOIN           |                               |    31 |  5208 |   630K  (1)| 69,03  | P->P | HASH       |
|*  7 |        HASH JOIN          |                               |    26 |  2782 |   599K  (1)| 69,02  | P->P | BROADCAST  |
|*  8 |         HASH JOIN         |                               |    83 |  4648 |   520K  (1)| 69,01  | P->P | BROADCAST  |
|*  9 |          TABLE ACCESS FULL|    CATALOGUEPRODUCT           |     9 |   216 | 12307   (1)| 69,00  | S->P | BROADCAST  |
|* 10 |          TABLE ACCESS FULL|    PORTFOLIOPRODUCT           |  8425K|   257M|   507K  (1)| 69,01  | PCWP |            |
|* 11 |         TABLE ACCESS FULL |    SUBSCRIPTION               |  2285K|   111M| 79287   (1)| 69,02  | PCWP |            |
|* 12 |        TABLE ACCESS FULL  |    CLAWBACKITEMS              |  3356K|   195M| 30951   (1)| 69,03  | PCWP |            |
|* 13 |       INDEX FAST FULL SCAN|    PK_CLAWBACKEVENTS          |   708K|    21M| 17692   (1)| 69,04  | P->P | HASH       |
|  14 |     TABLE ACCESS FULL     |    SUBSCRIPTIONAGREEMENTITEM  |    65M|  2126M|   117K  (1)| 69,07  | PCWP |            |
|  15 |    TABLE ACCESS FULL      |    SUBSCRIPTIONAGREEMENT      |    31 |   217 |     2  (50)| 69,06  | P->P | BROADCAST  |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SAI"."SUBSCRIPTIONAGREEMENTID"="SA"."ID")
   3 - access("SAI"."SUBSCRIPTIONID"="S"."ID")
   4 - filter("CE"."EVENTCODE" IS NULL)
   5 - access("CE"."CLAWBACKID"(+)="CI"."ID")
   6 - access("CI"."SUBSCRIPTIONID"="S"."ID")
   7 - access("S"."SERVICEINSTANCEID"="PP"."SERVICEINSTANCEID")
   8 - access("PP"."CATALOGUEPRODUCTID"="CP"."ID")
   9 - filter("CP"."SUBSCRIPTIONTYPE"='4' AND ("CP"."SERVICETYPECODE"='13' OR "CP"."SERVICETYPECODE"='8'))
  10 - filter("PP"."STATUS"='EN')
  11 - filter(("S"."SUBSCRIPTIONTYPEID"='1' OR "S"."SUBSCRIPTIONTYPEID"='2') AND ("S"."STATUS"='AB' OR "S"."STATUS"='PA' OR
              "S"."STATUS"='PO' OR "S"."STATUS"='SC'))
  12 - filter("CI"."EXEMPTIONREASONCODE" IS NULL AND "CI"."EXEMPTIONDATE" IS NULL)
  13 - filter("CE"."EVENTCODE"(+)='EXEMPT')

38 rows selected.

Elapsed: 00:00:01.23



Now then, with the Xplan out of the way, the background:

The above usually runs pretty quickly (in the 30-45 minute range) but today folks let me know it was dragging on for nearly 4 hours and has downstream impacts of a slow run so I started digging and found the below:


Checking live waits:
mkr@PRD> select
  2  s.sid,
  3  --s.ROW_WAIT_OBJ#,
  4  sw.event,
  5  --s.ROW_WAIT_FILE#,
  6  --s.ROW_WAIT_BLOCK#,
  7  ts.name tsname,
  8  dbo.object_name,
  9  dbo.object_type,
 10  sw.seconds_in_wait time
 11  from v$session s
 12  join v$session_wait sw
 13  on s.sid=sw.sid
 14  join v$datafile df
 15  on s.ROW_WAIT_FILE#=df.file#
 16  join v$tablespace ts
 17  on df.ts#=ts.ts#
 18  left join dba_objects dbo
 19  on dbo.object_id = s.ROW_WAIT_OBJ#
 20  where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
 21  /
old  20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new  20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170



       SID EVENT                          TSNAME              OBJECT_NAME                  OBJECT_TYP       TIME
---------- ------------------------------ ------------------- ---------------------------- ---------- ----------
      1170 PX Deq: Execute Reply          DATA_0001_01        CATALOGUEPRODUCT             TABLE               0
       440 buffer busy waits              UNDOTBS                                                              0
      2644 buffer busy waits              UNDOTBS                                                              0
      3813 buffer busy waits              UNDOTBS                                                              0
      6187 buffer busy waits              UNDOTBS                                                              0
      6240 latch free                     UNDOTBS                                                              0

6 rows selected.
Elapsed: 00:01:25.62


mkr@PRD> /
old  20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new  20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170

       SID EVENT                          TSNAME           OBJECT_NAME                    OBJECT_TYP       TIME
---------- ------------------------------ -------------- --------------------------------- ---------- ----------
      1170 PX Deq: Execute Reply          DATA_0001_01     CATALOGUEPRODUCT               TABLE               1
       440 buffer busy waits              UNDOTBS                                                             0
      2644 db file sequential read        UNDOTBS                                                             0
      3813 buffer busy waits              UNDOTBS                                                             0
      6187 db file sequential read        UNDOTBS                                                             0
      6240 buffer busy waits              UNDOTBS                                                             0

6 rows selected.

Elapsed: 00:01:31.33



mkr@PRD> /
old  20: where s.sid in (select sid from v$px_session where qcsid=&sid) or s.sid=&sid
new  20: where s.sid in (select sid from v$px_session where qcsid=1170) or s.sid=1170

       SID EVENT                          TSNAME            OBJECT_NAME                    OBJECT_TYP       TIME
---------- ------------------------------ ----------------- ------------------------------ ---------- ----------
      1170 PX Deq: Execute Reply          DATA_0001_01       CATALOGUEPRODUCT               TABLE              1
       440 db file sequential read        UNDOTBS                                                              0
      2644 db file sequential read        UNDOTBS                                                              0
      3813 db file sequential read        UNDOTBS                                                              0
      6187 db file sequential read        UNDOTBS                                                              0
      6240 db file sequential read        UNDOTBS                                                              0

6 rows selected.

Elapsed: 00:01:43.65



Checking session events:
mkr@PRD> select * from (
select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
order by time_waited desc)
where rownum <21
/  2    3    4    5


       SID EVENT                          TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT 
---------- ------------------------------ ----------- -------------- ----------- ------------ ---------- 
      1170 PX Deq: Execute Reply                 6816           6336     1276213          187        198 
      1342 PX Deq: Table Q Normal               82667           4949     1273181           15        197 
        50 PX Deq: Table Q Normal               82491           4948     1272953           15        197 
      6159 PX Deq: Table Q Normal               82704           4948     1272878           15        197 
      1000 PX Deq: Table Q Normal               82478           4948     1272587           15        197 
       642 PX Deq: Table Q Normal               82468           4948     1272547           15        197 
      6187 db file sequential read            1768610              0     1034154            1        254 
       440 db file sequential read            1786564              0     1033784            1        302 
      3813 db file sequential read            1764778              0     1028883            1        161 
      6240 db file sequential read            1757146              0     1026620            1        104 
      2644 db file sequential read            1757390              0     1017182            1        348 
      2644 buffer busy waits                   235996              7      135121            1         98 
      6240 buffer busy waits                   228002              5      128353            1         98 
      3813 buffer busy waits                   219779              1      125131            1         98 
       440 buffer busy waits                   220574              2      122776            1         98 
      6187 buffer busy waits                   216293              1      120654            1         98 
      2644 direct path read                    142094              0       71139            1        159 
      3813 direct path read                    139584              0       70431            1        111 
      6240 direct path read                    129814              0       69788            1        187 
      6187 direct path read                    138785              0       69043            0        184 

20 rows selected.

Elapsed: 00:00:03.56



old   2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
new   2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=1170) or sid=1170

       SID EVENT                          TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT 
---------- ------------------------------ ----------- -------------- ----------- ------------ ---------- 
      1170 PX Deq: Execute Reply                 7528           7020     1412149          188        198 
      1342 PX Deq: Table Q Normal               90614           5467     1408716           16        197 
      6159 PX Deq: Table Q Normal               90652           5467     1408664           16        197 
        50 PX Deq: Table Q Normal               90429           5466     1408400           16        197 
      1000 PX Deq: Table Q Normal               90381           5466     1407847           16        197 
       642 PX Deq: Table Q Normal               90346           5466     1407661           16        197 
       440 db file sequential read            1930671              0     1116983            1        302 
      6187 db file sequential read            1903706              0     1108608            1        254 
      2644 db file sequential read            1904438              0     1100081            1        348 
      6240 db file sequential read            1877216              0     1097300            1        104 
      3813 db file sequential read            1878367              0     1094915            1        161 
      2644 buffer busy waits                   306781              7      175494            1         98 
      6240 buffer busy waits                   293760              5      165220            1         98 
      3813 buffer busy waits                   285398              1      162639            1         98 
       440 buffer busy waits                   290514              2      162497            1         98 
      6187 buffer busy waits                   268801              1      149951            1         98 
      3813 direct path read                    148608              0       72280            0        111 
      2644 direct path read                    149286              0       72159            0        159 
      6240 direct path read                    139591              0       71256            1        187 
      6187 direct path read                    143816              0       69602            0        184 

20 rows selected.

Elapsed: 00:00:03.35




Now those are samples of typical results, I took more samples than posted and saw no great variations. My theory is that those wait events are not in line with that plan, as such, and show a query spending an unacceptable* amount of time in undo reads.

There are LOTS of other jobs going on in parallel to this but a recent one has been added that starts before and ends after this one starts which does a ton of updates to a table in the query.

My questions are: Are my queries/checks on the job activity solid and if so, does it appear in the information to hand that my theory correct - that this query is spending one hell of a long time reading undo, one block at a time?

I'm relatively confident but before I'd value some input if there is anything else I can do to prove this or indeed if I'm basically just going of in the wrong direction. Is there any easy way to prove this after the fact, or do I really need to be tracing it for any measurable retrospective 'proof'?

I realise the data here may be a bit spartan, but it is all I have at the moment.

*Obviously I realise the notion of 'excessive' undo reads is a false one, but if one delays this job until the big bulk updates are done, then this query can crack on with proper direct path reads and not spend all its damned time to single block reads of undo.
Re: Proving OTT undo reads [message #528674 is a reply to message #528664] Wed, 26 October 2011 06:28 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Remember that when you do a parallel full table scan, Oracle uses direct reads. Therefore, the object must be checkpointed first. I'm sure you knew that, but I always like to mention it because a lot of programmers don't realize this. Then the parallel direct read can proceed. But if the checkpoint includes writing uncommitted changes to the datafiles, whenever the scan hits one of those blocks, it must revert to a read via cache so that a past-image can be constructed in the SGA. Then the scan can continue.

A similar effect occurs with delayed block clean-out. With 11g, if a scan hits a block that has been updated and was written to disc before the commit, the scanning process checks the undo segments to see if the transaction has committed, and then "remembers" the XID for all subsequent blocks that need clean-out, so the interruption occurs only once per transaction. I think that back in 9i, this also caused a interruption for every block.

It looks to me as though your scan is being continually interrupted for these reasons. Normally, it wouldn't be too bad because the undo segment should be in memory: I'm surprised at undo having to be read from disc. If this hypothesis is correct, it could result from your new job that is running during this query, particularly if the transaction(s) is so long lasting that the undo blocks are being flushed to disc.

Basically, parallel FTS with concurrent DML is often disastrous. I've come across this with Exadata, where smart scans can be crippled by it.

I may be wrong, but I think your analysis of the situation is probably correct and I've just tried to formalize the issue.

Hope this helps.

edit: typo

[Updated on: Wed, 26 October 2011 06:29]

Report message to a moderator

Re: Proving OTT undo reads [message #528679 is a reply to message #528674] Wed, 26 October 2011 07:01 Go to previous messageGo to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
It does yes Smile

I should have added that I'm 99% sure that the job I'm suspicious of causing the issue is using looping commits (for restartability I believe, not my job)

I forgot to add the longops was showing a single table read for these events - all of the above was over a single object.


It is a massively busy OLTP server ('batch' jobs and OLTP are bad bedmates, I know but I can only fight one battle at a time!) which may explain the undo ending up in disc more than you'd expect.
Re: Proving OTT undo reads [message #528713 is a reply to message #528679] Wed, 26 October 2011 08:50 Go to previous message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
By means of follow up - the job had been killed and restarted and is looking significantly more like I'd expect:

old   2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=&sid) or sid=&sid
new   2: select * from v$session_event where sid in (select sid from v$px_session where qcsid=4423) or sid=4423

       SID EVENT                          TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT 
---------- ------------------------------ ----------- -------------- ----------- ------------ ---------- 
      4423 PX Deq: Execute Reply                  646            121       59270           92        195 
      2378 PX Deq: Table Q Normal               82871              0       54230            1         79 
       125 PX Deq: Table Q Normal               82428              0       53309            1         43 
      1473 PX Deq: Table Q Normal               81328              0       52952            1         77 
      2882 PX Deq: Table Q Normal               81731              0       52721            1         77 
      4520 PX Deq: Table Q Normal               81333              0       52709            1         57 
       383 direct path read                    141188              0       43855            0         68 
       279 direct path read                    149906              0       42747            0        112 
      3856 direct path read                    145727              0       42249            0         86 
      1168 direct path read                    149655              0       42178            0         44 
      2571 direct path read                    145821              0       41773            0         58 
      3856 PX Deq: Table Q Normal                  51             24        6340          124        196 
      2571 PX Deq: Table Q Normal                  50             24        6340          127        195 
      1168 PX Deq: Table Q Normal                  50             24        6297          126        195 
       383 PX Deq: Table Q Normal                  47             24        6240          133        196 
       279 PX Deq: Table Q Normal                  48             24        6239          130        196 
      2378 PX Deq: Execution Msg                   30             24        4863          162        195 
      4520 PX Deq: Execution Msg                   29             23        4773          165        195 
      1473 PX Deq: Execution Msg                   29             23        4729          163        195 
       125 PX Deq: Execution Msg                   27             22        4479          166        195 



Stats pack also shows undotbs reads are down 50% from the earlier problem window as a whole (600k reads per window down from 1.2m).

Now then...to find the root cause of the transaction(s) pushing it into the undo reads...
Previous Topic: Performance Issue after Bulk Insertion
Next Topic: Are Stale STATS Any Good
Goto Forum:
  


Current Time: Fri Mar 29 07:43:05 CDT 2024