Making Use of Table Functions -- Part II

Art Trifonov's picture
articles: 

In Part I we discussed views, stored procedures, and table functions as the three options of returning record sets from the code stored in the database. In this section I will concentrate on table functions, demonstrating several kinds of solutions that utilize this feature.

Streamlining ETL

The reason I chose to discuss this use case first is not because I find it the most important but because this is the one discussed in Oracle documentation. In fact one Oracle 9i article described table functions the new ETL feature. The example discussed in Oracle documentation shows how to transform a single row of the "ticker/open_price/close_price" type into two rows of the "ticker/price_type/price" type.
If you currently perform such transformations by moving data through a series of intermediate staging tables, with table functions you can do it on the fly, all within a single step.
The performance demands of ETL environments are addressed with table functions’ optimization features like pipelining and support for parallel execution.

Common Utilities

Suppose within the application the convention is to always display names in the "Lastname, Firstname MddleInitiaial" format. For that purpose you create the function that returns:

Initcap(lName)
||', '
||Initcap(fName)
||' '
||Substr(mName,1,1)

This is the classic example demonstrating the advantage of using PL/SQL functions. You code the logic once and make it available for re-use by all applications that request names - thus making it easier to code and enforcing the standard.

Table functions can serve the same purpose when you need to return not just a single value, but a record set.
Consider the following example:
Many of your reports return various kinds of business activities summarized by calendar date. In order to include the days that had no activity you must join to the list of calendar dates. A table function returning such list would come very handy.

The example provided here demonstrates the table function that returns one row per each calendar date within any given range. We will return to this function later in this article.

Encapsulating Business Logic

In Part I advocated using a Application Access Layer implemented in the database instead of embedding this logic in applications. In this section I will demonstrate implementing this strategy using table functions.

Company ABC opens its database to affiliated company XYZ. Now, instead of relying on weekly data feeds and reports,
Company XYZ’s applications can run direct queries.

Company XYZ now needs to integrate ABC’s data into its applications. The problem is that the two companies operate under different business rules and each has its own way of recording and interpreting business data. On top of that, when several years earlier ABC migrated their database from a legacy system to Oracle they overlooked such details as proper data modeling. As a result, their data is not just different it is also messy.
A team of business analysts worked for months to define the rules to translate ABC’s data into familiar entities like Order and Order Line Item. Now it is time to develop applications that run against Company ABC’s data.
Requiring every single developer to become intimately familiar with all the peculiarities of ABC’s data may not be a practical option. One alternative is to code a library of table functions – each translating foreign data into the format that corresponds to one of the native tables. Now, the minor syntax difference is all your developers need to be aware of when querying external data.

Thus, instead of

SELECT o.Order_Number,
       o.Order_Date,
       i.Item_Name,
       i.Quantity
FROM   order o, 
       order_line_item i
WHERE  o.Order_Id = i.Order_Id 
  AND  o.Customer_Id = p_Customer_Id

they would code
SELECT o.Order_Number,
       o.Order_Date,
       i.Item_Name,
       i.Quantity
FROM   TABLE(Tfn_ABC_order(p_Customer_Id)) o,
       TABLE(Tfn_ABC_order_line_item(o.Order_Id)) i
WHERE  o.Order_Id = i.Order_Id

With all the complexities stemming from interpreting and integrating external data now hidden inside the library of table functions, it is pretty much business as usual.

Modularizing Complex SQL

One of my recent projects involved migration from Sybase and SQL Server to Oracle. One of the challenges was getting T-SQL developers to abandon their habit of relying on temporary tables in favor of Oracle SQL features like analytic functions, WITH clause, and partitioned outer joins. While many developers were impressed with these features, some complained that they make the code too complex – hard to support and debug. I think it is a fair complaint.

You can use table functions to modularize a complex SQL query in the similar way as you use other PL/SQL programs to modularize your code.

Your monthly Sales Analysis Report displays the following data:

  • Store Name,
  • Department Name,
  • Average Daily Sales
  • Average Daily Sales Rank by Store,
  • Average Daily Sales Rank by Department

In order to generate this result set you need to perform the following operations:

  • Get all sales records for the given month summarized by store code, department code, and calendar date
  • Densify the result set by appending “0 Sales” rows for each day there were no sales recorded for any given store/department combination.
  • Apply analytic functions to perform multi-dimensional statistical analysis and aggregate the results by month.
  • Translate short codes by joining the final results to reference tables.

For optimal performance all of the above operations can be executed in a single SQL statement. The down side of this approach is that your query will have to involve inline views many levels deep. Coding it would require expert SQL skills. And modifying this query for subsequent releases can be a challenge even for the original author.

Alternatively, these operations can be coded as a series of cursor loops, either residing inside the same procedure or modularized as several PL/SQL programs. That would make the code easier to understand and maintain, but it may involve serious performance penalties.

The approach outlined below allows you to combine the power of SQL with the modularity of PL/SQL. You keep the overall structure of the single query, and modularize along the lines of the inline views.

Pkg_month_sales_analysis contains three table functions:

Tfn_get_base_data executes the following SQL:

SELECT c.Calendar_Dt,
       t.Store_Code,
       t.dept_Code,
       t.Sales_Amt
FROM   (SELECT   Trunc(Transaction_Ts) AS Sale_Date,
                 Store_Code,
                 Dept_Code,
                 Sum(Sales_Amt) AS Sales_Amt
        FROM     sales
        WHERE    Transaction_Ts BETWEEN p_dt_Start AND p_dt_End
        GROUP BY Trunc(Transaction_Ts), Store_Code, Dept_Code) t
PARTITION BY (t.Store_Code, t. Dept_Code)
RIGHT OUTER JOIN
       (SELECT   Calendar_Dt
        FROM     TABLE (Tfn_date_range (p_dt_Start, p_dt_End))) c
ON (t.Sale_Date = c.Calendar_Dt)

The above query returns all detail data that needs to be analyzed. Note that it uses the table function tfn_date_rnge as one of its data sources, which returns a set of calendar dates within the specified range.

Tfn_analyze uses the result set returned by Tfn_get_base_data and applies analytic and aggregate calculations.

SELECT Store_Code,
       dept_Code,
       avg_Daily_Sales,
       Rank() OVER(PARTITION BY Store_Code ORDER BY avg_Daily_Sales) AS avg_Daily_Sales_Store_Rank,
       Rank() OVER(PARTITION BY dept_Code ORDER BY avg_Daily_Sales) AS avg_Daily_Sales_dept_Rank
FROM   (SELECT   Store_Code,
                 dept_Code,
                 Avg(Sales_Amt,0) AS avg_Daily_Sales
        FROM     TABLE(pkg_Sales_Analysis.Tfn_get_base_data(p_dt_Start,p_dt_End))
        GROUP BY Store_Code,
                 dept_Code)

Finally, tfn_funal_result looks up reference values for the records returned by tfn_analyze and returns the final result set.

SELECT st.Store_Name,
       de.dept_Name,
       ta.avg_Daily_Sales,
       ta.avg_Daily_Sales_Store_Rank,
       ta.avg_Daily_Sales_dept_Rank
FROM   TABLE(pkg_Sales_Analysis.Tfn_analyze(p_dt_Start,p_dt_End)) ta,
       store st,
       dept de
WHERE  ta.Store_Code = st.Store_Code
       AND ta.dept_Code = dw.dept_Code

Now, the complex query is broken down into three modules. Each module corresponds to a logical unit of the program. First, you locate the data to be analyzed. Next, you perform the actual analysis. Finally, you format and return the final result set. To simplify debugging each function can be called separately, returning the intermediate result set.

The performance impact of this approach vs. a single SQL statement depends on how you choose to modularize the query. If your table functions are aligned with the operations the optimizer would perform if you kept it as a single query, you can expect little or no performance loss.

Conclusion

The objective of this article was to make the case for viewing table functions not just as a special ETL feature, but also more generally as another method for returning record sets from database-stored code. As such it can be applied in many ways that go far beyond optimizing ETL transformations.