Great article on DENSE_RANK and its application using PS.
http://fugoconsulting.wordpress.com/2010/04/19/the-power-of-rank/
The power of RANK
by sundarmk on April 19, 2010
Talk PeopleSoft and you can’t avoid talking about the omni-present, omni-potent “Effective Date”. Ah! What a convenient way to store active and historic records, all in one place? Often you end up developing SQLs that self-join an effective dated table to reach out to the most active record. But, when an effective dated table grows, grows and grows, you soon start having a “King Kong-rampaging-the-streets-of-New-York” scene in your database. Pages take forever to load, sometimes even time-out. Processes never seem to end, so you can’t generate those interface files you need to send across to up/down stream systems. Heck…you can’t even generate that simple Employee listing report and keep your manager happy.
A typical effective-dated query on JOB table looks like:
SELECT J.EMPLID, J.EMPL_RCD, J.EFFDT, J.EFFSEQ……….
FROM PS_JOB J
WHERE J.EFFDT = (SELECT MAX(EFFDT) FROM PS_JOB
WHERE EMPLID = J.EMPLID
AND EMPL_RCD = J.EMPL_RCD
AND EFFDT <= SYSDATE)
AND J.EFFSEQ = (SELECT MAX(EFFSEQ) FROM PS_JOB
WHERE EMPLID = J.EMPLID
AND EMPL_RCD = J.EMPL_RCD
AND EFFDT = J.EFFDT)
Each row for an employee is matched up against all of their other rows to determine the latest effective-dated row. In other words, if the employee has three rows in all, you access the three rows once in the main query and three times again for each row in the subquery. Thats 3+(3×3) = 12 reads against the database for one employee.
If an employee has about 100 rows of data (a common scenario in organizations that have been running on PeopleSoft for over a decade), thats a whopping 10,100 – 100 + (100 x 100) – reads against the database. That’s just for one employee! Many other mammoth-sized tables, for example the Global Payroll Accumulator table GP_RSLT_ACUM, use effective dates and suffer similar performance issues.
Oracle’s LAST analytic function offers a better way to handle this problem. The LAST function with a few other friends like KEEP, DENSE_RANK and OVER, forms a great team that delivers results! Without further ado, here’s the LAST way of doing the effective dated sub-query.
SELECT J.EMPLID, J.EMPL_RCD, J.EFFDT, J.EFFSEQ
FROM (
SELECT EMPLID, EMPL_RCD, EFFDT, EFFSEQ
MAX(EFFDT) KEEP (DENSE_RANK LAST ORDER BY EFFDT, EFFSEQ)
OVER (PARTITION BY EMPLID, EMPL_RCD) MAX_EFFDT
MAX(EFFSEQ) KEEP (DENSE_RANK LAST ORDER BY EFFDT, EFFSEQ)
OVER (PARTITION BY EMPLID, EMPL_RCD) MAX_EFFSEQ
FROM PS_JOB
) J
WHERE J.EFFDT = J.MAX_EFFDT
AND J.EFFSEQ = J.MAX_EFFSEQ
All the fire power is packed in a single inline query, in which you select all the columns you need plus two other pseudo columns – MAX_EFFDT and MAX_EFFSEQ.
The pseudo-column is composed of:
1. An aggregate function, which could be any of MAX, MIN, SUM, etc., which aggregates on rows ranked by LAST.
2. DENSE_RANK LAST – Handpicks only those rows that rank LAST after ordering the dataset by EFFDT and EFFSEQ.
3. OVER – Indicates how to group the dataset, in this case, EMPLID and EMPL_RCD.
The inline query returns as many rows as an employee has on the PS_JOB table. The main query doesn’t have to do much here. It does a simple join on the pseudo-columns to throw the result back in a jiffy. There have been cases when the LAST function performed hundreds of times faster compared to the old-fashioned sub-query.
You can use the LAST function, virtually anywhere you would use a standard sub-query. Some of the other tables that I have had tremendous luck with LAST function are PSTREENODE and PS_GP_RSLT_ACUM.
This is a really good discussion! Thanks for the viewpoints.http://www.accumulatorsandfilters.com
ReplyDelete