Wednesday, October 28, 2015

Dates in Range

Posing the Question

Arturo was after a solution encompassing more than a single month. Following is the question he put to me:
"What if you had two different ranges?
   beg_date        end_date
   01-Jan-2013     15-Jan-2013
   15-Mar-2013     02-Apr-2013
How do you generate individual rows for each day between 01/01 to 01/15, and between 03/15 to 04/02?"
An excellent question! It forces me to generalize the solution from my previous article, which is specific to days in a single month, so that it covers any date range, even one spanning many months.

Getting Started

Following is my starting point, the solution from my prior article. It's a query that returns one row for each day in the current month.
SELECT LEVEL, LAST_DAY(SYSDATE)-LEVEL+1
FROM dual
WHERE ROWNUM <= EXTRACT(DAY FROM LAST_DAY(SYSDATE))
CONNECT BY LEVEL=ROWNUM;
The query makes creative use of LEVEL and CONNECT BY to form a row generator that returns incrementing values beginning with 1. Following are some aspects to consider:
  • The LEVEL column counts upward toward the number of days in the month.
  • The starting date is implicitly the first of the month.
  • The EXTRACT expression generates a value marking the end of the range.
The solution logic is based around integers returned by LEVEL and EXTRACT that represent day numbers. That logic limits the solution to generating rows for only a single month.
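For comparison, a widely used form of the same row generator puts the limit directly in the CONNECT BY clause and does without ROWNUM. The following is only a minimal sketch, not the query from the prior article; the limit of 5 and the alias n are arbitrary choices for illustration:

```sql
-- Row generator: one row per LEVEL value, counting 1 through 5.
-- The limit (5) and the alias (n) are illustrative only.
SELECT LEVEL AS n
FROM dual
CONNECT BY LEVEL <= 5;
```

Replacing the constant with an expression for the number of days wanted turns this into a date generator.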

Generalizing the Solution

My first need is to generalize the query by making it possible to specify any two dates to denote the range of interest. Before doing that, I want to get rid of ROWNUM from the WHERE clause and use LEVEL instead. It may be lack of coffee, but my use of ROWNUM is making it harder for me to think clearly about the query and how it works. The fix is a trivial, one-word change to the WHERE clause:
SELECT LEVEL, LAST_DAY(SYSDATE)-LEVEL+1
FROM dual
WHERE LEVEL <= EXTRACT(DAY FROM LAST_DAY(SYSDATE))
CONNECT BY LEVEL=ROWNUM;
Now I can get on with solving the problem at hand. I want to specify beginning and ending dates explicitly. There are two date ranges, so I'll create and initialize four bind variables in SQL*Plus:
VARIABLE beg1_date CHAR(11);
VARIABLE end1_date CHAR(11);
VARIABLE beg2_date CHAR(11);
VARIABLE end2_date CHAR(11);

EXECUTE :beg1_date := '1-Jan-2013'; 
EXECUTE :end1_date := '15-Jan-2013';
EXECUTE :beg2_date := '15-Mar-2013';
EXECUTE :end2_date := '2-Apr-2013';
My solution query shown earlier generates dates in reverse order, beginning from the end of the month as given by LAST_DAY(SYSDATE). It's an easy modification to reference end2_date as the end of the range. Just specify TO_DATE(:end2_date) instead. For example:
From: SELECT LEVEL, LAST_DAY(SYSDATE)-LEVEL+1
To: SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
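One caveat: TO_DATE with no format mask converts the string according to the session's NLS_DATE_FORMAT setting, so the bind values must match that format. Here is a hedged sketch of two ways to make the conversion independent of session settings. The format mask and the language parameter are my assumptions about what suits strings such as '02-Apr-2013'; the second query uses an ANSI date literal instead of a string conversion.

```sql
-- Explicit format mask and date language: no reliance on session NLS settings.
SELECT TO_DATE('02-Apr-2013', 'DD-Mon-YYYY', 'NLS_DATE_LANGUAGE = American') AS end2
FROM dual;

-- ANSI date literal: always written as YYYY-MM-DD.
SELECT DATE '2013-04-02' AS end2
FROM dual;
```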
I'll simplify the WHERE clause to restrict output to ten rows. Let's see how things go:
SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
FROM dual
WHERE LEVEL <= 10
CONNECT BY LEVEL=ROWNUM;

     LEVEL TO_DATE(:END2_DATE
---------- ------------------
         1 02-APR-13
         2 01-APR-13
         3 31-MAR-13
         4 30-MAR-13
         5 29-MAR-13
         6 28-MAR-13
         7 27-MAR-13
         8 26-MAR-13
         9 25-MAR-13
        10 24-MAR-13
These results are promising. Next I can modify the WHERE clause to terminate execution at the beginning of the range. I'll specify beg1_date, because I want the query to generate a wide enough range of dates to encompass both the desired, smaller ranges. Here's the query and its results:
SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
FROM dual
WHERE TO_DATE(:beg1_date) <= TO_DATE(:end2_date)-LEVEL+1
CONNECT BY LEVEL=ROWNUM;

     LEVEL TO_DATE(:END2_DATE
---------- ------------------
         1 02-APR-13
         2 01-APR-13
         3 31-MAR-13
...
        90 03-JAN-13
        91 02-JAN-13
        92 01-JAN-13
The query is now generating an extended range of dates covering both the smaller ranges that I'm after.
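If descending order is not wanted, the same extended range can be built upward from the beginning date, with the number of rows bounded by the length of the range. The following is a sketch assuming the endpoints are known in advance; it uses DATE literals in place of the bind variables, and the alias day_of_range is simply a name chosen for illustration:

```sql
-- 92 rows: 01-Jan-2013 through 02-Apr-2013.
-- Subtracting two dates yields the number of days between them (91 here),
-- so adding 1 makes the range inclusive of both endpoints.
SELECT DATE '2013-01-01' + LEVEL - 1 AS day_of_range
FROM dual
CONNECT BY LEVEL <= DATE '2013-04-02' - DATE '2013-01-01' + 1
ORDER BY day_of_range;
```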

Punching a Hole

All that's left is to "punch a hole" in the results by getting rid of the unwanted rows. Here is my first attempt, which is to add two predicates to the WHERE clause:
SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
FROM dual
WHERE TO_DATE(:beg1_date) <= TO_DATE(:end2_date)-LEVEL+1
  AND NOT (
        TO_DATE(:end2_date)-LEVEL+1 > TO_DATE(:end1_date)
    AND TO_DATE(:end2_date)-LEVEL+1 < TO_DATE(:beg2_date)
    )
CONNECT BY LEVEL=ROWNUM;

     LEVEL TO_DATE(:END2_DATE
---------- ------------------
         1 02-APR-13
         2 01-APR-13
         3 31-MAR-13
...
        17 17-MAR-13
        18 16-MAR-13
        19 15-MAR-13
Bummer! The CONNECT BY logic falls into the hole and terminates. Results are good until the desired gap is encountered. Then query execution ends.

Playing the Crazy Card

What's needed is a second WHERE clause. The HAVING clause can serve that purpose if you group by all the columns. The result is a fun query that you should probably not ever use in a production setting:
SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
FROM dual
WHERE TO_DATE(:beg1_date) <= TO_DATE(:end2_date)-LEVEL+1
CONNECT BY LEVEL=ROWNUM
GROUP BY LEVEL, TO_DATE(:end2_date)-LEVEL+1
HAVING NOT (
        TO_DATE(:end2_date)-LEVEL+1 > TO_DATE(:end1_date)
    AND TO_DATE(:end2_date)-LEVEL+1 < TO_DATE(:beg2_date)
    )
ORDER BY LEVEL;

     LEVEL TO_DATE(:END2_DATE
---------- ------------------
         1 02-APR-13
         2 01-APR-13
         3 31-MAR-13
...
        17 17-MAR-13
        18 16-MAR-13
        19 15-MAR-13
        78 15-JAN-13
        79 14-JAN-13
        80 13-JAN-13
...
        90 03-JAN-13
        91 02-JAN-13
        92 01-JAN-13
The ORDER BY clause I've added this time is irrelevant to the logic. It merely undoes the effects of the grouping operation so the results are displayed in a comprehensible order.

Finding Some Sanity

While clever, the preceding approach is pretty awful. The grouping operation is likely to force a partial sorting of the data. It surely must be better to avoid the GROUP BY processing. To that end, take a query and subquery approach:
SELECT day_of_range
FROM (
    SELECT LEVEL AS range_counter,
           TO_DATE(:end2_date)-LEVEL+1 AS day_of_range
    FROM dual
    WHERE TO_DATE(:beg1_date) <= TO_DATE(:end2_date)-LEVEL+1
    CONNECT BY LEVEL=ROWNUM
)
WHERE NOT (
        day_of_range > TO_DATE(:end1_date)
    AND day_of_range < TO_DATE(:beg2_date)
    );
The inner query generates the extended range of dates. The outer query's WHERE clause punches the desired hole in that range. The results are dates covering just the two, smaller ranges.
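The hole can also be expressed positively, by naming the ranges to keep rather than the gap to discard. This is a sketch of an equivalent outer filter, assuming the same four bind variables and the same inner row generator as in the query above:

```sql
-- Keep a day only if it falls inside one of the two wanted ranges.
SELECT day_of_range
FROM (
    SELECT TO_DATE(:end2_date)-LEVEL+1 AS day_of_range
    FROM dual
    WHERE TO_DATE(:beg1_date) <= TO_DATE(:end2_date)-LEVEL+1
    CONNECT BY LEVEL=ROWNUM
)
WHERE day_of_range BETWEEN TO_DATE(:beg1_date) AND TO_DATE(:end1_date)
   OR day_of_range BETWEEN TO_DATE(:beg2_date) AND TO_DATE(:end2_date);
```

With the sample values, this should leave 34 rows: 15 days in January plus 19 days from 15-Mar through 02-Apr. This form extends more naturally to a third or fourth range, since each one adds a single BETWEEN predicate.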

Pondering an Alternative

If the desired gap is large enough, then you may prefer to attack the problem as a union of two queries. Be sure to use the UNION ALL operator to avoid the overhead of unneeded elimination of duplicate rows; there won't be any. For example:
SELECT LEVEL, TO_DATE(:end2_date)-LEVEL+1
FROM dual
WHERE TO_DATE(:beg2_date) <= TO_DATE(:end2_date)-LEVEL+1
CONNECT BY LEVEL=ROWNUM
UNION ALL
SELECT LEVEL, TO_DATE(:end1_date)-LEVEL+1
FROM dual
WHERE TO_DATE(:beg1_date) <= TO_DATE(:end1_date)-LEVEL+1
CONNECT BY LEVEL=ROWNUM;
The first query in this union generates the latter range. The second query generates the earlier range.
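Arturo's question shows the ranges as rows with beg_date and end_date columns, which suggests one more variation: join the ranges to a list of day offsets. The sketch below is only one possible way to do that. The ranges and offsets subqueries and their aliases are hypothetical names, and the fixed limit of 366 assumes that no single range is longer than a year.

```sql
-- "ranges" is a hypothetical stand-in for a table of date ranges.
WITH ranges AS (
    SELECT DATE '2013-01-01' AS beg_date, DATE '2013-01-15' AS end_date FROM dual
    UNION ALL
    SELECT DATE '2013-03-15', DATE '2013-04-02' FROM dual
),
-- Offsets 0 through 365; assumes no single range is longer than 366 days.
offsets AS (
    SELECT LEVEL - 1 AS day_offset
    FROM dual
    CONNECT BY LEVEL <= 366
)
SELECT r.beg_date + o.day_offset AS day_of_range
FROM ranges r
JOIN offsets o
  ON o.day_offset <= r.end_date - r.beg_date
ORDER BY day_of_range;
```

Each range is paired only with the offsets that keep the generated date on or before its end_date, so the gap between ranges is never generated in the first place.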
