Friday, June 19, 2015

Window funcitons in SQL

Syntax of Window functions:

<<WindowFunction>>
OVER
(
PARTITION BY <<List of columns>>
ORDER BY <<List of columns>> <<ASC|DESC>>
ROWS|RANGE ---This is optional and default is RANGE
)

<<WindowFunciton>> specifies which window function we need to use
OVER clause require three items,
a. Partitioning criteria
b. Ordering of the data Ascending or Descending
c. ROWS or RANGE, its an optional filter and default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

We have many window functions in SQL:
Window Function
Description
ROW_NUMBER
It will give consecutive numbers to the specified partition starting from 1
RANK
It will give consecutive numbers to DISTINCT values for the specified partition starting from 1 and it will skip the numbers in between. You can clearly understand this with the below example
DENSE_RANK
It will give consecutive numbers to DISTINCT values for the specified partition starting from 1 and it won't skip the numbers in between.
NTILE(N)
It will divide the given partition into 'N' number of parts
FIRST_VALE
It will give the first value in given partitioning
LAST_VALUE
It will give the last value in given partitioning
LEAD
It will give the next value in given partition and NULL value if no next value is present
LAG
It will give the previous value in given partition and NULL value if no previous value is present

Let's create a table to understand these functions:
CREATE TABLE dbo.DistrictSales
(
SalesID INT NOT NULL IDENTITY(1, 1),
SalesGroup NVARCHAR(30) NOT NULL,
DistrictName NVARCHAR(30) NOT NULL,
SalesAmount INT NOT NULL
);
GO
INSERT INTO dbo.DistrictSales
(SalesGroup, DistrictName, SalesAmount)
VALUES
('Telangana', 'Karimnagar', 22000),
('Telangana', 'Hyderabad', 32000),
('Telangana', 'Warangal', 28000),
('Telangana', 'Nalgonda', 18000),
('Telangana', 'RangaReddy', 28000),
('Telangana', 'Medak', 21000),
('Telangana', 'Khammam', 18000),
('AndhraPradesh', 'Guntur', 19000),
('AndhraPradesh', 'Vizag', 22000),
('AndhraPradesh', 'Vijaywada', 18000),
('AndhraPradesh', 'Srikakulam', 16000),
('AndhraPradesh', 'Amaravathi', 16000),
('AndhraPradesh', 'Ananthpur', 32000),
('TamilNadu', 'Kanyakumari', 17000),
('TamilNadu', 'Chennai', 19000),
('TamilNadu', 'Coimbatore', 22000);
GO

SELECT
SalesGroup
,DistrictName
,SalesAmount
,ROW_NUMBER() OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS RowNUmber
,RANK() OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS BasicRank
,DENSE_RANK() OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS DenseRank
,NTILE(3) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount) AS NTil
,FIRST_VALUE(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS FirstValue
,LAST_VALUE(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue
,LEAD(SalesAmount, 1) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS NextValue
,LAG(SalesAmount, 1) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS PrevValue
FROM
dbo.DistrictSales
ORDER BY SalesGroup, SalesAmount DESC

Result:

If we see the above screen shot,
RowNumber gave sequence number to the given partition.
Rank gave sequence numbers to the given partition but the rank will be repeated if the same number is repeated in the partition and next number will be skipped. For example same number is repeated twice then it will give same rank to them and will skip next two numbers while giving the rank to the next distinct value. You can see the highlighted section for this scenario.
DenseRank is same as RANK but won't skip the number in between.
Ntile divides the given partition into specified number of groups.
Remaining functions we can easily understand by seeing the result set.

If you observe the query you will identify some difference in LAST_VALUE syntax compared to other window functions.

SELECT
SalesGroup
,DistrictName
,SalesAmount
,LAST_VALUE(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS LastValue
,LAST_VALUE(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS ActualLastValue
FROM
dbo.DistrictSales
ORDER BY SalesGroup, SalesAmount DESC
Result:

If you see the result set, in LastValue column data is changing every time where as in ActualLastValue column it is not for the specified partition.
If you remember the syntax, the default value for the partition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As per this LastValue will be calculated from the current row to previous rows available in the partition that is the reason you are seeing different values in LastValue column but not in ActualLastValue column.

For ActualLastValue column we are specifying ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING that means it will consider all the values available in that partition unlike considering the values available till the current row in that partition.

By using Window funcitons we can calculate cumilative sum/count as well as incremental sum/count till the current row.
SELECT
SalesGroup
,SalesAmount
,COUNT(SalesAmount) OVER(PARTITION BY SalesGroup) AS CumilativeCount
,COUNT(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS CountTillCurrentRow_Default
,COUNT(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CountTillCurrentRow
,SUM(SalesAmount) OVER(PARTITION BY SalesGroup) AS CumilativeSum
,SUM(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC) AS SumTIllCurrentRow_Default
,SUM(SalesAmount) OVER(PARTITION BY SalesGroup ORDER BY SalesAmount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumTIllCurrentRow
FROM
dbo.DistrictSales
Result:





Labels: , , , , , , , ,

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home