This blog will demonstrates how
to delete duplicate rows by using SQL Row_Number() function in sql server.
Here is a Student table which
have few duplicate records (in yellow color)
ID
|
Code
|
Address
|
Created Date
|
1
|
CN01
|
Memphis, TN
|
7/1/2017
|
2
|
CN02
|
Nashville, TN
|
7/9/2017
|
3
|
CN03
|
Cordova, TN
|
7/10/2017
|
4
|
CN01
|
Memphis, TN
|
7/11/2017
|
First we need to decide, which
row we are going to keep and which one we are going to delete.
Here is an example to delete all
duplicate records except the oldest record.
Step 1: Get duplicate
student records – here is a sub query which group by code and select only those
code which are having more than 1 records (duplicate records)
Select Code, count(Code) [Ct] from vr.Student group by Code
having count(Code) > 1
Step 2: Generate Row Number by
using SQL ROW_NUMBER() function for each duplicate records
With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) y
)
Select * from CTE
Output:
ID
|
Code
|
Address
|
Created Date
|
RowRank
|
1
|
CN01
|
Memphis, TN
|
7/1/2017
|
1
|
4
|
CN01
|
Memphis, TN
|
7/11/2017
|
2
|
In above output, the oldest duplicate
row has low row rank and now we can delete all duplicate records except the oldest
record (row rank = 1)
Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)
Here is a complete SQL Query
With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
)
)
Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)
No comments:
Post a Comment