Monday, July 24, 2017

SQL - Delete Duplicate Rows

This blog will demonstrates how to delete duplicate rows by using SQL Row_Number() function in sql server.

Here is a Student table which have few duplicate records (in yellow color)

ID	Code	Address	Created Date
1	CN01	Memphis, TN	7/1/2017
2	CN02	Nashville, TN	7/9/2017
3	CN03	Cordova, TN	7/10/2017
4	CN01	Memphis, TN	7/11/2017

First we need to decide, which row we are going to keep and which one we are going to delete.

Here is an example to delete all duplicate records except the oldest record.

Step 1: Get duplicate student records – here is a sub query which group by code and select only those code which are having more than 1 records (duplicate records)

Select Code, count(Code) [Ct] from vr.Student group by Code

having count(Code) > 1

Step 2: Generate Row Number by using SQL ROW_NUMBER() function for each duplicate records

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

) y

)

Select * from CTE

Output:

ID	Code	Address	Created Date	RowRank
1	CN01	Memphis, TN	7/1/2017	1
4	CN01	Memphis, TN	7/11/2017	2

In above output, the oldest duplicate row has low row rank and now we can delete all duplicate records except the oldest record (row rank = 1)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)

Here is a complete SQL Query

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)

Blog Header Menu

Monday, July 24, 2017

SQL - Delete Duplicate Rows

No comments:

SQL Server - Identify unused indexes