Python Program to Find and Remove Duplicate Lines in a Large Text File

To find and remove duplicate lines in a large text file, we can follow these steps:

  1. Open the text file in read mode:
with open("content.txt", "r") as f:
  2. Read the lines of the file and store them in a list:
linelist = f.readlines()
  3. Create a temporary list to store the unique lines:
R = []
  4. Iterate through the lines in linelist and append each line to the temporary list only if it is not already there:
for line in linelist:
    if line not in R:
        R.append(line)
  5. Open the text file in write mode and write the unique lines from the temporary list back to the file:
with open("content.txt", "w") as f:
    for line in R:
        f.write(line)

Together, these steps form the complete Python program to find and remove duplicate lines in a text file: it first reads the lines of the file into a list, removes the duplicates from that list, and finally writes the unique lines back to the file.

Here is the complete code for the Python Program:

# Opening the Text File in Read Mode
with open("content.txt", "r") as f:
    linelist = f.readlines()

# Temporary list to hold the unique lines
R = []

# Iterating through the lines and checking for duplicates
for line in linelist:
    if line not in R:
        R.append(line)

# Writing Unique Lines in Text File
with open("content.txt", "w") as f:
    for line in R:
        f.write(line)
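
One caveat for genuinely large files: readlines() loads the entire file into memory, and the "line not in R" check rescans the list for every line, which slows down as the number of unique lines grows. A common alternative, sketched below, is to stream the file line by line and track seen lines in a set, which gives constant-time membership checks on average. The file and temporary file names here are placeholders, and note that the set still holds one copy of each unique line.

import os

# Set of lines seen so far; membership tests on a set are O(1) on average,
# unlike the O(n) scan of a list
seen = set()

# Stream the source line by line and write each line only the first time
# it appears ("content.tmp" is a placeholder name for the temporary file)
with open("content.txt", "r") as src, open("content.tmp", "w") as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)

# Replace the original file with the deduplicated copy
os.replace("content.tmp", "content.txt")

Like the list version, this keeps the first occurrence of each line and treats two lines as duplicates only if they match exactly, including any trailing whitespace.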