Python Program to Find and Remove Duplicate Lines in a Large Text File

To find and remove duplicate lines in a large text file, we can follow these steps:

  1. Open the text file in read mode:
with open("content.txt", "r") as f:
  2. Read the lines of the file and store them in a list:
linelist = f.readlines()
  3. Create a temporary list to store the unique lines:
R = []
  4. Iterate through the lines in linelist and append each line to the temporary list only if it is not already there:
for line in linelist:
    if line not in R:
        R.append(line)
  5. Open the text file in write mode and write the unique lines from the temporary list back to the file:
with open("content.txt", "w") as f:
    for line in R:
        f.write(line)

Together, these steps form the complete Python program to find and remove duplicate lines in a text file: it first reads the lines of the file into a list, removes the duplicates from that list, and finally writes the unique lines back to the file.

Here is the complete code for the Python Program:

# Opening the Text File in Read Mode
with open("content.txt", "r") as f:
    linelist = f.readlines()

# Temporary list to hold the unique lines
R = []

# Iterating through the lines and checking for duplicates
for line in linelist:
    if line not in R:
        R.append(line)

# Writing Unique Lines in Text File
with open("content.txt", "w") as f:
    for line in R:
        f.write(line)
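
One caveat for genuinely large files: readlines() loads the entire file into memory, and the "line not in R" check rescans the list for every line, which slows down as the number of unique lines grows. A common alternative, sketched below, is to stream the file line by line and track seen lines in a set, which gives constant-time membership checks on average. The file and temporary file names here are placeholders, and note that the set still holds one copy of each unique line.

import os

# Set of lines seen so far; membership tests on a set are O(1) on average,
# unlike the O(n) scan of a list
seen = set()

# Stream the source line by line and write each line only the first time
# it appears ("content.tmp" is a placeholder name for the temporary file)
with open("content.txt", "r") as src, open("content.tmp", "w") as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)

# Replace the original file with the deduplicated copy
os.replace("content.tmp", "content.txt")

Like the list version, this keeps the first occurrence of each line and treats two lines as duplicates only if they match exactly, including any trailing whitespace.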