Python CSV Splitters

On my GitHub here, there are now a few Python scripts for splitting up CSV files. I’ve had to work with some huge 60-70GB CSV files from some source systems, and the little dev environment doesn’t have the capacity to load them, so I knocked up a few scripts to split them into smaller, testable chunks.

There are three items:

  • CSV Splitter – splits a file into smaller files with a definable number of rows
  • CSV Sampler – samples the top n rows
  • CSV Random Sampler – samples n random rows of data
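
For a feel of how the first two might work, here is a rough sketch (not the code from the repo). The function names split_csv and sample_top_rows, the numbered output naming, and the assumption that the first row is a header that should be repeated in every chunk are all made up for illustration.

```python
import csv
from itertools import islice


def split_csv(filename, rows_per_file, output_prefix):
    """Split filename into numbered files of at most rows_per_file data rows.

    Assumes the first row is a header and repeats it in every chunk.
    Returns the list of file names written.
    """
    written = []
    with open(filename, "r", newline="") as file:
        reader = csv.reader(file)
        header = next(reader, None)
        if header is None:
            return written  # empty input, nothing to split
        part = 1
        while True:
            # Pull one chunk at a time so the whole file never sits in memory
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            output_filename = f"{output_prefix}_{part}.csv"
            with open(output_filename, "w", newline="") as new_file:
                writer = csv.writer(new_file)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(output_filename)
            part += 1
    return written


def sample_top_rows(filename, output_filename, sample_size):
    """Copy the header plus the first sample_size data rows."""
    with open(filename, "r", newline="") as file, \
            open(output_filename, "w", newline="") as new_file:
        reader = csv.reader(file)
        writer = csv.writer(new_file)
        # +1 keeps the header row (assumed present) alongside the sample
        writer.writerows(islice(reader, sample_size + 1))
```

With this sketch, something like split_csv("file.csv", 1000000, "C:/filelocation/file_part") would write file_part_1.csv, file_part_2.csv and so on, each holding at most a million data rows.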

Here’s a sample: the random sampler one. No pun intended!

# CSV Random Sampler
import csv
import random

sample_size = 10000

filename = "file.csv"
output_filename = "C:/filelocation/samplefile_random_" + str(sample_size) + ".csv"

# Read every row into memory (a header row, if present, is treated like any other row)
with open(filename, "r", newline="") as file:
    reader = csv.reader(file)
    rows = list(reader)

# random.sample raises ValueError if asked for more rows than exist, so cap the size
selected_rows = random.sample(rows, min(sample_size, len(rows)))

with open(output_filename, "w", newline="") as new_file:
    writer = csv.writer(new_file)
    writer.writerows(selected_rows)
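
The script above reads every row into memory with list(reader), which is fine for modest files but could be a struggle with the 60-70GB ones. One well-known alternative is reservoir sampling (Algorithm R), which keeps only sample_size rows in memory while streaming through the file once. The sketch below is a swapped-in technique rather than the version from the repo; the function name reservoir_sample, the seed parameter, and the assumption that the first row is a header to be kept are all illustrative.

```python
import csv
import random


def reservoir_sample(filename, output_filename, sample_size, seed=None):
    """Write a uniform random sample of data rows using one pass over the file.

    Assumes the first row is a header and copies it to the output.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(filename, "r", newline="") as file:
        reader = csv.reader(file)
        header = next(reader, None)
        for index, row in enumerate(reader):
            if index < sample_size:
                # Fill the reservoir with the first sample_size rows
                reservoir.append(row)
            else:
                # Keep this row with probability sample_size / (index + 1)
                slot = rng.randint(0, index)
                if slot < sample_size:
                    reservoir[slot] = row

    with open(output_filename, "w", newline="") as new_file:
        writer = csv.writer(new_file)
        if header is not None:
            writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

If the file has fewer data rows than sample_size, this simply writes all of them. Note that the sampled rows come out in reservoir order, not in their original file order.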

Hope they help someone!
