I do a lot of work with CSV files, sometimes very large CSV files, for a living. And sometimes it’s not practical, or possible, to do what I need to do entirely in Excel. Merging files is an example. So here’s how to merge CSV files on various platforms from a command line so you can get it done quickly and efficiently.
Why not just use Excel?
When dealing with large files, it can take several minutes to copy and paste within Excel, and I’ve even found that large files can sometimes cause Excel to crash. Doing the same thing from a command prompt usually is much faster, and even when it’s not faster, it’s less likely to crash. Plus, Excel is limited to slightly more than one million rows, and many datasets are larger than that.
Merge CSV files with header in Windows
You can merge CSV files without duplicating the header in Windows. Even if you’re not familiar with Powershell you can probably follow what it’s doing.
Programmatically, what we’re looking to do is strip the first line from one file and copy the result to the end of another. We can do this with a single line of Powershell. But don’t worry, this will run from a CMD prompt. Make sure you’re in the same directory as the two files you want to merge when you run it.
powershell -NoProfile -Command "Get-Content -Path .\file2.csv | Select-Object -Skip 1 | Add-Content .\file1.csv"
Substitute your existing filenames for file1.csv and file2.csv.
A more elegant Powershell script
If you’re going to do this a lot, and the filenames may change, it makes sense to keep a Powershell script handy. Save this as csvmerge.ps1:
Param ( [Parameter(Mandatory=$True)] [ValidateNotNull()] $File1, [Parameter(Mandatory=$True)] [ValidateNotNull()] $File2 ) Get-Content -Path $File2 | Select-Object -Skip 1 | Add-Content $File1
Now you can just run the command like so, from a Powershell prompt:
csvmerge.ps1 file1.csv file2.csv
The script will append the contents of file2.csv to file1.csv. Substitute your own filenames. If you forget one or both filenames, the script will prompt you.
A possible solution without Powershell
Some companies disable Powershell. Some of them even know why they’re disabling it. If Powershell absolutely, positively isn’t an option, you can try this. I remember using this on older versions of Windows that aren’t supported anymore, but it doesn’t work on my Windows 10 machines so I don’t guarantee it works on anything you have. But it’s worth a try.
findstr /V /R "^$" file2.csv > file3.csv copy /b file1.csv+file3.csv merged.csv
Merge CSV files with header in OS X and Linux
I find it easier to merge CSV files without duplicating the header on Unix and Unix-like systems. Here we can do it with just one line. Make sure you’re in the same directory with your two CSV files.
tail -n +2 file2.csv >> file1.csv
Substitute your two filenames for file2.csv and file1.csv. Whether this is more cryptic or less cryptic than Powershell depends on what you’re used to, I guess. But the Unix way is shorter.
A shell script
If you’re going to do this a lot, and the filenames may change, it makes sense to keep a shell script handy. Save this as csvmerge.sh:
#!/bin/bash if [ "$2" = "" ]; then echo usage: "$0" file1.csv file2.csv else echo "Appending $2 to the end of $1" ls -lh $1 tail -n +2 "$2" >> "$1" ls -lh $1 fi
To make the file executable, issue the command chmod 755 csvmerge.sh.
Now you can merge CSV files with header with the command ./csvmerge.sh file1.csv file2.csv. If you forget one or more filenames, the script will remind you to include two filenames.