How to Create a Bash Script for Easy File Conversion Using DuckDB
Working with different file formats can be challenging, especially when dealing with large datasets. Here’s a simple but powerful Bash script that leverages DuckDB to convert files between different formats (like CSV and Parquet) directly from your terminal.
I took the script from here.
Please visit the DuckDB website. They are doing some of the most amazing work in the data space.
Also, visit their YouTube channel for some amazing content, and MotherDuck for even more.
Step-by-Step Guide
Step 1: Install DuckDB
Before we begin, make sure you have DuckDB installed. You can install it using Homebrew:
brew install duckdb
If you don’t have Homebrew installed, you can get it from brew.sh.
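Once the install finishes, it can help to confirm that your shell can actually find the duckdb binary. The snippet below is a minimal sketch; the function name check_duckdb is only an illustration and is not part of DuckDB or Homebrew.

```shell
# Hypothetical helper: report whether the duckdb binary is on the PATH.
check_duckdb() {
    if command -v duckdb >/dev/null 2>&1; then
        echo "duckdb found at: $(command -v duckdb)"
    else
        echo "duckdb not found on PATH" >&2
        return 1
    fi
}

check_duckdb || echo "Install DuckDB before continuing."
```

If the check fails right after installing, opening a new terminal window is usually worth trying before reinstalling.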
Step 2: Prepare Your Workspace
Just like in our previous scripts (getref and organize), we’ll store this script in your bin directory. If you haven’t created one yet:
mkdir -p ~/bin
Step 3: Create the Script File
Navigate to your bin directory and create a new file named toduck:
cd ~/bin
touch toduck
Step 4: Add the Script Code
Open the file in your preferred text editor:
nano toduck
Copy and paste the following code:
#!/bin/bash

# Convert a data file to another format using DuckDB.
convert_file() {
    local input_file="$1"
    local output_extension="$2"

    # Extract the filename without its directory and extension
    local base_name
    base_name=$(basename -- "$input_file")
    local name="${base_name%.*}"

    # Construct the output filename (written to the current directory)
    local output_file="${name}.${output_extension}"

    # Perform the conversion; report success only if DuckDB succeeds
    if duckdb -c "copy (select * from '${input_file}') to '${output_file}'"; then
        echo "Conversion complete: ${output_file}"
    else
        echo "Error: conversion failed for ${input_file}" >&2
        return 1
    fi
}

# Check if the number of arguments is less than 2
if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <input_file> <output_extension>"
    echo "Example: $0 example.parquet csv"
    exit 1
fi

# Call the conversion function with the provided arguments
convert_file "$1" "$2"
Exit the editor and save the file.
This script uses DuckDB’s SQL capabilities to read and write files in different formats. For common formats such as CSV and Parquet, DuckDB infers the format from the file extension, so the script never has to name the format explicitly.
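To see how the script derives the output filename, the same parameter expansion can be tried on its own, without DuckDB. This is just a sketch; the sample path is made up for illustration.

```shell
# Mirror the script's filename logic on a sample path (no DuckDB needed).
input_file="/tmp/exports/sales.2024.csv"   # hypothetical path
output_extension="parquet"

base_name=$(basename -- "$input_file")     # drops the directory part
name="${base_name%.*}"                     # drops only the last extension
output_file="${name}.${output_extension}"

echo "$output_file"                        # prints: sales.2024.parquet
```

Because the directory part is dropped, the converted file lands in whatever directory you run the script from, not necessarily next to the input file.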
Step 5: Make the Script Executable
Make your script executable by running:
chmod +x ~/bin/toduck
Step 6: Update Your Shell Configuration
If you haven’t already added your bin directory to your PATH (from previous scripts), you’ll need to do so. Open your shell configuration file:
For Zsh users:
nano ~/.zshrc
For Bash users:
nano ~/.bashrc
Add this line if it’s not already there:
export PATH="$HOME/bin:$PATH"
Step 7: Apply the Changes
Source your configuration file to apply the changes:
For Zsh:
source ~/.zshrc
For Bash:
source ~/.bashrc
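If you want to double-check that the change took effect, a small test like the one below can tell you whether your bin directory is on the PATH. It is only a sketch, and path_contains is a hypothetical helper name, not a built-in command.

```shell
# Hypothetical helper: succeed if the given directory is listed in PATH.
path_contains() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is on your PATH"
else
    echo "$HOME/bin is missing from your PATH"
fi
```

Wrapping PATH in colons before matching avoids false positives from directories that merely start with the same prefix, such as ~/bin-old.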
How to Use Your New Script
Always make sure you have backups of your important data files before performing any conversions. While DuckDB is reliable, it’s always good practice to protect your data.
Using the script is straightforward.
Navigate to the directory containing the file you want to convert and run the script with the desired output format. The syntax is:
toduck <input_file> <output_extension>
Examples:
- Convert a Parquet file to CSV:
toduck data.parquet csv
- Convert a CSV file to Parquet:
toduck data.csv parquet
The script will:
1. Take your input file
2. Create a new file with the same name but different extension
3. Convert the data using DuckDB
4. Show you the path to the converted file
Supported Formats
DuckDB supports various file formats including:
- CSV
- Parquet
- JSON
- Excel (xlsx)
The exact formats available may depend on your DuckDB version and configuration.
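If you would rather not rely on the format being inferred from the extension, one option is to state it in the COPY statement with DuckDB’s FORMAT option. The sketch below only builds the SQL string. Both helper names (sql_quote and build_copy_sql) are hypothetical, and the mapping covers just csv, parquet, and json as an assumption about what you need.

```shell
# Hypothetical helper: wrap a value in single quotes for SQL,
# doubling any single quotes it contains.
sql_quote() {
    local q="'"
    local value="${1//$q/$q$q}"
    printf "%s%s%s" "$q" "$value" "$q"
}

# Hypothetical helper: build a COPY statement with an explicit FORMAT.
# Only csv, parquet, and json are mapped here.
build_copy_sql() {
    local input_file="$1" output_file="$2" format
    case "${output_file##*.}" in
        csv)     format="csv" ;;
        parquet) format="parquet" ;;
        json)    format="json" ;;
        *)       echo "Unsupported extension: ${output_file##*.}" >&2; return 1 ;;
    esac
    echo "copy (select * from $(sql_quote "$input_file")) to $(sql_quote "$output_file") (format ${format})"
}

build_copy_sql "data.csv" "data.parquet"
```

The resulting string could then be passed to DuckDB with duckdb -c "$(build_copy_sql data.csv data.parquet)". Quoting the filenames this way also keeps a name containing an apostrophe from breaking the SQL.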
Common Use Cases
This script is particularly useful when you need to:
- Convert large CSV files to Parquet for better compression and performance
- Make Parquet files readable in spreadsheet software by converting to CSV
- Quickly convert between different data formats for various tools and applications
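For the first use case, you may have a whole folder of CSV files rather than a single one. A loop around toduck is one way to handle that. The following is a sketch that assumes toduck is already on your PATH; convert_all_csv is a hypothetical name.

```shell
# Hypothetical helper: convert every CSV file in a directory to Parquet
# by calling the toduck script once per file.
convert_all_csv() {
    local dir="${1:-.}"
    local file count=0
    for file in "$dir"/*.csv; do
        # Skip the unexpanded pattern when no CSV files match
        [ -e "$file" ] || continue
        toduck "$file" parquet || return 1
        count=$((count + 1))
    done
    echo "Converted ${count} file(s)"
}

# Example call (uncomment to run against a real directory):
# convert_all_csv ~/data
```

The loop stops at the first failed conversion so that a broken file does not go unnoticed in a long run.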
That’s it! You now have a powerful tool for converting data files right from your terminal. If you have any questions or suggestions for improvements, feel free to leave a comment below.