Quick date format fix with Regular Expression

I was building an SSIS package to load .csv files into a SQL Server database table. Pretty standard stuff, until it end-to-end process was blowing up on all of the date columns, where nearly half of all the dates were throwing invalid date errors.

“What the heck?” (inside joke…)

Turns out the file was created using a date format of dd-mm-yyyy. Which is a bit unusual beings that all of the stakeholders are located in the United States.
A quick conversation and future files should be sent using yyyy-mm-dd. Good to go, except I need to finish development & testing and need a file NOW!
Here is where some experience with regular expressions pays off.

Step 1: make a copy of the file. ALWAYS KEEP THE ORIGINAL – ORIGINAL!

Open the CSV file in Notepad++ & hit [Ctrl]+[H]; set the Search Mode to Regular expression

Open browser to search for “regex cheat sheet” (sometimes it’s far easier to just look it up that it is to try to remember via trial and error…)

Enter the following:
   Find what:   \b(\d\d)-(\d\d)-(\d{4})\b
   Replace with:   $3-$2-$1

Hit [Replace All] and update over 75,000 individual dates in under a second.

Save file

Execute the SSIS package and finish developer testing.

Not diving into process improvement opportunities in the project management/requirements gathering/task assignment/communication channels nor the what-ifs of future recurrence and coding a scripted solution. (Save that one for a rainy day…)

In the end, a bit of regex experience pays off.

If you spend any time doing data processing at a coding level, I would highly recommend at least becoming familiar with how powerful regex can be.


Discover more from Rob Buecker Consulting

Subscribe to get the latest posts sent to your email.

Leave a comment