Generating Sample SQL Data in Excel and Google Sheets

Ryan Iyengar
Ryan Iyengar
Published in
3 min readDec 1, 2017

--

For a lot of analytical SQL blog posts, I find myself generating sample data in Excel or Google Sheets to mess around with. This means coming up with random dates, dollar amounts, IDs, and all sorts of stuff. This is a small list of things I’ve picked up along the way that are very useful. I’ll add things here later as I find them.

Coming up with data

Generating Random User IDs

Randbetween() is my favorite for this.

=randbetween( 1, 100000000 )

And copy down to however many rows I’d like. Then copy and paste values to solidify the random selections. Theoretically this could produce duplicate values, but I doubt I’ll ever run into that problem.

Generating Strings With A Probability Distribution

Let’s say I want to assign 2/3 of my rows to “Control”, and 1/3 to “Test”. randbetween() can also quickly spin this up.

=if( randbetween( 1, 3 ) > 1, "Control", "Test")

Generating Strings From a List With A Probability Distribution

Let’s say I want to add another variant, and assign 1/3 of my rows to “Control”, 1/3 to “Test”, and 1/3 to “Test 2”. This is slightly more complex, but you can just nest if statements in the randbetween() and get a similar result.

=if( randbetween( 1, 3 ) = 1, "Control", if( randbetween( 1, 3 ) = 2, "Test", "Test 2"))

Getting the data into a database

Generating SELECT with UNION

My favorite fast way to insert a handful of rows into a SQL query is generate some SELECT statements UNION-ed together. So if your sample data is starting to look like this:

Then a statement like this can turn it into rows of SQL-ready database table with this in cell D2:

="select "&A2&" as "&$A$1&", "&B2&" as "&$B$1&", "&text(C2,"'yyyy-mm-dd'")&" as "&$C$1&" union"

A couple things are happening here.

  • “select” and “union” are concatenated at the beginning and end of this formula, to make sure you can copy it down across many rows and have a SQL query interpret them as one giant table, selected 1 row at a time. You’ll need to remove the last “union” from the last row you copy down.
  • Cells A1, B1, and C1 are “locked” with the $ operator, so that you can copy down next to the columns with variable data, but the row headers are always concatenated in as column names or aliases.
  • I’m wrapping the date value in C2 with the “text” function, and even within that function, wrapping the yyyy-mm-dd format with single quotes. Without this, Google Sheets will output an integer instead of the date formatted nicely, and without the quotes, my SQL interpreter might interpret the values as integers.

Uploading .CSVs

Depending on your database and SQL client, you can also directly export a .CSV from Google Sheets and upload it into your database. Periscope Data offers this feature, and I use it fairly often. However, I tend to reserve those uploads for when I need more than a few dozen rows, otherwise I find it a little slower.

--

--