Tommy Stanton

Computer programmer and banjo picker

Generating SQL with Perl 5

Background

SQL can be tedious to write. This is probably why things like ORMs (Object-Relational Mappers) were invented, but that's besides the point. ;-)

Specifically, I am going to show examples of generating INSERT and DELETE statements. :-)

INSERT and DELETE statements can become tedious when one wants to add or remove more than 1 row (eg. using VALUES).

The following examples are using my favorite relational database software, PostgreSQL ("Postgres"), with the assumption that a Postgres instance lives at some_host, with user some_user, who has rights to database some_db, which contains a table called users.

A simple, single INSERT

Assuming that the users table has a name column, we can add a single user, "Alice," via a one-liner with the command-line client for Postgres like so:

psql -c "INSERT INTO users VALUES('Alice');"

Multiple INSERTs

What if there are several users to be added?

Simple loop

Once I had verified that command for a single INSERT, I wanted to try using a shell loop in GNU Bash to add several users from a text file at users.txt.

The first iteration of this idea was:

$ for name in $(cat users.txt); do \
psql -c "INSERT INTO users VALUES('$name');" \
-h some_host some_db some_user; \
done

...which works, but is inefficient: a database connection is made for each user-to-be-added.

If users.txt contains "Bob" and "Carol" (delimited by newlines), the result of the loop is that 2 commands are effectively run:

$ psql -c "INSERT INTO users VALUES('Bob');" \
-h some_host some_db some_user; \
$ psql -c "INSERT INTO users VALUES('Carol');" \
-h some_host some_db some_user; \

Turning the loop inside-out

Keep these psql(1) features in mind:

  • The 'c' short option is meant for providing a SQL command, inline
  • The 'f' short option is meant for providing a path to a file of commands to execute

The next iteration of this idea makes only 1 database connection, because only 1 command is effectively run:

$ psql -f \
<(for name in $(cat users.txt); do \
echo "INSERT INTO users VALUES('$name');"; \
done) \
-h some_host some_db some_user

...but this is still very inelegant: an INSERT statement is generated for each user!

<(for name ...) is a very odd filename, yes? That's because it's a named pipe (a Linux concept).


A very practical use of this feature could be to execute several SQL files together, eg. DDL to set up tables:

$ psql -f <(cat path/to/ddl/*.sql) \
-h some_host some_db some_user

A large, generated INSERT

We don't want multiple INSERT statements: what we really want is a single INSERT statement that actually takes advantage of the VALUES clause to include several values.

Perl program

Time to get a scripting language involved. :-D Perl to the rescue!

Consider the following hypothetical Perl program, insert_users (the \047 trick will come in handy later...thanks Stack Overflow answer!):

#!/usr/bin/perl

local $/;
binmode STDIN;

$_ = <STDIN>;

$sql = "INSERT INTO users (name) VALUES ";
@names_to_insert = split /\n/, $_;
$inserts = join ", ", map { "(\047$_\047)" } @names_to_insert;
$sql .= $inserts;
$sql .= ";";

syswrite STDOUT, $sql;

If the content of a file with 3 names ("Carlos," "Charlie," and "Chuck") is piped to this program (eg. cat users.txt | insert_users), the output will be just what is desired:

INSERT INTO users (name) VALUES ('Carlos'), ('Charlie'), ('Chuck');

Perl one-liner

In an attempt to keep this as a copy-and-paste solution, here is the program as a one-liner (well, it's more like a "multi-liner"):

cat users.txt | \
perl -e 'local $/; binmode STDIN; $_ = <STDIN>;' \
-e '$sql = "INSERT INTO users (name) VALUES ";' \
-e '@names_to_insert = split /\n/, $_;' \
-e '$inserts = join ", ", map { "(\047$_\047)" } @names_to_insert;' \
-e '$sql .= $inserts;' \
-e '$sql .= ";";' \
-e 'syswrite STDOUT, $sql;'

Named pipe + Perl one-liner

Finally, use all of the ideas explored to execute the generated SQL:

$ psql -f \
<(cat users.txt | \
perl -e 'local $/; binmode STDIN; $_ = <STDIN>;' \
-e '$sql = "INSERT INTO users (name) VALUES ";' \
-e '@names_to_insert = split /\n/, $_;' \
-e '$inserts = join ", ", map { "(\047$_\047)" } @names_to_insert;' \
-e '$sql .= $inserts;' \
-e '$sql .= ";";' \
-e 'syswrite STDOUT, $sql;') \
-h some_host some_db some_user

Bonus: Generated DELETE statement

Invert the idea to remove users.

Note that in Postgres, the syntax for using the VALUES clause with INSERT:

INSERT...VALUES ('foo'), ('bar')...

...is not the same as using the IN clause with DELETE (note the location of parentheses):

DELETE...IN ('foo', 'bar')...

...so the map and string concatenation bits of the multi-liner need to be tweaked:

$ psql -f \
<(cat users.txt | \
perl -e 'local $/; binmode STDIN; $_ = <STDIN>;' \
-e '$sql = "DELETE FROM users WHERE name IN ";' \
-e '@names_to_delete = split /\n/, $_;' \
-e '$deletes = join ", ", map { "\047$_\047" } @names_to_delete;' \
-e '$sql .= "(${deletes})";' \
-e '$sql .= ";";' \
-e 'syswrite STDOUT, $sql;') \
-h some_host some_db some_user
blog comments powered by Disqus