Tuesday, November 01, 2005

Perl DBI and Loop

DBI and loops
DBI statements are often used within loops that store or fetch data from a database. You might insert the contents of a text file into a database line by line, or read and parse multiple lines of output from a database query. I often see code like the following from new DBI programmers:

foreach $line () {
chomp $line;
($alpha, $beta, $charlie, $delta) = split(/,/,$line);
$sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values($alpha,$beta,$charlie,$delta)
};
$dbh = DBI->connect($dsn,'login','password');
$sth = $dbh->prepare($sql);
$sth->execute;
$dbh->disconnect;
}
This code works, but not very efficiently. With the above code it took an hour to insert 200,000 rows into a MySQL database -- and MySQL is pretty fast! It might take half a day with Oracle.
There are lots of improvements we can make. The first is just common sense: Never put anything in a loop that doesn't absolutely have to be there. For each of our 200,000 insertions, we connect to the database, prepare our SQL statement, execute it, and disconnect from the database. However, we only need to connect and disconnect once. We simply have to move our connect() and disconnect() outside the loop to drastically improve our performance: $dbh = DBI->connect($dsn,'login','password');
foreach $line () {
chomp $line;
($alpha, $beta, $charlie, $delta) = split(/,/,$line);
$sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values($alpha,$beta,$charlie,$delta)
};
$sth = $dbh->prepare($sql);
$sth->execute;
}
$dbh->disconnect;
This reduces our run time to half an hour, doubling our speed.
Placeholders
DBI allows us to further improve our code by using placeholders. We now connect and disconnect the database only once, but we still prepare and execute our SQL statement 200,000 times. The statement we prepare doesn't change much; all that changes is the insertion value. We can create a generic statement that we prepare only once, outside the loop. With this generic statement we can specify the insertion values at execution time. DBI uses a ? to define a placeholder, so our new SQL statement looks like this: $sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values(?, ?, ?, ?)
};
The statement is prepared the same as before: $sth = $dbh->prepare($sql);
Now we can fill in each ? at execution time, once we fill in the ? values. There are two ways to do this. First, we can use the bind_param() subroutine to indicate what values to use: $sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values(?, ?, ?, ?)
};
$sth = $dbh->prepare($sql);
foreach $line () {
chomp $line;
($alpha, $beta, $charlie, $delta) = split(/,/,$line);
$sth->bind_param(1, $alpha);
$sth->bind_param(2, $beta);
$sth->bind_param(3, $charlie);
$sth->bind_param(4, $delta);
$sth->execute;
}
The bind_param() subroutine allows a data type as a third argument. Normally, this is used to tell the driver whether the placeholder is a number or a string: $sth->bind_param(1, $alpha, SQL_INTEGER);
or $sth->bind_param(2, $beta, SQL_VARCHAR);
The second way to specify values at execution time is to pass them directly as arguments to the execute statement: $sth->execute($alpha, $beta, $charlie, $delta);
Using this method our code simplifies to: $dbh = DBI->connect($dsn,'login','password');
$sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values(?, ?, ?, ?)
}
$sth = $dbh->prepare($sql);
foreach $line () {
chomp $line;
($alpha, $beta, $charlie, $delta) = split(/,/,$line);
$sth->execute($alpha, $beta, $charlie, $delta);
}
$dbh->disconnect;
Our code now takes twenty minutes to run. We achieved this improvement despite the fact that the MySQL database driver only emulates placeholders. Had this been a database that directly supports placeholders, such as Oracle, the improvement would have been even more dramatic. Placeholders are any easy way to improve your DBI coding, but there are a few gotchas that beginners need to avoid. First, notice that placeholders don't have quotes. This is correct: $sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values(?, ?, ?, ?)
};
This is not correct: $sql = qq{
insert into $table (col_a, col_b, col_c, col_d)
values('?', '?', '?', '?')
};
Placeholders cannot be used for column names or table names. This won't work: $sql = qq{
select alpha, ? from table
};
Nor will this: $sql = qq{
select alpha, beta from ?
};
Or even this: $sql = qq{
select alpha, beta from table where ? > 200
};
However, you can use Perl variables for column and table names: $sql = qq{
select alpha, beta from $table where $col > 200
};
You'll have to re-prepare this statement whenever you want to change the values of $table or $col.
Fetches
We've dealt mostly with inserting data into a database, so now we'll look at the ways DBI provides to get data out. Most of those ways follow the same basic format: $dbh = DBI->connect($dsn, 'login', 'passwd');
$sql = qq{
select * from $table
};
$sth = $dbh->prepare($sql);
$sth->execute;
while (@row = $sth->fetchrow_array) {
print "@row\n";
}
$dbh->disconnect;
The most efficient of these is fetchrow_arrayref():
while ($row = $sth->fetchrow_arrayref) {
print "@$row\n";
}
There's also fetchrow_hashref(), which stores the keys and values of the database into a hash: while ($ref = $sth->fetchrow_hashref) {
foreach $key (keys %{$ref}) {
print "$ref->{$key}, ";
}
print "\n";
}
Bind Columns
Database fetches can be made even more efficient and elegant with the use of bind columns. Bind columns allow you to bind a variable to each column of a table, so that whenever a fetch is performed the bound variables are automatically updated: $sth->bind_columns(\%attr,@references_to_variables_to_bind);
We can use it like this: $sql = qq{
select alpha, beta, delta from table
};
$sth->prepare($sql);
$sth->execute;
my ($alpha, $beta, $delta);
$sth->bind_columns(undef, \$alpha, \$beta, \$delta);
while ($sth->fetch) {
print "$alpha - $beta - $delta\n";
}
Here, the fetch() subroutine is synonymous with fetchrow_arrayref(). This subroutine, combined with bind_columns(), is the fastest way to fetch data with DBI.
Error Checking
There's a big problem with all of code snippets you've seen in this article: We never check the return values of DBI methods. We simply assume that every DBI call was successful. The obvious solution is to add an or die clause after every DBI method call: $dbh = DBI->connect($dsn,'login','passwd')
die $DBI::errstr;
$sth = $dbh->prepare($sql) die $dbh->errstr;
$sth->execute die $sth->errstr;
However, there is a cleaner solution. Each DBI handle has a boolean attribute called RaiseError. If we set this to true, any errors triggered by that handle cause the script to die with an appropriate error message, saving us from having to check every DBI call we make. RaiseError can be set via the connect statement: $dbh = DBI->connect($dsn,'login','passwd',
{ RaiseError => 1 });
We can also call it after a connection is already made: $dbh->{RaiseError} = 1;
Transactions
I've found that many administrators shy away from using transactions with DBI. For those unfamiliar, transactions provide a way to group multiple SQL statements together so that database changes are only made if every statement is successful. For example, a transaction for updating a customer order database might include statements to update inventory, update accounts payable, and update shipping manifests. All updates must succeed; otherwise, we don't want to perform any of them. If only some of the updates succeeded, we might bill a customer for an order not shipped, or ship an order to a customer without billing them.
Many administrators shy away from using DBI transactions.
By default, DBI commits each statement as it's performed. To use transactions, we need to disable this behavior with the AutoCommit variable: $dbh = DBI->connect($dsn,'login','passwd', {
RaiseError => 1,
AutoCommit => 0,
});
We can also just say: $dbh->{AutoCommit} = 0;
With AutoCommit off, changes are revocable until commit() is called. Uncommitted statements can be undone with rollback().
Next, we need a way to group our statements that lets us test whether any failed. The easiest and most robust way to do this is to set RaiseError and wrap the statements in an eval block: $dbh->{RaiseError} = 1;
eval {
$sth_1->execute; # Update Inventory
$sth_2->execute; # Update Accounts Payable
$sth_3->execute; # Update Shipping
};
if ($@)
$dbh->rollback;
} else {
$dbh->commit;
}
If any statement in the eval block fails, $@ is set and we revoke all three statements.
Don't forget to commit your statements. If you do forget, what happens next depends on the database. Some databases automatically commit them; others roll them back.
References
The material for this article came from personal experience, a thousand or so readings of the DBI documentation and FAQ, and a generous review by DBI guru Tim Bunce. You can find the DBI module and its related documentation at http://www.symbolstone.org/technology/perl/ DBI.
__END__

0 Comments:

Post a Comment

<< Home