It's been a while since I wrote an article. With the new year, I need to write some technical content to keep my momentum going. Recently, I encountered an interesting problem, so let's have a simple discussion about Prepared Statements in MySQL.
Introduction#
gorm is a commonly used ORM when developing with Go. Recently, I encountered an interesting issue while using gORM. Let me briefly describe the problem.
When using the Raw
method of gORM for SQL queries, I constructed a SQL statement like this:
select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)
When passing in parameters later, it returned Error
: sql: expected 0 arguments, got 1. However, other queries like the following executed normally:
select * from demo where name = ?
At first, I thought it was an issue with the SQL concatenation module in gORM
, but after looking at the code, I found an interesting logic. gORM does not have any logic for concatenating Raw SQL
; it directly calls the standard library database/sql
in Golang for SQL processing, and database/sql
will directly call the corresponding database driver's implementation. Let's first look at the logic regarding Query in database/sql
.
func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
queryerCtx, ok := dc.ci.(driver.QueryerContext)
var queryer driver.Queryer
if !ok {
queryer, ok = dc.ci.(driver.Queryer)
}
if ok {
var nvdargs []driver.NamedValue
var rowsi driver.Rows
var err error
withLock(dc, func() {
nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
if err != nil {
return
}
rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
})
if err != driver.ErrSkip {
if err != nil {
releaseConn(err)
return nil, err
}
// Note: ownership of dc passes to the *Rows, to be freed
// with releaseConn.
rows := &Rows{
dc: dc,
releaseConn: releaseConn,
rowsi: rowsi,
}
rows.initContextClose(ctx, txctx)
return rows, nil
}
}
var si driver.Stmt
var err error
withLock(dc, func() {
// An interesting point
si, err = ctxDriverPrepare(ctx, dc.ci, query)
})
if err != nil {
releaseConn(err)
return nil, err
}
ds := &driverStmt{Locker: dc, si: si}
rowsi, err := rowsiFromStatement(ctx, dc.ci, ds, args...)
if err != nil {
ds.Close()
releaseConn(err)
return nil, err
}
// Note: ownership of ci passes to the *Rows, to be freed
// with releaseConn.
rows := &Rows{
dc: dc,
releaseConn: releaseConn,
rowsi: rowsi,
closeStmt: ds,
}
rows.initContextClose(ctx, txctx)
return rows, nil
}
}
When executing the QueryDC logic in database/sql
, it calls the ctxDriverPrepare
method for SQL Query preprocessing. Let's take a look at this logic.
func ctxDriverPrepare(ctx context.Context, ci driver.Conn, query string) (driver.Stmt, error) {
if ciCtx, is := ci.(driver.ConnPrepareContext); is {
return ciCtx.PrepareContext(ctx, query)
}
si, err := ci.Prepare(query)
if err == nil {
select {
default:
case <-ctx.Done():
si.Close()
return nil, ctx.Err()
}
}
return si, err
}
In this, ctxDriverPrepare
calls ci.Prepare(query)
to execute the corresponding SQL Driver's Prepare
or PrepareContext
method for SQL preprocessing. In go-mysql-driver, the corresponding implementation is as follows:
func (mc *mysqlConn) PrepareContext(ctx context.Context, query string) (driver.Stmt, error) {
if err := mc.watchCancel(ctx); err != nil {
return nil, err;
}
stmt, err := mc.Prepare(query)
mc.finish()
if err != nil {
return nil, err;
}
select {
default:
case <-ctx.Done():
stmt.Close()
return nil, ctx.Err();
}
return stmt, nil;
}
This logic shows that go-mysql-driver
will send a prepared statement
request to MySQL and return the corresponding Stmt
after obtaining it.
The stmt
contains information such as the number of parameters and stmt name
. Here, the SQL will parse the ?
parameter placeholders and inform the client of the number of parameters to be passed.
The problem lies here. Let's revisit the previous SQL:
select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)
Here, I used Full Text Match supported in MySQL 5.7 and later. The string +?
that we want to match will be parsed by MySQL as a string to be queried, rather than as a placeholder. Therefore, the number of parameters to be passed in the returned stmt
is 0, and database/sql
will match the parameters we passed with the number of parameters needed in subsequent logic. If they do not match, it will throw an Error
.
Now that we have found the problem, what exactly is a Prepared Statement
, and why do we need it?
Prepared Statement#
What is a Prepared Statement?#
The general content has been discussed clearly earlier, so let's review: A Prepared Statement
is a mechanism in MySQL (similar mechanisms exist in others like PGSQL) for preprocessing SQL, separating SQL from query data to ensure the robustness of the program.
According to the official MySQL introduction, Prepared Statements have the following benefits:
- Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as
WHERE
for queries and deletes,SET
for updates, andVALUES
for inserts.- Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.
In short:
- Improve performance by avoiding the overhead of repeated SQL parsing.
- Prevent SQL injection.
MySQL's Prepared Statement
has two usage methods: one is to use the binary Prepared Protocol
(which is not within the scope of today's article; I will write another article to discuss some binary protocols in MySQL), and the other is to use SQL for processing.
In Prepared Statement
, there are three commands:
PREPARE
is used to create aPrepared Statement
.EXECUTE
is used to execute aPrepared Statement
.DEALLOCATE PREPARE
is used to destroy aPrepared Statement
.
It is important to note that Prepared Statements
have session limitations; generally, a Prepared Statement
only lives within the session it was created. When the connection is broken or in other cases when the session becomes invalid, the Prepared Statement
will be automatically destroyed.
Next, let's experiment with it.
How to Use Prepared Statement#
First, let's create a test table.
create table if not exists `user`
(
`id` bigint(20) not null auto_increment,
`name` varchar(255) not null,
primary key (`id`)
) engine = InnoDB
charset = 'utf8mb4';
Then insert data.
insert into user (`name`) values ('abc');
Now, let's query using the traditional method.
select *
from user
where name = 'abc';
Now, let's use Prepared Statement
.
First, use the Prepared
keyword to create a statement
.
set @s = 'select * from user where name=?';
PREPARE demo1 from @s;
Then use the Execute
keyword to execute the Statement
.
set @a = 'abc';
EXECUTE demo1 using @a;
It's still quite simple, right?
Why Use Prepared Statement?#
One important reason is to avoid SQL Injection Attack
. But why can Prepared Statement
prevent SQL injection?
It's quite simple; we separate Query
and Data
.
Using the previous table as an example, without manually handling SQL and parameters, we often use string concatenation, which can construct illegal SQL using SQL syntax. For example, in Python:
b = "'abc';drop table user"
a = f"select * from user where name={b}"
This code will generate the following SQL:
select * from user where name='abc';drop table user
Well, that's a disaster.
Now, let's see how it works with Prepared Statement
.
set @a = '\'abc\';drop table user';
EXECUTE demo1 using @a;
Then the final executed statement is:
select * from user where name='\'abc\';drop table user'
Because we structurally separated the Query from the Query Params, no matter what we input, it will be treated as part of the Query Params, thus avoiding the risk of injection.
Advantages and Disadvantages of Prepared Statement#
The benefits are obvious:
- The database caches
Prepared Statements
, eliminating the overhead of repeated SQL processing on the client side. - Prevents
SQL Injection Attack
. - Clear semantics.
However, there are also several disadvantages:
- The binary protocol of
Prepared Statement
has compatibility issues with clients; some language clients may not support the binary protocol forPrepared Statement
. - Due to two communications with the database, there may be I/O bottlenecks in cases of intensive SQL queries.
So, it ultimately depends on the scenario for trade-offs.
Final Thoughts#
I wrote this article on the plane as a new start for the new year. I hope to write more articles, standardize my routine, and take good care of my girlfriend. By the way, through some recent experiences (like parsing Binlog), I suddenly realized that MySQL is a treasure trove. I will write a few articles to discuss the pitfalls and interesting aspects of Binlog
and Protocol
in MySQL (yes, Flag ++, please don't rush me for the articles (run away)).
Alright, that's it for tonight. The plane is about to land, so I need to close my laptop (run away).