A brief discussion about Prepared Statements in SQL

It's been a while since I wrote an article. With the new year, I need to write some technical content to keep my momentum going. Recently, I encountered an interesting problem, so let's have a simple discussion about Prepared Statements in MySQL.

Introduction#

gorm is a commonly used ORM when developing with Go. Recently, I encountered an interesting issue while using gORM. Let me briefly describe the problem.

When using the Raw method of gORM for SQL queries, I constructed a SQL statement like this:

select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)

When passing in parameters later, it returned Error: sql: expected 0 arguments, got 1. However, other queries like the following executed normally:

select * from demo where name = ?

At first, I thought it was an issue with the SQL concatenation module in gORM, but after looking at the code, I found an interesting logic. gORM does not have any logic for concatenating Raw SQL; it directly calls the standard library database/sql in Golang for SQL processing, and database/sql will directly call the corresponding database driver's implementation. Let's first look at the logic regarding Query in database/sql.

func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
	queryerCtx, ok := dc.ci.(driver.QueryerContext)
	var queryer driver.Queryer
	if !ok {
		queryer, ok = dc.ci.(driver.Queryer)
	}
	if ok {
		var nvdargs []driver.NamedValue
		var rowsi driver.Rows
		var err error
		withLock(dc, func() {
			nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
			if err != nil {
				return
			}
			rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
		})
		if err != driver.ErrSkip {
			if err != nil {
				releaseConn(err)
				return nil, err
			}
			// Note: ownership of dc passes to the *Rows, to be freed
			// with releaseConn.
			rows := &Rows{
				dc:          dc,
				releaseConn: releaseConn,
				rowsi:       rowsi,
			}
			rows.initContextClose(ctx, txctx)
			return rows, nil
		}
	}

	var si driver.Stmt
	var err error
	withLock(dc, func() {
        // An interesting point
		si, err = ctxDriverPrepare(ctx, dc.ci, query)
	})
	if err != nil {
		releaseConn(err)
		return nil, err
	}

	ds := &driverStmt{Locker: dc, si: si}
	rowsi, err := rowsiFromStatement(ctx, dc.ci, ds, args...)
	if err != nil {
		ds.Close()
		releaseConn(err)
		return nil, err
	}

	// Note: ownership of ci passes to the *Rows, to be freed
	// with releaseConn.
	rows := &Rows{
		dc:          dc,
		releaseConn: releaseConn,
		rowsi:       rowsi,
		closeStmt:   ds,
	}
	rows.initContextClose(ctx, txctx)
	return rows, nil
}
}

When executing the QueryDC logic in database/sql, it calls the ctxDriverPrepare method for SQL Query preprocessing. Let's take a look at this logic.

func ctxDriverPrepare(ctx context.Context, ci driver.Conn, query string) (driver.Stmt, error) {
	if ciCtx, is := ci.(driver.ConnPrepareContext); is {
		return ciCtx.PrepareContext(ctx, query)
	}
	si, err := ci.Prepare(query)
	if err == nil {
		select {
		default:
		case <-ctx.Done():
			si.Close()
			return nil, ctx.Err()
		}
	}
	return si, err
}

In this, ctxDriverPrepare calls ci.Prepare(query) to execute the corresponding SQL Driver's Prepare or PrepareContext method for SQL preprocessing. In go-mysql-driver, the corresponding implementation is as follows:

func (mc *mysqlConn) PrepareContext(ctx context.Context, query string) (driver.Stmt, error) {
	if err := mc.watchCancel(ctx); err != nil {
		return nil, err;
	}

	stmt, err := mc.Prepare(query)
	mc.finish()
	if err != nil {
		return nil, err;
	}

	select {
	default:
	case <-ctx.Done():
		stmt.Close()
		return nil, ctx.Err();
	}
	return stmt, nil;
}

This logic shows that go-mysql-driver will send a prepared statement request to MySQL and return the corresponding Stmt after obtaining it.

The stmt contains information such as the number of parameters and stmt name. Here, the SQL will parse the ? parameter placeholders and inform the client of the number of parameters to be passed.

The problem lies here. Let's revisit the previous SQL:

select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)

Here, I used Full Text Match supported in MySQL 5.7 and later. The string +? that we want to match will be parsed by MySQL as a string to be queried, rather than as a placeholder. Therefore, the number of parameters to be passed in the returned stmt is 0, and database/sql will match the parameters we passed with the number of parameters needed in subsequent logic. If they do not match, it will throw an Error.

Now that we have found the problem, what exactly is a Prepared Statement, and why do we need it?

Prepared Statement#

What is a Prepared Statement?#

The general content has been discussed clearly earlier, so let's review: A Prepared Statement is a mechanism in MySQL (similar mechanisms exist in others like PGSQL) for preprocessing SQL, separating SQL from query data to ensure the robustness of the program.

According to the official MySQL introduction, Prepared Statements have the following benefits:

Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as WHERE for queries and deletes, SET for updates, and VALUES for inserts.

Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.

In short:

Improve performance by avoiding the overhead of repeated SQL parsing.
Prevent SQL injection.

MySQL's Prepared Statement has two usage methods: one is to use the binary Prepared Protocol (which is not within the scope of today's article; I will write another article to discuss some binary protocols in MySQL), and the other is to use SQL for processing.

In Prepared Statement, there are three commands:

PREPARE is used to create a Prepared Statement.
EXECUTE is used to execute a Prepared Statement.
DEALLOCATE PREPARE is used to destroy a Prepared Statement.

It is important to note that Prepared Statements have session limitations; generally, a Prepared Statement only lives within the session it was created. When the connection is broken or in other cases when the session becomes invalid, the Prepared Statement will be automatically destroyed.

Next, let's experiment with it.

How to Use Prepared Statement#

First, let's create a test table.

create table if not exists `user`
(
    `id`   bigint(20)   not null auto_increment,
    `name` varchar(255) not null,
    primary key (`id`)
) engine = InnoDB
  charset = 'utf8mb4';

Then insert data.

insert into user (`name`) values ('abc');

Now, let's query using the traditional method.

select *
from user
where name = 'abc';

Now, let's use Prepared Statement.

First, use the Prepared keyword to create a statement.

set @s = 'select * from user where name=?';

PREPARE demo1 from @s;

Then use the Execute keyword to execute the Statement.

set @a = 'abc';

EXECUTE demo1 using @a;

It's still quite simple, right?

Why Use Prepared Statement?#

One important reason is to avoid SQL Injection Attack. But why can Prepared Statement prevent SQL injection?

It's quite simple; we separate Query and Data.

Using the previous table as an example, without manually handling SQL and parameters, we often use string concatenation, which can construct illegal SQL using SQL syntax. For example, in Python:

b = "'abc';drop table user"
a = f"select * from user where name={b}"

This code will generate the following SQL:

select * from user where name='abc';drop table user

Well, that's a disaster.

Now, let's see how it works with Prepared Statement.

set @a = '\'abc\';drop table user';

EXECUTE demo1 using @a;

Then the final executed statement is:

select * from user where name='\'abc\';drop table user'

Because we structurally separated the Query from the Query Params, no matter what we input, it will be treated as part of the Query Params, thus avoiding the risk of injection.

Advantages and Disadvantages of Prepared Statement#

The benefits are obvious:

The database caches Prepared Statements, eliminating the overhead of repeated SQL processing on the client side.
Prevents SQL Injection Attack.
Clear semantics.

However, there are also several disadvantages:

The binary protocol of Prepared Statement has compatibility issues with clients; some language clients may not support the binary protocol for Prepared Statement.
Due to two communications with the database, there may be I/O bottlenecks in cases of intensive SQL queries.

So, it ultimately depends on the scenario for trade-offs.

Final Thoughts#

I wrote this article on the plane as a new start for the new year. I hope to write more articles, standardize my routine, and take good care of my girlfriend. By the way, through some recent experiences (like parsing Binlog), I suddenly realized that MySQL is a treasure trove. I will write a few articles to discuss the pitfalls and interesting aspects of Binlog and Protocol in MySQL (yes, Flag ++, please don't rush me for the articles (run away)).

Alright, that's it for tonight. The plane is about to land, so I need to close my laptop (run away).