The Sign: row-based binary logging and integer signedness in MySQL and MariaDB

"I saw the sign and it opened up my eyes, I saw the sign
Life is demanding without understanding"
–I Saw The Sign, Ace of Base

An interesting question appeared in #mysql on Freenode recently. A user was trying to build a client that would act as a replication slave so that it could consume row-based replication events coming via the replication stream, in order to transform them into JSON that could be sent to ElasticSearch. Neat idea! But he ran into a problem: it's not possible to tell whether integer values in the replication stream are signed or unsigned. Let's take a look at how that works.

The row-based binary log format in MySQL and MariaDB utilizes a couple kinds of events to convey various information to the client (normally a slave). The first event is TABLE_MAP_EVENT, which gives information about the structure of the tables that will be changed in a subsequent ROWS_EVENT. There's pretty good documentation about TABLE_MAP_EVENT at the TABLE_MAP_EVENT page in the MySQL Internals Manual where we can see that this structure has fields that account for schema name, table name, and column types.

The column_type_def field consists of one byte per column in the table, with the value of each byte being one of the types taken from the list on the COM_QUERY Response page. There are different types for each size of integer, including MYSQL_TYPE_INT24 and MYSQL_TYPE_LONG and MYSQL_TYPE_LONGLONG, but none of that carries any information about signedness. The column_meta_def field contains length information for data types that can vary in size, but integer types are represented by a fixed number of bytes, thus no meta information is included for integer types.

There's nowhere else to look in the TABLE_MAP_EVENT for additional information, and we're left with the harsh reality: it is not possible to tell whether an integer value is meant to be signed or unsigned by examining the replication stream alone. And we can easily observe this by experimentation.

Let's create a table with two integer columns, one unsigned and one signed. We'll insert the maximum value into the unsigned column and we'll insert the equivalent sequence of bytes into the signed column, which is represented by the value "-1".

Here are some silly tricks to show the equivalency:

mysql> select conv(-1,10,2);
+------------------------------------------------------------------+
| conv(-1,10,2)                                                    |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select conv(18446744073709551615,10,2);
+------------------------------------------------------------------+
| conv(18446744073709551615,10,2)                                  |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select conv(repeat(1,64),2,-10);
+--------------------------+
| conv(repeat(1,64),2,-10) |
+--------------------------+
| -1                       |
+--------------------------+
1 row in set (0.00 sec)

mysql> select conv(repeat(1,64),2,10);
+-------------------------+
| conv(repeat(1,64),2,10) |
+-------------------------+
| 18446744073709551615    |
+-------------------------+
1 row in set (0.00 sec)

And here's the table:

mysql> create table t1 (id int unsigned not null auto_increment primary key, col1 bigint unsigned, col2 bigint signed) engine=innodb;
Query OK, 0 rows affected (0.02 sec)

mysql> flush logs; show master status;
Query OK, 0 rows affected (0.09 sec)
+----------------------+----------+--------------+------------------+-------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------+----------+--------------+------------------+-------------------+
| prosimmon-bin.000019 |      120 |              |                  |                   |
+----------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> insert into t1 (col1, col2) values (18446744073709551615, -1);
Query OK, 1 row affected (0.00 sec)

mysql> flush logs;
Query OK, 0 rows affected (0.10 sec)

mysql> select * from t1;
+----+----------------------+------+
| id | col1                 | col2 |
+----+----------------------+------+
|  1 | 18446744073709551615 |   -1 |
+----+----------------------+------+
1 row in set (0.00 sec)

Now we use mysqlbinlog with -vv so that it will "Reconstruct pseudo-SQL statements out of row events" and add "comments on column data types".

$ mysqlbinlog -vv ./data/prosimmon-bin.000019
...
### INSERT INTO `test`.`t1`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=-1 (18446744073709551615) /* LONGINT meta=0 nullable=1 is_null=0 */
###   @3=-1 (18446744073709551615) /* LONGINT meta=0 nullable=1 is_null=0 */

We definitely inserted different values into the two columns, and our SELECT statement returns different values: 18446744073709551615 and -1. But mysqlbinlog can't tell that they're meant to be represented differently. It just sees the string of bytes in the binary log and then outputs both signed and unsigned interpretations, with the unsigned interpretation in parentheses.

Going back a bit, you'll recall that the TABLE_MAP_EVENT includes information about schema name, table name, and column types. Note that there is no provision in this structure to include column names! That's why the best we can get in mysqlbinlog output is the numeric index of the column in the table, which of course means that a slave can never have a different column layout than a master when using row-based binary logging (except for additional columns at the end of the table), but also that a theoretical non-slave replication client cannot really understand what it is seeing unless it also knows the table structure.

A workaround for any clever clients like the one proposed in #mysql might be for the client to scrape tables in information_schema to get information about table structure for all tables in the database, but it would then have to take on the burden of watching for new tables and changes to existing tables.

I think it would be interesting for the replication facility to evolve into something that was more general purpose, in order to facilitate a variety of interesting applications, but that would require some changes to the protocol that might jeopardize backward compatibility and interoperability.

About the Author

Kolbe Kegel is a Principal Support Engineer at SkySQL, where he has worked since 2011. Kolbe has worked with MySQL since 2005, first at MySQL, later at Sun Microsystems after its acquisition of MySQL Inc., then at Oracle after its acquisition of Sun.
justinswanhart

Still stuck with it

Even with a patch, we are stuck with it for a long time, because people upgrade infrequently, and this is the kind of change that shouldn't really be made once a product reaches the RC stage, so it will probably have to wait until the next major release of MariaDB.

FlexCDC (which processes binary logs and captures the changes in tables) uses the DDL of the log table to determine if the column is signed or unsigned, as this works until an ALTER changes the table. Once that happens, it still works, because FlexCDC will apply the ALTER TABLE changes to the log, should it make changes that need to be reflected, like a data type change.

kolbe

Justin, my naïve hope was

Justin, my naïve hope was that replication clients and other binary log consumers (mysqlbinlog, for instance) would ignore event types they don't understand. I haven't done any testing of that to see what happens in reality, though.

Thanks for mentioning FlexCDC! It sounds like it takes a similar approach to what I suggested in my post.

Kolbe

justinswanhart

reality does not always match up with expectations

Try to use 5.5 mysqlbinlog against a 5.6 binary log for example. The checksum events will make 5.5 blow up, as will query description events (the SQL for RBR), as will new format row events.

justinswanhart

FlexCDC -> other system

Let me know if there is interest in getting FlexCDC to replicate to elasticsearch,etc. It is not difficult to do.

The other issue here is TIMESTAMP, which doesn't get projected as a DATETIME by mysqlbinlog, but instead as a unix timestamp. So if you want to insert the value into a table, you have to use FROM_UNIXTIME() on the value.

kolbe

Davi Arnaut's patch for a new Table_metadata event

Davi Arnaut at Twitter created a new kind of event that provides column names and integer signedness information: https://github.com/twitter/mysql/commit/19b251446

I filed a Jira task for MariaDB with a request to merge/implement that functionality: https://mariadb.atlassian.net/browse/MDEV-5671