Blogs

Xtrabackup 1.6.4 for Solaris 10 and 11

If you need Xtrabackup for Solaris 10 and 11 (x64), you can download it from here:

ftp://ftp.skysql.com/downloads/xtrabackup-solaris10_x86_64.tgz

Fwiw, we needed this for a server, and it's not a standard package available for download, so I just wanted to make this available to all.

Hope this helps.

 

 

Announcing SkySQL™ Enterprise HA for the MariaDB® & MySQL® databases

SkySQL™ today announced the immediate availability of SkySQL™ Enterprise HA, its leading 360° degrees High Availability solution for the MySQL® & MariaDB® databases.

High Availability is the #1 requested enhancement to the MySQL & MariaDB servers, even more popular than scalability and performance.  And with SkySQL's expertise at hand, it is now easier than ever before for customers to achieve the level of High Availability that they want.

SkySQL™ Enterprise HA is SkySQL's 360° answer to providing a ready-to-go solution for MySQL & MariaDB High Availability – in no more than 3 days.

Check out the following resources for more information:

Visit the SkySQL Enterprise HA product page

Including:

  • SkySQL™ Enterprise HA Options Table
  • SkySQL™ Enterprise HA Statement of Work

Download the SkySQL High Availability whitepaper

Contact your local SkySQL representative to discuss your HA needs

Finally, if you are in New York City today, join Ivan Zoratti, SkySQL CTO, at the MySQL Meetup for a discussion about cool new tools & tricks to achieve High Availability of your MySQL servers!

Fore more information, visit the New York City MySQL Group webpage.

We look forward to helping you achieve your High Availability objectives for your MySQL & MariaDB databases!

Largest Finnish Business Newspaper Names SkySQL One of 2011 Hottest Startups

Great news! Talouselämä, Finland's largest business newspaper, today announced that it has named SkySQL to its list of 20 Hottest Finnish startups for 2011.

Each year, a distinguished panel of venture capitalists and private investors nominate promising startups to the list based on:

  • the uniqueness of its product or service
  • the possibility of rapid internationalization
  • an ambitious growth target
  • access to and ability to acquire private financing

This year's selection committee included private equity firm Lifeline Ventures' Peter Koponen, AaltoES and the Venture Garage founder of Kristo Ovaska, Arctic Startup online media journalist , Antti Vilpponen, OP-Pohjola's mergers and acquisitions guru, Mikko Leskelä , Finnish Venture Capital Association president and Nexit Ventures Artturi Tarjanne, and Tekes, Executive Vice President Risto Setälä.

View the complete list of Talouselämä's 2011 20 Hottest Finnish Startups.

 

MEMORY engine and Handler% status variables via gdb

While working through my previous blog post, rows_examined UNION Handler% ... ?!?, I learned about some unexpected behavior of the MEMORY (HEAP) storage engine and how it interacts with Handler_read_rnd_next. In particular, even a "deleted" row (one that never really made it into the table to begin with) still must be read over when scanning the table, thereby incrementing Handler_read_rnd_next when one might not expect that to happen. I thought this would be an interesting opportunity to use gdb to really watch what the server was doing when this situation is encountered.

I'm far from being a gdb expert. In fact, I've probably only used it a dozen times or so. Nevertheless, I found it pretty straightforward to get things set up and to understand how to make my way around. (I've probably missed out on some shortcuts and some useful functionality here; hopefully I'm not leading people too far astray!) Fortunately, mysqld binaries are "not stripped" these days. That means that debugging symbols are included in the binary, which gives gdb the ability to associate the activities of a running program with the lines in the source code used to build it. Verify that this is the case using file, which should say "not stripped" at the end of its output:

$ file ./bin/mysqld
./bin/mysqld: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Because we'll need (and want!) to look at the server source code while we do this, we need to download a copy of the source. It is crucial that the source code you have exactly matches the source code used to compile the version of the server you're using. I'm using MariaDB for this testing, so I need to get the version of MariaDB source used to build the version of MariaDB I'm using. I happen to be using MariaDB 5.2.4, so I grabbed mariadb-5.2.4.tar.gz from http://downloads.askmonty.org/MariaDB/5.2.4/.

It's also possible to use bzr to grab a copy of the source, if you're inclined to set that all up, or even to browse the source online on Launchpad. I know that I need revision #2894, because that's the highest revision number in the ChangeLog for MariaDB 5.2.4. You can also see the "Tag" of mariadb-5.2.4 on this revision in the Launchpad revision history.

Once you have obtained the source code that corresponds to the version of MySQL, MariaDB, or Percona server (SkySQL supports all of them!) that you're running, fire up the server, find its PID, and attach gdb. Make sure nobody else is using the server while you're doing this; I'm not taking any steps to narrow the scope of my gdb session to a particular thread. (On Ubuntu starting with 10.10, you either need to run gdb as root or you need to alter the setting of /proc/sys/kernel/yama/ptrace_scope.)

$ gdb -p $( cat ./data/ub-dev.pid )
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
[...snip...]
(gdb)

When you attach to a running process with gdb, it stops the process. You can then set breakpoints, print data structures and variables, et cetera. We have a mission, though, and that is to try to look at what happens when the MEMORY engine causes Handler_read_rnd_next to be incremented. In order to find any of this, it's important to remember that the MEMORY was historically called HEAP, and it retains that name in the server source code. Let's see what we can find out about Handler_read_rnd_next:

$ pwd
/home/kolbe/dev/src/mariadb-5.2.4
$ grep -r Handler_read_rnd_next * | grep -v mysql-test
sql/mysqld.cc:  {"Handler_read_rnd_next",    (char*) offsetof(STATUS_VAR, ha_read_rnd_next_count), SHOW_LONG_STATUS},

I eliminated results that match "mysql-test" because there are lots of tests that look at the value of Handler_read_rnd_next and those are of no interest to us. We see that Handler_read_rnd_next only shows up once in the entire source tree, and that it appears to act as a more human-friendly alias for a lower-level status counter called ha_read_rnd_next_count.

The various storage engines are implemented by way of code that lies in their respective subdirectory of the storage subdirectory of the source tree. We are interested only in the MEMORY engine here, which is called "heap" internally (and historically). Let's look for ha_read_rnd_next_count, but confine our search to the storage/heap directory. I'll use some options to grep to get some context around any matches.

$ grep -n -A 10 -B 10 ha_read_rnd_next_count storage/heap/*
storage/heap/ha_heap.cc-338-  return error;
storage/heap/ha_heap.cc-339-}
storage/heap/ha_heap.cc-340-
storage/heap/ha_heap.cc-341-int ha_heap::rnd_init(bool scan)
storage/heap/ha_heap.cc-342-{
storage/heap/ha_heap.cc-343-  return scan ? heap_scan_init(file) : 0;
storage/heap/ha_heap.cc-344-}
storage/heap/ha_heap.cc-345-
storage/heap/ha_heap.cc-346-int ha_heap::rnd_next(uchar *buf)
storage/heap/ha_heap.cc-347-{
storage/heap/ha_heap.cc:348:  ha_statistic_increment(&SSV::ha_read_rnd_next_count);
storage/heap/ha_heap.cc-349-  int error=heap_scan(file, buf);
storage/heap/ha_heap.cc-350-  table->status=error ? STATUS_NOT_FOUND: 0;
storage/heap/ha_heap.cc-351-  return error;
storage/heap/ha_heap.cc-352-}
storage/heap/ha_heap.cc-353-
storage/heap/ha_heap.cc-354-int ha_heap::rnd_pos(uchar * buf, uchar *pos)
storage/heap/ha_heap.cc-355-{
storage/heap/ha_heap.cc-356-  int error;
storage/heap/ha_heap.cc-357-  HEAP_PTR heap_position;
storage/heap/ha_heap.cc-358-  ha_statistic_increment(&SSV::ha_read_rnd_count);

We can see that the ha_heap::rnd_next function is the only place the HEAP engine modifies the ha_read_rnd_next_count. This function is pretty straightforward. Let's set a breakpoint right after heap_scan is called, so that we can see what's inside the file variable after the new row has been read.

(gdb) break ha_heap.cc:350
Breakpoint 1 at 0x7d0458: file ha_heap.cc, line 350.

We'll also cd inside of gdb to the location of the matching file so that we can look at the source code while we're in the debugger.

(gdb) cd /home/kolbe/dev/src/mariadb-5.2.4/storage/heap
Working directory /home/kolbe/dev/src/mariadb-5.2.4/storage/heap.

Alright, back on the MySQL side, we need some sample data to work with. You'll want to be using GNU screen or a separate terminal window or something in order to be able to easily look at both the MySQL client and your gdb session. Before we'll be able to execute any statements in MySQL, we have to tell gdb to let the server continue running.

(gdb) continue
Continuing.

Now, some sample data and a SELECT to kick things off.

mariadb 5.2.4> create table h1 (id int) engine=memory;
Query OK, 0 rows affected (0.03 sec)

mariadb 5.2.4> insert into h1 values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mariadb 5.2.4> select * from h1;

You'll notice that the SELECT statement doesn't return. That's because the server hit the breakpoint we set in gdb!

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;

Now, we can take a look around at some data structures to see what is going on. The ha_heap::rnd_next function had called heap_scan(file, buf), so let's see what file and buf are.

(gdb) print buf
$1 = <optimized out>
(gdb) print file
$2 = (HP_INFO *) 0x435c4a0

Well, buf isn't something we'll be able to use, because the compiler decided it wasn't actually going to bring that into scope here. file, on the other hand, is a pointer to a HP_INFO struct (see include/heap.h). We can dereference and print it right at the gdb command-line:

(gdb) set print pretty on
(gdb) print *file
$4 = {
  s = 0x432a490, 
  current_ptr = 0x438bd30 "\375\001", 
  current_hash_ptr = 0x0, 
  current_record = 0, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "\321\b", 
  recbuf = 0x435c768 "\321\b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '\000', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

A couple things look useful right away, current_record and current_ptr especially.

s also looks interesting (and it is!), and we can print it to see what it is, just like we did with file:

(gdb) print file->s
$5 = (HP_SHARE *) 0x432a490
(gdb) print *file->s
[...snip...]

It turns out file->s is a pretty large data structure, so I'm not going to include it here. You can browse the definition of HP_SHARE in include/heap.h. We won't be looking into it during this exercise.

We saw earlier that the return value of heap_scan was assigned to error:

(gdb) print error
$9 = 0

You don't have to take my word for it, though; gdb will gladly print the source code that surrounds the current execution context:

(gdb) list
345
346     int ha_heap::rnd_next(uchar *buf)
347     {
348       ha_statistic_increment(&SSV::ha_read_rnd_next_count);
349       int error=heap_scan(file, buf);
350       table->status=error ? STATUS_NOT_FOUND: 0;
351       return error;
352     }
353
354     int ha_heap::rnd_pos(uchar * buf, uchar *pos)

We'll keep an eye on error going forward. For now, though, let's continue execution of the program.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;

It stops again right away. There are several rows in the table, and this function is called for every additional row the server reads. Let's make sure error is still 0 and let's see if anything has changed in file since we last looked at it:

(gdb) print error
$12 = 0
(gdb) print *file
$11 = {
  s = 0x432a490, 
  current_ptr = 0x438bd40 "\375\002", 
  current_hash_ptr = 0x0, 
  current_record = 1, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "\321\b", 
  recbuf = 0x435c768 "\321\b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '\000', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

Two things jump out, current_ptr is now "\375\002" (it was "\375\001" before), and current_record is 1 (it was 0 before). Let's keep going.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$13 = 0
(gdb) print *file
$14 = {
  s = 0x432a490, 
  current_ptr = 0x438bd50 "\375\003", 
  current_hash_ptr = 0x0, 
  current_record = 2, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "\321\b", 
  recbuf = 0x435c768 "\321\b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '\000', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

OK, same as before. There was no error, current_ptr is now "\375\003" and current_record is 2.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$15 = 137

Hey, we have an error! What went wrong? Well, nothing. We got to the end of the table. Take a look at include/my_base.h:

389-#define HA_ERR_OLD_FILE             132     /* old databasfile */
390-#define HA_ERR_NO_ACTIVE_RECORD 133 /* No record read in update() */
391-#define HA_ERR_RECORD_DELETED       134     /* A record is not there */
392-#define HA_ERR_RECORD_FILE_FULL 135 /* No more room in file */
393-#define HA_ERR_INDEX_FILE_FULL      136     /* No more room in file */
394:#define HA_ERR_END_OF_FILE  137     /* end in next/prev/first/last */
395-#define HA_ERR_UNSUPPORTED  138     /* unsupported extension used */
396-#define HA_ERR_TO_BIG_ROW   139     /* Too big row */
397-#define HA_WRONG_CREATE_OPTION      140     /* Wrong create option */
398-#define HA_ERR_FOUND_DUPP_UNIQUE 141        /* Dupplicate unique on write */
399-#define HA_ERR_UNKNOWN_CHARSET       142    /* Can't open charset */

The next time we let the program continue, we get our result set back on the client.

(gdb) c
Continuing.
mariadb 5.2.4> select * from h1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (10 min 29.13 sec)

Our breakpoint was hit 4 times, one for each row in the table and one final time for EOF. But I set out trying to figure out what would happen in the case of a "deleted" row in the MEMORY table. Let's test that out, now.

mariadb 5.2.4> delete from h1 where id=2;

Oops, that triggers our breakpoint! We can disable it like this:

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350
        breakpoint already hit 28 times
(gdb) disable breakpoint 1
(gdb) c
Continuing.

Back in the client, you should see that the row was deleted successfully:

mariadb 5.2.4> delete from h1 where id=2;
Query OK, 1 row affected (48.27 sec)

We want to SELECT from the table again, but our breakpoint is disabled. Instead of disabling it, we could have just kept hitting continue to keep gdb working through the file, or even done continue 4 to get all the way through it, since we know that we hit that breakpoint 4 times scanning through our 3-row table.

We've lost our gdb command line now, though, and we won't get it back until a breakpoint is hit or we manually interrupt the program we're debugging. (The normal way to do that is to hit Ctrl-C, but for whatever reason that doesn't seem to work most of the time when debugging mysqld.) Another technique is to send mysqld the TRAP signal (you may need to do it more than once!). Once we have the (gdb) command prompt back, we'll re-enable our breakpoint.

$ kill -TRAP $( cat ./data/ub-dev.pid )
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7f8e7a7c6740 (LWP 26113)]
0x00007f8e78ec4913 in select () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) enable breakpoint 1
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350
        breakpoint already hit 28 times
(gdb) c
Continuing.

Now, let's SELECT from the table again and look at some of the data that was interesting the first time through.

mariadb 5.2.4> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.2.4> select * from h1;
Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$16 = 0
(gdb) print file->current_ptr
$17 = (uchar *) 0x438bd30 "\375\001"
(gdb) print file->current_record
$18 = 0
(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$19 = 134
(gdb) print file->current_ptr
$20 = (uchar *) 0x438bd40 ""
(gdb) print file->current_record
$21 = 1

The first row came through as we expected, but the second resulted in an error. The snippet of include/ha_base.h I included above shows that 134 corresponds to HA_ERR_RECORD_DELETED. There it is, our deleted record (id=2), just where we would've expected to find it.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$22 = 0
(gdb) print file->current_ptr
$23 = (uchar *) 0x438bd50 "\375\003"
(gdb) print file->current_record
$24 = 2
(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$25 = 137
(gdb) print file->current_record
$27 = 3
(gdb) c
Continuing.

We can see that there are still 4 calls to ha_heap::rnd_next (each of which increments Handler_read_rnd_next), even though only 2 rows are returned to the client.

mariadb 5.2.4> select * from h1;
+------+
| id   |
+------+
|    1 |
|    3 |
+------+
2 rows in set (4 min 4.66 sec)

mariadb 5.2.4> show status like 'Handler_read_rnd_next%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_rnd_next | 4     |
+-----------------------+-------+
1 row in set (0.00 sec)

SHOW STATUS hits our breakpoint, too, which means SHOW STATUS is creating a temporary MEMORY table! Again, you'll either need to continue through the breakpoint or disable it.

So, there it is. We just used gdb to look at the activities of a running MariaDB server (though as I said, this will apply equally to MySQL and Percona servers) as it retrieved rows from a (non-indexed) MEMORY table. I was actually surprised how straightforward it was to do this. I look forward to spending more time tracing things like this to gain a better understanding of the server source, and I hope that this might encourage others to do the same!

Southern CAL, here we come!

If you're in the City of Angles this week, be sure to stop by the Southern California Linux Expo (SCALE) - the largest annual Linux, Open-Source, and Free Software conference held in Los Angeles.  Join SkySQL and its strategic partner, Monty Program, January 20-22, at the Hilton Los Angeles Airport hotel. Stop by our booth, #65, and meet our team. As well, don't miss our talks on MySQL & MariaDB:

  • The MySQL Storage Engines Landscape - Colin Charles, Monty Program, Friday, 1/20, 11:30AM
  • A Closer Look at MySQL Cluster:  An Architectural Overview - Max Mether, SkySQL, Friday, 1/20, 4:00PM
  • MariaDB:  The New M in Lamp - Colin Charles, Monty Program, Friday, 1/20, 5:00PM

If you haven't already registered, save 40% off your conference fee by using the discount code MONTY when you register.

Hope to see you there!

MySQL Cluster training Jan 24 - 26 in DC

We still have a few seats left for our MySQL Cluster training in Washington DC January 24 - 26. If you're interested in learning about MySQL Cluster, the architecture, how to install it, administer it and troubleshoot it this is the course for you. The course will also cover replication and optimization and we will also discuss the exciting new features coming in version 7.2 of MySQL Cluster. For more information about the course contents visit http://www.skysql.com/services/training/courses/administering-mysql-cluster and to sign up to the course go to http://www.skysql.com/services/training/schedule/administering-mysql-cluster-1.

See you there!

 

rows_examined UNION Handler% ... ?!?

A customer recently asked if I could help clarify the relationship between rows_examined (in the slow query log) and the Handler% variables (in SHOW STATUS output) for a simple UNION. I didn't know the answer off the top of my head, but I didn't think it would be too terribly difficult to find it. Several days and many hours of work later, I think I finally have some idea of how this all works.

Let's first set up some sample data and get some information about it. (I'm using MariaDB 5.2.4 for these tests because that's the version the customer was using. This should basically all apply to other versions/vendors, except specific changes I refer to in MariaDB 5.3.)

maria 5.2.4> create table t1 (id int, key(id));
Query OK, 0 rows affected (0.01 sec)

maria 5.2.4> \! seq -f '(%.0f)' -s, 1 10
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
maria 5.2.4> insert into t1 values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

maria 5.2.4> set global slow_query_log=1, log_output='table', long_query_time=0;
Query OK, 0 rows affected (0.03 sec)

 5.2.4> flush status;
Query OK, 0 rows affected (0.00 sec)

maria 5.2.4> select * from t1 where id=5 union all select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
|    5 |
+------+
2 rows in set (0.00 sec)

maria 5.2.4> show status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 2     |
| Handler_read_next          | 2     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 3     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 4     |
+----------------------------+-------+
15 rows in set (0.01 sec)

maria 5.2.4> select * from mysql.slow_log where sql_text like 'select%' order by start_time desc limit 1\G
*************************** 1. row ***************************
    start_time: 2012-01-12 11:12:39
     user_host: root[root] @ localhost []
    query_time: 00:00:00
     lock_time: 00:00:00
     rows_sent: 2
 rows_examined: 7
            db: test
last_insert_id: 0
     insert_id: 0
     server_id: 0
      sql_text: select * from t1 where id=5 union all select * from t1 where id=5
1 row in set (0.01 sec)

So, we have one table (t1) with one column (id) and 10 rows. There's a key on id. There are two different things I want to look at here: the Handler% variables and rows_examined in the slow query log. Initially, they seem to match up perfectly. Handler_read_key + Handler_read_next + Handler_read_rnd_next = 7 = rows_examined. Case closed, right? Not so fast.

How many rows are actually being handled in order to work execute this query? There's one row from each of the base tables in the UNION (well, there are two instances of the same table, but UNION doesn't know or care about that). Those rows are written to a temporary table as part of the UNION, and then they're read back from the temporary table and sent to the client. One row from each base table is two rows, and then two rows from the temporary table make four rows total. Why, then, is rows_examined=7? And why do the Handler_read% variables seem to support this value? There's a lot more going on here than meets the eye. Not having much luck with the reference manual, I set off on an odyssey through the MariaDB source code to try to make sense out of all of this.

If you want to follow along, go grab a copy of the MariaDB source. You can find it at http://downloads.askmonty.org/MariaDB/5.2.4/.

I did lots of grepping and eventually found my way to st_select_lex_unit::exec in sql/sql_union.cc. This is the "meat" of where a UNION is executed. Right at the top of the function (line 443) you see this:

  ha_rows examined_rows= 0;

It looks like st_select_lex_unit::exec is going to be maintaining its own examined rows counter. The next mention of examined_rows is at line #515:

          examined_rows+= thd->examined_row_count;

Here, the examined_rows counter gets the current value that's stored in the thread information structure. The next thing that happens is at line #629:

        thd->examined_row_count+= examined_rows;

This shows the thread information structure that we saw above getting examined_rows added back to it. That can't be right... assigning that value back whence it came doesn't make any sense. Sure enough, in MariaDB 5.3.3 this is done differently; the code we saw at line #515 above (it's actually at line #660 in the MariaDB 5.3.3 source) looks like this, instead:

          examined_rows+= thd->examined_row_count;
          thd->examined_row_count= 0;

Instead of adding thd->examined_row_count back onto itself, thd->examined_row_count is reinitialized to 0. Later examined_rows is added back to thd->examined_row_count just as was done in MariaDB 5.2.4. That means that the value of rows_examined in the slow query log make a lot more sense in MariaDB 5.3.3 than they did in MariaDB 5.2.4:

mariadb 5.3.3> select * from t1 where id=5 union all select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
|    5 |
+------+
2 rows in set (0.07 sec)

mariadb 5.3.3> select * from mysql.slow_log order by start_time desc limit 1\G
*************************** 1. row ***************************
    start_time: 2012-01-12 11:36:52.136361
     user_host: root[root] @ localhost []
    query_time: 00:00:00.067328
     lock_time: 00:00:00.024490
     rows_sent: 2
 rows_examined: 4
            db: test
last_insert_id: 0
     insert_id: 0
     server_id: 0
      sql_text: select * from t1 where id=5 union all select * from t1 where id=5
1 row in set (0.00 sec)

So, that solves the conundrum of why rows_examined didn't match expectations for a UNION. There was a bug! But... rows_examined, even though it didn't really make much sense, did match the Handler% variables. Those are unchanged in MariaDB 5.3.3:

mariadb 5.3.3> show status like "handler%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 2     |
| Handler_read_next          | 2     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 3     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 2     |
| Handler_update             | 0     |
| Handler_write              | 1     |
+----------------------------+-------+
17 rows in set (0.13 sec)

Whether we're using MariaDB 5.2.4 or MariaDB 5.3.3, Handler_read_key, Handler_read_next, and Handler_read_rnd_next still need to be accounted for. With the assistance of some of the absolutely wonderful people at Monty Program AB, I eventually got a grasp on these.

Handler_read_key means that the server is making a direct access to a specific index entry. In this case, it's going straight to the first entry in the index for value "5" (id=5). There are two tables in the UNION, and we're doing a very simple query against each of them. Each SELECT only matches a single row, so Handler_read_key is 1 for each table, and 2 in total.

Handler_read_next means that the server is looking for the next value in an index. Even though MySQL uses a direct lookup (Handler_read_key) to find the first entry in the index for a particular value, there might be other entries for that same value, so Handler_read_next is required. In this case, there is only one row that matches id=5, so Handler_read_next returns something like EOF. Again, this happens twice in the UNION, so Handler_read_next is 2. If more rows were to match, Handler_read_key would still be 1, but Handler_read_next would increase:

mariadb 5.2.4> insert into t1 values (5),(5),(5);
Query OK, 3 rows affected (0.03 sec)
Records: 3  Duplicates: 0  Warnings: 0

mariadb 5.2.4> select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
|    5 |
|    5 |
|    5 |
+------+
4 rows in set (0.00 sec)

mariadb 5.2.4> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_next     | 4     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
6 rows in set (0.00 sec)

Handler_read_rnd_next indicates that the server is doing a table scan. This is happening for our UNION because of the temporary table. Even though there are only 2 rows in the temporary table, Handler_read_rnd_next is 3 for the same reason that Handler_read_next was set—the server must find the end of the data. The last time Handler_read_rnd_next is incremented is when the handler finds the end of the table.

That wasn't so bad, right? There's a little extra wrinkle to look out for (of course). We've been doing UNION ALL, but what changes if we do UNION DISTINCT? I'll use MariaDB 5.3.3 for this example, so that we can compare better values for rows_examined.

mariadb 5.3.3> select * from t1 where id=5 union all select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
|    5 |
+------+
2 rows in set (0.00 sec)

mariadb 5.3.3> show status like "handler%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 2     |
| Handler_read_next          | 2     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 3     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 2     |
| Handler_update             | 0     |
| Handler_write              | 1     |
+----------------------------+-------+
17 rows in set (0.00 sec)

mariadb 5.3.3> select * from mysql.slow_log order by start_time desc limit 1\G
*************************** 1. row ***************************
    start_time: 2012-01-12 12:07:55.514920
     user_host: root[root] @ localhost []
    query_time: 00:00:00.000745
     lock_time: 00:00:00.000220
     rows_sent: 2
 rows_examined: 4
            db: test
last_insert_id: 0
     insert_id: 0
     server_id: 0
      sql_text: select * from t1 where id=5 union all select * from t1 where id=5
1 row in set (0.01 sec)

mariadb 5.3.3> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.3.3> select * from t1 where id=5 union distinct select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
+------+
1 row in set (0.02 sec)

mariadb 5.3.3> show status like "handler%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 2     |
| Handler_read_next          | 2     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 3     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 2     |
| Handler_update             | 0     |
| Handler_write              | 1     |
+----------------------------+-------+
17 rows in set (0.00 sec)

mariadb 5.3.3> select * from mysql.slow_log order by start_time desc limit 1\G
*************************** 1. row ***************************
    start_time: 2012-01-12 12:08:23.308634
     user_host: root[root] @ localhost []
    query_time: 00:00:00.023508
     lock_time: 00:00:00.000156
     rows_sent: 1
 rows_examined: 3
            db: test
last_insert_id: 0
     insert_id: 0
     server_id: 0
      sql_text: select * from t1 where id=5 union distinct select * from t1 where id=5
1 row in set (0.00 sec)

When you do UNION [DISTINCT] (DISTINCT is the default if you don't specify either DISTINCT or ALL), the server creates the temporary table with a flag that tells it to create a unique key and thus reject duplicate rows. So, for UNION DISTINCT, why is Handler_read_rnd_next, the one that corresponds to reading from the temporary table, still equal to 3? If there is only 1 row in the table, shouldn't this variable be equal to 2? One for the row, and one for EOF, right? It turns out that the row is actually created in the temporary table before the unique key check is performed. After the unique check fails, the row is marked as "deleted", but it's still there. When working through the table, Handler_read_rnd_next is incremented every time it's called ... even if it finds a deleted row or EOF.

That's an internal feature of how HEAP/MEMORY tables work. If we force a non-heap (MyISAM or Aria) temporary table, we have Handler_read_rnd_next of 2 for UNION DISTINCT, as we might have expected to begin with:

mariadb 5.3.3> alter table t1 add column t text;
Query OK, 10 rows affected (0.01 sec)              
Records: 10  Duplicates: 0  Warnings: 0

mariadb 5.3.3> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.3.3> select * from t1 where id=5 union all select * from t1 where id=5;
+------+------+
| id   | t    |
+------+------+
|    5 | NULL |
|    5 | NULL |
+------+------+
2 rows in set (0.00 sec)

mariadb 5.3.3> show status like "Handler_read_rnd_next";
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_rnd_next | 3     |
+-----------------------+-------+
1 row in set (0.00 sec)

mariadb 5.3.3> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.3.3> select * from t1 where id=5 union distinct select * from t1 where id=5;
+------+------+
| id   | t    |
+------+------+
|    5 | NULL |
+------+------+
1 row in set (0.00 sec)

mariadb 5.3.3> show status like "Handler_read_rnd_next";
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_rnd_next | 2     |
+-----------------------+-------+
1 row in set (0.00 sec)

The Handler_read% variables now taken care of, we ought to take a look at Handler_write. You may have noticed some differences in those between MariaDB 5.2.4 and MariaDB 5.3.3:

mariadb 5.2.4> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.2.4> select * from t1 where id=5 union all select * from t1 where id=5;
+------+
| id   |
+------+
|    5 |
|    5 |
+------+
2 rows in set (0.07 sec)

mariadb 5.2.4> show status like 'Handler_write%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Handler_write | 4     |
+---------------+-------+
1 row in set (0.04 sec)
mariadb 5.3.3> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.3.3> select * from t1 where id=5 union all select * from t1 where id=5;
+------+------+
| id   | t    |
+------+------+
|    5 | NULL |
|    5 | NULL |
+------+------+
2 rows in set (0.00 sec)

mariadb 5.3.3> show status like "Handler%write";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_tmp_write | 2     |
| Handler_write     | 1     |
+-------------------+-------+
2 rows in set (0.00 sec)

It's easiest to look at MariaDB 5.3.3 first, since it helpfully breaks out temporary table writes for us. We know the UNION writes to the temporary table, so Handler_tmp_write of 2 is easy to explain. I was quite confused about Handler_write at first, until someone reminded me that I had the slow query log turned on with log_output=table!

mariadb 5.3.3> set global log_output='file';
Query OK, 0 rows affected (0.00 sec)

mariadb 5.3.3> select * from t1 where id=5 union all select * from t1 where id=5;
+------+------+
| id   | t    |
+------+------+
|    5 | NULL |
|    5 | NULL |
+------+------+
2 rows in set (0.01 sec)

mariadb 5.3.3> show status like "Handler%write";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_tmp_write | 2     |
| Handler_write     | 0     |
+-------------------+-------+
2 rows in set (0.01 sec)

Still, though, MariaDB 5.2.4 showed Handler_write of 4, while here we've only accounted for 3 (2 for writing to the temporary table, 1 for writing to the slow query log). Then I noticed/remembered/realized that SHOW STATUS itself actually increments Handler_write.

mariadb 5.2.4> show status like 'Handler_write%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Handler_write | 4     |
+---------------+-------+
1 row in set (0.04 sec)

mariadb 5.2.4> show status like 'Handler_write%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Handler_write | 5     |
+---------------+-------+
1 row in set (0.00 sec)

mariadb 5.2.4> show status like 'Handler_write%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Handler_write | 6     |
+---------------+-------+
1 row in set (0.00 sec)

mariadb 5.2.4> show status like 'Handler_write%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Handler_write | 7     |
+---------------+-------+
1 row in set (0.00 sec)

That seems to be fixed in MariaDB 5.3:

mariadb 5.3.3> show status like "Handler%write";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_tmp_write | 2     |
| Handler_write     | 0     |
+-------------------+-------+
2 rows in set (0.00 sec)

mariadb 5.3.3> show status like "Handler%write";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_tmp_write | 2     |
| Handler_write     | 0     |
+-------------------+-------+
2 rows in set (0.00 sec)

mariadb 5.3.3> show status like "Handler%write";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Handler_tmp_write | 2     |
| Handler_write     | 0     |
+-------------------+-------+
2 rows in set (0.00 sec)

What I thought would be a simple investigation into rows_examined and the Handler% variables actually ended up being a really interesting trek through the MariaDB source code, a chance to use gdb, a bug in sql_union.cc, some unexpected ancient low-level behavior exposed via status variables, and a chance to learn about some nice new changes in MariaDB 5.3. I hope I've gotten my facts straight, and I hope you'll let me know if I haven't!

MySQL High Availability Manager (MHA) 0.53 has been Released and Get Support for it at SkySQL

 

I just wanted to let you all know that MHA for MySQL (Master High Availability Manager and tools for MySQL) version 0.53 has been released.

Yoshinori Matsunobu discusses the release in much more detail here:

http://yoshinorimatsunobu.blogspot.com/2012/01/mha-for-mysql-053-released.html

The full MHA 0.53 changelogs are here:

http://code.google.com/p/mysql-master-ha/wiki/ReleaseNotes

MHA 0.53 can be downloaded from here:

http://code.google.com/p/mysql-master-ha/downloads/list

And if you would like support for MHA, simply contact SkySQL:

http://www.skysql.com/how-to-buy

 

 

Demystifying identifier mapping

While tinkering with some things the other day, I noticed that if I create a table named "ñ", MySQL creates a .frm file with the name of @0x.frm. Similar "encoding" is done for any file that needs to be created (.MYI, .ibd, .TRG, .TRN, et cetera). I'm not sure this is necessary or useful when my filesystem seems to support all kinds of fanciful characters in filenames, and especially when character_set_system is set by default (and unchangeably!) to utf8, but it's happening nevertheless and I decided to learn more about how it works.

This behavior is discussed in the reference manual at Mapping of Identifiers to File Names. There's even a chart that claims to show how the mapping is performed. The chart and text below it on the "Mapping of Identifiers to File Names" page show that characters in the range 00C0..017F should be mapped using the pattern [@][0..4][g..z], but it wasn't very obvious why. (I had no luck understanding any of it at first, in large part because I hadn't yet figured out that the values in the Code Range column are UCS-2 values, not UTF-8 values.)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select convert(0x00c0 using ucs2);
+-----------------------------+
| convert(0x00c0 using ucs2) |
+----------------------------+
| À                          |
+----------------------------+
1 row in set (0.00 sec)

mysql> create table À (id int);
Query OK, 0 rows affected (0.02 sec)
$ pwd
/home/kolbe/mysql/5.6/data/test
$ ls -l
total 108
-rw-rw---- 1 kolbe kolbe  8556 2012-01-03 13:51 @0G.frm
-rw-rw---- 1 kolbe kolbe 98304 2012-01-03 13:51 @0G.ibd

Well, that shows that it works as described, but I still didn't really understand why. I decided to start digging around the MySQL Server source code. First I looked at a lot of things in mysys, thinking that this functionality might be low-level enough to exist there. I was wrong. I looked around in sql, because that's where the vast majority of all the server code exists, but I also didn't have any luck there, initially, because there are tons of file operations that this algorithm applies to and I didn't know where to start.

Eventually, I stumbled across some use of build_table_filename in sql/sql_base.cc, which led me to sql/sql_table.cc, where that function is defined. Well, build_table_filename calls tablename_to_filename, which uses strconvert to actually perform the conversion of the string from one "character set" to another. The strconvert function, defined in sql/strfunc.cc, is called with a "to" character set (to_cs) of my_charset_filename. So, MySQL seems to be using some internal separate "character set" to control/define the mapping used for filenames. The to_cs argument of the strconvert function is a CHARSET_INFO struct that contains information about a given character set, including some methods that can be used to perform conversion operations. There's a good amount of information about the CHARSET_INFO structure in strings/CHARSET_INFO.txt. The strconvert function uses the wc_mb function in the cset member (type MY_CHARSET_HANDLER) of the CHARSET_INFO struct to perform the conversion.

I found my_charset_filename defined in strings/ctype-utf8.c. The cset member of my_charset_filename is my_charset_filename_handler, defined in the same file. The wc_mb function for this character set is my_wc_mb_filename, once again located in the same file. Here's the contents of that function:

static int
my_wc_mb_filename(const CHARSET_INFO *cs __attribute__((unused)),
                  my_wc_t wc, uchar *s, uchar *e)
{ 
  int code;
  char hex[]= "0123456789abcdef";
  if (wc < 128 && filename_safe_char[wc])
  { 
    *s= (uchar) wc;
    return 1;
  }

  if (s + 3 > e)
    return MY_CS_TOOSMALL3;

  *s++= MY_FILENAME_ESCAPE;
  if ((wc >= 0x00C0 && wc <= 0x05FF && (code= uni_0C00_05FF[wc - 0x00C0])) ||
      (wc >= 0x1E00 && wc <= 0x1FFF && (code= uni_1E00_1FFF[wc - 0x1E00])) ||
      (wc >= 0x2160 && wc <= 0x217F && (code= uni_2160_217F[wc - 0x2160])) ||
      (wc >= 0x24B0 && wc <= 0x24EF && (code= uni_24B0_24EF[wc - 0x24B0])) ||
      (wc >= 0xFF20 && wc <= 0xFF5F && (code= uni_FF20_FF5F[wc - 0xFF20])))
  { 
    *s++= (code / 80) + 0x30;
    *s++= (code % 80) + 0x30;
    return 3;
  }

  /* Non letter */
  if (s + 5 > e)
    return MY_CS_TOOSMALL5;

  *s++= hex[(wc >> 12) & 15];
  *s++= hex[(wc >> 8) & 15];
  *s++= hex[(wc >> 4) & 15];
  *s++= hex[(wc)      & 15];
  return 5;
}

There it is, after all the hunting, the actual code that does the conversion from À to @0G. Let's see how it works.

There are numerous arrays of code points defined in strings/ctype-utf8.c (uni_0C00_05FF, uni_1E00_1FFF, et cetera); they look like this:

/* 00C0-05FF */
static uint16 uni_0C00_05FF[1344]=
{
  0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,
  0x001F,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,
  0x0027,0x0028,0x0029,0x002A,0x0067,0x0068,0x0069,0x0000,
  0x006B,0x006C,0x006D,0x006E,0x006F,0x0070,0x0071,0x008A,
  0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,
  0x003F,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,
  0x0047,0x0048,0x0049,0x004A,0x0087,0x0088,0x0089,0x0000,
  0x008B,0x008C,0x008D,0x008E,0x008F,0x0090,0x0091,0x0092,
...

(You may notice that the uni_0C00_05FF is misnamed. It should be uni_00C0_05FF, which is even what the comment above its definition says! Oops...)

The uni_0C00_05FF array covers 11 Unicode blocks, including part of Latin-1 Supplement, IPA Extensions, Cyrillic, and Hebrew, among others. Only part of the Latin-1 Supplement block is handled by my_wc_mb_filename, and the entirety of that is shown above, 0x00C0 through 0x00FF, or elements 0 through 63 (since 0x00FF - 0x00C0 = 63).

There's not really a consistent pattern to the contents of the Latin-1 Supplement, which is one reason this array has to exist in the source as a static array with over a thousand elements. Within the second half of the Latin-1 Supplement block, the portion starting at 0x00C0 handled by this function, there are some uppercase vowels with consonants and math operators interspersed, and then the lowercase versions of the same; in other words, the uppercase letters are grouped, and the lowercase letters follow in a separate group, but there's other non-letter stuff in there, too.

The snippet above of the uni_0C00_05FF array consists of the UCS-2 values for G through Z and g through z, with 0x30 subtracted from them, with varying offsets of 80, skipping various non-letter characters. (One such character is 0x00D7, MULTIPLICATION SIGN, element 23 in the array, at the end of the 3rd line above. Such characters exist in filenames using their full unicode representation. In this case, a table named "×" —the MULTIPLICATION SIGN, not the letter "x"!— gets a .frm named @00d7.frm.)

In the Latin Extended-A Block (0x0100 to 0x017F), by contrast, each uppercase letter is directly followed by its lowercase equivalent (Ā, ā, Ă, ă, et cetera). We can see this reflected in another snippet of uni_0C00_05FF directly following the one shown above:

static uint16 uni_0C00_05FF[1344]=
{
...
  0x0073,0x0093,0x0074,0x0094,0x0075,0x0095,0x0076,0x0096,
  0x0077,0x0097,0x0078,0x0098,0x0079,0x0099,0x007A,0x009A,
  0x00B7,0x00D7,0x00B8,0x00D8,0x00B9,0x00D9,0x00BA,0x00DA,
  0x00BB,0x00DB,0x00BC,0x00DC,0x00BD,0x00DD,0x00BE,0x00DE,
...

There's a much clearer pattern there. Each value is followed by itself + 0x20 (decimal 32), which is the distance between an uppercase letter and its lowercase equivalent in Basic Latin (ASCII!):

mysql> select ord('A');
+----------+
| ord('A') |
+----------+
|       65 |
+----------+
1 row in set (0.01 sec)

mysql> select ord('a');
+----------+
| ord('a') |
+----------+
|       97 |
+----------+
1 row in set (0.00 sec)

mysql> select ord('a') - ord('A');
+---------------------+
| ord('a') - ord('A') |
+---------------------+
|                  32 |
+---------------------+
1 row in set (0.00 sec)

As explained in the manual, this conversion/encoding scheme for "identifier mapping" is designed to preserve case sensitivity on the filesystem. That is, À and à become @0G and @0g respectively. That means that table "À" is equal to table "à" on a case-insensitive filesystem (because @0G is equal to @0g) while they're not equal on a case-sensitive filesystem.

The code to perform the conversion essentialy determines which of the arrays to consult, subtracts the offset of the beginning of the range, does a lookup in the appropriate array, adds '0' (0x30), shifts things around a bit with division and modulus to get everything to fit into the scheme of allowed characters ([0..4][g..z]), and that's that.

Because my brain handles character sets/encodings (and arithmetic, I guess) a lot better at the SQL prompt than elsewhere, I put this together:

mysql> create table c (id int not null auto_increment primary key, c int);
Query OK, 0 rows affected (0.10 sec)

mysql> delimiter //
mysql> DROP PROCEDURE IF EXISTS insert_ucs2;
    -> CREATE PROCEDURE insert_ucs2 (c VARCHAR(255)) BEGIN
    ->         DECLARE i INT DEFAULT 1;
    ->         DECLARE n INT DEFAULT 1;
    ->
    ->         WHILE i < LENGTH(c) AND n <> 0 DO
    ->                 SET n = LOCATE(',', SUBSTRING(c FROM i));
    ->                 IF n = 0 THEN SET n = LENGTH(c)-i+2; END IF;
    ->                 INSERT INTO c (c) VALUES ( CONV(SUBSTRING(c FROM i+2 for n-3), 16, 10) );
    ->                 SET i = i + n;
    ->         END WHILE;
    -> 
    -> END//
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> delimiter ;

mysql> call insert_ucs2('0x0073,0x0093,0x0074,0x0094,0x0075,0x0095,0x0076,0x0096,0x0077,0x0097,0x0078,0x0098,0x0079,0x0099,0x007A,0x009A');
Query OK, 1 row affected (0.04 sec)

That's a little procedure to parse the lines from the conversion mapping arrays and put them into a very basic table. I used the first few line that correspond to the Latin Extended-A Block. Now, we can see how those actually map to filenames:

mysql> select concat('@',floor(c / 80), char(convert(c % 80 + 0x30 using ucs2))) as v from c;
+------+
| v    |
+------+
| @1S  |
| @1s  |
| @1T  |
| @1t  |
| @1U  |
| @1u  |
| @1V  |
| @1v  |
| @1W  |
| @1w  |
| @1X  |
| @1x  |
| @1Y  |
| @1y  |
| @1Z  |
| @1z  |
+------+
16 rows in set (0.00 sec)
mysql> create table Ā (id int);
Query OK, 0 rows affected (0.03 sec)

$ ls /home/kolbe/mysql/5.6/data/test
@1S.frm  @1S.ibd  c.frm  c.ibd

Looks like we guessed right! Ā is the first character of Latin Extended-A, and MySQL does map it to @1S as expected.

So, here's something even more silly, a re-implementation of (part of) the conversion algorithm using SQL:

$ tail -n +6897 < ./ctype-utf8.c | head -n $((7065 - 6897)) | perl -e 'my $i = 0; while (<STDIN>) { chomp; print "call insert_ucs2(\"" if $i == 0; s/\s//g; s/,$// if $i == 3; print; print "\");\n" if $i == 3; $i = ($i + 1) %4 }' > inserts.sql

$ mysql < ~/inserts.sql

mysql> select count(*) from c;
+----------+
| count(*) |
+----------+
|     1344 |
+----------+
1 row in set (0.00 sec)

mysql> select 0x05ff - 0x00c0;
+-----------------+
| 0x05ff - 0x00c0 |
+-----------------+
|            1343 |
+-----------------+
1 row in set (0.00 sec)

mysql> create table u (c char(1) character set utf8);
Query OK, 0 rows affected (0.09 sec)

mysql> insert into u values (convert(0x00c0 using ucs2));
Query OK, 1 row affected (0.01 sec)

mysql> insert into u select convert(char(max(ord(convert(c using ucs2)))+1) using ucs2) from u;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

...repeat a few more times...

mysql> select * from u;
+------+
| c    |
+------+
| À    |
| Á    |
| Â    |
| Ã    |
| Ä    |
| Å    |
| Æ    |
| Ç    |
| È    |
| É    |
+------+
10 rows in set (0.00 sec)

mysql> select concat('@',floor(c.c / 80), char(convert(c.c % 80 + 0x30 using ucs2))) as v from c join u on c.id = ord(convert(u.c using ucs2)) - 0x00c0 +1  ;
+------+
| v    |
+------+
| @0G  |
| @0H  |
| @0I  |
| @0J  |
| @0K  |
| @0L  |
| @0M  |
| @0N  |
| @0O  |
| @0P  |
+------+
10 rows in set (0.00 sec)

mysql> truncate u;
Query OK, 0 rows affected (0.08 sec)

mysql> insert into u values ('Ā'),('ā'),(convert(0x00d7 using ucs2));
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from u;
+------+
| c    |
+------+
| Ā    |
| ā    |
| ×    |
+------+
3 rows in set (0.00 sec)

mysql> select concat(
        '@',
        if(
            concat(
                floor(c.c / 80),
                char(convert(c.c % 80 + 0x30 using ucs2))
            ) <> '00'
        ,
            concat(
                floor(c.c / 80),
                char(convert(c.c % 80 + 0x30 using ucs2))
            )
               ,
                       lower( hex(convert(u.c using ucs2)))
               )
       ) as v  
               from c join u
                       on
               c.id = ord(convert(u.c using ucs2)) - 0x00c0 +1
       ;
+-------+
| v     |
+-------+
| @1S   |
| @1s   |
| @00d7 |
+-------+
3 rows in set (0.00 sec)

And, finally, in case you want that as a stored function...

delimiter //
DROP FUNCTION IF EXISTS my_wc_mb_filename//
CREATE FUNCTION my_wc_mb_filename(wc char(1) character set ucs2) RETURNS CHAR(5)
BEGIN
    DECLARE output CHAR(5);
    SELECT concat(
                '@',
                if(
                        concat( 
                                floor(c.c / 80),
                                char(convert(c.c % 80 + 0x30 using ucs2))
                        ) <> '00'
                ,
                        concat( 
                                floor(c.c / 80),
                                char(convert(c.c % 80 + 0x30 using ucs2))
                        )
               ,
                       lower( hex(wc) )
               )
    ) as v  INTO output
    from c
    where 
    id = ord(wc) - 0x00c0 +1
    ;
    RETURN output;
END
//
delimiter ;
mysql> select my_wc_mb_filename('Ý');
+-------------------------+
| my_wc_mb_filename('Ý')  |
+-------------------------+
| @1P                     |
+-------------------------+
1 row in set (0.00 sec)

mysql> select my_wc_mb_filename('÷');
+-------------------------+
| my_wc_mb_filename('÷')  |
+-------------------------+
| @00f7                   |
+-------------------------+
1 row in set (0.00 sec)

It's worth mentioning that Drizzle does all of this quite differently. (They seem to take some pride in having deliberately torn strconvert out of the source tree entirely.) Instead of the complicated algorithm used by MySQL, Drizzle uses a succinct function that deals in UTF-8. You can find it in drizzled/util/tablename_to_filename.cc. Here's the part of it that performs the conversion of non-ASCII characters:

    to.push_back('@');
    to.push_back(hexchars[(it >> 4) & 15]);
    to.push_back(hexchars[it & 15]);

Here are the effects:

drizzle> select hex('ữỗỿḵḝ');
+--------------------------------+
| hex('ữỗỿḵḝ')                   |
+--------------------------------+
| E1BBAFE1BB97E1BBBFE1B8B5E1B89D |
+--------------------------------+
1 row in set (0.000418 sec)

drizzle> create table ữỗỿḵḝ (id int);
Query OK, 0 rows affected (0.010721 sec)

...

$ ls /var/lib/drizzle/local/test
db.opt  @e1@bb@af@e1@bb@97@e1@bb@bf@e1@b8@b5@e1@b8@9d.dfe

I'm not trying to make a case that re-implementing this kind of thing in SQL has any practical purpose, but it was a fun way to poke around in the MySQL source code, learn more about how this algorithm works, and to keep my character set and stored functions chops honed up a bit.

Initial impressions of InnoDB Fulltext

The much-anticipated, long-awaited, and possibly irrelevant (Sphinx? Lucene?) InnoDB Fulltext finally makes a semi-official appearance in MySQL 5.6.4! MySQL 5.6.4, release December 20, 2011, is the newest "development milestone release" ("DMR") of MySQL 5.6.

It's important to bear in mind that this is the first release of InnoDB Fulltext. Hopefully, that means that considerable improvements will be forthcoming; for now, though, we are left to try to make sense of what is available at present.

Creating an InnoDB table that uses a fulltext index is dead easy:

mysql> create table ibft1 (id int unsigned not null auto_increment primary key, c1 varchar(255), fulltext(c1)) engine=innodb;
Query OK, 0 rows affected (0.08 sec)

mysql> insert into ibft1 (c1) values ('brown cow'),('brown dog'),('white cow'),('white dog'),('yellow horse'),('green iguana');
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from ibft1 where match (c1) against ('brown');
+----+-----------+
| id | c1        |
+----+-----------+
|  1 | brown cow |
|  2 | brown dog |
+----+-----------+
2 rows in set (0.00 sec)

mysql> select * from ibft1 where match (c1) against ('cow');
+----+-----------+
| id | c1        |
+----+-----------+
|  1 | brown cow |
|  3 | white cow |
+----+-----------+
2 rows in set (0.01 sec)

Great! I'm happy to see it works, through I was a bit surprised to see that the default minimum word length was less than the value of ft_min_word_len. Those legacy ft_% variables continue to apply only to MyISAM fulltext indexes, evidently. There's a new bundle of system variables to control InnoDB Fulltext:

mysql> show variables like '%ft%';
+---------------------------------+----------------+
| Variable_name                   | Value          |
+---------------------------------+----------------+
| ft_boolean_syntax               | + -><()~*:""&| |
| ft_max_word_len                 | 84             |
| ft_min_word_len                 | 4              |
| ft_query_expansion_limit        | 20             |
| ft_stopword_file                | (built-in)     |
| innodb_ft_aux_table             |                |
| innodb_ft_cache_size            | 32000000       |
| innodb_ft_enable_diag_print     | ON             |
| innodb_ft_enable_stopword       | ON             |
| innodb_ft_max_token_size        | 84             |
| innodb_ft_min_token_size        | 3              |
| innodb_ft_num_word_optimize     | 2000           |
| innodb_ft_server_stopword_table |                |
| innodb_ft_sort_pll_degree       | 2              |
| innodb_ft_user_stopword_table   |                |
+---------------------------------+----------------+
15 rows in set (0.01 sec)

You'll also notice that there are a number of new tables in information_schema related to InnoDB Fulltext:

mysql> show tables from information_schema like 'INNODB_FT%';
+-------------------------------------------+
| Tables_in_information_schema (INNODB_FT%) |
+-------------------------------------------+
| INNODB_FT_CONFIG                          |
| INNODB_FT_DELETED                         |
| INNODB_FT_DEFAULT_STOPWORD                |
| INNODB_FT_INDEX_CACHE                     |
| INNODB_FT_BEING_DELETED                   |
| INNODB_FT_INDEX_TABLE                     |
| INNODB_FT_INSERTED                        |
+-------------------------------------------+
7 rows in set (0.00 sec)

Curiously, most of those tables are completely empty!

mysql> select * from INNODB_FT_CONFIG;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INDEX_CACHE;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_BEING_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INDEX_TABLE;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INSERTED;
Empty set (0.00 sec)

Well, that's not so curious if you've read the manual, which explains that you must set the value of innodb_ft_aux_table to the database & table that you wish to examine. That seems a bit of a round-about way to get at information in information_schema, but alright:

mysql> set innodb_ft_aux_table='test/ibft1';
ERROR 1229 (HY000): Variable 'innodb_ft_aux_table' is a GLOBAL variable and should be set with SET GLOBAL

Ah, yes. This takes me back to the good old 5.0 days when so much of the new functionality of 5.0 relied on the SUPER privilege, which carries with it all sorts of things that don't really logically go along with the analysis of InnoDB Fulltext indexes.

Well, alright, let's set the thing and take a look at some of these tables.

mysql> set global innodb_ft_aux_table='test/ibft1';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.01 sec)

Well, there's some good stuff in there, no doubt. Before I got a sense of what these various things mean, I started wondering about how privileges affect what of this information can be viewed by various users. I created a new database (newdb) and a new table (ibft2):

mysql> create database newdb;
Query OK, 1 row affected (0.00 sec)

mysql> use newdb
Database changed
mysql> create table ibft2 like test.ibft1;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into ibft2 select * from test.ibft1;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> show create table ibft2\G
*************************** 1. row ***************************
       Table: ibft2
Create Table: CREATE TABLE `ibft2` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c1` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `c1` (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

mysql> select * from ibft2;
+----+--------------+
| id | c1           |
+----+--------------+
|  1 | brown cow    |
|  2 | brown dog    |
|  3 | white cow    |
|  4 | white dog    |
|  5 | yellow horse |
|  6 | green iguana |
+----+--------------+
6 rows in set (0.00 sec)

mysql> set global innodb_ft_aux_table='newdb/ibft2';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 1     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          |       |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Whoa, hold on. Why is the contents of information_schema.INNODB_FT_CONFIG different than when I had innodb_ft_aux_table='test/ibft1'? Let's confirm that...

mysql> set global innodb_ft_aux_table='test/ibft1';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Sure enough, look at total_word_count ... for newdb/ibft2 it was undefined, but for test/ibft1 it is 8. I wonder what INNODB_FT_INDEX_TABLE shows for the new table.

mysql> set global innodb_ft_aux_table='newdb/ibft2';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
Empty set (0.00 sec)

mysql> select * from newdb.ibft2 where match(c1) against ('dog');
+----+-----------+
| id | c1        |
+----+-----------+
|  2 | brown dog |
|  4 | white dog |
+----+-----------+
2 rows in set (0.00 sec)

So... INNODB_FT_INDEX_TABLE is empty(!) for the new table, but I can still make queries against this table? Hmm...

mysql> select * from information_schema.INNODB_FT_INDEX_CACHE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

Aha! There's the info I'm looking for; it's in the _CACHE table because ... well, just because it is, I guess. I don't know why it's there now and wasn't there after I created my first table. It turns out that you need to OPTIMIZE TABLE in order to rebuild the index and I guess merge the "new" index data with the "main search index".

mysql> optimize table ibft2;
+-------------+----------+----------+-------------------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                          |
+-------------+----------+----------+-------------------------------------------------------------------+
| newdb.ibft2 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| newdb.ibft2 | optimize | status   | OK                                                                |
+-------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.07 sec)

Oh, first you need to set a different global variable ... innodb_optimize_fulltext_only.

mysql> set global innodb_optimize_fulltext_only=1;
Query OK, 0 rows affected (0.01 sec)

mysql> optimize table ibft2;
+-------------+----------+----------+----------+
| Table       | Op       | Msg_type | Msg_text |
+-------------+----------+----------+----------+
| newdb.ibft2 | optimize | status   | OK       |
+-------------+----------+----------+----------+
1 row in set (0.05 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Alright, we're finally back to where we started! The thing I had planned to look at is how privileges interact with this information_schema metadata strategy. Basically, you need SUPER to be able to look at the data, which seems like an obvious potential problem in terms of unprivileged users being able to access information about InnoDB Fulltext indexes on their own tables. The more interesting thing, though, is that unprivileged users can learn quite a bit abount the contents of whatever table innodb_ft_aux_table happens to be set to.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
+--------------------+
2 rows in set (0.00 sec)

mysql> use newdb;
ERROR 1044 (42000): Access denied for user 'noprivs'@'localhost' to database 'newdb'
mysql> select * from newdb.ibft2;
ERROR 1142 (42000): SELECT command denied to user 'noprivs'@'localhost' for table 'ibft2'
mysql> select table_schema,table_name from information_schema.tables where table_name='ibft2';
Empty set (0.01 sec)
mysql> select @@innodb_ft_aux_table;
+-----------------------+
| @@innodb_ft_aux_table |
+-----------------------+
| newdb/ibft2           |
+-----------------------+
1 row in set (0.00 sec)

mysql> select * from information_schema.innodb_ft_config;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

mysql> select * from information_schema.innodb_ft_index_table;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

My user can't even see whether a particular table or database exists, but he can get all kinds of information about the contents of the table by looking at information_schema.INNODB_FT_INDEX_TABLE/CACHE when innodb_ft_aux_table is set. Not too big a problem, though, right, because any user who sets innodb_ft_aux_table can just unset it after they've gathered whatever metadata/statistics they need. Not so fast!

mysql> set global innodb_ft_aux_table='';
ERROR 1231 (42000): Variable 'innodb_ft_aux_table' can't be set to the value of ''

Yikes.

I am trying to be careful to remind myself that this is the first appearance of InnoDB Fulltext in a public release, but I hope these quirks are just side-effects of immaturity and not design decisions that are intended to be kept around for very long!

Syndicate content