MaxScale - Do we need another MySQL proxy?

I have spent some time thinking about and working on a project that went public on GitHub at the beginning of this year. That project is called MaxScale and is primarily a proxy server for MySQL/MariaDB databases, although it can be something much more than this. The obvious and often asked question is why do we need another proxy? I want to try to give you a flavor for what MaxScale is and why I think there is a need for a tool like MaxScale.

The architecture of MaxScale makes it different from your average proxy

  • MaxScale has awareness of the content it is shipping.
  • It is also aware of the configuration and state of the servers to which requests are proxied.
  • It provides plugin modules that can be used to implement the routing logic, supported protocols, authentication methods, monitoring and filters.
  • The implementation is a lightweight, event-driven core executed by multiple threads. Efficient multiplexing of many requests on a single thread makes the best use of the available threads.

Why content awareness helps

MaxScale is able to look inside the requests it is forwarding and use what it finds there to help decide where best to send each request. This means that it is possible to look at the SQL statements and determine the read or write scope of the statement, the database objects that are manipulated and the keys involved in the statement. Using all of this data MaxScale can then make a well-informed decision as to which database server is the best destination for the request.
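
As a rough illustration of the read/write part of that decision, the sketch below guesses the scope of a statement from its leading keyword. It is written in Python purely for illustration. The function name statement_scope and the keyword list are assumptions of this sketch and are not taken from MaxScale, which would need a real SQL parser to cope with comments, stored procedures, SELECT ... FOR UPDATE and similar cases.

```python
# Hypothetical helper, not part of MaxScale: guess the scope of a statement
# from its first keyword. A real proxy would use a full SQL parser.
READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def statement_scope(sql: str) -> str:
    """Return "read" or "write" for a single SQL statement."""
    words = sql.strip().split(None, 1)
    if not words:
        return "write"  # be conservative: anything unrecognised is a write
    keyword = words[0].upper()
    return "read" if keyword in READ_KEYWORDS else "write"


print(statement_scope("SELECT name FROM users WHERE id = 7"))
print(statement_scope("UPDATE users SET name = 'x' WHERE id = 7"))
```

Treating anything unrecognised as a write is a deliberately cautious choice: a misclassified read only costs some load on the master, whereas a misclassified write could be sent to a server that must not accept it.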

Configuration awareness

As well as content awareness MaxScale is also aware of the dynamic configuration of the database servers. This means that it can take into account the current state of each server and the role it is currently assigned within a cluster of servers. In a typical Master/Slave replication scenario this could mean knowing which server is currently the writeable master and which are slaves. It is possible to take this even further and have MaxScale aware of any replication lag in the system on a per server, per schema or per table basis. The result is that MaxScale has the information to be able to make the best use possible of the database servers that are behind the proxy. This can all be done transparently to the client applications that simply attach to MaxScale as if it were a single instance of MySQL or MariaDB.
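
To show how server role and replication lag might feed into a routing decision, here is a minimal Python sketch. The Server record, the choose_server function and the five second lag limit are all invented for this example; they are not MaxScale data structures or defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    """Hypothetical view of one backend, as a monitor might report it."""
    name: str
    role: str           # "master" or "slave"
    running: bool
    lag_seconds: float  # replication lag; 0 for the master


def choose_server(servers: List[Server], scope: str,
                  max_lag: float = 5.0) -> Optional[Server]:
    """Pick a destination for a statement with the given scope."""
    alive = [s for s in servers if s.running]
    masters = [s for s in alive if s.role == "master"]
    if scope == "write":
        return masters[0] if masters else None
    # Reads prefer the least-lagged slave within the lag limit and
    # fall back to the master when no slave qualifies.
    slaves = [s for s in alive
              if s.role == "slave" and s.lag_seconds <= max_lag]
    if slaves:
        return min(slaves, key=lambda s: s.lag_seconds)
    return masters[0] if masters else None


cluster = [
    Server("db1", "master", True, 0.0),
    Server("db2", "slave", True, 1.5),
    Server("db3", "slave", True, 30.0),
]
print(choose_server(cluster, "read").name)
print(choose_server(cluster, "write").name)
```

The client never sees any of this. It sends every statement to the proxy, and the choice of backend is made on its behalf.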

Plugin Architecture

Rather than have MaxScale attempt to be all things to all men, it has been designed around plugin modules that allow the functionality to be tailored to individual implementation needs. This same architecture makes it simple for third party developers to add to the available functionality. The plugin API has deliberately been kept simple to ease this task.

Plugin interfaces exist to allow new protocols to be added. Currently MaxScale supports the MySQL protocol for both client connections and connections to the underlying databases. It is possible to add a new protocol for client connections or for connection to a database that does not support the MySQL protocol. In addition to the protocol plugin modules there are also query routing plugins. These query routers are the modules that decide which backend database server a particular request is routed to. They may route on either a per-connection or a per-statement basis. The routing decision they make depends on the statement being routed, the data collected by the MaxScale monitors about the state of the underlying databases, and the policy implemented by the router itself.
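
The difference between connection-based and statement-based routing can be sketched as follows. This is an illustration in Python only. The Router interface and both classes are hypothetical and do not reflect MaxScale's actual plugin API.

```python
from abc import ABC, abstractmethod
from itertools import cycle
from typing import Dict, List


class Router(ABC):
    """Hypothetical router interface, invented for this sketch."""

    @abstractmethod
    def route(self, session_id: int, sql: str) -> str:
        """Return the name of the backend that should receive `sql`."""


class ConnectionRouter(Router):
    """Chooses a backend once per client session and then sticks to it."""

    def __init__(self, backends: List[str]) -> None:
        self._next = cycle(backends)
        self._assigned: Dict[int, str] = {}

    def route(self, session_id: int, sql: str) -> str:
        if session_id not in self._assigned:
            self._assigned[session_id] = next(self._next)
        return self._assigned[session_id]


class StatementRouter(Router):
    """Chooses a backend for every statement: SELECTs are spread over
    the slaves and everything else goes to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self._master = master
        self._slaves = cycle(slaves) if slaves else None

    def route(self, session_id: int, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self._master
```

A connection router never needs to look at the statement at all, which keeps it cheap. A statement router pays for inspecting each request but can spread a single client's work across several servers.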

Monitors are the third plugin module type supported in MaxScale. These are responsible for gathering data from the underlying databases and storing it within MaxScale. This data is used as input to the various query routing modules in order to influence the routing decision of those modules based on the database state. Currently two monitors are available, one targeted to Master/Slave replication setups and the other to Galera clusters.
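
A monitor can be pictured as a loop that probes each server and records what it finds for the routers to read. The Python sketch below makes that concrete under stated assumptions: the Monitor class, the probe callables and the shape of the state dictionary are all made up for illustration, and a real monitor would query the databases themselves.

```python
from typing import Callable, Dict

# A probe is any callable returning the observed state of one server,
# for example {"running": True, "role": "slave"}. The probes used here
# are stand-ins for real database queries.
Probe = Callable[[], Dict[str, object]]


class Monitor:
    """Hypothetical monitor: polls each server and stores what it finds."""

    def __init__(self, probes: Dict[str, Probe]) -> None:
        self._probes = probes
        self.state: Dict[str, Dict[str, object]] = {}

    def poll_once(self) -> None:
        for name, probe in self._probes.items():
            try:
                self.state[name] = probe()
            except Exception:
                # A failed probe is recorded as a server that is down.
                self.state[name] = {"running": False, "role": "unknown"}


def unreachable() -> Dict[str, object]:
    raise ConnectionError("no route to host")


monitor = Monitor({
    "db1": lambda: {"running": True, "role": "master"},
    "db2": unreachable,
})
monitor.poll_once()
print(monitor.state)
```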

Two more plugin module types are planned but not yet implemented in MaxScale: authentication plugins and filters. Authentication plugins allow the use of non-native methods of authentication and are planned to follow the same model as the authentication plugins in MariaDB.

Filter plugins are designed to allow arbitrary operations and transformations on the actual SQL requests and the result sets returned. Filters will be able to be connected together to form a chain, with branching options, so that a single request can traverse a complex set of filtering modules and even be split and sent to multiple underlying or external systems. The filters will be able to attach hints to the request data that can be used in downstream filters or the final routing process for the request. A filter may also reject the request, causing a failure notification to be sent back to the client, thus providing firewall capabilities for your underlying database. Filters may also modify the request itself or simply pass the request on unaltered, logging information from the request. As well as filtering the requests, it is also planned to allow filters to be used on the result sets returned from these requests.
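
Since filters are not yet implemented, the following Python sketch is only one way the idea could look. It covers a simple linear chain with rejection and hints, and leaves out branching and result set filtering. The Request type, the two example filters and run_chain are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    sql: str
    hints: Dict[str, str] = field(default_factory=dict)


# A filter returns the (possibly modified) request, or None to reject it.
Filter = Callable[[Request], Optional[Request]]


def firewall(request: Request) -> Optional[Request]:
    """Reject statements that drop tables."""
    if request.sql.strip().upper().startswith("DROP TABLE"):
        return None
    return request


def hint_reads(request: Request) -> Optional[Request]:
    """Attach a routing hint for later filters or the router to use."""
    if request.sql.strip().upper().startswith("SELECT"):
        request.hints["route_to"] = "slave"
    return request


def run_chain(filters: List[Filter], request: Request) -> Optional[Request]:
    """Pass the request through each filter in order, stopping on rejection."""
    current: Optional[Request] = request
    for apply_filter in filters:
        current = apply_filter(current)
        if current is None:
            return None  # the proxy would now report a failure to the client
    return current


chain = [firewall, hint_reads]
print(run_chain(chain, Request("SELECT * FROM orders")))
print(run_chain(chain, Request("DROP TABLE orders")))
```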

Event driven, lightweight and multi-threaded

It was desirable that MaxScale should have as low a footprint as possible. To that end it uses an efficient polling core, based on the Linux epoll mechanism. All I/O operations are non-blocking, and the event-driven model allows a thread to be shared easily between multiple requests. Threads are occupied only for the time it takes to do the internal processing and not for the duration of any database activities.
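
The general shape of such a polling loop can be shown with Python's standard selectors module, whose default selector is backed by epoll on Linux. This is a sketch of the technique and not MaxScale's code: the on_readable and poll_once names are invented, local socket pairs stand in for client connections, and upper-casing the data stands in for the proxy's internal processing.

```python
import selectors
import socket

selector = selectors.DefaultSelector()  # epoll-backed on Linux


def on_readable(conn: socket.socket) -> bytes:
    """Handle one readiness event; the socket is known to have data."""
    data = conn.recv(4096)
    return data.upper()  # stand-in for the proxy's internal processing


def poll_once(timeout: float = 1.0) -> list:
    """Wait for events and dispatch each one to its registered callback."""
    results = []
    for key, _events in selector.select(timeout):
        callback = key.data
        results.append(callback(key.fileobj))
    return results


# Two connected socket pairs stand in for two client connections,
# both served by the single thread running this script.
client_a, proxy_a = socket.socketpair()
client_b, proxy_b = socket.socketpair()
for sock in (proxy_a, proxy_b):
    sock.setblocking(False)
    selector.register(sock, selectors.EVENT_READ, on_readable)

client_a.sendall(b"select 1")
client_b.sendall(b"select 2")
print(sorted(poll_once()))
```

The thread only does work when a socket is actually ready, so time spent waiting on a slow client or database does not tie it up.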

Another important design decision is the internal buffering strategy. It has been designed to keep data copies to a minimum, with the ultimate goal of zero data copies whenever the required routing and filtering operations can be achieved without the need to examine the request or the response. In practice zero data copies is difficult to achieve for the request packets, but it can be achieved for the result sets in the majority of cases.
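
The idea of handing data on without copying it can be illustrated in Python with memoryview, which gives slices that share storage with the original buffer. This is an analogy for the technique only and says nothing about how MaxScale's buffers are implemented. The split_packet helper and the 4-byte header length are assumptions of the sketch.

```python
from typing import Tuple


def split_packet(buffer: bytearray,
                 header_len: int) -> Tuple[memoryview, memoryview]:
    """Return header and payload views that share storage with `buffer`."""
    view = memoryview(buffer)
    return view[:header_len], view[header_len:]


packet = bytearray(b"\x05\x00\x00\x00" + b"hello")
header, payload = split_packet(packet, 4)

# The views alias the original buffer: no bytes were copied, so a change
# made to the buffer is visible through the payload view.
packet[4] = ord("J")
print(bytes(payload))  # prints b'Jello'
```

A router that only needs the header can inspect that view and pass the payload view on untouched, which is the situation in which copies can be avoided.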

Decide for yourself

Hopefully I have given you a flavor for why I think there is a place for a proxy like MaxScale. There is a long way to go yet. The implementation that is available via GitHub has some limitations, and some features have not yet been started, but it does illustrate the concept. Please feel free to take a look at MaxScale and give the developers your feedback via the MaxScale Google Group.

Source Code: https://github.com/SkySQL/MaxScale
GoogleGroup: https://groups.google.com/forum/#!forum/maxscale

This blog was originally published on 27.1.2014 on Mark's MaxScale blog.

About the Author

Mark Riddoch works as Chief Architect at SkySQL. Mark is developing Open Source Cloud products using MariaDB as a core to create high availability database solutions in the cloud and on premise, complete with management and administration interfaces.