Chapter 6. Writing and using a plugin to gather data from a MySQL database

Table of Contents

Introduction
Complexity
Technologies
Time required
Prerequisites
Ingredients
Activity Plan
Creating the test data
Building the plugin
Plugin inner workins
Configuring the plugin
Let searchbox use the plugin
Testing it all

Introduction

This cookbook will explain how you can write a protocol extended plugin to access data stored on a MySQL database and how you can configure the plugin to retrieve data.

Complexity

1 2 3 4 5

You will write a C++ searchbox extended plugin retrieving documents from a MySQL database, configure searchbox to use the plugin and use it to gather some test data.

Technologies

???

Time required

About an hour

Prerequisites

You should be familiar with SQL and with MySQL server operations.

Warning

Due to MySQL license terms you might need a commercial license for the MySQL server you are using with searchbox.

Ingredients

  • A working install of focuseek searchbox. This cookbook assumes that your installation is accessible at the SOAP endpoint http://admin:password@localhost:2200/soap

  • A working install of MySQL server

  • A working install of MySQL C client development libraries and headers[7]

  • A recent version of the Boost C++ library (http://www.boost.org/). You don't need the library itself, only the .hpp header files.

Activity Plan

Creating the test data

You will now create the sbtest database. TODO

Building the plugin

TODO

Plugin inner workins

The plugin handles urls having the mysql scheme, i.e. urls beginning with “mysql:”. Obviously no such scheme is officially supported by w3c, but the plugin uses it to encode the information required to access a “document” stored in a MySQL database. Except for the prefix (mysql instead of odbc) a mysql URL has the form described in “The odbc URL scheme” paragraph of the searchbox reference manual for odbc urls.

For MySQL urls the connection string part must be in the form: host,db,psock,flags where:

host

is the hostname of the server the database resides on.

db

is the name of the database searchbox should connect to, use localhost for a local connection.

psock

is the port number (for remote connections) or socket name (for local connections) the database listens to. If it is empty MySQL default values are used.

flags

encodes MySQL client flags, in hexadecimal. If it is empty MySQL default values are used.

Basically two different kind of mysql urls exist: one referring to a specific document and the other referring to an index of the documents. Both these urls encode an SQL query but the first one retrieve all the query results, while the second one retrieves one row only.

You will pass an index url to searchbox as a seed and searchbox will fetch it through the plugin which will query the database and return an index containing a document url for each row of the result set. Then searchbox will fetch each one of these document urls, one by one.

Configuring the plugin

The plugin automatically turns the result set into an index but, in order to build a document searchbox will index you need to specify how the plugin should map the result set columns into an FFF structure. The rules are passed as a text file in the Rules argument to the plugin.

The rules accepted by the mysql plugin are the same accepted by the odbc plugin. For a full description see “Configuration values” paragraph the chpater describing the odbc plugin of the searchbox reference manual.

You will now configure the plugin to map the second column of the result set to the FFF normal text slice and the third one to build a metadata with key pubdate and belonging to the FFF title slice. The value for the configuration, also available in the file rules.txt in the examples directory of searchbox installation, is:

FFF
T	centralNorm	2
DM	title	pubdate	3

Please note that the fields are separated by tabs, not spaces.

To let searchbox know the configuration we use the addextendedplugin verb of the searchbox-cli command. Open a new command interpreter, change the current directory to plugins/db-mysql subdirectory of the searchbox examples library and run the following command[8]:

searchbox-cli addextendedplugin --endpoint=http://admin:password@localhost:2200/soap
	--name=mysqlcook --description="Plugin from the mysql cookbook"
	--plugindll=mysqlplugin Rules --value=rules.txt

For the sake of clarity the command above is divided on three lines but you must input it on a single line. The command will output this message:

Created new extended plugin configuration, id plugind

where plugind is the id of the plugin you just created. Write it down as you will need it later.

Let searchbox use the plugin

To let searchbox used the plugin you just built you will create a new source and tell searchbox to use the plugin to fetch suitable urls from the source[9].

Minimize the command shell but don't close it: you will need it later. Run searchbox control panel, create a new source called mysql cookbook. As the only seed add

mysql:localhost,sbtest,,/q=select%20*%20from%20test.table%20--!!!PKW!!!--/k=id

Figure 6.1. Adding the seed

Adding the seed

Click on Apply and accept the default filter that searchbox controlpanel will add. Now open the Info pane and take note of the Source ID (1 in the snapshot).

Figure 6.2. The source info pane

The source info pane

Now select the Authentication pane and set Authentication type to Basic.

Figure 6.3. Setting the authentication type

Setting the authentication type

Get back to the source Info pane nad create a new archive using the button on this pane. When asked reply you want to keep the last version of each document. Now select the “Gathering” pane of the archive you just created and push the Edit authentication... button. Then fill in the username and password fields with the values corresponding to a user who can perform an SQL SELECT from the table table we created above.

Figure 6.4. Setting the authentication data

Setting the authentication data

For the last step you need the command shell again; you will use the editsource subcommand of the searchbox-cli command. Un-minimize the shell you opened at the beginning of the cookbook and type:

searchbox-cli editsource
--endpoint=http://admin:password@localhost:2200/soap
--id=sourceid
--extendedpluginsbundle=Protocol pluginid 0

Again the command was split on multiple lines for the sake of clarity but you must type it all on a single line. Please note you have to replace pluginid with the plugin ID you wrote down when you configured the plugin with addextendedplugin plugin and sourceid with the source ID you wrote down from the Info pane of searchbox controlpanel.

This command tells searchbox that, when searchbox gathers documents from source sourceid and has to choose how it should fetch a document, it must first ask to plugin pluginid if it handles the document protocol (mysql in our example) and only if the plugin doesn't accept it searchbox should try with the “plugin” 0, i.e. searchbox own internal gathering protocols. This lets you add http etc. urls and have searchbox handle them as always and is particularly useful when setting the root bundles.

Testing it all

You are now ready to test your work. Choose the Info pane of the archive and press the Start button to start the gathering.

Figure 6.5. Starting the gathering

Starting the gathering

Wait some moments to let searchbox gather some documents (you may watch at the gathering process using the Logs... button). When you are satisfied go to the Query pane of you archive and try a query. As usual, start with an empty one (just press the Search button leaving the query field blank) and then try something more complex.



[7] MySQL C client libraries and headers are distributed along with the server if you installed MySQL from sources but are packaged separately if you installed a MySQL prebuilt binary. For more details see MySQL web site.

[8] Unless your PATH includes the searchbox binaries installation directory you must use the full path to the searchbox-cli executable to invoke it.

[9] searchbox can be configured to apply some plugins to all the sources; see the User Manual for more details.