Table of Contents
This cookbook will explain how you can write a protocol extended plugin to access data stored on a MySQL database and how you can configure the plugin to retrieve data.
1 2 3 4 5
You will write a C++ searchbox extended plugin retrieving documents from a MySQL database, configure searchbox to use the plugin and use it to gather some test data.
A working install of focuseek searchbox. This cookbook assumes
that your installation is accessible at the SOAP endpoint
http://admin:password@localhost:2200/soap
A working install of MySQL server
A working install of MySQL C client development libraries and headers[7]
A recent version of the Boost C++ library (http://www.boost.org/). You don't need the library itself, only the .hpp header files.
The plugin handles urls having the mysql
scheme, i.e. urls beginning with “mysql:”.
Obviously no such scheme is officially supported by w3c, but the plugin
uses it to encode the information required to access a “document” stored
in a MySQL database. Except for the prefix (mysql
instead of odbc) a mysql URL
has the form described in “The odbc URL scheme”
paragraph of the searchbox reference manual for
odbc urls.
For MySQL urls the connection string part must be in the form:
host,db,psock,flags
where:
hostis the hostname of the server the database resides on.
dbis the name of the database searchbox should connect to, use
localhost for a local connection.
psockis the port number (for remote connections) or socket name (for local connections) the database listens to. If it is empty MySQL default values are used.
flagsencodes MySQL client flags, in hexadecimal. If it is empty MySQL default values are used.
Basically two different kind of mysql urls
exist: one referring to a specific document and the other referring to
an index of the documents. Both these urls encode an SQL query but the
first one retrieve all the query results, while the second one retrieves
one row only.
You will pass an index url to searchbox as a seed and searchbox will fetch it through the plugin which will query the database and return an index containing a document url for each row of the result set. Then searchbox will fetch each one of these document urls, one by one.
The plugin automatically turns the result set into an index but,
in order to build a document searchbox will index you need to specify
how the plugin should map the result set columns into an FFF
structure. The rules are passed as a text file in the
Rules argument to the plugin.
The rules accepted by the mysql plugin are the same accepted by the odbc plugin. For a full description see “Configuration values” paragraph the chpater describing the odbc plugin of the searchbox reference manual.
You will now configure the plugin to map the second column of the
result set to the FFF normal text slice and the third one to build a
metadata with key pubdate and belonging to the FFF
title slice. The value for the configuration, also
available in the file rules.txt in the examples
directory of searchbox installation, is:
FFF T centralNorm 2 DM title pubdate 3
Please note that the fields are separated by tabs, not spaces.
To let searchbox know the configuration we use the
addextendedplugin verb of the
searchbox-cli command. Open a new command
interpreter, change the current directory to
plugins/db-mysql subdirectory of the searchbox
examples library and run the following command[8]:
searchbox-cli addextendedplugin --endpoint=http://admin:password@localhost:2200/soap
--name=mysqlcook --description="Plugin from the mysql cookbook"
--plugindll=mysqlplugin Rules --value=rules.txt
For the sake of clarity the command above is divided on three lines but you must input it on a single line. The command will output this message:
Created new extended plugin configuration, idplugind
where plugind is the id of the plugin
you just created. Write it down as you will need it later.
To let searchbox used the plugin you just built you will create a new source and tell searchbox to use the plugin to fetch suitable urls from the source[9].
Minimize the command shell but don't close it: you will need it
later. Run searchbox control panel, create a new
source called mysql cookbook. As the only seed
add
mysql:localhost,sbtest,,/q=select%20*%20from%20test.table%20--!!!PKW!!!--/k=id
Click on Apply and accept the default
filter that searchbox controlpanel will add. Now open the Info pane and
take note of the Source ID (1 in the
snapshot).
Now select the Authentication pane and set Authentication type to
Basic.
Get back to the source Info pane nad create a new archive using
the button on this pane. When asked reply you want to keep the last
version of each document. Now select the “Gathering” pane of the archive
you just created and push the Edit
authentication... button. Then fill in the username and
password fields with the values corresponding to a user who can perform
an SQL SELECT from the table table
we created above.
For the last step you need the command shell again; you will use the editsource subcommand of the searchbox-cli command. Un-minimize the shell you opened at the beginning of the cookbook and type:
searchbox-cli editsource --endpoint=http://admin:password@localhost:2200/soap --id=sourceid--extendedpluginsbundle=Protocolpluginid0
Again the command was split on multiple lines for the sake of
clarity but you must type it all on a single line. Please note you have
to replace pluginid with the plugin ID you
wrote down when you configured the plugin with
addextendedplugin plugin and
sourceid with the source ID you wrote down
from the Info pane of searchbox controlpanel.
This command tells searchbox that, when searchbox gathers
documents from source sourceid and has to
choose how it should fetch a document, it must first ask to plugin
pluginid if it handles the document protocol
(mysql in our example) and only if the plugin doesn't
accept it searchbox should try with the “plugin”
0, i.e. searchbox own internal gathering
protocols. This lets you add http etc. urls and have searchbox handle
them as always and is particularly useful when setting the root
bundles.
You are now ready to test your work. Choose the Info pane of the
archive and press the Start button to start the
gathering.
Wait some moments to let searchbox gather some documents (you may
watch at the gathering process using the Logs...
button). When you are satisfied go to the Query pane of you archive and
try a query. As usual, start with an empty one (just press the
Search button leaving the query field blank) and
then try something more complex.
[7] MySQL C client libraries and headers are distributed along with the server if you installed MySQL from sources but are packaged separately if you installed a MySQL prebuilt binary. For more details see MySQL web site.
[8] Unless your PATH includes the searchbox
binaries installation directory you must use the full path to the
searchbox-cli executable to invoke it.
[9] searchbox can be configured to apply some plugins to all the sources; see the User Manual for more details.