Chapter 7. Writing and using a plugin to gather data from an Oracle database

Table of Contents

Introduction
Complexity
Technologies
Time required
Prerequisites
Ingredients
Activity Plan
Creating the test data
Building the plugin
Plugin inner workins
Configuring the plugin
Let searchbox use the plugin
Testing it all

Introduction

This cookbook will explain how you can write a protocol extended plugin to access data stored on a Oracle database and how you can configure the plugin to retrieve data.

Complexity

1 2 3 4 5

You will write a C++ searchbox extended plugin retrieving documents from an Oracle database, configure searchbox to use the plugin and use it to gather some test data.

Technologies

???

Time required

About an hour

Prerequisites

You should be familiar with SQL and with Oracle server operations.

Warning

Due to Oracle license terms you might need a commercial license for the Oracle server you are using with searchbox.

Ingredients

  • A working install of focuseek searchbox. This cookbook assumes that your installation is accessible at the SOAP endpoint http://admin:password@localhost:2200/soap

  • A working install of Oracle server

  • A working install of Oracle C client development libraries and headers[10]

  • A recent version of the Boost C++ library (http://www.boost.org/). You don't need the library itself, only the .hpp header files.

Activity Plan

Creating the test data

You will now create the sbtest database. TODO

Building the plugin

TODO

Plugin inner workins

The plugin handles urls having the oracle scheme, i.e. urls beginning with “oracle:”. Obviously no such scheme is officially supported by w3c, but the plugin uses it to encode the information required to access a “document” stored in an Oracle database. Except for the prefix (oracle instead of odbc) a oracle URL has the form described in “The odbc URL scheme” paragraph of the searchbox reference manual for odbc urls.

Basically two different kind of oracle urls exist: one referring to a specific document and the other referring to an index of the documents. Both these urls encode an SQL query but the first one retrieve all the query results, while the second one retrieves one row only.

You will pass an index url to searchbox as a seed and searchbox will fetch it through the plugin which will query the database and return an index containing a document url for each row of the result set. Then searchbox will fetch each one of these document urls, one by one.

Configuring the plugin

The plugin automatically turns the result set into an index but, in order to build a document searchbox will index you need to specify how the plugin should map the result set columns into an FFF structure. The rules are passed as a text file in the Rules argument to the plugin.

The rules accepted by the oracle plugin are the same accepted by the odbc plugin. For a full description see “Configuration values” paragraph the chpater describing the odbc plugin of the searchbox reference manual.

You will now configure the plugin to map the second column of the result set to the FFF normal text slice and the third one to build a metadata with key pubdate and belonging to the FFF title slice. The value for the configuration, also available in the file rules.txt in the examples directory of searchbox installation, is:

FFF
T	centralNorm	2
DM	title	pubdate	3

Please note that the fields are separated by tabs, not spaces.

To let searchbox know the configuration we use the addextendedplugin verb of the searchbox-cli command. Open a new command interpreter, change the current directory to plugins/db-oracle subdirectory of the searchbox examples library and run the following command[11]:

searchbox-cli addextendedplugin --endpoint=http://admin:password@localhost:2200/soap
	--name=oraclecook --description="Plugin from the oracle cookbook"
	--plugindll=oracleplugin Rules --value=rules.txt

For the sake of clarity the command above is divided on three lines but you must input it on a single line. The command will output this message:

Created new extended plugin configuration, id plugind

where plugind is the id of the plugin you just created. Write it down as you will need it later.

Let searchbox use the plugin

To let searchbox used the plugin you just built you will create a new source and tell searchbox to use the plugin to fetch suitable urls from the source[12].

Minimize the command shell but don't close it: you will need it later. Run searchbox control panel, create a new source called oracle cookbook. As the only seed add

oracle:sbtest/q=select%20*%20from%20test.table%20--!!!PKW!!!--/k=id

Figure 7.1. Adding the seed

Adding the seed

Click on Apply and accept the default filter that searchbox controlpanel will add. Now open the Info pane and take note of the Source ID (1 in the snapshot).

Figure 7.2. The source info pane

The source info pane

Now select the Authentication pane and set Authentication type to Basic.

Figure 7.3. Setting the authentication type

Setting the authentication type

Get back to the source Info pane nad create a new archive using the button on this pane. When asked reply you want to keep the last version of each document. Now select the “Gathering” pane of the archive you just created and push the Edit authentication... button. Then fill in the username and password fields with the values corresponding to a user who can perform an SQL SELECT from the table table we created above.

Figure 7.4. Setting the authentication data

Setting the authentication data

For the last step you need the command shell again; you will use the editsource subcommand of the searchbox-cli command. Un-minimize the shell you opened at the beginning of the cookbook and type:

searchbox-cli editsource
--endpoint=http://admin:password@localhost:2200/soap
--id=sourceid
--extendedpluginsbundle=Protocol pluginid 0

Again the command was split on multiple lines for the sake of clarity but you must type it all on a single line. Please note you have to replace pluginid with the plugin ID you wrote down when you configured the plugin with addextendedplugin plugin and sourceid with the source ID you wrote down from the Info pane of searchbox controlpanel.

This command tells searchbox that, when searchbox gathers documents from source sourceid and has to choose how it should fetch a document, it must first ask to plugin pluginid if it handles the document protocol (oracle in our example) and only if the plugin doesn't accept it searchbox should try with the “plugin” 0, i.e. searchbox own internal gathering protocols. This lets you add http etc. urls and have searchbox handle them as always and is particularly useful when setting the root bundles.

Testing it all

You are now ready to test your work. Choose the Info pane of the archive and press the Start button to start the gathering.

Figure 7.5. Starting the gathering

Starting the gathering

Wait some moments to let searchbox gather some documents (you may watch at the gathering process using the Logs... button). When you are satisfied go to the Query pane of you archive and try a query. As usual, start with an empty one (just press the Search button leaving the query field blank) and then try something more complex.



[10] Oracle C client libraries and headers are distributed along with the server if you installed Oracle from sources but are packaged separately if you installed an Oracle prebuilt binary. For more details see Oracle web site.

[11] Unless your PATH includes the searchbox binaries installation directory you must use the full path to the searchbox-cli executable to invoke it.

[12] searchbox can be configured to apply some plugins to all the sources; see the User Manual for more details.