Searching within a subset of a document base

Improving search performance by running queries against previously obtained results.

Most of the sample queries provided with TEXTML Server are run against an entire docbase (see Searching for documents in a docbase).

This topic explains how to construct queries that run against the results of one or more previously run queries. Searching only within a subset of the documents in a docbase can improve performance. It can also improve usability: your application can let users search only within results that they have already retrieved.

We'll look at three sample queries. The first query is SampleDateQuery.xml:

<?xml version="1.0" encoding="UTF-16"?>
<!-- SampleDateQuery.xml -->
<query VERSION="4.5" RESULTSPACE="Date Search Results">
    <key NAME="Date">
        <interval>
            <start INCLUSIVE="True">
                <date>
                    <year>2000</year>
                    <month>7</month>
                    <day>3</day>
                </date>
            </start>
            <end INCLUSIVE="True">
                <date>
                    <year>2000</year>
                    <month>10</month>
                    <day>1</day>
                </date>
            </end>
        </interval>
    </key>
</query>

This query can be run against any docbase that has a Date index. It searches for documents within a specified date range (here, July 3, 2000 through October 1, 2000, inclusive) and temporarily stores the results in a ResultSpace named "Date Search Results".

The second query is SampleMulticriteriaQuery.xml:

<?xml version="1.0" encoding="UTF-16"?>
<!-- SampleMulticriteriaQuery.xml -->
<query VERSION="4.5" RESULTSPACE="Publication and Size Results">
    <andkey>
        <key NAME="Publication">
            <elem>Canada NewsWire</elem>
        </key>
        <property NAME="Size">
            <interval>
                <start INCLUSIVE="True">
                    <number>10000</number>
                </start>
                <end INCLUSIVE="True">
                    <number>19999</number>
                </end>
            </interval>
        </property>
    </andkey>
</query>

This query searches for documents that meet two criteria: the publication must be Canada NewsWire, and the size of the document must be within the specified range (10000 to 19999, inclusive). It temporarily stores the results in a ResultSpace named "Publication and Size Results".

The third query is SampleCombinedResultsQuery.xml:

<?xml version="1.0" encoding="UTF-16"?>
<!-- SampleCombinedResultsQuery.xml -->
<query VERSION="4.5" RESULTSPACE="Combined Results">
    <andkey>
        <key NAME="FullText">
            <frq VALUE="2">
                <elem>business<anystr/></elem>
            </frq>
        </key>
        <include TYPE="ResultSpace">Date Search Results</include>
        <include TYPE="ResultSpace">Publication and Size Results</include>
    </andkey>
</query>

Because the third query uses <andkey>, it will only retrieve documents that meet all of the following criteria:

  • The document must contain at least two occurrences of any word that begins with "business".
  • The document must be in a ResultSpace named "Date Search Results".
  • The document must be in a ResultSpace named "Publication and Size Results".

The effect is that:

  • The third query combines the results of the two previous queries by ANDing together their ResultSpaces. This combined result is (normally) a subset of the docbase.
  • The third query then searches in the FullText index of the combined result.

Note: The two ResultSpaces will be available to the third query if the search program successfully runs the first two queries (and retains their ResultSpaces) before it runs the third query.
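
The combining logic described above can be pictured as ordinary set operations. The following is a minimal conceptual sketch in Java, not a use of the TEXTML Server API: the class name ResultSpaceModel, the document IDs, and the occurrence counts are all made up for illustration. Each ResultSpace is modeled as a set of document IDs, ANDing is modeled as set intersection, and the full-text criterion is then applied only to the documents that remain.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual model only: this does not call the TEXTML Server API.
// Document IDs and occurrence counts below are made-up illustration data.
class ResultSpaceModel {

    // AND two ResultSpaces together: keep only the documents present in both.
    static Set<String> and(Set<String> first, Set<String> second) {
        Set<String> combined = new LinkedHashSet<>(first);
        combined.retainAll(second);
        return combined;
    }

    // Apply the full-text criterion only to the documents in the given subset.
    static Set<String> withMinOccurrences(Set<String> subset,
                                          Map<String, Integer> occurrences,
                                          int minimum) {
        Set<String> hits = new LinkedHashSet<>();
        for (String doc : subset) {
            if (occurrences.getOrDefault(doc, 0) >= minimum) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-ins for "Date Search Results" and "Publication and Size Results".
        Set<String> dateResults =
                new LinkedHashSet<>(List.of("doc1", "doc2", "doc3"));
        Set<String> publicationAndSizeResults =
                new LinkedHashSet<>(List.of("doc2", "doc3", "doc4"));
        // Hypothetical counts of words beginning with "business" per document.
        Map<String, Integer> businessOccurrences =
                Map.of("doc1", 5, "doc2", 1, "doc3", 2, "doc4", 7);

        Set<String> subset = and(dateResults, publicationAndSizeResults);
        Set<String> combined = withMinOccurrences(subset, businessOccurrences, 2);

        System.out.println(subset);   // prints [doc2, doc3]
        System.out.println(combined); // prints [doc3]
    }
}
```

In this model, doc1 and doc4 are never examined for the full-text criterion even though they contain the word often, because they are not in both ResultSpaces. That is the sense in which the third query searches only a subset of the docbase.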

Sample queries and sample program

The three queries are located in your Program Files directory for TEXTML Server: [...]\IxiaSoft\TextmlServer[version]\SDK\Queries\*.xml.

The sample program for running the three queries is located here: [...]\IxiaSoft\TextmlServer[version]\SDK\java\com\ixiasoft\samples\MultiQuerySearch.java.

Run the sample program with the following parameters (among others):

QUERYFILE1=[path]SampleDateQuery.xml
QUERYFILE2=[path]SampleMulticriteriaQuery.xml
QUERYFILE3=[path]SampleCombinedResultsQuery.xml
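
Because the third query depends on ResultSpaces that the first two queries create, it can be useful to check the query files before running them. The sketch below is a hypothetical pre-flight helper, not part of MultiQuerySearch.java or the TEXTML Server SDK. It uses only the standard Java XML parser to compare the RESULTSPACE attribute of the earlier queries with the <include TYPE="ResultSpace"> elements of a later one. The class name QueryOrderCheck and the abbreviated inline queries are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of MultiQuerySearch.java or the TEXTML Server SDK.
class QueryOrderCheck {

    // Parse query XML held in a string with the standard JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse query XML", e);
        }
    }

    // Name of the ResultSpace in which the query stores its results.
    static String producedResultSpace(String queryXml) {
        return parse(queryXml).getDocumentElement().getAttribute("RESULTSPACE");
    }

    // Names of the ResultSpaces the query reads via <include TYPE="ResultSpace">.
    static List<String> requiredResultSpaces(String queryXml) {
        List<String> names = new ArrayList<>();
        NodeList includes = parse(queryXml).getElementsByTagName("include");
        for (int i = 0; i < includes.getLength(); i++) {
            Element include = (Element) includes.item(i);
            if ("ResultSpace".equals(include.getAttribute("TYPE"))) {
                names.add(include.getTextContent().trim());
            }
        }
        return names;
    }

    // Required ResultSpaces that none of the earlier queries produces.
    static List<String> missing(List<String> earlierQueries, String laterQuery) {
        List<String> produced = new ArrayList<>();
        for (String query : earlierQueries) {
            produced.add(producedResultSpace(query));
        }
        List<String> missing = new ArrayList<>(requiredResultSpaces(laterQuery));
        missing.removeAll(produced);
        return missing;
    }

    public static void main(String[] args) {
        // Abbreviated stand-ins for the three sample query files (criteria omitted).
        String dateQuery = "<query VERSION='4.5' RESULTSPACE='Date Search Results'/>";
        String multiQuery = "<query VERSION='4.5' RESULTSPACE='Publication and Size Results'/>";
        String combinedQuery =
                "<query VERSION='4.5' RESULTSPACE='Combined Results'><andkey>"
                + "<include TYPE='ResultSpace'>Date Search Results</include>"
                + "<include TYPE='ResultSpace'>Publication and Size Results</include>"
                + "</andkey></query>";

        System.out.println(missing(List.of(dateQuery, multiQuery), combinedQuery)); // prints []
        System.out.println(missing(List.of(dateQuery), combinedQuery));
        // prints [Publication and Size Results]
    }
}
```

An empty list means every ResultSpace that the later query includes is produced by an earlier query in the list. The check says nothing about whether the server still retains those ResultSpaces at run time; that remains the responsibility of the search program.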