About indexed content
Indexed content is XML data, stored with a document, that TEXTML Server indexes in place of the document's content.
A document can contain many kinds of data, including content, custom properties, and indexed content. Indexed content is optional: your application may or may not store indexed content for some or all documents in a docbase.
If indexed content is present in a document, the indexed content can be retrieved whenever the document is retrieved from its docbase.
Whenever a docbase is indexed:
- If indexed content is present in a document, the indexed content will be indexed (in place of the content).
- If not, the content will be indexed (provided that the content is in XML).
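The selection rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not TEXTML Server code: the function names `is_xml` and `select_indexable` are hypothetical, and the well-formedness check through the standard `xml.etree.ElementTree` parser is an assumption about what "in XML" means here.

```python
from typing import Optional
import xml.etree.ElementTree as ET


def is_xml(data: bytes) -> bool:
    """Return True if data parses as well-formed XML (assumed test for 'is in XML')."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def select_indexable(content: bytes, indexed_content: Optional[bytes]) -> Optional[bytes]:
    """Hypothetical helper: pick the data that would be indexed for one document."""
    # Indexed content, when present, is used in place of the content.
    if indexed_content is not None:
        return indexed_content
    # Otherwise the content itself is used, but only if it is XML.
    if is_xml(content):
        return content
    # Non-XML content with no indexed content: nothing to index.
    return None
```

For example, `select_indexable(b"%PDF-1.4", None)` returns `None`, while supplying indexed content for the same PDF returns that indexed content.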
Indexed content must be coded in valid XML. Accordingly, indexed content can always be indexed.
A document’s content, by contrast, can be in XML, or it can be in PDF, JPEG, MP3, or any other binary or text format. TEXTML Server can index the content only if it is in XML.
Indexed content is application-specific. TEXTML Server does not provide a DTD for indexed content: the structure of indexed content is the responsibility of the application programmer.
How indexed content can be used
Indexed content is often used when the content of a document is not in XML.
Suppose that the content is a PDF file. The application program can read the PDF file, extract from it the data to be indexed, and format the extracted data as XML. The program then:
- Sets the PDF as the document’s content.
- Sets the XML as the document’s indexed content.
As a result, whenever TEXTML Server indexes the docbase, index entries for the document will be based on the indexed content.
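The PDF scenario can be sketched as follows. This is a minimal illustration under stated assumptions, not the TEXTML Server API: the `Document` class is a hypothetical in-memory stand-in for a docbase document, the element names (`indexed`, `title`, `author`) are an invented application-specific structure, and the PDF extraction step is assumed to have already produced a dictionary of fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional
import xml.etree.ElementTree as ET


@dataclass
class Document:
    """Hypothetical stand-in for a docbase document (not a TEXTML Server class)."""
    name: str
    content: bytes = b""
    indexed_content: Optional[bytes] = None


def build_indexed_content(fields: Dict[str, str]) -> bytes:
    """Format extracted fields as XML; keys are assumed to be valid element names."""
    root = ET.Element("indexed")  # application-defined root element
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def prepare_pdf_document(name: str, pdf_bytes: bytes, extracted: Dict[str, str]) -> Document:
    """Pair the original PDF with the XML that should be indexed for it."""
    doc = Document(name=name)
    doc.content = pdf_bytes                                 # the PDF is the content
    doc.indexed_content = build_indexed_content(extracted)  # the XML is the indexed content
    return doc


# Fields assumed to come from a separate PDF extraction step.
doc = prepare_pdf_document(
    "report.pdf",
    b"%PDF-1.4 ...",
    {"title": "Annual Report", "author": "J. Smith"},
)
```

Because the XML is built with an XML library rather than by string concatenation, special characters in the extracted text are escaped and the indexed content stays well-formed.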