SOLR Search API
From Transkribus Wiki
Revision as of 07:29, 4 June 2020 by Philip Kahle (Talk | contribs)
This search is only possible if the HTR output has been post-processed (typically by UPVLC; contact info@readcoop.eu with questions).
Keywords can be searched in the SOLR index via a GET request to
https://transkribus.eu/TrpServer/rest/keyword
with the following parameters:
- query: string - the keyword to be searched
- start: int (default: 0) - zero-based index of the first result to return
- rows: int (default: 10) - number of successive results to fetch
  - To handle large numbers of hits, SOLR allows a search to start at a specific hit and return only the next N hits from there onward. This can be used to browse results page by page (e.g. the first page starts at 0 and shows 10 results, the next page starts at 10 and shows the next 10, etc.).
- probL: float - lower limit for keyword probability (usually between 0.0 and 1.0)
- probH: float - upper limit for keyword probability (usually 1.0)
  - Each keyword is stored with a probability value. Searches can be limited to results above or below a certain probability. (Note: Currently, the keyword probabilities are stored directly as provided. To transform these probabilities into true relevance probabilities, a calibration function is required in the user interface.)
- filter: string - restricts search results to given field values (can be passed multiple times, as in ...&filter=cId:1895&filter=id:4243_221_*...)
  - The fields to filter by are:
    - id: (string) index element id, consisting of the document id, the page number and a running number for the word on the page, separated by underscores. E.g. 4432_15_10 is word 10 on page 15 of document 4432. Setting the filter to id:4432_15_* limits the search to that document and page; id:*_20_* limits it to page 20 of any document.
    - title: (string) title of the document
    - cId: (int) collection id
    - auth: (string) name of the author
- fuzzy: int - accepts any integer value, but SOLR currently only supports values between 0 and 2
  - SOLR can include results that differ from the keyword by up to the given number of characters.
- sorting: string - sorts the results by the given field (usually "rp desc" to show results in order of descending probability)
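The parameters above can be combined into a request URL as in the following minimal Python sketch. The helper names (page_start, id_filter, keyword_url) are hypothetical and not part of the API. The sketch only builds the URL and sends nothing. It assumes that optional parameters can simply be left out and that the server accepts standard percent-encoded query strings. Any authentication the server may require is not covered.

```python
from urllib.parse import urlencode

BASE_URL = "https://transkribus.eu/TrpServer/rest/keyword"


def page_start(page, rows=10):
    """Return the 'start' offset for a zero-based page number."""
    return page * rows


def id_filter(doc_id="*", page_nr="*", word_nr="*"):
    """Build an 'id' filter value such as 'id:4432_15_*' (hypothetical helper)."""
    return f"id:{doc_id}_{page_nr}_{word_nr}"


def keyword_url(query, start=0, rows=10, filters=(), prob_l=None,
                fuzzy=None, sorting=None):
    """Assemble the GET URL; optional parameters are omitted when None."""
    params = [("query", query), ("start", start), ("rows", rows)]
    # 'filter' may occur several times, so a list of pairs is used, not a dict.
    params += [("filter", f) for f in filters]
    if prob_l is not None:
        params.append(("probL", prob_l))
    if fuzzy is not None:
        params.append(("fuzzy", fuzzy))
    if sorting is not None:
        params.append(("sorting", sorting))
    return BASE_URL + "?" + urlencode(params)


# Second page (results 10 to 19) of hits for "london" on page 15 of document 4432.
print(keyword_url("london", start=page_start(1), filters=[id_filter(4432, 15)]))
```

Passing the parameters as a list of pairs rather than a dictionary keeps repeated filter entries intact and preserves their order.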
Example:
Searching for the keyword "london" in collection 1234 with any probability, displaying the first 100 results sorted by descending probability.
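The request URL for this example can be built as in the Python sketch below. It is an illustration based on the parameter descriptions, not an authoritative request: it assumes that leaving out the probability limits means "any probability" and that the collection is selected with a cId filter. The code only constructs and prints the URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://transkribus.eu/TrpServer/rest/keyword"

params = [
    ("query", "london"),      # keyword to search for
    ("filter", "cId:1234"),   # restrict to collection 1234
    ("start", 0),             # begin at the first hit
    ("rows", 100),            # return the first 100 results
    ("sorting", "rp desc"),   # sort by descending probability
]

# urlencode percent-encodes ':' and turns the space in "rp desc" into '+'.
url = BASE_URL + "?" + urlencode(params)
print(url)
```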