In one aspect, the present invention comprises a document citation overview tool (CTO) that allows users to see how often documents from different resources (particular authors, journals, or record baskets) have been cited in a selected year range. CTO fulfills a need in the scientific market for easy-to-use tools for bibliometric analyses.
On the document citation overview page (see FIG. 1), a two-dimensional table is used to display citation counts. The table provides several kinds of citation counts:
    By selected document and selected year (cell value)
    By selected document and selected year range (row total)
    By all selected documents and per selected year (column total)
    By all selected documents and selected year range (grand total)
Thousands of documents may be analyzed together. Users can select a year range, configure the number of documents displayed on each page if multiple pages are needed for display, and navigate pages through “previous” and “next” buttons. A citation weight may be displayed that shows the number of citations (grand total) divided by the number of all selected documents for the selected year range.
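The four kinds of counts and the citation weight may be illustrated with a minimal sketch. The data layout (a mapping from document identifier to per-year citation counts), the function name `citation_overview`, and the sample values are assumptions made for illustration only; they are not taken from the described system.

```python
def citation_overview(counts, first_year, last_year):
    """Build the overview table for the documents in `counts`.

    `counts` maps a document id to {citing_year: number_of_citations}.
    The layout and all names here are hypothetical.
    """
    years = range(first_year, last_year + 1)

    # Cell value: citations of one document in one year of the range.
    cells = {
        doc: {year: per_year.get(year, 0) for year in years}
        for doc, per_year in counts.items()
    }
    # Row total: citations of one document over the whole year range.
    row_totals = {doc: sum(row.values()) for doc, row in cells.items()}
    # Column total: citations of all selected documents in one year.
    column_totals = {
        year: sum(row[year] for row in cells.values()) for year in years
    }
    # Grand total: citations of all selected documents over the range.
    grand_total = sum(row_totals.values())
    # Citation weight: grand total divided by the number of documents.
    weight = grand_total / len(cells) if cells else 0.0

    return {
        "cells": cells,
        "row_totals": row_totals,
        "column_totals": column_totals,
        "grand_total": grand_total,
        "citation_weight": weight,
    }


# Invented sample data: two documents, year range 2004-2005.
sample_counts = {
    "doc-A": {2003: 2, 2004: 5, 2005: 1},
    "doc-B": {2004: 3, 2006: 4},
}
overview = citation_overview(sample_counts, 2004, 2005)
```

In this sketch, citations outside the selected year range (2003 and 2006 in the sample) contribute to none of the totals, which matches the description of counts being limited to the selected year range.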
Users also may save a document set into a saved basket and access a cited-by-result list by clicking a citation count to display all citations associated with that count. In various embodiments, in addition to using dimensions of document and year, users can search on other parameters (author names, institutes, journal names, subjects, etc.) in various combinations.
Although those skilled in the art will be able to make and use a citation tool and citation overview pages based on the functional description below, additional technical solutions to technical problems were required in order to have a citation tool capable of providing search results in a short period of time. Users typically are not satisfied with great results if those results take too long to obtain. Those technical solutions also are described herein.
Those familiar with the prior art likely would have used a naive XQuery approach for the citation queries.
Example:
    define function classifyCitedReferencesByYear($eid-list as item()*) as item()*
    {
      for $eid in $eid-list
      return
        <eid id="{$eid}">
          {classifyCitedReferenceByYear($eid)}
        </eid>
    }

    define function classifyCitedReferenceByYear($eid as item()) as item()*
    {
      let $allYears := data(/ANI-RECORD
          [BIBLIOGRAPHY/reference/ref-info/refd-itemidlist/itemid[@idtype="SCP"] = $eid]
          /ANI-SOURCE/publicationdate/year)
      let $uniqueYears := distinct-values($allYears)
      for $y in $uniqueYears
      return
        <classification type="year" value="{$y}" count="{count(index-of($allYears, $y))}" />
    }

    <eidList>
      {classifyCitedReferencesByYear((eids go here))}
    </eidList>
But this approach has several drawbacks: (1) the use of distinct-values( ) requires all values to be in memory simultaneously; (2) complex XPath expressions require post-filtering of data structures to confirm that index hits are correct; and (3) an I/O is required for every referring document, to fetch the year data. Clearly, this approach does not scale well.
Goals of the present invention include: (1) resolve a query entirely out of indexes; (2) minimize index-related disk I/O; and (3) minimize per-cell computation time.
The preferred solution, described below, is based on a strategy that: (a) uses xdmp:estimate( ) to constrain counting activities to index-only computation; and (b) uses a combination of index techniques to optimize the caching of the indexes so that steady-state evaluation of a query will resolve disk-free.
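The difference between the naive approach and index-only counting may be illustrated with an in-memory analogy. The sketch below is not MarkLogic code and does not call xdmp:estimate( ); the record layout, the `CitationIndex` class, the `naive_count` function, and the sample identifiers are all hypothetical. It builds, once, an index keyed on (cited document, citing year), so that each table cell is answered by a single lookup rather than by reading every referring document.

```python
from collections import Counter


def naive_count(records, cited_id, year):
    """Naive analogue: visit every citing record for every cell."""
    return sum(
        1
        for record in records
        if record["year"] == year and cited_id in record["references"]
    )


class CitationIndex:
    """Hypothetical index mapping (cited id, citing year) to a count."""

    def __init__(self, records):
        self._counts = Counter()
        for record in records:
            # A citing record counts once per cited document, even if it
            # lists the same reference more than once.
            for cited_id in set(record["references"]):
                self._counts[(cited_id, record["year"])] += 1

    def count(self, cited_id, year):
        # Answered from the index alone; no citing record is read.
        return self._counts[(cited_id, year)]


# Invented sample data: three citing records.
sample_records = [
    {"id": "r1", "year": 2004, "references": ["d1", "d2"]},
    {"id": "r2", "year": 2004, "references": ["d1"]},
    {"id": "r3", "year": 2005, "references": ["d2", "d2"]},
]
index = CitationIndex(sample_records)
```

Both functions return the same counts for the sample data; the difference is that the cost of `naive_count` grows with the number of citing records for every cell, whereas `CitationIndex.count` performs one dictionary lookup per cell once the index has been built.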
In one aspect, the present invention comprises a computer system for searching databases and displaying search results, comprising: one or more databases storing information regarding publications, the information comprising author, title, date of publication, cited references, and citing references data; and one or more Internet servers in communication with the one or more databases; wherein at least one of the one or more Internet servers is in communication with and operable to transmit data to a Web browser resident on a user's computer, and wherein the data is sufficient to enable the browser to display a citation overview page comprising: (a) a list of one or more titles of publications, and (b) one or more displayed numerals representing how many publications of one or more specified categories cite to each of the publications.
In various embodiments, in various combinations: (1) the one or more specified categories correspond to publication years; (2) at least one of the one or more displayed numerals represents a grand total of how many publications of all specified categories cite to any of the listed publications; (3) the citation overview page comprises a citation weight display that represents the grand total divided by how many publications are listed on the citation overview page; (4) the displayed numerals are hyperlinks; and/or (5) the data is sufficient to enable the browser to display a cited by result page linked to one of the one or more displayed numerals and listing publications in a category corresponding to the one of the one or more displayed numerals.
In another aspect, the invention comprises a computer system for searching databases and displaying search results, comprising: one or more databases storing information regarding publications, the information comprising author, title, date of publication, cited references, and citing references data; and one or more Internet servers in communication with the one or more databases; wherein at least one of the one or more Internet servers is in communication with and operable to transmit data to a Web browser resident on a user's computer, and wherein the data is sufficient to enable the browser to display a citation overview page comprising: (a) a list of one or more names of authors, and (b) for each of the names, one or more numerals representing how many publications of one or more specified types cite to publications on which that name is listed as an author or co-author.
In various embodiments, in various combinations: (1) the one or more specified categories correspond to publication years; (2) at least one of the one or more displayed numerals represents a grand total of how many publications of all specified categories cite to any of the listed names of authors; (3) the citation overview page comprises a citation weight display that represents the grand total divided by how many names of authors are listed on the citation overview page; (4) the displayed numerals are hyperlinks; (5) the data is sufficient to enable the browser to display a cited by result page linked to one of the one or more displayed numerals and listing publications in a category corresponding to the one of the one or more displayed numerals; (6) the citation overview page comprises an exclude author self citations button operable to send a request to the at least one of the one or more Internet servers for data sufficient to enable the browser to display a citation overview page with excluded author self citations for a selected name of an author; (7) the citation overview page with excluded author self citations for a selected name of an author comprises a first displayed numeral representing how many publications in one of the specified categories cited to publications that list the selected name as an author; and/or (8) the citation overview page comprises a second displayed numeral representing how many publications in the one of the specified categories but not listing the name as an author cited to publications that list the name as an author.
In other embodiments: (1) at least one of the one or more databases is an XML-based database; (2) the XML-based database is operable to be searched using XQuery statements that count how many publications in a specified category cite to a specified publication; (3) at least one of the XQuery statements is written as an estimated XPath and unnecessary XPath steps are eliminated; and (4) at least one of the XQuery statements is written with one or more predicate indexes and at least one of the predicate indexes is remapped into memory.
Other aspects and embodiments of the invention will be apparent to those skilled in the art after reviewing the drawings, detailed description, and claims provided below.