GuideStar Search API Technical Notes

Document created by JackCowardin Administrator on May 22, 2017. Last modified by JackCowardin Administrator on May 22, 2017.
Version 5
This document answers questions that may arise for users of the GuideStar Search API v1_1.

 

What is the internal algorithm used in the Search API?

 

The GuideStar Search API uses Lucene as its search engine. More information about Lucene can be found at: https://lucene.apache.org/core/

Lucene uses a “scoring” algorithm to determine the relevance of each indexed document to a query. Because Lucene is an open source project, the scoring algorithm is customizable. Lucene is also very fast. These are the reasons it was chosen as the technology for the GuideStar Search API.

 

Read more about Lucene scoring here: https://lucene.apache.org/core/3_6_0/scoring.html
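
To give a feel for what a relevance score is, here is a toy sketch loosely modeled on the classic TF-IDF factors described on the Lucene scoring page linked above (term frequency, inverse document frequency, and a length norm that favors shorter fields). It is not GuideStar's proprietary algorithm, and it leaves out Lucene factors such as boosts, coord, and query normalization. The function name toy_score and the sample token lists are illustrative assumptions.

```python
import math


def toy_score(term, doc_tokens, all_docs):
    """Toy single-term relevance score: tf * idf^2 * length norm.

    Illustrative only; not GuideStar's scoring and not a full Lucene score.
    """
    freq = doc_tokens.count(term)
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                              # more occurrences, higher score
    doc_freq = sum(1 for d in all_docs if term in d)  # documents containing the term
    idf = 1.0 + math.log(len(all_docs) / (doc_freq + 1))  # rarer terms weigh more
    length_norm = 1.0 / math.sqrt(len(doc_tokens))    # shorter fields weigh more
    return tf * idf ** 2 * length_norm


if __name__ == "__main__":
    # Hypothetical tokenized organization names (placeholders, not API output).
    docs = [
        ["american", "example", "society"],
        ["autry", "museum", "of", "the", "american", "west"],
        ["example", "foundation"],
    ]
    for doc in docs:
        print(" ".join(doc), round(toy_score("american", doc, docs), 3))
```

In this sketch the shorter name scores higher than the longer one for the same term, which is one reason a match in a short field can outrank a match in a long one.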

 

GuideStar's Search API scoring algorithm is proprietary, and full details are not provided. It is based on a synthetic (composite) “keyword” index that is composed of 16 data elements, including:

 

  • Organization Name
  • The organization’s Mission Statement
  • Address
  • NTEE Code(s)
  • IRS Subsection
  • Achievements
  • Programs

 

If a search query such as

https://data.guidestar.org/v1_1/Search?q=american 

is used, you will be searching the “keyword” composite index, and relevance will be based on the combined occurrences of the word “american” across all 16 data elements.

 

If the following search query is used

https://data.guidestar.org/v1_1/Search?q=organization_name:american 

you will be specifying that the search term "american" should appear in the "organization_name" field, and you will get different results. All of the results, however, will have the search term in the organization_name field.
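
The two query forms above can be sketched as a small URL builder. This is a minimal sketch that only constructs the URL strings shown in these notes and makes no request. The helper name build_search_url and its parameter names are assumptions for illustration; the r parameter is taken from a later example in these notes, where r=25 returns 25 results. Authentication and any other parameters are outside the scope of this sketch.

```python
from urllib.parse import urlencode

# Base URL as it appears in the examples in these notes.
BASE_URL = "https://data.guidestar.org/v1_1/Search"


def build_search_url(term, field=None, rows=None):
    """Build a Search URL for a keyword or field-specific query.

    `field` and `rows` are hypothetical parameter names for this helper;
    they map to the `field:term` syntax and the `r` query parameter.
    """
    query = f"{field}:{term}" if field else term
    params = {"q": query}
    if rows is not None:
        params["r"] = rows
    # Keep ":" unescaped so the output matches the URLs in these notes.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # Searches the composite "keyword" index.
    print(build_search_url("american"))
    # Restricts the match to the organization_name field.
    print(build_search_url("american", field="organization_name"))
```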

 

 

Why do Search API results not match the results of a search performed on the GuideStar.org website Search?

 

The Search function available on the GuideStar.org website currently uses Elasticsearch and will therefore return results that differ from those of the Search API, which uses Lucene.

 

https://www.elastic.co/products/elasticsearch

 

Elasticsearch is a somewhat newer technology and offers some advantages over Lucene. The Search API will be changed to use Elasticsearch in the near future, and at that time it should produce search results that are more consistent with the website Search function. At this time, the two should not be expected to return matching results, even though the same search scoring criteria are used.

 

 

If I am searching for "organization_name:american", in what order should I expect the results back?

 

Results are returned in order of relevance. Since you are specifying a field, “organization_name”, you will tend to get matches where the term “american” occurs early in the string value of the organization name field. How the remaining results are ordered is not easy to determine. In practice, the search query

 

https://data.guidestar.org/v1_1/Search?q=organization_name:american&r=25

 

returns 25 results. Of these, 22 have the term “american” as the first word of the organization name, and three have the term later in the name. For example, the first 16 “hits” have “american” as the first word of the organization name, while the organization name in the 17th search result is “Autry Museum of the American West”.
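
A tally like the one above can be reproduced on the client once the organization names have been extracted from a response. The sketch below assumes the names are already available as a list of strings; how they are pulled out of the API response is not covered here. The function name split_by_term_position is hypothetical, and the first sample name is a made-up placeholder.

```python
def split_by_term_position(names, term):
    """Separate names that start with `term` from names that contain it later.

    Matching is case-insensitive and on whole whitespace-separated words.
    Names that do not contain the term are ignored.
    """
    term = term.casefold()
    leading, later = [], []
    for name in names:
        words = name.casefold().split()
        if term not in words:
            continue
        if words[0] == term:
            leading.append(name)
        else:
            later.append(name)
    return leading, later


if __name__ == "__main__":
    names = [
        "American Example Society",           # hypothetical placeholder name
        "Autry Museum of the American West",  # name quoted in these notes
    ]
    leading, later = split_by_term_position(names, "american")
    print(len(leading), "start with the term;", len(later), "contain it later")
```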

 

Predicting the result order would require applying the scoring algorithm manually, which is not practical and, because the algorithm is proprietary, not possible. Keep in mind that an organization’s self-reported Achievements and Programs, if they exist in an organization’s NonProfit Update Program, are text fields and are part of the scoring index.

 

 

If paging is done on the server, what options are there for a "custom" result order?

 

The GuideStar Search API offers no option for custom ordering of results.
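
One possible workaround, not part of the API, is to reorder results on the client after they have been retrieved. The sketch below assumes each result has been parsed into a dict and that the organization name is stored under an "organization_name" key; that key name and the sample records are assumptions, not the documented response format. Note that this only reorders the results already fetched; it cannot change which results the server returns on each page.

```python
def sort_results_by_name(results, key="organization_name"):
    """Return a new list of result dicts sorted alphabetically by name.

    `key` is an assumed field name; adjust it to the actual response.
    Sorting is case-insensitive and leaves the input list unchanged.
    """
    return sorted(results, key=lambda record: record[key].casefold())


if __name__ == "__main__":
    # Hypothetical parsed results in relevance order (placeholder data).
    page = [
        {"organization_name": "Autry Museum of the American West"},
        {"organization_name": "American Example Society"},
    ]
    for record in sort_results_by_name(page):
        print(record["organization_name"])
```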
