Details

  • Type: New Feature
  • Status: Resolved
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.0.4
  • Fix Version/s: 2.0
  • Component/s: None
  • Labels:
    None

Activity

Fabrizio Giustina added a comment -

Committed a basic implementation; it still needs to be cleaned up and documented.
The new implementation doesn't use any of the current Magnolia search APIs; it wraps the Jackrabbit/JCR query directly.

Compared to the Magnolia implementation:

  • it's much faster and less memory hungry, since the result collection is not created eagerly
  • it has built-in paging that works at the Jackrabbit/Lucene level
  • everything is lazy: Magnolia (and also Jackrabbit) nodes are built during iteration
  • the query and result objects expose all the extra values needed for paging (total number of results, number of pages, current page; everything is already calculated)
  • result items support excerpts
  • result items support scoring
  • built-in spellchecker support (if configured as described at http://wiki.apache.org/jackrabbit/SpellChecker)
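
The paging values mentioned above (total number of results, number of pages, current page) all follow from the total hit count and the page size. The sketch below shows that arithmetic in plain Java. The class and method names are hypothetical and are not the committed API; the offset it computes is the kind of value a JCR 2.0 query accepts through Query.setOffset and Query.setLimit.

```java
/**
 * Hypothetical paging helper: names are illustrative only and do not
 * reflect the classes committed for this issue.
 */
final class PagingInfo {
    private final long totalSize;   // total number of hits reported by the query
    private final int itemsPerPage; // page size requested by the caller
    private final int page;         // current page, 1-based

    PagingInfo(long totalSize, int itemsPerPage, int page) {
        if (totalSize < 0 || itemsPerPage <= 0 || page < 1) {
            throw new IllegalArgumentException("invalid paging parameters");
        }
        this.totalSize = totalSize;
        this.itemsPerPage = itemsPerPage;
        this.page = page;
    }

    long getTotalSize() {
        return totalSize;
    }

    int getPage() {
        return page;
    }

    /** Number of pages, rounding up so a partial last page is counted. */
    int getNumberOfPages() {
        return (int) ((totalSize + itemsPerPage - 1) / itemsPerPage);
    }

    /** Zero-based index of the first result on the current page. */
    long getOffset() {
        return (long) (page - 1) * itemsPerPage;
    }
}
```

For example, 45 results at 10 items per page give 5 pages, and page 3 starts at offset 20. Only that window then has to be iterated, which is what keeps node creation lazy.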

Notes/drawbacks compared to standard Magnolia queries:

  • you cannot specify an item type AFTER running the query [as with getContent("mgnl:content")]; the query always returns all results as found. I think the original implementation in Magnolia was a big mistake: item type filtering should be done directly in the initial query. A post-filter here makes no sense and causes a lot of overhead (it basically makes lazy loading, paging, etc. impossible)
  • no security checks are performed on the returned results: the list of returned pages will also include pages forbidden to the current user, so you should handle this manually if needed. This area needs some good tests and checks before a 2.0 release.
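
Since no security checks are performed, handling it manually could mean wrapping the result iterator and skipping items the current user may not see. The sketch below is one way to do that without giving up lazy iteration. FilteringIterator and the predicate are hypothetical, not part of the committed code, and the actual permission check is left to the caller.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/**
 * Hypothetical wrapper: lazily skips items rejected by a caller-supplied
 * check. Items are pulled from the source only when they are needed.
 */
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> allowed;
    private T nextItem;
    private boolean nextReady;

    FilteringIterator(Iterator<T> source, Predicate<? super T> allowed) {
        this.source = source;
        this.allowed = allowed;
    }

    @Override
    public boolean hasNext() {
        // Advance the source until an allowed item is found or it is exhausted.
        while (!nextReady && source.hasNext()) {
            T candidate = source.next();
            if (allowed.test(candidate)) {
                nextItem = candidate;
                nextReady = true;
            }
        }
        return nextReady;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        nextReady = false;
        T result = nextItem;
        nextItem = null; // do not keep a reference to the returned item
        return result;
    }
}
```

Note that this is itself a post-filter, so it shares the drawback described above: the total number of results and the number of pages calculated by the query still include the skipped items.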

People

  • Assignee:
    Fabrizio Giustina
  • Reporter:
    Fabrizio Giustina

Dates

  • Created:
    Updated:
    Resolved: