Examples of org.apache.lucene.facet.taxonomy.TaxonomyReader

Package org.apache.lucene.facet.taxonomy

Examples of org.apache.lucene.facet.taxonomy.TaxonomyReader

org.apache.lucene.facet.taxonomy.TaxonomyReader
TaxonomyReader is the read-only interface with which the faceted-search library uses the taxonomy during search time.
A TaxonomyReader holds a list of categories. Each category has a serial number which we call an "ordinal", and a hierarchical "path" name:
- The ordinal is an integer that starts at 0 for the first category (which is always the root category), and grows contiguously as more categories are added; Note that once a category is added, it can never be deleted.
- The path is a CategoryPath object specifying the category's position in the hierarchy.
Notes about concurrent access to the taxonomy:
An implementation must allow multiple readers to be active concurrently with a single writer. Readers follow so-called "point in time" semantics, i.e., a TaxonomyReader object will only see taxonomy entries which were available at the time it was created. What the writer writes is only available to (new) readers after the writer's commit() is called.
In faceted search, two separate indices are used: the main Lucene index, and the taxonomy. Because the main index refers to the categories listed in the taxonomy, it is important to open the taxonomy *after* opening the main index, and it is also necessary to reopen() the taxonomy after reopen()ing the main index.
This order is important, otherwise it would be possible for the main index to refer to a category which is not yet visible in the old snapshot of the taxonomy. Note that it is indeed fine for the the taxonomy to be opened after the main index - even a long time after. The reason is that once a category is added to the taxonomy, it can never be changed or deleted, so there is no danger that a "too new" taxonomy not being consistent with an older index. @lucene.experimental

  
  @Test
  public void testDifferentNumResults() throws Exception {
    // test the collector w/ FacetRequests and different numResults
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    
    FacetSearchParams fsp = new FacetSearchParams(new CountFacetRequest(CP_A, NUM_CHILDREN_CP_A), 
        new CountFacetRequest(CP_B, NUM_CHILDREN_CP_B));
    FacetsCollector fc = FacetsCollector.create(randomAccumulator(fsp, indexReader, taxoReader));

View Full Code Here

  }
  
  @Test
  public void testAllCounts() throws Exception {
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    
    FacetSearchParams fsp = new FacetSearchParams(new CountFacetRequest(CP_A, NUM_CHILDREN_CP_A), 
        new CountFacetRequest(CP_B, NUM_CHILDREN_CP_B));
    FacetsCollector fc = FacetsCollector.create(randomAccumulator(fsp, indexReader, taxoReader));

View Full Code Here

  }
  
  @Test
  public void testBigNumResults() throws Exception {
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    
    FacetSearchParams fsp = new FacetSearchParams(new CountFacetRequest(CP_A, Integer.MAX_VALUE), 
        new CountFacetRequest(CP_B, Integer.MAX_VALUE));
    FacetsCollector fc = FacetsCollector.create(randomAccumulator(fsp, indexReader, taxoReader));

View Full Code Here

  }
  
  @Test
  public void testNoParents() throws Exception {
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    FacetSearchParams fsp = new FacetSearchParams(fip, new CountFacetRequest(CP_C, NUM_CHILDREN_CP_C), 
        new CountFacetRequest(CP_D, NUM_CHILDREN_CP_D));
    FacetsCollector fc = FacetsCollector.create(randomAccumulator(fsp, indexReader, taxoReader));
    searcher.search(new MatchAllDocsQuery(), fc);

View Full Code Here

    indexTwoDocs(indexWriter, facetFields, true);  // 6th segment, with content, with categories
    indexTwoDocs(indexWriter, null, true);         // 7th segment, with content, no categories
    IOUtils.close(indexWriter, taxoWriter);


    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    
    // search for "f:a", only segments 1 and 3 should match results
    Query q = new TermQuery(new Term("f", "a"));
    FacetRequest countNoComplements = new CountFacetRequest(new CategoryPath("A"), 10);

View Full Code Here

    // index 100 docs, each with one category: ["root", docnum/10, docnum]
    // e.g. root/8/87
    index100Docs(indexDir, taxoDir, fip);
    
    DirectoryReader r = DirectoryReader.open(indexDir);
    TaxonomyReader tr = new DirectoryTaxonomyReader(taxoDir);
    
    CountFacetRequest facetRequest = new CountFacetRequest(new CategoryPath("root"), 10);
    // Setting the depth to '2', should potentially get all categories
    facetRequest.setDepth(2);
    facetRequest.setResultMode(ResultMode.PER_NODE_IN_TREE);

View Full Code Here

    // NRT open
    IndexSearcher searcher = newSearcher(writer.getReader());
    writer.close();


    // NRT open
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
    taxoWriter.close();


    // Count both "Publish Date" and "Author" dimensions:
    FacetSearchParams fsp = new FacetSearchParams(
        new CountFacetRequest(new CategoryPath("Publish Date"), 10), 
        new CountFacetRequest(new CategoryPath("Author"), 10));


    // Aggregatses the facet counts:
    FacetsCollector c = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);


    // MatchAllDocsQuery is for "browsing" (counts facets
    // for all non-deleted docs in the index); normally
    // you'd use a "normal" query, and use MultiCollector to
    // wrap collecting the "normal" hits and also facets:
    searcher.search(new MatchAllDocsQuery(), c);


    // Retrieve & verify results:
    List<FacetResult> results = c.getFacetResults();
    assertEquals(2, results.size());
    assertEquals("Publish Date (0)\n  2012 (2)\n  2010 (2)\n  1999 (1)\n",
        FacetTestUtils.toSimpleString(results.get(0)));
    assertEquals("Author (0)\n  Lisa (2)\n  Frank (1)\n  Susan (1)\n  Bob (1)\n",
        FacetTestUtils.toSimpleString(results.get(1)));


    
    // Now user drills down on Publish Date/2010:
    fsp = new FacetSearchParams(new CountFacetRequest(new CategoryPath("Author"), 10));
    DrillDownQuery q2 = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    q2.add(new CategoryPath("Publish Date/2010", '/'));
    c = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
    searcher.search(q2, c);
    results = c.getFacetResults();
    assertEquals(1, results.size());
    assertEquals("Author (0)\n  Lisa (1)\n  Bob (1)\n",
        FacetTestUtils.toSimpleString(results.get(0)));


    // Smoke test PrintTaxonomyStats:
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    PrintTaxonomyStats.printStats(taxoReader, new PrintStream(bos, false, "UTF-8"), true);
    String result = bos.toString("UTF-8");
    assertTrue(result.indexOf("/Author: 4 immediate children; 5 total categories") != -1);
    assertTrue(result.indexOf("/Publish Date: 3 immediate children; 12 total categories") != -1);
    // Make sure at least a few nodes of the tree came out:
    assertTrue(result.indexOf("  /1999") != -1);
    assertTrue(result.indexOf("  /2012") != -1);
    assertTrue(result.indexOf("      /20") != -1);


    taxoReader.close();
    searcher.getIndexReader().close();
    dir.close();
    taxoDir.close();
  }

View Full Code Here

    // NRT open
    IndexSearcher searcher = newSearcher(writer.getReader());
    writer.close();


    // NRT open
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
    taxoWriter.close();
    
    FacetSearchParams fsp = new FacetSearchParams(fip,
                                                  new CountFacetRequest(new CategoryPath("a", '/'), 10));


    // Aggregatses the facet counts:
    FacetsCollector c = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);


    // MatchAllDocsQuery is for "browsing" (counts facets
    // for all non-deleted docs in the index); normally
    // you'd use a "normal" query, and use MultiCollector to
    // wrap collecting the "normal" hits and also facets:
    searcher.search(new MatchAllDocsQuery(), c);
    List<FacetResult> results = c.getFacetResults();
    assertEquals(1, results.size());
    assertEquals(1, (int) results.get(0).getFacetResultNode().value);


    searcher.getIndexReader().close();
    taxoReader.close();
    dir.close();
    taxoDir.close();
  }

View Full Code Here

  }
  
  private void verifyMigratedIndex(Directory indexDir, Directory taxoDir, HashMap<String,Integer> expectedCounts, 
      FacetIndexingParams fip) throws Exception {
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);


    assertFalse("index should not have deletions", indexReader.hasDeletions());
    
    verifyNotFacetsData(indexReader, searcher);

View Full Code Here

    writer.close();


    //System.out.println("searcher=" + searcher);


    // NRT open
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
    taxoWriter.close();


    // Count both "Publish Date" and "Author" dimensions, in
    // drill-down:
    FacetSearchParams fsp = new FacetSearchParams(
        new CountFacetRequest(new CategoryPath("Publish Date"), 10), 
        new CountFacetRequest(new CategoryPath("Author"), 10));


    // Simple case: drill-down on a single field; in this
    // case the drill-sideways + drill-down counts ==
    // drill-down of just the query: 
    DrillDownQuery ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    ddq.add(new CategoryPath("Author", "Lisa"));
    DrillSidewaysResult r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);


    assertEquals(2, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is only drill-down, and Lisa published
    // one in 2012 and one in 2010:
    assertEquals("Publish Date: 2012=1 2010=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down: Lisa
    // (drill-down) published twice, and Frank/Susan/Bob
    // published once:
    assertEquals("Author: Lisa=2 Frank=1 Susan=1 Bob=1", toString(r.facetResults.get(1)));


    // Same simple case, but no baseQuery (pure browse):
    // drill-down on a single field; in this case the
    // drill-sideways + drill-down counts == drill-down of
    // just the query:
    ddq = new DrillDownQuery(fsp.indexingParams);
    ddq.add(new CategoryPath("Author", "Lisa"));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);


    assertEquals(2, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is only drill-down, and Lisa published
    // one in 2012 and one in 2010:
    assertEquals("Publish Date: 2012=1 2010=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down: Lisa
    // (drill-down) published twice, and Frank/Susan/Bob
    // published once:
    assertEquals("Author: Lisa=2 Frank=1 Susan=1 Bob=1", toString(r.facetResults.get(1)));


    // Another simple case: drill-down on on single fields
    // but OR of two values
    ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    ddq.add(new CategoryPath("Author", "Lisa"), new CategoryPath("Author", "Bob"));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);
    assertEquals(3, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is only drill-down: Lisa and Bob
    // (drill-down) published twice in 2010 and once in 2012:
    assertEquals("Publish Date: 2010=2 2012=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down: Lisa
    // (drill-down) published twice, and Frank/Susan/Bob
    // published once:
    assertEquals("Author: Lisa=2 Frank=1 Susan=1 Bob=1", toString(r.facetResults.get(1)));


    // More interesting case: drill-down on two fields
    ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    ddq.add(new CategoryPath("Author", "Lisa"));
    ddq.add(new CategoryPath("Publish Date", "2010"));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);
    assertEquals(1, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is drill-sideways + drill-down: Lisa
    // (drill-down) published once in 2010 and once in 2012:
    assertEquals("Publish Date: 2012=1 2010=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down:
    // only Lisa & Bob published (once each) in 2010:
    assertEquals("Author: Lisa=1 Bob=1", toString(r.facetResults.get(1)));


    // Even more interesting case: drill down on two fields,
    // but one of them is OR
    ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());


    // Drill down on Lisa or Bob:
    ddq.add(new CategoryPath("Author", "Lisa"),
            new CategoryPath("Author", "Bob"));
    ddq.add(new CategoryPath("Publish Date", "2010"));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);
    assertEquals(2, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is both drill-sideways + drill-down:
    // Lisa or Bob published twice in 2010 and once in 2012:
    assertEquals("Publish Date: 2010=2 2012=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down:
    // only Lisa & Bob published (once each) in 2010:
    assertEquals("Author: Lisa=1 Bob=1", toString(r.facetResults.get(1)));


    // Test drilling down on invalid field:
    ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    ddq.add(new CategoryPath("Foobar", "Baz"));
    fsp = new FacetSearchParams(
        new CountFacetRequest(new CategoryPath("Publish Date"), 10), 
        new CountFacetRequest(new CategoryPath("Foobar"), 10));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);
    assertEquals(0, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    assertEquals("Publish Date:", toString(r.facetResults.get(0)));
    assertEquals("Foobar:", toString(r.facetResults.get(1)));


    // Test drilling down on valid term or'd with invalid term:
    ddq = new DrillDownQuery(fsp.indexingParams, new MatchAllDocsQuery());
    ddq.add(new CategoryPath("Author", "Lisa"),
            new CategoryPath("Author", "Tom"));
    fsp = new FacetSearchParams(
        new CountFacetRequest(new CategoryPath("Publish Date"), 10), 
        new CountFacetRequest(new CategoryPath("Author"), 10));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);
    assertEquals(2, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    // Publish Date is only drill-down, and Lisa published
    // one in 2012 and one in 2010:
    assertEquals("Publish Date: 2012=1 2010=1", toString(r.facetResults.get(0)));
    // Author is drill-sideways + drill-down: Lisa
    // (drill-down) published twice, and Frank/Susan/Bob
    // published once:
    assertEquals("Author: Lisa=2 Frank=1 Susan=1 Bob=1", toString(r.facetResults.get(1)));


    // Test main query gets null scorer:
    fsp = new FacetSearchParams(
        new CountFacetRequest(new CategoryPath("Publish Date"), 10), 
        new CountFacetRequest(new CategoryPath("Author"), 10));
    ddq = new DrillDownQuery(fsp.indexingParams, new TermQuery(new Term("foobar", "baz")));
    ddq.add(new CategoryPath("Author", "Lisa"));
    r = new DrillSideways(searcher, taxoReader).search(null, ddq, 10, fsp);


    assertEquals(0, r.hits.totalHits);
    assertEquals(2, r.facetResults.size());
    assertEquals("Publish Date:", toString(r.facetResults.get(0)));
    assertEquals("Author:", toString(r.facetResults.get(1)));


    searcher.getIndexReader().close();
    taxoReader.close();
    dir.close();
    taxoDir.close();
  }

View Full Code Here

0 1 2 3 4 5 6 7 8 9

TOP

Related Classes of org.apache.lucene.facet.taxonomy.TaxonomyReader

com.gentics.cr.lucene.search.CRSearcher

org.apache.lucene.benchmark.byTask.tasks.CloseTaxonomyReaderTask

org.apache.lucene.benchmark.byTask.TestPerfTasksLogic

org.apache.lucene.demo.facet.AssociationsFacetsExample

org.apache.lucene.demo.facet.ExpressionAggregationFacetsExample

org.apache.lucene.demo.facet.MultiCategoryListsFacetsExample

org.apache.lucene.demo.facet.SimpleFacetsExample

org.apache.lucene.facet.complements.TestTotalFacetCounts

org.apache.lucene.facet.complements.TestTotalFacetCountsCache

org.apache.lucene.facet.example.adaptive.AdaptiveSearcher

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.