Examples of LcNoDiacriticsNormalizer


Examples of org.apache.accumulo.examples.wikisearch.normalizer.LcNoDiacriticsNormalizer

  protected IndexRanges getTermIndexInformation(Connector c, Authorizations auths, String value, Set<String> typeFilter) throws TableNotFoundException {
    final String dummyTermName = "DUMMY";
    UnionIndexRanges indexRanges = new UnionIndexRanges();
   
    // The entries in the index are normalized; since we don't have a field name here, just use the LcNoDiacriticsNormalizer.
    String normalizedFieldValue = new LcNoDiacriticsNormalizer().normalizeFieldValue("", value);
    // Strip the leading and trailing ' marks
    if (normalizedFieldValue.startsWith("'") && normalizedFieldValue.endsWith("'")) {
      normalizedFieldValue = normalizedFieldValue.substring(1, normalizedFieldValue.length() - 1);
    }
    Text fieldValue = new Text(normalizedFieldValue);
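
For context, here is a minimal standalone sketch of the normalization step used above. It assumes LcNoDiacriticsNormalizer behaves as its name suggests (lower-casing the value and stripping diacritical marks); the class name NormalizeQueryTerm and the sample value are invented for illustration.

  import org.apache.accumulo.examples.wikisearch.normalizer.LcNoDiacriticsNormalizer;

  public class NormalizeQueryTerm {
    public static void main(String[] args) {
      LcNoDiacriticsNormalizer normalizer = new LcNoDiacriticsNormalizer();
      // No field name is known at this point, so pass an empty string,
      // exactly as getTermIndexInformation does above.
      String normalized = normalizer.normalizeFieldValue("", "'Précis'");
      // Strip the surrounding ' marks the same way the snippet does
      if (normalized.startsWith("'") && normalized.endsWith("'")) {
        normalized = normalized.substring(1, normalized.length() - 1);
      }
      // Assuming lower-casing and diacritic removal, this prints: precis
      System.out.println(normalized);
    }
  }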

Examples of org.apache.accumulo.examples.wikisearch.normalizer.LcNoDiacriticsNormalizer

     
      // We are going to put the fields to be indexed into a multimap. This allows us to iterate
      // over the entire set once.
      Multimap<String,String> indexFields = HashMultimap.create();
      // Add the normalized field values
      LcNoDiacriticsNormalizer normalizer = new LcNoDiacriticsNormalizer();
      for (Entry<String,String> index : article.getNormalizedFieldValues().entrySet())
        indexFields.put(index.getKey(), index.getValue());
      // Add the tokens
      for (String token : tokens)
        indexFields.put(TOKENS_FIELD_NAME, normalizer.normalizeFieldValue("", token));
     
      for (Entry<String,String> index : indexFields.entries()) {
        // Create mutations for the in-partition index
        // Row is partition id, colf is 'fi'\0fieldName, colq is fieldValue\0language\0article id
        m.put(indexPrefix + index.getKey(), index.getValue() + NULL_BYTE + colfPrefix + article.getId(), cv, article.getTimestamp(), NULL_VALUE);
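
As a rough illustration of the token-indexing step above, the sketch below rebuilds only the multimap population; the TOKENS_FIELD_NAME constant and the token values are stand-ins, and the expected output assumes the normalizer lower-cases and strips diacritics.

  import com.google.common.collect.HashMultimap;
  import com.google.common.collect.Multimap;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.accumulo.examples.wikisearch.normalizer.LcNoDiacriticsNormalizer;

  public class NormalizeTokens {
    // Stand-in for the mapper's TOKENS_FIELD_NAME constant
    private static final String TOKENS_FIELD_NAME = "TEXT";

    public static void main(String[] args) {
      LcNoDiacriticsNormalizer normalizer = new LcNoDiacriticsNormalizer();
      List<String> tokens = Arrays.asList("Zürich", "CAFÉ", "wiki");

      // Same pattern as the snippet: normalize every token before adding it,
      // so query terms normalized the same way will match the indexed form.
      Multimap<String,String> indexFields = HashMultimap.create();
      for (String token : tokens) {
        indexFields.put(TOKENS_FIELD_NAME, normalizer.normalizeFieldValue("", token));
      }
      // Assuming lower-casing and diacritic stripping: {TEXT=[zurich, cafe, wiki]}
      System.out.println(indexFields);
    }
  }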
