Examples of readCollectionDocumentCount()


Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    eTok = TokenizerFactory.createTokenizer(fs, eLang, tokenizerFile, true, conf.get("eStopword"), conf.get("eStemmedStopword"), null);
    sLogger.info("Tokenizer and vocabs created successfully from " + eLang + " " + tokenizerFile + "," + conf.get("eStopword") + "," + conf.get("eStemmedStopword"));

    eScoreFn = (ScoringModel) new Bm25();
    eScoreFn.setAvgDocLength(lang2AvgSentLen.get(eLang));        //average sentence length = heuristic based on De-En data
    eScoreFn.setDocCount(env.readCollectionDocumentCount());

    dict = new DefaultFrequencySortedDictionary(new Path(env.getIndexTermsData()), new Path(env.getIndexTermIdsData()), new Path(env.getIndexTermIdMappingData()), fs);
    dfTable = new DfTableArray(new Path(env.getDfByTermData()), fs);
  }

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    fScoreFn = (ScoringModel) new Bm25();
    fScoreFn.setAvgDocLength(lang2AvgSentLen.get(fLang));        

    // the DF table comes from the English side, so the collection doc count must also be read from the English dir
    RetrievalEnvironment eEnv = new RetrievalEnvironment(eDir, localFs);
    fScoreFn.setDocCount(eEnv.readCollectionDocumentCount());  

    classifier = new MoreGenericModelReader(new Path(conf.get("modelFileName")), localFs).constructModel();
  }

  private void loadEModels(Configuration conf) throws Exception {
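The comment in the snippet above says the document count has to come from the same collection as the DF table. A minimal sketch of why, assuming the textbook Robertson/Sparck Jones BM25 IDF (Ivory's own `Bm25` class may use a different variant); the class and method names here are hypothetical:

```java
public final class IdfSketch {
    // Textbook BM25 IDF: log((N - df + 0.5) / (df + 0.5)).
    // Assumption: this is the classic formula, not necessarily what Ivory's Bm25 implements.
    static double idf(long docCount, long docFreq) {
        return Math.log((docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long df = 1_000; // document frequency taken from the larger (e.g. English) DF table

        // Doc count read from the same collection as the DF table: a usable positive weight.
        System.out.println(idf(1_000_000, df));

        // Doc count read from a different, smaller collection: N < df, so the
        // argument of the logarithm is negative and the result is NaN.
        System.out.println(idf(500, df));
    }
}
```

With a mismatched count the weight is not just slightly off; it can stop being a valid number at all, which is why the count and the DF table should be read from the same index directory.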

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    eTok = TokenizerFactory.createTokenizer(localFs, eLang, tokenizerFile, true, conf.get("eStopword"), null, null);
    sLogger.info("Tokenizer and vocabs created successfully.");

    eScoreFn = (ScoringModel) new Bm25();
    eScoreFn.setAvgDocLength(lang2AvgSentLen.get(eLang));        //average sentence length = heuristic based on De-En data
    eScoreFn.setDocCount(env.readCollectionDocumentCount());

    dict = new DefaultFrequencySortedDictionary(new Path(env.getIndexTermsData()), new Path(env.getIndexTermIdsData()), new Path(env.getIndexTermIdMappingData()), localFs);
    dfTable = new DfTableArray(new Path(env.getDfByTermData()), localFs);
  }

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    }
    outputPath = PwsimEnvironment.getTablesDir(workDir, fs, signatureType, numOfBits, chunkOverlapSize, numOfPermutations);

    RetrievalEnvironment targetEnv = new RetrievalEnvironment(workDir, fs);
    RetrievalEnvironment srcEnv = new RetrievalEnvironment(srcWorkDir, fs);
    int collSize = targetEnv.readCollectionDocumentCount() + srcEnv.readCollectionDocumentCount();
   
    // split table into 10 chunks by default, limit chunk size to range (100k,2m)
    chunkSize = collSize / 10;
    if (chunkSize < 100000) {
      chunkSize = 100000;
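The excerpt above is cut off after the lower bound. A minimal sketch of the clamp its comment describes, assuming the upper bound is 2,000,000 as "(100k,2m)" suggests; the class, method, and constant names are hypothetical and not taken from Ivory:

```java
public final class ChunkSizing {
    // Bounds and default taken from the comment in the excerpt: 10 chunks, range (100k, 2m).
    static final int DEFAULT_NUM_CHUNKS = 10;
    static final int MIN_CHUNK_SIZE = 100_000;
    static final int MAX_CHUNK_SIZE = 2_000_000;

    // Splits the collection into DEFAULT_NUM_CHUNKS pieces, then clamps the
    // resulting chunk size into [MIN_CHUNK_SIZE, MAX_CHUNK_SIZE].
    static int chunkSizeFor(int collSize) {
        int chunkSize = collSize / DEFAULT_NUM_CHUNKS;
        return Math.max(MIN_CHUNK_SIZE, Math.min(MAX_CHUNK_SIZE, chunkSize));
    }

    public static void main(String[] args) {
        System.out.println(chunkSizeFor(500_000));     // below the lower bound, clamped up
        System.out.println(chunkSizeFor(5_000_000));   // within range, used as is
        System.out.println(chunkSizeFor(50_000_000));  // above the upper bound, clamped down
    }
}
```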

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    FileSystem fs = FileSystem.get(job2);

    String indexPath = getConf().get("Ivory.IndexPath");
    RetrievalEnvironment env = new RetrievalEnvironment(indexPath, fs);
    int blockSize = getConf().getInt("Ivory.BlockSize", 0);
    int numDocs = env.readCollectionDocumentCount();
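    // Note: blockSize falls back to 0 when Ivory.BlockSize is unset, in which case this division throws ArithmeticException.
    int numBlocks = numDocs / blockSize + 1;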

    String inputPath = null;
    for (int i = 0; i < numBlocks; i++) {
      inputPath = conf.get("Ivory.PCPOutputPath") + "/block" + i; // one block of output of PCP
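The block count in the excerpt above is computed as `numDocs / blockSize + 1`, which yields one extra block whenever `numDocs` is an exact multiple of `blockSize` and fails when `blockSize` is 0. Whether the extra block is intentional is not visible from the excerpt. A minimal sketch of a guarded ceiling division, offered as an alternative under that caveat; the class and method names are hypothetical:

```java
public final class BlockCount {
    // Ceiling division: the smallest number of blocks of size blockSize
    // that can hold numDocs documents. Uses long arithmetic to avoid overflow.
    static int numBlocks(int numDocs, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive, got " + blockSize);
        }
        return (int) (((long) numDocs + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        System.out.println(numBlocks(10, 5)); // exact multiple
        System.out.println(numBlocks(11, 5)); // one partially filled block
    }
}
```

Note that this changes behavior relative to the original expression for exact multiples, so it is only appropriate if the consumer of the blocks does not expect the trailing extra block.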

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

      mDLTable = new DocLengthTable4B(env.getDoclengthsData(), fs);
    } catch (IOException e1) {
      throw new RuntimeException("Error initializing Doclengths file", e1);
    }
    LOG.info(mDLTable.getAvgDocLength()+" is average source-language document length.");
    LOG.info(targetEnv.readCollectionDocumentCount()+" is number of target-language docs. We use the target-side DF table so we set #docs to this value in our scoring model.");

    /////// Configuration setup

    conf.set(Constants.IndexPath, indexPath);
    conf.set("Ivory.ScoringModel", scoringModel);

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    /////// Configuration setup

    conf.set(Constants.IndexPath, indexPath);
    conf.set("Ivory.ScoringModel", scoringModel);
    conf.setFloat("Ivory.AvgDocLen", mDLTable.getAvgDocLength());
    conf.setInt(Constants.CollectionDocumentCount, targetEnv.readCollectionDocumentCount());
    conf.set(Constants.Language, getConf().get("Ivory.Lang"));
    conf.set("Ivory.Normalize", getConf().get("Ivory.Normalize"));
    conf.set("Ivory.MinNumTerms", getConf().get("Ivory.MinNumTerms"));

    conf.setNumMapTasks(300);     

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    String collectionName = env.readCollectionName();

    int reduceTasks = conf.getInt(Constants.NumReduceTasks, 0);
    int minSplitSize = conf.getInt(Constants.MinSplitSize, 0);
    int collectionDocCnt = env.readCollectionDocumentCount();
    //int maxHeap = conf.getInt(Constants.MaxHeap, 2048);

    String postingsType = conf.get(Constants.PostingsListsType,
        PostingsListDocSortedPositional.class.getCanonicalName());
    @SuppressWarnings("unchecked")

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    config.set("Ivory.CollectionName", targetEnv.readCollectionName()+"_"+srcEnv.readCollectionName());
    config.set("Ivory.IndexPath", targetLangDir);

    // collection size is the sum of the two collections' sizes
    int srcCollSize = srcEnv.readCollectionDocumentCount();
    int collSize = targetEnv.readCollectionDocumentCount()+srcCollSize;
    config.setInt("Ivory.CollectionDocumentCount", collSize);

    ///////Parameters/////////////
//    numOfBits = Integer.parseInt(args[2]);

Examples of ivory.core.RetrievalEnvironment.readCollectionDocumentCount()

    fScoreFn = (ScoringModel) new Bm25();
    fScoreFn.setAvgDocLength(lang2AvgSentLen.get(fLang));        

    // the DF table comes from the English side, so the collection doc count must also be read from the English dir
    RetrievalEnvironment eEnv = new RetrievalEnvironment(eDir, fs);
    fScoreFn.setDocCount(eEnv.readCollectionDocumentCount());  

    classifier = new MoreGenericModelReader(pathMapping.get(modelFileName), localFs).constructModel();
  }

  private void loadEModels(JobConf conf) throws Exception {
Copyright © 2018 www.massapi.com. All rights reserved.
All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.