Examples of TokenStream


Examples of org.apache.lucene.analysis.TokenStream

    return mmsegTokenizer;
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new MMSegTokenizer(newSeg(), reader);
    return ts;
  }

Examples of org.apache.lucene.analysis.TokenStream

        String line = null;
        long start = System.currentTimeMillis();
        while((line = reader.readLine()) != null) {
          bw.append("--------------------------").append("\r\n");
          bw.append(line).append("\r\n");
          TokenStream ts = analyzer.tokenStream("text", new StringReader(line));
          for (Token t = new Token(); (t = TokenUtils.nextToken(ts, t)) != null;) {
            bw.append(new String(t.termBuffer(), 0, t.termLength())).append(" | ");
          }
          bw.append("\r\n");
        }
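
The loop above reuses a single Token instance and stops when the stream returns null. As a rough, library-free illustration of that consumption pattern, here is a minimal sketch in plain Java. `SimpleToken`, `WhitespaceStream`, and `TokenLoopSketch` are hypothetical stand-ins invented for this example; they are not Lucene or mmseg4j classes, and the whitespace splitting is only a placeholder for a real tokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reusable token (assumption: not Lucene's Token class).
final class SimpleToken {
    final StringBuilder text = new StringBuilder();
}

// Hypothetical stream that yields whitespace-separated terms from a Reader.
final class WhitespaceStream {
    private final Reader input;

    WhitespaceStream(Reader input) {
        this.input = input;
    }

    // Fills and returns the supplied token, or returns null at end of input.
    SimpleToken next(SimpleToken reusable) throws IOException {
        reusable.text.setLength(0);
        int c;
        // Skip leading whitespace.
        while ((c = input.read()) != -1 && Character.isWhitespace(c)) { }
        if (c == -1) {
            return null;
        }
        // Collect characters up to the next whitespace or end of input.
        do {
            reusable.text.append((char) c);
        } while ((c = input.read()) != -1 && !Character.isWhitespace(c));
        return reusable;
    }

    void close() throws IOException {
        input.close();
    }
}

final class TokenLoopSketch {
    // Drains the stream with one reused token and copies each term out.
    static List<String> collect(String line) {
        List<String> terms = new ArrayList<>();
        WhitespaceStream ts = new WhitespaceStream(new StringReader(line));
        try {
            for (SimpleToken t = new SimpleToken(); (t = ts.next(t)) != null;) {
                // Copy the text now: the token is overwritten on the next call.
                terms.add(t.text.toString());
            }
            ts.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", collect("the fox did not jump")));
    }
}
```

Because the token object is recycled, any term that must outlive the current iteration has to be copied, which is why the sketch calls `toString()` inside the loop.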

Examples of org.apache.lucene.analysis.TokenStream

    for(int i=0; i<n; i++) {
      for(File txt : txts) {
        FileInputStream ftxt = new FileInputStream(txt);
        int s = ftxt.available();
        size += s;
        TokenStream ts = analyzer.tokenStream("text", new InputStreamReader(ftxt));
        OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(new File(txt.getAbsoluteFile()+"."+outputChipName+".word")));
        BufferedWriter bw = new BufferedWriter(osw);
        long start = System.currentTimeMillis();
        for (Token t = new Token(); (t = TokenUtils.nextToken(ts, t)) != null;) {
          bw.append(t.term()).append("\r\n");

Examples of org.apache.lucene.analysis.TokenStream

  public void testChineseAnalyzer() throws IOException {
    Token nt = new Token();
    Analyzer ca = new SmartChineseAnalyzer(true);
    Reader sentence = new StringReader("我购买了道具和服装。");
    String[] result = { "我", "购买", "了", "道具", "和", "服装" };
    TokenStream ts = ca.tokenStream("sentence", sentence);
    int i = 0;
    nt = ts.next(nt);
    while (nt != null) {
      assertEquals(result[i], nt.term());
      i++;
      nt = ts.next(nt);
    }
    ts.close();
  }

Examples of org.apache.lucene.analysis.TokenStream

        "我从小就不由自主地认为自己长大以后一定得成为一个象我父亲一样的画家, 可能是父母潜移默化的影响。其实我根本不知道作为画家意味着什么,我是否喜欢,最重要的是否适合我,我是否有这个才华。其实人到中年的我还是不确定我最喜欢什么,最想做的是什么?我相信很多人和我一样有同样的烦恼。毕竟不是每个人都能成为作文里的宇航员,科学家和大教授。知道自己适合做什么,喜欢做什么,能做好什么其实是个非常困难的问题。"
            + "幸运的是,我想我的孩子不会为这个太过烦恼。通过老大,我慢慢发现美国高中的一个重要功能就是帮助学生分析他们的专长和兴趣,从而帮助他们选择大学的专业和未来的职业。我觉得帮助一个未成形的孩子找到她未来成长的方向是个非常重要的过程。"
            + "美国高中都有专门的职业顾问,通过接触不同的课程,和各种心理,个性,兴趣很多方面的问答来帮助每个学生找到最感兴趣的专业。这样的教育一般是要到高年级才开始, 可老大因为今年上计算机的课程就是研究一个职业走向的软件项目,所以她提前做了这些考试和面试。看来以后这样的教育会慢慢由电脑来测试了。老大带回家了一些试卷,我挑出一些给大家看看。这门课她花了2个多月才做完,这里只是很小的一部分。"
            + "在测试里有这样的一些问题:"
            + "你是个喜欢动手的人吗? 你喜欢修东西吗?你喜欢体育运动吗?你喜欢在室外工作吗?你是个喜欢思考的人吗?你喜欢数学和科学课吗?你喜欢一个人工作吗?你对自己的智力自信吗?你的创造能力很强吗?你喜欢艺术,音乐和戏剧吗?  你喜欢自由自在的工作环境吗?你喜欢尝试新的东西吗? 你喜欢帮助别人吗?你喜欢教别人吗?你喜欢和机器和工具打交道吗?你喜欢当领导吗?你喜欢组织活动吗?你什么和数字打交道吗?");
    TokenStream ts = ca.tokenStream("sentence", sentence);

    System.out.println("start: " + (new Date()));
    long before = System.currentTimeMillis();
    nt = ts.next(nt);
    while (nt != null) {
      System.out.println(nt.term());
      nt = ts.next(nt);
    }
    ts.close();
    long now = System.currentTimeMillis();
    System.out.println("time: " + (now - before) / 1000.0 + " s");
  }

Examples of org.apache.lucene.analysis.TokenStream

    this.stopWords = stopWords;
    wordSegment = new WordSegmenter();
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new SentenceTokenizer(reader);
    result = new WordTokenizer(result, wordSegment);
    // result = new LowerCaseFilter(result);
    // LowerCaseFilter is no longer needed, because SegTokenFilter already converts all English characters to lowercase.
    // Stemming is too strict; this is not a bug, it is a feature :)
    result = new PorterStemFilter(result);
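
The method above builds its result by wrapping a tokenizer in successive stages and reassigning `result` each time. A minimal plain-Java sketch of that wrapping pattern follows. `TermStream`, `SplitSource`, `LowerCaseStage`, `StopStage`, and `FilterChainSketch` are hypothetical names made up for illustration, and the stages shown (whitespace split, lowercasing, stop-word removal) are assumptions that do not reproduce the sentence, word-segmentation, or stemming stages of the original analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical minimal stream contract: next() returns null when exhausted.
interface TermStream {
    String next();
}

// Source stage: splits the input on runs of whitespace.
final class SplitSource implements TermStream {
    private final Iterator<String> terms;

    SplitSource(String text) {
        String trimmed = text.trim();
        List<String> parts = trimmed.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(trimmed.split("\\s+"));
        this.terms = parts.iterator();
    }

    public String next() {
        return terms.hasNext() ? terms.next() : null;
    }
}

// Filter stage: lowercases every term from the wrapped stream.
final class LowerCaseStage implements TermStream {
    private final TermStream in;

    LowerCaseStage(TermStream in) {
        this.in = in;
    }

    public String next() {
        String term = in.next();
        return term == null ? null : term.toLowerCase(Locale.ROOT);
    }
}

// Filter stage: skips terms that appear in the stop-word set.
final class StopStage implements TermStream {
    private final TermStream in;
    private final Set<String> stopWords;

    StopStage(TermStream in, Set<String> stopWords) {
        this.in = in;
        this.stopWords = stopWords;
    }

    public String next() {
        String term;
        // Pull terms until one is not a stop word or the input ends.
        while ((term = in.next()) != null && stopWords.contains(term)) { }
        return term;
    }
}

final class FilterChainSketch {
    // Builds the chain source -> lowercase -> stop filter, then drains it.
    static List<String> analyze(String text, Set<String> stopWords) {
        TermStream result = new SplitSource(text);
        result = new LowerCaseStage(result);
        result = new StopStage(result, stopWords);
        List<String> out = new ArrayList<>();
        for (String term; (term = result.next()) != null;) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "not"));
        System.out.println(analyze("The Fox did NOT jump", stops));
    }
}
```

Stage order matters in a chain like this: the stop filter only matches mixed-case input here because lowercasing runs before it.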

Examples of org.apache.lucene.analysis.TokenStream

   *         {@link org.apache.lucene.document.Document}
   * @throws IOException if there was an error loading
   */
  public static TokenStream getAnyTokenStream(IndexReader reader, int docId,
      String field, Document doc, Analyzer analyzer) throws IOException {
    TokenStream ts = null;

    TermFreqVector tfv = reader.getTermFreqVector(docId, field);
    if (tfv != null) {
      if (tfv instanceof TermPositionVector) {
        ts = getTokenStream((TermPositionVector) tfv);

Examples of org.apache.lucene.analysis.TokenStream

   * @return null if field not stored correctly
   * @throws IOException
   */
  public static TokenStream getAnyTokenStream(IndexReader reader, int docId,
      String field, Analyzer analyzer) throws IOException {
    TokenStream ts = null;

    TermFreqVector tfv = reader.getTermFreqVector(docId, field);
    if (tfv != null) {
      if (tfv instanceof TermPositionVector) {
        ts = getTokenStream((TermPositionVector) tfv);

Examples of org.apache.lucene.analysis.TokenStream

        TopDocs hits = indexSearcher.search(query, 1);
        assertEquals(1, hits.totalHits);
        final Highlighter highlighter = new Highlighter(
            new SimpleHTMLFormatter(), new SimpleHTMLEncoder(),
            new QueryScorer(query));
        final TokenStream tokenStream = TokenSources
            .getTokenStream(
                (TermPositionVector) indexReader.getTermFreqVector(0, FIELD),
                false);
        assertEquals("<B>the fox</B> did not jump",
            highlighter.getBestFragment(tokenStream, TEXT));
