Examples of HtmlParser

appl.Portal.Utils.LinkSearch.HtmlParser
br.com.caelum.tubaina.parser.html.HtmlParser
br.com.caelum.tubaina.parser.html.desktop.HtmlParser
cn.edu.hfut.dmic.webcollector.parser.HtmlParser
Default web page parser. @author hu
com.flaptor.util.parser.HtmlParser
com.google.dart.engine.html.parser.HtmlParser
Instances of the class {@code HtmlParser} are used to parse tokens into an AST structure comprised of {@link XmlNode}s. @coverage dart.engine.html
com.google.gwt.thirdparty.streamhtmlparser.HtmlParser
com.salas.bb.utils.htmlparser.HtmlParser
Simplified and fast parser of HTML that detects text, tags and entities separately.
com.scraper.parser.HTMLParser
com.substanceofcode.utils.HTMLParser
Simple and lightweight HTML parser without complete error handling. @author Irving Bunton
de.mhus.lib.parser.HtmlParser
@author hummel
de.spotnik.util.html.HTMLParser
HTMLParser. @author Jens Rehpöhler @since 26.08.2006
edu.stanford.nlp.web.HTMLParser
Parses an HTML document and returns the plain text (and title). The main thing that HTMLParser is used for is the parse(String url) method, which will return a String with the contents of an HTML page, without the tags. After calling parse, you can get the HTML title (contents of the TITLE tag) by calling title(). Subclasses may override the handleText(), handleComment(), handleStartTag(), etc. methods so that parse(String url) returns something other than the text of the web page. (For example, one may be interested in returning only part of the text, or only the links.) @author Sepandar Kamvar (sdkamvar@stanford.edu)
nu.validator.htmlparser.sax.HtmlParser
This class implements an HTML5 parser that exposes data through the SAX2 interface.
By default, when using the constructor without arguments, this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This corresponds to ALTER_INFOSET as the general XML violation policy. To make the parser support non-conforming HTML fully per the HTML 5 spec while on the other hand potentially violating the SAX2 API contract, set the general XML violation policy to ALLOW. It is possible to treat XML 1.0 infoset violations as fatal by setting the general XML violation policy to FATAL.
By default, this parser doesn't do true streaming but buffers everything first. The parser can be made truly streaming by calling setStreamabilityViolationPolicy(XmlViolationPolicy.FATAL). This has the consequence that errors that require non-streamable recovery are treated as fatal.
By default, in order to make the parse events emulate the parse events for a DTDless XML document, the parser does not report the doctype through LexicalHandler. Doctype reporting through LexicalHandler can be turned on by calling setReportingDoctype(true). @version $Id$ @author hsivonen
org.ajax4jsf.webapp.HtmlParser
org.apache.droids.parse.html.HtmlParser
@version 1.0
org.apache.jmeter.protocol.http.parser.HTMLParser
HtmlParsers can parse HTML content to obtain URLs.
org.apache.lenya.lucene.html.HTMLParser
HTML Parser
org.apache.lenya.lucene.parser.HTMLParser
org.apache.lucene.demo.html.HTMLParser
org.apache.nutch.parse.html.HtmlParser
org.apache.stanbol.enhancer.engines.htmlextractor.impl.HtmlParser
HtmlParser.java @author Walter Kasper
org.apache.tika.parser.html.HtmlParser
HTML parser. Uses TagSoup to turn the input document to HTML SAX events, and post-processes the events to produce XHTML and metadata expected by Tika clients.
org.jasen.interfaces.HTMLParser
Parses the HTML part of a message. @author Jason Polites
org.lobobrowser.html.parser.HtmlParser
rabbit.html.HtmlParser
This is a class that is used to parse a block of HTML code into separate tokens. This parser uses a recursive descent approach. @author Robert Olofsson
railo.runtime.search.lucene2.html.HTMLParser
saveReddit.parser.htmlParser
uk.ac.ucl.panda.utility.parser.HTMLParser
HTML parsing interface for test purposes
vmcreative.htmlparser.HTMLParser
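
Several of the descriptions above mention callback-style parsing: overriding handleText(), handleStartTag() and similar methods to collect plain text, the page title, or only the links. The same method names exist on the JDK's own HTMLEditorKit.ParserCallback, so the idea can be sketched without any of the listed libraries. This is a minimal illustration, not the code of any class in the list; the class name PlainTextCallback and its accessor methods are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects body text, the TITLE contents and A/HREF values
// using the HTML parser that ships with the JDK (java.desktop module).
class PlainTextCallback extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();
    private final StringBuilder title = new StringBuilder();
    private final List<String> links = new ArrayList<>();
    private boolean inTitle;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        } else if (tag == HTML.Tag.A) {
            // Read the attribute immediately; the parser may reuse the set later.
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title.append(data);
        } else {
            // Separate text runs with a space so adjacent elements do not fuse.
            text.append(data).append(' ');
        }
    }

    String text() { return text.toString().trim(); }
    String title() { return title.toString().trim(); }
    List<String> links() { return links; }

    // Parses an in-memory HTML string and returns the populated callback.
    static PlainTextCallback parse(String html) {
        PlainTextCallback callback = new PlainTextCallback();
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        PlainTextCallback result = parse(
            "<html><head><title>Demo</title></head>"
            + "<body><p>Hello <a href=\"http://example.com/a\">link</a></p></body></html>");
        System.out.println("title: " + result.title());
        System.out.println("text:  " + result.text());
        System.out.println("links: " + result.links());
    }
}
```

Returning only the links, or only part of the text, is then a matter of which callback methods are overridden and what they accumulate.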

Examples of org.apache.tika.parser.html.HtmlParser

      Multipart mp = (Multipart) p.getContent();
      int count = mp.getCount();
      for (int i = 0; i < count; i++)
        content.append(getContentFromHTML(mp.getBodyPart(i)));
    } else if (p.isMimeType("text/html")) {
      HtmlParser parser = new HtmlParser();
      Metadata met = new Metadata();
      TextContentHandler handler = new TextContentHandler(
          new BodyContentHandler());
      parser.parse(new ByteArrayInputStream(((String) p.getContent())
          .getBytes()), handler, met);
      content.append(handler.toString());
    } else {
      Object obj = p.getContent();
      if (obj instanceof Part)

Examples of org.apache.tika.parser.html.HtmlParser

        StringTokenizer tokenizer = new StringTokenizer(classes, ", \t\n\r\f");
        while (tokenizer.hasMoreTokens()) {
            String name = tokenizer.nextToken();
            if (name.equals(
                    "org.apache.jackrabbit.extractor.HTMLTextExtractor")) {
                parsers.put("text/html", new HtmlParser());
            } else if (name.equals(
                    "org.apache.jackrabbit.extractor.MsExcelTextExtractor")) {
                Parser parser = new OfficeParser();
                parsers.put("application/vnd.ms-excel", parser);
                parsers.put("application/msexcel", parser);

Examples of org.jasen.interfaces.HTMLParser

        String[] tokens = null;
        ParserData data = null;

        int counter = 1;

        HTMLParser htmlParser = null;

        System.out.println ("Scanning " + files.length + " files");
        for (int i = 0; i < files.length; i++)
        {
            try
            {
                htmlParser = (HTMLParser)htmlParserClass.newInstance();

                mm = getMimeMessage(files[i]);
                message = mimeParser.parse(mm);
                data = htmlParser.parse(mm, message, tokenizer);

                if(learn(data, type)) {
                    count++;
                }

Examples of org.lobobrowser.html.parser.HtmlParser

        
        Reader reader = new InputStreamReader(in);
        Document document = builder.newDocument();
        
        try {
            HtmlParser parser = new HtmlParser(new SimpleUserAgentContext(), document);
            parser.parse(reader);
        } catch (Exception e) {
            logger.error(e, e);
        }


        in.close();

Examples of org.lobobrowser.html.parser.HtmlParser

          {
             
              Document document = builder.newDocument();
              
              // Here is where we use Cobra's HTML parser.            
              HtmlParser parser = new HtmlParser(uacontext, document);
              
              parser.parse(bin);
              
              
              
              /*
               *

Examples of rabbit.html.HtmlParser

  long size = f.length ();
  FileInputStream fis = new FileInputStream (f);
  DataInputStream dis = new DataInputStream (fis);
  byte[] buf = new byte[(int)size];
  dis.readFully (buf);
  HtmlParser parser = new HtmlParser ();
  parser.setText (buf);
  HtmlBlock block = parser.parse ();
  for (Token t : block.getTokens ()) {
      System.out.print ("t.type: " + t.getType ());
      if (t.getType () == TokenType.TAG)
          System.out.print (", tag: " + t.getTag ().getType ());
      System.out.println ();
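
The excerpt above walks a list of tokens and distinguishes tags from other content. As a rough illustration of what splitting HTML into such tokens involves, here is a minimal sketch of a linear scanner. It is not rabbit's implementation (which is described as recursive descent), and the names HtmlTokenSketch, Kind and Token are hypothetical. It also assumes that no attribute value contains a `>` character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer: splits markup into TAG, COMMENT and TEXT runs.
class HtmlTokenSketch {
    enum Kind { TEXT, TAG, COMMENT }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "[" + value + "]";
        }
    }

    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.startsWith("<!--", i)) {
                // Comment: runs to the closing "-->" or to the end of input.
                int end = html.indexOf("-->", i + 4);
                int stop = end < 0 ? html.length() : end + 3;
                tokens.add(new Token(Kind.COMMENT, html.substring(i, stop)));
                i = stop;
            } else if (html.charAt(i) == '<') {
                int end = html.indexOf('>', i + 1);
                if (end < 0) {
                    // Unterminated tag: keep the remainder as text and stop.
                    tokens.add(new Token(Kind.TEXT, html.substring(i)));
                    break;
                }
                tokens.add(new Token(Kind.TAG, html.substring(i, end + 1)));
                i = end + 1;
            } else {
                // Text: runs to the next '<' or to the end of input.
                int next = html.indexOf('<', i);
                int stop = next < 0 ? html.length() : next;
                tokens.add(new Token(Kind.TEXT, html.substring(i, stop)));
                i = stop;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Hi <b>there</b></p><!-- done -->")) {
            System.out.println(t);
        }
    }
}
```

A production tokenizer would additionally need to handle quoted attribute values, entities, and script or style content.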

Examples of rabbit.html.HtmlParser

      response.removeHeader ("Content-Length");
      /* Not sure why we would need this, used to be in rabbit/2.x
      if (!con.getChunking ())
          con.setKeepalive (false);
      */
      parser = new HtmlParser ();
      filters = initFilters ();
  }
    }

Examples of railo.runtime.search.lucene2.html.HTMLParser

  
  public static Document getDocument(Resource res,String charset)  {
    Document doc = new Document();
    doc.add(FieldUtil.Text("uid", uid(res), false));
    
    HTMLParser parser = new HTMLParser();
    try {
      parser.parse(res,charset);
    } 
    catch (Throwable t) {
        return doc;
    }
    addContent(doc,parser);

Examples of saveReddit.parser.htmlParser

  Integer totalImages;
  
  public Sorter(MainWindow pMW, fileIO pFileIO, jsonParser pJsonParser, LinkedList<JSONObject> pContent) {
    mw = pMW;
    fileIO = pFileIO;
    htmlParser = new htmlParser(mw);
    jsonParser = pJsonParser;
    
    content = pContent;
    contentSelf = new LinkedList<JSONObject>();
    contentImages = new LinkedList<JSONObject>();

Examples of uk.ac.ucl.panda.utility.parser.HTMLParser


       appProp.setProperty("doc.maker.forever", "false");
    
       Config config = new Config(appProp);
       docMaker.setConfig(config); 
       HTMLParser htmlParser = (HTMLParser) Class.forName(config.get("html.parser","uk.ac.ucl.panda.applications.demo.DemoHTMLParser")).newInstance();
       docMaker.setHTMLParser(htmlParser);
       
       IndexWriter writer = new IndexWriter(indexDir, 
          new PorterStemAnalyzer(), true);
      writer.setUseCompoundFile(false);
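
The excerpt above picks its parser implementation at run time from a configured class name via Class.forName(...).newInstance(). Class.newInstance() has been deprecated since Java 9, and the usual replacement goes through getDeclaredConstructor(). A minimal sketch of that pattern follows. The interface HtmlParserLike and the class TagStrippingParser are hypothetical stand-ins, not types from the project shown above.

```java
// Hypothetical sketch of loading a parser implementation by class name.
class ParserLoaderSketch {
    /** Hypothetical stand-in for a project's parser interface. */
    interface HtmlParserLike {
        String parse(String html);
    }

    /** Hypothetical default implementation that strips tags naively. */
    static class TagStrippingParser implements HtmlParserLike {
        public TagStrippingParser() {
        }

        @Override
        public String parse(String html) {
            return html.replaceAll("<[^>]*>", "");
        }
    }

    // Instantiates the named class through its no-argument constructor.
    static HtmlParserLike load(String className) {
        try {
            // asSubclass fails fast with ClassCastException on a wrong type,
            // instead of deferring the failure to a later cast.
            Class<? extends HtmlParserLike> type =
                Class.forName(className).asSubclass(HtmlParserLike.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create parser " + className, e);
        }
    }

    public static void main(String[] args) {
        // In real code the class name would come from configuration.
        String name = TagStrippingParser.class.getName();
        HtmlParserLike parser = load(name);
        System.out.println(parser.parse("<b>hi</b> there"));
    }
}
```

Unlike Class.newInstance(), this form wraps any exception thrown by the constructor in an InvocationTargetException, which the sketch converts into an unchecked exception.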

All source code is the property of its respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.