Examples of org.apache.xml.serialize.BaseMarkupSerializer

org.apache.xml.serialize.BaseMarkupSerializer
Base class for a serializer supporting both DOM and SAX pretty serializing of XML/HTML/XHTML documents. Derives classes perform the method-specific serializing, this class provides the common serializing mechanisms.
The serializer must be initialized with the proper writer and output format before it can be used by calling {@link #setOutputCharStream} or {@link #setOutputByteStream} for the writer and {@link #setOutputFormat}for the output format.
The serializer can be reused any number of times, but cannot be used concurrently by two threads.
If an output stream is used, the encoding is taken from the output format (defaults to UTF-8). If a writer is used, make sure the writer uses the same encoding (if applies) as specified in the output format.
The serializer supports both DOM and SAX. DOM serializing is done by calling {@link #serialize(Document)} and SAX serializing is done by firingSAX events and using the serializer as a document handler. This also applies to derived class.
If an I/O exception occurs while serializing, the serializer will not throw an exception directly, but only throw it at the end of serializing (either DOM or SAX's {@link org.xml.sax.DocumentHandler#endDocument}.
For elements that are not specified as whitespace preserving, the serializer will potentially break long text lines at space boundaries, indent lines, and serialize elements on separate lines. Line terminators will be regarded as spaces, and spaces at beginning of line will be stripped.
When indenting, the serializer is capable of detecting seemingly element content, and serializing these elements indented on separate lines. An element is serialized indented when it is the first or last child of an element, or immediate following or preceding another element. @version $Revision: 1.46 $ $Date: 2003/07/30 11:14:08 $ @author Assaf Arkin @author Rahul Srivastava @author Elena Litani, IBM @see Serializer @see DOMSerializer

        String xpathExpr = params.get(ExtractingParams.XPATH_EXPRESSION);
        boolean extractOnly = params.getBool(ExtractingParams.EXTRACT_ONLY, false);
        ContentHandler parsingHandler = handler;


        StringWriter writer = null;
        BaseMarkupSerializer serializer = null;
        if (extractOnly == true) {
          String extractFormat = params.get(ExtractingParams.EXTRACT_FORMAT, "xml");
          writer = new StringWriter();
          if (extractFormat.equals(TEXT_FORMAT)) {
            serializer = new TextSerializer();
            serializer.setOutputCharStream(writer);
            serializer.setOutputFormat(new OutputFormat("Text", "UTF-8", true));
          } else {
            serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
          }
          if (xpathExpr != null) {
            Matcher matcher =
                    PARSER.parse(xpathExpr);
            serializer.startDocument();//The MatchingContentHandler does not invoke startDocument.  See http://tika.markmail.org/message/kknu3hw7argwiqin
            parsingHandler = new MatchingContentHandler(serializer, matcher);
          } else {
            parsingHandler = serializer;
          }
        } else if (xpathExpr != null) {
          Matcher matcher =
                  PARSER.parse(xpathExpr);
          parsingHandler = new MatchingContentHandler(handler, matcher);
        } //else leave it as is


        try{
          //potentially use a wrapper handler for parsing, but we still need the SolrContentHandler for getting the document.
          ParseContext context = new ParseContext();//TODO: should we design a way to pass in parse context?
          parser.parse(inputStream, parsingHandler, metadata, context);
        } catch (TikaException e) {
          if(ignoreTikaException)
            log.warn(new StringBuilder("skip extracting text due to ").append(e.getLocalizedMessage())
                .append(". metadata=").append(metadata.toString()).toString());
          else
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
        }
        if (extractOnly == false) {
          addDoc(handler);
        } else {
          //serializer is not null, so we need to call endDoc on it if using xpath
          if (xpathExpr != null){
            serializer.endDocument();
          }
          rsp.add(stream.getName(), writer.toString());
          writer.close();
          String[] names = metadata.names();
          NamedList metadataNL = new NamedList();

View Full Code Here

        String xpathExpr = params.get(ExtractingParams.XPATH_EXPRESSION);
        boolean extractOnly = params.getBool(ExtractingParams.EXTRACT_ONLY, false);
        ContentHandler parsingHandler = handler;


        StringWriter writer = null;
        BaseMarkupSerializer serializer = null;
        if (extractOnly == true) {
          String extractFormat = params.get(ExtractingParams.EXTRACT_FORMAT, "xml");
          writer = new StringWriter();
          if (extractFormat.equals(TEXT_FORMAT)) {
            serializer = new TextSerializer();
            serializer.setOutputCharStream(writer);
            serializer.setOutputFormat(new OutputFormat("Text", "UTF-8", true));
          } else {
            serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
          }
          if (xpathExpr != null) {
            Matcher matcher =
                    PARSER.parse(xpathExpr);
            serializer.startDocument();//The MatchingContentHandler does not invoke startDocument.  See http://tika.markmail.org/message/kknu3hw7argwiqin
            parsingHandler = new MatchingContentHandler(serializer, matcher);
          } else {
            parsingHandler = serializer;
          }
        } else if (xpathExpr != null) {
          Matcher matcher =
                  PARSER.parse(xpathExpr);
          parsingHandler = new MatchingContentHandler(handler, matcher);
        } //else leave it as is


        //potentially use a wrapper handler for parsing, but we still need the SolrContentHandler for getting the document.
        parser.parse(inputStream, parsingHandler, metadata);
        if (extractOnly == false) {
          addDoc(handler);
        } else {
          //serializer is not null, so we need to call endDoc on it if using xpath
          if (xpathExpr != null){
            serializer.endDocument();
          }
          rsp.add(stream.getName(), writer.toString());
          writer.close();
          String[] names = metadata.names();
          NamedList metadataNL = new NamedList();

View Full Code Here

TOP

Related Classes of org.apache.xml.serialize.BaseMarkupSerializer

org.apache.solr.handler.extraction.ExtractingDocumentLoader

org.apache.xerces.dom.DOMLocatorImpl

java.lang.reflect.Method

java.io.OutputStreamWriter

org.w3c.dom.ls.LSException

org.w3c.dom.Entity

org.w3c.dom.DOMImplementation

org.w3c.dom.Notation

org.w3c.dom.NamedNodeMap

org.w3c.dom.Node

All source code are property of their respective owners. Java is a trademark of Sun Microsystems, Inc and owned by ORACLE Inc. Contact coftware#gmail.com.