Examples of com.ibm.icu.text.CharsetMatch

com.ibm.icu.text.CharsetMatch
This class represents a charset that a CharsetDetector has identified as a possible encoding for a set of input data. From an instance of this class, you can ask for the confidence level of the charset identification, or for a Java Reader or String that gives access to the original byte data in Unicode form.
Instances of this class are created only by CharsetDetector.
Note: this class has a natural ordering that is inconsistent with equals. compareTo() orders matches by their confidence value only, so two different matches with the same confidence compare as equal even though equals() returns false. Stable since ICU 3.4.
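A minimal sketch of the API described above, assuming the ICU4J jar is on the classpath. detectAll() returns every candidate CharsetMatch sorted by confidence (highest first), detect() returns the top candidate or null, and each match reports getName() and getConfidence() (0 to 100). The class and helper names (CharsetMatchDemo, utf8WithBom, bestName, bestConfidence) are hypothetical.

```java
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

import java.nio.charset.StandardCharsets;

public class CharsetMatchDemo {

    /** Hypothetical helper: encodes text as UTF-8 and prefixes a UTF-8 byte order mark. */
    public static byte[] utf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[body.length + 3];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(body, 0, data, 3, body.length);
        return data;
    }

    /** Name of the best match, or null when detect() finds nothing. */
    public static String bestName(byte[] data) {
        CharsetMatch best = new CharsetDetector().setText(data).detect();
        return best == null ? null : best.getName();
    }

    /** Confidence (0-100) of the best match, or 0 when detect() finds nothing. */
    public static int bestConfidence(byte[] data) {
        CharsetMatch best = new CharsetDetector().setText(data).detect();
        return best == null ? 0 : best.getConfidence();
    }

    public static void main(String[] args) {
        byte[] data = utf8WithBom("na\u00efve caf\u00e9");
        CharsetDetector detector = new CharsetDetector();
        detector.setText(data);
        // detectAll() lists every candidate, highest confidence first.
        for (CharsetMatch m : detector.detectAll()) {
            System.out.println(m.getName() + " (confidence " + m.getConfidence() + ")");
        }
    }
}
```

With a byte order mark and no invalid UTF-8 sequences, ICU4J's UTF-8 recognizer reports confidence 100, so UTF-8 should come out as the top match for this input.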

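      if (usedDecoder == null) {
        // Detect the charset of the bytes buffered so far.
        CharsetDetector detector = new CharsetDetector();
        // When enabled, text inside angle brackets (markup) is removed before detection.
        detector.enableInputFilter(filtered);
        byte[] data = buffer.toByteArray();
        detector.setText(data);
        // detect() returns the best match, or null if no charset matched.
        CharsetMatch cm = detector.detect();
        try {
          usedDecoder = Charset.forName(cm == null ? "ISO-8859-1" : cm.getName()).newDecoder();
        } catch (UnsupportedCharsetException ex) {
          // The JVM does not support the detected charset: fall back to ISO-8859-1.
          usedDecoder = Charset.forName("ISO-8859-1").newDecoder();
        }
        usedDecoder.onUnmappableCharacter(unmappableCharacterAction());
        usedDecoder.onMalformedInput(malformedInputAction());
        // ... (excerpt truncated)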

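The fallback logic in the excerpt above can be isolated into a small JDK-only sketch. DecoderFallback and decoderFor are hypothetical names; the detected name is passed in as a plain String (null standing in for "detect() found no match"), and CodingErrorAction.REPLACE is an assumed policy, since the original delegates to its own unmappableCharacterAction() and malformedInputAction() methods. Catching IllegalArgumentException also covers IllegalCharsetNameException, which the excerpt does not catch.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderFallback {

    /**
     * Hypothetical helper: turns a detected charset name (possibly null) into a
     * CharsetDecoder, falling back to ISO-8859-1 when the name is missing,
     * malformed, or not supported by the JVM.
     */
    public static CharsetDecoder decoderFor(String detectedName) {
        Charset cs = StandardCharsets.ISO_8859_1;
        if (detectedName != null) {
            try {
                cs = Charset.forName(detectedName);
            } catch (IllegalArgumentException e) {
                // Covers UnsupportedCharsetException and IllegalCharsetNameException.
                cs = StandardCharsets.ISO_8859_1;
            }
        }
        // Assumed error policy: substitute replacement characters instead of failing.
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    public static void main(String[] args) {
        System.out.println(decoderFor(null).charset());                // ISO-8859-1
        System.out.println(decoderFor("UTF-8").charset());             // UTF-8
        System.out.println(decoderFor("x-no-such-charset").charset()); // ISO-8859-1
    }
}
```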

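    public static Reader readerWithCharsetDetect(InputStream is) {
        // setText(InputStream) requires is.markSupported() == true.
        CharsetDetector detector = new CharsetDetector();
        try {
            CharsetMatch match = detector.setText(is).detect();
            // setText() has already returned the stream to its marked position,
            // so this reset is redundant.
            is.reset();
            // detect() returns null when nothing matched; avoid a NullPointerException.
            return new InputStreamReader(is, match == null ? "ISO-8859-1" : match.getName());
        } catch (IOException e) {
            e.printStackTrace();
            try {
                is.reset();
            } catch (IOException e1) {
                // ... (excerpt truncated)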

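A self-contained variant of the reader excerpt above, sketched under the assumption that ICU4J is on the classpath. It wraps streams that lack mark/reset support in a BufferedInputStream, handles a null match, and falls back to ISO-8859-1 when the detected name cannot be opened by the JVM. DetectingReaders, detectingReader and readAll are hypothetical names, and rethrowing as UncheckedIOException is a design choice of this sketch, not of the original code.

```java
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DetectingReaders {

    /** Hypothetical helper: detects the stream's charset and returns a Reader from its start. */
    public static Reader detectingReader(InputStream is) {
        // CharsetDetector.setText(InputStream) requires markSupported() == true.
        InputStream in = is.markSupported() ? is : new BufferedInputStream(is);
        CharsetMatch match;
        try {
            // setText() reads a sample, then resets the stream to where it started.
            match = new CharsetDetector().setText(in).detect();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Charset cs = StandardCharsets.ISO_8859_1;
        if (match != null) {
            try {
                cs = Charset.forName(match.getName());
            } catch (IllegalArgumentException e) {
                // Unsupported or malformed name: keep the ISO-8859-1 fallback.
            }
        }
        return new InputStreamReader(in, cs);
    }

    /** Reads everything from a Reader (helper for the usage example). */
    public static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            for (int n; (n = r.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

For plain ASCII input, any ASCII-compatible guess decodes the text unchanged, so the round trip should return the original string.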
    
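    public Encoding sniff() throws IOException {
        try {
            CharsetDetector detector = new CharsetDetector();
            // The sniffer is itself an InputStream over the bytes to examine.
            detector.setText(this);
            // detect() returns null if no charset matched.
            CharsetMatch match = detector.detect();
            Encoding enc = Encoding.forName(match.getName());
            Encoding actual = enc.getActualHtmlEncoding();
            if (actual != null) {
                enc = actual;
            }
            // Keep the guess only if it is an ASCII superset other than windows-1252.
            if (enc != Encoding.WINDOWS1252 && enc.isAsciiSuperset()) {
                // ... (excerpt truncated)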

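The acceptance test at the end of sniff() can be approximated with JDK charsets alone. This is a hypothetical sketch (SniffFilter, isAsciiSuperset and acceptSniffed are illustrative names): it is not validator.nu's Encoding class, it checks ASCII compatibility only by decoding bytes 0x00 to 0x7F, and it omits the getActualHtmlEncoding() substitution step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SniffFilter {

    /** True if every byte 0x00-0x7F decodes to the same US-ASCII character in cs. */
    public static boolean isAsciiSuperset(Charset cs) {
        byte[] ascii = new byte[128];
        for (int i = 0; i < 128; i++) {
            ascii[i] = (byte) i;
        }
        String expected = new String(ascii, StandardCharsets.US_ASCII);
        return new String(ascii, cs).equals(expected);
    }

    /**
     * Hypothetical filter mirroring the condition in sniff(): keep a detected
     * charset only if it is not windows-1252 and is an ASCII superset.
     * Returns null when the guess should be ignored.
     */
    public static Charset acceptSniffed(String detectedName) {
        if (detectedName == null) {
            return null; // detect() found no match
        }
        Charset cs;
        try {
            cs = Charset.forName(detectedName);
        } catch (IllegalArgumentException e) {
            return null; // unknown or malformed name
        }
        if (cs.name().equals("windows-1252") || !isAsciiSuperset(cs)) {
            return null;
        }
        return cs;
    }
}
```

UTF-16 variants fail the check because they decode pairs of bytes into single characters, so a UTF-16 guess would be discarded.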

Related Classes of com.ibm.icu.text.CharsetMatch

com.ibm.icu.dev.demo.charsetdet.DetectingViewer

com.ibm.icu.dev.test.charsetdet.TestCharsetDetector

net.sf.jmatchparser.util.charset.icu4jchardet.ICU4JChardetCharset$Decoder

net.vidageek.crawler.component.WebDownloader

nu.validator.htmlparser.extra.IcuDetectorSniffer

org.apache.marmotta.platform.core.services.importer.ImportWatchServiceImpl

org.apache.maven.doxia.DefaultConverter

org.apache.shindig.gadgets.encoding.EncodingDetector

org.apache.shindig.gadgets.encoding.EncodingDetector$FallbackEncodingDetector

org.apache.stanbol.enhancer.engines.htmlextractor.impl.CharsetRecognizer
