java.lang.Object
org.mozilla.universalchardet.UniversalDetector

public class UniversalDetector extends Object
  • Field Details

    • SHORTCUT_THRESHOLD

      public static final float SHORTCUT_THRESHOLD
      See Also: Constant Field Values
    • MINIMUM_THRESHOLD

      public static final float MINIMUM_THRESHOLD
      See Also: Constant Field Values
    • inputState

      private UniversalDetector.InputState inputState
    • done

      private boolean done
    • start

      private boolean start
    • gotData

      private boolean gotData
    • onlyPrintableASCII

      private boolean onlyPrintableASCII
    • lastChar

      private byte lastChar
    • detectedCharset

      private String detectedCharset
    • probers

      private CharsetProber[] probers
    • escCharsetProber

      private CharsetProber escCharsetProber
    • listener

      private CharsetListener listener
  • Constructor Details

    • UniversalDetector

      public UniversalDetector()
    • UniversalDetector

      public UniversalDetector(CharsetListener listener)
      Parameters:
      listener - a listener object that is notified of the detected encoding. Can be null.
  • Method Details

    • isDone

      public boolean isDone()
    • getDetectedCharset

      public String getDetectedCharset()
      Returns:
      The detected encoding, or null if the detector could not determine which encoding was used.
    • setListener

      public void setListener(CharsetListener listener)
    • getListener

      public CharsetListener getListener()
    • handleData

      public void handleData(byte[] buf)
      Feeds the detector with more data.
      Parameters:
      buf - The buffer containing the data
    • handleData

      public void handleData(byte[] buf, int offset, int length)
      Feeds the detector with more data.
      Parameters:
      buf - The buffer containing the data
      offset - the index in buf of the first byte to process
      length - the number of bytes to process, starting at offset
    • detectCharsetFromBOM

      public static String detectCharsetFromBOM(byte[] buf)
    • detectCharsetFromBOM

      private static String detectCharsetFromBOM(byte[] buf, int offset)
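The entries above do not say which byte order marks detectCharsetFromBOM recognizes or what it returns when none is present. Purely as an illustration of the technique, here is a JDK-only sketch of BOM sniffing. It is not the library's implementation: the class and method names, the set of marks checked, and returning null when no mark matches are all assumptions, and the real method may recognize more marks or report different names.

```java
public class BomSniffer {

    // Hypothetical helper (not the library's code): returns a charset name if buf
    // starts with one of the byte order marks below, otherwise null.
    // Longer marks are tested first because the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    public static String charsetFromBom(byte[] buf) {
        if (startsWith(buf, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(buf, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(buf, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(buf, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(buf, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the leading bytes of buf (as unsigned values) with the given prefix.
    private static boolean startsWith(byte[] buf, int... prefix) {
        if (buf == null || buf.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((buf[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Note that FF FE 00 00 is ambiguous (it could also be a UTF-16LE mark followed by a NUL character); this sketch resolves it in favor of UTF-32LE.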
    • dataEnd

      public void dataEnd()
      Marks the end of the input data and finishes the detection calculations.
    • reset

      public final void reset()
      Resets the detector so that it can be used again.
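Taken together, handleData, isDone, dataEnd, getDetectedCharset and reset form a feed, finish, read, reuse cycle. The sketch below shows one way to drive that cycle over an InputStream. So that it compiles without juniversalchardet on the classpath, it programs against a hypothetical ChunkDetector interface whose methods mirror the signatures documented on this page; UniversalDetector does not implement that interface, so with the real library you would call the same methods on a UniversalDetector instance directly. The 4096-byte buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;

public class DetectLoop {

    // Hypothetical interface mirroring the documented UniversalDetector methods,
    // used only so this sketch compiles on its own. UniversalDetector does not implement it.
    public interface ChunkDetector {
        void handleData(byte[] buf, int offset, int length);
        boolean isDone();
        void dataEnd();
        String getDetectedCharset();
        void reset();
    }

    // Feeds the stream in chunks until the detector is done or the stream ends,
    // finishes detection, reads the (possibly null) result, then resets for reuse.
    public static String detect(ChunkDetector detector, InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        int nread;
        while (!detector.isDone() && (nread = in.read(buf)) > 0) {
            // Only the first nread bytes of buf are valid on this iteration.
            detector.handleData(buf, 0, nread);
        }
        detector.dataEnd();
        String charset = detector.getDetectedCharset(); // may be null
        detector.reset(); // read the result before resetting
        return charset;
    }
}
```

Checking isDone() before each read avoids pulling in another chunk once the detector has already reached a decision, and reading the result before calling reset() keeps it from being cleared.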
    • detectCharset

      public static String detectCharset(File file) throws IOException
      Gets the charset of a File.
      Parameters:
      file - The file whose charset should be detected
      Returns:
      The charset of the file, or null if it cannot be determined
      Throws:
      IOException - if an I/O error occurs
    • detectCharset

      public static String detectCharset(Path path) throws IOException
      Gets the charset of a Path.
      Parameters:
      path - The path to the file whose charset should be detected
      Returns:
      The charset of the file, or null if it cannot be determined
      Throws:
      IOException - if an I/O error occurs
    • detectCharset

      public static String detectCharset(InputStream inputStream) throws IOException
      Gets the charset of the content read from an InputStream.
      Parameters:
      inputStream - InputStream containing the text content
      Returns:
      The charset of the content, or null if it cannot be determined
      Throws:
      IOException - if an I/O error occurs
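All three detectCharset overloads, like getDetectedCharset(), can return null, and they return a charset name rather than a java.nio.charset.Charset. A small JDK-only helper can turn that result into a Charset, falling back when detection failed or when the running JVM does not recognize the name. The class name, the method name toCharset and the fallback policy are assumptions for illustration, not part of the library.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetFallback {

    // Hypothetical helper: converts a detected charset name (possibly null) into a
    // Charset, returning the fallback if the name is null, syntactically illegal,
    // or not supported by this JVM.
    public static Charset toCharset(String detected, Charset fallback) {
        if (detected == null) {
            return fallback;
        }
        try {
            return Charset.isSupported(detected) ? Charset.forName(detected) : fallback;
        } catch (IllegalCharsetNameException e) {
            return fallback;
        }
    }
}
```

For example, `new String(Files.readAllBytes(path), CharsetFallback.toCharset(UniversalDetector.detectCharset(path), StandardCharsets.UTF_8))` decodes a file with the detected charset, or with UTF-8 when none was found. Defaulting to UTF-8 is a choice made here, not something this API prescribes.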