Class UnicodeBidiAlgorithm

  • All Implemented Interfaces:
    BidiConstants

    public final class UnicodeBidiAlgorithm
    extends java.lang.Object
    implements BidiConstants

    The UnicodeBidiAlgorithm class implements functionality prescribed by the Unicode Bidirectional Algorithm, Unicode Standard Annex #9.

    This work was originally authored by Glenn Adams (gadams@apache.org).

    • Method Summary

      Modifier and Type Method Description
      private static int convertToScalar(int chHi, int chLo)
      Convert UTF-16 surrogate pair to unicode scalar value.
      private static boolean convertToScalar(java.lang.CharSequence cs, int[] chars)
      Convert character sequence (a UTF-16 encoded string) to an array of unicode scalar values expressed as integers.
      private static int[] copySequence(int[] ta)
      private static int directionOfLevel(int level)
      private static void dump(java.lang.String header, int[] chars, int[] classes, int defaultLevel, int[] levels)
      private static int findNextNonRetainedFormattingLevel(int[] wca, int[] ea, int start, int lPrev)
      private static int[] getClasses(int[] chars)
      private static java.lang.String getClassName(int bc)
      private static int getLevelRunLength(int[] ea, int start)
      private static int getRetainedFormattingRunLength(int[] wca, int start)
      private static boolean isNeutral(int bc)
      private static boolean isRetainedFormatting(int bc)
      private static boolean isRetainedFormatting(int[] ca, int s, int e)
      private static boolean isStrong(int bc)
      private static int levelOfEmbedding(int embedding)
      private static int[] levelsFromEmbeddings(int[] ea, int[] la)
      private static int max(int x, int y)
      private static java.lang.String padLeft(int n, int width)
      private static java.lang.String padLeft(java.lang.String s, int width)
      private static java.lang.String padRight(java.lang.String s, int width)
      private static void resolveAdjacentBoundaryNeutrals(int[] wca, int start, int end, int index, int bcNew)
      private static void resolveExplicit(int[] wca, int defaultLevel, int[] ea)
      private static void resolveImplicit(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
      static int[] resolveLevels(int[] chars, int[] classes, int defaultLevel, int[] levels, boolean useRuleL1)
      Resolve the directionality levels of each character in a character sequence.
      static int[] resolveLevels(int[] chars, int defaultLevel, int[] levels)
      Resolve the directionality levels of each character in a character sequence.
      static int[] resolveLevels(java.lang.CharSequence cs, Direction defaultLevel)
      Resolve the directionality levels of each character in a character sequence.
      private static void resolveNeutrals(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
      private static int resolveRun(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int levelPrev)
      private static void resolveRuns(int[] wca, int defaultLevel, int[] ea, int[] la)
      private static void resolveSeparators(int[] ica, int[] wca, int dl, int[] la)
      Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
      private static void resolveWeak(int[] wca, int defaultLevel, int[] ea, int[] la, int start, int end, int level, int sor, int eor)
      private static boolean startsWithRetainedFormattingRun(int[] wca, int[] ea, int start)
      private static boolean triggersBidi(int ch)
      Determine if character CH triggers bidirectional processing.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • log

        private static final org.apache.commons.logging.Log log
        logging instance
    • Constructor Detail

      • UnicodeBidiAlgorithm

        private UnicodeBidiAlgorithm()
    • Method Detail

      • resolveLevels

        public static int[] resolveLevels(java.lang.CharSequence cs,
                                          Direction defaultLevel)
        Resolve the directionality levels of each character in a character sequence. If a character is encoded in the character sequence as a Unicode surrogate pair, then both members of the pair are assigned the same directionality level.
        Parameters:
        cs - input character sequence representing a UTF-16 encoded string
        defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
        Returns:
        null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
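The contract above (a null result when no bidirectional processing is needed, otherwise one level per UTF-16 code unit) can be approximated with the JDK's own java.text.Bidi class. This is a sketch under that substitution, not this class's implementation: the helper name levelsOrNull is hypothetical, and a boolean rtlParagraph flag stands in for the Direction parameter.

```java
import java.text.Bidi;
import java.util.Arrays;

/**
 * Sketch only: approximates the documented contract of
 * resolveLevels(CharSequence, Direction) using java.text.Bidi.
 * levelsOrNull and rtlParagraph are hypothetical names.
 */
final class BidiLevelsSketch {

    /** Returns null if no bidi processing is needed, else one level per UTF-16 code unit. */
    static int[] levelsOrNull(CharSequence cs, boolean rtlParagraph) {
        char[] text = cs.toString().toCharArray();
        // Cheap pre-check, playing the role of the "null if not required" result.
        if (!Bidi.requiresBidi(text, 0, text.length)) {
            return null;
        }
        int flags = rtlParagraph ? Bidi.DIRECTION_RIGHT_TO_LEFT : Bidi.DIRECTION_LEFT_TO_RIGHT;
        Bidi bidi = new Bidi(text, 0, null, 0, text.length, flags);
        int[] levels = new int[text.length];
        for (int i = 0; i < text.length; i++) {
            // getLevelAt takes a UTF-16 offset, so the result is indexed like the input.
            levels[i] = bidi.getLevelAt(i);
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(levelsOrNull("abc", false));                      // null
        System.out.println(Arrays.toString(levelsOrNull("a\u05D0b", false))); // [0, 1, 0]
    }
}
```

Bidi.requiresBidi applies its own trigger test, which is not guaranteed to match triggersBidi in every detail.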
      • resolveLevels

        public static int[] resolveLevels(int[] chars,
                                          int defaultLevel,
                                          int[] levels)
        Resolve the directionality levels of each character in a character sequence.
        Parameters:
        chars - array of input characters represented as unicode scalar values
        defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
        levels - array to receive levels, one for each character in chars array
        Returns:
        null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
      • resolveLevels

        public static int[] resolveLevels(int[] chars,
                                          int[] classes,
                                          int defaultLevel,
                                          int[] levels,
                                          boolean useRuleL1)
        Resolve the directionality levels of each character in a character sequence.
        Parameters:
        chars - array of input characters represented as unicode scalar values
        classes - array containing one bidi class per character in chars array
        defaultLevel - the default paragraph level, which must be zero (LR) or one (RL)
        levels - array to receive levels, one for each character in chars array
        useRuleL1 - true if rule L1 should be used
        Returns:
        null if bidirectional processing is not required; otherwise, returns an array of integers, where each integer corresponds to exactly one UTF-16 encoding element present in the input character sequence, and where each integer denotes the directionality level of the corresponding encoding element
      • copySequence

        private static int[] copySequence(int[] ta)
      • resolveExplicit

        private static void resolveExplicit(int[] wca,
                                            int defaultLevel,
                                            int[] ea)
      • directionOfLevel

        private static int directionOfLevel(int level)
      • levelOfEmbedding

        private static int levelOfEmbedding(int embedding)
      • levelsFromEmbeddings

        private static int[] levelsFromEmbeddings(int[] ea,
                                                  int[] la)
      • resolveRuns

        private static void resolveRuns(int[] wca,
                                        int defaultLevel,
                                        int[] ea,
                                        int[] la)
      • findNextNonRetainedFormattingLevel

        private static int findNextNonRetainedFormattingLevel(int[] wca,
                                                              int[] ea,
                                                              int start,
                                                              int lPrev)
      • getLevelRunLength

        private static int getLevelRunLength(int[] ea,
                                             int start)
      • startsWithRetainedFormattingRun

        private static boolean startsWithRetainedFormattingRun(int[] wca,
                                                               int[] ea,
                                                               int start)
      • getRetainedFormattingRunLength

        private static int getRetainedFormattingRunLength(int[] wca,
                                                          int start)
      • resolveRun

        private static int resolveRun(int[] wca,
                                      int defaultLevel,
                                      int[] ea,
                                      int[] la,
                                      int start,
                                      int end,
                                      int level,
                                      int levelPrev)
      • resolveWeak

        private static void resolveWeak(int[] wca,
                                        int defaultLevel,
                                        int[] ea,
                                        int[] la,
                                        int start,
                                        int end,
                                        int level,
                                        int sor,
                                        int eor)
      • resolveNeutrals

        private static void resolveNeutrals(int[] wca,
                                            int defaultLevel,
                                            int[] ea,
                                            int[] la,
                                            int start,
                                            int end,
                                            int level,
                                            int sor,
                                            int eor)
      • resolveAdjacentBoundaryNeutrals

        private static void resolveAdjacentBoundaryNeutrals(int[] wca,
                                                            int start,
                                                            int end,
                                                            int index,
                                                            int bcNew)
      • resolveImplicit

        private static void resolveImplicit(int[] wca,
                                            int defaultLevel,
                                            int[] ea,
                                            int[] la,
                                            int start,
                                            int end,
                                            int level,
                                            int sor,
                                            int eor)
      • resolveSeparators

        private static void resolveSeparators(int[] ica,
                                              int[] wca,
                                              int dl,
                                              int[] la)
        Resolve separators and boundary neutral levels to account for UAX#9 3.4 L1 while taking into account retention of formatting codes (5.2).
        Parameters:
        ica - original input class array (sequence)
        wca - working copy of the original input class array (sequence), as modified by prior steps
        dl - default paragraph level
        la - array of output levels to be adjusted, as produced by bidi algorithm
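Rule L1 can be sketched as a single right-to-left pass over one line. This is an illustrative sketch, not the resolveSeparators source: the class codes are hypothetical stand-ins rather than BidiConstants values, BN represents every retained formatting code that UAX #9 section 5.2 groups with whitespace, and the whole array is assumed to be one line. As with the ica parameter, the test uses the original (unresolved) classes.

```java
import java.util.Arrays;

/**
 * Sketch of UAX #9 rule L1 on a single line. The class codes are hypothetical
 * stand-ins, NOT the BidiConstants values used by UnicodeBidiAlgorithm.
 */
final class RuleL1Sketch {
    static final int L = 0, R = 1, WS = 2, S = 3, B = 4, BN = 5;

    /** Whitespace, or a retained formatting code (BN stands in for all of them, per 5.2). */
    private static boolean isWhitespaceLike(int bc) {
        return bc == WS || bc == BN;
    }

    /**
     * Resets to the paragraph level: S, B, and any whitespace-like run that
     * precedes S, B, or the end of the line. Other levels are left unchanged.
     */
    static void applyL1(int[] originalClasses, int paragraphLevel, int[] levels) {
        boolean resetRun = true; // the end of the line acts like a separator
        for (int i = originalClasses.length - 1; i >= 0; i--) {
            int bc = originalClasses[i];
            if (bc == S || bc == B) {
                levels[i] = paragraphLevel;
                resetRun = true;
            } else if (isWhitespaceLike(bc)) {
                if (resetRun) {
                    levels[i] = paragraphLevel;
                }
            } else {
                resetRun = false; // any other class ends the trailing run
            }
        }
    }

    public static void main(String[] args) {
        int[] classes = {R, WS, R, WS, S, WS, WS};
        int[] levels = {1, 1, 1, 1, 1, 1, 1};
        applyL1(classes, 0, levels);
        System.out.println(Arrays.toString(levels)); // [1, 1, 1, 0, 0, 0, 0]
    }
}
```

The whitespace between the two R characters keeps its resolved level because it is followed by a non-whitespace character, not by a separator or the line end.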
      • isStrong

        private static boolean isStrong(int bc)
      • isNeutral

        private static boolean isNeutral(int bc)
      • isRetainedFormatting

        private static boolean isRetainedFormatting(int bc)
      • isRetainedFormatting

        private static boolean isRetainedFormatting(int[] ca,
                                                    int s,
                                                    int e)
      • max

        private static int max(int x,
                               int y)
      • getClasses

        private static int[] getClasses(int[] chars)
      • convertToScalar

        private static boolean convertToScalar(java.lang.CharSequence cs,
                                               int[] chars)
                                        throws java.lang.IllegalArgumentException
        Convert character sequence (a UTF-16 encoded string) to an array of unicode scalar values expressed as integers. If a valid UTF-16 surrogate pair is encountered, it is converted to two integers: the first is the equivalent unicode scalar value, and the second is negative one (-1). This mechanism tracks the use of surrogate pairs while working with unicode scalar values, and keeps indices valid for both the input UTF-16 sequence and the output scalar value sequence.
        Parameters:
        cs - a UTF-16 encoded character sequence
        chars - an integer array to accept the converted scalar values, where the length of the array must be the same as the length of the input character sequence
        Returns:
        true if the sequence contains content that triggers bidirectional processing
        Throws:
        java.lang.IllegalArgumentException - if the input sequence is not a valid UTF-16 string, e.g., if it contains an isolated UTF-16 surrogate
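The surrogate-pair bookkeeping described above can be sketched as follows. This is not the FOP source: for brevity the hypothetical toScalars helper allocates and returns the array instead of filling a caller-supplied one, and it omits the bidi-trigger result that convertToScalar also returns.

```java
/**
 * Sketch of the documented conversion: a valid surrogate pair becomes
 * {scalar, -1}, so index i refers to the same position in both the UTF-16
 * input and the scalar output. toScalars is a hypothetical name.
 */
final class ScalarConversionSketch {

    static int[] toScalars(CharSequence cs) {
        int n = cs.length();
        int[] chars = new int[n];
        for (int i = 0; i < n; i++) {
            char c = cs.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < n && Character.isLowSurrogate(cs.charAt(i + 1))) {
                    chars[i] = Character.toCodePoint(c, cs.charAt(i + 1));
                    chars[++i] = -1; // placeholder for the low surrogate keeps indices aligned
                } else {
                    throw new IllegalArgumentException("isolated high surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("isolated low surrogate at index " + i);
            } else {
                chars[i] = c; // BMP character: scalar value equals the code unit
            }
        }
        return chars;
    }
}
```

For example, the four-unit string "a", U+1F600 (encoded as D83D DE00), "b" yields {0x61, 0x1F600, -1, 0x62}.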
      • convertToScalar

        private static int convertToScalar(int chHi,
                                           int chLo)
        Convert UTF-16 surrogate pair to unicode scalar value.
        Parameters:
        chHi - high (most significant or first) surrogate
        chLo - low (least significant or second) surrogate
        Returns:
        a unicode scalar value
        Throws:
        java.lang.IllegalArgumentException - if one of the input surrogates is not valid
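The pair-to-scalar arithmetic is the standard UTF-16 decoding formula: each surrogate carries 10 payload bits, and the combined 20-bit value is offset by 0x10000. A sketch with the validation described above follows; the class name is hypothetical, and the range checks are an assumption about what "not valid" means here.

```java
/**
 * Sketch of UTF-16 surrogate-pair decoding with range validation.
 * SurrogatePairSketch is a hypothetical name, not part of FOP.
 */
final class SurrogatePairSketch {

    static int toScalar(int chHi, int chLo) {
        if (chHi < 0xD800 || chHi > 0xDBFF) {
            throw new IllegalArgumentException("invalid high surrogate: 0x" + Integer.toHexString(chHi));
        }
        if (chLo < 0xDC00 || chLo > 0xDFFF) {
            throw new IllegalArgumentException("invalid low surrogate: 0x" + Integer.toHexString(chLo));
        }
        // Upper 10 bits come from the high surrogate, lower 10 bits from the low surrogate.
        return (((chHi & 0x3FF) << 10) | (chLo & 0x3FF)) + 0x10000;
    }

    public static void main(String[] args) {
        // Cross-check against the JDK's own decoder.
        System.out.println(toScalar(0xD83D, 0xDE00) == Character.toCodePoint('\uD83D', '\uDE00')); // true
    }
}
```

The smallest pair (D800, DC00) maps to 0x10000 and the largest (DBFF, DFFF) to 0x10FFFF, which is exactly the supplementary-plane range.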
      • triggersBidi

        private static boolean triggersBidi(int ch)
        Determine if character CH triggers bidirectional processing. Bidirectional processing is deemed triggerable if CH is a strong right-to-left character, an Arabic letter or number, or a right-to-left embedding or override character.
        Parameters:
        ch - a unicode scalar value
        Returns:
        true if character triggers bidirectional processing
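The trigger test described above maps directly onto java.lang.Character directionality constants (R, AL, AN, RLE, RLO). A minimal sketch, assuming the JDK's character database is an acceptable stand-in for the bidi class table that UnicodeBidiAlgorithm actually consults; the class name is hypothetical.

```java
/**
 * Sketch of the documented triggersBidi rule using the JDK's bidi classes.
 * TriggersBidiSketch is a hypothetical name, not part of FOP.
 */
final class TriggersBidiSketch {

    static boolean triggersBidi(int ch) {
        switch (Character.getDirectionality(ch)) {
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:           // R: strong right-to-left
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:    // AL: Arabic letter
            case Character.DIRECTIONALITY_ARABIC_NUMBER:           // AN: Arabic number
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING: // RLE
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:  // RLO
                return true;
            default:
                return false;
        }
    }
}
```

European digits and left-to-right embeddings do not trigger processing under this rule; only content that can introduce right-to-left ordering does.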
      • dump

        private static void dump(java.lang.String header,
                                 int[] chars,
                                 int[] classes,
                                 int defaultLevel,
                                 int[] levels)
      • getClassName

        private static java.lang.String getClassName(int bc)
      • padLeft

        private static java.lang.String padLeft(int n,
                                                int width)
      • padLeft

        private static java.lang.String padLeft(java.lang.String s,
                                                int width)
      • padRight

        private static java.lang.String padRight(java.lang.String s,
                                                 int width)