Class BaseTagger

java.lang.Object
org.languagetool.tagging.BaseTagger
All Implemented Interfaces:
Tagger

public abstract class BaseTagger extends Object implements Tagger
Base tagger using Morfologik binary dictionaries.
  • Field Details

    • wordTagger

      protected final WordTagger wordTagger
    • conversionLocale

      protected final Locale conversionLocale
    • tagLowercaseWithUppercase

      private final boolean tagLowercaseWithUppercase
    • dictionaryPath

      private final String dictionaryPath
    • dictionary

      private final morfologik.stemming.Dictionary dictionary
  • Constructor Details

    • BaseTagger

      public BaseTagger(String filename)
      Since:
      2.9
    • BaseTagger

      public BaseTagger(String filename, Locale conversionLocale)
      Since:
      2.9
    • BaseTagger

      public BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)
      Since:
      2.9
  • Method Details

    • getManualAdditionsFileName

      public abstract @Nullable String getManualAdditionsFileName()
      Get the filename for manual additions, e.g., /en/added.txt, or null.
      Since:
      2.8
    • getManualRemovalsFileName

      public @Nullable String getManualRemovalsFileName()
      Get the filename for manual removals, e.g., /en/removed.txt, or null.
      Since:
      3.2
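      The two methods above name plain-text files that adjust what the binary dictionary returns. This page does not spell out how the files are applied, so the following is only a standalone sketch of one plausible reading: added readings are merged in, then removed readings are filtered out. ManualOverridesSketch, lookup, and the "lemma/POS" string encoding are hypothetical stand-ins, not LanguageTool classes or formats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not a LanguageTool class.
final class ManualOverridesSketch {

  // Merge dictionary readings with manual additions, then drop manual removals.
  // Readings are encoded as "lemma/POS" strings purely for brevity (assumption).
  static List<String> lookup(String word,
                             Map<String, List<String>> dictionary,
                             Map<String, List<String>> additions,
                             Map<String, List<String>> removals) {
    Set<String> readings = new LinkedHashSet<>();
    readings.addAll(dictionary.getOrDefault(word, Collections.<String>emptyList()));
    readings.addAll(additions.getOrDefault(word, Collections.<String>emptyList()));
    readings.removeAll(removals.getOrDefault(word, Collections.<String>emptyList()));
    return new ArrayList<>(readings);
  }

  public static void main(String[] args) {
    Map<String, List<String>> dictionary = new HashMap<>();
    dictionary.put("run", Arrays.asList("run/VB", "run/NN"));
    Map<String, List<String>> additions = new HashMap<>();
    additions.put("run", Arrays.asList("run/VBP"));
    Map<String, List<String>> removals = new HashMap<>();
    removals.put("run", Arrays.asList("run/NN"));
    // Prints [run/VB, run/VBP]
    System.out.println(lookup("run", dictionary, additions, removals));
  }
}
```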
    • getDictionaryPath

      public String getDictionaryPath()
      Since:
      2.9
    • overwriteWithManualTagger

      public boolean overwriteWithManualTagger()
      If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
      Since:
      2.9
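      To make the effect of this flag concrete, here is a minimal standalone sketch. It assumes that manual tags replace the dictionary's tags only when the word actually has manual tags, and that both sets are kept otherwise. OverwriteSketch and combine are hypothetical names and do not reproduce BaseTagger's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in, not a LanguageTool class.
final class OverwriteSketch {

  // If overwrite is true and the word has manual tags, those tags replace the
  // binary dictionary's tags; otherwise the two tag lists are merged.
  static List<String> combine(List<String> dictTags, List<String> manualTags, boolean overwrite) {
    List<String> result = new ArrayList<>(manualTags);
    boolean manualWins = overwrite && !manualTags.isEmpty();
    if (!manualWins) {
      for (String tag : dictTags) {
        if (!result.contains(tag)) {
          result.add(tag);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> dict = Arrays.asList("NN", "VB");
    List<String> manual = Arrays.asList("JJ");
    System.out.println(combine(dict, manual, true));   // [JJ]
    System.out.println(combine(dict, manual, false));  // [JJ, NN, VB]
    // No manual tags: the dictionary tags survive even with overwrite enabled.
    System.out.println(combine(dict, Collections.<String>emptyList(), true)); // [NN, VB]
  }
}
```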
    • getWordTagger

      protected WordTagger getWordTagger()
    • initWordTagger

      private WordTagger initWordTagger()
    • getDictionary

      protected morfologik.stemming.Dictionary getDictionary()
    • tag

      public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens) throws IOException
      Description copied from interface: Tagger
      Returns a list of AnalyzedTokenReadings that assigns part-of-speech information to each term in the sentence (a term may receive more than one tag).

      Note that this method takes exactly one sentence. Implementations may handle the first word of a sentence specially, since it is usually written with an uppercase letter.

      Specified by:
      tag in interface Tagger
      Parameters:
      sentenceTokens - the text as returned by a WordTokenizer
      Throws:
      IOException
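      The first-word special case mentioned above could look roughly like the following standalone sketch, which also tries the lowercase form of the first non-whitespace token. SentenceStartSketch, its tag method, and the map-based dictionary are hypothetical stand-ins; the actual casing rules of BaseTagger are not described on this page and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not a LanguageTool class.
final class SentenceStartSketch {

  // Returns one tag list per token. The first non-whitespace token is also
  // looked up in lowercase; whitespace tokens get an empty tag list.
  static List<List<String>> tag(List<String> sentenceTokens,
                                Map<String, List<String>> dict,
                                Locale conversionLocale) {
    List<List<String>> result = new ArrayList<>();
    boolean firstWord = true;
    for (String token : sentenceTokens) {
      List<String> tags =
          new ArrayList<>(dict.getOrDefault(token, Collections.<String>emptyList()));
      boolean isWhitespace = token.trim().isEmpty();
      if (!isWhitespace && firstWord) {
        String lower = token.toLowerCase(conversionLocale);
        if (!lower.equals(token)) {
          for (String t : dict.getOrDefault(lower, Collections.<String>emptyList())) {
            if (!tags.contains(t)) {
              tags.add(t);
            }
          }
        }
        firstWord = false;
      }
      result.add(tags);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> dict = new HashMap<>();
    dict.put("the", Arrays.asList("DT"));
    dict.put("dog", Arrays.asList("NN"));
    // Prints [[DT], [], [NN]]
    System.out.println(tag(Arrays.asList("The", " ", "dog"), dict, Locale.ENGLISH));
  }
}
```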
    • getAnalyzedTokens

      protected List<AnalyzedToken> getAnalyzedTokens(String word)
    • asAnalyzedTokenList

      protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
    • asAnalyzedTokenListForTaggedWords

      protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
    • asAnalyzedToken

      protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
    • asAnalyzedToken

      private AnalyzedToken asAnalyzedToken(String word, TaggedWord taggedWord)
    • addTokens

      private void addTokens(List<AnalyzedToken> taggedTokens, List<AnalyzedToken> l)
    • createNullToken

      public final AnalyzedTokenReadings createNullToken(String token, int startPos)
      Description copied from interface: Tagger
      Create the AnalyzedTokenReadings used for whitespace and other non-words. Use null as the POS tag for this token.
      Specified by:
      createNullToken in interface Tagger
    • createToken

      public AnalyzedToken createToken(String token, String posTag)
      Description copied from interface: Tagger
      Create a token specific to the language of the implementing class.
      Specified by:
      createToken in interface Tagger
    • additionalTags

      protected @Nullable List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
      Allows additional tagging in some language-dependent circumstances.
      Parameters:
      word - the word to tag
      Returns:
      a list of analyzed tokens with additional tags, or null
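      A hook of this kind is usually overridden by a language-specific subclass. The standalone sketch below illustrates the pattern with stand-in types: a base class whose default hook returns null, and a subclass that derives tags for "non-" prefixed words from the unprefixed entry. HookTaggerSketch, PrefixTaggerSketch, tagWord, and the prefix rule are all hypothetical, and the point at which BaseTagger calls additionalTags is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a base tagger, not a LanguageTool class.
class HookTaggerSketch {
  protected final Map<String, List<String>> dict;

  HookTaggerSketch(Map<String, List<String>> dict) {
    this.dict = dict;
  }

  // Hook: return extra tags for the word, or null if there are none.
  protected List<String> additionalTags(String word) {
    return null;
  }

  // Dictionary lookup first, then whatever the hook contributes.
  final List<String> tagWord(String word) {
    List<String> tags =
        new ArrayList<>(dict.getOrDefault(word, Collections.<String>emptyList()));
    List<String> extra = additionalTags(word);
    if (extra != null) {
      tags.addAll(extra);
    }
    return tags;
  }
}

// Hypothetical language-specific subclass: tags "non-X" like "X".
class PrefixTaggerSketch extends HookTaggerSketch {
  private static final String PREFIX = "non-";

  PrefixTaggerSketch(Map<String, List<String>> dict) {
    super(dict);
  }

  @Override
  protected List<String> additionalTags(String word) {
    if (word.startsWith(PREFIX) && word.length() > PREFIX.length()) {
      // May be null when the unprefixed word is unknown as well.
      return dict.get(word.substring(PREFIX.length()));
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, List<String>> dict = new HashMap<>();
    dict.put("smoker", Arrays.asList("NN"));
    PrefixTaggerSketch tagger = new PrefixTaggerSketch(dict);
    System.out.println(tagger.tagWord("non-smoker")); // [NN]
    System.out.println(tagger.tagWord("xyz"));        // []
  }
}
```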