class Syntax::Tokenizer
The base class of all tokenizers. It sets up the scanner and manages the looping until all tokens have been extracted. It also provides convenience methods to make sure adjacent tokens of identical groups are returned as a single token.
Constants
- EOL
Attributes
The current chunk of text being accumulated
The current group being processed by the tokenizer
Public Class Methods
Source
# File lib/syntax/common.rb, line 98
def self.delegate( sym )
  define_method( sym ) { |*a| @text.__send__( sym, *a ) }
end
A convenience for delegating method calls to the scanner.
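The delegation pattern above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical stand-in class (`MiniTokenizer`, not part of the library) that keeps a `StringScanner` in `@text`. Which methods the real class delegates is not shown on this page, so `scan` and `eos?` are picked purely for illustration.

```ruby
require "strscan"

# Hypothetical stand-in class, not part of the library: it only reproduces
# the delegate pattern shown above so the sketch runs on its own.
class MiniTokenizer
  # Define an instance method named +sym+ that forwards to the scanner.
  def self.delegate(sym)
    define_method(sym) { |*args| @text.__send__(sym, *args) }
  end

  # Illustrative choices; the real list of delegated methods is an assumption.
  delegate :scan
  delegate :eos?

  def initialize(text)
    @text = StringScanner.new(text)
  end
end

t = MiniTokenizer.new("foo bar")
word = t.scan(/\w+/)   # forwarded to StringScanner#scan
done = t.eos?          # forwarded to StringScanner#eos?
```

Because the generated methods close over `sym`, one helper call per method name is enough; no per-method wrapper has to be written by hand.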
Public Instance Methods
Source
# File lib/syntax/common.rb, line 57
def finish
  start_group nil
  teardown
end
Finish tokenizing. This flushes the buffer, yielding any remaining text to the client.
Source
# File lib/syntax/common.rb, line 89
def option(opt)
  @options ? @options[opt] : nil
end
Get the value of the specified option.
Source
# File lib/syntax/common.rb, line 84
def set( opts={} )
  ( @options ||= Hash.new ).update opts
end
Specify a set of tokenizer-specific options. A tokenizer may or may not publish options; if it does, those options can be used to select optional behavior.
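To see how the option methods interact, here is a small sketch. It assumes a hypothetical stand-in class (`OptionHolder`) that copies only the two method bodies shown above; the option names `:expressions` and `:tab_width` are made up for illustration and are not documented options of any particular tokenizer.

```ruby
# Hypothetical stand-in that copies only the set and option methods shown
# above, so the sketch runs without the library.
class OptionHolder
  def set(opts = {})
    (@options ||= {}).update(opts)
  end

  def option(opt)
    @options ? @options[opt] : nil
  end
end

holder = OptionHolder.new
before = holder.option(:expressions)   # nil: nothing has been set yet
holder.set(expressions: :highlight)    # hypothetical option name and value
holder.set(tab_width: 4)               # later calls merge into earlier ones
after = holder.option(:expressions)    # still :highlight after the second set
```

Since the hash is merged with `update` rather than replaced, repeated calls accumulate options, and reading an option before any call simply returns `nil`.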
Source
# File lib/syntax/common.rb, line 52
def setup
end
Subclasses may override this method to provide implementation-specific setup logic.
Source
# File lib/syntax/common.rb, line 42
def start( text, &block )
  @chunk = "".dup
  @group = :normal
  @callback = block
  @text = StringScanner.new( text )
  setup
end
Start tokenizing. This sets up the state in preparation for tokenization, such as creating a new scanner for the text and saving the callback block. The block will be invoked for each token extracted.
Source
# File lib/syntax/common.rb, line 69
def step
  raise NotImplementedError, "subclasses must implement #step"
end
Subclasses must implement this method, which is called for each iteration of the tokenization process. This method may extract multiple tokens.
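As an illustration of what a step implementation might look like, the sketch below defines a hypothetical `NumberTokenizer`. So that it runs without the library, it uses stand-ins: `BaseTokenizer` is a condensed copy of the tokenize, start_group and flush_chunk logic documented on this page, and `Token` is a simplified Struct (the library's own Token class is not shown here, so its shape is an assumption).

```ruby
require "strscan"

# Stand-ins so the sketch runs without the library. Token is a simplified
# Struct, not the library's Token class.
Token = Struct.new(:text, :group, :instruction)

class BaseTokenizer
  def tokenize(text, &block)
    @chunk = "".dup
    @group = :normal
    @callback = block
    @text = StringScanner.new(text)
    step until @text.eos?
    start_group nil   # same flush that finish performs
  end

  private

  def start_group(gr, data = nil)
    flush_chunk if gr != @group
    @group = gr
    @chunk << data if data
  end

  def flush_chunk
    @callback.call(Token.new(@chunk, @group)) unless @chunk.empty?
    @chunk = "".dup
  end
end

# Hypothetical tokenizer: runs of digits become :number tokens, everything
# else is collected into :normal tokens.
class NumberTokenizer < BaseTokenizer
  def step
    if (digits = @text.scan(/\d+/))
      start_group :number, digits
    else
      start_group :normal, @text.getch
    end
  end
end

tokens = []
NumberTokenizer.new.tokenize("ab 12 c") { |tok| tokens << [tok.group, tok.text] }
# tokens is now [[:normal, "ab "], [:number, "12"], [:normal, " c"]]
```

Although step consumes the non-digit text one character at a time, the client receives only three tokens, because start_group flushes the chunk only when the group changes.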
Source
# File lib/syntax/common.rb, line 64
def teardown
end
Subclasses may override this method to provide implementation-specific teardown logic.
Source
# File lib/syntax/common.rb, line 75
def tokenize( text, &block )
  start text, &block
  step until @text.eos?
  finish
end
Begins tokenizing the given text, calling step until the text has been exhausted.
Private Instance Methods
Source
# File lib/syntax/common.rb, line 120
def append( data )
  @chunk << data
end
Append the given data to the currently active chunk.
Source
# File lib/syntax/common.rb, line 143
def end_region( gr, data=nil )
  flush_chunk
  @group = gr
  @callback.call( Token.new( data||"", @group, :region_close ) )
end
Source
# File lib/syntax/common.rb, line 149
def flush_chunk
  @callback.call( Token.new( @chunk, @group ) ) unless @chunk.empty?
  @chunk = "".dup
end
Source
# File lib/syntax/common.rb, line 131
def start_group( gr, data=nil )
  flush_chunk if gr != @group
  @group = gr
  @chunk << data if data
end
Request that a new group be started. If the current group is the same as the group being requested, a new group will not be created. If a new group is created and the current chunk is not empty, the chunk’s contents will be yielded to the client as a token, and then cleared.
After the new group is started, if data is non-nil it will be appended to the chunk.
Source
# File lib/syntax/common.rb, line 137
def start_region( gr, data=nil )
  flush_chunk
  @group = gr
  @callback.call( Token.new( data||"", @group, :region_open ) )
end
Source
# File lib/syntax/common.rb, line 115
def subgroup(n)
  @text[n]
end
Access the n-th subgroup from the most recent match.
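Since subgroup simply indexes the scanner, its behavior follows that of `StringScanner#[]`. The sketch below uses a bare scanner from Ruby's standard library rather than the tokenizer itself; the input string and pattern are made up for illustration.

```ruby
require "strscan"

scanner = StringScanner.new("key=value")
scanner.scan(/(\w+)=(\w+)/)   # this becomes the most recent match

whole = scanner[0]   # the entire match
name  = scanner[1]   # first subgroup
value = scanner[2]   # second subgroup
```

Index 0 returns the whole match, so subgroup numbering for captured groups effectively starts at 1.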
Source
# File lib/syntax/common.rb, line 154
def subtokenize( syntax, text )
  tokenizer = Syntax.load( syntax )
  tokenizer.set @options if @options
  flush_chunk
  tokenizer.tokenize( text, &@callback )
end