class Text::Reform
Introduction¶ ↑
Text::Reform
class is a rewrite from the perl module with the same name by Damian Conway (damian@conway.org). Much of this documentation has been copied from the original documentation and adapted to the Ruby version.
The interface is subject to change, since it will undergo major Rubyfication.
Synopsis¶ ↑
require 'text/reform' f = Text::Reform.new puts f.format(template, data)
Description¶ ↑
The Reform#format
method¶ ↑
Reform#format
takes a series of format (or “picture”) strings followed by replacement values, interpolates those values into each picture string, and returns the result.
A picture string consists of sequences of the following characters:
- <
-
Left-justified field indicator. A series of two or more sequential +<+'s specify a left-justified field to be filled by a subsequent value. A single +<+ is formatted as the literal character '<'.
- >
-
Right-justified field indicator. A series of two or more sequential >'s specify a right-justified field to be filled by a subsequent value. A single < is formatted as the literal character '<'.
- <<>>
-
Fully-justified field indicator. Field may be of any width, and brackets need not balance, but there must be at least 2 '<' and 2 '>'.
- ^
-
Centre-justified field indicator. A series of two or more sequential ^'s specify a centred field to be filled by a subsequent value. A single ^ is formatted as the literal character '<'.
- >>.<<<<
-
A numerically formatted field with the specified number of digits to either side of the decimal place. See _Numerical formatting_ below.
- [
-
Left-justified block field indicator. Just like a < field, except it repeats as required on subsequent lines. See below. A single [ is formatted as the literal character '['.
- ]
-
Right-justified block field indicator. Just like a > field, except it repeats as required on subsequent lines. See below. A single ] is formatted as the literal character ']'.
- [[]]
-
Fully-justified block field indicator. Just like a <<<>>> field, except it repeats as required on subsequent lines. See below. Field may be of any width, and brackets need not balance, but there must be at least 2 '[' and 2 ']'.
- |
-
Centre-justified block field indicator. Just like a ^ field, except it repeats as required on subsequent lines. See below. A single | is formatted as the literal character '|'.
- ]]].[[[[
-
A numerically formatted block field with the specified number of digits to either side of the decimal place. Just like a +>>>.<<<<+ field, except it repeats as required on subsequent lines. See below.
- ~
-
A one-character wide block field.
- \
-
Literal escape of next character (e.g. ++ is formatted as '~', not a one character wide block field).
- Any other character
-
That literal character.
Any substitution value which is nil
(either explicitly so, or because it is missing) is replaced by an empty string.
Controlling Reform
instance options¶ ↑
There are several ways to influence options set in the Reform
instance:
-
At creation:
# using a hash r1 = Text::Reform.new(:squeeze => true) # using a block r2 = Text::Reform.new do |rf| rf.squeeze = true rf.fill = true end
-
Using accessors:
r = Text::Reform.new r.squeeze = true r.fill = true
The Perl way of interleaving option changes with picture strings and data is currently NOT supported.
Controlling line filling¶ ↑
squeeze
replaces sequences of spaces or tabs to be replaced with a single space; fill
removes newlines from the input. To minimize all whitespace, you need to specify both options. Hence:
format = "EG> [[[[[[[[[[[[[[[[[[[[[" data = "h e\t l lo\nworld\t\t\t\t\t" r = Text::Reform.new r.squeeze = false # default, implied r.fill = false # default, implied puts r.format(format, data) # all whitespace preserved: # # EG> h e l lo # EG> world r.squeeze = true r.fill = false # default, implied puts r.format(format, data) # only newlines preserved # # EG> h e l lo # EG> world r.squeeze = false # default, implied r.fill = true puts r.format(format, data) # only spaces/tabs preserved: # # EG> h e l lo world r.fill = true r.squeeze = true puts r.format(format, data) # no whitespace preserved: # # EG> h e l lo world
Whether or not filling or squeezing is in effect, format
can also be directed to trim any extra whitespace from the end of each line it formats, using the trim
option. If this option is specified with a true
value, every line returned by format
will automatically have the substitution +.gsub!(/[ t]+/, '')+ applied to it.
r.format("[[[[[[[[[[[", 'short').length # => 11 r.trim = true r.format("[[[[[[[[[[[", 'short').length # => 6
It is also possible to control the character used to fill lines that are too short, using the filler
option. If this option is specified the value of the filler
flag is used as the fill string, rather than the default +“ ”+.
For example:
r.filler = '*' print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4')
prints:
Pay bearer: *******$123.4*******
If the filler string is longer than one character, it is truncated to the appropriate length. So:
r.filler = '-->' print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4') print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$13.4') print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$1.4')
prints:
Pay bearer: -->-->-$123.4-->-->- Pay bearer: -->-->--$13.4-->-->- Pay bearer: -->-->--$1.4-->-->--
If the value of the filler
option is a hash, then its :left
and :right
entries specify separate filler strings for each side of an interpolated value.
Options¶ ↑
The Perl variant supports option switching during processing of the arguments of a single call to format
. This has been removed while porting to Ruby, since I believe that this does not add to clarity of code. So you have to change options explicitly.
Data argument types and handling¶ ↑
The data
part of the call to format can be either in String form, the items being newline separated, or in Array form. The array form can contain any kind of type you want, as long as it supports to_s.
So all of the following examples return the same result:
# String form r.format("]]]].[[", "1234\n123") # Array form r.format("]]]].[[", [ 1234, 123 ]) # Array with another type r.format("]]]].[[", [ 1234.0, 123.0 ])
Multi-line format specifiers and interleaving¶ ↑
By default, if a format specifier contains two or more lines (i.e. one or more newline characters), the entire format specifier is repeatedly filled as a unit, until all block fields have consumed their corresponding arguments. For example, to build a simple look-up table:
values = (1..12).to_a squares = values.map { |el| sprintf "%.6g", el**2 } roots = values.map { |el| sprintf "%.6g", Math.sqrt(el) } logs = values.map { |el| sprintf "%.6g", Math.log(el) } inverses = values.map { |el| sprintf "%.6g", 1/el } puts reform.format( " N N**2 sqrt(N) log(N) 1/N", "=====================================================", "| [[ | [[[ | [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |" + "-----------------------------------------------------", values, squares, roots, logs, inverses )
The multiline format specifier:
"| [[ | [[[ | [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |" + "-----------------------------------------------------"
is treated as a single logical line. So format
alternately fills the first physical line (interpolating one value from each of the arrays) and the second physical line (which puts a line of dashes between each row of the table) producing:
N N**2 sqrt(N) log(N) 1/N ===================================================== | 1 | 1 | 1 | 0 | 1 | ----------------------------------------------------- | 2 | 4 | 1.41421 | 0.693147 | 0.5 | ----------------------------------------------------- | 3 | 9 | 1.73205 | 1.09861 | 0.333333 | ----------------------------------------------------- | 4 | 16 | 2 | 1.38629 | 0.25 | ----------------------------------------------------- | 5 | 25 | 2.23607 | 1.60944 | 0.2 | ----------------------------------------------------- | 6 | 36 | 2.44949 | 1.79176 | 0.166667 | ----------------------------------------------------- | 7 | 49 | 2.64575 | 1.94591 | 0.142857 | ----------------------------------------------------- | 8 | 64 | 2.82843 | 2.07944 | 0.125 | ----------------------------------------------------- | 9 | 81 | 3 | 2.19722 | 0.111111 | ----------------------------------------------------- | 10 | 100 | 3.16228 | 2.30259 | 0.1 | ----------------------------------------------------- | 11 | 121 | 3.31662 | 2.3979 | 0.0909091 | ----------------------------------------------------- | 12 | 144 | 3.4641 | 2.48491 | 0.0833333 | -----------------------------------------------------
This implies that formats and the variables from which they're filled need to be interleaved. That is, a multi-line specification like this:
puts r.format( "Passed: ## [[[[[[[[[[[[[[[ # single format specification Failed: # (needs two sets of data) [[[[[[[[[[[[[[[", ## passes, fails) ## data for previous format
would print:
Passed: <pass 1> Failed: <fail 1> Passed: <pass 2> Failed: <fail 2> Passed: <pass 3> Failed: <fail 3>
because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes
and fails
has been consumed.
Unlike the table example, where this unit filling correctly put a line of dashes between lines of data, in this case the alternation of passes and fails is probably /not/ the desired effect.
Judging by the labels, it is far more likely that the user wanted:
Passed: <pass 1> <pass 2> <pass 3> Failed: <fail 4> <fail 5> <fail 6>
To achieve that, either explicitly interleave the formats and their data sources:
puts r.format( "Passed:", ## single format (no data required) " [[[[[[[[[[[[[[[", ## single format (needs one set of data) passes, ## data for previous format "Failed:", ## single format (no data required) " [[[[[[[[[[[[[[[", ## single format (needs one set of data) fails) ## data for previous format
or instruct format
to do it for you automagically, by setting the 'interleave' flag true
:
r.interleave = true puts r.format( "Passed: ## [[[[[[[[[[[[[[[ # single format Failed: # (needs two sets of data) [[[[[[[[[[[[[[[", ## ## data to be automagically interleaved passes, fails) # as necessary between lines of previous ## format
How format
hyphenates¶ ↑
Any line with a block field repeats on subsequent lines until all block fields on that line have consumed all their data. Non-block fields on these lines are replaced by the appropriate number of spaces.
Words are wrapped whole, unless they will not fit into the field at all, in which case they are broken and (by default) hyphenated. Simple hyphenation is used (i.e. break at the +N-1+th character and insert a '-'), unless a suitable alternative subroutine is specified instead.
Words will not be broken if the break would leave less than 2 characters on the current line. This minimum can be varied by setting the min_break
option to a numeric value indicating the minumum total broken characters (including hyphens) required on the current line. Note that, for very narrow fields, words will still be broken (but __unhyphenated__). For example:
puts r.format('~', 'split')
would print:
s p l i t
whilst:
r.min_break= 1 puts r.format('~', 'split')
would print:
s- p- l- i- t
Alternative breaking strategies can be specified using the “break” option in a configuration hash. For example:
r.break = MyBreaker.new r.format(fmt, data)
format
expects a user-defined line-breaking strategy to listen to the method break
that takes three arguments (the string to be broken, the maximum permissible length of the initial section, and the total width of the field being filled). break
must return a list of two strings: the initial (broken) section of the word, and the remainder of the string respectivly).
For example:
class MyBreaker def break(str, initial, total) [ str[0, initial-1].'~'], str[initial-1..-1] ] end end r.break = MyBreaker.new
makes '~' the hyphenation character, whilst:
class WrapAndSlop def break(str, initial, total) if (initial == total) str =~ /\A(\s*\S*)(.*)/ [ $1, $2 ] else [ '', str ] end end end r.break = WrapAndSlop.new
wraps excessively long words to the next line and “slops” them over the right margin if necessary.
The Text::Reform
class provides three functions to simplify the use of variant hyphenation schemes. Text::Reform::break_wrap
returns an instance implementing the “wrap-and-slop” algorithm shown in the last example, which could therefore be rewritten:
r.break = Text::Reform.break_wrap
Text::Reform::break_with
takes a single string argument and returns an instance of a class which hyphenates by cutting off the text at the right margin and appending the string argument. Hence the first of the two examples could be rewritten:
r.break = Text::Reform.break_with('~')
The method Text::Reform::break_at
takes a single string argument and returns a reference to a sub which hyphenates by breaking immediately after that string. For example:
r.break = Text::Reform.break_at('-') r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")
returns:
"The Newton-
Raphson
methodology"
Note that this differs from the behaviour of Text::Reform::break_with
, which would be:
r.break = Text::Reform.break_width('-') r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")
returns:
"The Newton-R-
aphson metho-
dology"
Choosing the correct breaking strategy depends on your kind of data.
The method Text::Reform::break_hyphen returns an instance of a class which hyphenates using a Ruby hyphenator. The hyphenator must be provided to the method. At the time of release, there are two implementations of hyphenators available: TeX::Hyphen by Martin DeMello and Austin Ziegler (a Ruby port of Jan Pazdziora's TeX::Hyphen module); and Text::Hyphen by Austin Ziegler (a significant recoding of TeX::Hyphen to better support non-English languages).
For example:
r.break = Text::Reform.break_hyphen
Note that in the previous example the calls to .break_at, .break_wrap and .break_hyphen produce instances of the corresponding strategy class.
The algorithm format
uses is:¶ ↑
-
If interleaving is specified, split the first string in the argument list into individual format lines and add a terminating newline (unless one is already present). therwise, treat the entire string as a single “line” (like /s does in regexes)
-
For each format line…
-
determine the number of fields and shift that many values off the argument list and into the filling list. If insufficient arguments are available, generate as many empty strings as are required.
-
generate a text line by filling each field in the format line with the initial contents of the corresponding arg in the filling list (and remove those initial contents from the arg).
-
replace any <,>, or ^ fields by an equivalent number of spaces. Splice out the corresponding args from the filling list.
-
Repeat from step 2.2 until all args in the filling list are empty.
-
-
concatenate the text lines generated in step 2
Note that in difference to the Perl version of Text::Reform
, this version does not currently loop over several format strings in one function call.
Reform#format
examples¶ ↑
As an example of the use of format
, the following:
count = 1 text = "A big long piece of text to be formatted exquisitely" output = '' output << r.format(" |||| <<<<<<<<<< ", count, text) output << r.format(" ---------------- ", " ^^^^ ]]]]]]]]]]| ", count+11, text)
results in output
:
1 A big lon- ---------------- 12 g piece| of text| to be for-| matted ex-| quisitely|
Note that block fields in a multi-line format string, cause the entire multi-line format to be repeated as often as necessary.
Unlike traditional Perl format
arguments, picture strings and arguments cannot be interleaved in Ruby version. This is partly by intention to see if the feature is a feature or if it can be disposed with. Another example:
report = '' report << r.format( 'Name Rank Serial Number', '==== ==== =============', '<<<<<<<<<<<<< ^^^^ <<<<<<<<<<<<<', name, rank, serial_number )
results in:
Name Rank Serial Number ==== ==== ============= John Doe high 314159
Numerical formatting¶ ↑
The “>>>.<<<” and “]]].[[[” field specifiers may be used to format numeric values about a fixed decimal place marker. For example:
puts r.format('(]]]]].[[)', %w{ 1 1.0 1.001 1.009 123.456 1234567 one two })
would print:
( 1.0) ( 1.0) ( 1.00) ( 1.01) ( 123.46) (#####.##) (?????.??) (?????.??)
Fractions are rounded to the specified number of places after the decimal, but only significant digits are shown. That's why, in the above example, 1 and 1.0 are formatted as “1.0”, whilst 1.001 is formatted as “1.00”.
You can specify that the maximal number of decimal places always be used by giving the configuration option 'numeric' the value NUMBERS_ALL_PLACES
. For example:
r.numeric = Text::Reform::NUMBERS_ALL_PLACES puts r.format('(]]]]].[[)', <<EONUMS) 1 1.0 EONUMS
would print:
( 1.00) ( 1.00)
Note that although decimal digits are rounded to fit the specified width, the integral part of a number is never modified. If there are not enough places before the decimal place to represent the number, the entire number is replaced with hashes.
If a non-numeric sequence is passed as data for a numeric field, it is formatted as a series of question marks. This querulous behaviour can be changed by giving the configuration option 'numeric' a value that matches /bSkipNaNb/i in which case, any invalid numeric data is simply ignored. For example:
r.numeric = Text::Reform::NUMBERS_SKIP_NAN puts r.format('(]]]]].[[)', %w{ 1 two three 4 })
would print:
( 1.0) ( 4.0)
Filling block fields with lists of values¶ ↑
If an argument contains an array, then format
automatically joins the elements of the array into a single string, separating each element with a newline character. As a result, a call like this:
svalues = %w{ 1 10 100 1000 } nvalues = [1, 10, 100, 1000] puts r.format( "(]]]].[[)", svalues # you could also use nvalues here. )
will print out
( 1.00) ( 10.00) (100.00) (1000.00)
as might be expected.
Note: While String arguments are consumed during formatting process and will be empty at the end of formatting, array arguments are not. So svalues (nvalues) still contains [1,10,100,1000] after the call to format
.
Headers, footers, and pages¶ ↑
The format
method can also insert headers, footers, and page-feeds as it formats. These features are controlled by the “header”, “footer”, “page_feed”, “page_len”, and “page_num” options.
If the page_num
option is set to an Integer value, page numbering will start at that value.
The page_len
option specifies the total number of lines in a page (including headers, footers, and page-feeds).
The page_width
option specifies the total number of columns in a page.
If the header
option is specified with a string value, that string is used as the header of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the header string. When called, the block is passed the current page number.
Likewise, if the footer
option is specified with a string value, that string is used as the footer of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the footer string. When called, the footer block is passed the current page number.
Both the header and footer options can also be specified as hash references. In this case the hash entries for keys left
, centre
(or center
), and right
specify what is to appear on the left, centre, and right of the header/footer. The entry for the key width
specifies how wide the footer is to be. If the width
key is omitted, the page_width
configuration option (which defaults to 72 characters) is used.
The :left
, :centre
, and :right
values may be literal strings, or blocks (just as a normal header/footer specification may be.) See the second example, below.
Another alternative for header and footer options is to specify them as a block that returns a hash reference. The subroutine is called for each page, then the resulting hash is treated like the hashes described in the preceding paragraph. See the third example, below.
The page_feed
option acts in exactly the same way, to produce a page_feed
which is appended after the footer. But note that the page_feed
is not counted as part of the page length.
All three of these page components are recomputed at the *start of each new page*, before the page contents are formatted (recomputing the header and footer first makes it possible to determine how many lines of data to format so as to adhere to the specified page length).
When the call to format
is complete and the data has been fully formatted, the footer subroutine is called one last time, with an extra argument of true
. The string returned by this final call is used as the final footer.
So for example, a 60-line per page report, starting at page 7, with appropriate headers and footers might be set up like so:
small = Text::Reform.new r.header = lambda do |page| "Page #{page}\n\n" end r.footer = lambda do |page, last| if last '' else ('-'*50 + "\n" + small.format('>'*50, "...#{page+1}")) end end r.page_feed = "\n\n" r.page_len = 60 r.page_num = 7 r.format(template, data)
Note that you can't reuse the r
instance of Text::Reform
inside the footer, it will end up calling itself recursivly until stack exhaustion.
Alternatively, to set up headers and footers such that the running head is right justified in the header and the page number is centred in the footer:
r.header = { :right => 'Running head' } r.footer = { :centre => lambda do |page| "page #{page}" end } r.page_len = 60 r.format(template, data)
The footer in the previous example could also have been specified the other way around, as a block that returns a hash (rather than a hash containing a block):
r.header = { :right => 'Running head' } r.footer = lambda do |page| { :center => "page #{page}" } end
AUTHOR¶ ↑
Original Perl library and documentation: Damian Conway (damian at conway dot org)
Translating everything to Ruby (and leaving a lot of stuff out): Kaspar Schiess (eule at space dot ch)
BUGS¶ ↑
There are undoubtedly serious bugs lurking somewhere in code this funky :-) Bug reports and other feedback are most welcome.
COPYRIGHT¶ ↑
Copyright © 2005, Kaspar Schiess. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Ruby License (see www.ruby-lang.org/en/LICENSE.txt)
Constants
- BFIELDMARK
- BJUSTIFIED
- BNUMERICAL
Matches one or more ] followed by . followed by one or more [
- BSINGLE
- BSPECIALS
various regexp parts for matching patterns.
- CLEAR_BLOCK
For use with
header
,footer
, andpage_feed
; this will clear the header, footer, or page feed block result to be an empty block.- DECIMAL
- FIELDMARK
- FIELDPAT
- FIXED_FIELDPAT
- LFIELDMARK
- LJUSTIFIED
- LNUMERICAL
- LSPECIALS
- NUMBERS_ALL_AND_SKIP
Numbers are printed as for
NUMBERS_ALL_PLACES
, but NaN values are skipped.- NUMBERS_ALL_PLACES
Numbers are printed, retaining all decimal places. Non-numeric data is printed as a series of question marks.
[[[[[.]] # format 1.0 -> 1.00 1 -> 1.00
- NUMBERS_NORMAL
Numbers are printed, leaving off unnecessary decimal places. Non- numeric data is printed as a series of question marks. This is the default for formatting numbers.
- NUMBERS_SKIP_NAN
Numbers are printed as ffor
NUMBERS_NORMAL
, but NaN (“not a number”) values are skipped.- SPECIALS
- VERSION
Attributes
Break class instance that is used to break words in hyphenation. This class must have a break
method accepting the three arguments str
, initial_max_length
and maxLength
.
You can directly call the break_* methods to produce such a class instance for you; Available methods are break_width, break_at, break_wrap, break_hyphenator.
- Default
-
Text::Hyphen::break_with('-')
Controls character that is used to fill lines that are too short. If this attribute has a hash value, the symbols :left and :right store the filler character to use on the left and the right, respectivly.
- Default
-
+' '+ on both sides
Proc returning page header. This is called before the page actually gets formatted to permit calculation of page length.
- Default
-
CLEAR_BLOCK
This implies that formats and the variables from which they're filled need to be interleaved. That is, a multi-line specification like this:
print format( "Passed: ## [[[[[[[[[[[[[[[ # single format specification Failed: # (needs two sets of data) [[[[[[[[[[[[[[[", ## fails, passes) ## two arrays, data for previous format
would print:
Passed: <pass 1> Failed: <fail 1> Passed: <pass 2> Failed: <fail 2> Passed: <pass 3> Failed: <fail 3>
because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes
and fails
has been consumed.
- Default
-
false
Specifies the minimal number of characters that must be left on a line. This prevents breaking of words below its value.
- Default
-
2
Specifies handling method for numerical data. Allowed values include:
-
NUMBERS_NORMAL
-
NUMBERS_ALL_PLACES
-
NUMBERS_SKIP_NAN
-
NUMBERS_ALL_AND_SKIP
- Default
Proc to be called for page feed text. This is also called at the start of each page, but does not count towards page length.
- Default
-
CLEAR_BLOCK
Specifies the total number of lines in a page (including headers, footers, and page-feeds).
- Default
-
nil
Where to start page numbering.
- Default
-
nil
Specifies the total number of columns in a page.
- Default
-
72
If true
, causes any sequence of spaces and/or tabs (but not newlines) in an interpolated string to be replaced with a single space.
- Default
-
false
Controls trimming of whitespace at end of lines.
- Default
-
true
Public Class Methods
Takes a bat
string as argument, breaks by looking for that substring and breaking just after it.
# File lib/text/reform.rb, line 1296 def break_at(bat) BreakAt.new(bat) end
Hyphenates with a class that implements the API of TeX::Hyphen or Text::Hyphen.
# File lib/text/reform.rb, line 1307 def break_hyphenator(hyphenator) BreakHyphenator.new(hyphenator) end
Takes a hyphen
string as argument, breaks by inserting that hyphen into the word to be hyphenated.
# File lib/text/reform.rb, line 1290 def break_with(hyphen) BreakWith.new(hyphen) end
Breaks by using a 'wrap and slop' algorithm.
# File lib/text/reform.rb, line 1301 def break_wrap BreakWrap.new end
Create a Text::Reform
object. Accepts an optional hash of construction option (this will change to named parameters in Ruby 2.0). After the initial object is constructed (with either the provided or default values), the object will be yielded (as self
) to an optional block for further construction and operation.
# File lib/text/reform.rb, line 918 def initialize(options = {}) #:yields self: @debug = options[:debug] || false @header = options[:header] || CLEAR_BLOCK @footer = options[:footer] || CLEAR_BLOCK @page_feed = options[:page_feed] || CLEAR_BLOCK @page_len = options[:page_len] || nil @page_num = options[:page_num] || nil @page_width = options[:page_width] || 72 @break = options[:break] || Text::Reform.break_with('-') @min_break = options[:min_break] || 2 @squeeze = options[:squeeze] || false @fill = options[:fill] || false @filler = options[:filler] || { :left => ' ', :right => ' ' } @interleave = options[:interleave] || false @numeric = options[:numeric] || 0 @trim = options[:trim] || false yield self if block_given? end
Public Instance Methods
Construct a type that can be passed to replace
from last a string.
# File lib/text/reform.rb, line 1408 def __construct_type(str, justifiedPattern) if str =~ /#{justifiedPattern}/x 'J' else str end end
Count occurrences of n (lines) of all strings that are passed as parameter.
# File lib/text/reform.rb, line 1401 def count_lines(*args) args.inject(0) do |sum, el| sum + el.count("\n") end end
Turn on internal debugging output for the duration of the block.
# File lib/text/reform.rb, line 1280 def debug d = @debug @debug = true yield @debug = d end
Format data according to format
.
# File lib/text/reform.rb, line 939 def format(*args) @page_num ||= 1 __debug("Acquiring header and footer: ", @page_num) header = __header(@page_num) footer = __footer(@page_num, false) previous_footer = footer line_count = count_lines(header, footer) hf_count = line_count text = header format_stack = [] while (args and not args.empty?) or (not format_stack.empty?) __debug("Arguments: ", args) __debug("Formats left: ", format_stack) if format_stack.empty? if @interleave # split format in its parts and recombine line by line format_stack += args.shift.split(%r{\n}o).collect { |fmtline| fmtline << "\n" } else format_stack << args.shift end end format = format_stack.shift parts = format.split(%r{ ( # Capture \n | # newline... OR (?:\\.)+ | # one or more escapes... OR #{FIELDPAT} | # patterns )}ox) parts << "\n" unless parts[-1] == "\n" __debug("Parts: ", parts) # Count all fields (inject 0, increment when field) and prepare # data. field_count = parts.inject(0) do |count, el| if (el =~ /#{LFIELDMARK}/ox or el =~ /#{FIELDMARK}/ox) count + 1 else count end end if field_count.nonzero? data = args.first(field_count).collect do |el| if el.kind_of?(Array) el.join("\n") else el.to_s end end # shift all arguments that we have just consumed args = args[field_count..-1] # Is argument count correct? data += [''] * (field_count - data.length) unless data.length == field_count else data = [[]] # one line of data, contains nothing end first_line = true data_left = true while data_left idx = 0 data_left = false parts.each do |part| # Is part an escaped format literal ? if part =~ /\A (?:\\.)+/ox __debug("esc literal: ", part) text << part.gsub(/\\(.)/, "\1") # Is part a once field mark ? elsif part =~ /(#{LFIELDMARK})/ox if first_line type = __construct_type($1, LJUSTIFIED) __debug("once field: ", part) __debug("data is: ", data[idx]) text << replace(type, part.length, data[idx]) __debug("data now: ", data[idx]) else text << (@filler[:left] * part.length)[0, part.length] __debug("missing once field: ", part) end idx += 1 # Is part a multi field mark ? elsif part =~ /(#{FIELDMARK})/ox and part[0, 2] != '~~' type = __construct_type($1, BJUSTIFIED) __debug("multi field: ", part) __debug("data is: ", data[idx]) text << replace(type, part.length, data[idx]) __debug("text is: ", text) __debug("data now: ", data[idx]) data_left = true if data[idx].strip.length > 0 idx += 1 # Part is a literal. else __debug("literal: ", part) text << part.gsub(/\0(\0*)/, '\1') # XXX: What is this gsub for ? # New line ? if part == "\n" line_count += 1 if @page_len && line_count >= @page_len __debug("\tejecting page: #@page_num") @page_num += 1 page_feed = __pagefeed header = __header(@page_num) text << footer + page_feed + header previous_footer = footer footer = __footer(@page_num, false) line_count = hf_count = (header.count("\n") + footer.count("\n")) header = page_feed + header end end end # multiway if on part end # parts.each __debug("Accumulated: ", text) first_line = false end end # while args or formats left # Adjust final page header or footer as required if hf_count > 0 and line_count == hf_count # there is a header that we don't need text.sub!(/#{Regexp.escape(header)}\Z/, '') elsif line_count > 0 and @page_len and @page_len > 0 # missing footer: text << "\n" * (@page_len - line_count) + footer previous_footer = footer end # Replace last footer if previous_footer and not previous_footer.empty? lastFooter = __footer(@page_num, true) footerDiff = lastFooter.count("\n") - previous_footer.count("\n") # Enough space to squeeze the longer final footer in ? if footerDiff > 0 && text =~ /(#{'^[^\S\n]*\n' * footerDiff}#{Regexp.escape(previous_footer)})\Z/ previous_footer = $1 footerDiff = 0 end # If not, create an empty page for it. if footerDiff > 0 @page_num += 1 lastHeader = __header(@page_num) lastFooter = __footer(@page_num, true) text << lastHeader text << "\n" * (@page_len - lastHeader.count("\n") - lastFooter.count("\n")) text << lastFooter else lastFooter = "\n" * (-footerDiff) + lastFooter text[-(previous_footer.length), text.length] = lastFooter end end # Trim text text.gsub!(/[ ]+$/m, '') if @trim text end
Quotes any characters that might be interpreted in str
to be normal characters.
# File lib/text/reform.rb, line 1273 def quote(str) puts 'Text::Reform warning: not quoting string...' if @debug str end
Replaces a placeholder with the text given. The format
string gives the type of the replace match: When exactly two chars, this indicates a text replace field, when longer, this is a numeric field.
# File lib/text/reform.rb, line 1118 def replace(format, length, value) text = '' remaining = length filled = 0 __debug("value is: ", value) if @fill value.sub!(/\A\s*/m, '') else value.sub!(/\A[ \t]*/, '') end if value and format.length > 2 # find length of numerical fields if format =~ /([\]>]+)#{Regexp.escape(DECIMAL)}([\[<]+)/ ilen, dlen = $1.length, $2.length end # Try to extract a numeric value from +value+ done = false while not done num, extra = scanf_remains(value, "%f") __debug "Number split into: ", [num, extra] done = true if extra.length == value.length value.sub!(/\s*\S*/, '') # skip offending non number value if (@numeric & NUMBERS_SKIP_NAN) > 0 && value =~ /\S/ __debug("Not a Number, retrying ", value) done = false else text = '?' * ilen + DECIMAL + '?' * dlen return text end end end num = num.first if num.kind_of?(Array) __debug("Finally number is: ", num) formatted = "%#{format.length}.#{dlen}f" % num if formatted.length > format.length text = '#' * ilen + DECIMAL + '#' * dlen else text = formatted end __debug("Formatted number is: ", text) # Only output significant digits. Unless not all places were # explicitly requested or the number has more digits than we just # output replace trailing zeros with spaces. unless (@numeric & NUMBERS_ALL_PLACES > 0) or num.to_s =~ /#{Regexp.escape(DECIMAL)}\d\d{#{dlen},}$/ text.sub!(/(#{Regexp.escape(DECIMAL)}\d+?)(0+)$/) do |mv| $1 + ' ' * $2.length end end value.replace(extra) remaining = 0 else while !((value =~ /\S/o).nil?) # Only whitespace remaining ? if ! @fill && value.sub!(/\A[ \t]*\n/, '') filled = 2 break end break unless value =~ /\A(\s*)(\S+)(.*)\z/om; ws, word, extra = $1, $2, $3 # Replace all newlines by spaces when fill was specified. nonnl = (ws =~ /[^\n]/o) if @fill ws.gsub!(/\n/) do |match| nonnl ? '' : ' ' end end # Replace all whitespace by one space if squeeze was specified. lead = @squeeze ? (ws.length > 0 ? ' ' : '') : ws match = lead + word __debug("Extracted: ", match) break if text and match =~ /\n/o if match.length <= remaining __debug("Accepted: ", match) text << match remaining -= match.length value.replace(extra) else __debug("Need to break: ", match) if (remaining - lead.length) >= @min_break __debug("Trying to break: ", match) broken, left = @break.break(match, remaining, length) text << broken __debug("Broke as: ", [broken, left]) value.replace left + extra # Adjust remaining chars, but allow for underflow. t = remaining-broken.length if t < 0 remaining = 0 else remaining = t end end break end filled = 1 end end if filled.zero? and remaining > 0 and value =~ /\S/ and text.empty? value.sub!(/^\s*(.{1,#{remaining}})/, '') text = $1 remaining -= text.length end # Justify format? if text =~ / /o and format == 'J' and value =~ /\S/o and filled != 2 # Fully justified text.reverse! text.gsub!(/( +)/o) do |mv| remaining -= 1 if remaining > 0 " #{$1}" else $1 end end while remaining > 0 text.reverse! elsif format =~ /\>|\]/o # Right justified text[0, 0] = (@filler[:left] * remaining)[0, remaining] if remaining > 0 elsif format =~ /\^|\|/o # Center justified half_remaining = remaining / 2 text[0, 0] = (@filler[:left] * half_remaining)[0, half_remaining] half_remaining = remaining - half_remaining text << (@filler[:right] * half_remaining)[0, half_remaining] else # Left justified text << (@filler[:right] * remaining)[0, remaining] end text end
Using Scanf module, scanf a string and return what has not been matched in addition to normal scanf return.
# File lib/text/reform.rb, line 1388 def scanf_remains(value, fstr, &block) if block.nil? unless fstr.kind_of?(Scanf::FormatString) fstr = Scanf::FormatString.new(fstr) end [ fstr.match(value), fstr.string_left ] else value.block_scanf(fstr, &block) end end
Adds a n character to the end of the line unless it already has a n at the end of the line. Returns a modified copy of str
.
# File lib/text/reform.rb, line 1418 def unchomp(str) unchomp!(str.dup) end
Adds a n character to the end of the line unless it already has a n at the end of the line.
# File lib/text/reform.rb, line 1424 def unchomp!(str) if str.empty? or str[-1] == ?\n str else str << "\n" end end
Private Instance Methods
Debug output. Message msg
is printed at start of line, then obj
is output using pp
.
# File lib/text/reform.rb, line 1434 def __debug(msg, obj = nil) return unless @debug require 'pp' print msg pp obj end
Return the header to use. Header can be in many formats, refer yourself to the documentation.
# File lib/text/reform.rb, line 1314 def __header(page_num) __header_or_footer(@header, page_num, false) end
Use the page_feed
attribute to get the page feed text. page_feed
can contain a block to call or a String.
# File lib/text/reform.rb, line 1377 def __pagefeed if @page_feed.respond_to?(:call) @page_feed.call(@page) else @page_feed end end