*************************************************************
TxTool, a Utility for Word Processing
(c) 1988 by Don E. Farmer
16810 Deer Creek Dr.
Spring, TX 77379
This is SHAREWARE and may not be sold by anyone.
PLEASE DONATE $5.00 FOR THIS PROGRAM.
**************************************************************
TxTool is a GEM application for the Atari ST. It is a utility program
that was designed to be used as an adjunct to a word processor. Using
it will increase your word processing power.
TxTool computes word counts, checks spelling, reports questionable
usage of English, and performs search-and-replace operations specified
by a file. Except for word counts, TxTool requires the use of
auxiliary files, which are termed "dictionaries." You build these with
your word processor. How powerful TxTool is for you depends largely on
the effort you put into tailoring these dictionaries to meet your needs.
(I have included some of the dictionaries I use in this arc. Please
note that the spelling dictionary is only a start. You may wish to
purchase one from Austin Code Works, rather than making your own.)
When this GEM application is opened, the menu bar displays "Desk",
"File", "Options", and "Help." Under "Help" are the selections
"General", "Counts", "Spell", "Usage", and "StrSub." Selecting these
leads to dialog boxes that serve to remind you how to operate TxTool.
They cannot take the place of the information provided here.
In its general operation, TxTool reads an input text file, which should
be ASCII for reliable results, along with the appropriate dictionary,
and then writes its results to an output file. The "Counts" option
requires no dictionary. In all cases, the input text and the dictionary
are not altered when TxTool is run. An alternative to the mouse is
also provided: typing the key combination "Control I", for example,
displays the File Selector for the path name of the input stream.
The key combinations are displayed along with what they select in the
menu, the character "^" designating "Control." Using the keystrokes
is faster but not easier than using the mouse.
The counts that TxTool does are of words, sentences, words per
sentence, and word frequency, this being an alphabetized list of every
word used in the input text along with the number of its occurrences.
The algorithm to do this comes from a slight modification of one given
in Kernighan & Ritchie's, The C Programming Language, as does the
binary search function used in the spelling checker.
A "word" is a string, no longer than 32 characters, of ASCII letters;
the punctuation marks, apostrophe and hyphen, are also included. A
sentence is at most 512 characters long and is a string of words
terminated by a period, a question mark, or an exclamation mark. This
punctuation is necessary, for TxTool processes sentences, not lines,
and will not work without it. The number of words and sentences is
often useful. The distribution of words per sentence can indicate how
much variety in sentence length there is. And, with the word frequency
list, the overuse of a particular word is easily spotted. Also, this
list can be loaded into a word processor and edited to add to a
spelling dictionary.
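Under the definition above, a word counter is only a few lines of C.
This is my own illustrative sketch, not TxTool's actual source, but it
follows the same rule: a word is a run of ASCII letters, apostrophes,
and hyphens.

```c
#include <ctype.h>

/* Count words in a string under TxTool's definition of a word:
   a run of ASCII letters, apostrophes, and hyphens.
   (An illustrative sketch, not TxTool's shipped code.) */
int count_words(const char *s)
{
    int count = 0, in_word = 0;

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;
        if (isalpha(c) || c == '\'' || c == '-') {
            if (!in_word) {      /* first character of a new word */
                in_word = 1;
                count++;
            }
        } else {
            in_word = 0;         /* anything else ends the word */
        }
    }
    return count;
}
```

Note that by this rule "over-use" and "Don't" each count as a single
word, while the trailing period does not extend a word.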
The spelling dictionary is an ASCII file containing a list of
alphabetized words, one word of at most 32 characters for each line in
the file. The words should have neither leading nor trailing spaces
nor anything embedded in them that is not a letter, an apostrophe, or
a hyphen. This dictionary is alphabetized in ASCII order. If you are
building a "SPELL.DIC" and are using a line sort that allows
"dictionary order", do not use it! Use the standard ASCII sort instead.
The spelling checker is not sensitive to case, and apostrophes are
significant, being considered letters and part of the words they are
in. The checker does a binary search of the dictionary for
each word in the input text, writing the words it does not find to the
output file. You do not have to listen to a chorus of dings nor tire
your trigger finger clicking repeatedly on "OK." You can load the
results of the spelling checker into your word processor, edit them,
and then let "StrSub" make the corrections for you. The checker's search
is efficient, particularly so since no data compaction or pointer
hashing is done. (Isn't cheap memory wonderful?)
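A lookup of this kind might be sketched as below. The function name
and details are my own, not TxTool's source; I assume the dictionary
is stored in lower case in plain ASCII order, so the checker folds
each input word to lower case before comparing.

```c
#include <ctype.h>
#include <string.h>

#define MAXWORD 32

/* Case-insensitive binary search of a dictionary sorted in plain
   ASCII (lower-case) order.  Returns 1 if the word is found, 0 if
   not.  (A hypothetical sketch, not TxTool's actual code.) */
int lookup(const char *word, const char *dict[], int n)
{
    char key[MAXWORD + 1];
    int i, lo, hi;

    /* fold the word to lower case, up to the 32-character limit */
    for (i = 0; word[i] != '\0' && i < MAXWORD; i++)
        key[i] = (char)tolower((unsigned char)word[i]);
    key[i] = '\0';

    lo = 0;
    hi = n - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int cmp = strcmp(key, dict[mid]);
        if (cmp == 0)
            return 1;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}
```

Each miss halves the remaining range, so even a 25K-word dictionary
needs at most about fifteen comparisons per word.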
To determine the number of words the spelling dictionary can hold, you
need to know how much free RAM remains when the program is executing.
One fourth of this RAM is allotted to pointers to the entries, while
the other three-fourths hold the actual text. Suppose, for example,
that 400K of RAM remained as heap, that is, as memory available for
dynamic allocation, once TxTool was resident. Then 100K would be taken
up by pointers, and since a pointer requires four bytes, the spelling
dictionary could hold at most 25K words. (Although there are about
500K words in the English language, it is said that the average person
uses fewer than 10K. I'm sure the owner of an ST uses more!) You can
approximate the heap by summing the sizes of TXTOOL.PRG, TXTOOL.RSC,
32K (TxTool takes this for its stack), and your desk accessories, and
then subtracting the total from your memory size. This, I must
emphasize, is only an approximation, for we are dependent on the
GEMDOS allocator Malloc(), which has been known to have its quirks.
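The arithmetic above fits in one line of C. This is a hypothetical
helper of my own, not part of TxTool; the pointer size is passed in
rather than assumed, since it is 4 bytes on the ST.

```c
/* Maximum spelling-dictionary entries for a given heap size: one
   quarter of the heap holds the entry pointers, so the count is
   bounded by (heap / 4) / ptr_size.  (A hypothetical helper, not
   part of TxTool.) */
long max_entries(long heap_bytes, long ptr_size)
{
    return (heap_bytes / 4) / ptr_size;
}
```

With a 400K heap and 4-byte pointers this gives 25,600 entries, the
"25K words" of the example above.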
A usage dictionary will help you to avoid mistakes in idiom such as
using "off of" for "off" and "over with" for "over." It will help you
to avoid trite and redundant expressions such as "neither rhyme nor
reason" and "a smile on his face." (Where but on a face would a smile
be?) A usage dictionary can have up to 7000 lines, each entry taking
two lines, and no line being longer than 64 characters. The first line
might be "a smile on his face" and the second, the report "REDUNDANT."
For each sentence in the input text, TxTool searches the target lines
in the usage dictionary for a match. When a match is made, the line
following the target is included in the report that is written to the
output file. The target line allows the character '?' to serve as a
wild card character in the searches. Thus, "h??" could be either "him"
or "her."
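The kind of scan the usage checker does might look like the sketch
below. The function name is my own invention, not TxTool's source;
it slides the target along the sentence, letting '?' match any single
character.

```c
#include <string.h>

/* Does target (which may contain '?' wild cards) occur anywhere in
   sentence?  Returns 1 on a match, 0 otherwise.  (A sketch of the
   usage-dictionary scan, not TxTool's actual code.) */
int wild_search(const char *sentence, const char *target)
{
    size_t tlen = strlen(target);
    size_t slen = strlen(sentence);
    size_t i, j;

    if (tlen == 0 || tlen > slen)
        return 0;
    for (i = 0; i + tlen <= slen; i++) {
        for (j = 0; j < tlen; j++)
            if (target[j] != '?' && target[j] != sentence[i + j])
                break;           /* mismatch: slide target along */
        if (j == tlen)
            return 1;            /* every position matched */
    }
    return 0;
}
```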
I have found it convenient to have a number of usage dictionaries,
"IDIOM.DIC", "TRITE.DIC", "WORDY.DIC", for example, and name my output
files "*.IDM", "*.TRI", and "*.WOR." This is just a suggestion,
however, as TxTool allows you to name your files anything you wish.
Two warnings are in order. First, the dictionary search is necessarily
linear, so if you have 3500 trite expressions, and there might well be
that many, then you might want to go for a nice long walk because
searching each sentence of your input 3500 times will take a while.
Second, and I suppose there is some humor in this, working with trite
expressions is like being around the plague: you're apt to catch it!
You find yourself using ones you had never heard of until you started
searching for them in Fowler's Modern English Usage. The ones new to
you sound pretty good; that is why, of course, they got worn out so
readily.
The dictionary for string substitution, "StrSub", also has two lines
for each entry, each no longer than 64 characters. The first line,
again, is a target string; but the second is the replacement text. When
the target is matched in the input file, the replacement text is
substituted for it. Again, the input text is searched linearly, and
only once in each sentence is a target string replaced.
For example, let's say you wanted to replace your "cats" with "dogs" in
your text. Your dictionary entry would read:
cats
dogs
and if your input text was:
Cats cats cats.
Your output would be:
Cats dogs cats.
The first "Cats" is not replaced because of case sensitivity, and the
last "cats" because only one replacement is made per sentence. In
practice this is not much of a restriction, as you can "bucket
brigade" your files for multiple passes. You can use StrSub as a
gender changer for him's to her's, he's to she's, etc., if you are
writing reports concerning specific male and female "persons."
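The cats-and-dogs example above can be sketched as a single C
function. The name and buffer handling are my own, not TxTool's
source; the point is the behavior: a case-sensitive match, replaced
at most once.

```c
#include <stdio.h>
#include <string.h>

/* Replace the first (case-sensitive) occurrence of target in
   sentence with repl, writing the result to out (of size outsize).
   Returns 1 if a substitution was made, 0 if not.  (A sketch of one
   StrSub substitution, not TxTool's actual code.) */
int strsub_once(const char *sentence, const char *target,
                const char *repl, char *out, size_t outsize)
{
    const char *hit = strstr(sentence, target);

    if (hit == NULL) {
        /* no match: copy the sentence through unchanged */
        snprintf(out, outsize, "%s", sentence);
        return 0;
    }
    /* text before the match, the replacement, text after the match */
    snprintf(out, outsize, "%.*s%s%s",
             (int)(hit - sentence), sentence,
             repl, hit + strlen(target));
    return 1;
}
```

Run on "Cats cats cats." with target "cats" and replacement "dogs",
this yields "Cats dogs cats.", just as described above.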
Most of the constraints mentioned so far are manifest constants in the
C source code and can be changed should you compile it again. I used
MegaMax's Lazer C, but I see no reason why it could not be compiled
with another C. Making other changes to the source code is not recommended
unless you are an experienced C programmer. THE C SOURCE CODE IS
AVAILABLE FROM ME FOR $15.00.
Getting good dictionaries together is where your efforts will be
rewarded.
Writing is hard work. Trying to come up with the right words at the
right time is chore enough without worrying about your "off of"'s and
"acid test"'s. With TxTool to assist you, this editing can be done in
advance and need be done only once. Then you can allow the muse to
flow freely. But do watch out for those "aching voids" and "blushing
brides!"