xtr - the single pass file/stream translation utility
xtr is a public-domain, single-pass file or stream translation
utility. The sources are available for download
and have been published in the Usenet groups
alt.sources.d
and alt.sources.
It was originally intended as an extension of the standard GNU "tr" tool,
which is why it is called xtr - "eXtended tr".
It recognizes a set of byte or character patterns in the input stream
and replaces each one with its defined replacement in the output stream.
The program correctly handles chained (cyclic) replacements such as:
- "aa" -> "ab"
- "ab" -> "cd"
- "ee" -> "er"
In this case the "ab" strings introduced by the translation of "aa"
sequences will not be converted into "cd" strings. Only the
original "ab" sequences will be translated.
The program also correctly handles patterns that share a common prefix
but differ in length, for example:
- "an" -> "string1"
- "ana" -> "string2"
It will always recognize and use the longest matching pattern.
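The two rules above (the output is never rescanned, and the longest pattern wins) can be illustrated with a small Python reference model. This is only a sketch of the described behaviour for input held in memory, not the actual xtr source; the function name translate and the dictionary-based table are assumptions made for the illustration.

```python
def translate(data, table):
    """Left-to-right, longest-match translation of a bytes object.

    Replacements are written to the output only, so they are
    never matched again (an "ab" produced from "aa" stays "ab").
    """
    # Try longer patterns first so the longest match always wins.
    patterns = sorted((p for p in table if p), key=len, reverse=True)
    out = bytearray()
    i = 0
    while i < len(data):
        for p in patterns:
            if data.startswith(p, i):
                out += table[p]      # emit the replacement
                i += len(p)          # skip the matched input
                break
        else:
            out.append(data[i])      # no pattern here: copy one byte
            i += 1
    return bytes(out)


rules = {b"aa": b"ab", b"ab": b"cd", b"ee": b"er"}
print(translate(b"aaab", rules))   # b'abcd'
```

In the example, the leading "aa" becomes "ab" and only the original trailing "ab" becomes "cd".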
The algorithm
Characters read from the input stream are appended to a buffer.
After each new character is added, the contents of the buffer are compared
with the set of patterns. As long as some pattern can still match,
the next character is added. If, after adding a new character, no pattern
matches, the longest pattern found so far is translated: the pattern
is removed from the buffer and its replacement is written to the
output. If no pattern matched at all, the first character
of the buffer is removed and written to the output, and
pattern matching is repeated on the remaining buffer contents (if a
pattern longer than the current buffer contents may still match, more
characters are added first).
The longest matching pattern is then translated as before.
If there is still no matching pattern, the first
character is again written and removed, and the process repeats.
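The buffer-based procedure described above might be sketched in Python as follows. This is a simplified model under stated assumptions, not the original implementation: the name xtr_stream, the byte-at-a-time reads, and the dictionary table are all hypothetical choices for the illustration.

```python
import io


def xtr_stream(src, dst, table):
    """Translate src to dst in one pass using a look-ahead buffer."""
    patterns = [p for p in table if p]
    buf = bytearray()
    eof = False
    while True:
        # Keep reading while the buffer is empty or is a proper
        # prefix of some pattern (a longer match is still possible).
        while not eof and (
            not buf
            or any(len(p) > len(buf) and p.startswith(bytes(buf))
                   for p in patterns)
        ):
            ch = src.read(1)
            if ch:
                buf += ch
            else:
                eof = True
        if not buf:
            break  # input exhausted and buffer drained
        # Longest pattern that matches at the start of the buffer.
        best = max((p for p in patterns if buf.startswith(p)),
                   key=len, default=None)
        if best is None:
            dst.write(buf[:1])       # no match: pass one byte through
            del buf[:1]
        else:
            dst.write(table[best])   # match: emit the replacement
            del buf[:len(best)]


src = io.BytesIO(b"banana")
dst = io.BytesIO()
xtr_stream(src, dst, {b"an": b"string1", b"ana": b"string2"})
print(dst.getvalue())   # b'bstring2na'
```

The buffer never grows beyond the length of the longest pattern, which is what allows the translation to run on a stream of unknown length.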
The calling syntax
The program should be called as follows.
- xtr -tpattern,replacement -tpattern2,replacement2 < file_in > file_out
- xtr -fdefinitions_file < file_in > file_out
The pattern and the replacement are character (byte) strings
that may contain:
- normal characters
- bytes defined with hexadecimal digits, e.g. '\xa4' (without the quotes,
always two digits)
- bytes defined with decimal digits, e.g. '\d013' (without the quotes,
always three digits)
- bytes defined with octal digits, e.g. '\o273' (without the quotes,
always three digits)
The characters '\' and ',' must be escaped with a backslash '\'.
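To make the escape notation concrete, here is a small Python sketch that decodes a pattern or replacement string into raw bytes. It is only an illustration of the rules listed above: the name decode is hypothetical, and treating a backslash followed by any other character as that literal character is an assumption, since the text only specifies '\\' and '\,'.

```python
import re

# \xHH (two hex digits), \dDDD (three decimal digits),
# \oOOO (three octal digits), or a backslash-escaped character.
_ESCAPE = re.compile(
    rb"\\(?:x([0-9a-fA-F]{2})|d([0-9]{3})|o([0-7]{3})|(.))",
    re.DOTALL,
)


def decode(spec):
    """Turn an escaped pattern/replacement string into raw bytes."""
    def expand(m):
        hex_digits, dec_digits, oct_digits, literal = m.groups()
        if hex_digits is not None:
            return bytes([int(hex_digits, 16)])
        if dec_digits is not None:
            return bytes([int(dec_digits)])    # ValueError above 255
        if oct_digits is not None:
            return bytes([int(oct_digits, 8)])  # ValueError above 255
        return literal                          # e.g. \\ or \,
    return _ESCAPE.sub(expand, spec)


print(decode(rb"\d013\xa4"))   # b'\r\xa4'
```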
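When defining a translation with the '-t' option, remember that the
shell processes backslashes, quotes and other special characters before
xtr sees them, so quote or escape the argument accordingly.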
The definitions_file should contain lines of the following form:
pattern,replacement
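Because a comma inside a pattern must be escaped, a definition line has to be split at the first unescaped comma rather than at the first comma. The following Python sketch shows one way this could be done; the name split_definition is hypothetical, and leaving the remainder of the line untouched as the replacement is an assumption about the format.

```python
def split_definition(line):
    """Split b'pattern,replacement' at the first unescaped comma."""
    i = 0
    while i < len(line):
        if line[i:i + 1] == b"\\":
            i += 2                    # skip the backslash and the escaped byte
        elif line[i:i + 1] == b",":
            return line[:i], line[i + 1:]
        else:
            i += 1
    raise ValueError("definition line has no unescaped comma")


print(split_definition(rb"a\,b,c"))   # (b'a\\,b', b'c')
```

Both parts are returned still escaped, so any escape sequences in them remain to be decoded afterwards.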
The program can be used in a pipe, e.g.:
cat file1 | xtr -f test.tr | program2
The code is not very clean. I wrote this tool about 8 years ago
and still have not had time to polish it. So I finally decided to release
it "as is", placing it in the public domain.
It could probably be made much cleaner and much more efficient,
but even in its current form it has been very useful to me.
Wojciech Zabolotny
Last modified: Fri Apr 9 11:44:51 CEST 2004