
xtr - the single pass file/stream translation utility

"xtr" is a public domain, single pass file or stream translation utility. The sources are available there, and have been published in the Usenet groups alt.sources.d and alt.sources.

In the beginning it was intended to be an extension of the standard GNU "tr" tool, which is why it is called xtr - "eXtended tr".

It can recognize a set of byte or character patterns in the input stream and replace each of them with its defined replacement in the output stream.

The program correctly resolves cyclic replacements like:

In this case the "ab" strings introduced by the translation of "aa" sequences will not be converted into "cd" strings. Only the original "ab" sequences will be translated.

Additionally, the program correctly handles similar patterns of different lengths, like below:

It will always recognize and use the longest matching pattern.

The algorithm

The characters read from the input stream are appended to a buffer. After each new character is added, the contents of the buffer are compared with the set of patterns. As long as some pattern still matches (that is, the buffer may still be the beginning of a pattern), the next character is read and added. If no matching pattern is found after adding a new character, the longest pattern found before is translated: the pattern is removed from the buffer, and its replacement is written to the output. If there were no matching patterns at all, the first character is removed from the buffer and written to the output unchanged, and pattern matching is repeated on the remaining buffer contents (if some pattern that may still match is longer than the current buffer contents, more characters are read first). The longest matching pattern is then translated as before. If there are no matching patterns, the first character is again written and removed, and the process is repeated.
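The matching rule described above (take the longest pattern that starts at the current position, and never rescan text that has already been written out) can be illustrated with a short Python sketch. This is not the xtr source: the function name `translate` and the `rules` dictionary are made up for the illustration, and the sketch works on an in-memory byte string instead of a stream buffer, so it only shows the expected result, not the buffering.

```python
def translate(data: bytes, rules: dict[bytes, bytes]) -> bytes:
    """Replace patterns in one pass, always preferring the longest match."""
    out = bytearray()
    # Length of the longest pattern; 0 when there are no rules at all.
    max_len = max((len(p) for p in rules), default=0)
    i = 0
    while i < len(data):
        match = None
        # Try candidate lengths from longest to shortest at position i.
        for n in range(min(max_len, len(data) - i), 0, -1):
            candidate = data[i:i + n]
            if candidate in rules:
                match = candidate
                break
        if match is not None:
            # Emit the replacement; it is never examined again.
            out += rules[match]
            i += len(match)
        else:
            # No pattern starts here: copy one byte unchanged.
            out.append(data[i])
            i += 1
    return bytes(out)
```

With the rules "aa" to "ab" and "ab" to "cd", `translate(b"aaab", {b"aa": b"ab", b"ab": b"cd"})` returns `b"abcd"`: the "ab" produced from "aa" is left alone, and only the original "ab" becomes "cd".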

The calling syntax

The program should be called as follows.

  1. xtr -tpattern,replacement -tpattern2,replacement2 < file_in > file_out
  2. xtr -fdefinitions_file < file_in > file_out
The pattern and the replacement are character (byte) strings which may contain:
  1. normal characters
  2. bytes defined with hexadecimal digits, e.g. '\xa4' (without quotes, always two digits)
  3. bytes defined with decimal digits, e.g. '\d013' (without quotes, always three digits)
  4. bytes defined with octal digits, e.g. '\o273' (without quotes, always three digits)
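The characters '\' and ',' must be escaped with a backslash '\'. When defining a translation with the '-t' option, remember that the shell interprets backslashes and quotes before xtr sees the argument, so quote the argument or escape the backslashes once more.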
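To make the escape rules above concrete, here is a small Python sketch that decodes one pattern or replacement string into bytes. It is an illustration only, not code taken from xtr: the names `NUMERIC_ESCAPES` and `decode_field` are hypothetical, and the handling of malformed escapes (raising an error) and of a backslash before any other character (the character stands for itself) are assumptions, since the text does not say what xtr does in those cases. Normal characters are assumed to map to single bytes (Latin-1).

```python
# Escape letter -> (numeric base, number of digits, allowed digit characters).
NUMERIC_ESCAPES = {
    "x": (16, 2, "0123456789abcdefABCDEF"),
    "d": (10, 3, "0123456789"),
    "o": (8, 3, "01234567"),
}


def decode_field(text: str) -> bytes:
    """Decode \\xHH, \\dDDD, \\oOOO, \\\\ and \\, into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Normal character: copied as a single byte.
            out += ch.encode("latin-1")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of field")
        kind = text[i + 1]
        if kind in NUMERIC_ESCAPES:
            base, width, allowed = NUMERIC_ESCAPES[kind]
            digits = text[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in allowed for d in digits):
                raise ValueError(f"malformed escape at offset {i}")
            value = int(digits, base)
            if value > 255:
                raise ValueError(f"escape at offset {i} does not fit in a byte")
            out.append(value)
            i += 2 + width
        else:
            # Assumption: an escaped '\\', ',' or other character is itself.
            out += kind.encode("latin-1")
            i += 2
    return bytes(out)
```

For example, `decode_field(r"\d013")` gives the single byte 13 (carriage return), and `decode_field(r"a\,b")` gives `b"a,b"`.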

The definitions_file should contain lines of the following form:

pattern,replacement
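Because a comma inside a pattern has to be escaped, a definition line is split at its first unescaped comma. The Python sketch below shows one way to read such lines; the function names `split_definition` and `read_definitions` are hypothetical, and skipping empty lines is an assumption rather than documented xtr behaviour. The escapes themselves are left undecoded here.

```python
from typing import Iterable


def split_definition(line: str) -> tuple[str, str]:
    """Split 'pattern,replacement' at the first comma not preceded by a backslash."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
        elif line[i] == ",":
            return line[:i], line[i + 1:]
        else:
            i += 1
    raise ValueError(f"no unescaped comma in definition: {line!r}")


def read_definitions(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Collect (pattern, replacement) pairs from the lines of a definitions file."""
    rules = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # Assumption: empty lines carry no definition and are ignored.
            continue
        rules.append(split_definition(line))
    return rules
```

A file object can be passed directly, e.g. `read_definitions(open("test.tr"))`, since iterating over a text file yields its lines.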

The program can be used in a pipe, e.g.:

cat file1 | xtr -f test.tr | program2

The code is not very clean. I wrote this tool about 8 years ago and still haven't had time to polish it. So finally I've decided to release it "as is", placing it in the public domain. It could probably be made much cleaner and much more efficient, but even in its current form it has been very useful to me.


Wojciech Zabolotny
Last modified: Fri Apr 9 11:44:51 CEST 2004