ENCode TeXT

I often want to post text files (like C source code) in various online
forums without having to set up a download somewhere, so that people can
just "grab the text" directly from the forum posting.

Unfortunately, some forums/readers "reformat" blocks of text when they are
posted. This may consist of joining lines shorter than an arbitrary length,
breaking longer lines, removing leading spaces, reducing multiple spaces/tabs,
disallowing or changing certain characters, and other things which tend to
make "source code" considerably less readable.

ENCTXT and DECTXT are my solution.

ENCTXT "encodes" a file into simple lines of text characters.
Files are encoded using a "64 code set" which contains only common printable
characters. Any other characters in the block are ignored by DECTXT.

Every possible byte value (0-255) which may occur in the source file is
encoded using this code set.
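As a minimal sketch, assuming the code set and the two-digit literal layout ("1111xx xxxxxx") from the technical notes, one byte can be turned into two printable characters and back like this. The helper names are hypothetical, not taken from the ENCTXT sources.

```c
#include <assert.h>
#include <string.h>

/* Code set from the technical notes: ':' is digit 0, ';' is digit 63. */
static const char CODESET[] =
    ":ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;";

/* Map a 6-bit value to its printable digit. */
static char digit_char(unsigned v)
{
    assert(v < 64);
    return CODESET[v];
}

/* Hypothetical helper: encode one byte as the literal 1111xx xxxxxx
   (prefix 1111 plus the top 2 bits, then the low 6 bits). */
static void encode_literal(unsigned char byte, char out[2])
{
    out[0] = digit_char(0x3Cu | (unsigned)(byte >> 6)); /* 0x3C = 111100 */
    out[1] = digit_char(byte & 0x3Fu);
}

/* Hypothetical inverse: returns the byte value, or -1 if the two
   characters are not a valid literal. */
static int decode_literal(const char in[2])
{
    const char *p0, *p1;
    int d0, d1;
    if (in[0] == '\0' || in[1] == '\0')
        return -1;                 /* strchr would match the terminator */
    p0 = strchr(CODESET, in[0]);
    p1 = strchr(CODESET, in[1]);
    if (p0 == NULL || p1 == NULL)
        return -1;                 /* character outside the code set */
    d0 = (int)(p0 - CODESET);
    d1 = (int)(p1 - CODESET);
    if (d0 < 60)
        return -1;                 /* no 1111 prefix: not a literal */
    return ((d0 & 3) << 6) | d1;
}
```

For example, 'e' (101) becomes "8k" and a space becomes "7f", both of which appear in the sample block in this document.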

DECTXT takes a file containing the encoded text, captured from the forum, and
decodes it back into the original source file.

Such blocks may be posted like this:
----------------------------------------------------------------------
8:8k8i8n8u7f8O8F8F7J8x8k8s:P:I7f8T8n8o8y7f:::T8g8t7f8k838g8s8v8r8k7f8u
8l7f8g7f8z8k838z7f8l8o:::h8k8t8i8u8j8k8j7f828o8z8n7f8E8N8C8T8X8T:f:M7f
7f8D8g818k7f8D808t8l8o8k8r8j:f:I:::z8z:::r8m7t8h8g8z7f8g7J
----------------------------------------------------------------------

The '-' lines are unimportant and can be left in the clipped block because
'-' is not in the code set and will be ignored.
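Because only code-set characters carry data, the first stage of a decoder can be sketched as a filter that keeps those characters and drops everything else. This is why the '-' lines, and any lines a forum joins or breaks, make no difference. `extract_digits` is a hypothetical name, assuming the code set from the technical notes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Code set from the technical notes: ':' is digit 0, ';' is digit 63. */
static const char CODESET[] =
    ":ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;";

/* Hypothetical input filter: store the digit value (0-63) of every
   code-set character in text and skip everything else (dashes, spaces,
   line breaks). Returns the number of digits stored, at most max. */
static size_t extract_digits(const char *text, unsigned char *digits, size_t max)
{
    size_t n = 0;
    assert(text != NULL);
    for (; *text != '\0' && n < max; text++) {
        const char *p = strchr(CODESET, *text); /* *text is never '\0' here */
        if (p != NULL)
            digits[n++] = (unsigned char)(p - CODESET);
    }
    return n;
}
```

The same filter also explains the warning that follows: a stray code-set character clipped along with the block is indistinguishable from real data.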

**Please make sure your block clipped from a newsgroup does NOT contain
any of the "64 code set" characters except the ones posted as part of the
block. For simplicity ENCTXT/DECTXT do not encapsulate the block in any
way, and you can get odd/random results if this is not heeded!

**This encoding of any possible byte value into "printable only" text causes
a basic 1->2 size increase. To help alleviate this, ENCTXT performs a very
rudimentary compression. This usually makes the output block a similar size
to the original file (sometimes smaller)!
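The size figures can be worked out from the entry formats in the technical notes. With L literal bytes and R back-references in the output:

```latex
\begin{aligned}
\text{encoded length} &= 2L + 4R \\
\text{saving from one reference to } n \text{ bytes} &= 2n - 4,
  \qquad 3 \le n \le 230
\end{aligned}
```

With no back-references the output is exactly 2 characters per input byte, the basic 1->2 increase. A reference to n bytes replaces 2n characters of literals with 4, so it only breaks even at n = 2 and starts saving at n = 3, which matches the minimum section size of 3.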

**This simple compression uses a fair number of "table lookups" and I've not
taken the time to optimize/streamline, so it can seem slow... Decompression
requires very little work and is therefore FAST!  Since forum posts should be
relatively small, only have to be ENCTXTed once and may be DECTXTed multiple
times, this seems reasonable to me.
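A minimal sketch of why compression costs far more than decompression, assuming a simple greedy longest-match search (ENCTXT's own table-driven search may well differ): for every position the encoder scans all earlier text, while the decoder only copies bytes. It emits the two entry formats from the technical notes with addresses counted from the start of the output; `enctxt_encode` and the constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Code set from the technical notes: ':' is digit 0, ';' is digit 63. */
static const char CODESET[] =
    ":ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;";

#define MIN_MATCH 3     /* smallest section worth a 4-digit reference */
#define MAX_MATCH 230   /* largest section size given in the notes */

static char digit_char(unsigned v)
{
    assert(v < 64);
    return CODESET[v];
}

/* Hypothetical greedy encoder. For each position it scans ALL earlier text
   for the longest match (roughly quadratic work, hence slow). `out` must
   hold at least 2 * len + 1 chars, the all-literal worst case.
   Returns the number of characters written. */
static size_t enctxt_encode(const unsigned char *in, size_t len, char *out)
{
    size_t pos = 0, o = 0;
    assert(len <= 65535);               /* addresses are 16 bits */
    while (pos < len) {
        size_t best_len = 0, best_addr = 0;
        for (size_t a = 0; a < pos; a++) {
            size_t n = 0;
            /* Extend the match, staying inside already-output text. */
            while (n < MAX_MATCH && pos + n < len && a + n < pos &&
                   in[a + n] == in[pos + n])
                n++;
            if (n > best_len) {         /* strict '>' keeps the earliest address */
                best_len = n;
                best_addr = a;
            }
        }
        if (best_len >= MIN_MATCH) {
            /* Four digits: ssssss ssaaaa aaaaaa aaaaaa */
            unsigned long bits = ((unsigned long)(best_len - MIN_MATCH) << 16)
                                 | (unsigned long)best_addr;
            out[o++] = digit_char((unsigned)(bits >> 18) & 0x3Fu);
            out[o++] = digit_char((unsigned)(bits >> 12) & 0x3Fu);
            out[o++] = digit_char((unsigned)(bits >> 6) & 0x3Fu);
            out[o++] = digit_char((unsigned)bits & 0x3Fu);
            pos += best_len;
        } else {
            /* Two digits: 1111xx xxxxxx */
            out[o++] = digit_char(0x3Cu | (unsigned)(in[pos] >> 6));
            out[o++] = digit_char(in[pos] & 0x3Fu);
            pos++;
        }
    }
    out[o] = '\0';
    return o;
}
```

Real code would also break the output into short lines for posting; since the decoder ignores line breaks, where they fall does not matter.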

**To keep encoding simple, these programs work with input files up to 64k
(65535 bytes) in size. As this is a HUGE amount of text to post, this also
seems reasonable.
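The 64k figure lines up with the four-digit entry format in the technical notes: four 6-bit digits carry 24 bits, 8 for the size field s and 16 for the address a, so a can reach any offset in a file of up to 65535 bytes:

```latex
\underbrace{4 \times 6}_{\text{four digits}} = 24 \text{ bits}
  = \underbrace{8}_{s} + \underbrace{16}_{a},
\qquad 0 \le a \le 2^{16} - 1 = 65535
```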

I have included .DVM binaries which run under "Dunfield Virtual Machine".
DVM does not have to be "installed", and lets you run .DVMs directly from the
host command line. You can download free Windows/Linux editions of DVM from
my site (see below).

I have also included ENCTXT.C and DECTXT.C which should compile with few
changes on other platforms.

I have confirmed that they will compile "as is" with:
    My own Micro-C compiler (DOS and DVM)
    LCCWIN32                (Windows)
    GCC                     (Linux and other)

Technical notes:
----------------
To make the output all printable, ENCTXT uses base-64 digits drawn from
this code set:
    :ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;
    0                                                              63

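Each entry is either two digits:  1111xx xxxxxx
    which is a literal byte to write to the output.
    x = 8-bit byte value.
or four digits: ssssss ssaaaa aaaaaa aaaaaa
    which references a section of previously output text that is duplicated
    here.
    s = size of section minus 3 (sizes 3 to 230, so s = 0 to 227).
    a = address of the section, counted from the start of the output.
Since s is at most 227, the first digit of a four-digit entry is at most 56
and never has the 1111 prefix (60-63), so the first digit alone tells which
kind of entry follows.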
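A minimal decoder sketch following the entry format above: skip any character outside the code set, treat a first digit of 60-63 as a literal and anything else as a four-digit back-reference, and copy referenced sections from the output produced so far. It assumes, as the sample block suggests, that addresses count from the start of the decoded output; `dectxt_decode` and `next_digit` are hypothetical names, not necessarily how DECTXT does it. Decoding the sample block this way produces a short batch file beginning with "@echo OFF".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Code set from the technical notes: ':' is digit 0, ';' is digit 63. */
static const char CODESET[] =
    ":ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;";

/* Return the next digit value (0-63), skipping any character outside the
   code set ('-' lines, spaces, line breaks). Returns -1 at end of text. */
static int next_digit(const char **text)
{
    while (**text != '\0') {
        const char *p = strchr(CODESET, **text);
        (*text)++;
        if (p != NULL)
            return (int)(p - CODESET);
    }
    return -1;
}

/* Hypothetical decoder. Writes at most `max` bytes to `out` and returns the
   decoded length, or -1 on truncated input, a reference to text that has
   not been decoded yet, or an output buffer that is too small. */
static long dectxt_decode(const char *text, unsigned char *out, size_t max)
{
    size_t o = 0;
    int d0;
    assert(text != NULL && out != NULL);
    while ((d0 = next_digit(&text)) >= 0) {
        if (d0 >= 60) {
            /* 1111xx xxxxxx: a literal byte */
            int d1 = next_digit(&text);
            if (d1 < 0 || o >= max)
                return -1;
            out[o++] = (unsigned char)(((d0 & 3) << 6) | d1);
        } else {
            /* ssssss ssaaaa aaaaaa aaaaaa: copy an earlier section */
            int d1 = next_digit(&text);
            int d2 = next_digit(&text);
            int d3 = next_digit(&text);
            unsigned long bits;
            size_t size, addr;
            if (d1 < 0 || d2 < 0 || d3 < 0)
                return -1;
            bits = ((unsigned long)d0 << 18) | ((unsigned long)d1 << 12) |
                   ((unsigned long)d2 << 6) | (unsigned long)d3;
            size = (size_t)(bits >> 16) + 3;
            addr = (size_t)(bits & 0xFFFFu);
            if (addr + size > o || o + size > max)
                return -1;
            memcpy(out + o, out + addr, size); /* addr + size <= o: no overlap */
            o += size;
        }
    }
    return (long)o;
}
```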

Dave Dunfield   -   https://dunfield.themindfactory.com
