Text and Binary modes

The Issue

On a UNIX system, an application that reads from a file gets exactly what is in the file on disk, and what it writes is exactly what ends up on disk. The situation is different in the DOS/Windows world, where a file can be opened in one of two modes, binary or text. In binary mode the system behaves exactly as in UNIX. In text mode, however, there are major differences:

  1. On writing in text mode, a NL (\n, ^J) is transformed into the sequence CR (\r, ^M) NL.

  2. On reading in text mode, the CR of each CR NL sequence is deleted, and a ^Z character signals the end of file.

This can wreak havoc with the lseek/fseek calls, since the number of bytes actually in the file may differ from the number seen by the application.
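The two translation rules can be imitated in a few lines of portable C, which makes the byte-count mismatch easy to see. This is only a sketch of the behavior described above, not the code Cygwin or Windows actually uses; the function names text_mode_write and text_mode_read are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Imitate a text-mode write: copy src to dst, expanding every NL into CR NL.
   dst must have room for 2 * len bytes.
   Returns the number of bytes that would end up on disk. */
size_t text_mode_write(const char *src, size_t len, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[n++] = '\r';
        dst[n++] = src[i];
    }
    return n;
}

/* Imitate a text-mode read: drop the CR of every CR NL pair and stop at
   the first ^Z (0x1A). dst must have room for len bytes.
   Returns the number of bytes the application would see. */
size_t text_mode_read(const char *src, size_t len, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0x1A)
            break;                      /* ^Z: treated as end of file */
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;                   /* CR before NL: deleted */
        dst[n++] = src[i];
    }
    return n;
}
```

Passing the four bytes of "a\nb\n" through text_mode_write yields six bytes on disk, so an offset computed by the application no longer matches the offset in the file.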

The mode can be specified explicitly as explained in the Programming section below. In an ideal DOS/Windows world, all programs using lines as records (such as bash, make, sed, ...) would open files (and change the mode of their standard input and output) as text. All other programs (such as cat, cmp, tr, ...) would use binary mode. In practice with Cygwin, programs that deal explicitly with object files specify binary mode (this is the case for od, which is helpful for diagnosing CR problems). Most other programs (such as cat, cmp, tr) use the default mode.

The default Cygwin behavior

The Cygwin system gives us some flexibility in deciding how files are to be opened when the mode is not specified explicitly:

  1. If the file appears to reside on a file system that is mounted (i.e. if its pathname starts with a directory displayed by mount), then the default is specified by the mount flag. If the file is a symbolic link, the mode of the target file system applies.

  2. If the file appears to reside on a file system that is not mounted (as can happen when the path contains a drive letter), the default mode is text, except if the CYGWIN environment variable contains binmode.

    Warning!

    In b20.1 only, a file will be opened in binary mode if any of the following conditions hold:

    1. binary mode is specified in the open call

    2. CYGWIN contains binmode

    3. the file resides in a binary mounted partition

  3. Pipes and non-file devices are always opened in binary mode.

  4. When a Cygwin program is launched by a shell, its standard input, output and error are in binary mode if the CYGWIN variable contains tty, else in text mode, except if they are piped or redirected.

    When redirecting, the Cygwin shells use rules 1-3. For these shells the relevant value of CYGWIN is the one in effect when the shell was launched, not the one in effect when the program is executed. Non-Cygwin shells always pipe and redirect in binary mode. With non-Cygwin shells the commands cat filename | program and program < filename are not equivalent when filename is on a text-mounted partition.
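The first three rules can be restated as a small decision function. This is a sketch of the rules as written in this section, not of Cygwin's implementation: the types and the name default_mode are invented here, the b20.1 exception and the standard-stream rule are left out, and the substring test on CYGWIN is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum io_mode { MODE_TEXT, MODE_BINARY };

enum file_kind {
    KIND_MOUNTED,        /* the path lies under a directory shown by mount */
    KIND_UNMOUNTED,      /* e.g. a path containing a drive letter */
    KIND_PIPE_OR_DEVICE  /* pipes and non-file devices */
};

/* Default mode for a file opened without an explicit mode.
   mount_flag is only consulted for KIND_MOUNTED.
   cygwin_env is the value of the CYGWIN variable, or NULL if it is unset. */
enum io_mode default_mode(enum file_kind kind, enum io_mode mount_flag,
                          const char *cygwin_env)
{
    switch (kind) {
    case KIND_MOUNTED:
        return mount_flag;
    case KIND_UNMOUNTED:
        /* Simplification: any occurrence of "binmode" counts. */
        if (cygwin_env != NULL && strstr(cygwin_env, "binmode") != NULL)
            return MODE_BINARY;
        return MODE_TEXT;
    case KIND_PIPE_OR_DEVICE:
        return MODE_BINARY;
    }
    assert(!"unknown file_kind");
    return MODE_TEXT;
}
```

Note that CYGWIN is ignored for a mounted path: default_mode(KIND_MOUNTED, MODE_TEXT, "binmode") still yields MODE_TEXT.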

Example

To illustrate the various rules, we provide a script to delete CRs from files by using the tr program, which can only write to standard output.

#!/bin/sh
# Remove \r from the files given as arguments
for file in "$@"
do
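  # Run tr in a child shell started with CYGWIN=binmode. The file name is
  # passed as $1, so the command needs no nested escaping and tr receives
  # \r alone (the escaped quotes would otherwise reach tr as characters
  # to delete).
  CYGWIN=binmode sh -c 'tr -d "\r" < "$1" > c:tmpfile.tmp' sh "$file"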
  if [ "$?" = "0" ]
  then
    rm "$file"
    mv c:tmpfile.tmp "$file"
  fi
done

This works irrespective of the mount because rule 2 applies to the path c:tmpfile.tmp. According to rule 4, CYGWIN must be set before invoking the shell. These precautions are necessary because tr does not set its standard output to binary mode. It would thus reintroduce \r when writing to a file on a text-mounted partition. The desired behavior can also be obtained by using tr -d \r in a .bat file.
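A filter written for the purpose avoids these precautions, because it can put its own streams in binary mode. The core of such a filter might look like the sketch below; strip_cr is a name invented for this example and is not part of Cygwin.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything from in to out except CR bytes, like tr -d '\r'.
   Returns the number of CRs dropped, or -1 on an I/O error. */
long strip_cr(FILE *in, FILE *out)
{
    long dropped = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '\r') {
            dropped++;
            continue;
        }
        if (putc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : dropped;
}
```

A complete program would first switch file descriptors 0 and 1 to binary mode, as described in the Programming section, and then call strip_cr(stdin, stdout). Without that step the output would be subject to the same text-mode translation as the output of tr.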

Binary or text?

UNIX programs that have been written for maximum portability will know the difference between text and binary files and act appropriately under Cygwin. For those programs, the text mode default is a good choice. Programs included in official Cygnus distributions should work well in the default mode.

Text mode makes it much easier to mix files between Cygwin and Windows programs, since Windows programs will usually use the CRLF format. Unfortunately you may still have some problems with text mode. First, some of the utilities included with Cygwin do not yet specify binary mode when they should, e.g. cat will not work with binary files (input will stop at ^Z, CRs will be introduced in the output). Second, you will introduce CRs in text files you write, which can cause problems when moving them back to a UNIX system.

If you are mounting a remote file system from a UNIX machine, or moving files back and forth to a UNIX machine, you may want to access them in binary mode. The text files found there will normally be in NL format anyway, and you will want any files put there by Cygwin programs to be stored in a format that the UNIX machine understands. Be sure to remove CRs from all Makefiles and shell scripts, and make sure that you only edit the files with DOS/Windows editors that can cope with binary mode files.

Note that you can decide this on a disk by disk basis (for example, mounting local disks in text mode and network disks in binary mode). You can also partition a disk, for example by mounting c: in text mode, and c:\home in binary mode.

Programming

In the open() function call, binary mode can be specified with the flag O_BINARY and text mode with O_TEXT. These symbols are defined in fcntl.h.
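As an illustration, the two helpers below read and write a file without translation. They are a minimal sketch: the names read_raw and write_raw are invented here, and the #ifndef fallback is a common portability idiom (an assumption on our part, not something Cygwin requires) that lets the same source compile on systems whose fcntl.h does not define these flags.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Systems without a text/binary distinction may not define these flags;
   defining them as 0 turns them into no-ops there. */
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Write len bytes to path exactly as given. Returns 0 on success, -1 on error. */
int write_raw(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Read up to cap bytes of path with a single read() call, untranslated.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_raw(const char *path, void *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

Because the mode is given in the open call itself, the result does not depend on the mount flags or on the CYGWIN variable.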

In the fopen() function call, binary mode can be specified by adding a b to the mode string. There is no direct way to specify text mode.
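For instance, a small diagnostic that counts the CRs in a file has to open it with "rb", since in text mode the CRs preceding NLs would be removed before the program could see them. The function name count_cr is invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Count the CR bytes stored in a file. The "b" in the mode string asks for
   binary mode, so every byte on disk reaches the loop.
   Returns the count, or -1 if the file cannot be opened or read. */
long count_cr(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\r')
            count++;
    }
    if (ferror(fp))
        count = -1;
    fclose(fp);
    return count;
}
```

The "b" is part of standard C and is accepted, and ignored, on systems that make no distinction between the two modes, so adding it does not harm portability.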

The mode of a file can be changed by the call setmode(fd,mode) where fd is a file descriptor (an integer) and mode is O_BINARY or O_TEXT. The function returns O_BINARY or O_TEXT depending on the mode before the call, and EOF on error.
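The return value makes it possible to switch a descriptor to binary mode temporarily and restore the previous mode afterwards. The sketch below rests on several assumptions: that setmode is declared in io.h on your Cygwin installation, that the wrapper switch_mode (a name invented here) may do nothing on other systems, and that O_BINARY and O_TEXT may be defined as 0 where fcntl.h lacks them.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifdef __CYGWIN__
#include <io.h>      /* assumed location of the setmode() declaration */
#endif
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_TEXT
#define O_TEXT 0
#endif

/* Switch fd to the given mode. Returns the previous mode, or EOF on error.
   Outside Cygwin there is nothing to translate, so this does nothing. */
int switch_mode(int fd, int mode)
{
#ifdef __CYGWIN__
    return setmode(fd, mode);
#else
    (void)fd;
    return mode;
#endif
}

/* Write len bytes to standard output untranslated, then restore the mode
   that was in effect before the call. Returns 0 on success, -1 on error. */
int write_raw_to_stdout(const void *data, size_t len)
{
    assert(data != NULL);
    fflush(stdout);      /* flush pending output under the old mode */
    int previous = switch_mode(STDOUT_FILENO, O_BINARY);
    if (previous == EOF)
        return -1;
    size_t written = fwrite(data, 1, len, stdout);
    fflush(stdout);      /* flush before the mode changes back */
    switch_mode(STDOUT_FILENO, previous);
    return (written == len) ? 0 : -1;
}
```

Flushing before each mode change matters because stdio buffers its output: bytes still sitting in the buffer would otherwise be written under whichever mode is in effect when the buffer is finally emptied.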