All the operations that transfer text in and out of Emacs have the ability to use a coding system to encode or decode the text. You can also explicitly encode and decode text using the functions in this section.
The result of encoding, and the input to decoding, are not ordinary
text. They are "raw bytes": bytes that represent text in the same
way that an external file would. When a buffer contains raw bytes, it
is most natural to mark that buffer as using unibyte representation,
using set-buffer-multibyte (see section Selecting a Representation),
but this is not required. If the buffer's contents are only
temporarily raw, leave the buffer multibyte, which will be correct
after you decode them.
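As a minimal sketch of holding raw bytes in a unibyte buffer, the following marks a temporary buffer as unibyte before inserting encoded text. The helper name my-raw-byte-count is hypothetical and used only for illustration; encode-coding-string is assumed here as the explicit encoding function.

```elisp
(defun my-raw-byte-count (text coding)
  "Return the number of bytes TEXT occupies when encoded with CODING.
MY-RAW-BYTE-COUNT is a hypothetical helper, not part of Emacs."
  (with-temp-buffer
    ;; Mark the scratch buffer as unibyte before it receives raw bytes.
    (set-buffer-multibyte nil)
    ;; Insert the raw bytes produced by explicit encoding.
    (insert (encode-coding-string text coding))
    ;; In a unibyte buffer, each buffer position holds exactly one byte.
    (buffer-size)))
```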
The usual way to get raw bytes in a buffer, for explicit decoding, is
to read them from a file with insert-file-contents-literally (see
section Reading from Files) or specify a non-nil rawfile argument
when visiting a file with find-file-noselect.
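The reading approach described above might be sketched as follows: the file's bytes are inserted without conversion and then decoded in place. The helper name my-read-and-decode is hypothetical, and decode-coding-region is assumed as the explicit decoding function; the temporary buffer is left multibyte because its contents are raw only until decoding.

```elisp
(defun my-read-and-decode (file coding)
  "Return the contents of FILE, decoded with CODING, as a string.
MY-READ-AND-DECODE is a hypothetical helper, not part of Emacs."
  (with-temp-buffer
    ;; Insert the file's bytes with no decoding or format conversion.
    (insert-file-contents-literally file)
    ;; Decode the raw bytes in place using the requested coding system.
    (decode-coding-region (point-min) (point-max) coding)
    (buffer-string)))
```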
The usual way to use the raw bytes that result from explicitly
encoding text is to copy them to a file or process. For example, you
can write them with write-region (see section Writing to Files), and
suppress encoding for that write-region call by binding
coding-system-for-write to no-conversion.
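A minimal sketch of the writing approach, assuming encode-coding-string as the explicit encoding function: the text is encoded first, and coding-system-for-write is bound to no-conversion so that write-region does not encode the bytes a second time. The helper name my-write-encoded is hypothetical.

```elisp
(defun my-write-encoded (text file coding)
  "Encode TEXT with CODING and write the resulting raw bytes to FILE.
MY-WRITE-ENCODED is a hypothetical helper, not part of Emacs."
  (let ((bytes (encode-coding-string text coding))
        ;; Suppress any further encoding inside `write-region'.
        (coding-system-for-write 'no-conversion))
    ;; A string as the first argument is written instead of buffer text.
    (write-region bytes nil file)))
```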
Raw bytes sometimes contain overlong byte sequences that look like a proper multibyte character plus extra bytes containing trailing codes. For most purposes, Emacs treats such a sequence in a buffer or string as a single character, and if you look at its character code, you get the value that corresponds to the multibyte character sequence; the extra bytes are disregarded. This behavior is not quite clean, but raw bytes are used only in limited places in Emacs, so as a practical matter problems can be avoided.