GNU Emacs Lisp Reference Manual


Explicit Encoding and Decoding

All the operations that transfer text in and out of Emacs can use a coding system to encode or decode the text. You can also encode and decode text explicitly using the functions in this section.

The result of encoding, and the input to decoding, are not ordinary text. They are "raw bytes": bytes that represent text in the same way that an external file would. When a buffer contains raw bytes, it is most natural to mark that buffer as using unibyte representation with set-buffer-multibyte (see section Selecting a Representation), but this is not required. If the buffer's contents are only temporarily raw, leave the buffer multibyte; that representation will be correct once you decode them.

The usual way to get raw bytes in a buffer, for explicit decoding, is to read them from a file with insert-file-contents-literally (see section Reading from Files), or to specify a non-nil rawfile argument when visiting a file with find-file-noselect.

The usual way to use the raw bytes that result from explicitly encoding text is to copy them to a file or process. For example, write them with write-region (see section Writing to Files), and suppress encoding for that write-region call by binding coding-system-for-write to no-conversion.
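The two paragraphs above can be combined into a small round trip. The following is only a sketch under stated assumptions: `my-raw-byte-round-trip` is a hypothetical name, and the `utf-8` coding system and `make-temp-file` are assumed to be available, which may not hold for very old Emacs releases.

```elisp
(defun my-raw-byte-round-trip (text)
  "Encode TEXT, write the bytes to a file, read them back, and decode them.
This is a hypothetical helper for illustration only."
  (let ((file (make-temp-file "raw-bytes-"))
        (bytes (encode-coding-string text 'utf-8)))
    (unwind-protect
        (progn
          ;; BYTES is already encoded, so suppress a second encoding pass.
          (let ((coding-system-for-write 'no-conversion))
            ;; A string as the first argument is written instead of
            ;; buffer text; the non-nil, non-t VISIT value keeps the
            ;; call quiet.
            (write-region bytes nil file nil 'silent))
          (with-temp-buffer
            ;; A unibyte buffer is the natural home for raw bytes.
            (set-buffer-multibyte nil)
            (insert-file-contents-literally file)
            (decode-coding-string (buffer-string) 'utf-8)))
      ;; Remove the temporary file even if an error occurred.
      (delete-file file))))
```

Calling `(my-raw-byte-round-trip (string #xE9 ?a))` should return a string equal to its argument, since the bytes on disk are exactly the encoded form.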

Raw bytes sometimes contain overlong byte-sequences that look like a proper multibyte character plus extra bytes containing trailing codes. For most purposes, Emacs treats such a sequence in a buffer or string as a single character, and if you look at its character code, you get the value that corresponds to the multibyte character sequence; the extra bytes are disregarded. This behavior is not quite clean, but raw bytes are used only in limited places in Emacs, so as a practical matter problems can be avoided.

Function: encode-coding-region start end coding-system: This function encodes the text from start to end according to coding system coding-system. The encoded text replaces the original text in the buffer. The result of encoding is "raw bytes," but the buffer remains multibyte if it was multibyte before.
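As an illustration, the sketch below encodes the contents of a temporary buffer in place. The helper name `my-encode-region-demo` is hypothetical, and the `utf-8` coding system is an assumption about the Emacs version in use.

```elisp
(defun my-encode-region-demo ()
  "Encode one non-ASCII character in a temporary buffer as UTF-8.
Return a list (SIZE MULTIBYTE-P) describing the buffer afterwards.
This is a hypothetical helper for illustration only."
  (with-temp-buffer
    ;; Character #xE9 (e with acute accent) needs two bytes in UTF-8.
    (insert (string #xE9))
    (encode-coding-region (point-min) (point-max) 'utf-8)
    ;; The one character has been replaced by its encoded bytes, but
    ;; the buffer keeps the representation it had before.
    (list (buffer-size) enable-multibyte-characters)))
```

In a buffer that started out multibyte, the second element of the result stays non-nil, which matches the statement that the buffer remains multibyte.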

Function: encode-coding-string string coding-system: This function encodes the text in string according to coding system coding-system. It returns a new string containing the encoded text. The result of encoding is a unibyte string of "raw bytes."
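A minimal sketch of a call, assuming the `utf-8` coding system is available; `my-utf8-byte-list` is a hypothetical helper, not part of Emacs.

```elisp
(defun my-utf8-byte-list (text)
  "Return the UTF-8 encoding of TEXT as a list of byte values.
This is a hypothetical helper for illustration only."
  (let ((bytes (encode-coding-string text 'utf-8)))
    ;; BYTES is a unibyte string, so each element is a single byte.
    (append bytes nil)))

;; Character #xE9 is encoded as the two bytes #xC3 #xA9.
(my-utf8-byte-list (string #xE9))   ; => (195 169)
```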

Function: decode-coding-region start end coding-system: This function decodes the text from start to end according to coding system coding-system. The decoded text replaces the original text in the buffer. To make explicit decoding useful, the text before decoding ought to be "raw bytes."
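The sketch below places two raw bytes in a temporary multibyte buffer and decodes them in place. It assumes `string-to-multibyte`, `unibyte-string`, and the `utf-8` coding system are available; `my-decode-region-demo` is a hypothetical name.

```elisp
(defun my-decode-region-demo ()
  "Decode the raw bytes #xC3 #xA9 in a temporary buffer as UTF-8.
Return the decoded buffer contents.
This is a hypothetical helper for illustration only."
  (with-temp-buffer
    ;; `string-to-multibyte' keeps each byte as a raw byte, so the
    ;; buffer holds raw bytes rather than ordinary text.
    (insert (string-to-multibyte (unibyte-string #xC3 #xA9)))
    (decode-coding-region (point-min) (point-max) 'utf-8)
    (buffer-string)))
```

The returned string should contain the single character #xE9, the character those two bytes represent in UTF-8.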

Function: decode-coding-string string coding-system: This function decodes the text in string according to coding system coding-system. It returns a new string containing the decoded text. To make explicit decoding useful, the contents of string ought to be "raw bytes."
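A minimal sketch, assuming `unibyte-string` and the `utf-8` coding system are available; `my-decode-utf8-bytes` is a hypothetical helper.

```elisp
(defun my-decode-utf8-bytes (byte-list)
  "Decode BYTE-LIST, a list of byte values, as UTF-8 text.
This is a hypothetical helper for illustration only."
  ;; Build a unibyte string of raw bytes, then decode it.
  (decode-coding-string (apply #'unibyte-string byte-list) 'utf-8))

;; The bytes #xC3 #xA9 decode to the single character #xE9.
(length (my-decode-utf8-bytes '(195 169)))   ; => 1
```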
