# Coding Systems in shell-command

## Problem

The day before yesterday, while editing a Markdown document, I opened Emacs’s Markdown menu and noticed the “Preview” and “Export” commands. I tried them, opened the exported HTML file, and found that almost all of my content was missing. The document was written in Chinese.

I also had a separate Emacs instance running in a terminal. When I did the same thing there, the non-ASCII characters came out correctly.

## Solution

Until then I had always run `markdown` directly with `shell-command`, and that worked fine, so I suspected something was wrong with how markdown-mode handles encoding. So I checked.

I started by reading the definition of the `markdown` function in markdown-mode.

Its `cond` expression has two branches. The first, taken when `markdown-command-needs-filename` is non-nil, runs `markdown` with `shell-command` directly on the current buffer’s file. The second instead passes text to `shell-command-on-region`: the marked region, or the whole buffer when no region is active.
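For readers without the markdown-mode source at hand, the dispatch described above can be sketched roughly as follows. This is a simplified, hypothetical paraphrase, not the library’s verbatim code: `my-markdown-sketch` is an illustrative name, and the sketch assumes markdown-mode is installed and that `markdown-command` is a plain command string (newer markdown-mode versions also accept other forms). Only `markdown-command` and `markdown-command-needs-filename` are real markdown-mode variables.

```elisp
;; Simplified sketch of the two-branch dispatch in markdown-mode's
;; `markdown' command.  NOT the verbatim library source; the function
;; name `my-markdown-sketch' is hypothetical.
(require 'markdown-mode)   ; defines `markdown-command' and friends

(defun my-markdown-sketch (output-buffer-name)
  "Run `markdown-command' on the region or buffer into OUTPUT-BUFFER-NAME."
  (let ((begin (if (use-region-p) (region-beginning) (point-min)))
        (end   (if (use-region-p) (region-end) (point-max))))
    (cond
     ;; Branch 1: the command reads the saved file itself, so Emacs
     ;; only has to decode the command's output.
     (markdown-command-needs-filename
      (unless buffer-file-name
        (error "Must be visiting a file"))
      (shell-command (concat markdown-command " "
                             (shell-quote-argument buffer-file-name))
                     output-buffer-name))
     ;; Branch 2 (the default): the text is piped to the command's
     ;; stdin, so Emacs must also encode it before sending it.
     (t
      (shell-command-on-region begin end markdown-command
                               output-buffer-name)))))
```

The key difference for this story is in the comments: only the second branch makes Emacs encode the document text itself.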

Had I written markdown-mode, I would default to `shell-command-on-region`, because being able to preview part of a Markdown document is a clearly better user experience. My guess was right: `markdown-command-needs-filename` is `nil` by default, so the second branch is the one that runs.

To get around the encoding problem, I set it to `t` and ran “Preview” again. This time the non-ASCII characters showed up in the exported HTML.
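In an init file, that workaround is a single setting (assuming markdown-mode is installed). A side effect worth noting: the command then reads the file as saved on disk, so unsaved edits and the active region are ignored.

```elisp
;; Workaround: make markdown-mode hand the file name to `markdown-command'
;; (the `shell-command' branch) instead of piping buffer text via stdin.
(setq markdown-command-needs-filename t)
```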

But I didn’t want to stop there: this is a workaround, and workarounds smell.

There had to be some difference in how `shell-command` and `shell-command-on-region` choose coding systems that caused this issue.

Stepping into the `shell-command` function, I read its comments about coding systems, then traced into `coding-system-for-read`.

It first looks for a coding system in `file-coding-system-alist`.

All my documents are UTF-8 encoded, so the output of `shell-command` is decoded as UTF-8, too.
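To see the file-name rules in your own Emacs, a quick check like the following can be evaluated with `M-:` or in `*scratch*`. The file name `notes.md` is only an example, and `find-operation-coding-system` merely reports what those rules would choose for that name.

```elisp
;; Show the filename -> coding-system rules Emacs consults for file I/O.
(message "file-coding-system-alist: %S" file-coding-system-alist)

;; Which (DECODING . ENCODING) pair would those rules pick for a file
;; named "notes.md" (an example name)?  nil means no rule matched.
(message "notes.md: %S"
         (find-operation-coding-system 'insert-file-contents "notes.md"))
```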

`shell-command-on-region` is different, as its comments explain.

Instead, it takes its coding systems from `process-coding-system-alist`, which is `nil` on my machine, so it falls back to `default-process-coding-system`.

On my machine, the value of `default-process-coding-system` is

```elisp
(undecided-unix . iso-latin-1-unix)
```

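but in the terminal Emacs instance it is `(utf-8 . utf-8)`, which is why I got correct output there. The value is a pair `(DECODING . ENCODING)`: the car decodes the command’s output, and the cdr encodes the text Emacs sends to the command. So in the first instance the Chinese text was encoded as `iso-latin-1` on its way to `markdown`, and Latin-1 cannot represent Chinese characters.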
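A way to check which of these settings applies in a given Emacs is sketched below. It assumes the usual behavior that `shell-command-on-region` runs `shell-file-name` through `call-process-region`, so the shell’s name, not `markdown`, is what `process-coding-system-alist` is matched against.

```elisp
;; The two variables shell-command-on-region relies on.
(message "process-coding-system-alist: %S" process-coding-system-alist)
(message "default-process-coding-system: %S" default-process-coding-system)

;; Which pair does `process-coding-system-alist' give for the shell?
;; nil means no entry matched, so `default-process-coding-system' is used
;; (unless `coding-system-for-read'/`coding-system-for-write' are bound).
(message "for %s: %S" shell-file-name
         (find-operation-coding-system 'call-process-region
                                       (point-min) (point-max)
                                       shell-file-name))
```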

Adding the following line to my `.emacs` solved the problem:

```elisp
(setq default-process-coding-system '(utf-8 . utf-8))
```

Or,

```elisp
(set-language-environment "UTF-8")
```

I discussed this problem with Xah Lee. He mentioned that `default-process-coding-system` is UTF-8 in his environment, probably because he sets his language environment to UTF-8 with `set-language-environment`. I tested it, and it worked. I prefer this setting, because it applies to the whole environment. :)

## Complaint

Emacs supports nearly every encoding, and as a result there are dozens of (default) coding-system settings, for different modes and operations, that may need to be set.

Imagine a package that relies on a built-in function, which in turn picks its coding system from some built-in variables. A user may not even know these variables exist, so when the output comes out garbled, it is hard for them to find out what happened.

For an amazingly good editor (or operating system :)), supporting every kind of encoding is a must. But a modern editor should not require people to know what’s under the hood every time they configure something.

We could move this responsibility to the packages people use (some call them plugins or extensions).

For example, before calling a built-in function that supports many coding systems, a package could set the output coding system to match the input’s.
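One way a package could do this is sketched below, under the assumption that binding `coding-system-for-read` and `coding-system-for-write` around the call is enough (this is what `C-x RET c` does for a single command). The helper name is hypothetical; it is not part of Emacs or markdown-mode.

```elisp
;; Hypothetical helper: run COMMAND on the region, using the buffer's own
;; coding system for both directions of process I/O, instead of whatever
;; `default-process-coding-system' happens to hold.
(defun my-shell-command-on-region-like-buffer (start end command
                                               &optional output-buffer)
  "Run COMMAND on START..END, encoding and decoding like the current buffer."
  (let* ((coding buffer-file-coding-system)
         (coding-system-for-write coding)  ; encodes the text sent to stdin
         (coding-system-for-read coding))  ; decodes the command's stdout
    (shell-command-on-region start end command output-buffer)))
```

For example, `(my-shell-command-on-region-like-buffer (point-min) (point-max) "markdown" "*markdown-output*")` would send a UTF-8 buffer to `markdown` as UTF-8 and read the HTML back the same way.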

We can provide options for setting the coding system in the package itself, so people don’t have to figure out what’s under the hood.

Or we can simply restrict the coding system to UTF-8, leaving nothing for the package’s users to worry about.
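Both ideas can be combined in a package-level option that defaults to UTF-8. The sketch below uses an entirely hypothetical package, `my-pkg`; every name in it is illustrative, and only `defcustom`, `shell-command-on-region` and the two `coding-system-for-*` variables are real Emacs features.

```elisp
;; Hypothetical package `my-pkg'; all my-pkg-* names are illustrative.
(defgroup my-pkg nil
  "Example package that pipes text through an external command."
  :group 'external)

(defcustom my-pkg-process-coding-system 'utf-8
  "Coding system for text sent to and read from `my-pkg-command'."
  :type 'coding-system
  :group 'my-pkg)

(defcustom my-pkg-command "markdown"
  "Shell command run by `my-pkg-run'."
  :type 'string
  :group 'my-pkg)

(defun my-pkg-run (start end)
  "Pipe START..END through `my-pkg-command' with the package's coding system."
  (interactive "r")
  ;; The package decides the encoding; the user's global process
  ;; defaults no longer matter for this call.
  (let ((coding-system-for-write my-pkg-process-coding-system)
        (coding-system-for-read my-pkg-process-coding-system))
    (shell-command-on-region start end my-pkg-command "*my-pkg-output*")))
```

Users who never touch the option get UTF-8; those who need something else can change it through `M-x customize-group RET my-pkg`.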

I think all of this is what we call ABSTRACTION.