The day before yesterday, while editing a Markdown document, I noticed the “Preview” and “Export” commands in the Markdown menu of Emacs. I clicked one and opened the exported HTML file, only to find that almost all of my content was gone. The document was written in Chinese.
I had a separate Emacs instance running in a terminal. I did the same thing there, and the non-ASCII characters were output correctly.
Before, I had always run `markdown` directly via `shell-command`, and it worked well. I suspected something was wrong with the way `markdown-mode` handled encoding, so I checked.
Looking at the definition of the `markdown` function in `markdown-mode`, its `cond` expression has two branches: the first one runs `shell-command` directly on the file of the current buffer, while the second one uses `shell-command-on-region` on a region marked in the current buffer.
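A simplified sketch of such a two-branch dispatch might look like the following. It is not the markdown-mode source: the `my-md-` names are hypothetical stand-ins, and it assumes a `markdown` executable is on the PATH.

```elisp
;; Hypothetical stand-ins for markdown-mode's own settings.
(defvar my-md-command "markdown"
  "Shell command that converts Markdown to HTML (assumed to be on PATH).")
(defvar my-md-needs-filename nil
  "Non-nil means pass the file name as an argument instead of using stdin.")

(defun my-md-run (output-buffer)
  "Run `my-md-command' on the current buffer or the active region.
The output goes to OUTPUT-BUFFER."
  (let ((begin (if (use-region-p) (region-beginning) (point-min)))
        (end   (if (use-region-p) (region-end) (point-max))))
    (cond
     ;; Branch 1: the command is given the file name, so the saved file
     ;; is converted as a whole.
     (my-md-needs-filename
      (unless buffer-file-name
        (error "Buffer is not visiting a file"))
      (shell-command (concat my-md-command " "
                             (shell-quote-argument buffer-file-name))
                     output-buffer))
     ;; Branch 2: the region (or the whole buffer) is piped through stdin.
     (t
      (shell-command-on-region begin end my-md-command output-buffer)))))
```

The point to notice is that the two branches go through different built-in functions, so they can end up with different coding systems for the same text.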
If I had written `markdown-mode`, I would always use `shell-command-on-region`, because being able to preview part of a Markdown document is definitely a good user experience. I guessed right: `markdown-command-needs-filename` is `nil` by default.
I wanted to fix my encoding problem, so I set it to `t` and previewed again. This time the non-ASCII characters showed up in the exported HTML.
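In an init file, that workaround would presumably be a single line. Since the first branch works on the file of the current buffer, I assume it requires the buffer to be visiting a file.

```elisp
;; Workaround: have markdown-mode pass the file name to the command
;; instead of piping the buffer contents through stdin.
(setq markdown-command-needs-filename t)
```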
But I didn’t want to stop there: it was only a workaround, and workarounds smell.
There had to be some difference in encoding handling between `shell-command` and `shell-command-on-region` that caused this issue.
I started with the comments in the `shell-command` function, then traced further into the code behind it. It looks for a coding system in `file-coding-system-alist` first.
All my documents are `utf-8` encoded, so the output of `shell-command` is treated as `utf-8`, too.
`shell-command-on-region` is different, as its comments explain.
Instead, it uses `process-coding-system-alist` for its encoding options, which is `nil` on my machine. So it falls back to `default-process-coding-system`. On my machine its value is `(undecided-unix . iso-latin-1-unix)`, but in the terminal Emacs instance it is `(utf-8 . utf-8)`, which is why I got the right output there.
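To compare two Emacs instances, the variables involved can be inspected directly, for example with `M-:` or in the `*scratch*` buffer. In `default-process-coding-system`, the car is used to decode what a subprocess outputs and the cdr to encode what is sent to it, so a cdr of `iso-latin-1-unix` cannot represent Chinese text.

```elisp
;; Print the coding-system settings that apply to subprocess I/O.
(message "process-coding-system-alist: %S" process-coding-system-alist)
(message "default-process-coding-system: %S" default-process-coding-system)
```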
I added the line below to my dot emacs, and the problem was solved.
(setq default-process-coding-system '(utf-8 . utf-8))
I discussed this problem with Xah Lee. He mentioned that `default-process-coding-system` in his environment is `utf-8`, probably because of another setting of his that is set to `utf-8`. I tested that setting and it worked. I prefer it, because it looks environmentally global. :)
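One global setting that is documented to change the default coding system for subprocess I/O is `prefer-coding-system`. Whether it is the same setting discussed above is an assumption on my side, so take this as a sketch of the "global" approach only.

```elisp
;; Prefer UTF-8 everywhere. According to its docstring, this also sets
;; the default coding system for subprocess I/O.
(prefer-coding-system 'utf-8)

;; Evaluate this afterwards to check the result.
default-process-coding-system
```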
Emacs supports nearly every encoding, and there are dozens of (default) coding systems for different modes that need to be set.
Imagine a package that relies on a built-in function, and that function looks up its coding system in built-in variables the user may not even know exist. It will be hard for that user to figure out what happened when the output comes out garbled.
For an amazingly good editor (or an operating system :)), supporting all kinds of encodings is necessary, but in a modern editor people should not need to know what is under the hood every time they are about to configure something.
We could move this responsibility to the packages that people use (some people call them plugins or extensions).
For example, a package can set the output coding system to match the input before calling built-in functions that support different coding systems.
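A minimal sketch of this idea, assuming the buffer's own coding system is the right one for both directions: bind `coding-system-for-read` and `coding-system-for-write` around the call. The function name `my-filter-region` is hypothetical.

```elisp
(defun my-filter-region (begin end command output-buffer)
  "Pipe the text between BEGIN and END through COMMAND into OUTPUT-BUFFER.
The buffer's coding system is used for both input and output,
falling back to utf-8 when the buffer has none."
  (let* ((coding (or buffer-file-coding-system 'utf-8))
         ;; These bindings take precedence over the process defaults
         ;; for the duration of the call only.
         (coding-system-for-read coding)
         (coding-system-for-write coding))
    (shell-command-on-region begin end command output-buffer)))
```

Because the bindings are local to the `let*`, the user's global settings stay untouched.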
We can give people an option in the package itself for setting the coding system, so that they do not have to figure out what is under the hood.
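Such an option could be a plain user option in the package. The group and variable names here are hypothetical, not part of any existing package.

```elisp
(defgroup my-md nil
  "Hypothetical Markdown helper package."
  :group 'text)

(defcustom my-md-coding-system 'utf-8
  "Coding system used when talking to the external Markdown command."
  :type 'coding-system
  :group 'my-md)
```

The package would then bind `coding-system-for-read` and `coding-system-for-write` to `my-md-coding-system` wherever it runs the external command.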
We can even restrict the coding system to `utf-8` and leave the people who use the package nothing to worry about.
I think all of this is called ABSTRACTION.