
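When I write a .c program in CLion and save the .c file as UTF-8, printing Chinese with printf produces garbled text, both in CLion's console and in the Windows command line. Analyzing the garbled output shows that the program emits UTF-8 encoded bytes, which the console then decodes as GBK.

Is this because the Windows console decodes output as GBK by default?

I tried saving the .c file as GBK instead, and the garbling did go away on my machine. But a program built this way can be expected to show garbled text on computers in other countries and regions, where Windows does not use GBK. For better compatibility, Unicode is still the way to go.

I also tried CLion on macOS and saw no garbling, presumably because macOS uses UTF-8 by default. So I enabled the Windows option "Use Unicode UTF-8 for worldwide language support", and the garbling did disappear. However, this option is off by default on Windows. That means that if I enable it on the development machine, a UTF-8 program runs correctly there, but it still produces garbled output on other computers where the option is not enabled. Enabling the option also causes some installed software to display garbled text.

So how should I solve this problem? How do companies that develop in C deal with it?

Also, for reasons I don't understand, a UTF-8 encoded Python program prints Chinese to the Windows console without any garbling, as shown in the screenshot below. Why is that? Can the mechanism that keeps Python's output intact be applied to C?

In addition, following the AI-generated answer below, I tried saving the source code as UTF-16, and then it would not even compile.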

Hi, @AliceDrop

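For the Windows console, you can call SetConsoleOutputCP to switch the output code page to UTF-8 (65001).
Also compile the program as UTF-8 (with MSVC, for example, by passing the /utf-8 option) so that the string literals stay UTF-8 in the executable.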

#include <stdio.h>
#include <Windows.h>

int main(void)
{
	SetConsoleOutputCP(65001); /* 65001 is CP_UTF8: console decodes output as UTF-8 */
	printf("你好");
	return 0;
}

Best regards,

Minxin Yu


Thank you for your answer.

I learned elsewhere that adding system("chcp 65001"); to main() can also solve this problem. What is the difference between the two approaches, and is there anything I need to pay attention to when using them?

In addition, can I solve this problem on Linux and macOS in the same way?

chcp is a command on the Windows command line. It changes the active console code page, which is similar to calling SetConsoleCP (input) and SetConsoleOutputCP (output) from your program.

Both approaches apply only to Windows. For Linux and macOS, it is recommended to go to the relevant forums for help.
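One practical difference: system("chcp 65001") starts a separate command interpreter, and chcp prints an "Active code page: 65001" line into your program's output, while SetConsoleOutputCP is a direct API call with a return value you can check. In both cases the code page belongs to the console, not to your process, so the change stays in effect after the program exits unless you restore it. Below is a minimal sketch of a portable wrapper, assuming you want the same source to build on Linux and macOS too. The names console_use_utf8, console_restore and saved_output_cp are made up for this illustration and are not part of any API.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper state: the code page that was active before the switch. */
static unsigned int saved_output_cp = 0;

/* Switch console output to UTF-8.
   Returns 1 on success, 0 on failure.
   On non-Windows platforms there is no console code page, so nothing is done. */
int console_use_utf8(void)
{
#ifdef _WIN32
    saved_output_cp = GetConsoleOutputCP();      /* remember the old code page */
    return SetConsoleOutputCP(CP_UTF8) ? 1 : 0;  /* CP_UTF8 is 65001 */
#else
    return 1;
#endif
}

/* Put back the code page that console_use_utf8() replaced, if any. */
void console_restore(void)
{
#ifdef _WIN32
    if (saved_output_cp != 0)
        SetConsoleOutputCP(saved_output_cp);
#endif
}
```

Call console_use_utf8() at the start of main before the first printf, and console_restore() before returning. On Linux and macOS the calls do nothing, because terminals there normally expect UTF-8 already.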

Chinese characters printed with printf from a C program saved as UTF-8 come out garbled because the Windows console decodes the program's UTF-8 output as GBK. One solution is to change the console code page to 65001 (CP_UTF8) with SetConsoleOutputCP and SetConsoleCP before any output is written. Another solution is to use UTF-16 instead of UTF-8 for output, through the W variants of the console APIs. This concerns the strings passed to the console at run time, not the encoding of the source file, which can stay UTF-8. Note that the default "cooked" input modes do not fully support UTF-8 yet, so the workaround for input is to read UTF-16 through ReadConsoleW or ReadConsoleInputW and convert it, until the outstanding issues are resolved.
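The UTF-16 route for output can be sketched as follows: convert the UTF-8 string with MultiByteToWideChar and hand the result to WriteConsoleW, which does not depend on the console code page. This is only a sketch under some assumptions: print_utf8 is a hypothetical helper name, the fixed 512-element buffer limits the string length, and the fallback simply writes the bytes unchanged when output is not a Windows console (a file, a pipe, or Linux and macOS).

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: print a UTF-8 string regardless of the console code page.
   Returns 1 on success, 0 on failure. */
int print_utf8(const char *s)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode;

    /* GetConsoleMode succeeds only when the handle is a real console. */
    if (out != NULL && out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        wchar_t wide[512];  /* assumption: the text fits in 511 UTF-16 units */
        DWORD written;
        int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, wide, 512);

        if (n == 0)
            return 0;       /* invalid input or buffer too small */
        fflush(stdout);     /* keep ordering with earlier stdio output */
        /* n counts the terminating NUL, which should not be written. */
        return WriteConsoleW(out, wide, (DWORD)(n - 1), &written, NULL) ? 1 : 0;
    }
#endif
    /* Not a Windows console: pass the UTF-8 bytes through unchanged. */
    return fputs(s, stdout) >= 0 ? 1 : 0;
}
```

With this, print_utf8("你好\n") in a UTF-8 source file should display correctly without changing the code page, provided the compiler keeps the literal as UTF-8.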

As for how enterprises solve this issue, they may have their own internal standards or best practices for handling character encoding in their C programs.

As for why a UTF-8 encoded Python program does not produce garbled text when printing Chinese characters in the Windows console: this is likely because Python decodes the source file into Unicode strings and, when printing, converts them into a form the console accepts (recent Python 3 versions write to the console through the wide-character console API). printf in C does no such conversion and passes the bytes of the string literal through unchanged.
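To see what printf actually hands to the console, it can help to dump the bytes of the string literal. The sketch below uses a hypothetical helper, hex_dump, which is not a library function; it only formats the bytes of a string as hex pairs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as uppercase hex pairs
   separated by spaces, e.g. "E4 BD A0".
   Returns the number of bytes dumped, or -1 if out is too small. */
int hex_dump(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);
    size_t need = len ? len * 3 : 1;  /* 3 chars per byte, last one is the NUL */

    if (out_size < need)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + i * 3, 3, "%02X", (unsigned char)s[i]);
        out[i * 3 + 2] = (i + 1 < len) ? ' ' : '\0';
    }
    return (int)len;
}
```

If the source file is UTF-8 and the compiler keeps the literal as UTF-8, hex_dump("你好", buf, sizeof buf) fills buf with "E4 BD A0 E5 A5 BD", the six UTF-8 bytes of the two characters. A console that decodes those same bytes as GBK pairs them up differently, which is where the garbled output comes from.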

References:

  • Classic Console APIs versus Virtual Terminal Sequences - Cross-Platform Support