While porting code from Python 2 to Python 3, I ran into a problem when reading UTF-8 text from standard input. In Python 2, this works fine:

for line in sys.stdin:

But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:

But how can I specify the encoding for standard input? Other SO posts (e.g. How to change the stdin encoding on python) have suggested using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

@RaymondHettinger Given that Python 2 and Python 3 have very different answers here, these aren't duplicate questions. Duplicate questions would have the same answers. – Gilles 'SO- stop being evil' Jun 5, 2019 at 17:59

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
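
To make that layering concrete, here is a minimal sketch that inspects the same three layers on an ordinary text-mode file. It assumes os.devnull as a harmless stand-in for stdin; the variable names are illustrative only.

```python
import io
import os

# os.devnull is used as a stand-in for stdin here (an assumption for the demo);
# any file opened in text mode exposes the same layers.
with open(os.devnull, "r", encoding="utf-8") as text_stream:
    text_layer = text_stream                # decodes bytes to str
    buffered_layer = text_stream.buffer     # buffered binary reader
    raw_layer = text_stream.buffer.raw      # unbuffered binary file object

    layer_names = [
        type(text_layer).__name__,
        type(buffered_layer).__name__,
        type(raw_layer).__name__,
    ]

print(layer_names)
```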

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
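
As a rough illustration of what the wrapper does, the sketch below applies the same call to an in-memory byte stream instead of the real sys.stdin.buffer. The fake_stdin_buffer name and the sample text are assumptions made for the demo.

```python
import io

# Hypothetical stand-in for sys.stdin.buffer: UTF-8 bytes held in memory.
sample_bytes = "naïve café\n日本語\n".encode("utf-8")
fake_stdin_buffer = io.BytesIO(sample_bytes)

# Same wrapping as above, with the encoding stated explicitly.
input_stream = io.TextIOWrapper(fake_stdin_buffer, encoding="utf-8")

# Iterating yields decoded str lines, regardless of the locale's default codec.
lines = [line.rstrip("\n") for line in input_stream]
print(lines)
```

In a real script, replace fake_stdin_buffer with sys.stdin.buffer and iterate over input_stream in place of sys.stdin.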

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
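
One way to see the effect of the variable is to launch a child interpreter with it set and ask the child what encoding its stdin uses. This is only a sketch: the child_code snippet and the sample text are made up for the demonstration.

```python
import codecs
import os
import subprocess
import sys

# Hypothetical child program: report stdin's encoding, then echo stdin back.
child_code = "import sys; print(sys.stdin.encoding); print(sys.stdin.read().strip())"

# Copy the current environment and request UTF-8 for the child's std streams.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="héllo wörld\n".encode("utf-8"),  # raw UTF-8 bytes on the child's stdin
    stdout=subprocess.PIPE,
    env=child_env,
    check=True,
)

# The child's stdout is UTF-8 as well, because the variable covers all std streams.
reported_encoding, echoed_text = result.stdout.decode("utf-8").splitlines()
print(reported_encoding, echoed_text)
```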

From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):

# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
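
The same reconfigure() method exists on any io.TextIOWrapper, so the behaviour can be tried without touching the real stdin. The sketch below assumes a wrapper that started out as ASCII (standing in for a sys.stdin whose guessed encoding was ASCII) and switches it to UTF-8 before the first read.

```python
import io

utf8_bytes = "Zoë\nRenée\n".encode("utf-8")

# Stand-in for a sys.stdin that was opened with an ASCII guess (assumption for the demo).
stream = io.TextIOWrapper(io.BytesIO(utf8_bytes), encoding="ascii")

# Switch the codec before anything has been read from the stream.
stream.reconfigure(encoding="utf-8")

first_line = stream.readline()
print(stream.encoding, repr(first_line))
```
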
@bukzor: Next option: open the file descriptor directly with io.open(); 0 is stdin: io.open(0) returns a TextIOWrapper() object. – Martijn Pieters Dec 16, 2013 at 21:51

@MartijnPieters: That works pretty great! Thanks! Whole script: paste.pound-python.org/show/xoUPpsfFhtKssXBzLxBd Deleting my previous failures. – bukzor Dec 17, 2013 at 1:53

@Suncatcher: IDLE is the IDE here, and has replaced the standard sys.stdout object with a custom object. That class is part of the IDLE internal implementation, not a standard library class. – Martijn Pieters Dec 24, 2020 at 16:38
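
The io.open(0) suggestion from the comments can be sketched with a pipe standing in for file descriptor 0, so the example does not consume the real stdin. In Python 3 the built-in open() is the same function as io.open(); the explicit encoding argument is an addition here, not something the comment itself showed.

```python
import os

# Create a pipe and use its read end as a stand-in for file descriptor 0 (stdin).
read_fd, write_fd = os.pipe()
os.write(write_fd, "Grüße\n".encode("utf-8"))
os.close(write_fd)  # signal end of input to the reader

# open() accepts an integer file descriptor and returns a text wrapper around it.
with open(read_fd, "r", encoding="utf-8") as stream:
    text = stream.read()

print(repr(text))
```

For the real standard input, the equivalent call would be open(0, encoding='utf-8').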
        
