File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

Hi, I get this exception. How do I catch it and continue reading my files when it occurs?

My program has a loop that reads a text file line by line and tries to do some processing. However, some files I encounter may not be text files, or may have lines that are not properly formatted (foreign-language text, etc.). I want to ignore those lines.

The following is not working:

import re
import sys

for line in sys.stdin:
    try:
        if line != "":
            matched = re.match(searchstuff, line, re.IGNORECASE)
            print(matched)
    except (UnicodeDecodeError, UnicodeEncodeError):
        continue
                There's an entire CHAPTER in the Python tutorial dedicated to errors and exceptions. Try there. docs.python.org/tutorial/errors.html
– Ignacio Vazquez-Abrams
                Dec 29, 2010 at 12:59
                Yeah, I get it. I'm not asking whether Python has features related to errors and exceptions. I am using try/except statements, but these codec decode errors are not being caught, resulting in failed jobs.
– Deepak
                Dec 29, 2010 at 13:01

Look at http://docs.python.org/py3k/library/codecs.html. When you open the codecs stream, you probably want to pass the additional argument errors='ignore'.

In Python 3, sys.stdin is by default opened as a text stream (see http://docs.python.org/py3k/library/sys.html), and has strict error checking.
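To see why a try/except inside the loop body never fires, here is a minimal sketch that uses an in-memory byte stream as a stand-in for sys.stdin (the sample bytes and variable names are made up for illustration). The decoding happens while the loop fetches the next line, so the exception is raised by the for statement itself, not by the code in the body.

```python
import io

# Sample input containing byte 0x92, which is not a valid UTF-8 start byte.
raw = b"first line\nit\x92s broken\nlast line\n"

# Stand-in for sys.stdin: a text stream that decodes UTF-8 strictly.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors="strict")

caught_inside = False
caught_outside = False
try:
    for line in stream:  # decoding happens here, while fetching the next line
        try:
            line.upper()  # the body only ever sees text that is already decoded
        except UnicodeDecodeError:
            caught_inside = True  # never reached
except UnicodeDecodeError:
    caught_outside = True  # the error escapes from the iteration itself

print(caught_inside, caught_outside)
```

Because the text wrapper decodes its input in chunks rather than line by line, the error can surface before any line from the affected chunk has been yielded, so wrapping the whole loop in try/except is not a way to skip just the bad line.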

You need to reopen it as an error-tolerant utf-8 stream. Something like this will work:

import codecs, sys
sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
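Note that errors='ignore' keeps every line and silently drops only the undecodable bytes. If the goal is to skip such lines entirely, as the question asks, one possible approach is to read bytes and decode each line yourself. The sketch below uses an in-memory stream in place of stdin, and iter_decodable_lines is a hypothetical helper name, not part of any library.

```python
import codecs
import io

raw = b"first line\nit\x92s broken\nlast line\n"

# The approach from the answer: a tolerant reader drops the bad byte
# but still returns the line it appeared in.
tolerant = codecs.getreader("utf8")(io.BytesIO(raw), errors="ignore")
tolerant_lines = list(tolerant)


def iter_decodable_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines, skipping any line that fails strict decoding."""
    for raw_line in binary_stream:
        try:
            yield raw_line.decode(encoding)
        except UnicodeDecodeError:
            continue  # skip the whole line


# Alternative: decode line by line so a bad line can be discarded.
kept_lines = list(iter_decodable_lines(io.BytesIO(raw)))

print(tolerant_lines)
print(kept_lines)
```

With real input, the binary stream would be sys.stdin.buffer instead of the io.BytesIO object used here.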
                Not... quite. codecs.open() isn't really needed in 3.x, since its capabilities are now a part of open().
– Ignacio Vazquez-Abrams
                Dec 29, 2010 at 13:11
                I'm not calling codecs.open() or any function in the codecs module. I think it's getting called by re.search()
– Deepak
                Dec 29, 2010 at 13:13
                Thanks Ignacio, the problems are caused because sys.stdin is opened as an error-intolerant utf-8 stream by default in py3k. I've patched my answer.
– user97370
                Dec 29, 2010 at 13:30
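For files rather than stdin, the comment above about open() can be illustrated as follows. This is a sketch using a temporary file created only for the demonstration; the built-in open() accepts the same errors argument, so codecs.open() is not required.

```python
import os
import tempfile

# Create a small file containing a byte (0x92) that is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nit\x92s broken\nlast line\n")
    path = tmp.name

try:
    # errors="replace" substitutes U+FFFD for each undecodable byte;
    # errors="ignore" would drop the byte instead.
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
finally:
    os.remove(path)

print(lines)
```

Using "replace" instead of "ignore" leaves a visible marker where data was lost, which can make bad input easier to spot later.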
        
