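Today I was writing a crawler in .NET Core to scrape news from external sites. The scraped text initially came back garbled, so I tried decoding it as GB2312, which failed with "Unhandled Exception: System.ArgumentException: 'GB2312' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." The fix is described below.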
To fix the garbled Chinese text, I first used the following code:
byte[] response1 = await client.GetByteArrayAsync(url1);
string temp = Encoding.GetEncoding("GB2312").GetString(response1);
Running it threw the following exception:
Unhandled Exception: System.ArgumentException: 'GB2312' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
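1. Install the System.Text.Encoding.CodePages package from NuGet.
2. Before calling Encoding.GetEncoding("GB2312"), register the code-pages provider with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);. .NET Core does not include code-page encodings such as GB2312 by default, so the lookup fails until the provider is registered. The code looks like this: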
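// Requires: using System.Net.Http; using System.Text;
var url1 = "URL of the news list page to scrape";
// Fetch the HTML as raw bytes (client is an HttpClient instance)
byte[] response1 = await client.GetByteArrayAsync(url1);
// Register the code-pages provider before requesting GB2312
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// Decode the raw bytes as GB2312 text
string temp = Encoding.GetEncoding("GB2312").GetString(response1);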
With this change in place, the project rebuilt successfully and the exception no longer appeared.
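Since the provider only needs to be registered once per process, one option is to wrap the registration and the decoding in a small helper. The following is a minimal sketch, not the original code: the class name Gb2312Text and its Decode method are hypothetical names introduced for illustration, and it assumes the System.Text.Encoding.CodePages package is referenced as in step 1.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the original post.
public static class Gb2312Text
{
    // A static constructor runs once, before the first use of this class,
    // so the code-pages provider is registered a single time per process.
    static Gb2312Text()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decode raw bytes (for example, the result of
    // HttpClient.GetByteArrayAsync) as GB2312 text.
    public static string Decode(byte[] raw)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));
        return Encoding.GetEncoding("GB2312").GetString(raw);
    }
}
```

With this helper, the scraping code above could call Gb2312Text.Decode(response1) instead of registering the provider inline. As a quick check, the bytes 0xD6 0xD0 0xCE 0xC4 are the GB2312 encoding of "中文", so decoding them should give back that string.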