
Java: converting an HTML string to a plain-text string

Use htmlparser to extract the plain text from an HTML string.
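To show what "extracting the text" means without any library, here is a naive sketch that uses only java.util.regex. It is a swapped-in technique, not what htmlparser does. htmlparser tokenizes the markup into tag, text and comment nodes and lets a visitor keep only the text. This sketch simply deletes anything between '<' and '>' and decodes a few entities. It breaks on comments that contain '>' and on '>' inside attribute values, and it keeps the contents of script and style elements as text, so treat it as an illustration only. NaiveHtmlText and strip are hypothetical names.

```java
import java.util.regex.Pattern;

// Naive, dependency-free illustration of "HTML string -> text string".
// NOT how htmlparser works: it only deletes anything that looks like a tag.
class NaiveHtmlText {
    // Matches <DIV>, </FONT>, <!DOCTYPE ...> etc.; assumes no '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String strip(String html) {
        if (html == null) {
            return "";
        }
        String text = TAG.matcher(html).replaceAll("");
        // Decode only a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        return text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // prints "hello world"
        System.out.println(strip("<DIV><FONT size=2>hello</FONT>&nbsp;world</DIV>"));
    }
}
```

A real parser such as htmlparser avoids these pitfalls because it understands the structure of tags and comments instead of pattern-matching on angle brackets.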


String str = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">" +
"<HTML><HEAD>" +
"<META http-equiv=Content-Type content=\"text/html; charset=gb2312\">" +
"<META content=\"MSHTML 6.00.6000.17095\" name=GENERATOR><LINK " +
"href=\"BLOCKQUOTE{margin-Top: 0px; margin-Bottom: 0px; margin-Left: 2em}\"" +
"rel=stylesheet></HEAD>" +
"<BODY style=\"FONT-SIZE: 10pt; MARGIN: 10px; FONT-FAMILY: verdana\">" +
"<DIV><FONT face=Verdana size=2>helll,测试邮件</FONT></DIV>" +
"<DIV><FONT face=Verdana size=2></FONT>&nbsp;</DIV>" +
"<DIV align=left><FONT face=Verdana color=#c0c0c0 size=2>2011-03-03 " +
"</FONT></DIV><FONT face=Verdana size=2>"+
"<HR style=\"WIDTH: 122px; HEIGHT: 2px\" align=left SIZE=2>"+

"<DIV><FONT face=Verdana color=#c0c0c0 size=2><SPAN>shopeye7</SPAN> " +
"</FONT></DIV></FONT></BODY></HTML>" ;

System.out.println(StringUtil.html2Str(str));

Output:
helll,测试邮件 2011-03-03 shopeye7
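The sample string above has no line breaks between tags, so the result stays on one line. With real-world HTML, where tags are separated by newlines and indentation, the extracted text can contain runs of whitespace and no-break spaces. A small post-processing step like the following collapses them. It is a hypothetical addition, not part of the original html2Str, and TextCleanup and normalizeWhitespace are illustrative names.

```java
// Hypothetical post-processing step (not part of the original html2Str):
// collapse whitespace runs, including U+00A0 no-break spaces, into single spaces.
class TextCleanup {
    static String normalizeWhitespace(String text) {
        if (text == null) {
            return "";
        }
        // In Java, \s does not match U+00A0 by default, so it is listed explicitly.
        return text.replaceAll("[\\s\\u00A0]+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "hello, 2011-03-03 shopeye7"
        System.out.println(normalizeWhitespace("hello,\n\u00A0\n  2011-03-03 \n shopeye7 "));
    }
}
```

Used as normalizeWhitespace(StringUtil.html2Str(str)), it also maps a null result (html2Str returns null on a parse error) to an empty string.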


The html2Str method:
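import org.htmlparser.Parser;
import org.htmlparser.visitors.TextExtractingVisitor;

/**
 * Extracts the plain text from an HTML string.
 *
 * @param html the HTML markup to convert
 * @return the extracted text, or null if parsing fails
 */
public static String html2Str(String html) {
    try {
        // nvl() is not defined in this post; presumably it turns null into ""
        html = nvl(html);
        // Create a parser over the in-memory string; the second argument is the character set
        Parser parser = Parser.createParser(html, "utf-8");
        // The visitor collects the content of every text node it visits
        TextExtractingVisitor visitor = new TextExtractingVisitor();
        parser.visitAllNodesWith(visitor);
        return visitor.getExtractedText();
    } catch (Exception ex) {
        // Any parse error is swallowed and reported as null
        return null;
    }
}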
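html2Str calls nvl, which the post never defines (a reader asks about it in the comments). The name mirrors SQL's NVL, which substitutes a default for NULL, so nvl most likely turns a null argument into an empty string before it reaches the parser. The following is a hypothetical reconstruction under that assumption, not the author's code.

```java
// Hypothetical reconstruction of the nvl() helper that html2Str calls but the post never shows.
// Assumption: like SQL's NVL, it substitutes a default for null; here the default is "".
class NvlSketch {
    static String nvl(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        System.out.println("[" + nvl(null) + "]");        // prints []
        System.out.println("[" + nvl("<b>x</b>") + "]");  // prints [<b>x</b>]
    }
}
```

On Java 7 and later, java.util.Objects.toString(html, "") gives the same result without a custom helper.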
  • lib.rar (300.7 KB)
Comments
#2 任楚娴 2016-10-06
Hi, in the line html = nvl(html);, what method does nvl(html) call?
#1 legends 2014-02-12
I'll give it a try, thanks.
