1
version: 3.0.5
读取某Excel 报了如下的错
Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xffff) was found in the element content of the document.
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at com.alibaba.excel.analysis.v07.XlsxSaxAnalyser.parseXmlSource(XlsxSaxAnalyser.java:197)
看了下解压缩后的xml(xl/sharedStrings.xml) 其中包含字符 <ffff>
但用WPS是可以打开这个excel的 参考https://stackoverflow.com/questions/5742543/an-invalid-xml-character-unicode-0xc-was-found 尝试了如下的方案 均未能解决
尝试一:
byte[] bytes = Files.readAllBytes(path);
String str = new String(bytes);
String processedStr = stripNonValidXMLCharacters(str);
final InputStream inputStream = new ByteArrayInputStream(processedStr.getBytes());
EasyExcel.read(inputStream, ExcelData.class, readListener).excelType(ExcelTypeEnum.XLSX).sheet().doRead();
public String stripNonValidXMLCharacters(String in) {
StringBuilder out = new StringBuilder(); // Used to hold the output.
char current; // Used to reference the current character.
if (in == null || ("".equals(in))) return ""; // vacancy test.
for (int i = 0; i < in.length(); i++) {
current = in.charAt(i); // NOTE: No IndexOutOfBoundsException caught here; it should not happen.
if ((current == 0x9) ||
(current == 0xA) ||
(current == 0xD) ||
((current >= 0x20) && (current <= 0xD7FF)) ||
((current >= 0xE000) && (current <= 0xFFFD)) ||
((current >= 0x10000) && (current <= 0x10FFFF)))
out.append(current);
}
return out.toString();
}
报错:
Caused by: java.util.zip.ZipException: Unexpected record signature: 0XEF524B50
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:260)
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:141)
尝试二:
byte[] bytes = Files.readAllBytes(path);
String str = new String(bytes);
Pattern p = Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\u10000-\\u10FFF]+");
String processedStr = p.matcher(str).replaceAll("");
final InputStream inputStream = new ByteArrayInputStream(processedStr.getBytes());
EasyExcel.read(inputStream, ExcelData.class, readListener).excelType(ExcelTypeEnum.XLSX).sheet().doRead();
报错
Caused by: java.util.zip.ZipException: Unexpected record signature: 0XEF524B50
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:260)
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:141)
故想问下 这个问题 如何解决?