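Aperture is a Java framework for crawling and searching the full-text content and metadata of a variety of information systems (for example, file systems, websites, and IMAP and Outlook mailboxes) and of the files stored in those systems (for example, documents and images). It currently supports the following file formats: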
Plain text
HTML, XHTML
XML
PDF (Portable Document Format)
RTF (Rich Text Format)
Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher
Microsoft Works
OpenOffice 1.x: Writer, Calc, Impress, Draw
StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
OpenDocument (OpenOffice 2.x, StarOffice 8.x)
Corel WordPerfect, Quattro, Presentations
Emails (.eml files)
Project page: http://aperture.sourceforge.net/
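To make the idea of crawling a source for full text and metadata concrete, here is a minimal sketch for the simplest case: plain-text files in a file system. It does not use Aperture's API; it relies only on the standard `java.nio.file` package. The class name `CrawlSketch`, the `Item` type, and the `.txt` filter are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical illustration only: this is NOT Aperture's API.
public class CrawlSketch {

    /** One crawled file: its path, a few metadata fields, and its full text. */
    public static final class Item {
        public final Path path;
        public final Map<String, String> metadata;
        public final String text;

        Item(Path path, Map<String, String> metadata, String text) {
            this.path = path;
            this.metadata = metadata;
            this.text = text;
        }
    }

    /** Walks the tree under root and collects text and metadata for every .txt file. */
    public static List<Item> crawl(Path root) throws IOException {
        List<Item> items = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                // Only plain-text files are handled in this sketch.
                if (!Files.isRegularFile(p) || !p.toString().endsWith(".txt")) {
                    continue;
                }
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                Map<String, String> meta = new LinkedHashMap<>();
                meta.put("name", p.getFileName().toString());
                meta.put("size", Long.toString(attrs.size()));
                meta.put("lastModified", attrs.lastModifiedTime().toString());
                // Assumes the file is UTF-8 encoded.
                String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                items.add(new Item(p, meta, text));
            }
        }
        return items;
    }
}
```

Calling `CrawlSketch.crawl(somePath)` returns one `Item` per `.txt` file found under that directory. The other formats in the list above (PDF, office documents, emails) need format-specific extractors, which is the part a framework such as Aperture is meant to supply.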