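Apache OpenNLP 1.8.0 has been released. OpenNLP is a machine learning toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing. This release brings many new features, improvements, and bug fixes. The API has been reworked for better consistency, and many deprecated methods have been removed.
Changes in this release: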
POS Tagger context generator now supports feature generation XML
Add a Name Finder feature generator that adds POS Tag features
Add CONLL-U format support
Improve default Name Finder settings
TokenNameFinderEvaluator CLI now supports nameTypes argument
Stupid backoff is now the default in NGramLanguageModel
Language codes are now ISO 639-3 compliant
Add many unit tests
Distribution package now includes example parameters file
Prefix and suffix feature generators are now configurable
Remove API in Document Categorizer for user specified tokenizer
Learnable lemmatizer now returns all possible lemmas for a given word and POS tag
Lemmatizer API backward compatibility break: no need to encode/decode lemmas anymore, the LemmatizerME lemmatize method now returns the actual lemma
Add stemmer, detokenizer and sentence detection abbreviations for Irish
Chunker SequenceValidator signature changed to allow access to both token and POS tag
Download: https://opennlp.apache.org/download.html