OCR Xpress delivers fast and accurate full-page optical character recognition (OCR) to software developers using .NET and ActiveX COM toolkits. Use OCR Xpress to add full-page text recognition, auto rotate, and searchable document creation to your application. This software development kit (SDK) also supports deskew, binarization, character position information, and segmentation of documents into image and text elements. It supports output to multiple text and text-plus-image formats, including Microsoft® Word®-compatible RTF files and standard Adobe® PDF files.
Recognize text in thirteen languages: English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, and Finnish. OCR Xpress provides a dictionary for each language, and also supports a user-defined dictionary for words that are application-specific.
The auto rotate feature in OCR Xpress detects the correct orientation of the text in an image, and rotates the entire page accordingly. It can also deskew documents that become skewed during the scanning process.
Character position information allows users of OCR Xpress to redact or highlight text in the original image using the included NotateXpress component. Users can also build their own PDF files, using the position information to place the hidden text in the correct location. Because OCR Xpress reports a recognition confidence for each character, it can also be used in conjunction with other OCR engines such as SmartZone to perform voting, thereby improving recognition accuracy.
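As an illustration of confidence-based voting between two engines, the following is a minimal Python sketch. It does not use the OCR Xpress or SmartZone APIs: the `(character, confidence)` data shape, the 0.0 to 1.0 confidence scale, and the `vote` function are all assumptions made for the example, and the two results are assumed to be aligned character by character.

```python
from typing import List, Tuple

# Hypothetical data shape: one (character, confidence) pair per character,
# with confidence between 0.0 and 1.0. This is not the OCR Xpress API.
CharResult = Tuple[str, float]


def vote(engine_a: List[CharResult], engine_b: List[CharResult]) -> str:
    """At each position, keep the character reported with higher confidence."""
    if len(engine_a) != len(engine_b):
        raise ValueError("this sketch assumes the two results are already aligned")
    chosen = []
    for (char_a, conf_a), (char_b, conf_b) in zip(engine_a, engine_b):
        # Ties go to the first engine.
        chosen.append(char_a if conf_a >= conf_b else char_b)
    return "".join(chosen)


# Made-up results from two engines reading the word "cast".
result_a = [("c", 0.95), ("a", 0.90), ("5", 0.35), ("t", 0.85)]
result_b = [("c", 0.90), ("o", 0.30), ("s", 0.80), ("t", 0.88)]
print(vote(result_a, result_b))  # prints "cast"
```

Each engine misreads one character here, but because the misreads carry low confidence, the combined result is correct. A real integration would also need to align results of different lengths before voting.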
OCR Xpress flags characters recognized with low confidence, allowing developers to easily build text proofing and character replacement functions into their applications. This enables users to review and make corrections to text prior to output.
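A proofing feature of this kind could be sketched as follows in Python. The data shape, the 0.6 threshold, and the function names are hypothetical and chosen only for illustration; the actual OCR Xpress interface for flagged characters is not shown in this document.

```python
from typing import List, Tuple

# Hypothetical data shape: (character, confidence) with confidence in 0.0..1.0.
CharResult = Tuple[str, float]


def flag_uncertain(chars: List[CharResult], threshold: float = 0.6) -> List[int]:
    """Return the indexes of characters whose confidence is below the threshold."""
    return [i for i, (_, conf) in enumerate(chars) if conf < threshold]


def mark_for_review(chars: List[CharResult], threshold: float = 0.6) -> str:
    """Render the text with each uncertain character wrapped in brackets."""
    flagged = set(flag_uncertain(chars, threshold))
    return "".join(
        f"[{ch}]" if i in flagged else ch for i, (ch, _) in enumerate(chars)
    )


def replace_char(chars: List[CharResult], index: int, corrected: str) -> List[CharResult]:
    """Return a copy with one character corrected by the user (confidence set to 1.0)."""
    updated = list(chars)
    updated[index] = (corrected, 1.0)
    return updated


# Made-up recognition result for the word "Invoice".
recognized = [("I", 0.98), ("n", 0.97), ("v", 0.55), ("o", 0.92),
              ("1", 0.41), ("c", 0.90), ("e", 0.95)]

print(flag_uncertain(recognized))   # prints [2, 4]
print(mark_for_review(recognized))  # prints In[v]o[1]ce

corrected = replace_char(recognized, 4, "i")
print(mark_for_review(corrected))   # prints In[v]oice
```

The user reviews only the bracketed characters, corrects the ones that are wrong, and the application writes the corrected text to output.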
OCR Xpress includes advanced segmentation to locate regions of the input image and identify them as either images (whose color can be preserved) or areas containing recognizable text. The various regions can be accessed for individualized processing, or automatically recombined into fully formatted documents. The binarization function can convert color documents to black and white to improve recognition without affecting non-text regions, which may be retained in full color for reinsertion into the output document.
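For readers unfamiliar with binarization, here is a minimal Python sketch of the general idea using a fixed global threshold. It is not the algorithm OCR Xpress uses, which this document does not describe. The threshold of 128, the nested-list image representation, and the function names are assumptions for the example; the grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def to_gray(pixel: Pixel) -> int:
    """Convert an RGB pixel to a gray level using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(image: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (white) if its gray level reaches the threshold, else 0 (black)."""
    return [[1 if to_gray(px) >= threshold else 0 for px in row] for row in image]


# A made-up 2x2 color image: white paper, dark ink, dark red ink, pale yellow paper.
image = [
    [(255, 255, 255), (20, 20, 20)],
    [(200, 30, 30), (250, 250, 210)],
]
print(binarize(image))  # prints [[1, 0], [0, 1]]
```

In a segmented document, a step like this would be applied only to the regions identified as text, leaving the image regions in their original color.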
OCR Xpress complements the Pegasus Imaging product line by offering full-page OCR, auto rotate, and searchable text output capabilities. Pegasus Imaging's SmartZone product is recommended for recognition of English-language text in zones on structured forms (zonal OCR). OCR Xpress can also be used for European-language recognition in zonal OCR applications.
Included Components
Both editions of OCR Xpress use the same set of .NET controls and COM controls. Access to specific functions is determined by the edition.
- OCR Xpress Professional - Includes the OCR Xpress v1 component, plus ImagXpress Document v8, NotateXpress v8, ThumbnailXpress v1, TwainPRO v4, and PrintPRO v3 components.
- OCR Xpress Standard - All features of OCR Xpress Professional except for PDF output.