Pentaho Data Integration prepares and blends data to create a complete picture of your business that drives actionable insights. The complete data integration platform delivers accurate, analytics-ready data to end users from any source. With visual tools that eliminate coding and complexity, Pentaho puts big data and all data sources at the fingertips of business and IT users alike.

Simple Visual Designer for Drag-and-Drop Development
Empower developers with visual tools to minimize coding and achieve greater productivity.

Drag-and-Drop Visual Design Approach
- Graphical extract-transform-load (ETL) tool to load and process big data sources in familiar ways.
- Rich library of pre-built components to access and transform data from a full spectrum of sources.
- Visual interface to call custom code and to analyze images and video files to create meaningful metadata.
- Dynamic transformations, using variables to determine field mappings, validation and enrichment rules (see the sketch after this list).
- Integrated debugger for testing and tuning job execution.
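
As a minimal sketch of how a visually designed transformation can also be driven from code, the snippet below uses the PDI (Kettle) Java API to load and run a .ktr file and passes a variable of the kind a dynamic field mapping or validation rule could read. The file name customer_load.ktr and the variable TARGET_TABLE are assumptions for illustration, not part of the product.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (registers steps and plugins).
        KettleEnvironment.init();

        // Load a transformation that was designed in the visual editor
        // (hypothetical file name).
        TransMeta transMeta = new TransMeta("customer_load.ktr");
        Trans trans = new Trans(transMeta);

        // A variable like this can drive dynamic field mappings and
        // validation rules inside the transformation; the name is an
        // assumption defined by whoever authored the .ktr file.
        trans.setVariable("TARGET_TABLE", "dim_customer");

        trans.execute(null);        // start all steps
        trans.waitUntilFinished();  // block until the last row is processed

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}
```
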
Big Data Integration with Zero-Coding Required
Pentaho's intuitive tools make it up to 15x faster to design, develop and deploy big data analytics.

Big Data Integration Made Easy
- Complete visual development tools eliminate the need to hand-code SQL or write MapReduce Java functions.
- Broad connectivity to any type or source of data with native support for Hadoop, NoSQL and analytic databases.
- Parallel processing engine to ensure high performance and enterprise scalability.
- Extract and blend existing and diverse data to produce consistent, high-quality, ready-to-analyze data.
Native and Flexible Support for all Big Data Sources
A combination of deep native connections and an adaptive big data layer ensures accelerated access to the leading Hadoop distributions, NoSQL databases, and other big data stores.

Broadest and Deepest Big Data Support
- Support for the latest Hadoop distributions from Cloudera, Hortonworks, MapR and Intel.
- Simple plugins for NoSQL databases such as Cassandra and MongoDB, as well as connections to specialized data stores like Amazon Redshift and Splunk.
- Adaptive big data layer saves enterprises considerable development time as they leverage new versions and capabilities.
- Greater flexibility, reduced risk, and insulation from changes in the big data ecosystem.
- Reporting and analysis on growing amounts of user- and machine-generated data, including web content, documents, social media and log files.
- Integration of Hadoop data tasks into overall IT/ETL/BI solutions with scalable distribution across the cluster (see the sketch after this list).
- Support for parallel bulk data loader utilities for loading data with maximum performance.
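
As a hedged illustration of orchestrating such work from surrounding IT systems, the sketch below runs a PDI job (a .kjb file that might contain Hadoop or bulk-load entries) through the Kettle Java API. The file name orchestration_daily.kjb is an assumption.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunJob {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Load a job that orchestrates ETL/Hadoop entries (assumed file name).
        // The second argument (a repository) is null: run from the local file.
        JobMeta jobMeta = new JobMeta("orchestration_daily.kjb", null);

        Job job = new Job(null, jobMeta);
        job.start();               // the job runs on its own thread
        job.waitUntilFinished();   // block until every entry completes

        if (job.getErrors() > 0) {
            throw new RuntimeException("Job finished with errors");
        }
    }
}
```
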
Powerful Administration and Management
Simplified out-of-the-box capabilities to manage the operations of a data integration project.

Easy-to-Use Administration and Schedule Management
- Manage security privileges for users and roles.
- Restart jobs from the last successful checkpoint and roll back job execution on failure.
- Integrate with existing security definitions in LDAP and Active Directory.
- Set permissions to control user actions: read, execute or create.
- Schedule data integration flows for organized process management.
- Monitor and analyze the performance of data integration processes (see the sketch after this list).
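
As one hedged example of programmatic monitoring, a Carte server (PDI's lightweight remote execution server) exposes a status page that can be polled over HTTP. The host, port, and the default cluster/cluster credentials below are placeholder assumptions for a local test setup.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CarteStatusCheck {
    public static void main(String[] args) throws Exception {
        // Carte's status endpoint; ?xml=Y asks for machine-readable output.
        URL url = new URL("http://localhost:8080/kettle/status/?xml=Y");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Basic auth with Carte's default credentials (change in production).
        String auth = Base64.getEncoder().encodeToString(
                "cluster:cluster".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);

        // The response is an XML document listing running and finished
        // transformations/jobs with their current statuses.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```
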
Data Profiling and Data Quality
Profile data and ensure data quality with comprehensive capabilities for data managers.

Data Quality Management
- Identify data that fails to comply with business rules and standards.
- Standardize, validate, de-duplicate and cleanse inconsistent or redundant data (see the sketch after this list).
- Manage data quality with partners such as Human Inference and Melissa Data.
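
As a rough illustration of the kind of rules this involves (not Pentaho's actual implementation), the sketch below standardizes email addresses, rejects values that fail a simple business rule, and drops duplicates. In PDI, logic like this would typically live in validation and deduplication steps, or in a User Defined Java Class step; the sample data and the regex rule are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EmailQualityCheck {
    public static void main(String[] args) {
        // Sample input rows (illustrative data only).
        List<String> raw = List.of(
                " Alice@Example.COM ", "alice@example.com",
                "bob@example.com", "not-an-email");

        Set<String> clean = new LinkedHashSet<>(); // keeps first occurrence
        for (String value : raw) {
            // Standardize: trim whitespace and lower-case the address.
            String email = value.trim().toLowerCase();

            // Validate against a simple business rule (a toy email check).
            if (!email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
                System.out.println("REJECTED:  " + value);
                continue;
            }

            // De-duplicate: add() returns false for addresses already seen.
            if (!clean.add(email)) {
                System.out.println("DUPLICATE: " + value);
            }
        }
        System.out.println("CLEAN ROWS: " + clean);
    }
}
```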