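Integrating Aspose.Words with Azure Data Lake
Aspose.Words is an advanced Word document processing API for performing a wide range of document management and manipulation tasks. The API supports generating, modifying, converting, rendering, and printing documents in cross-platform applications without using Microsoft Word directly.
Aspose APIs handle popular file formats and allow documents to be exported or converted to fixed-layout file formats and the most commonly used image/multimedia formats.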
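Aspose.Words can be integrated with two Microsoft Azure Data Lake services: Azure Data Lake Analytics (ADLA) and Azure Data Lake Storage (ADLS). This lets you combine the big data analytics capabilities of the Azure Data Lake cloud storage solution with Aspose.Words, so that an application can programmatically generate, modify, render, read, or convert documents between formats.
This article describes how to configure a C# project in Visual Studio for use with ADLA and provides an example that demonstrates the integration of Aspose.Words and Azure Data Lake.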
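Prerequisites
- An active Microsoft Azure subscription. If you do not have one, create a free account before you begin.
- Visual Studio 2019 or Visual Studio 2017 with the Azure development workload installed.
- Azure Data Lake Tools for Visual Studio installed.
- Visual Studio configured with your ADLA account.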
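Creating a Document from Data in Azure Data Lake
This topic demonstrates how to use Aspose.Words to generate a document containing a table from a database on Azure Data Lake. This requires creating a simple database and implementing the IOutputter interface to create a user-defined outputter that writes data from ADLS in a format supported by Aspose.Words.
Creating a Database in Azure Data Lake Storage (ADLS)
For demonstration purposes, create a simple database containing the sample data that will populate the resulting document.
The Customers sample table resides in the sample_db database on ADLS. To create this sample database, sign in to your ADLA account, click "New Job", and submit the following script: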
U-SQL
CREATE DATABASE IF NOT EXISTS sample_db;
USE DATABASE sample_db;
CREATE SCHEMA IF NOT EXISTS dbo;

DROP TABLE IF EXISTS dbo.Customers;
CREATE TABLE dbo.Customers
(
    Customer_id int,
    Customer_name string,
    Customer_domain string,
    Customer_city string,
    INDEX idx_customer_id CLUSTERED (Customer_id ASC)
)
DISTRIBUTED BY RANGE (Customer_id);

INSERT INTO sample_db.dbo.Customers
    (Customer_id, Customer_name, Customer_domain, Customer_city)
VALUES
    (1, "John Smith", "History", "Boston"),
    (2, "Lisa Jaine", "Chemistry", "LA"),
    (3, "James Johnson", "Heraldry", "Milwaukee"),
    (4, "Sara Soyer", "IT", "Miami");
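Implementing the IOutputter Interface
In Visual Studio, create a new project by adding a C# Class Library (For U-SQL Application), and add a NuGet reference to Aspose.Words.
The following code example shows how to implement the IOutputter interface: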
using Microsoft.Analytics.Interfaces;
using System;
using System.IO;
using System.Linq;
using Aspose.Words;

namespace AsposeWordsOutputterUSql
{
    [SqlUserDefinedOutputter(AtomicFileProcessing = true)]
    public class AsposeWordsOutputer : IOutputter
    {
        public AsposeWordsOutputer(SaveFormat saveFormat)
        {
            // Pass the specified save format.
            mSaveFormat = saveFormat;

            // Create an instance of DocumentBuilder, which will be used to build the document.
            mDocumentBuilder = new DocumentBuilder();
        }

        /// <summary>
        /// The Close method is used to write the document to the file. It is executed only once, after all rows.
        /// </summary>
        public override void Close()
        {
            // End the table.
            mDocumentBuilder.EndTable();

            // The stream passed from IUnstructuredWriter.BaseStream does not support seeking.
            // This causes an exception when saving to PDF.
            // To avoid problems, save the output document into MemoryStream first
            // and then write its content to the IUnstructuredWriter.BaseStream.
            using (BinaryWriter writer = new BinaryWriter(mOutputStream))
            {
                // Save the document and close the stream.
                using (MemoryStream ms = new MemoryStream())
                {
                    mDocumentBuilder.Document.Save(ms, mSaveFormat);
                    writer.Write(ms.ToArray());
                }
            }
        }

        public override void Output(IRow row, IUnstructuredWriter output)
        {
            // Table with header row output--runs only once.
            if (mIsHeaderRow)
                ProcessHeaderRow(row.Schema);

            ProcessRow(row);

            // Reference to the instance of the IO.Stream object for saving document.
            mOutputStream = output.BaseStream;
        }

        /// <summary>
        /// Create HeaderRow of the table.
        /// </summary>
        private void ProcessHeaderRow(ISchema schema)
        {
            // Start the table before building it.
            mDocumentBuilder.StartTable();

            // Build the table.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                mDocumentBuilder.InsertCell();

                // Write a header with bold font.
                mDocumentBuilder.Font.Bold = true;
                mDocumentBuilder.Write(col.Name);
            }

            mDocumentBuilder.EndRow();

            // Write data with normal font.
            mDocumentBuilder.Font.Bold = false;

            // Table with header row output--runs only once.
            mIsHeaderRow = false;
        }

        /// <summary>
        /// Create Row of the table.
        /// </summary>
        private void ProcessRow(IRow row)
        {
            // Metadata schema initialization to enumerate column names.
            ISchema schema = row.Schema;

            // Data row output.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                string val = "";
                Type type = col.Type;

                // Get the cell value in the current row by column name and cast it to the column type.
                if (type == typeof(string))
                    val = row.Get<string>(col.Name);
                else if (type == typeof(int))
                    val = row.Get<int>(col.Name).ToString();
                else
                    val = "Column type is not supported.";

                mDocumentBuilder.InsertCell();
                mDocumentBuilder.Write(val);
            }

            mDocumentBuilder.EndRow();
        }

        private readonly DocumentBuilder mDocumentBuilder;
        private readonly SaveFormat mSaveFormat;
        private Stream mOutputStream;
        private bool mIsHeaderRow = true;

        static AsposeWordsOutputer()
        {
            // Note: The Aspose.Words license needs to be applied only once before any Document instance is created.
            // To execute the code only once, a static constructor is used. The below code will find and activate the license.
            // Uncomment the following code and add your license file as an embedded resource in the project.

            // Aspose.Words.License lic = new Aspose.Words.License();
            // lic.SetLicense("Aspose.Words.lic");
        }
    }
}
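ProcessRow in the outputter handles only string and int columns and writes a placeholder for everything else. The following is a minimal sketch of how the value-to-text step could be widened to more column types. ToCellText and CellTextSketch are hypothetical names introduced for illustration; they are not part of Aspose.Words or the U-SQL interfaces. The helper takes a value that has already been read from the row, so how each column type is read from IRow is left out here.

```csharp
using System;
using System.Globalization;

public static class CellTextSketch
{
    // Hypothetical helper (not part of Aspose.Words or U-SQL): turns an
    // already-read column value into the text written to a table cell.
    public static string ToCellText(object value)
    {
        if (value == null) return string.Empty;      // nullable column with no value
        if (value is string s) return s;
        if (value is DateTime dt)                    // fixed, culture-independent layout
            return dt.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
        if (value is IFormattable f)                 // int, long, double, decimal, Guid, ...
            return f.ToString(null, CultureInfo.InvariantCulture);
        return value.ToString();                     // bool, char, and anything else
    }

    public static void Main()
    {
        Console.WriteLine(ToCellText(42));                         // 42
        Console.WriteLine(ToCellText(1234.5));                     // 1234.5
        Console.WriteLine(ToCellText(new DateTime(2020, 1, 31)));  // 2020-01-31 00:00:00
        Console.WriteLine(ToCellText(null));                       // (empty line)
    }
}
```

Formatting with the invariant culture keeps the generated document identical regardless of the regional settings of the machine that runs the job.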
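Note the licensing comment in the static constructor of AsposeWordsOutputer: the Aspose.Words license needs to be applied only once, before any Document instance is created.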
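The Close method of the outputter saves the document to a MemoryStream before writing to IUnstructuredWriter.BaseStream, because that stream does not support seeking. The sketch below isolates that buffering pattern using only the .NET base class library. ForwardOnlyStream is a hypothetical stand-in for the ADLA output stream, and SaveBuffered is an illustrative helper name, not an Aspose.Words or U-SQL API.

```csharp
using System;
using System.IO;

public static class BufferedSaveSketch
{
    // Hypothetical stand-in for a forward-only sink such as the ADLA output stream.
    public sealed class ForwardOnlyStream : Stream
    {
        private readonly MemoryStream _inner = new MemoryStream();
        public override bool CanRead => false;
        public override bool CanSeek => false;
        public override bool CanWrite => true;
        public override long Length => throw new NotSupportedException();
        public override long Position
        {
            get => throw new NotSupportedException();
            set => throw new NotSupportedException();
        }
        public override void Flush() { }
        public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
        public byte[] WrittenBytes() => _inner.ToArray();
    }

    // Runs the save callback against a seekable buffer, then copies the bytes to the sink.
    public static void SaveBuffered(Action<Stream> save, Stream sink)
    {
        using (var buffer = new MemoryStream())
        {
            save(buffer);          // the writer may seek freely inside the buffer
            buffer.Position = 0;   // rewind before copying
            buffer.CopyTo(sink);   // the sink only ever sees sequential writes
        }
    }

    public static void Main()
    {
        var sink = new ForwardOnlyStream();
        SaveBuffered(s =>
        {
            s.Write(new byte[] { 0, 0, 3 }, 0, 3);
            s.Seek(0, SeekOrigin.Begin);            // would throw on the sink itself
            s.Write(new byte[] { 1, 2 }, 0, 2);     // overwrite the first two bytes
        }, sink);
        Console.WriteLine(BitConverter.ToString(sink.WrittenBytes())); // prints 01-02-03
    }
}
```

In the outputter, the callback would correspond to saving the document into the buffer, for example s => mDocumentBuilder.Document.Save(s, mSaveFormat). The trade-off is that the whole document is held in memory before it is written out.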
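Registering the Assembly in Azure Data Lake Analytics (ADLA)
To integrate the project's C# class library with your ADLA account, register the assembly with the account:
- In Visual Studio, right-click the project name and select "Register Assembly".
- Select the ADLA account name and the database name.
- Expand the "Managed Dependencies" panel and check Aspose.Words, as shown in the screenshot below.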

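Running the U-SQL Job in the Azure Portal
To launch the application, run the following U-SQL code in ADLA. It contains the necessary assembly references and calls the user-defined outputter: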
U-SQL
USE DATABASE [sample_db];

REFERENCE ASSEMBLY AsposeWordsOutputterUSQL;
REFERENCE ASSEMBLY [Aspose.Words];

@test =
    SELECT *
    FROM dbo.Customers;

OUTPUT @test
TO "/output/Customers_AW.docx"
USING new AsposeWordsOutputterUSql.AsposeWordsOutputer(Aspose.Words.SaveFormat.Docx);
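You can output the document in whichever format suits your project, such as Docx, Doc, Pdf, Rtf, Text, or Jpeg. For details, see the SaveFormat enumeration.
Locate the file in the output folder in ADLS and download it.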

The following screenshot shows what the output document looks like after the application runs.
