Fill DOCX with XML data using PowerShell

Given a Microsoft DOCX document (the “template”) sprinkled with @@<tag> constructs, and XML data that includes <tag> elements, the script below makes a copy of the template and replaces each @@<tag> occurrence with the contents of the matching XML element, preserving the surrounding style. The XML data may be given as raw XML, a file path, or a URL.

PS> .\filldocx.ps1 -template template.docx -xml 'https://www.w3schools.com/xml/plant_catalog.xml' -destfile foo.docx
param ([string]$template = '.\template.docx', [string]$xml = 'value', [string]$destfile = 'document.docx')

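function isURIWeb($address) {
	$uri = $address -as [System.URI];
	# Anchor the pattern: '[http|https]' is a character class and matches any scheme containing h, t, p, s or |
	$null -ne $uri.AbsoluteURI -and $uri.Scheme -match '^https?$';
}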

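# Load XML: raw markup starts with '<'; otherwise accept a web address or a file path.
# (Test-Path -IsValid only checks path syntax, not whether the file exists.)
if ($xml.TrimStart().StartsWith('<')) {
    $data = [XML] $xml;
} elseif (isURIWeb $xml) {
    $response = Invoke-WebRequest -Uri $xml;
    $data = [XML] $response.Content;
} else {
    $data = [XML] (Get-Content -LiteralPath $xml -Raw -Encoding UTF8);
}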

Copy-Item -Force -Path $template -Destination $destfile;

Add-Type -Assembly 'System.IO.Compression.FileSystem';

$zipFile = [System.IO.Compression.ZipFile]::Open($destfile, 'Update');
$doc = $zipFile.GetEntry('word/document.xml').Open();

$buffer = [System.Array]::CreateInstance([byte], $doc.Length);
$doc.Read($buffer, 0, $doc.Length) | Out-Null;
$body = [System.Text.Encoding]::UTF8.GetString($buffer);
# Handle longer placeholders first, so that replacing @@name cannot clobber the start of @@name2
foreach($name in ([RegEx]::new('@@([a-zA-Z]\w*)')).Matches($body) | Sort-Object { $_.value.length } -Descending | Get-Unique) {
    $value = $data.getElementsByTagName($name.Groups[1].value);
    if ($value.Count -gt 0) {
        $body = $body.Replace($name.Value, $value[0].InnerXml);
    } else {
        $body = $body.Replace($name.Value, '***');
    }
}
$buffer = [System.Text.Encoding]::UTF8.GetBytes($body);

$doc.Seek(0, [System.IO.SeekOrigin]::Begin) | Out-Null;
$doc.SetLength($buffer.Length);
$doc.Write($buffer, 0, $buffer.Length);
$doc.Close();

$zipFile.Dispose();
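
The substitution step of the script above can be tried in isolation, without a DOCX file. The following is only a minimal sketch: the sample XML, its element names and the $body string are made up for illustration and stand in for the real word/document.xml content.

```powershell
# Hypothetical sample data; element names are invented for this illustration.
$data = [xml] '<plant><common>Bloodroot</common><commonName>Sanguinaria</commonName></plant>'
$body = 'Name: @@common, full: @@commonName, missing: @@price'

# Longest placeholders first, so @@common does not clobber @@commonName.
$names = [regex]::Matches($body, '@@([a-zA-Z]\w*)') |
    Sort-Object { $_.Value.Length } -Descending

foreach ($name in $names) {
    $nodes = $data.GetElementsByTagName($name.Groups[1].Value)
    # Use the first matching element, or '***' when the tag is absent from the data.
    $replacement = if ($nodes.Count -gt 0) { $nodes[0].InnerXml } else { '***' }
    $body = $body.Replace($name.Value, $replacement)
}

$body   # -> Name: Bloodroot, full: Sanguinaria, missing: ***
```

Tags with no matching element show up as *** in the output, which makes missing data easy to spot in the filled document.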

The code was written to give non-programmers an intuitive way of creating automatically fillable templates. They are invited to create a DOCX document sprinkled with @@<tag> constructs as they see fit, and are asked either to choose understandable canonical names themselves or to use predefined ones when forming the @@<tag> constructs. These templates are then filled on request, in an intranet context, with data fetched from a web service.

Note that only the first element with a given tag is used.
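
A small sketch of what "first element" means in practice, assuming hypothetical data in which the same tag appears twice (the catalog below is invented for the example):

```powershell
# Hypothetical data with a repeated <COMMON> element.
$data = [xml] '<CATALOG><PLANT><COMMON>Bloodroot</COMMON></PLANT><PLANT><COMMON>Columbine</COMMON></PLANT></CATALOG>'

# GetElementsByTagName returns every match in document order.
$nodes = $data.GetElementsByTagName('COMMON')
$count = $nodes.Count          # 2
$first = $nodes[0].InnerXml    # Bloodroot, the value that would replace @@COMMON
```

With data like this, every @@COMMON in the template would receive Bloodroot; the second element is never consulted.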

 Last update 2020-02-20 04:38