Fill DOCX with XML data using PowerShell
Given a Microsoft DOCX document (the “template”) sprinkled with @@<tag> placeholders, and XML data that includes <tag> elements, the script below makes a copy of the template and replaces each @@<tag> occurrence with the content of the matching XML element, preserving the surrounding style. The XML data may be a raw string, a file, or a URL.
PS> .\filldocx.ps1 -template template.docx -xml 'https://www.w3schools.com/xml/plant_catalog.xml' -destfile foo.docx
param ([string]$template = '.\template.docx', [string]$xml = 'value', [string]$destfile = 'document.docx')
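# Returns $true when $address is an absolute http or https URI.
function isURIWeb($address) {
    $uri = $address -as [System.URI];
    # Anchored alternation; the original '[http|https]' was a character class.
    $null -ne $uri.AbsoluteURI -and $uri.Scheme -match '^https?$';
}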
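# Load XML from an existing file, a web URL, or a raw XML string
if ((Test-Path -Path $xml -IsValid) -and (Test-Path -LiteralPath $xml -PathType Leaf)) {
    # -IsValid only checks path syntax, so also require that the file exists.
    $data = [XML] (Get-Content -LiteralPath $xml -Encoding UTF8 -Raw);
} elseif (isURIWeb $xml) {
    $response = Invoke-WebRequest $xml;
    $data = [XML] $response.Content;
} else {
    $data = [XML] "$xml";
}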
Copy-Item -Force -Path $template -Destination $destfile;
Add-Type -Assembly 'System.IO.Compression.FileSystem';
$zipFile = [System.IO.Compression.ZipFile]::Open($destfile, 'Update');
$doc = $zipFile.GetEntry('word/document.xml').Open();
$buffer = [System.Array]::CreateInstance([byte], $doc.Length);
$doc.Read($buffer, 0, $doc.Length) | Out-Null;
$body = [System.Text.Encoding]::UTF8.GetString($buffer);
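# Collect the distinct placeholder names, longest first, so that @@name2 is replaced before @@name.
$tags = [RegEx]::Matches($body, '@@([a-zA-Z]\w*)') | ForEach-Object { $_.Groups[1].Value } | Sort-Object -Unique -CaseSensitive | Sort-Object -Property Length -Descending;
foreach ($tag in $tags) {
    $value = $data.GetElementsByTagName($tag);
    if ($value.Count -gt 0) {
        # Only the first matching element is used.
        $body = $body.Replace("@@$tag", $value[0].InnerXml);
    } else {
        # No matching element: leave a visible marker.
        $body = $body.Replace("@@$tag", '***');
    }
}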
$buffer = [System.Text.Encoding]::UTF8.GetBytes($body);
$doc.Seek(0, [System.IO.SeekOrigin]::Begin) | Out-Null;
$doc.SetLength($buffer.Length);
$doc.Write($buffer, 0, $buffer.Length);
$doc.Close();
$zipFile.Dispose();
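The substitution step can be tried without a DOCX file. The sketch below runs the same kind of match-and-replace logic against an in-memory string. The sample XML, the sample body text, and the variable names $tags, $nodes and $replacement are made up for illustration; it is not a replacement for the script above.

```powershell
# Hypothetical sample data; the element names are illustrative only.
$data = [xml] '<person><name>Ada</name><name2>Lovelace</name2></person>'
# Stand-in for the text read from word/document.xml.
$body = '<w:t>@@name @@name2 (@@missing)</w:t>'

# Distinct placeholder names, longest first, so that @@name2
# is replaced before @@name can match its prefix.
$tags = [regex]::Matches($body, '@@([a-zA-Z]\w*)') |
    ForEach-Object { $_.Groups[1].Value } |
    Sort-Object -Unique -CaseSensitive |
    Sort-Object -Property Length -Descending

foreach ($tag in $tags) {
    $nodes = $data.GetElementsByTagName($tag)
    if ($nodes.Count -gt 0) {
        $replacement = $nodes[0].InnerXml   # first matching element
    } else {
        $replacement = '***'                # no such element in the data
    }
    $body = $body.Replace("@@$tag", $replacement)
}

$body   # prints <w:t>Ada Lovelace (***)</w:t>
```

Replacing the longest names first matters here: if @@name were handled before @@name2, the second placeholder would turn into "Ada2".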
The code was written to give non-programmers an intuitive way to create automatically fillable templates. They are invited to create a DOCX document sprinkled with @@<tag> placeholders as they see fit, and are asked either to choose or to use predefined, understandable canonical names when forming the @@<tag> constructs. These templates are then filled on request, in an intranet context, with data fetched from a web service.
Note that only the first element with a given tag is used.
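As a small illustration of that behavior, the sketch below uses a made-up XML fragment containing two <name> elements. GetElementsByTagName returns both in document order, and indexing with [0], as the script does, picks the first one.

```powershell
# Hypothetical sample data with a repeated tag.
$data = [xml] '<people><person><name>Ada</name></person><person><name>Grace</name></person></people>'

$nodes = $data.GetElementsByTagName('name')
$nodes.Count         # 2
$nodes[0].InnerXml   # Ada
```

If a template needs a later occurrence, the data would have to expose it under its own distinct element name.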
Last update 2020-02-20 04:38