Archive for July, 2010

How to convert a Word document to other formats using PowerShell

I recently borrowed a Sony Reader Touch Edition from someone I know to try it out. As I started using Sony’s own library manager, I quickly got bored. I then tried the open source Calibre, which turned out to have a much better interface but had a major flaw when it comes to supporting the Sony Reader: it didn’t support importing Word documents into the library, despite the Sony Reader’s ability to read them. It can, however, import filtered HTML files, which Word can produce. Given my lazy nature, I did not want to convert a bunch of Word documents by hand, so I set out to write a PowerShell script to do the work for me.

The script turned out to be much simpler than I thought. Here it is for everyone’s benefit.

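param([string]$docpath, [string]$htmlpath = $docpath)

$srcfiles = Get-ChildItem $docpath -Filter "*.doc"

# Create Word first; this typically loads the interop assembly that defines WdSaveFormat
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatFilteredHTML")

function SaveAs-FilteredHtml($doc)
	{
		# Build the output path from the file name without its extension.
		# "$htmlpath\$doc.fullname.html" would expand only $doc and append the literal text ".fullname.html".
		$target = Join-Path $htmlpath ($doc.BaseName + ".html")
		$opendoc = $word.Documents.Open($doc.FullName)
		$opendoc.SaveAs([ref]$target, [ref]$saveFormat)
		$opendoc.Close()
	}

ForEach ($doc in $srcfiles)
	{
		Write-Host "Processing :" $doc.FullName
		SaveAs-FilteredHtml $doc
	}

# Close Word so no hidden instance is left running
$word.Quit()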

Save this code to convertdoc-tohtml.ps1 and you can run it on a set of Word documents regardless of doc or docx extension. Also, for efficiency, I am using -Filter in Get-ChildItem instead of piping to Where-Object or using an If statement in the script. Why? Because it says right in the help:

“Filters are more efficient than other parameters, because the provider applies them when retrieving the objects, rather than having Windows PowerShell filter the objects after they are retrieved.”
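As a rough illustration of the two approaches, the sketch below runs both against a throwaway folder. The folder and file names are made up for the example and are not part of the script above; on a handful of files the speed difference will not be noticeable, but the results should be the same.

```powershell
# Hypothetical scratch folder with a few sample files (names are illustrative only)
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("filterdemo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
"a.doc", "b.doc", "c.txt" | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $demo $_) | Out-Null
}

# Provider-side filtering: only matching items are handed back to PowerShell
$filtered = @(Get-ChildItem $demo -Filter "*.doc")

# Pipeline filtering: every item is retrieved first, then tested one by one
$piped = @(Get-ChildItem $demo | Where-Object { $_.Extension -eq ".doc" })

Write-Host "Filter:" $filtered.Count "Where-Object:" $piped.Count

# Clean up the scratch folder
Remove-Item $demo -Recurse -Force
```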

To run the script, you simply need to point it to the folder where your source document files are and provide an output folder if you wish. If no output folder is provided, the source folder is also used as the output folder. Here’s how you can run it from the folder where you saved the script:

.\convertdoc-tohtml.ps1 -docpath "C:\Documents" -htmlpath "C:\Output"

If you want to change the script to save in a different format, refer to the WdSaveFormat enumeration members on MSDN.

Originally posted at http://blogs.technet.com/bshukla


Deceiving scopes of variables in a function

I was recently troubleshooting a script when I came across a problem where a variable with a defined scope was not retaining its value even though the scope seemed correct. Let’s look at a simplified example below:

Function Global:Name-ofaFunction
{
$Global:VariableinQuestion = $null
$VariableinQuestion = "Value"
$VariableinQuestion
}

Name-ofaFunction
$VariableinQuestion

The function “Name-ofaFunction”, when called, creates a variable with global scope and sets its value to null. Next, it sets the value of a variable and prints the current value to the host. As part of the script, I am also calling the variable after running the function. This helps me verify the value.

When you run this script, however, you will notice that it prints the current value (“Value”) of the variable only once.

If you debug the code, you will notice that the variable is null once the function exits! Why would that happen when the variable scope was defined as global in line 3?

So let’s try this code.

Function Global:Name-ofaFunction
{
#$Global:VariableinQuestion = $null
$global:VariableinQuestion = "Value"
$VariableinQuestion
}

Name-ofaFunction
$VariableinQuestion

If you notice the difference, I have commented out line 3. Also, in line 4, I have added global scope to the variable. If you run this code, you will get the value of the variable printed to the host twice: once inside the function and a second time after the function!

How did that happen? The code doesn’t look much different. The only difference is where I define global scope. Why should that matter? Let me explain.

This turns out to be a feature rather than a bug. PowerShell is not resetting anything in line 4 of the first example. An assignment without a scope modifier always writes to the current scope, so that line creates a new variable that is local to the function and hides the global one while the function runs. The global variable created in line 3 is never touched again and still holds null, and the local copy disappears when the function exits. The second example fixes this by adding the global modifier right where I am setting the value.
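One way to see this for yourself is to ask PowerShell for the variable in each scope while still inside the function. The sketch below uses hypothetical names (Test-Scope, $Demo) and Get-Variable with the -Scope parameter; it is only meant to illustrate the behavior described above, not to replace the examples.

```powershell
function Test-Scope
{
    $Global:Demo = $null   # creates (or resets) the global variable
    $Demo = "Value"        # no scope modifier: creates a separate local variable

    # Read each copy explicitly; scope 0 is the current (function) scope
    $inFunction = (Get-Variable Demo -Scope 0).Value
    $inGlobal   = (Get-Variable Demo -Scope Global).Value
    "local=[$inFunction] global=[$inGlobal]"
}

$result = Test-Scope
$result   # local=[Value] global=[]
```

The two reads return different values because there really are two variables with the same name, one in each scope.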

As you can imagine, there are many uses for setting the correct scope of a variable. For example, you may want to use the value returned by one function in another function. When you set the scope properly, the values are retained in the appropriate scopes, where they can be used by other pieces of code such as functions or cmdlets run on the host.
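As a small sketch of that idea, the two functions below share a value through a global variable. The function and variable names (Set-LastResult, Get-DoubledResult, $LastResult) are invented for this illustration.

```powershell
function Set-LastResult
{
    # The scope modifier is on the assignment itself, so the value outlives the function
    $Global:LastResult = 21
}

function Get-DoubledResult
{
    # Reads the global value that the other function stored
    $Global:LastResult * 2
}

Set-LastResult
$doubled = Get-DoubledResult
$doubled   # 42
```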

For more information on scope, you can read my previous blog post “PowerShell Variables and Scopes” and TechNet article “about_Scopes”.

Originally posted at http://blogs.technet.com/bshukla
