Problems reading filled in PDF forms

Ludvig F. Aarstad
Greetings all :). I have been using iTextSharp very successfully to read PDFs and extract various parts of the text. I am using it from PowerShell, and it's working great.

I've come across an issue, though, and I have been searching for a solution. When I read a filled-in PDF form using the same approach as before, I get only the text that is present in the blank form, not the text that has been filled in. The PDF was filled in using Adobe Acrobat Reader DC.

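The method I am using to read the content is this:
# Assumes itextsharp.dll is already loaded in the session (e.g. via Add-Type -Path)
$Reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList "filename.pdf"
for ($page = 1; $page -le $Reader.NumberOfPages; $page++) {
    # GetPageContent returns the page's raw content stream as bytes;
    # convert it to a string and split it into lines
    $lines = [char[]]$Reader.GetPageContent($page) -join "" -split "`n"
    foreach ($line in $lines) {
        # do something with the text using regex etc...
    }
}
$Reader.Close()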

When doing this, I get a lot of text, just not the text that has been filled in...

Any ideas?