PDFUtils Issue: Misreading Delimiters

Name: Misreading Delimiters
ID: 3
Project: PDFUtils
Type: Bug
Area: Code
Severity: Normal
Status: Open
Related URL: http://www.thinkcrew.com/scheduleconverter/sample.zip
Creator: Michael
Created: 05/09/12 2:32 PM
Updated: 05/16/12 6:01 AM
Description: Hi Ray,
This is a fantastic CFC and potentially could really help me solve a problem! However, I'm having some issues with getText incorrectly parsing delimiters from a PDF file.

Some background: I need to export data from proprietary software that doesn't have an export feature. I'm trying to get the data out by creating PDFs of the data, using a "|" character as a delimiter. Because of the large number of columns, I have to split the data across 5 separate PDFs. The ultimate goal is to create one tab-delimited text file with all of the data for future importing into Excel.
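
For illustration only, the merge step described above might look roughly like the following Python sketch. It is not part of PDFUtils or of the code in the linked zip. It assumes the text of each PDF has already been extracted to a string with one record per line, that row N in every PDF belongs to the same record, and that "|" never appears inside a cell. The names split_cells and merge_extractions are hypothetical.

```python
import csv
import io


def split_cells(line, delimiter="|"):
    """Split one extracted line into cells, trimming surrounding spaces."""
    return [cell.strip() for cell in line.split(delimiter)]


def merge_extractions(extractions, delimiter="|"):
    """Join row i of every extraction side by side as tab-delimited text.

    `extractions` is a list of strings, one per PDF (hypothetical input).
    Assumes row i of each PDF describes the same record.
    """
    tables = [
        [split_cells(line, delimiter) for line in text.splitlines() if line.strip()]
        for text in extractions
    ]
    row_count = max((len(table) for table in tables), default=0)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i in range(row_count):
        merged = []
        for table in tables:
            # Pad short or missing rows so later columns stay aligned.
            width = max((len(row) for row in table), default=0)
            row = table[i] if i < len(table) else []
            merged.extend(row + [""] * (width - len(row)))
        writer.writerow(merged)
    return out.getvalue()
```

A merge like this only works if every extraction keeps its delimiters between the cells, which is exactly what goes wrong in the report below.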

getText correctly parses the first PDF as:

Alpha | Beta | Gamma | Delta

But subsequent PDFs are parsed as:

Alpha Beta Gamma Delta | | |

(Bunching the data together and putting the delimiters at the end of the line).
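
As a rough diagnostic, lines showing the symptom above could be flagged with a small heuristic like this Python sketch. The function name delimiters_bunched_at_end is hypothetical, and the check is only an assumption about what a "bad" line looks like: text in the first cell and nothing but empty cells after it. A legitimate row whose later cells really are blank would be flagged too, so treat a hit as a hint rather than proof.

```python
def delimiters_bunched_at_end(line, delimiter="|"):
    """Return True if all text precedes the delimiters, e.g. "A B C | | |".

    Heuristic only: a genuine row with a single filled first cell
    looks identical and will also be flagged.
    """
    cells = [cell.strip() for cell in line.split(delimiter)]
    if len(cells) < 3:
        # Fewer than two delimiters: not enough evidence either way.
        return False
    return bool(cells[0]) and not any(cells[1:])
```

Running it over each extracted line before any merging would at least show which PDFs are affected and on which rows.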

My code and accompanying PDFs can be referenced at: http://www.thinkcrew.com/scheduleconverter/sample.zip

Any help much appreciated!!!


History: Created by modemmute (Michael) : 05/09/12 2:32 PM

Comment by cfjedimaster (Raymond Camden) : 05/16/12 5:58 AM
I'm not sure I see the issue. I downloaded your zip, dumped the extracted text for pdf1 and pdf2, and stopped there. Looking at the text, it kinda looks right to me. The 2nd pdf had a lot of 'blanks' in its format, which seemed to be represented in the text extraction. So for example, from pdf2, page 1:

Makeup / Hair : Test | | | | ^ | | | | ^ Props : 6 Shovels , Megaphone | | | |

And when I look at the PDF, I see "Makeup" as the first item. Well, technically the second one... so maybe it's missing the empty cell before that.

I don't know. Have you tried getting the text via cfpdf in CF9? PDFUtils was built for CF8, where some things were missing in cfpdf. It still does things cfpdf does not, but text extraction is built in now.

Comment by cfjedimaster (Raymond Camden) : 05/16/12 6:01 AM
I just tested in CF10 using cfpdf. It too seems to miss the very first cell, but outside of that it seems fine, just like PDFUtils. (I'm pretty sure they use the same technique my PDFUtils does.)
