Tinderbox User-to-User Forum (for formal tech support please email: info@eastgate.com)
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi
Tinderbox Users >> Moving to Tinderbox 6 >> pdf import of large documents
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi?num=1402231743

Message started by Paul A. on Jun 8th, 2014, 8:49am

Title: pdf import of large documents
Post by Paul A. on Jun 8th, 2014, 8:49am

Following on the "butterfly" post, I was happy to plop a pdf into the left side of Tinderbox and get it "plaintexted" as a note on the right side.

Unfortunately the pdf is truncated at what amounts to the 3rd page of 12. Is there a hard-coded, soft-coded, or settings-based limit on the size of pdfs?


Title: Re: pdf import of large documents
Post by Mark Bernstein on Jun 8th, 2014, 10:09am

No limit. But it's possible that some quirk of the pdf makes the pdf interpreter think the document is complete when it has merely hit a section break or something of that nature.  

Email the pdf in question and we'll take a gander.

Title: Re: pdf import of large documents
Post by Paul A. on Jun 8th, 2014, 10:29pm

It's the pdf linked to in this page:
http://www.bain.com/publications/articles/winning-operating-models.aspx

under (download pdf).

It stops right where Fig1 is inserted in the text

Thanks for looking.

Title: Re: pdf import of large documents
Post by Mark Bernstein on Jun 9th, 2014, 10:50am

When I drop the pdf into the Tinderbox map of a new empty document, Tinderbox does extract the plain text of all eight sections, starting with the title "Winning operating models" and ending with "For more information, visit www.bain.com".  4807 words in all, according to $WordCount. Select All:Copy:Paste to BBEdit gives me 4808.

[edited to remove unexpected smiley insert]
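
The word counts quoted above give a quick way to test whether an import was cut short. Here is a minimal sketch in Python, assuming the note text has been copied out as a plain string. The names `word_count` and `looks_truncated` are hypothetical, and splitting on whitespace is only an assumption about how words are counted; it is not how $WordCount or BBEdit necessarily work, which may be why two tools can differ by a word.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a 'word')."""
    return len(text.split())


def looks_truncated(text: str, expected_words: int, tolerance: float = 0.05) -> bool:
    """Return True if the text falls short of the expected word count.

    `tolerance` allows for small differences between counting tools,
    such as the one-word gap between two editors.
    """
    return word_count(text) < expected_words * (1 - tolerance)
```

With a full extraction expected to be around 4807 words, a note of roughly 900 words would be flagged, while a count that is off by one would not.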

Title: Re: pdf import of large documents
Post by Paul Atlan on Jun 10th, 2014, 12:49am

Hmm, just tried again.
I get 904 words, same as last time.
Here's what I do:
* From the website, open the file as a pdf, open it in Preview, and then save it to a local folder.
* Open a new Tinderbox document.
* Drag and drop the pdf from Finder to the left TB window in map view.
* TB creates a note called "BAIN_BRIEF_Winning_operating_models.pdf" from the file title.
* The note contains 902 words, and stops roughly at the Figure 1 spot.

Here's a screenshot:

http://cl.ly/image/0D1T2q1x3E3F

I'm not sure what I'm doing wrong.

Title: Re: pdf import of large documents
Post by Mark Anderson on Jun 10th, 2014, 6:27am

Here, the left-pane import works correctly in v6.0.0 on a MBPro 2011 with OS 10.8.5 and a MBAir with OS 10.9.3. In both cases I have UK settings for my OS locale. I can't reproduce the error, but what this does indicate is that the issue may be due to the local ecosystem of your Mac rather than a generic TB6 issue; otherwise others ought to see it too. It doesn't mean you don't have the problem, just that it's harder to diagnose.  :)

Title: Re: pdf import of large documents
Post by tadmcnulty on Jun 10th, 2014, 7:08am

I just dropped a large pdf file onto the map view of a TBX6 file: a note with the same title as the pdf was created and all 8863 words of the pdf were successfully extracted into the note’s text.

Neat.

Just to make sure I hadn’t missed a great trick, I tried dropping a pdf file into TBX5, but it bounced right out, like a rubber biscuit. Same for a rich text file. Only a text file stuck to the map and made a text-filled note.

Then, just to check, I dropped a rich text file onto a TBX6 map: it safely landed as a text-filled note.  

(To my surprise, unlike those in pdf files, images in rich text files made the crossing together with their textual brothers.)

Drag and drop text import of pdf and rich text files - thanks Mark B

Title: Re: pdf import of large documents
Post by tadmcnulty on Jun 10th, 2014, 9:26am

Actually, you can import images in pdf files too, if you make a note first, in either outline or map view, and then drag and drop the pdf file into the text pane of the note - as already noted in the “butterflies” thread. Sorry, I should have read the thread.

Title: Re: pdf import of large documents
Post by Paul Atlan on Jun 12th, 2014, 9:55am

Ok, I figured out what happens.
My initial attempt was as follows:
* open pdf link from website, it opens a window in safari.
* right click in safari window, and choose "Open in Preview.app"
* from preview.app, choose "save as".

I then get a pdf file (in my case 2.4 MB file)

This file is truncated by TB6

If I do the following:
* right click pdf link on website, and choose "Download file"

I get a pdf file that's bigger than the first one (2.6 MB in this case)

This file is NOT truncated by TB6.

So it seems preview.app introduces some quirks in the pdf source file that make TB6 think the document is complete before it actually is.
This is confirmed by the following: if I take the "good" pdf and run it through preview.app, then resave, the size changes, and it no longer works properly when dropped in TB6.

Software like PdfPen Pro, on the other hand, does not mangle the pdf file ...

Not sure I'd qualify this as a bug, but clearly something to be aware of
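
The comparison of the two files described above can be sketched in Python. This is only a rough diagnostic, not anything Tinderbox does: the function names are hypothetical, and counting `%%EOF` markers is an assumption about what might be worth checking (a PDF that has been saved incrementally can contain more than one), not a confirmed cause of the truncation.

```python
import hashlib
from pathlib import Path


def summarize_pdf_bytes(data: bytes) -> dict:
    """Collect a few byte-level facts about a PDF's raw contents."""
    return {
        "size": len(data),                           # file size in bytes
        "sha256": hashlib.sha256(data).hexdigest(),  # content fingerprint
        "eof_markers": data.count(b"%%EOF"),         # end-of-file markers found
    }


def summarize_pdf_file(path: str) -> dict:
    """Read a file from disk and summarize it."""
    return summarize_pdf_bytes(Path(path).read_bytes())


def differs(a: dict, b: dict) -> bool:
    """Return True if two summaries describe different file contents."""
    return a["sha256"] != b["sha256"]
```

Running `summarize_pdf_file` on the downloaded pdf and on the copy re-saved from Preview would show whether the two differ in size, content, or number of end-of-file markers, which might help when reporting the problem.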
