Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
pdf import of large documents (Read 4239 times)
Paul Atlan
Full Member
Posts: 45
Abu Dhabi
pdf import of large documents
Jun 08th, 2014, 8:49am
 
Following on from the "butterfly" post, I was happy to drop a PDF into the left side of Tinderbox and get it "plain-texted" as a note on the right side.

Unfortunately, the PDF is truncated at what amounts to page 3 of 12. Is there a hard-coded, soft-coded, or settings-based limit on the size of PDFs?

Mark Bernstein
YaBB Administrator
designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: pdf import of large documents
Reply #1 - Jun 8th, 2014, 10:09am
 
No limit. But it's possible that some quirk of the PDF makes the PDF interpreter think the document is complete when it has merely hit a section break or something similar.

Email the pdf in question and we'll take a gander.
Paul Atlan
Full Member
Posts: 45
Abu Dhabi
Re: pdf import of large documents
Reply #2 - Jun 8th, 2014, 10:29pm
 
It's the PDF linked from this page, under the "download pdf" link:
http://www.bain.com/publications/articles/winning-operating-models.aspx

It stops right where Figure 1 is inserted in the text.

Thanks for looking.
Mark Bernstein
YaBB Administrator
designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: pdf import of large documents
Reply #3 - Jun 9th, 2014, 10:50am
 
When I drop the pdf into the Tinderbox map of a new empty document, Tinderbox does extract the plain text of all eight sections, starting with the title "Winning operating models" and ending with "For more information, visit www.bain.com".  4807 words in all, according to $WordCount. Select All:Copy:Paste to BBEdit gives me 4808.

[edited to remove unexpected smiley insert]
« Last Edit: Jun 9th, 2014, 11:51am by Mark Anderson »  
Paul Atlan
Full Member
Posts: 45
Abu Dhabi
Re: pdf import of large documents
Reply #4 - Jun 10th, 2014, 12:49am
 
Hmm, I just tried again.
I get 904 words, same as last time.
Here's what I do:
* From the website, open the file as a PDF, open it in Preview, and then save it to a local folder.
* Open a new Tinderbox document.
* Drag and drop the PDF from the Finder onto the left TB pane in map view.
* TB creates a note called "BAIN_BRIEF_Winning_operating_models.pdf" from the file name.
* The note contains 902 words and stops roughly at the Figure 1 spot.

Here's a screenshot:

http://cl.ly/image/0D1T2q1x3E3F

I'm not sure what I'm doing wrong.
Mark Anderson
YaBB Administrator
User - not staff!

Posts: 5689
Southsea, UK
Re: pdf import of large documents
Reply #5 - Jun 10th, 2014, 6:27am
 
Here, the left-pane import works correctly in v6.0.0 on a 2011 MacBook Pro with OS X 10.8.5 and on a MacBook Air with OS X 10.9.3. In both cases my OS locale uses UK settings. I can't reproduce the error, which suggests the issue may lie in the local setup of your Mac rather than being a general TB6 problem; otherwise others ought to see it too. That doesn't mean you don't have the problem, just that it's harder to diagnose. :)
--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
tadmcnulty
Full Member
Posts: 13

Re: pdf import of large documents
Reply #6 - Jun 10th, 2014, 7:08am
 
I just dropped a large pdf file onto the map view of a TBX6 file: a note with the same title as the pdf was created and all 8863 words of the pdf were successfully extracted into the note’s text.

Neat.

Just to make sure I hadn’t missed a great trick, I tried dropping a pdf file into TBX5, but it bounced right out, like a rubber biscuit. Same for a rich text file. Only a text file stuck to the map and made a text-filled note.

Then, just to check, I dropped a rich text file onto a TBX6 map: it safely landed as a text-filled note.  

(To my surprise, unlike those in pdf files, images in rich text files made the crossing together with their textual brothers.)

Drag-and-drop text import of PDF and rich text files: thanks, Mark B!
tadmcnulty
Full Member
Posts: 13

Re: pdf import of large documents
Reply #7 - Jun 10th, 2014, 9:26am
 
Actually, you can import the images in PDF files too, if you make a note first, in either outline or map view, and then drag and drop the PDF file into the text pane of the note, as already noted in the "butterflies" thread. Sorry, I should have read the thread.
Paul Atlan
Full Member
Posts: 45
Abu Dhabi
Re: pdf import of large documents
Reply #8 - Jun 12th, 2014, 9:55am
 
OK, I figured out what happens.
My initial attempt was as follows:
* Open the PDF link from the website; it opens in a Safari window.
* Right-click in the Safari window and choose "Open in Preview.app".
* In Preview.app, choose "Save As".

I then get a PDF file (in my case a 2.4 MB file).

This file is truncated by TB6.

If I do the following:
* Right-click the PDF link on the website and choose "Download file".

I get a PDF file that's bigger than the first one (2.6 MB in this case).

This file is NOT truncated by TB6.

So it seems Preview.app introduces some quirks into the PDF source file that make TB6 think the document is complete.
This is confirmed by the following: if I take the "good" PDF, run it through Preview.app, and resave it, the size changes and it no longer works properly when dropped into TB6.

Software like PDFpen Pro, on the other hand, does not mangle the PDF file.

Not sure I'd qualify this as a bug, but it's clearly something to be aware of.