Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Agent/Script for annotating by paragraph (Read 24002 times)
Peter100
Full Member
*
Offline



Posts: 10

Agent/Script for annotating by paragraph
Aug 27th, 2012, 12:12pm
 
I am looking for a way to automate annotations/notes at the paragraph level. I am new to Tinderbox. I have no programming / scripting skills. I am a PhD student trying to finish my thesis.

Scenario: I have ca. 4000 docs/PDFs that I've collected over the past three years. Some of these contain the keys to my thesis. They vary in quality and depth. The ideas expressed in them do not always cohere or flow, but there might be some good bits at paragraph level. I could of course search/tag everything, but I would still need to hunt through each one separately to find the good bits.

I am inspired by Tom Webster's video on how he uses Tinderbox for qualitative research http://brandsavant.com/processing-qualitative-research-data-with-tinderbox/ - especially how he takes an interview transcription and "explodes" it at the sentence level and then codes/tags these sentences.

Elsewhere on this forum I was advised that I shouldn't use the "explode" feature for other kinds of documents, especially at the sentence level, and I concur. However, the possibilities of this kind of automation still haunt me. I wonder if it is advisable to "explode" a doc/OCR'd PDF at the paragraph level instead, and use the first 50 (or so) characters of the first sentence as the note title, but create a link back to the original doc so the bits can also be viewed in context? Of course I would not want to perform this on all the docs/PDFs, just a big handful.

Does this make any sense? I'm grateful for any and all input from those more skilled than I!

(Note: I have left a similar post on the DevonThink forum.)
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #1 - Aug 27th, 2012, 12:57pm
 
Explode at paragraph breaks is the default setting for the Explode feature (also see the Explode dialog). So far so good.

Next, you only want the first 50 or so characters as the title. In which case I'd choose to use either one or two sentences as your title and let the $Text be the whole paragraph.

Lastly, you want the note to link to source. Do you mean the source TB note - i.e. the one being exploded - or the document from which the text comes? If the latter, do you already have links to these?

In short, the rough process is:
  • Select the note to Explode
  • Note menu -> Explode.
    • Before you ask, there is no automated way to invoke Explode via either a shortcut or action code.
  • On the Explode dialog, set the desired choices (if not already the defaults).
  • Click 'Explode' button.
  • Use TB action code to set the back-links to source (insufficient info as yet to give a more detailed answer).
So, apart from not being able to automate the 4k+ separate explode actions, all this sounds do-able. More info is needed (questions above) re the last linking phase.
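
For anyone who wants to see what a paragraph-level split with short titles amounts to, here is a minimal Python sketch. It is not how Tinderbox implements Explode; the function names `make_title` and `explode_paragraphs`, the blank-line paragraph delimiter, the sentence-ending rule and the 50-character limit are all assumptions made for illustration.

```python
import re


def make_title(paragraph, limit=50):
    """Return the first sentence of a paragraph, cut to at most `limit` characters."""
    # Assumed rule: a sentence ends at the first '.', '!' or '?' followed by whitespace.
    first_sentence = re.split(r"(?<=[.!?])\s+", paragraph.strip(), maxsplit=1)[0]
    return first_sentence[:limit].rstrip()


def explode_paragraphs(text, source_name):
    """Split text at blank lines and return one note-like dict per paragraph."""
    notes = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        notes.append({
            "title": make_title(paragraph),
            "text": paragraph,                # the whole paragraph, like $Text
            "source": source_name,            # stands in for the link back to the exploded note
            "paragraph_number": number,
        })
    return notes
```

Each resulting dict keeps the full paragraph as its text and carries the name of the note it came from, which is the same information the back-link is meant to preserve.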
--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Peter100
Full Member
*
Offline



Posts: 10

Re: Agent/Script for annotating by paragraph
Reply #2 - Aug 27th, 2012, 2:43pm
 
Thanks for the quick reply.

Quote:
Lastly, you want the note to link to source. Do you mean the source TB note - i.e. the one being exploded - or the document from which the text comes?  If the latter do you already have links to these?


I'm still not clear about the import/export process and how it will merge with my workflow. After some initial searching and collating I suspect I'll be importing from DevonThink. It could be nice to have both linking options: a link to the DevonThink doc and a link to the primary TB "note" (i.e. the explode one) but if only one option is possible then please let me know how to set it up (TB action code?). Does this clarify?

I have another q about working with PDF refs stored in a citation manager like Sente. Do people generally just drag and drop these, or is there a more sophisticated way of targeting specific passages within the PDF, short of copying all the OCRed text and working with it as a TB note?
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Agent/Script for annotating by paragraph
Reply #3 - Aug 27th, 2012, 3:29pm
 
Quote:
link to the primary TB "note" (i.e. the explode one)


You may find some ideas on how this is quite easily done in this thread. Jean Goodwin's cascade of one-off OnAdd actions is probably simpler than the agent/self-canceling rule approach, though Mark A may have more recent thoughts (thread is a bit old but think it still applies.) I found that setting up the linking was much easier than it sounds.
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #4 - Aug 28th, 2012, 10:06am
 
DT, when a single item is selected, allows you (Edit menu or right click) to export a DT link, which will look like:

x-devonthink-item://F2CA8FC3-FD65-43BE-85F7-3572CE530893

If you add such to a URL attribute in Tinderbox then clicking TB's open link button for it will open the item from within DT (the degree of preview depending on the source doc's format - e.g. PDF, TXT, DOC, etc.). Note that these links only work on the Mac with DT installed and the relevant DT database present.

If you have DTPro (and thus access to AppleScript) you should be able to export your 4000+ filenames and their DT links to the clipboard such that pasting to TB gives you 4000 notes, each named for the document and holding its DT local link. You'd think such functionality - exporting a tab-delimited list of data - would be built in, but DT seems to be a roach motel for data: data checks in but has no way to leave. That said, I'm not a deep DT user - perhaps one such can step forward and correct me on this.

Not tested, but I assume you can also - with DTPro or higher - export all your source docs' plain text (or the bits of them you want) to TB.

Let's now jump forward. You have 4,000 TB notes, each with some text data and with $URL set to the DT local link. You can explode each note in turn, but as at TB v5.x this is a manual process. This scenario is a good reason why. Let's assume each document has 30 paragraphs. With all exploded, you'll have c.128,000 notes (30 × 4,000 paragraph notes + 4,000 existing notes + 4,000 "Exploded Text" containers, one per explode). TB's OK with that, though it won't want to try and show all of those in a single view. Just today, I've made a 440k+ TBX and on my fast 2011 MBPro it runs fine, except there's way too much data for intensive agent use. So, I don't think you want to assume you can dump every paragraph from 4k+ articles and start analysing it.
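
To make the arithmetic explicit, here is a small Python sketch of the estimate. The function name `notes_after_explode` is made up for illustration; the figures are the ones assumed in this post (4,000 source notes, 30 paragraphs each, one "Exploded Text" container per explode).

```python
def notes_after_explode(documents, paragraphs_per_document):
    """Estimate the total note count after exploding every source note."""
    paragraph_notes = documents * paragraphs_per_document
    containers = documents  # one "Exploded Text" container per explode
    return paragraph_notes + documents + containers


print(notes_after_explode(4000, 30))  # 128000
```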

You'll need to chunk the data. I'd explode one or a few documents at a time, throw away the obvious rubbish and save the good bits to a single core TBX.

~~~~~~~~~
Separate issue, linking post install.  Assuming the exploded notes are still in their Exploded Text containers, then their grandparent note's $URL will hold the DT source link and the grandparent note $Text will be the immediate text source.

So, make an agent:

Query:   inside("Exploded Text")
Action:   $URL=$URL(grandparent(original)); linkTo(grandparent(original))

The result? Any explode result note will have a TB link to the note from which it was created and will have that note's same DT back-link.

Does that help...?
David Bertenshaw
Full Member
*
Offline



Posts: 182

Re: Agent/Script for annotating by paragraph
Reply #5 - Aug 28th, 2012, 12:34pm
 
I have managed to get information semi-automatically out of DTP and into Tinderbox, including DTP tags, DTP-links and the text, but it does involve a bit of hacking. It was some time ago now, so the details are a bit hazy, but it went something like this:

  • Hack the script Listing (which is included in DT) to include the text of each selected document, its DTP-link and its tags. (By default it only provides a list of the titles of all documents in the database.)
  • Include distinctive markers at the beginning of each DTP document, and round each DTP-link and its tags, so TBX can work on them later.
  • Run the script on the documents you want to export, and save the resulting text file, which looked something like this:

    Code:
    @@@ <A>First Document name </A>
    <tg>Tag1;Tag2;Tag3</tg>
    <ln>/x-devonthink-item://EF28548A-C596-461D-BA19-D37A80F077C5</ln>
    This is the text of the first document
    
    @@@ <A>Second Document name </A>
    <tg>Tag1;Tag2;Tag3</tg>
    <ln>/x-devonthink-item://etc</ln>
    This is the text of the second document 
    
    
  • Import this document into TBX. Explode it using @@@ as the delimiter.
  • Run agents on the exploded documents to strip out the tags and the dtp-link and put them into attributes within each new note. E.g. in the Agent Query field, use
    Code:
    Text((<tg>)(.+)(</tg>)) 
    
    


    And in the Agent Action field use
    Code:
    Tags = $2 
    
    

    (This procedure is described in the TBX menu.)
  • Eventually you end up with a single note per DTP document, including the content in the note body and the tags and link added automatically into the relevant attributes.


As I said it's a bit clunky, and it only produces plain text, but it seemed to work OK. Unfortunately, I've lost the script which I used to do it, otherwise I'd post it for real Applescript coders to laugh at.
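
Since the original script is lost, here is a rough Python sketch of the parsing half only, outside Tinderbox, to show how a file in the format above breaks down into one record per document. It assumes the `@@@`, `<A>`, `<tg>` and `<ln>` markers appear exactly as in the sample and that `@@@` never occurs inside a document's own text; the names `RECORD_PATTERN` and `parse_export` are made up for illustration.

```python
import re

# One record: name, tags and link markers, then everything else as body text.
RECORD_PATTERN = re.compile(
    r"<A>(?P<name>.*?)</A>\s*"
    r"<tg>(?P<tags>.*?)</tg>\s*"
    r"<ln>(?P<link>.*?)</ln>\s*"
    r"(?P<text>.*)",
    re.DOTALL,
)


def parse_export(export_text):
    """Split a marker-delimited export into one dict per document."""
    records = []
    for chunk in export_text.split("@@@"):
        match = RECORD_PATTERN.search(chunk)
        if match is None:
            continue  # skip empty chunks or anything before the first marker
        records.append({
            "name": match.group("name").strip(),
            "tags": [tag for tag in match.group("tags").split(";") if tag],
            "link": match.group("link").strip(),
            "text": match.group("text").strip(),
        })
    return records
```

The link is kept exactly as it appears between the `<ln>` markers, so any clean-up of it would be a separate step.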
« Last Edit: Aug 28th, 2012, 12:38pm by David Bertenshaw »
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Agent/Script for annotating by paragraph
Reply #6 - Aug 28th, 2012, 1:29pm
 
Quote:
DT seems to be a roach motel for data: data checks in but has no way to leave

Don't know about DT, but DTPro is more like The Eagles' Hotel California than a roach motel. "You can check out any time you like but you can never leave." Lots of ways to check your data out while still keeping your room. One easy way (no AppleScripting needed) to get a list of titles (and text) into TB is to select the items you want in DTPro, choose 'File/Export/as Outliner Processor Markup Language' and save. Open the exported OPML file in TB. That's it.

The URLs brought into the individual TB notes (which will automatically display URL in Key Attributes) are the external URLs of the original sources, not the DT local link. If you have notes you've written yourself in DTPro or items in DTPro for which you haven't captured an external URL (there usually aren't many of these, as DTPro is very good at bringing in URLs automatically when you save things there from the web) you can first populate the URL field in DTPro by selecting each item, choosing 'Edit/Copy Item Link' and pasting that into the URL field in DTPro.  That way the DT local link will then be brought into TB.  You can then just click it in TB to open up the item in DTPro.

I don't speak AppleScript but there's no doubt a way to export the local DT link in the OPML file if the above involves too much manual populating of the URL field in DTPro.
« Last Edit: Aug 28th, 2012, 1:42pm by Sumner Gerard »
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #7 - Aug 28th, 2012, 6:31pm
 
Yeah. Didn't mean to be harsh about DT - amazing app.  I had some success with this:

Code:
-- Build a tab-delimited table (Name, URL) of the items selected in DEVONthink Pro
set dataString to "Name\tURL\n"

tell application "DEVONthink Pro"
	set itemList to selection of front window
	tell front window
		repeat with anItem in itemList
			-- One row per item: its name, a tab, then its x-devonthink-item:// link
			set itemName to name of anItem
			set itemLink to reference URL of anItem
			set dataString to dataString & itemName & "\t" & itemLink & "\n"
		end repeat
	end tell
end tell

-- Put the table on the clipboard, ready to paste into Tinderbox
tell application "Finder"
	set the clipboard to dataString as Unicode text
end tell



The DT forum improved it (I've not tested this):

Code:
set dataString to "Name\tURL\n"

tell application "DEVONthink Pro"
   set itemList to selection of front window
   tell front window
	repeat with anItem in itemList
	   set itemName to ""
	   set itemLink to ""
	   set itemName to name of anItem
	   set itemLink to reference URL of anItem
	   set dataString to dataString & itemName & "\t" & itemLink & "\n"
	  
	end repeat
   end tell
  
end tell
tell application "Finder"
   set the clipboard to dataString as Unicode text
end tell 
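
For checking the clipboard data before pasting it anywhere, a short Python sketch of reading the tab-delimited "Name / URL" table may help. This is only an illustration of the data shape the script produces; the function name `parse_name_url_table` is made up, and it assumes names and links contain no tabs or line breaks.

```python
import csv
import io


def parse_name_url_table(clipboard_text):
    """Parse tab-delimited text with a 'Name<TAB>URL' header row into (name, url) pairs."""
    reader = csv.DictReader(
        io.StringIO(clipboard_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in names as ordinary text
    )
    return [(row["Name"], row["URL"]) for row in reader]
```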


« Last Edit: Aug 28th, 2012, 6:31pm by Mark Anderson »
Peter100
Full Member
*
Offline



Posts: 10

Re: Agent/Script for annotating by paragraph
Reply #8 - Aug 29th, 2012, 12:33am
 
Thanks to all who are jumping in here, but I must admit you've dusted me. I think I'm still back at the hotel in California.

Could someone please recap? I pretty much got lost after...

Quote:
Select the note to Explode
Note menu -> Explode.
Before you ask, there is no automated way to invoke Explode via either a shortcut or action code.
On the Explode dialog, set the desired choices (if not already the defaults).
Click 'Explode' button.
Use TB action code to set the back-links to source (insufficient info as yet to give a more detailed answer).


I might as well throw in my own curve ball: I suppose the other alternative is to do the breaking up (exploding) of the documents/pdfs in DT and then import the best bits to TB. I believe this is possible. I received some feedback here: http://forum.devontechnologies.com/viewtopic.php?f=20&t=15865&p=73527#p73514.... This could make use of DT's annotation template to see Humpty Dumpty in one piece - at least in DT - but I'm a newbie with that one too. Perhaps those of you using both apps have the solution?

Cheers!
« Last Edit: Aug 29th, 2012, 12:35am by Peter100 »
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #9 - Aug 29th, 2012, 7:06am
 
Are you saying you don't understand how Explode works? What happened when you tried? Meanwhile I've made a short tutorial on using Explode - see this thread.

A missing part of the analysis is whether some aspects are practical. In the DT forum you noted this:

Quote:
Let me give an example: Say I have a PDF with 300 paragraphs. 30 of these are mildly interesting, 20 very interesting and 10 are outstanding. The rest I don't think are relevant.


As you go on to point out, there's a lot of potential wastage, in terms of making unneeded extra assets. Exploding 4000 300-paragraph notes would generate c.1.2 million notes for you to review. Given the above quote, you don't even want most of those. An en masse import/split process will be wasteful and an overload. Therefore I'd consider trialling, using some of your content you know well, either or both of these two methods to compare how well they fit your needs:
  • Create the desired paragraphs in DT and export them with DT back-links**. The TB end is to generate a set of notes that are text paragraphs linked back to source in DT.
  • Use custom OPML export to export whole source doc texts with their DT back-link. You would then explode these manually and, as soon as possible, delete any obviously unwanted paragraphs. An agent can link these notes to their TB source note and copy the latter's DT back-link to the paragraph notes. If you save the $SiblingOrder of the exploded paragraphs to a custom attribute before weeding, you'll have the source paragraph number in the note.
Once you find a method that's a good fit we can help look at doing things in more volume.
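
The point about saving the paragraph position before weeding can be sketched in a few lines of Python. This only illustrates the order of operations (number first, delete second); it is not Tinderbox action code, and the names `number_then_weed` and `keep` are hypothetical.

```python
def number_then_weed(paragraphs, keep):
    """Record each paragraph's original position, then drop the unwanted ones."""
    numbered = [
        {"paragraph_number": position, "text": text}
        for position, text in enumerate(paragraphs, start=1)
    ]
    # Weeding happens only after numbering, so survivors keep their source position.
    return [note for note in numbered if keep(note["text"])]


paragraphs = ["intro", "key finding", "aside", "second key finding"]
kept = number_then_weed(paragraphs, lambda text: "key" in text)
print([note["paragraph_number"] for note in kept])  # [2, 4]
```

If the numbering were done after deleting, the two survivors would be numbered 1 and 2 and the link to their place in the source text would be lost.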

** A cool feature of these I just discovered is that for PDFs you can even add a page number parameter, so if the reference is on page #14 of the target doc, the link will open it in DT scrolled to the right page. Nice touch, though note that the user needs to add this extra parameter to the default link.

Mark Bernstein
YaBB Administrator
*
Offline

designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: Agent/Script for annotating by paragraph
Reply #10 - Aug 29th, 2012, 9:38am
 
Stepping back from the mechanics, let's think a bit about how we want to use these notes, and what that suggests for how we want to divide the texts and manipulate them.  Let's take two examples from two fields.

Suppose we are studying medical care in the Tudor era by exploring the account books of William Cecil/Lord Burghley during the months of his last illness. We have a list of expenditures with some annotation: so much to an apothecary, so much to a grocer, so much to an upholsterer. At the outset, it's mostly a jumble. But every line once made sense: everything that was bought was bought for a reason. So, we want to keep everything, and maintain sequence and metadata for everything. But we also want to break things down by individual transaction, explore repeated transactions with the same vendors, or for the same things, or for things that turn out to be related. Explode is our friend here, and we're bound to use maps (for informal clustering) and agents (for more formal groups) as our analysis proceeds.

Alternatively, suppose we have been reading everything we can find on the policy intentions of Nero, starting with Gibbon and proceeding through to the most recent studies. Our interest is not so much in history -- what happened -- as in historiography -- the ways in which "Nero" has been used by political and intellectual movements in the recent past.  Here, we've got thousands of pages of reading.  But much of it is not very much to the point. Our need is not to marshall all the available evidence; rather, we need insight and we need telling examples, chosen from a great array of evidence.  Here, we don't really need or want to Explode; instead, we're probably better off copying specific passages that seem useful and adding commentary or metadata.

Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #11 - Aug 29th, 2012, 10:50am
 
Building on Mark B's comments: recalling your 4000 items as being a mix of your writing and research, it strikes me you're more likely to want to do a paragraph tear-down of your own work, whilst using TB's Find (and Find Next) or agents to do textual analysis of full-text research articles. Thus it is likely you'll want to consider at least 2 primary discrete collections of material coming across from DT: your writing and the research.
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Agent/Script for annotating by paragraph
Reply #12 - Aug 29th, 2012, 1:24pm
 
Back on the mechanics: for items currently in DT that need to be exploded into paragraphs and explored in TB (RTF items, without too much fancy formatting or extra line returns), I've come across a script devised by Korm, Christian Grunenberg and Charles Turner, linked to at the bottom of this post. It takes selected items from DT and exports them to an OPML file that opens in TB already exploded by paragraph, including the DT back-link and original URL (plus tags and comments, if any). To activate the links in TB, change the attribute type of the imported user attributes 'DTurl' and 'OriginalURL' to 'url'.

At the risk of overcrowding this particular room in Hotel California as deadlines loom, would love to learn more (either here or in another thread) about:

Quote:
 A cool feature of these I just discovered is that for PDFs you can even add a page number parameter so if the reference is on page #14 of the target doc, the link will open it in DT scrolled to the right page.
 
« Last Edit: Aug 29th, 2012, 1:35pm by Sumner Gerard »
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #13 - Aug 29th, 2012, 3:32pm
 
@Sumner, see this thread re DT syntax for inbound URLs.
« Last Edit: Aug 29th, 2012, 3:32pm by Mark Anderson »
Peter100
Full Member
*
Offline



Posts: 10

Re: Agent/Script for annotating by paragraph
Reply #14 - Aug 29th, 2012, 3:59pm
 
Super thread.. why stop now?

I'm learning from the sidelines ... cheering and experimenting

New theme (thread) song?

http://www.youtube.com/watch?v=KR9Hi4wjC3Y

Explode or implode
We will take care of it...