Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Agent/Script for annotating by paragraph (Read 24303 times)
Peter100
Full Member
*
Offline



Posts: 10

Re: Agent/Script for annotating by paragraph
Reply #15 - Sep 05th, 2012, 10:46am
 
Quote:
it strikes me you're more likely to want to do a paragraph tear down of your own work whilst using TB's Find (and Find Next) or Agents to do textual analysis of full-text research articles. Thus it is likely you'll want to consider at least 2 primary discrete collections of material coming across from DT: your writing and the research.

Performing an explode on my pdf collection, even at the page level, is a BAD idea. I realize that now. I don't know what I was thinking. I suppose I was feeling like a kid in a candy store. The suggestion to focus on my own texts, combined with focused keyword/tag searches in the pdf articles, makes much more sense.

So here is my next quest: automated pdf search/annotation... is there a way to get Tinderbox (or DevonThink) to automatically create highlighted annotations/notes in the pdfs? I would like to create an agent (is that the correct TB term?) to find, for example, all the paragraphs that match a given search criteria (e.g. 5-10 search terms) and then have that paragraph automatically highlighted/annotated with a note that indicates/lists the key terms. This way the same paragraph could get different notes. Perhaps something like this is possible in Scrivener or another app? I could then pull all the matching annotations together into a smart group for each search string. Is this overzealous?
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Agent/Script for annotating by paragraph
Reply #16 - Sep 5th, 2012, 11:21am
 
Quote:
I would like to create an agent (is that the correct TB term?) to find, for example, all the paragraphs that match a given search criteria (e.g. 5-10 search terms) and then have that paragraph automatically highlighted/annotated with a note that indicates/lists the key terms.

An agent can indeed find the items, though using c. 10 terms in the query, in a large corpus of notes, might be slow. Action code cannot highlight text or make text links, footnotes or new notes. The last restriction is deliberate, to avoid ill-considered actions trying to generate millions of notes (whereupon at some point TB would get overloaded).
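
The matching step itself can be prototyped outside TB before committing to a workflow. Below is a minimal Python sketch, not Tinderbox action code and not a DEVONthink script. It assumes the text has already been extracted from the PDFs as plain text with blank lines between paragraphs; the function name `match_paragraphs` and the sample text are hypothetical.

```python
import re

def match_paragraphs(text, terms):
    """Return (paragraph_number, matched_terms, paragraph) for every
    paragraph that contains at least one of the search terms."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    hits = []
    for number, paragraph in enumerate(paragraphs, start=1):
        lowered = paragraph.lower()
        # Case-insensitive substring test, one pass per term.
        found = [t for t in terms if t.lower() in lowered]
        if found:
            hits.append((number, found, paragraph))
    return hits

sample = (
    "Agents search notes.\n\n"
    "Nothing relevant here.\n\n"
    "Tagging notes with a set attribute."
)
hits = match_paragraphs(sample, ["agent", "tag", "set"])
for number, found, _ in hits:
    print(number, found)
```

Each paragraph is tested once per term, so the work grows with (number of paragraphs) x (number of terms), which is one way to see why a query of c. 10 terms over a large corpus may be slow.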

TB queries can't search on rich text features, e.g. bold or highlighted text, so bear that in mind also. TB's Find view - albeit with more restricted query potential - will underline all matching strings if the note's window is opened from the Find view list (see more).

DEVONThink's 'Pro' and higher versions [sic] have AppleScript support so might be able to highlight text (assuming PDFs aren't un-OCR-ed scans) of matching terms. You'd do better to follow that angle up in the DEVONThink support forums as I suspect you'll need to talk to those expert in the scripting side of DT.

Recalling the point upthread about generating millions of notes, I do wonder if, in your quest for a single 'does-everything' feature, you'll generate more data than you'll be inclined to review once done. If a process will create more data than needed, that is a good reason to review one's strategy, even if one then continues; at least that way there are no unpleasant surprises. Thought: perhaps this scaling issue is one possible reason that there aren't lots of previous examples of the workflow being discussed?

« Last Edit: Sep 5th, 2012, 11:56am by Mark Anderson »  

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Mark Anderson
Re: Agent/Script for annotating by paragraph
Reply #17 - Sep 5th, 2012, 11:59am
 
This post, in another thread here, re DEVONThink might help with your highlighting issue.
Peter100
Re: Agent/Script for annotating by paragraph
Reply #18 - Sep 5th, 2012, 12:08pm
 
Thanks! I'll have a look!

Quote:
that is a good reason to review one's strategy, even if one then continues; at least that way there are no unpleasant surprises. Thought: perhaps this scaling issue is one possible reason that there aren't lots of previous examples of the workflow being discussed?


Hmm. I need to visualize and understand how I can use an app like TB before I fully embrace it. It's like flying a plane. I would never just hop in and see how it goes! I would be good and ready with the simulator first. I'm curious about TB and trying to develop a mental model of what it can do for me and how I might use it with other apps like DevonThink. In other words, I need to understand the limits before I can judge the appropriate level/scale at which TB works best. This is what drives my more 'hypothetical' questions. I certainly appreciate all the generous feedback!

I am not necessarily interested in "generating millions of notes" (only money ha ha) if there is no way for TB to serve these up in a meaningful way, for example sorting out a few dozen "greatest hits" based on their relevance score. I suppose this is where DT might come in (at least with a couple of thousand). So I am reflecting on how I might turn a million, or probably a few thousand (if it's only my own work) into piles of a few dozen per search string. The suggestion of an initial filtered search seems the most obvious (hopefully scripted). If I did this over on the DT side I could then copy the notes/chunks into TB and then fine-tune their relations/outline.

I suppose this is the workflow you have been suggesting all along. ;) I'll hop over to DT now and see what I find there...
« Last Edit: Sep 5th, 2012, 12:14pm by Peter100 »  
Mark Anderson
Re: Agent/Script for annotating by paragraph
Reply #19 - Sep 5th, 2012, 12:59pm
 
Quote:
I need to visualize and understand how I can use an app like TB before I fully embrace it.

I do understand, but short of doing your PhD analysis or showing someone else's full workings, it's rather hard. Instead, we're left with hypothesised questions that are hard to answer. Whilst my earlier answer listed some things TB can't do in the context of the hypothesised workflow, the app is remarkably capable of text analysis, though I think its design premise starts from a different place. So far we're trying to develop an automated workflow that finds, and creates an annotation for, every instance of the target term. So, you'll have lots of actual note items/annotations/bits-of-data created in TB, DT, etc., that you likely don't need and that simply clog up analysis and add to review time. At the same time, you need to allow for word stemming, homonyms, misspellings, indirect references, etc., which this process will get wrong either by annotating incorrect matches or by missing correct ones.
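
Both kinds of error are easy to reproduce. Here is a small Python sketch (again outside TB, with a made-up sample sentence) of how a literal search for one term can both over-match and under-match:

```python
import re

text = "The reagent was tested. Two agents ran overnight. An agent found it."

# Naive substring search: also counts 'reagent', an incorrect match.
substring_hits = text.lower().count("agent")

# Whole-word search: rejects 'reagent' but also misses the plural 'agents'.
whole_word_hits = len(re.findall(r"\bagent\b", text, flags=re.IGNORECASE))

# Allowing an optional plural 's' recovers 'agents' without matching 'reagent'.
plural_aware_hits = len(re.findall(r"\bagents?\b", text, flags=re.IGNORECASE))

print(substring_hits, whole_word_hits, plural_aware_hits)
```

The third pattern only handles a simple plural. Misspellings, homonyms and indirect references would each need their own handling, which is the point: a fully automated pass will still need human review.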

A technique I've seen used successfully in a number of different contexts in TB is 'tagging'. Indeed, that's essentially the heart of Tom Webster's process, where this thread started. Tom was exploding data pre-review because his data suited that and was likely written (laid out) with such later use in mind. However, you don't have to Explode everything.

For distinct terms, Agents can rapidly search a whole Tinderbox of 00,000s of notes and hold a reference to each match (the aliases 'in' the agent when looking at the UI). The agent's action can then, well, do all sorts of things, including adding specific terms to an attribute. As a note's content might refer to more than one topic of interest, use a Set attribute (essentially a de-duped list that will only hold one instance of any value added). Assume you have items of interest to your study: A, B and C. You set up your notes (or use prototypes) so your set is shown in the note's Key Attributes table. Now, as you read the long-form text, you can type the 'tag' values into your set and even use auto-completion of terms. See a word/phrase/passage warranting a deliberate footnote? Select it and use one of the footnote options to make a new footnote (annotation) to the TB note. The footnote is linked by a defined type of link which can be queried by agents too.

Agents don't have to be permanent. If you don't need them other than to find and review a particular set of notes, that's fine - delete the agent; it and its aliases leave but the notes it matched are intact and still retain the changes (if any) made by the agent.
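
As a rough analogy for the tagging behaviour described above, here is a Python sketch. It is not Tinderbox action code: the `notes` list, the `tags` key and the `run_agent` function are hypothetical stand-ins for notes, a Set attribute and an agent with an action.

```python
# Each dict stands in for a note; 'tags' stands in for a Set attribute.
notes = [
    {"name": "Para 1", "text": "Agents search notes.", "tags": set()},
    {"name": "Para 2", "text": "Footnotes link to notes.", "tags": set()},
]

def run_agent(notes, term, tag):
    """Add tag to every note whose text contains term.
    The returned list plays the role of the agent's aliases."""
    matched = [n for n in notes if term.lower() in n["text"].lower()]
    for note in matched:
        note["tags"].add(tag)  # a set keeps only one copy of each value
    return matched

aliases = run_agent(notes, "agent", "A")
run_agent(notes, "agent", "A")  # running again adds no duplicate "A"
run_agent(notes, "notes", "B")

del aliases  # discarding the 'agent' leaves the notes and their tags intact
print([sorted(n["tags"]) for n in notes])
```

The set semantics are what make it safe for an agent to run repeatedly, and throwing away the list of matches does not undo the tags already applied.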

If you can offer up some specimen data, it would be easy to illustrate this in more concrete terms. In a hypothetical context and with so many unfixed assumptions it's hard to give more detail. Time spent testing workflow is not, as often assumed, time wasted. Rather, it wastes less time downstream and leads to a generally better process as it forces us to see some of the edge cases to our reasoning before they strike at a less opportune moment.

Anyway, we can chip away at this - eventually you'll run out of reasons to not get started. ;)

[Later] To help you experiment, I've just added an aTbRef article on pre-populating the lists for key attribute values.
« Last Edit: Sep 5th, 2012, 1:47pm by Mark Anderson »  
