Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Gettysburg: a TB textual analysis experiment (Read 62292 times)
Mark Anderson
YaBB Administrator

User - not staff!

Posts: 5689
Southsea, UK
Re: Gettysburg: a TB textual analysis experiment
Reply #15 - Jul 10th, 2009, 8:58am
 
Paul, thanks for the demo and, more importantly, the annotations; I know the latter often take longer to do. I concur that some of these command-line calls take time and resources, so use them sparingly and only as much as required. In your note "The examples below" there is a simple explanation for the rules running only once: the use of '|='. Unlike '=', with '|=' the right side is not evaluated if the left side already has a value. So in cycle 1 (on note creation) the left side is populated, and thereafter the rule is ignored. Were you to delete the left-side attribute's value, the rule would run again, once.
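A minimal Python model of the behaviour described above, assuming a rule is simply re-evaluated once per rule cycle. The function name and the `Position` key are hypothetical, and this is not how Tinderbox works internally; it just counts how often an expensive right-hand side (a stand-in for runCommand) gets evaluated under '=' versus '|='.

```python
def count_evaluations(cycles, conditional):
    """Run one rule for `cycles` cycles; return how often the right side was evaluated.

    conditional=False models  $Position = runCommand(...)
    conditional=True  models  $Position |= runCommand(...)
    """
    note = {}          # hypothetical stand-in for a note's attributes
    evaluations = 0

    def expensive_command():
        # Stand-in for a costly runCommand/osascript call.
        nonlocal evaluations
        evaluations += 1
        return 31

    for _ in range(cycles):
        if conditional and note.get("Position"):
            continue   # '|=': left side already has a value, so skip the right side
        note["Position"] = expensive_command()
    return evaluations


print(count_evaluations(100, conditional=False))  # 100
print(count_evaluations(100, conditional=True))   # 1
```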

I've found that if you put your OSA/command-line code in a note's text, (a) you don't need the extra Tinderbox layer of escaping and (b) you can use line breaks in the code, which don't get passed out via runCommand. The latter allows more readable formatting than is possible in the actual command line, where such line breaks would make it malfunction. The code is just as editable and accessible as a macro, via $Text([name of code note]): a win-win!

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Paul Walters
Ex Member




Re: Gettysburg: a TB textual analysis experiment
Reply #16 - Jul 10th, 2009, 9:17am
 
Mark, thanks for the assistance; it would seem that |= is preferable in this context to avoid system load. I also see the utility of putting the osascript command string in a note rather than a macro. However, with a note, how does one do the variable substitution ($1, $2, ...) that a macro provides? (This is OT for this thread and maybe belongs elsewhere.)
Mark Anderson
YaBB Administrator

User - not staff!

Posts: 5689
Southsea, UK
Re: Gettysburg: a TB textual analysis experiment
Reply #17 - Jul 10th, 2009, 9:24am
 
Paul, I agree with your last point; I've moved the discussion of command-line code, macros and note text to a new thread here.
Charles Turner
Full Member



Posts: 180
New York, USA
Re: Gettysburg: a TB textual analysis experiment
Reply #18 - Jul 10th, 2009, 10:05am
 
Here's something else to take a look at:

http://www.vze26m98.net/tbx/gettysburg.png
http://www.vze26m98.net/tbx/gettysburg-time.png
http://www.vze26m98.net/tbx/gettysburg-space.png

The image is done with Omnigraffle, and you can get that file and the partner text and Applescript here:

http://www.vze26m98.net/tbx/Gettysburg.zip

So what's the big deal; just a pretty picture, correct? Well, the text was placed on the canvas word by word by an AppleScript; it's not a single block. Each code is placed in its own named layer, so the layers constitute metadata as well as enabling the selective display of information shown above.

So it would be fairly easy to write another script that would find what text is "below" each layer annotation, and produce a "report" of your visual coding process.

The one big "gotcha" I ran across: you'll notice that coding line-broken phrases is done in two sections, which doesn't preserve the connectedness of the phrase. You could fix this by exploding on phrases instead of words, or by adding line connections between words to encode the word sequence in the graphic.
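Charles's follow-up script isn't shown, so here is a rough Python sketch of the "what text is below each layer" report. It assumes the word shapes and layer annotations have already been exported with canvas coordinates; the data classes, coordinates and layer name are hypothetical and this is not OmniGraffle's scripting API. Letting a layer hold one rectangle per line segment keeps a line-broken phrase together as a single coded unit.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # hypothetical canvas coordinates of the word's centre
    y: float


@dataclass
class Layer:
    name: str
    rects: list  # one (x0, y0, x1, y1) rectangle per coded line segment

    def covers(self, word):
        return any(x0 <= word.x <= x1 and y0 <= word.y <= y1
                   for x0, y0, x1, y1 in self.rects)


def layer_report(words, layers):
    """Map each layer name to the words it covers, in reading order (top-down, left-right)."""
    report = {}
    for layer in layers:
        hits = sorted((w for w in words if layer.covers(w)), key=lambda w: (w.y, w.x))
        report[layer.name] = " ".join(w.text for w in hits)
    return report


words = [Word("Fourscore", 10, 0), Word("and", 30, 0), Word("seven", 50, 0),
         Word("years", 70, 0), Word("ago", 10, 20), Word("our", 30, 20)]
# A line-broken phrase coded as two rectangles in the same layer.
layers = [Layer("time-past", [(0, -5, 80, 5), (0, 15, 20, 25)])]
print(layer_report(words, layers))  # {'time-past': 'Fourscore and seven years ago'}
```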

Enjoy, Charles
Jean Goodwin
Full Member
*
Offline



Posts: 136
North Carolina
Re: Gettysburg: a TB textual analysis experiment
Reply #19 - Jul 10th, 2009, 10:31am
 
Morning, all:  To recap the story thus far...

We're considering how to use Tinderbox for textual analysis, which seems to require:

1.  Methods for massaging a source text into usability (e.g., removing odd characters).
2.  Methods for
   (a) selecting a passage in the source text and assigning it a code, while also
   (b) retaining information about the position of the passage in the source.
3.  Methods for displaying the results, including
   (a) with a focus on the coded passages ("deconstructed;"  e.g., all the passages with a given parent code, in position order), and
   (b) with a focus on the source ("reconstructed;"  e.g. the source with all coded passages displayed in different colors).  
(Note that there are other worthwhile possibilities:  e.g. displaying a passage in a bit of its context;  but we have to stop somewhere.)

And we've found:

On 1, Loryn and MarkA have demo'd methods for getting source text into Tinderbox.

On 2(a):
(i)  Charles--I understand you've been adding html-like tags to your sources, but I'm vague on how (plus I never was able to get MarkA's movie of it to download).  Out of curiosity: can you post an ordinary language description of your coding procedure?
(ii)  use footnote tool to select a passage, assign code by linking to a code note (Paul;  very elegant)
(iii)  use footnote tool to select a passage, assign code from a key attribute menu (me)

On 2(b):
(i) Mark's proposal is to explode the source into bits (of some relevant scale), mark up each bit with position information, and then code the bits.  It's easy to have the resulting coded passages harvest the position information.
(ii)  Paul's proposal (if I understand it) is:  when a new coded passage note is created, to search for the passage in the source, locate it, and assign its location info to the new passage note.  OK, Paul, I have two questions about this:

First--but only if it's not too much trouble--can you lay out some instructions so that a clueless person (e.g., me) can try your idea?  Honestly, just ignore this request if it's a pain, since I'm not sure even very laborious instructions would help.

Second--I was thinking of something similar to what you've done, just using agents;  it would go like this:
-Use Mark's explosion method to break the source down (probably into sentences) and assign each bit a $Position.
-When a new coded passage (footnote) is created, agent #1 searches the exploded text for the new note's $Name.
-Agent #2 assigns the new note the $Position of the exploded bit of source found by agent #1.
(Or something like that.)  However, it occurred to me that this wouldn't work if the coded passage was a string that occurred in more than one place in the source--like the important word "here" in the Gettysburg example.  Is there a work-around for this problem?
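To make the multiple-occurrence concern concrete, a small Python sketch: it explodes a source into sentences numbered like a $Position value (a naive regex split, assumed good enough for this excerpt), then searches for a passage. A bare string such as "here" matches several exploded bits and several character offsets, so any lookup that returns a single answer has to pick one of them.

```python
import re


def explode(source):
    """Split source into sentences, numbering each one like a $Position value."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", source.strip()) if s]
    return [{"Position": i, "Text": s} for i, s in enumerate(sentences, start=1)]


def positions_containing(bits, passage):
    """Every exploded bit that contains the passage, not just the first."""
    return [b["Position"] for b in bits if passage in b["Text"]]


def word_offsets(source, word):
    """1-based character offsets of each whole-word match (so 'there' won't match 'here')."""
    return [m.start() + 1 for m in re.finditer(rf"\b{re.escape(word)}\b", source)]


SOURCE = ("The world will little note, nor long remember what we say here, "
          "but it can never forget what they did here. It is for us the living, "
          "rather, to be dedicated here to the unfinished work which they who "
          "fought here have thus far so nobly advanced.")

bits = explode(SOURCE)
print(positions_containing(bits, "here"))            # [1, 2]: ambiguous
print(len(word_offsets(SOURCE, "here")))             # 4 occurrences
print(positions_containing(bits, "nobly advanced"))  # [2]: unique
```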

Finally, on 3:
(a) Charles has demo'd a way of displaying coded passages (including overlapping ones) in Nakakoji view.  The "footnoting" methods produce child notes, which Tinderbox is very good at displaying in all sorts of ways.
(b)  Paul and Mark have proposed a couple of ideas for using html to display a source with code data included.

Thanks, all, for a very productive discussion!
Charles Turner
Full Member
*
Offline



Posts: 180
New York, USA
Re: Gettysburg: a TB textual analysis experiment
Reply #20 - Jul 10th, 2009, 10:59am
 
Quote:
Out of curiosity: can you post an ordinary language description of your coding procedure?


The HTML example I posted was hand-coded. It took all of 30 minutes to do, including setting up the style sheet, although I was just following the decisions that you had already made about the text.

(Technically, it just takes advantage of CSS, and the DIV and SPAN tags, which allows a user-defined CLASS attribute to have a specific graphic format. So a SPAN of CLASS="time-future" can have a distinct visual expression on the page. There's also audio CSS!)
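To show the SPAN/CLASS idea in miniature, here is a Python sketch that turns a list of (text, code) segments into HTML spans plus a matching style sheet. The class names and colours are hypothetical, not those used in Charles's file.

```python
from html import escape

# Hypothetical codes and styles; any CSS declarations would do.
STYLES = {
    "time-past": "color: #8a5a00;",
    "time-future": "color: #0050a0; font-style: italic;",
}


def to_html(segments):
    """segments: list of (text, css_class) pairs; css_class None means uncoded text."""
    parts = []
    for text, css_class in segments:
        if css_class:
            parts.append(f'<span class="{css_class}">{escape(text)}</span>')
        else:
            parts.append(escape(text))
    return "".join(parts)


def stylesheet(styles):
    """One CSS rule per code, so each class gets its own visual treatment."""
    return "\n".join(f"span.{cls} {{ {rule} }}" for cls, rule in styles.items())


print(to_html([("Fourscore and seven years ago", "time-past"), (" our fathers", None)]))
print(stylesheet(STYLES))
```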

Pre-coding before you put a text into Tbox makes a lot of sense to me. You've got a lot of good tools at your disposal.

One more general point I should make (I've lost sight of whether this is the "tech" or "gedanken" thread) is that there are different objectives to text tagging that create different approaches.

In this Gettysburg thread, the goal has been to do an intensive/complete coding of a single text. There's also (and this is more what I'm interested in) a much more sparse tagging of a large corpus of texts.

Personally, in the two unified presentations I've made, I find the coding much too dense to comprehend. They're very much more trees than forest.

Best, Charles
Jean Goodwin
Full Member
*
Offline



Posts: 136
North Carolina
Re: Gettysburg: a TB textual analysis experiment
Reply #21 - Jul 10th, 2009, 11:36am
 
Hi, Charles:  Actually, I'm curious about how you're adding the html-like codes to the source texts you're working on in your big project--that is, if you're doing that coding in Tinderbox.

You're right in pointing out that different coding jobs are going to require different approaches, especially in displaying the results.  Still, I think that the methods for accomplishing task 2(a) demo'd in this thread would work either for intensive/single text analysis or more sparse tagging of a large corpus (as I think you're doing).

I suspect that one important limitation of the "footnote" methods is instead going to be the length of the source text.  Sometime in the next couple of months, I'm going to try one of the "footnote" methods on four 6-12,000 word texts.  My guess is that I'll probably have to explode them by paragraphs.
Paul Walters
Ex Member




Re: Gettysburg: a TB textual analysis experiment
Reply #22 - Jul 10th, 2009, 12:00pm
 
Jean had a couple of questions:
Quote:
First ... lay out some instructions ...

To set this up, one needs an attribute (Position will do) and the macro in the experimental file I posted (here at http://drop.io/FindPosition).  The macro is named FindPos and the macro's value is:

osascript -e "set TMP to offset of \\\\"$1\\\\" in \\\\"$2\\\\""

(Ignore all the backslash escaping; it's not relevant just now why it is there.)  $1 is an argument for which you substitute the search text; $2 is an argument for which you substitute the text targeted for the search.  A sample use would be to include the following rule in the prototype for your passage notes:

$Position |= runCommand(do(FindPos,$Name,$Text(Gettysburg)))    

$Name is the name of the passage note and is substituted for $1 by the macro; $Text(Gettysburg) is our target text and is substituted for $2 in the macro.  (As Mark A pointed out, the use of |= will cause this to run once only and reduce cycle time.)
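For testing outside Tinderbox, here is a Python approximation of what the FindPos macro asks AppleScript for. It assumes `offset of` returns the 1-based index of the first match and 0 when there is none; the sketch is case-sensitive, and AppleScript's own comparison rules may differ. The second call shows why the search string matters: a short string can match inside another word.

```python
def applescript_offset(needle, haystack):
    """Approximate AppleScript's `offset of needle in haystack`.

    str.find gives a 0-based index or -1 when absent, so adding 1
    yields a 1-based index, or 0 when the needle isn't found.
    """
    return haystack.find(needle) + 1


TEXT = "Fourscore and seven years ago our fathers brought forth"
print(applescript_offset("our fathers", TEXT))   # 31
print(applescript_offset("our", TEXT))           # 2 (inside "Fourscore")
print(applescript_offset("liberty", TEXT))       # 0 (not found)
```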

Quote:
Second ... use agents ...?

IMHO, this is worthy of a dedicated thread.  For example, one could use an attribute ($Position) whose type is number and which Tinderbox automatically increments sequentially, so that each new passage note has a new $Position number.  A major concern here is that to maintain the integrity of the sequence value of $Position in passage notes vis-a-vis the source text, one would have to do one's coding starting at the top of the passage, running to the bottom, and never jumping back into the text at some other point.  Otherwise, the sequence numbering loses its meaning.  It was for this reason that I focused on the offset indexing approach explained in the foregoing.
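A small Python illustration of the ordering problem, using a hypothetical coding session that jumps around the text: a creation-order counter records when a passage was coded, while a source offset records where it sits, so only the offset reconstructs reading order.

```python
SOURCE = ("Fourscore and seven years ago our fathers brought forth "
          "on this continent a new nation")

# Hypothetical session: the coder jumped around instead of working top to bottom.
coded = ["a new nation", "Fourscore and seven years ago", "our fathers"]


def order_by_sequence(passages):
    """Sequential numbering: each new passage note simply gets the next number."""
    seq = {p: n for n, p in enumerate(passages, start=1)}
    return sorted(passages, key=seq.get)


def order_by_offset(passages, source):
    """Offset indexing: 1-based position of the passage in the source."""
    return sorted(passages, key=lambda p: source.find(p) + 1)


print(order_by_sequence(coded))        # creation order, not reading order
print(order_by_offset(coded, SOURCE))  # reading order
```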

Nonetheless, I expect greater Agent mavens than I will propose a workable approach.

« Last Edit: Jul 10th, 2009, 12:03pm by Paul Walters »  
Charles Turner
Full Member



Posts: 180
New York, USA
Re: Gettysburg: a TB textual analysis experiment
Reply #23 - Jul 10th, 2009, 12:42pm
 
Quote:
Actually, I'm curious about how you're adding the html-like codes to the source texts you're working on in your big project--that is, if you're doing that coding in Tinderbox.


Keyboard Maestro (or, I'd guess, any other QuicKeys-like utility).

There was an enthusiastic recommendation for Keyboard Maestro here on the Forum: it works very well and enabled me to remove four other utilities I was running.

The tag macros simply cut a highlighted word or phrase to the clipboard, insert the tag text on either side of the selection, and then paste it right back in the same place. It's all the same code, just different tag text.
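The cut/wrap/paste step those macros perform, modelled in Python purely for illustration; Keyboard Maestro macros aren't written this way, and the tag name below is hypothetical. The selection is given as start/end character indices.

```python
def wrap_selection(text, start, end, tag):
    """Cut text[start:end], surround it with <tag>...</tag>, and paste it back in place."""
    selection = text[start:end]
    return f"{text[:start]}<{tag}>{selection}</{tag}>{text[end:]}"


line = "a new nation, conceived in liberty"
start = line.index("in liberty")
print(wrap_selection(line, start, start + len("in liberty"), "manner"))
# a new nation, conceived <manner>in liberty</manner>
```

Because only the tag text changes between macros, one function with a `tag` parameter covers every code.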

A few I have set to function keys, and a few I have stored in a menu. If I start to generate more, I'll probably make a little Applescript application that does the above, plus display/edit all of my tags.

Also, they work for any application, so I can use them in Tbox and I'm also using them in TextMate, depending on what I want...

HTH, Charles
Mark Anderson
YaBB Administrator

User - not staff!

Posts: 5689
Southsea, UK
Re: Gettysburg: a TB textual analysis experiment
Reply #24 - Jul 10th, 2009, 1:01pm
 

Quote:
(plus I never was able to get MarkA's movie of it to download)

Jean - what file - email me direct if needs be.

Quote:
-Agent #2 assigns the new note the $Position of the exploded bit of source found by agent #1.
(Or something like that.)  However, it occurred to me that this wouldn't work if the coded passage was a string that occurred in more than one place in the source--like the important word "here" in the Gettysburg example.  Is there a work-around for this problem?

If the agent finds more than one 'here' note, all notes will receive the same OnAdd action unless the attribute being updated is sequential (in which case each should get a different number).
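A Python model of that behaviour, assuming an agent simply runs its OnAdd action on every note its query matches; the note dictionaries, query and actions are hypothetical. A fixed value lands on every "here" note alike, while a running counter gives each a different number. In this model that number reflects processing order, not position in the source.

```python
from itertools import count


def run_agent(notes, query, on_add):
    """Apply the OnAdd action to every note the query matches."""
    for note in notes:
        if query(note):
            on_add(note)


def is_here(note):
    return note["Name"] == "here"


notes = [{"Name": "here"}, {"Name": "nation"}, {"Name": "here"}]

# A fixed value: every matching note gets the same result.
run_agent(notes, is_here, lambda n: n.update(Code="place"))

# A running counter: each matching note gets a different number.
counter = count(1)
run_agent(notes, is_here, lambda n: n.update(Seq=next(counter)))

print(notes)
```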

I concur with Charles's comment about putting different questions in different threads. There are so many questions flying around that I'm getting confused. The temptation is to solve everything at once, but often (as Mark B has patiently taught me) it's better to break everything down into smaller tasks. Only once one has confidence with a given narrowly defined task or technique should one add it to the overall project. The short-term downside is that this can mean re-doing some things many times, as a project may involve several techniques that need to be seamed together carefully, and each new addition reveals another puzzle to unlock, which involves study and a rebuild. But it pays off in the medium term: one gets a much better sense of what's causing a problem when one occurs, and an understanding of the 'tools' available as new challenges are revealed. It also avoids...

Quote:
Sometime in the next couple of months, I'm going to try one of the "footnote" methods on four 6-12,000 word texts.


...dealing with big problems as the amount of text grows. The larger your starting corpus, the more careful you need to be with things like command-line-based manipulations, and making sure these things run only when needed becomes more important. The Gettysburg Address is a good playground in that sense. Even so, a few rules left as '=' instead of '|=' caused resource usage to keep heading upwards. When doing this sort of analysis, it can be handy to have Activity Monitor (in your Mac's Applications > Utilities folder) running so you can see if your TB activities are stressing the app.

Don't misread the last as implying TB doesn't scale past a small number of notes. Far from it, but if you inadvertently create a scenario where you've lots of notes and you're checking them all almost constantly - that will push things much harder.
Jean Goodwin
Full Member
*
Offline



Posts: 136
North Carolina
Re: Gettysburg: a TB textual analysis experiment
Reply #25 - Jul 11th, 2009, 8:20am
 
Good morning, Paul:  You've paid me the big compliment of assuming I understand much more about "macros" and so on than I actually do!  That's a good example of the friendliness of this Forum:  online, I think it's more usual to assume others are fools (or worse).

I may be returning to your proposal if it looks like I need really exact position information for some project.  Meanwhile, though, I'm still concerned that a search on a footnote's name (implemented by a macro or by an agent) won't return correct results if that string appears multiple times in the source/target text.  

Charles, thanks for the reference to Keyboard Maestro.  With it, we now have a nice catalog of Tinderbox methods for assigning labels (codes) to source texts.

Given the way Tinderbox works, though, I'm sure there are many more, to be taken up in some future discussion.
Loryn
Full Member



Posts: 97

Re: Gettysburg: a TB textual analysis experiment
Reply #26 - Jul 12th, 2009, 8:15am
 
I'm amazed by the energy and creativity that MarkA, Jean and Paul are putting into this. Thanks! The discussion and the experiments are certainly extending the scope of our practical capabilities in this domain.

I'll do a fuller write-up as to where we're at on my blog during the week. For now, I'd just like to share where I'm at, and ask for some help.

My current Experimental Gettysburg file is available here: https://loryn.sugarsync.com/getfiles/d2xweczow8bwm

In this file, I have two experiments:
- Trial 1
- Trial 2

Trial 1 begins where MarkA left off, demonstrating how to incorporate an analysis layer into the Nakakoji display. To generate the Nakakoji display, place the focus on /Reassembly/Constructed and use the template Segment-view-title.

The coding was done in the Map entitled Map: Gettysburg Address. I used positional coding (Ypos, Xpos) in order to reconstruct the coding into the correct locations for the Nakakoji output.

I'll explicate the limitations I ran into later, but I think most of them are fairly apparent. (Slow, difficult, hard to see, inflexible.)

Trial 2 consists of my own adaptation of the techniques Jean and Paul were displaying. (Thanks very much to Paul and MarkA for educating me about the footnote tool. I was completely unaware of the tool's existence, use and purpose.)

Instead of using Paul's CodeLink technique, I've gone for a straight prototype-inheritance technique here. I discovered that the really important part of this technique is to respect linguistic Rank. Notice that I'm using Clause-Complex (i.e. Sentence), Clause, Phrase ranks in this analysis. That's good enough for this analysis. But if my analysis focused on individual wordings, then I'd want to drill one further rank down: to individual words.
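Why respecting rank helps: if every coded note sits at exactly one rank under its parent, the tagged text falls out of a single recursive walk in which each note is emitted exactly once, by its parent. A Python sketch under that assumption; the Node class is hypothetical, and the prototype names are taken from Loryn's expected output.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    prototype: str                 # the code, e.g. "Clause"
    text: str = ""                 # only lowest-rank notes carry words
    children: list = field(default_factory=list)


def render(node):
    """Emit each node once, wrapping its children (or its own words at the lowest rank)."""
    if node.children:
        inner = " " + "".join(render(c) for c in node.children) + " "
    else:
        inner = f" {node.text} "
    return f"<{node.prototype}>{inner}</{node.prototype}>"


sentence = Node("ClauseComplex", children=[
    Node("Clause", children=[
        Node("Circumstantial-phrase", "Fourscore and seven years ago"),
        Node("Nominal:Actor", "our fathers"),
    ]),
])
print(render(sentence))
```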

To generate the Nakakoji for the first sentence in Trial 2, put your focus on the first note descending from Sentence 1. (i.e. the note: "Fourscore and seven years ... "). Select the template: /TEMPLATES2/•CC2.

I've found the interaction experience to be reasonably pleasing (thanks for the idea Paul!). The obvious limitation is that this approach doesn't really suit overlapping syntagms.

Now: My request for help. (Probably from MarkA.) The Nakakoji output I'm getting looks like this:

Code:
< ClauseComplex> <Clause> <Circumstantial-phrase> Fourscore and seven years ago </Circumstantial-phrase><Nominal:Actor> our fathers </Nominal:Actor><Verbal-phrase> brought forth </Verbal-phrase><Circumstance:Locative> on this continent </Circumstance:Locative><Nominal:Range> a new nation </Nominal:Range> </Clause><Clause:Dependent> <Verbal-phrase> conceived </Verbal-phrase><Circumstance:Manner> in liberty </Circumstance:Manner> </Clause><Clause:Dependent> <Conjunction> and </Conjunction><Verbal-phrase> dedicated to </Verbal-phrase><Nominal:Range> the proposition </Nominal:Range> </Clause><Clause:Dependent> <Nominal:Goal> that all men </Nominal:Goal><Verbal-phrase> are created </Verbal-phrase><Circumstance:Manner> equal </Circumstance:Manner> </Clause> </ClauseComplex> < Clause> <Circumstantial-phrase>  </Clause><Nominal:Actor>  </Clause><Verbal-phrase>  </Clause><Circumstance:Locative>  </Clause><Nominal:Range>  </Clause> </ClauseComplex> < Circumstantial-phrase>  </ClauseComplex> < Nominal:Actor>  </ClauseComplex> < Verbal-phrase>  </ClauseComplex> < Circumstance:Locative>  </ClauseComplex> < Nominal:Range>  </ClauseComplex> < Clause:Dependent> <Verbal-phrase>  </Clause><Circumstance:Manner>  </Clause> </ClauseComplex> < Verbal-phrase>  </ClauseComplex> < Circumstance:Manner>  </ClauseComplex> < Clause:Dependent> <Conjunction>  </Clause><Verbal-phrase>  </Clause><Nominal:Range>  </Clause> </ClauseComplex> < Conjunction>  </ClauseComplex> < Verbal-phrase>  </ClauseComplex> < Nominal:Range>  </ClauseComplex> < Clause:Dependent> <Nominal:Goal>  </Clause><Verbal-phrase>  </Clause><Circumstance:Manner>  </Clause> </ClauseComplex> < Nominal:Goal>  </ClauseComplex> < Verbal-phrase>  </ClauseComplex> < Circumstance:Manner>  </ClauseComplex> 



Now, the substring consisting of the following is what I expect to get!

Code:
< ClauseComplex> <Clause> <Circumstantial-phrase> Fourscore and seven years ago </Circumstantial-phrase><Nominal:Actor> our fathers </Nominal:Actor><Verbal-phrase> brought forth </Verbal-phrase><Circumstance:Locative> on this continent </Circumstance:Locative><Nominal:Range> a new nation </Nominal:Range> </Clause><Clause:Dependent> <Verbal-phrase> conceived </Verbal-phrase><Circumstance:Manner> in liberty </Circumstance:Manner> </Clause><Clause:Dependent> <Conjunction> and </Conjunction><Verbal-phrase> dedicated to </Verbal-phrase><Nominal:Range> the proposition </Nominal:Range> </Clause><Clause:Dependent> <Nominal:Goal> that all men </Nominal:Goal><Verbal-phrase> are created </Verbal-phrase><Circumstance:Manner> equal </Circumstance:Manner> </Clause> </ClauseComplex> 



But why is all the extra output being generated? How do I eliminate the additional output?
Loryn
Full Member



Posts: 97

Re: Gettysburg: a TB textual analysis experiment
Reply #27 - Jul 12th, 2009, 8:21am
 
Charles, that's a pretty powerful demonstration of the value of layering!
Jean Goodwin
Full Member



Posts: 136
North Carolina
Re: Gettysburg: a TB textual analysis experiment
Reply #28 - Jul 12th, 2009, 10:05am
 
Good morning (here), Loryn:  Thanks! You've surprised me again with what Tinderbox can do.  To translate it (so I can make sure I understand it): you're using Nakakoji view with export templates that both automatically reconstruct the source text from the coded passages and insert html-like tags (e.g., <Clause>xxxxx</Clause>) to represent the codes.  This source-text-enriched-with-tags can be copied into a note of its own (or into another application), and then manipulated/visualized in various ways.  That's a great approach to what I listed as challenge 3(b) above!

You're right, respecting "linguistic rank" is what makes this possible.  Sentences are made up of clauses, and clauses are made up of phrases.  At the "lowest" level (in your example, phrases) every bit of the source text can be assigned a single code. Of course, that means there will be more challenges if the analyst is interested in something that doesn't respect linguistic rank!--like the preliminary markup you've done about repetitions in the source text.

Finally, to me one of the interests of these experiments is how Tinderbox can make the process easier, by automating parts of it. Along these lines, I'm happy to see that Paul's CodeLink method could work to automate prototype assignment, too.  It would take a Rule in the basic footnote prototype (or a suitable agent) with an action:

$Prototype=links.outbound.CodeLink.$Name
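What that action does, sketched in Python with a hypothetical list of (source, link type, destination) tuples standing in for Tinderbox's link data. The sketch takes the first CodeLink if there are several; how Tinderbox itself resolves multiple outbound links of one type isn't modelled here.

```python
def assign_prototype(note, links):
    """Set the note's prototype to the name of the note its outbound CodeLink points to."""
    targets = [dest for src, kind, dest in links
               if src == note["Name"] and kind == "CodeLink"]
    if targets:
        note["Prototype"] = targets[0]  # assumption: first CodeLink wins
    return note


links = [("our fathers", "CodeLink", "Nominal:Actor"),
         ("our fathers", "untitled", "Notes")]
print(assign_prototype({"Name": "our fathers"}, links))
# {'Name': 'our fathers', 'Prototype': 'Nominal:Actor'}
```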
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Gettysburg: a TB textual analysis experiment
Reply #29 - Jul 12th, 2009, 11:25am
 
@Loryn (re reply #26). I think the problem is your export codes. Forget the manual's advice that the closing ^ on export codes is optional. Unless you're an expert, always close your export codes with a ^, or else you're asking TB to guess where the end of your code is.

I think your templates 'ItemTitle' and 'Original' need correction.

ItemTitle
Code:
^if(eval($Prototype=="SourceTokens"))^^title^ ^else^^if(eval($EndTag))^</^get($Prototype)^>^else^<^get($Prototype)^>^endIf^^endIf^



Original
Code:
^if(eval($Prototype=="SourceTokens"))^^title^ ^else^^if(eval($EndTag))^</^get($Prototype)^>^else^<^get($Prototype)^>^endIf^^endIf^



I've 'closed' all export code tags and removed extra whitespace. The only place I think we need a space is after ^title^, so the words don't all run together. Since v4.6, we use '==' as the equality test and '=' to assign the right-side value to the left side; contextually you can still use '=' for both (old style), but this is deprecated.

From the examples you quote it was not quite clear what the problem is but I think the above fixes it. See how you get on.
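The branching in these templates can be read as the following Python function, a hypothetical model useful only for checking the logic (it is not Tinderbox export code): source tokens print their name plus one space, and every other note prints an opening or closing tag depending on $EndTag.

```python
def item_title(note):
    """Model of the template's nested ^if^ ... ^else^ ... ^endIf^ logic."""
    if note["Prototype"] == "SourceTokens":
        return note["Name"] + " "              # ^title^ followed by one space
    if note.get("EndTag"):
        return f"</{note['Prototype']}>"       # closing tag
    return f"<{note['Prototype']}>"            # opening tag


for n in ({"Prototype": "SourceTokens", "Name": "Fourscore"},
          {"Prototype": "Clause", "EndTag": True},
          {"Prototype": "Clause", "EndTag": False}):
    print(repr(item_title(n)))
```

Two nested conditions need two matching ^endIf^ markers, which is easy to check against the two `if` statements here.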