Tinderbox
Gettysburg: a TB textual analysis experiment (Read 62293 times)
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Gettysburg: a TB textual analysis experiment
Jul 08th, 2009, 1:20pm
 
Jean Goodwin kindly posted a link to a TBX exploring the textual analysis discussed in the thread "Tinderbox for Textual Analysis". I've started this thread so that the general discussion can continue on the original thread while TB-related issues of process and implementation are discussed separately here.
--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Mark Anderson
Re: Gettysburg: a TB textual analysis experiment
Reply #1 - Jul 8th, 2009, 1:41pm
 
Are all the 'codes' supposed to be prototypes in their own right? I can see you've prototypes for codes/subcodes but I wonder if the actual codes/subcodes based on them need be prototypes too. If I'm wrong, my error, otherwise happy to help prune back prototype spread. I like your use of the new smart adornments.

In my text de-recompile demo of yesterday I didn't have time to look into capturing the 'atomic' order of things; OutlineOrder sufficed for the immediate purpose. However, if you seed things as you break the text down you could get paragraph/sentence/word numbers (or in other contexts something like chapter/paragraph/sentence, page/paragraph, etc.). If you split first to paragraphs, use an agent to seed $Paragraph with the paragraph number. Then as you split down to sentences you can seed $Sentence, and so on. There's no requirement to do this but if one has the need, it can be done. TB allows a two-level sort, so an agent could find all notes where $Paragraph is 2 and then sort by $Sentence and then $WordOrder, for example. Don't forget that if you only need an agent occasionally, you can turn it 'off' (set its priority to zero); it will still retain the aliases it last held, even through the TBX closing and reopening, but won't contribute to the agent cycle time. I think this is some of what you allude to in your 'what can't be done (yet)' note.
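
To make the seeding idea concrete, here is a minimal sketch in Python rather than Tinderbox action code. The names `Atom` and `seed` are hypothetical, and the naive sentence split on full stops is an assumption for illustration only; in a TBX the three numbers would live in $Paragraph, $Sentence and $WordOrder.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    paragraph: int   # stands in for $Paragraph
    sentence: int    # stands in for $Sentence
    word_order: int  # stands in for $WordOrder
    word: str


def seed(text: str) -> list[Atom]:
    """Split to paragraph, then sentence, then word, seeding a position at each level."""
    atoms = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for p_num, para in enumerate(paragraphs, start=1):
        # Naive sentence split on full stops (an assumption for this sketch).
        sentences = [s.strip() for s in para.split(".") if s.strip()]
        for s_num, sentence in enumerate(sentences, start=1):
            for w_num, word in enumerate(sentence.split(), start=1):
                atoms.append(Atom(p_num, s_num, w_num, word))
    return atoms


text = "Four score and seven. Years ago.\n\nNow we are engaged. In a great war."
atoms = seed(text)

# Agent-like step: find everything in paragraph 2, then sort by sentence and word order.
para_two = sorted(
    (a for a in atoms if a.paragraph == 2),
    key=lambda a: (a.sentence, a.word_order),
)
para_two_text = " ".join(a.word for a in para_two)
print(para_two_text)
```

Because each atom carries all three numbers, it can be moved or aliased anywhere and still be sorted back into place.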

Nakakoji (or HTML Export or Text windows) aren't - as of now - going to give you a richer mark-up of different text segments.  Nakakoji only shows plain text. There is a point where the degree of mechanical tear down might suggest use of a different tool as Mark B mentions in the main thread.

Thanks for sharing.
Paul Walters
Ex Member




Re: Gettysburg: a TB textual analysis experiment
Reply #2 - Jul 9th, 2009, 6:35am
 
Jean's TBX is a very interesting and helpful approach.

Rather than prototypes for each code, would it be useful to have a single prototype ("Code") with attributes that define the characteristics of an instance of Code (e.g., "ParentCode", "IsSubcode" (boolean), "CodeName", and so forth)?  This separation of data from structure is more work to manage, but perhaps more flexible.
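
As a rough illustration of that single-prototype idea, here is a sketch in Python, not Tinderbox. The `Code` class is hypothetical; its fields mirror the suggested "CodeName" and "ParentCode" attributes, and "IsSubcode" is derived from the parent here rather than stored as a separate boolean, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    code_name: str                     # mirrors "CodeName"
    parent_code: Optional[str] = None  # mirrors "ParentCode"; None for a top-level code

    @property
    def is_subcode(self) -> bool:      # mirrors "IsSubcode"
        return self.parent_code is not None


# Every code is an instance of the same structure; only the data differs.
codes = [
    Code("Space"),
    Code("Time"),
    Code("US", parent_code="Space"),
    Code("Past", parent_code="Time"),
]

subcodes = [c.code_name for c in codes if c.is_subcode]
print(subcodes)
```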
Jean Goodwin
Full Member
*
Offline



Posts: 136
North Carolina
Re: Gettysburg: a TB textual analysis experiment
Reply #3 - Jul 9th, 2009, 9:05am
 
This TBX for coding texts does rely heavily on prototypes:  There is going to be a prototype for every main and sub-code.  Uh, in retrospect, I think I did this "on purpose."

1.  In this TBX, I don't think that what MarkA calls "prototype spread" will cause problems. The analyst is never going to have to hunt through a long list of prototypes looking for just the right one.  Instead, prototype assignment happens automatically, either from OnAdd or from drop-down menus that include only "contextually" relevant choices.  

2.  It's good practice in what we're calling "textual analysis" to clearly separate (a) the codes the analyst decides to use from (b) the actual work of applying them to the source text.  When I get partway through analyzing a long text (or a bunch of short texts) and then change my mind about the codes I am using, I am in big trouble! <--voice of bitter experience  Having one container for all the codes and another container for all the texts--it "physically" reinforces this vital separation.  It is less flexible (as Paul points out), and that's good.  Among other things, it forces the analyst to put thought into designing the main codes and at least some subcodes before starting to analyze the source text(s).

3. Prototypes are cool!  To paraphrase something Rich Shields said yesterday, the last time I did any programming (if that's what it's still called) was when C didn't have any +s.  My brain gets confused when I hide a lot of key information in attributes, and bury complex if/then/else actions in agents.  I like Tinderbox because it gives me a sort of visual programming language:  this prototype has these attributes, while this next one inherits them, but changes them in these ways.  At least some tasks can be accomplished either entirely by a series of interlocking agents, or by a cascade of prototypes with agents "in between" them;  in this case, I found the prototype-cascade easier to think.

I'm curious to see what the alternative would look like, so if you guys want to try a similar experiment minus the prototypes, get to work!  I think the one big plus of my experiment is the "contextual" drop-down menus which make code selection much easier--it'd be cool if you could preserve that.

As for the key problem of getting each coded passage an attribute which identifies where in the source text it was originally located:

Mark, it'd be cool if the "seeding" idea you're talking about would work with this "footnote" method of coding.  But I think that "seeding" depends on breaking the source text down before coding it, into separate notes (e.g., one for each sentence, "seeded" with an attribute that represents its sentence number)--am I getting that right?  That procedure would be possible, but a pain for the analyst, who would have to open each note in turn in order to code the content.

Keep thinking!  One big reason that this experiment won't be very useful for real tasks is that the notes with coded passages don't contain accessible information about where in the source text that passage was originally located.  So in this TBX I can't build a display of the results (or an agent) which puts passages which were next to each other in the source text, next to each other in the display.

Meanwhile, thanks for your interest and feedback!
Paul Walters
Re: Gettysburg: a TB textual analysis experiment
Reply #4 - Jul 9th, 2009, 9:54am
 
So here (http://drop.io/TBXCoding1) is a very rough sketch expressing my comment, above, on prototypes.  Borrowing from Jean, this does not contain the full-blown analysis and agents in Jean's submission.  I haven't looked at doing agents or maps with this as of yet.
Mark Anderson
Re: Gettysburg: a TB textual analysis experiment
Reply #5 - Jul 9th, 2009, 12:44pm
 
Jean.  On #1, I stand corrected!  This is where a conceptual grasp pales beside practical experience of the technique.

By 'drop-downs' are you referring to the lists called from the right margin of each key attribute (i.e. those shown at the top of a note window)? If so, it is worth bearing in mind that the list is drawn on the fly from the values used so far for that attribute. I think you've neatly got around the possible trap whereby some values aren't yet listed because they've not been used, by setting that value in the attribute in the prototype note. Thus in subcode prototype note 'world' (at path: /Codes/space/world), its Subcode1 value is set to 'world'. This ensures 'world' is now always included in the drop-down when Subcode1 is a key attribute (KA). I only work through this example as it's a neat trick that a newer TB user might miss, i.e. why some expected list values were missing.

Note 'world' (the subcode specimen) isn't required to be a prototype for the above to work. This isn't a better/worse issue; there's no problem with the current structure and, as so often with TB, there is more than one way. I just wondered what note would want to set 'world' as its prototype!

A way for codes/subcodes to auto-set their code/subcode values would be for their parent's OnAdd (or prototype inheritance) to set the note name as the value. So for a subcode, the parent's OnAdd would have $Subcode1=$Name. The only admin gotcha is that if you subsequently change or correct the note name you might need to manually correct the attribute as well. Why not use a rule instead to trap the latter case? It adds to the overall background work for TB, as rules fire each agent cycle (or so). If forgetful, like me, you could simply make an agent like:
AgentQuery: $Prototype = "•subcode"
AgentAction: if($Subcode1 != $Name){$Subcode1 = $Name}
You can turn the agent off when not working on code names then run it after a code name edit session allowing it to update anything you missed. Then if we're being efficient we might turn the priority down or off altogether as our needs dictate.

Re seeding passage position. If you create a Number type attribute and tick the 'sequential' option (best do this before adding content) then all your exploded text notes can always be sorted on that field alone. The only thing to remember is that all notes get a number for this attribute, so you'll need to filter out (agent!) the notes you wish to reassemble, then sort on your sequential field; prototypes can help here even if they do no customizing but simply invisibly mark notes as being of a certain type (source vs footnotes). The approach is useful if you want to see originally non-contiguous words/phrases presented in original passage order. This one's not really possible to retro-fit to existing content as you can't control how the numbers get handed out to pre-existing notes.
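
The filter-then-sort step can be sketched outside TB as follows. This is Python for illustration only: the `Note` class, its `kind` marker (standing in for a prototype) and the module-level counter that imitates a 'sequential' attribute are all assumptions of the sketch, not Tinderbox features.

```python
from dataclasses import dataclass, field
from itertools import count

# Imitates a 'sequential' Number attribute: every new note takes the next value.
_sequence = count(1)


@dataclass
class Note:
    name: str
    kind: str  # hypothetical marker ("source" or "footnote"), playing the prototype's role
    passage_order: int = field(default_factory=lambda: next(_sequence))


notes = [
    Note("Four", "source"),
    Note("score", "source"),
    Note("a comment", "footnote"),  # footnotes consume sequence numbers too
    Note("and", "source"),
    Note("seven", "source"),
]

# Pretend later reorganisation has scrambled the notes.
shuffled = [notes[4], notes[2], notes[0], notes[3], notes[1]]

# Agent-like step: keep only the notes to reassemble, then sort on the sequential field alone.
reassembled = sorted(
    (n for n in shuffled if n.kind == "source"),
    key=lambda n: n.passage_order,
)
reassembled_text = " ".join(n.name for n in reassembled)
print(reassembled_text)
```

Gaps in the numbering (here the footnote took one value) do not matter; only the relative order does.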

Quote:
One big reason that this experiment won't be very useful for real tasks is that the notes with coded passages don't contain accessible information about where in the source text that passage was originally located.

Does the sequential number, if correctly implemented, not address this?
Jean Goodwin
Re: Gettysburg: a TB textual analysis experiment
Reply #6 - Jul 9th, 2009, 1:06pm
 
Thanks, Paul!  As usual, your experiment taught me a couple of things, like how to make notes that are invisible on the map, but not the outline.  Straightforward--now that I see it in action!  And you have made the coding document look simpler.

Now, can you show me how to make the coding process itself easy, by giving the passage notes the key attributes that will allow quick & accurate assignment of parent/subcodes?

That is simple enough for parent codes:  I can just add ParentCode as a key attribute to the Passage prototype. Then any time I select and code (footnote) a passage, I'll get a drop-down menu listing only the available parent codes--in this example, Space, Time, Voice and Miscellaneous.

But I can't see how to do the next step:  After I select a ParentCode for a passage, I want a new key attribute to appear, with a drop-down menu listing only the subcodes for that parent code.

In this example, that would mean:  After coding a passage as Time, I would get a drop-down menu with only Past, Present, and Future on it.

My mind boggles trying to think of how to do this, so I'd love to see what you come up with!
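
The behaviour being asked for can be pinned down in a few lines of Python. This sketch describes only the desired result, not how to get it in Tinderbox; `CODEBOOK`, `parent_menu` and `subcode_menu` are hypothetical names, and the subcodes listed are only those mentioned in this thread.

```python
# Parent codes mapped to their subcodes. Entries left empty are codes whose
# subcodes are not given in this thread.
CODEBOOK: dict[str, list[str]] = {
    "Space": ["world", "US"],
    "Time": ["Past", "Present", "Future"],
    "Voice": [],
    "Miscellaneous": [],
}


def parent_menu() -> list[str]:
    """First drop-down: only the available parent codes."""
    return list(CODEBOOK)


def subcode_menu(parent_code: str) -> list[str]:
    """Second drop-down: only the subcodes belonging to the chosen parent code."""
    return list(CODEBOOK.get(parent_code, []))


print(parent_menu())
print(subcode_menu("Time"))
```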
Paul Walters
Re: Gettysburg: a TB textual analysis experiment
Reply #7 - Jul 9th, 2009, 1:53pm
 
Jean, I was thinking that the attributes in the "Code" prototype were not actively used in the Passage prototype.  That is, ParentCode is an attribute of a Code, but not an active attribute of a passage -- e.g., the ParentCode of "US" is "Space".  

Perhaps this workflow helps clarify:

1) Place the source text in the Source Texts container (e.g., "Gettysburg")
2) Open a text window on that source
3) Highlight the passage of interest
4) Add a footnote as child
5) When the footnote is added, it will be selected in the Explorer or Outline view
6) Drag a link to the proper code in the Codes container; make the link type "CodeLink"
7) Repeat 6 as often as desired
8) Return to Step 3

I modified my experimental document (here at http://drop.io/TBXCoding2) to add a set attribute to the Passage prototype -- a rule in the Passage prototype causes the CodesForThisPassage set to be updated with a list of all the codes to which that passage is linked (requires a link type of "CodeLink").
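
In outline, the rule collects the targets of a passage's outbound links, keeping only those of type "CodeLink". A minimal Python sketch of that logic follows; it does not reproduce the actual rule in the TBX, and the `Link` tuple, the function name and the sample passages are assumptions made for illustration.

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: str     # the passage (footnote) note
    target: str     # the note linked to
    link_type: str  # e.g. "CodeLink" or "note"


def codes_for_this_passage(passage: str, links: list[Link]) -> set[str]:
    """Collect the targets of the passage's outbound links whose type is CodeLink."""
    return {
        link.target
        for link in links
        if link.source == passage and link.link_type == "CodeLink"
    }


links = [
    Link("our fathers", "Past", "CodeLink"),
    Link("our fathers", "US", "CodeLink"),
    Link("our fathers", "Gettysburg", "note"),  # ignored: not a CodeLink
    Link("this continent", "US", "CodeLink"),   # ignored: different passage
]

print(sorted(codes_for_this_passage("our fathers", links)))
```

A set is the natural container here: linking the same passage to the same code twice does not produce a duplicate.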

I see that Mark A has a different approach, so I hope I am not pushing a wrong-headed concept.
Mark Anderson
Re: Gettysburg: a TB textual analysis experiment
Reply #8 - Jul 9th, 2009, 3:41pm
 
@Jean - here's a demo file of the PassageOrder creation concept: PassageOrder.tbx.zip

This demos just that concept; I've not attempted to put it in the textual analysis frame (mainly because my understanding of that is more theoretical than practical).

Thoughts:
  • You don't have to tear down to words in one go. If you split to paragraphs first (with/without tabs) the paragraphs would sort correctly on PassageOrder. As long as any subsequent paragraph split was done working down the paragraphs, all split notes at the lower level would sort - either per paragraph or as a whole on PassageOrder.
  • If you use Paul's method, and have a sequential number attribute, and if you make footnotes in passage order, the footnotes could be sorted on PassageOrder and be passage sequential.
  • Using footnotes, the footnote takes the source note's anchor text as its title (N.B. whitespace is trimmed), so an agent could find the desired footnotes and sort on PassageOrder. Then, with a template showing ^title^+[space char], a Text Export view could show the source text for the (agent's collected) footnote notes.
Jean Goodwin
Re: Gettysburg: a TB textual analysis experiment
Reply #9 - Jul 9th, 2009, 5:20pm
 
Hi, Paul:  Now I see what you're doing!  My attempt was assigning codes to passages with the pull-down menus of key attributes;  you're using links.  That provides a very cunning, lightweight solution to the problem (and now I get the idea behind the links operator, which I'd been ignoring).

Here's one suggestion which might make using your document even easier.  Add something like the following to the Passage prototype's Rule:

$ParentCodeForThisPassage=links.outbound.CodeLink.$ParentCode

I think it's going to be useful to have every coded passage contain both parent and subcode (e.g., the code to set a hue, and the subcode to set the saturation).  And this way the person assigning the code doesn't have to link the passage to both.
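
The intent of that rule, looking up the parent of whatever subcode the passage links to, can be sketched in Python. The `PARENT_OF` table and the function name are hypothetical, and the table holds only codes mentioned in this thread.

```python
# Each subcode's $ParentCode, as it would be stored on the code notes.
PARENT_OF = {
    "Past": "Time",
    "Present": "Time",
    "Future": "Time",
    "US": "Space",
    "world": "Space",
}


def parent_codes_for_passage(linked_subcodes: set[str]) -> set[str]:
    """Follow each linked subcode to its parent, so the coder never links to both."""
    return {PARENT_OF[code] for code in linked_subcodes if code in PARENT_OF}


print(sorted(parent_codes_for_passage({"Past", "US"})))
```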

I also think to make it useable by relative novices, it might be good to have more of the code/subcode attributes set up by prototypes and OnAdd actions, not by hand.  But hey, that would ruin the beautifully simple aesthetics of this document.

So I think we need to add your experiment as #5 on the list of "ways to use Tinderbox for textual analysis."
Jean Goodwin
Re: Gettysburg: a TB textual analysis experiment
Reply #10 - Jul 9th, 2009, 6:01pm
 
Hi, Mark:  Here's a bunch of replies to your helpful (as always) comments:

I made the subcodes into prototypes so they would transmit visual info (like the color gradient example I used).  I know the same work could be done by agents, but I was thinking about this document being used by a relative novice. (That would include me in about a week, when I will have forgotten how I set the agents up, but would still know how to modify a prototype.)

So what about the more vital problem of getting position information into a coded passage?

You have totally persuaded me that Tinderbox can break down a text by word, sentence, paragraph--whatever!  I'll admit that because of my fear of things like runCommand, I'd probably do the text cleanup in BBEdit, but your efforts to get Tinderbox to do this are heroic!  And the "sequential" check-box when creating a number attribute--that's something I didn't know about, but which works perfectly to assign a permanent position record to each word (sentence, paragraph).

What you haven't yet persuaded me is that exploding the source text will work for textual analysis.  Sorry!

1.  To code a text, I need to read it carefully--and whole.  If I could add codes/footnotes to a text and then explode it, that would work!  Unfortunately, text links don't survive explosions.  

2.  If I knew that I was only going to code full sentences or single words, then I could use your method to explode down to that level.  Unfortunately, for many purposes coding needs to be more flexible:  sometimes I'm going to pick out a single word, sometimes a sentence, sometimes a phrase of 2, 3, 4 or however many words, sometimes maybe several sentences.  There's no single way to explode a text that can accommodate them all.

I will say this, though:  If I knew I was only going to code single words or full sentences, your method for breaking down the source text and Paul's method for assigning codes would work together very well.  I could have the list of codes in one window, the exploded text in another, and just draw links between them.

Anyhow, keep thinking!
Charles Turner
Full Member
*
Offline



Posts: 180
New York, USA
Re: Gettysburg: a TB textual analysis experiment
Reply #11 - Jul 9th, 2009, 11:50pm
 
I thought I'd post this for all to see, but will hold off comment for the moment:

http://www.vze26m98.net/tbx/gettysburg.html

Quote:
1.  To code a text, I need to read it carefully--and whole.  If I could add codes/footnotes to a text and then explode it, that would work!  Unfortunately, text links don't survive explosions.

Hi Jean- If you use a structured tag in the text, like the HTML above, you could explode using agents, I would imagine.

Best, Charles
Mark Anderson
Re: Gettysburg: a TB textual analysis experiment
Reply #12 - Jul 10th, 2009, 6:11am
 
Useful demo, Charles; it shows both what some creative use of HTML export can do and the limitations Loryn originally flagged up, for instance being able to visualise overlapping (as opposed to nested) mark-up passages. Looking at the HTML source of Charles's page, the colours are set via CSS with in-paragraph blocks set via <span> tags. However, don't forget that TB's Create Link dialog allows you to set a 'class' which on HTML export becomes the CSS class used for the <a> tag written for the link.  If in the above example each colour block is also a link to a footnote, then when you set the link a 'note' link is automatically made. If you manually [sic] change the type of the link going to the footnote to a code type and set a class, then if you make a page like Charles's, the CSS could colour different code links accordingly.  Things to note: links are listed in Browse Links in order of creation; non-default link types can be added in Create Links and in Browse Links but only ones added in Create Links get added to the list of defined link types for that TBX; link types don't remember a 'class' value, so that must currently be set manually per link; overlapping link anchors will always be problematic in an HTML export context (an HTML limit, not a TB one).

No time now to make a demo of the latter but I'll try if some time frees up.
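
Outside TB, the class-per-code idea can at least be sketched in Python. This is not TB's export mechanism: `span_for` and `render` are hypothetical helpers, the class names are invented, and the sketch writes <span> tags where a TB export would put the class on the link's <a> tag.

```python
from html import escape


def span_for(text: str, phrase: str, css_class: str) -> tuple[int, int, str]:
    """Locate the first occurrence of phrase and pair it with a CSS class."""
    start = text.index(phrase)
    return (start, start + len(phrase), css_class)


def render(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each coded passage in a tag whose class names its code."""
    out = []
    cursor = 0
    for start, end, css_class in sorted(spans):
        if start < cursor:
            # Overlapping passages cannot be written as well-formed sibling tags.
            raise ValueError("overlapping passages are not supported")
        out.append(escape(text[cursor:start]))
        out.append(f'<span class="{escape(css_class)}">{escape(text[start:end])}</span>')
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)


text = "Four score and seven years ago our fathers"
spans = [
    span_for(text, "Four score and seven years ago", "time"),
    span_for(text, "our fathers", "voice"),
]
html_out = render(text, spans)
print(html_out)
```

A stylesheet rule per class (one for `time`, one for `voice`) would then colour each code differently. The `ValueError` branch marks exactly the overlapping-anchor case noted above.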

I've deliberately restricted my examples to narrow tasks as I'm not proposing that TB can necessarily do the full analysis in the style being proposed. Instead, I've simply (dis-)proved some assumptions about what TB can/can't do.

Quote:
Quote:
1.  To code a text, I need to read it carefully--and whole.  If I could add codes/footnotes to a text and then explode it, that would work!  Unfortunately, text links don't survive explosions.


Hi Jean- If you use a structured tag in the text, like the HTML above, you could explode using agents, I would imagine.


Charles is right that tags could aid explode - that's essentially how my split-source-to-word technique works, albeit using tabs rather than mark-up tags. However, we're pushing what explode was intended for: I've always understood it as a facilitator for import, like Edit -> Remove Line Breaks and a few other features. IOW, it's there to get text in the desired narrowness of context (sentence, phrase, word, etc.) before being worked on. The sort of transform from text<->words maintaining mark-up is something not in Tinderbox, as was discussed early on in the thread.

As Mark B has pointed out, to do the formal textual analysis process being talked about, it may be better to use a tool designed from outset for that purpose (I recall he mentions some names).

Quote:
2.  If I knew that I was only going to code full sentences or single words, then I could use your method to explode down to that level.  Unfortunately, for many purposes coding needs to be more flexible:  sometimes I'm going to pick out a single word, sometimes a sentence, sometimes a phrase of 2, 3, 4 or however many words, sometimes maybe several sentences.  There's no single way to explode a text that can accommodate them all.


Well, let's think laterally: you can break the source down several times, in different containers, to paragraph/sentence/phrase/word. I've shown you the mechanics of how that can be done. OK, for phrases you'd need to do some manual mark-up, and this chimes with the observation about whether in such a context you're using the right overall tool. By having the text broken in several ways you get access to different granularities of content. By moving some of the creation process to, say, a different root-level container you can hide away the apparent mess of making the bits you need. There's nothing but one's imagination that stops one concurrently filleting the source to two or more different levels of granularity. Of course, there's no point-and-click process to do all this in one fell swoop but I think we know that now; we're practising the possible.

So some more pieces to the jigsaw, whilst remembering TB can't (currently) do the overall process as defined. But, if you're willing to trade some of the process and embrace constraints you do get to do the work in TB!
Charles Turner
Re: Gettysburg: a TB textual analysis experiment
Reply #13 - Jul 10th, 2009, 7:10am
 
Quote:
the limitations Loryn originally flagged up, for instance being able to visualise overlapping (as opposed to nested) mark-up passages

It's worth pointing out that this limitation is a property of commonly used markup languages: SGML, XML, HTML. They all want single inheritance so they can build a structured parse-tree.
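
To make the nested/overlapping distinction concrete, here is a small Python sketch that classifies two passages given as (start, end) character offsets with an exclusive end. The function name and the offsets are illustrative assumptions.

```python
def relation(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Classify two (start, end) passages as disjoint, nested or overlapping."""
    # Order by start; on a tie put the longer passage first so it acts as the outer one.
    (a_start, a_end), (b_start, b_end) = sorted([a, b], key=lambda s: (s[0], -s[1]))
    if a_end <= b_start:
        return "disjoint"     # can be siblings in a parse tree
    if b_end <= a_end:
        return "nested"       # one can be the child of the other
    return "overlapping"      # neither sibling nor child: no single tree fits


print(relation((0, 10), (2, 8)))
print(relation((0, 10), (5, 15)))
```

Only the third case is the problematic one: the second passage starts inside the first but ends outside it, so its tag would have to close after its parent's.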

Best, C
Paul Walters
Re: Gettysburg: a TB textual analysis experiment
Reply #14 - Jul 10th, 2009, 7:30am
 
An issue (discussed in these threads) with decomposing a text is putting the resulting atoms back together again in the proper order.  One concept is creating an index value for each atom that is the position of that atom within the overall text.  Having done that, one could sort the atoms by their index value and have (a close approximation of) the original.  One limitation to indexing is with atoms (e.g., single letters) that occur frequently.  But, putting that aside for the moment ...

I've created another tiny experiment (here at http://drop.io/FindPosition) that has Tinderbox run an OS X command, which in turn uses AppleScript, to find the index - that is, the position of a search string within targeted text.  For the purposes of the foregoing dialog, the "search string" would be the $Name of a Footnote.  The approach would be to put the index value into an attribute ("Pos") of that Footnote.  If an index value is determined and stored thus, the Footnote can be moved to any container in the document, used for other calculations, assist in defining "textspans" as proposed by Loryn, and so forth.

There are extensive notes in the file explaining what is going on.  All terminology is idiosyncratically mine - I am not a textual analyst.
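
For readers without the file, the index-finding step itself can be sketched in Python. This is not the AppleScript used in the experiment; the function names are hypothetical and the offsets are zero-based. The second function shows the limitation with strings that occur more than once.

```python
def find_position(target_text: str, search_string: str) -> int:
    """Return the zero-based offset of the first match, or -1 if there is none."""
    return target_text.find(search_string)


def find_all_positions(target_text: str, search_string: str) -> list[int]:
    """Return every offset at which the search string occurs."""
    positions = []
    pos = target_text.find(search_string)
    while pos != -1:
        positions.append(pos)
        pos = target_text.find(search_string, pos + 1)
    return positions


text = "we can not dedicate, we can not consecrate, we can not hallow this ground"

pos = find_position(text, "hallow")       # unique phrase: one unambiguous index
repeats = find_all_positions(text, "we can not")  # repeated phrase: several candidates
print(pos, repeats)
```

When a footnote's name matches several places, the first-match index alone cannot say which occurrence was coded; some extra information (for example, the order in which footnotes were made) would be needed to choose among the candidates.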