Tinderbox User-to-User Forum (for formal tech support please email: info@eastgate.com)
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi
Tinderbox Users >> Agent, Actions, Rules & Automation >> How to parse names in TBX6?
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi?num=1410808883

Message started by james a. foster on Sep 15th, 2014, 3:21pm

Title: How to parse names in TBX6?
Post by james a. foster on Sep 15th, 2014, 3:21pm

I have an attribute $Participant whose value is a person's name. It can be either "John Public" or "John Q. Public".

I also have $FirstName, $MiddleName, and $LastName attributes.

How do I parse $Participant in order to set the other attributes? I have tried regular expressions, but can't get the agent to use the value of $1, rather than the STRING $1.

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 15th, 2014, 4:36pm

Use an agent. For example, if $Participant is "John Smith":

Query:  $Participant.contains("^(\w+) (\w+)$")
Action:  $FirstName = $1; $LastName = $2;

You may need to vary the regex pattern to find other name formats, but that's the basic idea.
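
For readers who want to check the pattern outside Tinderbox, here is a minimal sketch in Python. It assumes Python's re module treats this simple pattern the same way Tinderbox's regex engine does; the function name split_two_part_name is made up for illustration and is not part of Tinderbox.

```python
import re

# Same pattern as in the agent query above: two runs of word characters
# separated by a single space, anchored to the whole string.
TWO_PART = re.compile(r"^(\w+) (\w+)$")


def split_two_part_name(name):
    """Return (first, last) if name is exactly two word parts, else None."""
    match = TWO_PART.match(name)
    if match is None:
        return None
    # group(1) and group(2) play the role of $1 and $2 in the agent action.
    return match.group(1), match.group(2)


print(split_two_part_name("John Smith"))      # ('John', 'Smith')
print(split_two_part_name("John Q. Public"))  # None
```

The second call returns None because \w does not match the period and the pattern allows only two parts, which is why the pattern has to be varied for other name formats.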

Title: Re: How to parse names in TBX6?
Post by David Bertenshaw on Sep 15th, 2014, 4:59pm

I've just tested this and it seems to work. But be aware -- it's so clunky that the sound you hear is real Tinderbox experts weeping...


Agent Query: $Participant.contains("(.*)\s(.*)\s(.*)")      

Agent Action: $FirstName=$1;$MiddleName=$2;$LastName=$3; if($LastName="$3"){$LastName=$MiddleName;$MiddleName=""};


NB there's no $ on Participant. Not sure why but it doesn't work with it....
[edit]Admin - edit to replace deprecated syntax with current version[/edit]

The \s is there to match white space, so it should pick up the "John Q." example.

The if statement in the action checks to see if there's no third name (i.e. Tinderbox can't find a third element in $Participant so it just reports $3 as a string) -- if so, it moves MiddleName to LastName and leaves MiddleName blank.


Of course, it doesn't deal with more complicated names...

As I said, clunky, but perhaps it will help till Mark A comes along and shows you how to do it properly!

Title: Re: How to parse names in TBX6?
Post by Mark Bernstein on Sep 15th, 2014, 5:13pm

Participant(pattern) is true if pattern is found in $Participant(this).  It's equivalent to $Participant.contains("pattern").

$Participant(whichNote) is an attribute reference, returning the value of $Participant for the designated note.


Title: Re: How to parse names in TBX6?
Post by David Bertenshaw on Sep 15th, 2014, 5:19pm

Thanks -- I just relied on the example in the help file. Good to know the reasoning.

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 15th, 2014, 5:25pm

@David. Please excuse the liberty of me editing your post, so as to change it to the current syntax. The old one works, but I think it more helpful to teach learners the current method. Both forms work, and can be looked up in the action code section of aTbRef.

Title: Re: How to parse names in TBX6?
Post by Mark Bernstein on Sep 15th, 2014, 5:26pm

Parsing names is easy -- 90% of the time. The other 10% is really hard.  

A real life example: a college professor of mine was faced with a social dilemma. He'd received a letter from a colleague, G. E. M. de Ste. Croix, addressed "My Dear Tompkins".  He naturally wished to reply in kind. But how?  My Dear de Ste. Croix?  "My Dear Ste. Croix?"  Just "Croix?"  

My local library shelves fiction alphabetically by author. I was curious about the work of Edward St. Aubyn. Should I look for it on the A shelf, or the S shelf?  To make things worse, I recall that Swarthmore, where I went to school, shelves things non-traditionally thanks to the Quaker regard for titles. So I asked the librarian. She said, "I think in library school we were told to file it under 'S'.  But, as a practical matter, the question is whether whoever reshelved the book thought 'S' or 'A', and there's no knowing that!"

It's possible to do a good job of this, but it's not trivial.  Instead, I use one simple trick:

     $LastName |= $3

This means, "If the last name isn't specified, it's the third pattern. Otherwise, leave well enough alone."  That means you can let the system find the last name for the common cases, and manually enter the first and  last names when you run into Dr. Oliver Wendell Holmes, Sr., T. Woodrow Wilson, Tokugawa Ieyasu, or Nampeyo.
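
The "assign only if empty" behaviour described above can be sketched in Python. This is only an analogy: the note is modelled as a plain dictionary, an empty string is assumed to mean "not specified", and the helper name assign_if_empty is hypothetical.

```python
def assign_if_empty(note, attribute, value):
    """Set note[attribute] to value only when it has no value yet."""
    if not note.get(attribute):
        note[attribute] = value
    return note


# No last name yet, so the parsed value is used.
auto = assign_if_empty({"LastName": ""}, "LastName", "Public")
# A manually entered last name is left alone.
manual = assign_if_empty({"LastName": "Holmes"}, "LastName", "Sr.")

print(auto["LastName"])    # Public
print(manual["LastName"])  # Holmes
```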

Title: Re: How to parse names in TBX6?
Post by David Bertenshaw on Sep 15th, 2014, 5:34pm

Mark A,

No problem at all -- I was hoping whatever I wrote would be improved...  I had a go at answering to test my knowledge not to pretend I'm an expert!

I didn't know the syntax was deprecated though -- I just used the example in the v5 manual. Thanks for the update.

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 15th, 2014, 5:48pm

Well, the deprecation isn't exactly formal, but part of an effort to push folk in the right direction. Those who've used the app for 10 years(!) will have seen a lot of evolution. Simply making old code not work wouldn't help them at all. So there's 'legacy' support for old syntax plus, unavoidably, old forms live on in articles online.

For the new aTbRef (which I hope to start when my course finishes) I aim to drop reference, where possible, to any old syntax to aid its decline. The aim in moving from AttrName(pattern) to $AttrName.contains("pattern") is that there's more coherent use of $ to reference attributes, something new users used to find confusing.

Title: Re: How to parse names in TBX6?
Post by james a. foster on Sep 15th, 2014, 6:03pm

wow! thanks everyone! I will try these. I think I was just forgetting to put double quotes around my regular expression.

btw, when are humans going to get DOIs or IP addresses? It would make life so much easier...

Title: Re: How to parse names in TBX6?
Post by james a. foster on Sep 18th, 2014, 2:01pm

ok, that still doesn't work for me. I ACTUALLY need the query to select only the records I really want to parse, and then to parse them and set the names. In my case, I have an attribute called OType (for Output Type), and the participants' name is OParticipants. OParticipants is plural because I reuse that attribute elsewhere (to list participants on grants). So, OParticipants is actually a list! And the parser doesn't work on lists.

So, the actual desired query would be:

($OType=="Advisee")&($OParticipants.contains("(.*)\s(.*)\s(.*)"))

But parsing only the first item in the OParticipants list. Is there a way to parse on just the first item in a list?

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 18th, 2014, 2:42pm

Consider a list (or set) $MyList, and string $MyString. To get a string representing the first item of the list:

$MyString = $MyList.at(0)

You can then parse the isolated string. See more on List.at().

Title: Re: How to parse names in TBX6?
Post by james a. foster on Sep 19th, 2014, 12:09pm

OK, a quick followup. The code above does NOT work, because the agent only gathers aliases to notes which match the regular expression. The above regular expression only matches names with three parts, and we want to match those with either 2 or 3. AND the pattern gets confused by line ends if we aren't careful.

Also, my OType attribute is a list too, since a note can be of more than one type. So I need to treat it like a set. Also, the default for $3 is now "" rather than "$3". So, this works:

Query: ($OType.contains("Advisee"))&(!$FullName)&($OParticipants.at(0).contains("^(\S+)\s(\S+)\s?(\S*)$"))

Action: $FirstName=$1;$MiddleName=$2;$LastName=$3; if($LastName=""){$LastName=$MiddleName;$MiddleName=""};$FullName=$1+" "+$2+" "+$3;

This picks off the first item in the OParticipants list when this note is of type "Advisee", only picks up $3 if it's there (and leaves it blank otherwise), and only collects notes of the right type.

I didn't include the "|=" trick, just to keep it clear for myself.

Thanks everyone!!
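
For later readers, the same logic can be tried outside Tinderbox with a minimal Python sketch. It assumes Python's re module handles this pattern like Tinderbox's regex engine, models the OParticipants list as a Python list, and uses a made-up function name, parse_first_participant.

```python
import re

# Same pattern as the agent query: two required parts, optional third part.
NAME = re.compile(r"^(\S+)\s(\S+)\s?(\S*)$")


def parse_first_participant(participants):
    """Parse the first list item into first, middle and last names."""
    if not participants:
        return None
    match = NAME.match(participants[0])  # like $OParticipants.at(0)
    if match is None:
        return None
    first, middle, last = match.groups()
    if last == "":
        # Only two parts were found: the second part is the last name.
        last, middle = middle, ""
    return {"FirstName": first, "MiddleName": middle, "LastName": last}


print(parse_first_participant(["John Q. Public", "Jane Doe"]))
print(parse_first_participant(["John Public"]))
print(parse_first_participant(["Nampeyo"]))  # None: single-part names do not match
```

Names with one part or more than three parts return None here, matching the point made earlier in the thread that such cases are better entered by hand.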

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 19th, 2014, 2:14pm

Aside: don't get too hung up on list vs sets. They're the same except sets don't allow duplicate values. Think of them as de-duped lists.

I'm glad you found a fix, but for later readers it's worth pointing out that regex are complicated and do what you ask, not what you think you asked. Some testing and tuning is required. Bear in mind, there's no reference TBX for this task, so those helping are having to best-guess what your file looks like.

Trying to write one regex that does everything can be a pointless quest for the non-expert (speaking from experience!). Still, if you can write 3 less complex regex that separately find all you need just run 3 agents and don't waste time trying to find the one-size-fits-all. I know it feels like defeat but complex regex are hard to write.  That the human eye/brain pull complex patterns from the page with ease is put in context by the complexity of writing a regex to do the same.  :)

Title: Re: How to parse names in TBX6?
Post by james a. foster on Sep 19th, 2014, 2:17pm

It is also worth reminding TBX users that if you use complicated queries and actions you have a big chance of screwing up your database while trying things out.

Always make a copy of your database before messing around with this sort of thing!

Title: Re: How to parse names in TBX6?
Post by Mark Anderson on Sep 19th, 2014, 5:48pm

Indeed. Even better, make a new small test doc with the minimal info needed to test the problem. Either make a new doc, or if you need lots of the current doc's structure, attributes, etc., make a copy and then delete most of the general data. Having a small test doc makes it easier to see what works - or doesn't.

I'd love to say I always follow my advice here.  What I can say is that it's generally when I figure no test is needed that I break stuff I wish I hadn't.

This shouldn't scare new users. Undo, back-ups and Time Machine nearly always save your bacon, after a heart-in-mouth moment. Testing in a deliberate test doc is less stressful.
