Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
How to parse names in TBX6? (Read 6627 times)
james a. foster
Full Member
*
Offline



Posts: 130

How to parse names in TBX6?
Sep 15th, 2014, 3:21pm
 
I have an attribute $Participant whose value is a person's name. It can be either "John Public" or "John Q. Public".

I also have $FirstName, $MiddleName, and $LastName attributes.

How do I parse $Participant in order to set the other attributes? I have tried regular expressions, but can't get the agent to use the value of $1, rather than the STRING $1.
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: How to parse names in TBX6?
Reply #1 - Sep 15th, 2014, 4:36pm
 
Use an agent. If $Participant is "John Smith":

Query:  $Participant.contains("^(\w+) (\w+)$")
Action:  $FirstName = $1; $LastName = $2;

You may need to vary the regex pattern to find other name formats, but that's the basic idea.
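
For readers who want to try the pattern outside Tinderbox, here is a minimal sketch in Python. It is not Tinderbox action code: Python's `re` module stands in for the agent's regex engine (the two may differ in details), and `split_two_part_name` is a hypothetical helper name. It shows what the two capture groups, $1 and $2 in the agent action, would hold.

```python
import re

# Same pattern as in the agent query above: two "word" tokens separated by one space.
TWO_PART = re.compile(r"^(\w+) (\w+)$")


def split_two_part_name(participant):
    """Return (first, last) if the name has exactly two word parts, else None."""
    match = TWO_PART.match(participant)
    if match is None:
        return None  # in Tinderbox terms, the agent would not collect this note
    # group(1) and group(2) play the role of $1 and $2 in the agent action
    return match.group(1), match.group(2)


print(split_two_part_name("John Smith"))
print(split_two_part_name("John Q. Public"))
```

The second call finds no match, because `\w` does not match the period in "Q." and the pattern allows only two parts. That is the kind of case where the pattern needs varying.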
« Last Edit: Sep 15th, 2014, 5:13pm by Mark Bernstein »  

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
David Bertenshaw
Full Member
*
Offline



Posts: 182

Re: How to parse names in TBX6?
Reply #2 - Sep 15th, 2014, 4:59pm
 
I've just tested this and it seems to work. But be aware -- it's so clunky that the sound you hear is real Tinderbox experts weeping...


Agent Query: $Participant.contains("(.*)\s(.*)\s(.*)")      

Agent Action: $FirstName=$1;$MiddleName=$2;$LastName=$3; if($LastName="$3"){$LastName=$MiddleName;$MiddleName=""};


NB there's no $ on Participant. Not sure why but it doesn't work with it....
Edited:
Admin - edit to replace deprecated syntax with current version


The \s is there to match white space, so it should pick up the John Q. example.

The if statement in the action checks to see if there's no third name (i.e. Tinderbox can't find a third element in $Participant so it just reports $3 as a string) -- if so, it moves MiddleName to LastName and leaves MiddleName blank.


Of course, it doesn't deal with more complicated names...

As I said, clunky, but perhaps it will help till Mark A comes along and shows you how to do it properly!
« Last Edit: Sep 15th, 2014, 5:23pm by Mark Anderson »  
Mark Bernstein
YaBB Administrator
*
Offline

designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: How to parse names in TBX6?
Reply #3 - Sep 15th, 2014, 5:13pm
 
Participant(pattern) is true if pattern is found in $Participant(this).  It's equivalent to $Participant.contains("pattern").

$Participant(whichNote) is an attribute reference, returning the value of $Participant for the designated note.

David Bertenshaw
Full Member
*
Offline



Posts: 182

Re: How to parse names in TBX6?
Reply #4 - Sep 15th, 2014, 5:19pm
 
Thanks -- I just relied on the example in the help file. Good to know the reasoning.
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: How to parse names in TBX6?
Reply #5 - Sep 15th, 2014, 5:25pm
 
@David. Please excuse the liberty of me editing your post, so as to change it to the current syntax. The old one works, but I think it more helpful to teach learners the current method. Both forms work, and can be looked up in the action code section of aTbRef.
Mark Bernstein
YaBB Administrator
*
Offline

designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: How to parse names in TBX6?
Reply #6 - Sep 15th, 2014, 5:26pm
 
Parsing names is easy -- 90% of the time. The other 10% is really hard.  

A real life example: a college professor of mine was faced with a social dilemma. He'd received a letter from a colleague, G. E. M. de Ste. Croix, addressed "My Dear Tompkins".  He naturally wished to reply in kind. But how?  My Dear de Ste. Croix?  "My Dear Ste. Croix?"  Just "Croix?"  

My local library shelves fiction alphabetically by author. I was curious about the work of Edward St. Aubyn. Should I look for it on the A shelf, or the S shelf?  To make things worse, I recall that Swarthmore, where I went to school, shelves things non-traditionally thanks to the Quaker regard for titles. So I asked the librarian. She said, "I think in library school we were told to file it under 'S'. But, as a practical matter, the question is whether whoever reshelved the book thought 'S' or 'A', and there's no knowing that!"

It's possible to do a good job of this, but it's not trivial.  Instead, I use one simple trick:

     $LastName |= $3

This means, "If the last name isn't specified, it's the third pattern. Otherwise, leave well enough alone."  That means you can let the system find the last name for the common cases, and manually enter the first and last names when you run into Dr. Oliver Wendell Holmes, Sr., T. Woodrow Wilson, Tokugawa Ieyasu, or Nampeyo.
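
The "only fill it in if it's empty" behaviour described for |= can be sketched in Python. This is only an illustration of the idea as described in this post, not Tinderbox code. The helper name `assign_if_unset` and the dict standing in for a note are assumptions made for the example.

```python
def assign_if_unset(current, candidate):
    """Keep an existing (non-empty) value; otherwise fall back to the candidate.

    Mirrors the behaviour described above for `$LastName |= $3`.
    """
    return current if current else candidate


# A dict stands in for a note's attributes (hypothetical data).
auto_note = {"LastName": ""}           # nothing entered yet
manual_note = {"LastName": "Holmes"}   # entered by hand for a hard case

auto_note["LastName"] = assign_if_unset(auto_note["LastName"], "Public")
manual_note["LastName"] = assign_if_unset(manual_note["LastName"], "Sr.")

print(auto_note, manual_note)
```

The hand-entered value survives even though the pattern would have captured something else, which is the point of the trick.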
« Last Edit: Sep 15th, 2014, 5:27pm by Mark Bernstein »  
David Bertenshaw
Full Member
*
Offline



Posts: 182

Re: How to parse names in TBX6?
Reply #7 - Sep 15th, 2014, 5:34pm
 
Mark A,

No problem at all -- I was hoping whatever I wrote would be improved...  I had a go at answering to test my knowledge not to pretend I'm an expert!

I didn't know the syntax was deprecated though -- I just used the example in the v5 manual. Thanks for the update.
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: How to parse names in TBX6?
Reply #8 - Sep 15th, 2014, 5:48pm
 
Well, the deprecation isn't exactly formal, but part of an effort to push folk in the right direction. Those who've used the app for 10 years(!) will have seen a lot of evolution. Simply making old code not work wouldn't help them at all. So there's 'legacy' support for old syntax; plus, unavoidably, old forms live on in articles online.

For the new aTbRef (which I hope to start when my course finishes) I aim to drop reference, where possible, to any old syntax to aid its decline. The aim in moving from AttrName(pattern) to $AttrName.contains("pattern") is that there's more coherent use of $ to reference attributes, something new users used to find confusing.
james a. foster
Full Member
*
Offline



Posts: 130

Re: How to parse names in TBX6?
Reply #9 - Sep 15th, 2014, 6:03pm
 
wow! thanks everyone! I will try these. I think I was just forgetting to put double quotes around my regular expression.

btw, when are humans going to get DOIs or IP addresses? It would make life so much easier...
james a. foster
Full Member
*
Offline



Posts: 130

Re: How to parse names in TBX6?
Reply #10 - Sep 18th, 2014, 2:01pm
 
ok, that still doesn't work for me. I ACTUALLY need the query to select only the records I really want to parse, and then to parse them and set the names. In my case, I have an attribute called OType (for Output Type), and the participants' name is OParticipants. OParticipants is plural because I reuse that attribute elsewhere (to list participants on grants). So, OParticipants is actually a list! And the parser doesn't work on lists.

So, the actual desired query would be:

($OType=="Advisee")&($OParticipants.contains("(.*)\s(.*)\s(.*)"))

But parsing only the first item in the OParticipants list. Is there a way to parse on just the first item in a list?
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: How to parse names in TBX6?
Reply #11 - Sep 18th, 2014, 2:42pm
 
Consider a list (or set) $MyList, and string $MyString. To get a string representing the first item of the list:

$MyString = $MyList.at(0)

You can then parse the isolated string. See more on List.at().
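
As a rough illustration of "take the first item, then parse it", here is a Python sketch. It assumes the list is stored as a semicolon-delimited string. That storage format, the helper name `first_item` and the sample data are all assumptions for the example, and none of this is Tinderbox code.

```python
def first_item(list_value):
    """Return the first non-empty item of a semicolon-delimited list string.

    Plays the role of $MyList.at(0); returns "" for an empty list.
    """
    items = [item.strip() for item in list_value.split(";") if item.strip()]
    return items[0] if items else ""


participants = "John Q. Public;Jane Doe;Tokugawa Ieyasu"  # hypothetical data
print(first_item(participants))
```

Once the first item is isolated as a plain string, the name patterns discussed earlier can be applied to it as before.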
james a. foster
Full Member
*
Offline



Posts: 130

Re: How to parse names in TBX6?
Reply #12 - Sep 19th, 2014, 12:09pm
 
OK, a quick followup. The code above does NOT work, because the agent only gathers aliases to notes which match the regular expression. The above regular expression only matches names with three parts, and we want to match those with either 2 or 3. AND the pattern gets confused by line ends if we aren't careful.

Also, my OType attribute is also a list, since a note can be of more than one type. So I need to treat it like a set. Also, the default for $3 is now "" rather than "$3". So, this works:

Query: ($OType.contains("Advisee"))&(!$FullName)&($OParticipants.at(0).contains("^(\S+)\s(\S+)\s?(\S*)$"))

Action: $FirstName=$1;$MiddleName=$2;$LastName=$3; if($LastName=""){$LastName=$MiddleName;$MiddleName=""};$FullName=$1+" "+$2+" "+$3;

This picks off the first item in the OParticipants list when this note is of type "Advisee", only picks up $3 if it's there (and leaves it blank otherwise), and only collects notes of the right type.

I didn't include the "|=" trick, just to keep it clear for myself.

Thanks everyone!!
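
For anyone who wants to check the regex itself outside Tinderbox, the following Python sketch applies the same pattern and the same "shift middle to last" step. Python's `re` module is only a stand-in for the agent's regex engine, and `parse_name` is a hypothetical helper. The earlier three-part pattern is included to show why it missed two-part names.

```python
import re

# Pattern from the working query: two required parts, optional third part.
NAME = re.compile(r"^(\S+)\s(\S+)\s?(\S*)$")
# Earlier pattern from the thread: needs two whitespace characters, so three parts.
THREE_PART_ONLY = re.compile(r"(.*)\s(.*)\s(.*)")


def parse_name(participant):
    """Split a 2- or 3-part name into first/middle/last, or return None."""
    match = NAME.match(participant)
    if match is None:
        return None
    first, middle, last = match.groups()
    if last == "":
        # Only two parts were present: the second capture is really the last name.
        last, middle = middle, ""
    return {"FirstName": first, "MiddleName": middle, "LastName": last}


print(parse_name("John Q. Public"))
print(parse_name("John Public"))
print(THREE_PART_ONLY.search("John Public"))
```

Names with more than three parts, such as "Dr. Oliver Wendell Holmes, Sr.", do not match `NAME` at all, so they would still need entering by hand.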
« Last Edit: Sep 19th, 2014, 12:22pm by james a. foster »  
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: How to parse names in TBX6?
Reply #13 - Sep 19th, 2014, 2:14pm
 
Aside: don't get too hung up on list vs sets. They're the same except sets don't allow duplicate values. Think of them as de-duped lists.

I'm glad you found a fix, but for later readers it's worth pointing out that regex are complicated and do what you ask, not what you think you asked. Some test and tune is required. Bear in mind, there's no reference TBX for this task so those helping are having to best-guess what your file looks like.

Trying to write one regex that does everything can be a pointless quest for the non-expert (speaking from experience!). Still, if you can write 3 less complex regex that separately find all you need just run 3 agents and don't waste time trying to find the one-size-fits-all. I know it feels like defeat but complex regex are hard to write.  That the human eye/brain pull complex patterns from the page with ease is put in context by the complexity of writing a regex to do the same.
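
The "several simple patterns instead of one clever one" approach can be sketched in Python as follows. Each entry corresponds to what would be a separate agent in Tinderbox. The three patterns, the field mapping (including treating a single-word name as a last name) and the helper name are illustrative guesses, not something taken from this thread.

```python
import re

# One simple pattern per "agent", tried in order; each handles a single name shape.
PATTERNS = [
    (re.compile(r"^(\S+)\s(\S+)\s(\S+)$"), ("FirstName", "MiddleName", "LastName")),
    (re.compile(r"^(\S+)\s(\S+)$"), ("FirstName", "LastName")),
    (re.compile(r"^(\S+)$"), ("LastName",)),  # single-word names: an arbitrary choice
]


def parse_with_simple_patterns(name):
    """Return a field dict from the first pattern that matches, else None."""
    for pattern, fields in PATTERNS:
        match = pattern.match(name)
        if match:
            return dict(zip(fields, match.groups()))
    return None  # left for manual entry


for sample in ["John Q. Public", "John Public", "G. E. M. de Ste. Croix"]:
    print(sample, "->", parse_with_simple_patterns(sample))
```

Each pattern is easy to read and test on its own, and anything that none of them matches is left alone for manual entry.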
james a. foster
Full Member
*
Offline



Posts: 130

Re: How to parse names in TBX6?
Reply #14 - Sep 19th, 2014, 2:17pm
 
It is also worth reminding TBX users that if you use complicated queries and actions you have a big chance of screwing up your database while trying things out.

Always make a copy of your database before messing around with this sort of thing!