Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Regex question (Read 11116 times)
Stéphane R
Regex question
Apr 30th, 2010, 6:29am
 
Hi all,

a quick question for users who are conversant in RegEx.
I have a lot of notes that refer to books and articles which I've named thus:

"Author Author2 Year - Title"   [e.g. "Smith 2008 - The Idea of a State", 'Author 2' is optional]

I'd like to make key attributes which contain the author(s) [as set], year, and title for each entry, in order to sort on them more easily.
My early attempts at doing this with RegEx failed miserably, although I'm sure it shouldn't be that hard at all.

Tips appreciated
Charles Turner
Re: Regex question
Reply #1 - Apr 30th, 2010, 6:51am
 
Hi Stephen-

It'd be great to get more info about the match, but assuming that author names are last names only and don't contain accented characters, that dates are always four-digit years, etc., you could start with this:

^([a-z]+) ([0-9]{4}) - ([a-z ]+)

Unless you want to build logic that tests which match groups were captured, I'd do this in two passes: one for "1 author" (above) and a second pass for "2 authors."

In the above, the "^" is important, otherwise the regex will also match citations of "Author2 Date - Title," which you don't want.
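
As a rough illustration outside Tinderbox, here is a minimal Python sketch of the same idea, with the second author folded into one pattern as an optional group instead of a second pass. The function name parse_note_name is made up for this example, and re.IGNORECASE is an assumption added so that [a-z] also matches capital letters; whether Tinderbox matches case-insensitively is not stated in this thread.

```python
import re

# Same shape as the pattern above, plus an optional second author.
# ASSUMPTION: re.IGNORECASE is added so [a-z] also matches "Smith".
NAME_PATTERN = re.compile(
    r"^([a-z]+)(?: ([a-z]+))? ([0-9]{4}) - ([a-z ]+)",
    re.IGNORECASE,
)


def parse_note_name(name):
    """Return (authors, year, title), or None if the name does not match."""
    m = NAME_PATTERN.match(name)
    if m is None:
        return None
    first, second, year, title = m.groups()
    # The optional group is None when only one author is present.
    authors = [first] if second is None else [first, second]
    return authors, year, title


print(parse_note_name("Smith 2008 - The Idea of a State"))
print(parse_note_name("Smith Jones 2008 - The Idea of a State"))
```

Note that the title group stops at the first character that is not a letter or a space, so titles containing punctuation or digits would be cut short with this pattern.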

You might find this little util handy:

http://reggyapp.com/

Set it to Perl regular expressions.

I'm sure there are other ways to solve your issue.

HTH, Charles
Mark Anderson
Re: Regex question
Reply #2 - Apr 30th, 2010, 8:49am
 
I agree with Charles that you want to think carefully about exactly which characters you need to allow for, and possibly break the regex process into several chunks.

If you do need to allow for accents, you could parse on spaces, but then you must consider unhyphenated double-barrelled (or longer) surnames. To some extent a law of diminishing returns applies. If it's too difficult to split out all names, consider quoting 'difficult' names to make the clean-up easier. You may want to weigh some otherwise undesired mark-up against too much time spent fine-tuning regex (hard for those of us inexpert with regex).
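
To make the trade-off concrete, here is a small Python sketch of the "parse on spaces, quote the difficult names" approach. It is only an illustration under stated assumptions: the helper split_name and the sample names are invented for this example, and double quotes are assumed as the quoting convention.

```python
import re

# A token is either a double-quoted run (quotes dropped) or a run of non-spaces.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def split_name(name):
    """Split 'Author [Author2] Year - Title' on spaces, honouring quoted names."""
    head, sep, title = name.partition(" - ")
    if not sep:
        return None
    tokens = [quoted or bare for quoted, bare in TOKEN.findall(head)]
    # Expect at least one author followed by a four-digit year.
    if len(tokens) < 2 or not re.fullmatch(r"[0-9]{4}", tokens[-1]):
        return None
    return tokens[:-1], tokens[-1], title


print(split_name("Müller 2008 - Die Idee"))
print(split_name("Lloyd George 1918 - War Memoirs"))
print(split_name('"Lloyd George" 1918 - War Memoirs'))
```

Splitting on spaces copes with accented characters without listing them, but the unquoted double-barrelled surname comes out as two authors; quoting it keeps it as one.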

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
JB
Re: Regex question
Reply #3 - Apr 30th, 2010, 1:09pm
 
I'm just beginning with Tinderbox, so please indulge me.

This thread caught my eye because I've been wondering if it's possible to get Tinderbox to assign a value to an attribute based upon what's in the text of a note.

The implied second step of Stephen's question is to (get Tinderbox to ?) assign the found expression(s) to the value(s) of certain attributes. Is this possible?

By the way: Nisus provides tools that allow a virtual nincompoop to generate regex.
Mark Anderson
Re: Regex question
Reply #4 - Apr 30th, 2010, 1:15pm
 
Quote:
I've been wondering if it's possible to get Tinderbox to assign a value to an attribute based upon what's in the text of a note


In principle 'yes', but in practice it rather depends on the unstated assumptions in your thinking. Could you perhaps expand a little or describe a practical test? There isn't an automated regex maker, in case you were wondering.
JB
Re: Regex question
Reply #5 - Apr 30th, 2010, 2:20pm
 
My question is more about principle than about a particular application, but the question that began this thread is close to something I might find useful. For example, I use Bookends and typically (in the past anyway, and as I think about adapting to Tinderbox I realize things may change) I paste a temporary citation marker in the text I'm writing. These have a particular format that would be readily found by a regex, say {!Alliez and Cassin, 1992, #19817}. I know how to find any notes with text containing this string, but I don't know how to manipulate that. Stephen is interested in sorting, for example. That's one use. I can think of others, but I'm hoping that if I understand the principle involved—how to identify what's found by the search operation—and make it an attribute value or use it in an agent action, that I'd be able to figure out specific tasks from there. But I may be simplifying. Is this enough to go on?
Thanks
Mark Anderson
Re: Regex question
Reply #6 - Apr 30th, 2010, 3:53pm
 
I don't think there are any plans for sorting on Regex. Regex can be used for querying, as in deciding what agents can find. Regex, via a command line, can also be used in action code to set attribute values.

A Regex query will only identify whole notes. If your regex works in the context of a Find dialog then opening the note from the find should allow you to cycle through all matches (citations).

Does that help clarify things?
Mark Bernstein
Designer of Tinderbox, Eastgate Systems, Inc.
Re: Regex question
Reply #7 - Apr 30th, 2010, 5:13pm
 
You could, of course, use regular expressions to populate an attribute and then sort on that attribute.  For example, you could search for a citation marker {Author date page}, extract the author, and then store the result in $Author.
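
The "populate an attribute, then sort on it" idea can be sketched in plain Python. This is not Tinderbox action code: the notes are ordinary dictionaries, the marker shape {!Authors, year, #id} is assumed from the example earlier in this thread, and the sample notes and the add_author helper are invented for illustration.

```python
import re

# ASSUMPTION: markers look like {!Authors, year, #id}, as in the thread's example.
MARKER = re.compile(r"\{!([^,}]+), ([0-9]{4}), #([0-9]+)\}")


def add_author(note):
    """Store the author part of the first marker in note['Author'] ('' if none)."""
    m = MARKER.search(note["text"])
    note["Author"] = m.group(1) if m else ""
    return note


# Made-up sample notes standing in for Tinderbox notes.
notes = [
    {"name": "n1", "text": "See {!Smith, 2008, #12} for details."},
    {"name": "n2", "text": "As argued in {!Alliez and Cassin, 1992, #19817}."},
    {"name": "n3", "text": "No citation here."},
]

# Populate the attribute first, then sort on it.
by_author = sorted((add_author(n) for n in notes), key=lambda n: n["Author"])
for n in by_author:
    print(n["name"], repr(n["Author"]))
```

Notes without a marker get an empty author and therefore sort first; a different default could be chosen if they should sort last.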
JB
Re: Regex question
Reply #8 - Apr 30th, 2010, 8:39pm
 
Quote:
You could, of course, use regular expressions to populate an attribute and then sort on that attribute.  For example, you could search for a citation marker {Author date page}, extract the author, and then store the result in $Author.


This last is what I was trying to ask.
How would I "extract the author" in such a case so as to be able to populate an attribute with the value found? I'm sure it's simple, but not obvious to me at this point.
Thanks
Charles Turner
Re: Regex question
Reply #9 - Apr 30th, 2010, 9:08pm
 
Hi James-

You want to look at "groups," which you delimit in your regex by placing parentheses around more-or-less arbitrary pieces of your query. (You'll notice them in my original response to Stephen)

Once you have a match, the various groups can be referred to with the $0, $1, $2, etc. notation. $0 designates the entire matched string, $1 the first matched group, and so on.

From there, you could use if/then/else statements to perform specific actions based on the outcome of your tests. Also, as the Marks mentioned, you can simply assign the groups to specific attributes, or concatenate them, etc. etc.
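
For anyone who wants to see the group numbering outside Tinderbox, here is a short Python sketch. Python's m.group(0), m.group(1), ... play the role of $0, $1, ... here; the describe function is a made-up name, and re.IGNORECASE is an assumption added so the pattern from my first reply matches capitalised names.

```python
import re

# The pattern from the first reply; it has three parenthesised groups.
# ASSUMPTION: case-insensitive matching, so [a-z] accepts capitals too.
PATTERN = re.compile(r"^([a-z]+) ([0-9]{4}) - ([a-z ]+)", re.IGNORECASE)


def describe(name):
    """Map each group number to a label, or return {} when nothing matches."""
    m = PATTERN.search(name)
    if m:
        return {
            "whole": m.group(0),   # like $0: the entire matched string
            "author": m.group(1),  # like $1: first group
            "year": m.group(2),    # like $2: second group
            "title": m.group(3),   # like $3: third group
        }
    else:
        return {}


print(describe("Smith 2008 - The Idea of a State"))
print(describe("no match"))
```

With three groups in the pattern, the author, year and title land in groups 1 to 3, and group 0 is the whole match.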

HTH, Charles
JB
Re: Regex question
Reply #10 - May 1st, 2010, 5:11am
 
Aha. That's just what I needed to know.
I didn't realize that one could refer this way to the results of the search, with $0, etc. (I'll read up on this.)
This is great.  Thanks.
Charles Turner
Re: Regex question
Reply #11 - May 1st, 2010, 8:16am
 
This book is a pretty good intro to Regular Expressions if you're into such things:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124...

HTH, Charles


Stephane
Re: Regex question
Reply #12 - Sep 21st, 2010, 5:58am
 
Hello,

I am interested in implementing Charles Turner's suggestion above, which uses the query:

^([a-z]+) ([0-9]{4}) - ([a-z ]+)

and then $0, $1, and $2 to get the fields found by the query.

Although I am quite familiar with Tinderbox coding by now, this one gets me. Could someone please explain where exactly the query should be inserted (I am assuming this is meant to be an agent)? I can't seem to get it to work.

Thanks a lot,
Manuel
Mark Anderson
Re: Regex question
Reply #13 - Sep 21st, 2010, 7:15am
 
Basically, a regex query creates one or more 'back-references'. In an agent, the AgentAction code can refer to the AgentQuery's back-references via $0, $1, etc. $0 is always the whole matched string.

It is described in TB Help. In the sidebar, click "Full Table of Contents" then in the list at right, click "Actions" and scroll about 80% down the page to the heading "Using Results from Regular Expressions".

It's also covered in aTbRef, where I've since added more on the topic.

Also see the wiki page RegularExpressions. I've also just updated the code on that page to post-v4.6.0 format (i.e. $ prefixes for attribute references). The last example on the page is a good one, as it shows real-world use of multiple back-references to set attributes.
« Last Edit: Sep 21st, 2010, 12:05pm by Mark Anderson »  
