Tinderbox User-to-User Forum (for formal tech support please email: info@eastgate.com)
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi
http://www.eastgate.com/Tinderbox/forum//YaBB.cgi?num=1272623348

Message started by Stephen Wu on Apr 30th, 2010, 6:29am

Title: Regex question
Post by Stephen Wu on Apr 30th, 2010, 6:29am

Hi all,

a quick question for users who are conversant in RegEx.
I have a lot of notes that refer to books and articles which I've named thus:

"Author Author2 Year - Title"   [e.g. "Smith 2008 - The Idea of a State", 'Author 2' is optional]

I'd like to make key attributes which contain the author(s) [as set], year, and title for each entry, in order to sort on them more easily.
My early attempts at doing this with RegEx have failed miserably, although I'm sure it shouldn't be that hard.

Tips appreciated

Title: Re: Regex question
Post by Charles Turner on Apr 30th, 2010, 6:51am

Hi Stephen-

It'd be great to get more info about the names you need to match, but assuming that author names are last names only and don't contain accented characters, that dates are always a 4-digit year, etc., you could start with this:

^([a-z]+) ([0-9]{4}) - ([a-z ]+)
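To see what each parenthesised group captures, here is a minimal sketch in Python, using Python's `re` module as a stand-in for Tinderbox's regex engine (an assumption; the two may differ in details). Because `[a-z]` only lists lowercase letters, the sketch passes `re.IGNORECASE` so that "Smith" matches; the function name `parse_single_author` is hypothetical.

```python
import re

# Charles's pattern: one author (letters only), a 4-digit year, " - ", then a title.
# re.IGNORECASE is an assumption so that capitalised names match [a-z].
PATTERN = re.compile(r"^([a-z]+) ([0-9]{4}) - ([a-z ]+)", re.IGNORECASE)

def parse_single_author(name):
    """Return (author, year, title) for a one-author note name, or None."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

print(parse_single_author("Smith 2008 - The Idea of a State"))
# → ('Smith', '2008', 'The Idea of a State')
```

Note that `[a-z ]+` stops at the first character that is not a letter or space, so a title containing a colon or a digit would be cut short there.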

Unless you want to build logic that inspects which match groups were filled, I'd do this in two passes: one for "1 author" (above) and a second for "2 authors."
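The two-pass approach can be sketched in Python as two separate patterns run over the same list of names. This is an illustration under assumptions: Python's `re` stands in for Tinderbox, matching is case-insensitive, and the title group is loosened from Charles's `[a-z ]+` to `.+` so titles with punctuation survive. `ONE_AUTHOR`, `TWO_AUTHORS` and `run_pass` are hypothetical names.

```python
import re

# Pass 1 pattern: exactly one author before the year.
ONE_AUTHOR = re.compile(r"^([a-z]+) ([0-9]{4}) - (.+)$", re.IGNORECASE)
# Pass 2 pattern: exactly two authors before the year.
TWO_AUTHORS = re.compile(r"^([a-z]+) ([a-z]+) ([0-9]{4}) - (.+)$", re.IGNORECASE)

def run_pass(pattern, names):
    """Return the captured groups for every name the pattern matches."""
    results = []
    for name in names:
        m = pattern.match(name)
        if m:
            results.append(m.groups())
    return results

names = ["Smith 2008 - The Idea of a State", "Smith Jones 2010 - Another Title"]
print(run_pass(ONE_AUTHOR, names))   # → [('Smith', '2008', 'The Idea of a State')]
print(run_pass(TWO_AUTHORS, names))  # → [('Smith', 'Jones', '2010', 'Another Title')]
```

Each pass only ever sees names of its own shape, which keeps each regex simple at the cost of running twice.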

In the above, the "^" anchor is important; otherwise the regex will also match the "Author2 Year - Title" tail of two-author citations, which you don't want.
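The effect of the "^" anchor is easy to demonstrate with Python's `re.search`, which, like a "contains"-style test, scans the whole string (Python is used here only as an illustration; case-insensitive matching is assumed).

```python
import re

# The same one-author pattern, with and without the leading "^" anchor.
ANCHORED = re.compile(r"^([a-z]+) ([0-9]{4}) - ([a-z ]+)", re.IGNORECASE)
UNANCHORED = re.compile(r"([a-z]+) ([0-9]{4}) - ([a-z ]+)", re.IGNORECASE)

name = "Smith Jones 2008 - The Idea of a State"

# Without "^", the search slides along until it finds "Jones 2008 - ...".
print(UNANCHORED.search(name).group(1))  # → Jones
# With "^", the match must start at the first character, so this name is rejected.
print(ANCHORED.search(name))             # → None
```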

You might find this little util handy:

http://reggyapp.com/

Set it to Perl regular expressions.

I'm sure there are other ways to solve your issue.

HTH, Charles

Title: Re: Regex question
Post by Mark Anderson on Apr 30th, 2010, 8:49am

Agree with Charles that you want to think carefully about exactly what characters you need to allow for, and possibly break the regex process into several chunks.

If you do need to allow for accents, you could parse on spaces, but then you must consider un-hyphenated double-barrelled (or longer) surnames. To some extent a law of diminishing returns applies. If it's too difficult to split out all the names, consider quoting 'difficult' names to make the clean-up easier. You may want to weigh some otherwise undesired mark-up against spending too much time fine-tuning regex (hard for those of us inexpert with regex).
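Parsing on spaces while keeping quoted names together can be sketched with Python's `shlex.split`, which splits on whitespace but treats a double-quoted run as one token. Wrapping multi-word surnames in double quotes inside the note name is a hypothetical convention for illustration, as is the helper name `split_authors`. Because nothing here relies on a letter class like `[a-z]`, accented names pass through untouched.

```python
import shlex

def split_authors(name):
    """Split 'Authors Year - Title' into (authors, year, title).

    Assumes the year is the last space-separated token before ' - ' and that
    multi-word surnames are wrapped in double quotes (hypothetical convention).
    """
    head, _, title = name.partition(" - ")
    tokens = shlex.split(head)      # quoted names stay as a single token
    *authors, year = tokens         # everything before the year is an author
    return authors, year, title

print(split_authors('"Lloyd George" Müller 1999 - A Title'))
# → (['Lloyd George', 'Müller'], '1999', 'A Title')
```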

Title: Re: Regex question
Post by James Barrett on Apr 30th, 2010, 1:09pm

I'm just beginning with Tinderbox, so please indulge me.

This thread caught my eye because I've been wondering if it's possible to get Tinderbox to assign a value to an attribute based upon what's in the text of a note.

The implied second step of Stephen's question is to (get Tinderbox to?) assign the found expression(s) as the value(s) of certain attributes. Is this possible?

By the way: Nisus provides tools that allow a virtual nincompoop to generate regex.

Title: Re: Regex question
Post by Mark Anderson on Apr 30th, 2010, 1:15pm


Quote:
I've been wondering if it's possible to get Tinderbox to assign a value to an attribute based upon what's in the text of a note


In principle 'yes', but in practice it rather depends on the unstated assumptions in your thinking. Could you perhaps expand a little, or describe a practical test? There isn't an automated regex maker, in case you were wondering.

Title: Re: Regex question
Post by James Barrett on Apr 30th, 2010, 2:20pm

My question is more about principle than about a particular application, but the question that began this thread is close to something I might find useful. For example, I use Bookends and typically (in the past anyway; as I think about adapting to Tinderbox I realize things may change) paste a temporary citation marker into the text I'm writing. These markers have a particular format that a regex would readily find, say {!Alliez and Cassin, 1992, #19817}. I know how to find any notes whose text contains this string, but I don't know how to manipulate what's found. Stephen is interested in sorting, for example; that's one use. I can think of others, but I'm hoping that if I understand the principle involved (how to capture what the search operation finds and make it an attribute value or use it in an agent action), I'd be able to figure out specific tasks from there. But I may be simplifying. Is this enough to go on?
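Capturing the pieces of such a marker is a job for regex groups. Here is a minimal Python sketch, assuming every marker has exactly the shape of the example above (opening "{!", author text, a 4-digit year, then "#" and a record number); that assumption, and the names `MARKER` and `find_markers`, are illustrative only.

```python
import re

# Hypothetical pattern for markers like {!Alliez and Cassin, 1992, #19817}:
# group 1 = author text, group 2 = 4-digit year, group 3 = record number.
MARKER = re.compile(r"\{!([^,}]+),\s*([0-9]{4}),\s*#([0-9]+)\}")

def find_markers(text):
    """Return (authors, year, record) for every citation marker in the text."""
    return MARKER.findall(text)

text = "As argued {!Alliez and Cassin, 1992, #19817}, the point stands."
print(find_markers(text))
# → [('Alliez and Cassin', '1992', '19817')]
```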
Thanks

Title: Re: Regex question
Post by Mark Anderson on Apr 30th, 2010, 3:53pm

I don't think there are any plans for sorting on regex. Regex can be used for querying, as in deciding what agents can find. Regex, via a command line, can also be used in action code to set attribute values.

A Regex query will only identify whole notes. If your regex works in the context of a Find dialog then opening the note from the find should allow you to cycle through all matches (citations).
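The difference between "which notes match" and "each match inside a note" can be sketched in Python. The note names and texts are hypothetical, and the marker pattern assumes the {!Author, Year, #Record} shape from earlier in the thread.

```python
import re

CITATION = re.compile(r"\{!([^,}]+),\s*([0-9]{4}),\s*#([0-9]+)\}")

# Hypothetical notes: name -> text.
notes = {
    "Chapter 1": "See {!Alliez and Cassin, 1992, #19817} and {!Smith, 2008, #42}.",
    "Chapter 2": "No citations here.",
}

# A query only answers "does this note match?", so it yields whole notes.
matching_notes = [name for name, text in notes.items() if CITATION.search(text)]
print(matching_notes)  # → ['Chapter 1']

# Stepping through the individual citations inside one note is a separate job.
citations = [m.group(0) for m in CITATION.finditer(notes["Chapter 1"])]
print(citations)  # → ['{!Alliez and Cassin, 1992, #19817}', '{!Smith, 2008, #42}']
```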

Does that help clarify things?

Title: Re: Regex question
Post by Mark Bernstein on Apr 30th, 2010, 5:13pm

You could, of course, use regular expressions to populate an attribute and then sort on that attribute.  For example, you could search for a citation marker {Author date page}, extract the author, and then store the result in $Author.
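The extract-then-sort idea can be sketched in Python with notes modelled as plain dictionaries. The marker shape `{Author year page}` (a single-word author, a 4-digit year and a page number) and the sample notes are assumptions for illustration; the "Author" key plays the role of $Author.

```python
import re

# Hypothetical marker shape {Author year page}, e.g. {Smith 2008 45}.
MARKER = re.compile(r"\{([a-z]+) ([0-9]{4}) ([0-9]+)\}", re.IGNORECASE)

notes = [
    {"Name": "Note A", "Text": "as shown in {Smith 2008 45}"},
    {"Name": "Note B", "Text": "compare {Jones 2010 3}"},
    {"Name": "Note C", "Text": "no marker"},
]

# Step 1: extract the author into an "Author" attribute (empty if no marker).
for note in notes:
    m = MARKER.search(note["Text"])
    note["Author"] = m.group(1) if m else ""

# Step 2: sort on the populated attribute.
by_author = sorted(notes, key=lambda n: n["Author"])
print([n["Name"] for n in by_author])
# → ['Note C', 'Note B', 'Note A']
```

Once the value lives in an ordinary attribute, sorting needs no regex at all, which is the point of the two-step approach.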

Title: Re: Regex question
Post by James Barrett on Apr 30th, 2010, 8:39pm


Quote:
You could, of course, use regular expressions to populate an attribute and then sort on that attribute.  For example, you could search for a citation marker {Author date page}, extract the author, and then store the result in $Author.


This last is what I was trying to ask.
How would I "extract the author" in such a case so as to be able to populate an attribute with the value found? I'm sure it's simple, but not obvious to me at this point.
Thanks

Title: Re: Regex question
Post by Charles Turner on Apr 30th, 2010, 9:08pm

Hi James-

You want to look at "groups," which you delimit in your regex by placing parentheses around more-or-less arbitrary pieces of your query. (You'll notice them in my original response to Stephen.)

Once you have a match, the various groups can be referred to with the $0, $1, $2, etc. notation. $0 designates the entire matched string, $1 the first matched group, and so on.
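For comparison, Python numbers groups the same way: `match.group(0)` is the whole match (like $0) and `match.group(1)` the first group (like $1). In a replacement template Python writes back-references as `\1` rather than `$1`. This is only an analogy to illustrate the numbering, not Tinderbox syntax.

```python
import re

m = re.match(r"([a-z]+) ([0-9]{4}) - (.+)", "Smith 2008 - The Idea of a State", re.IGNORECASE)

whole = m.group(0)    # like $0: the entire matched string
author = m.group(1)   # like $1: first parenthesised group
year = m.group(2)     # like $2
title = m.group(3)    # like $3

# Back-references can also be recombined in a template, here "Title (Year)".
label = m.expand(r"\3 (\2)")
print(label)  # → The Idea of a State (2008)
```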

From there, you could use if/then/else statements to perform specific actions based on the outcome of your tests. Also, as the Marks mentioned, you can simply assign the groups to specific attributes, or concatenate them, etc. etc.
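As an alternative to two passes, a single pattern with an optional second-author group lets an if/else decide what to assign. A minimal Python sketch, assuming case-insensitive matching; the function `to_attributes` and the attribute names Authors, Year and Title are hypothetical, with Authors held as a set as Stephen asked.

```python
import re

# One pattern with an optional second-author group (group 2).
NAME = re.compile(r"^([a-z]+)(?: ([a-z]+))? ([0-9]{4}) - (.+)$", re.IGNORECASE)

def to_attributes(name):
    """Map a note name to hypothetical Authors/Year/Title attributes."""
    m = NAME.match(name)
    if m is None:
        return None
    if m.group(2) is None:          # the optional group did not match: one author
        authors = {m.group(1)}
    else:                           # both author groups matched
        authors = {m.group(1), m.group(2)}
    return {"Authors": authors, "Year": m.group(3), "Title": m.group(4)}

print(to_attributes("Smith 2008 - The Idea of a State"))
# → {'Authors': {'Smith'}, 'Year': '2008', 'Title': 'The Idea of a State'}
```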

HTH, Charles

Title: Re: Regex question
Post by James Barrett on May 1st, 2010, 5:11am

Aha. That's just what I needed to know.
I didn't realize that one could refer this way to the results of the search, with $0, etc. (I'll read up on this.)
This is great.  Thanks.

Title: Re: Regex question
Post by Charles Turner on May 1st, 2010, 8:16am

This book is a pretty good intro to Regular Expressions if you're into such things:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=books&qid=1272715913&sr=1-1

HTH, Charles



Title: Re: Regex question
Post by Manuel Richard on Sep 21st, 2010, 5:58am

Hello,

I am interested in implementing the suggestion of Charles Turner above, who suggests using the query:

^([a-z]+) ([0-9]{4}) - ([a-z ]+)

and then $0, $1, and $2 to get the fields found by the query.

Although I am quite familiar with Tinderbox coding by now, this one gets me. Could someone please explain where exactly the query should be inserted? (I am assuming this is meant to be an agent.) I can't seem to get it to work.

Thanks a lot,
Manuel

Title: Re: Regex question
Post by Mark Anderson on Sep 21st, 2010, 7:15am

Basically, a regex query creates one or more 'back-references'. In an agent, the AgentAction code can refer to the AgentQuery's back-references via $0, $1, etc. $0 is always the whole matched string.
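The query/action hand-off described above can be simulated in Python to make the data flow concrete. This is not Tinderbox code: `run_agent` and `set_attributes` are hypothetical stand-ins for an agent's AgentQuery and AgentAction, matching is assumed to be case-insensitive, and for real syntax see TB Help or aTbRef.

```python
import re

def run_agent(query, action, note_names):
    """Simulate an agent: keep notes whose name matches `query`, then call
    `action` with that note's back-references ($0, $1, ...)."""
    pattern = re.compile(query, re.IGNORECASE)
    results = {}
    for name in note_names:
        m = pattern.search(name)
        if m:
            # backrefs[0] is the whole match ($0); backrefs[1:] are the groups.
            backrefs = [m.group(0)] + list(m.groups())
            results[name] = action(backrefs)
    return results

def set_attributes(backrefs):
    """Hypothetical action, the analogue of assigning $1, $2, $3 to attributes."""
    return {"Author": backrefs[1], "Year": backrefs[2], "Title": backrefs[3]}

found = run_agent(r"^([a-z]+) ([0-9]{4}) - ([a-z ]+)", set_attributes,
                  ["Smith 2008 - The Idea of a State", "Meeting notes"])
print(found)
```

Only the first note matches, so `found` maps "Smith 2008 - The Idea of a State" to its Author, Year and Title values, and "Meeting notes" is skipped entirely, just as an agent would not collect it.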

It is described in TB Help. In the sidebar, click "Full Table of Contents" then in the list at right, click "Actions" and scroll about 80% down the page to the heading "Using Results from Regular Expressions".

It's also covered in aTbRef; I've made a note to add something more on the topic there (now done).

Also see the wiki page RegularExpressions. I've also just updated the code on that page to post-v4.6.0 format (i.e. $ prefixes for attribute references). The last example on the page is a good one, as it shows real-world use of multiple back-references to set attributes.
