Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Another basic Q: match first character in string
J Fallows
Full Member
*
Offline



Posts: 418

Another basic Q: match first character in string
Mar 4th, 2014, 3:01pm
 
I have studied the aTbRef page on this topic, http://www.acrobatfaq.com/atbref5/index/ActionsRules/Operators/FullOperatorList/..., but still feel as if I could use advice. (Also tried this http://www.boost.org/doc/libs/1_34_1/libs/regex/doc/syntax_perl.html .)

Here is the (very simple-seeming) challenge: I have a whole range of values beginning with a specific letter. Let's say they are:
KDEN
KSEA
KATL
and so on.

They are mixed among other values that do not start with K.

I would like to convert all the K-beginning values to the characters that follow. Eg KDEN->DEN, KSEA->SEA and so on.

What is the proper form for posing a query that matches a particular first character of a value? And does it differ if the value is String or Set? (To formalize this, for any 4-character value that is now Kxxx, I would like to convert it to the 3-character value xxx.)

(I ask 50/50 for my own info and in case others wonder.) Thanks in advance.
« Last Edit: Mar 04th, 2014, 3:06pm by J Fallows »  
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Another basic Q: match first character in string
Reply #1 - Mar 4th, 2014, 5:41pm
 
To give a full answer we need to know (a) in what attribute these codes are stored and (b) whether the attribute is of type String, List or Set.

So, I'll assume the values are stored in a String-type attribute $MyString. As it is a String, we know there is only one value of $MyString per note. I'm testing in TB v5.12.2

Thus our agent's query is:

$MyString.contains("^K")

This matches any note where $MyString starts with a K, but by default the match is not case-sensitive, so a lower-case k matches too. So, on the agent's Rename dialog, tick the "Case sensitive" tick box to make the match case-sensitive. Now it won't match an empty attribute, "kden", "kDEN", or "BTHL". But it would still match "KDE" or "KDEEN". To exclude those we should be able to use:

$MyString.contains("^K([A-Z]{3})$")

The parentheses () are to set a back-reference to part of the matched string so we can use it in the agent action:

$MyStringA = $1

The $1 value is the 3 letters after the K, captured by the () in the agent query. I've used a new attribute $MyStringA so you can test the outcome without damaging the source data. Once happy, you can delete the $MyStringA attribute and change the agent action to:

$MyString = $1
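For readers who want to try the pattern outside Tinderbox, here is a minimal sketch in Python. It assumes Python's re module behaves like Tinderbox's regex engine for this simple pattern, which has not been verified here; the helper name strip_leading_k and the sample list are made up for illustration.

```python
import re

# Same pattern as the agent query above. Python's re module is only a
# stand-in for Tinderbox's regex engine (an assumption, not a tested claim).
PATTERN = re.compile(r"^K([A-Z]{3})$")


def strip_leading_k(value: str) -> str:
    """Return the 3 letters after K for a Kxxx code, else the value unchanged."""
    m = PATTERN.search(value)
    # group(1) plays the role of the $1 back-reference in the agent action.
    return m.group(1) if m else value


# Hypothetical sample values, including the non-matching cases from the post.
codes = ["KDEN", "KSEA", "KATL", "BTHL", "kden", "KDE", "KDEEN", ""]
converted = [strip_leading_k(c) for c in codes]
print(converted)
```

Only the three Kxxx codes are shortened; every other value passes through untouched, which mirrors the agent matching only the notes it should act on.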

Here's a screengrab of my test:

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
J Fallows
Full Member
*
Offline



Posts: 418

Re: Another basic Q: match first character in string
Reply #2 - Mar 4th, 2014, 6:07pm
 
Thank you! Will put this to use. Appreciate it, as I imagine future people who come across this explanation will too.

Update: This works like a charm! Including the surprising $1 back-reference variable, which I had not known about but will now use.
« Last Edit: Mar 4th, 2014, 10:07pm by J Fallows »  
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Another basic Q: match first character in string
Reply #3 - Mar 5th, 2014, 7:16am
 
It's probably worth mentioning the $ in the agent query.  In a regex ^ is start of string and $ is end of string. Earlier, I tested this:

$MyString.contains("^K([A-Z]{3})")

Which says "match, from the start of a string, a capital K followed by exactly 3 uppercase letters". To my surprise, in my TB test it matched the string KDEEN. What? That is K plus four uppercase letters. I think the issue is to do with greedy and non-greedy regex. That's a bit beyond me. The penny dropped, though, that I needed to address the rest of the string (if present) after the desired match. As your test case uses a fixed format of 4-character codes, the simplest solution was to add the $ regex character:

$MyString.contains("^K([A-Z]{3})$")

This now says "exactly match the whole string from start to finish, where it consists of a capital K followed by exactly 3 uppercase letters". In the test this now excludes KKK (too few characters) and KDEEN (too many).
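The difference between the two queries can be reproduced with a small Python sketch. This is an illustration under the assumption that Python's re.search behaves like Tinderbox's .contains() for these patterns; the variable names are invented for the example.

```python
import re

# Query without the trailing $: anchored at the start only.
start_only = re.compile(r"^K([A-Z]{3})")
# Query with the trailing $: anchored at both ends.
both_ends = re.compile(r"^K([A-Z]{3})$")

samples = ["KDEN", "KDEEN", "KKK", "KDE"]

# For each sample, record (matches start-only pattern, matches both-ends pattern).
results = {
    s: (bool(start_only.search(s)), bool(both_ends.search(s)))
    for s in samples
}

for s, (loose, strict) in results.items():
    print(f"{s:6} start-only={loose!s:5} both-ends={strict}")

# What the start-only pattern actually captured from the 5-letter string.
captured = start_only.search("KDEEN").group(1)
print(captured)
```

In this sketch the start-only pattern accepts KDEEN because it is satisfied by the first four characters and ignores the trailing N; the captured group is just "DEE".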

I thought it worth adding this extra note, as it's an issue I often trip over as an occasional regex user in TB: I only want to match the beginning of a string, and my efforts fail because I don't take into account whether the remaining part of the string matches.
Pete Becker
Full Member
*
Offline



Posts: 23

Re: Another basic Q: match first character in string
Reply #4 - Mar 5th, 2014, 8:44am
 
It's not greedy vs. non-greedy: the {3} says to match exactly three characters. But there are, in general, two ways to approach matching: (1) does the string contain what I'm looking for, and (2) does the string consist only of what I'm looking for. In C++ this difference is embodied in the `regex_search` and `regex_match` functions. Presumably, TB does a search; you can force a full match by wrapping the search text between ^ and $.
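The search-versus-match distinction can be sketched in Python as well, where re.search and re.fullmatch are the closest analogues of C++'s `regex_search` and `regex_match`. (Python's re.match is different: it anchors only at the start.) The sample strings below are hypothetical.

```python
import re

# Deliberately unanchored: no ^ and no $.
code = re.compile(r"K[A-Z]{3}")

samples = ["KDEN", "KDEEN", "XKDENX", "KDE"]

# "Does the string contain it?" (search) versus
# "Does the string consist only of it?" (fullmatch).
contains = {s: code.search(s) is not None for s in samples}
consists = {s: code.fullmatch(s) is not None for s in samples}

# Wrapping the same pattern in ^...$ makes a search behave like a full match
# for these single-line samples.
wrapped = re.compile(r"^K[A-Z]{3}$")
forced = {s: wrapped.search(s) is not None for s in samples}

for s in samples:
    print(f"{s:7} search={contains[s]!s:5} fullmatch={consists[s]!s:5} ^...$={forced[s]}")
```

Here search accepts KDEN, KDEEN and XKDENX, while fullmatch and the ^...$ version accept only KDEN.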
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Another basic Q: match first character in string
Reply #5 - Mar 5th, 2014, 11:23am
 
Thanks for the correction. The fact one is doing a 'contains' should tip one off that one needs to use ^ and $, with appropriate regex in between, to ensure a full string match.

There is no method to do an '=' match, something like:

$SomeStringAttr = regex("K[A-Z]{3}")

and have it match only 4 character values that match the regex. To be clear, there is no such TB operator as regex(), it's just a made-up example.