Welcome, Guest. Please Login
Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
  HomeHelpSearchLogin  
 
Pages: 1
Send Topic Print
Replace double quotes after spreadsheet import (Read 5035 times)
Sumner Gerard
Full Member
*
Offline



Posts: 359

Replace double quotes after spreadsheet import
Mar 17th, 2013, 6:05pm
 
I have tab-separated data that won't pass TB's spreadsheet import parser though they check out fine with Excel.  After much headscratching and experimentation (I thought it might be an obscure Unicode encoding problem) I've narrowed the culprit down to something more mundane: occasional unmatched double quotes in a free-form text field of the source.  They throw everything off.

This imports as expected when pasted onto TB view (resulting in two notes):

  Name<tab>Text<newline>
  Note1<tab>This is text of "note" 1<newline>
  Note2<tab>This is text of note 2<newline>


But this won't import properly into TB (though it's fine in Excel):

  Name<tab>Text<newline>
  Note1<tab>This is "text of "note" 1<newline>
  Note2<tab>This is text of note 2<newline>


It's not practical to comb through the source looking for unmatched quotes in a free-form field, and I want to leave that field in the source as is anyway. So I automated the replacement of all double quotes in the source with '_QQ_' thinking I could easily change them back once it's in TB.  That imports as expected.

But, unhappily, attempts like $Text=$Text.replace("_QQ_",""") and $Text=$Text.replace('_QQ_','"') result in empty $Text.

Would appreciate guidance.
Back to top
 
« Last Edit: Mar 17th, 2013, 10:43pm by Sumner Gerard »  
  IP Logged
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Replace double quotes after spreadsheet import
Reply #1 - Mar 17th, 2013, 7:04pm
 
You can't escape a straight double-quote character in TB text ops (or not as far as I know). Therefore, you're better off fixing the issue at source, in Excel. You can use AppleScript  - or BA on a PC - to find all cells that contain character number 34 (a ") and where the count of said character is an odd number. Get a list of these, or colour them, then scan and fix the cells manually. It might not be the solution that you envisaged but it's probably the quickest overall.
Back to top
 
 

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Replace double quotes after spreadsheet import
Reply #2 - Mar 17th, 2013, 8:28pm
 
Thanks for the quick reply. I'm dealing with free-form data at the source (think tweets and forum messages and notes, etc.) Typos, punctuation mistakes, and other oddities naturally occur (some would say even belong) in fields like that. I want to keep the integrity of the source. And I'm doing repeat imports. So "fixing" the source each time is not a good option; hence my query here was about the Tinderbox side.

I think this tsv import behavior should be fixed in a future version of TB.  Meanwhile I would be grateful for workaround ideas on replacing double-quotes. Is this something that is best tried somehow via runCommand() or is there a more "native" way?
Back to top
 
« Last Edit: Mar 17th, 2013, 9:42pm by Sumner Gerard »  
  IP Logged
Mark Bernstein
YaBB Administrator
*
Offline

designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: Replace double quotes after spreadsheet import
Reply #3 - Mar 17th, 2013, 11:09pm
 
You're more likely to get fixes faster if you talk to Eastgate first.

Back to top
 
 
WWW   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Replace double quotes after spreadsheet import
Reply #4 - Mar 18th, 2013, 12:26pm
 
Gentlemen:

You have here a loyal, long-time user who encounters a problem while doing a project involving spreadsheet import, does hours of unexpected "homework" to isolate the cause, and asks for ideas about a workaround in Tinderbox.

The responses here seem to boil down to:

1) Just go do more work outside Tinderbox, even though that's not practical, as explained.

2) Talk to Eastgate first, somehow suggesting violation of protocol for using an Eastgate-sponsored forum.

Meanwhile, I'm still hoping for usable suggestions on a workaround on the quotations replace within Tinderbox. (Mention of a "fix" for tsv import, assuming one is warranted, was just a quick aside for future attention).

Surely there must be a way to do the quotations replace. It seems Tinderbox can do just about anything. I just don't know the most promising paths to pursue. Would much appreciate ideas.
Back to top
 
« Last Edit: Mar 18th, 2013, 12:26pm by Sumner Gerard »  
  IP Logged
Mark Bernstein
YaBB Administrator
*
Offline

designer of
Tinderbox

Posts: 2871
Eastgate Systems, Inc.
Re: Replace double quotes after spreadsheet import
Reply #5 - Mar 18th, 2013, 2:33pm
 
The underlying question is, how does Excel (or whatever your source or destination is) represent quotation marks in its csv and tsv formats?  

(This arises because these formats use quotation marks for their own purposes. For a comma-separated values file, for example, a field is quoted if it contains a comma or a return -- characters that would otherwise terminate the field or record respectively. So the quotation mark has a special meaning, and we need some convention on how we'll represent a plain old quotation mark.)

This is tricky -- not least because (a) we're dealing with an informal, ad hoc standard that evolved in an era when a standard was whatever Office did, and (b) in that era, vendors often broke competing software deliberately by complicating or obfuscating the standard. Years later, we're left with the debris.

This *probably* merely requires a revision to Tinderbox's CSV parser to accommodate whatever escaping mechanism is in use, or (alternatively) to avoid triggering a quote-handling method which should not be called.  This should be simple enough.

But I'd prefer not to break other people's workflow, and that's all too easy. And -- let's face it -- the average Tinderbox Forum User is not going to be terribly fascinated by seeing Sumner and I chatting over the Backus-Naur  representation of RFC-XXXX.  Hence, I'd suggest sending the use-case, sample data, and all the rest to Eastgate.


Back to top
 
« Last Edit: Mar 18th, 2013, 2:40pm by Mark Bernstein »  
WWW   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Replace double quotes after spreadsheet import
Reply #6 - Mar 18th, 2013, 5:39pm
 
The underlying question is really not that complicated, and is probably of interest to other users. Spreadsheet import is very useful, and Tinderbox generally does it pretty well (particularly, I've found, tsv), though clearly a workaround is needed here.

The original simplified test data set, given upthread, is straightforward. Typing it into an editor of choice and then pasting it onto a Tinderbox view (or into Excel) reproduces the described behavior: unexpected in Tinderbox, as expected in Excel.

(The <tab> and <newline> are shorthand for typing in the actual characters. The straight double quotation marks can also be just typed in. Nothing esoteric here.)

In my source script I can easily automate replacements to produce this:

  Name<tab>Text<newline>
  Note1<tab>This is _QQ_text of _QQ_Note1_QQ_<newline>
  Note2<tab>This is text of note2<newline>


(Again the <tab> and <newline> are shorthand for the actual characters typed in, while _QQ_ is just my homemade token, if that's the right word, for the double quotation marks that has the virtue of safely passing TB's parser even if not in pairs.)

All this imports into Tinderbox as expected, two notes, everything in the right place.

Now how do I--with the text already successfully imported into Tinderbox--replace those _QQ_ with " ? Is an external script run through runCommand the only way? Or is there a way to do it internally?

(Surely this is not the type of question that requires confidential off-forum discussion? If so, that begs the question, "What are forums for anyway?")
Back to top
 
 
  IP Logged
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Replace double quotes after spreadsheet import
Reply #7 - Mar 18th, 2013, 6:11pm
 
Ouch. I spent quite a while, late on an evening when I'd an early start the next day, to research someone else's problem for possible avenues to sort the text transform. As it happens it's just not possible, at least not in the current app - which is all I have to work with.

As the original question didn't make it explicit that exact fidelity of quotes - even obviously incorrect ones - was important I suggested doing something in Excel was the best way to go. Indeed, had it not have been so late I'd probably have posted some AppleScript ideas.

No one's suggested the forum is off topic. It just so happens that an answer means Eastgate taking a look at the source. As Mark B's explained "it works in Excel" counts for nought if you want to pas the data to TB via tab-delim.

I didn't suggest sending the file to Eastgate as some users take offence at that suggestion, even if it's explain this is because Eastgate can look under the hood in a way fellow users - like myself - can't. Mark B separately explained why sending a file might be needed (arcana of text formats).

Being unable to provide the desired solution due to limitations in software isn't the the fault of those answering, it's just an inconvenient fact. You last post simply re-states the original question, to which the answer is unchanged.

To repeat, politely: you can't use Text.replace() to insert a double-quote apostrophe. The problem isn't the '_QQ_' placeholder, it's the problem of handling the insertion of string delimiter characters.

I guess no good deed goes unpunished. As I clearly am not helping, I'll leave this thread to others.
Back to top
 
« Last Edit: Mar 18th, 2013, 6:12pm by Mark Anderson »  

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Replace double quotes after spreadsheet import
Reply #8 - Mar 19th, 2013, 12:36am
 
It's not good for business, I think, to be quick to take offense or be seen to take offense when no offense is intended (or, for that matter, even if offense were intended). Suggesting (or seeming to suggest) to paying and prospective customers that posting here is somehow not the right thing to have done or was done in the wrong place or way is also not an ideal customer relations and marketing strategy. Just a thought from a loyal user and admirer who otherwise appreciates the help here.

Mark A, I really do thank you (and, of course, Mark B too), though I can tell you both were too busy to carefully read my original, very specific, query plus sample data set, including the fact that I successfully imported with placeholders into Tinderbox (not just Excel).

Quote:
To repeat, politely: you can't use Text.replace() to insert a double-quote apostrophe. The problem isn't the '_QQ_' placeholder, it's the problem of handling the insertion of string delimiter characters.

Rare indeed is the opportunity to correct a true master. But, to my surprise, it turns out that statement is not entirely true!

How does this work on your machine?

     $Text = $Text.replace("_QQ_","\x22")

I stumbled on this after hours putting together an external Python workaround script today. I thought, what the heck, this looks so similar, why not just give it a whirl directly in Tinderbox action code? Sometimes it pays to be ignorant and desperate enough to just try things.
Back to top
 
« Last Edit: Mar 19th, 2013, 7:15am by Sumner Gerard »  
  IP Logged
Pages: 1
Send Topic Print