Tinderbox User-to-User Forum (for formal tech support please email: info@eastgate.com)

Message started by Sumner Gerard on Sep 25th, 2012, 12:12am

Title: Practical TB5 tips for 'languages like Chinese'
Post by Sumner Gerard on Sep 25th, 2012, 12:12am

[edit]By the admin: for later readers, the issues here apply to TB 5 only and will be addressed in v6.[/edit]

This post by Mark A touched on the issue of working with languages "like Chinese" (that includes many languages) that require support for Unicode. I don't know the technical aspects, and "your mileage may vary" with other languages, but here is what I think I understand about the practicalities of working with Chinese characters (and other languages) in Tinderbox 5.11+.

$Title and $Text accept Chinese characters and, in most contexts, display them as expected. All other attributes, including $URL, do not. [Edit: not entirely correct; see following posts]. Nor, alas, can you 'Find' Chinese in Tinderbox.

If you must have Chinese characters in a Tinderbox attribute other than $Title and $Text, for example in a search string or other parameter in a URL, there is a pretty reliable workaround: before bringing them into Tinderbox, "url encode" or "percent encode" them with a target character set of UTF-8. This sounds complicated but can be accomplished quite easily through an on-line encoder such as this one. (Be sure to choose 'UTF-8' in the dropdown).

For example, you won't get a clickable link if you try to enter
www.google.com/search?q=Eastgate 软件
 into Tinderbox's $URL attribute as you can, say, in DEVONthink's URL field.  

But in Tinderbox you should get the same result with the percent-encoded form of that URL in the $URL attribute.

Each group of three "percent codes" represents one Chinese character.
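
As a rough illustration of the encoding step (a sketch only, using Python's standard library in place of the on-line encoder; the variable names are made up for this example), the search term from the URL above can be percent-encoded like this. Three percent codes per Chinese character is the pattern UTF-8 produces.

```python
from urllib.parse import quote

# Search term taken from the example URL above.
term = "Eastgate 软件"

# quote() percent-encodes as UTF-8 by default; safe="" means "/" is encoded too.
encoded = quote(term, safe="")

# Rebuild the URL with the encoded query value.
url = "www.google.com/search?q=" + encoded

print(encoded)
print(url)
```

The space becomes one percent code and each of the two Chinese characters becomes three, so the encoded term carries seven percent codes in all.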

If you aren't just doing the occasional paste here and there and need to get lots of values into Tinderbox from, say, DEVONthink or iTunes via an AppleScript script, you can't use the native AppleScript encode routines you'll find in a casual Google search (at least I haven't found one that can handle Chinese).  You have to go to the command line. In AppleScript something like this works well for me:

-- e.g. urlEncode("Nürnberg $%@") --> "N%C3%BCrnberg%20%24%25%40"
set encodedText to my urlEncode("软件")
on urlEncode(str)
     -- ljr (http://applescript.bratis-lover.net/library/url/)
     local str
     try
           -- pipe the string through Perl's URI::Escape on the command line
           return (do shell script "/bin/echo " & quoted form of str & ¬
                 " | perl -MURI::Escape -lne 'print uri_escape($_)'")
     on error eMsg number eNum
           error "Can't urlEncode: " & eMsg number eNum
     end try
end urlEncode

The encoded result will usually "work" in the Tinderbox URL attribute.  You can also store it in other attributes, and, if you need to display it, you can decode it within Tinderbox and put it in $Text with action code (in a stamp or elsewhere) like this:

$Text=runCommand("/bin/echo " + $myEncodedString + " | perl -MURI::Escape -lne 'print uri_unescape($_)'")

(You have to close the note before running this, and, obviously, don't try it if you already have something valuable in $Text that you want to keep.)
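
To check outside Tinderbox what a stored value should decode to, something like the following works as a sanity check. This is only a sketch using Python's standard library, not the Perl one-liner above, and the sample string is an assumed value of the kind the encoding step produces.

```python
from urllib.parse import unquote

# Hypothetical stored value: a percent-encoded (UTF-8) string.
stored = "Eastgate%20%E8%BD%AF%E4%BB%B6"

# unquote() decodes the percent codes as UTF-8 by default.
decoded = unquote(stored)

print(decoded)
```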

Looking forward to full Unicode support in v6!  We live in a world where Tibetan comes preinstalled on every Mac and iOS device, though not (yet) Klingon :) But meanwhile, some of the above may help.

Title: Re: Practical TB tips for 'languages like Chinese'
Post by Mark Anderson on Sep 25th, 2012, 5:02am

For later readers, the last is not quite right about TB & Unicode - though the outcome is much as Sumner describes. Still, it's important to note that Tinderbox v5.x is Unicode capable. The short description of the issue here is that whilst the app itself is Unicode-capable much of the UI's dialogs, etc., aren't and so they limit the ability for tasks like search to use Unicode characters - or certainly those of double-byte languages - in that context.  

So, the above issue is not that attributes such as $URL can't hold Chinese (and similar) characters. They can! The problem is the TB UI affordances for entering data to attributes other than $Name [sic] and $Text aren't - as at v5.x - fully Unicode capable, and won't be pre v6 at the earliest.

FWIW, TB stores its data in UTF-8 XML form and most (all?) output via export, etc. is in UTF-8 form.  However, most of the app UI dialogs and the text window key attribute tables only use MacRoman encodings which don't support anything but a small subset of the Unicode character set.

Note text ($Text): The primary text pane ($Text) of TB note windows allows Unicode entry/editing, i.e. using less common accents or non-Roman alphabets like Chinese or Coptic.

Note title ($Name): For Unicode editing of note titles, whilst the Create/Rename dialog isn't Unicode capable, you can use either Edit-in-Place on a title in a major view or when in a text window edit a shown title ($ShowTitle).

Title: Re: Practical TB tips for 'languages like Chinese'
Post by steve harf on Sep 25th, 2012, 10:15am

It appears that you can paste Unicode into text attributes when "Show Columns" is enabled and the attribute is displayed.

Title: Re: Practical TB tips for 'languages like Chinese'
Post by Mark Anderson on Sep 25th, 2012, 11:01am

In Outline's 'show columns' mode you're effectively using edit-in-place on the displayed attribute. As with edit-in-place for Outline titles ($Name), the process is Unicode-capable.

Title: Re: Practical TB5 tips for 'languages like Chinese'
Post by Sumner Gerard on Sep 25th, 2012, 12:42pm

You not only can paste, but can also actively edit Chinese (and presumably other languages) with 'Show Columns'.  Didn't realize, or had forgotten, you could do that with user string attributes, not just $Name and $Text.

You can also approximate the "Find" dialog by using an agent that filters on Chinese (or other languages) you've put in a string attribute, as shown in this illustration.

Though not as convenient as a quick trip to the "Find" dialog at least it gets the job done.

But, in my experience, the $URL attribute, while it will display Chinese in columns/EIP, replaces the Chinese with '?' when activating Safari.  All three values for the URL shown here "just work" in DEVONthink, but only the encoded one works in the Tinderbox $URL attribute.  @Mark A any ideas why this is so? It can be frustrating. Download test TBX here.

Title: Re: Practical TB tips for 'languages like Chinese'
Post by Mark Bernstein on Sep 25th, 2012, 12:57pm

Note to the future:  This useful discussion applies to Tinderbox 5.

All of this will be history in Tinderbox 6.

Title: Re: Practical TB tips for 'languages like Chinese'
Post by Mark Anderson on Sep 25th, 2012, 1:35pm

@Sumner. I actually used your example URL and tried my method of pasting it into the text of a note and using the $Rule to set the $URL. In the source XML I see (I added the 'http://' just in case it helped):

<text >http://www.google.com/search?q=Eastgate 软件</text>

The $Rule is: $URL=$Text. In the XML I now see:

<attribute name="URL" >http://www.google.com/search?q=Eastgate 软件</attribute>

So far, so good. The data is still Unicode, even though if I display $URL as a key attribute I see "http://www.google.com/search?q=Eastgate ??". Re-checking the source XML, $URL is still correct.

On clicking the URL button in the note's sidebar, Safari opens with the URL: "http://www.google.com/search?q=Eastgate%20??#q=Eastgate%20%3F%3F". My hunch is that between leaving the $URL attribute and arriving with Safari, the data is getting taken back out of UTF-8 into a MacRoman string.
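
That hunch is at least consistent with what a lossy conversion to MacRoman produces. The sketch below uses Python's standard library purely to illustrate the effect; it is an assumption about the kind of conversion involved, not a description of what Tinderbox actually does internally.

```python
from urllib.parse import quote

# Query text from the example URL.
term = "Eastgate 软件"

# MacRoman has no Chinese characters, so a lossy conversion
# substitutes "?" for each character it cannot represent.
lossy = term.encode("mac_roman", errors="replace").decode("mac_roman")

# Percent-encoding the damaged text turns each "?" into %3F.
escaped = quote(lossy, safe="")

print(lossy)
print(escaped)
```

The result has the same shape as the "Eastgate%20%3F%3F" seen in the Safari URL.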

[edit]Mark B has just confirmed these Unicode-related issues are being addressed for v6, so I've deleted my conjecture along those lines.[/edit]
