Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Practical TB5 tips for 'languages like Chinese'
Sumner Gerard
Practical TB5 tips for 'languages like Chinese'
Sep 25th, 2012, 12:12am
 
Edited:
By the admin: for later readers, the issues here apply to TB 5 only and will be addressed in v6.


This post by Mark A touched on the issue of working with languages "like Chinese" (a category that includes many languages) that require Unicode support. I don't know the technical aspects, and "your mileage may vary" with other languages, but here is what I think I understand about the practicalities of working with Chinese characters (and other languages) in Tinderbox 5.11+.

$Title and $Text accept Chinese characters and, in most contexts, display them as expected. All other attributes, including $URL, do not. [Edit: not entirely correct; see following posts]. Nor, alas, can you 'Find' Chinese in Tinderbox.

If you must have such characters in a Tinderbox attribute other than $Title and $Text, for example in a search string or other parameter in a URL, there is a pretty reliable workaround: before bringing them into Tinderbox, "URL encode" or "percent encode" them with a target character set of UTF-8. This sounds complicated but can be accomplished quite easily through an on-line encoder such as this one. (Be sure to choose 'UTF-8' in the dropdown; the encoded example below is UTF-8, with three percent codes per Chinese character.)

For example, you won't get a clickable link if you try to enter

Code:
www.google.com/search?q=Eastgate 软件

into Tinderbox's $URL attribute, as you can in, say, DEVONthink's URL field.

But in Tinderbox you should get the same result with

Code:
www.google.com/search?q=Eastgate+%E8%BD%AF%E4%BB%B6

in the $URL attribute.

Each group of three "percent codes" represents one Chinese character (the three bytes of its UTF-8 encoding).
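
As a rough cross-check of the encoded URL above, here is a minimal Python sketch (Python is my assumption; nothing in this thread uses it) that percent-encodes the same query with the standard library's urllib.parse. It encodes the UTF-8 bytes of each character, which is why each of these two Chinese characters becomes three percent codes.

```python
from urllib.parse import quote, quote_plus

query = "Eastgate 软件"

# quote_plus() turns spaces into '+' and non-ASCII characters into
# percent codes of their UTF-8 bytes (UTF-8 is the default encoding).
encoded_query = quote_plus(query)
url = "www.google.com/search?q=" + encoded_query
print(url)

# Each of these two characters occupies three bytes in UTF-8,
# hence three percent codes apiece.
for ch in "软件":
    print(ch, quote(ch), len(ch.encode("utf-8")))
```

The printed URL should match the encoded one above, which is a quick way to confirm that an on-line encoder is set to the right character set.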

If you aren't just doing the occasional paste and need to get lots of values into Tinderbox from, say, DEVONthink or iTunes via an AppleScript script, you can't use the native AppleScript encode routines that turn up in a casual Google search (at least I haven't found one that can handle Chinese). You have to go to the command line. In AppleScript, something like this works well for me:

Code:
-- e.g. urlEncode("Nürnberg $%@") --> "N%C3%BCrnberg%20%24%25%40"
set encodedText to my urlEncode("软件")
on urlEncode(str)
	-- ljr (http://applescript.bratis-lover.net/library/url/)
	local str
	try
		return (do shell script "/bin/echo " & quoted form of str & ¬
			" | perl -MURI::Escape -lne 'print uri_escape($_)'")
	on error eMsg number eNum
		error "Can't urlEncode: " & eMsg number eNum
	end try
end urlEncode 



The encoded result will usually "work" in the Tinderbox URL attribute.  You can also store it in other attributes, and, if you need to display it, you can decode it within Tinderbox and put it in $Text with action code (in a stamp or elsewhere) like this:

Code:
$Text=runCommand("/bin/echo " + $myEncodedString + " | perl -MURI::Escape -lne 'print uri_unescape($_)'") 


(You have to close the note before running this, and, obviously, don't try it if you already have something valuable in $Text that you want to keep.)
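
To sanity-check an encoded value outside Tinderbox before decoding it there, a minimal Python sketch of the same decoding step follows. The variable name my_encoded_string is hypothetical, standing in for the $myEncodedString attribute; this is only a stand-in for the perl one-liner, not something Tinderbox itself runs.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical stand-in for the value stored in $myEncodedString.
my_encoded_string = "%E8%BD%AF%E4%BB%B6"

# unquote() reverses percent-encoding, interpreting the bytes as UTF-8.
decoded = unquote(my_encoded_string)
print(decoded)

# In a query string, '+' stands for a space; unquote_plus() handles that too.
decoded_query = unquote_plus("Eastgate+%E8%BD%AF%E4%BB%B6")
print(decoded_query)
```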

Looking forward to full Unicode support in v6!  We live in a world where Tibetan comes preinstalled on every Mac and iOS device, though not (yet) Klingon. :) But meanwhile, some of the above may help.
« Last Edit: Sep 25th, 2012, 1:40pm by Mark Anderson »  
Mark Anderson
(User - not staff!)
Re: Practical TB tips for 'languages like Chinese'
Reply #1 - Sep 25th, 2012, 5:02am
 
For later readers, the last post is not quite right about TB & Unicode, though the outcome is much as Sumner describes. Still, it's important to note that Tinderbox v5.x is Unicode-capable. In short, whilst the app itself is Unicode-capable, many of the UI's dialogs, etc., aren't, and so they limit the ability of tasks like search to use Unicode characters (or certainly those of double-byte languages) in that context.

So, the above issue is not that attributes such as $URL can't hold Chinese (and similar) characters. They can! The problem is that the TB UI affordances for entering data into attributes other than $Name [sic] and $Text aren't, as at v5.x, fully Unicode-capable, and won't be until v6 at the earliest.

FWIW, TB stores its data in UTF-8 XML form and most (all?) output via export, etc. is in UTF-8 form.  However, most of the app UI dialogs and the text window key attribute tables use only the MacRoman encoding, which supports just a small subset of the Unicode character set.

Note text ($Text): The primary text pane ($Text) of TB note windows allows Unicode entry/editing, i.e. using less common accents or non-Roman alphabets like Chinese or Coptic.

Note title ($Name): For Unicode editing of note titles, whilst the Create/Rename dialog isn't Unicode-capable, you can either use Edit-in-Place on a title in a major view or, in a text window, edit a displayed title ($ShowTitle).
--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
steve harf
Re: Practical TB tips for 'languages like Chinese'
Reply #2 - Sep 25th, 2012, 10:15am
 
It appears that you can paste Unicode into text attributes when "Show Columns" is enabled and the attribute is displayed.
- Steve Harf
Mark Anderson
Re: Practical TB tips for 'languages like Chinese'
Reply #3 - Sep 25th, 2012, 11:01am
 
In Outline's 'show columns' mode you're effectively using edit-in-place on the displayed attribute. As with edit-in-place for Outline titles ($Name), the process is Unicode-capable.
Sumner Gerard
Re: Practical TB5 tips for 'languages like Chinese'
Reply #4 - Sep 25th, 2012, 12:42pm
 
You not only can paste, but can also actively edit Chinese (and presumably other languages) with 'Show Columns'.  Didn't realize, or had forgotten, you could do that with user string attributes, not just $Name and $Text.

You can also approximate the "Find" dialog by using an agent that filters on Chinese (or other languages) you've put in a string attribute, as shown in this illustration.



Though not as convenient as a quick trip to the "Find" dialog, at least it gets the job done.

But, in my experience, the $URL attribute, while it will display Chinese in columns/EIP, replaces the Chinese with '?' when activating Safari.  All three values for the URL shown here "just work" in DEVONthink, but only the encoded one works in the Tinderbox $URL attribute.  @Mark A any ideas why this is so? It can be frustrating. Download test TBX here.
« Last Edit: Sep 25th, 2012, 1:12pm by Sumner Gerard »  
Mark Bernstein
(designer of Tinderbox, Eastgate Systems, Inc.)
Re: Practical TB tips for 'languages like Chinese'
Reply #5 - Sep 25th, 2012, 12:57pm
 
Note to the future:  This useful discussion applies to Tinderbox 5.

All of this will be history in Tinderbox 6.
Mark Anderson
Re: Practical TB tips for 'languages like Chinese'
Reply #6 - Sep 25th, 2012, 1:35pm
 
@Sumner. I actually used your example URL and tried my method of pasting it into the text of a note and using the $Rule to set the $URL. In the source XML I see (I added the 'http://' just in case it helped):

<text >http://www.google.com/search?q=Eastgate 软件</text>

The $Rule is: $URL=$Text. In the XML I now see:

<attribute name="URL" >http://www.google.com/search?q=Eastgate 软件</attribute>

So far, so good. The data is still Unicode, even though if I display $URL as a key attribute I see "http://www.google.com/search?q=Eastgate ??". Re-checking the source XML, $URL is still correct.

On clicking the URL button in the note's sidebar, Safari opens with the URL: "http://www.google.com/search?q=Eastgate%20??#q=Eastgate%20%3F%3F". My hunch is that between leaving the $URL attribute and arriving with Safari, the data is getting taken back out of UTF-8 into a MacRoman string.
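
The hunch above can be illustrated outside Tinderbox. This minimal Python sketch assumes (and it is only an assumption, not a confirmed account of what Tinderbox 5 does) that the URL text is forced into MacRoman somewhere on the way to the browser. MacRoman has no Chinese characters, so a lossy conversion substitutes '?' for each one, and percent-encoding the result gives the same "Eastgate%20%3F%3F" seen in Safari.

```python
from urllib.parse import quote

query = "Eastgate 软件"

# Assumption for illustration: a lossy conversion to MacRoman.
# errors="replace" substitutes '?' for anything MacRoman cannot represent.
mac_roman_bytes = query.encode("mac_roman", errors="replace")
lossy = mac_roman_bytes.decode("mac_roman")
print(lossy)

# Percent-encoding the damaged text: space -> %20, '?' -> %3F.
print(quote(lossy))

# Accented Latin letters survive, because MacRoman does include them.
print("Nürnberg".encode("mac_roman").decode("mac_roman"))
```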

Edited:
Mark B has just confirmed these Unicode-related issues are being addressed for v6, so I've deleted my conjecture along those lines.
« Last Edit: Sep 25th, 2012, 1:38pm by Mark Anderson »  
