Message started by Komori Kou on Jun 20th, 2015, 4:09pm

Title: incorrect HTML entities
Post by Komori Kou on Jun 20th, 2015, 4:09pm

When working with languages other than English, the text generated by HTML export is incorrect and inconsistent.

I will give an example.

[1] Preparing a sample note.
(1) Create a note.

(2) Set the text of the note to:

*(Japanese text, four paragraphs)

(3) Set the note's attributes:
HTMLDontExport -> false
HTMLExportTemplate -> /Templates/HTML page (built in)
HTMLEntities -> false

[2] Checking generated results.
You can check the results in the note's HTML tab; toggling between the “Text” and “HTML” tabs quickly shows several different results. Alternatively, inspect the actual exported files using “Export->as HTML” or “Exported Selected Note” in the “File” menu.

The symptom is:
The first paragraph is always exported correctly, but the second or later paragraphs may be exported as HTML entities even though the attribute “HTMLEntities” is set to false. Once a paragraph is exported as HTML entities, all following paragraphs are exported as entities too. The paragraph at which the incorrect output begins seems random; I can’t see any pattern in it.
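To make the symptom concrete, here is a small Python sketch (not Tinderbox's exporter) of the two forms a paragraph can take. It assumes the entity output uses decimal numeric character references, which is what Python's `xmlcharrefreplace` error handler produces; Tinderbox may use named entities such as `&eacute;` for some characters. The helper names `as_utf8` and `as_entities` are hypothetical.

```python
# Illustration only: not Tinderbox's exporter.
# Assumption: "HTML entities" means decimal numeric character references,
# the form Python's "xmlcharrefreplace" error handler emits.

def as_utf8(text: str) -> bytes:
    """What HTMLEntities = false should produce: the text as raw UTF-8."""
    return text.encode("utf-8")


def as_entities(text: str) -> bytes:
    """What the later paragraphs contain instead: ASCII plus &#NNNN; references."""
    return text.encode("ascii", "xmlcharrefreplace")


sample = "l’écran луна"  # words taken from the French and Russian samples in this post
print(as_utf8(sample).decode("utf-8"))      # l’écran луна
print(as_entities(sample).decode("ascii"))  # l&#8217;&#233;cran &#1083;&#1091;&#1085;&#1072;
```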

The following figures show the results of several trials on a single day. The paragraph at which the incorrectly generated block starts varies.

This phenomenon appears not only in Japanese, but also in other languages including Chinese, Korean, Russian, and even French.
I have prepared sample text in each language. Please use these texts to investigate.


Сквозь волнистые туманы
Пробирается луна,
На печальные поляны
Льет печально свет она.

Assis devant l’ordinateur dans son uniforme bleu de prisonnier, Nelson Butler, 46 ans, montre l’écran.
« Je comprends tout ! », s’exclame-t-il fièrement.
Sur la machine, des lignes de codes, des parenthèses, des chiffres, des accolades : l’ordinaire des hiéroglyphes des programmeurs en informatique.
Lui, le prisonnier condamné à perpétuité alors qu’il n’avait que 20 ans, s’estime « béni » d’avoir appris à manier l’ordinateur.

This issue seriously disrupts my homepage-building workflow. When exporting HTML, Tinderbox determines whether each note has changed and intelligently exports only the changed notes. I then synchronize the exported files with my FTP site using an FTP tool.

Because of this issue, however, Version Six may export unnecessary notes every time I export. I then have to pick out the files that really need to be uploaded, decode the HTML entities back to the original characters (files full of entities are almost always much larger than the originals), and upload them manually. This extra work is a real obstacle.
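The manual decoding step can be scripted. The sketch below is a workaround under stated assumptions, not a Tinderbox feature: it converts only non-ASCII numeric references back to characters, and deliberately leaves ASCII references (such as `&#60;` for `<`) and named entities (such as `&lt;` or `&eacute;`) alone, because blindly unescaping everything, as `html.unescape` would, can break the page's markup. It also assumes the exported pages declare UTF-8. `decode_non_ascii_refs` and `decode_file` are hypothetical names.

```python
import re
from pathlib import Path

# Decimal (&#233;) or hexadecimal (&#xE9;) numeric character reference.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));")


def decode_non_ascii_refs(html: str) -> str:
    """Turn non-ASCII numeric references back into characters (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        decimal, hexadecimal = match.groups()
        code_point = int(decimal) if decimal else int(hexadecimal, 16)
        # Keep ASCII references (&#60; is "<") and invalid or surrogate
        # code points escaped so the markup and the file stay valid.
        if code_point < 128 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_REF.sub(replace, html)


def decode_file(path: Path) -> None:
    """Rewrite one exported page in place as UTF-8 (assumes the page declares UTF-8)."""
    text = path.read_text(encoding="utf-8")
    path.write_text(decode_non_ascii_refs(text), encoding="utf-8")


print(decode_non_ascii_refs("&lt;p&gt;l&#8217;&#xE9;cran &#60;"))  # &lt;p&gt;l’écran &#60;
```

Named entities are left untouched on purpose; if the exporter uses them for accented Latin letters, they would need a separate, equally careful pass.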

Version Six has had this issue since its first release, and version 6.3.0 still has it.

I beg you to fix it.

Title: Re: incorrect HTML entities
Post by Mark Bernstein on Jun 20th, 2015, 8:24pm

First, if you're having problems, contact technical support. Don't suffer in silence; we may not be able to fix everyone's problems overnight, but you'd be surprised.  

Don't assume that your problem has been reported. We've got lots of reports -- 1519 issues in the log, 1098 of them fixed -- but I'm pretty sure this is new. Ideally, send a small file including your templates and a sample note exhibiting this behavior.

Also, the backstage program http://www.eastgate.com/Tinderbox/TinderboxSix.html has been a terrific resource for clarifying problems and finding workarounds.

Title: Re: incorrect HTML entities
Post by Mark Anderson on Jun 21st, 2015, 4:50am

I can replicate the HTML Entities issue, but I don't see how it affects which pages get (re-)exported. My understanding is that this depends on $LastModified and perhaps some other factors.

If export depended on comparing existing with current HTML, then changing the HTML export template ought to affect export. It doesn't! If you change the template you need to delete the existing exported pages to ensure they're exported afresh. This is something I do quite regularly during maintenance of aTbRef.

Title: Re: incorrect HTML entities
Post by Mark Bernstein on Jun 21st, 2015, 6:13am

Mark Anderson: if we re-export a note whose exported text is unchanged, but do so once with UTF-8 and once with entities, the file will be marked as "changed" and our FTP client will have to upload it again when mirroring to our site.
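The point is easy to check: the same text serialized both ways differs in length and content, so a mirroring tool that compares file size or a checksum will treat the page as changed. A small Python illustration, assuming decimal numeric entities; which comparison a given FTP client actually uses (size, timestamp, checksum) is an assumption and varies by tool.

```python
import hashlib


def fingerprint(data: bytes) -> tuple[int, str]:
    """Size plus SHA-256: roughly what a mirroring tool might compare."""
    return len(data), hashlib.sha256(data).hexdigest()


line = "Сквозь волнистые туманы"                          # from the Russian sample
utf8 = line.encode("utf-8")                               # 2 bytes per Cyrillic letter
entities = line.encode("ascii", "xmlcharrefreplace")      # 7 bytes each, e.g. &#1057;

print(len(utf8), len(entities))                    # 44 149
print(fingerprint(utf8) == fingerprint(entities))  # False
```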

If we can pin down a test case, I'm confident we can resolve this without much delay.

Title: Re: incorrect HTML entities
Post by Komori Kou on Jun 21st, 2015, 6:41am

I reported this issue to technical support last August.
In the emails we exchanged at the time, I received a reply saying that it would be corrected in the next release.
Although there have been a few updates since then, the issue remains, so I imagine that the fix was postponed or dropped for some reason, or that a fix was made but is incomplete and the staff are unaware of it. That is why I am reporting it on the forum this time.

The effect on (re-)exporting is easy to demonstrate. Take a Tinderbox file that is the source of a homepage, export its whole contents once, wait several minutes without editing the file, and export again. No note should be exported if no note was edited, but several notes are exported each time.
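The export-twice check described above can be automated by hashing every file in the export folder before and after the second export. A minimal Python sketch; the folder name and the helpers `snapshot` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(export_dir: Path) -> dict[str, str]:
    """Map each exported file (relative path) to the SHA-256 of its bytes."""
    return {
        str(path.relative_to(export_dir)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in export_dir.rglob("*")
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Files that are new or whose bytes differ between two snapshots."""
    return sorted(name for name, digest in after.items() if before.get(name) != digest)


# Usage (hypothetical folder name):
#   before = snapshot(Path("exported_site"))
#   ... wait, then export again from Tinderbox without editing anything ...
#   print(changed_files(before, snapshot(Path("exported_site"))))
```

Comparing content hashes rather than timestamps lists only files whose bytes actually changed, which separates pages that flipped between UTF-8 and entities from files that were merely rewritten with identical content.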

Title: Re: incorrect HTML entities
Post by Mark Anderson on Jun 21st, 2015, 10:10am

There seem to be two issues here.

#1 is the now (re-)reported issue of HTML entities being created when they shouldn't be (although the source still renders correctly on screen).

#2 seems related to export, but I'm not clear what the actual problem is. Given that export is so quick, as indeed is FTP upload, what's the underlying issue with whether the same page content is re-exported or not? For clarity, I'm not being dismissive or saying there isn't an issue, only that it hasn't been expressed unambiguously.

Title: Re: incorrect HTML entities
Post by Komori Kou on Jun 21st, 2015, 12:36pm

to Mark A.

Your point may be reasonable for practical purposes. The problem might have been troublesome two decades ago, but it may be marginal in today's environment of powerful PCs, vast storage, and high-speed Internet.

How much the problem bothers me may depend almost entirely on my own sense of aesthetics.

An exported file from one of the notes behind my homepage, which contains a significant amount of text, is 57 KB when exported as UTF-8 but 121 KB when exported as HTML entities. The expansion ratio varies with the contents of each note, but this means the space on my FTP site could be consumed twice as fast or more. Fortunately, my FTP site still has ample room.
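The roughly twofold growth is what the arithmetic predicts, assuming decimal numeric references: a kana or kanji character (U+3040 to U+9FFF) takes 3 bytes in UTF-8 but 8 bytes as an entity such as `&#26085;`, about 2.7 times as much, while ASCII markup costs the same either way and pulls the overall ratio down. A small Python check; `expansion_ratio` is a hypothetical helper.

```python
def expansion_ratio(text: str) -> float:
    """Bytes as decimal numeric entities divided by bytes as UTF-8."""
    utf8_size = len(text.encode("utf-8"))
    entity_size = len(text.encode("ascii", "xmlcharrefreplace"))
    return entity_size / utf8_size


print(expansion_ratio("日本語"))         # 24 / 9, about 2.67 (pure kanji)
print(expansion_ratio("<p>日本語</p>"))  # 31 / 16 = 1.9375 (markup dilutes the ratio)
```

A real page mixes tags, ASCII punctuation, and Japanese text, so its ratio falls somewhere between 1 and about 2.7, which is consistent with the 57-to-121 growth reported above.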

Another possible problem is multi-file Find/Replace in other editors, especially with grep. Having the same text exported sometimes as UTF-8 and sometimes as HTML entities could be annoying.

Title: Re: incorrect HTML entities
Post by Mark Bernstein on Jun 21st, 2015, 1:34pm

Aha!  I have relocated your August query -- we did add a new feature for a different issue you raised on the same day, and the issue (1061) is marked as resolved.  

So, we thought it had been taken care of.  We'll re-investigate.

Title: Re: incorrect HTML entities
Post by Mark Anderson on Jun 21st, 2015, 1:44pm

Thanks, I understand it now, and it does seem the issue is just #1, since once that's resolved your file-size and grep issues go away. Issue #1 is now back on the radar, so I don't doubt it will get resolved.

Title: Re: incorrect HTML entities
Post by Mark Bernstein on Jun 22nd, 2015, 3:36pm

This will be fixed in the next backstage build (b156) and in Tinderbox 6.3.1.

Thank you for pursuing this!

