Tinderbox
incorrect HTML entities (Read 3730 times)
Komori Kou
Full Member
Posts: 18

incorrect HTML entities
Jun 20th, 2015, 4:09pm
 
When working with languages other than English, the text generated by HTML export is incorrect and inconsistent from one export to the next.

I will give an example.

[1] Preparing a sample note.
(1) Create a note.

(2) Set the text of the note to:
瀬をはやみ
岩にせかるる滝川の
われても末に
あわむとぞ思ふ

*(Japanese text, four paragraphs)

(3) Set the following attributes of the note:
HTMLDontExport -> false
HTMLExportTemplate -> /Templates/HTML page (built in)
HTMLEntities -> false



[2] Checking generated results.
You can check the results on the HTML tab of the note. Toggling between the “Text” and “HTML” tabs quickly shows several different results. Alternatively, inspect the files that are actually generated, using the commands “Export->as HTML” or “Export Selected Note” in the “File” menu.

The symptom is:
The first paragraph is always exported correctly, but the second or a later paragraph may be exported as HTML entities even though the attribute “HTMLEntities” is set to false. Once a paragraph has been exported as HTML entities, all following paragraphs are exported as entities too. The paragraph at which the incorrect output starts seems to be random; I cannot see any pattern in it.

The following figures show the results of several trials on a single day. The paragraph at which the incorrectly generated block starts fluctuates.
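
The symptom can also be checked mechanically. The following is a minimal sketch and not part of Tinderbox: it assumes the exported body is available as a list of strings, one per paragraph, and the name `first_entity_paragraph` is hypothetical. It looks for decimal, hexadecimal, and named character references, and ignores the references that escape markup characters, since those are expected even in UTF-8 output.

```python
import re

# Matches decimal (&#23721;), hexadecimal (&#x5CA9;) and named (&eacute;) references.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

# References that are legitimate in UTF-8 output because they escape markup.
MARKUP_ESCAPES = {"amp", "lt", "gt", "quot", "apos"}


def first_entity_paragraph(paragraphs):
    """Return the 1-based number of the first paragraph that contains an
    unexpected character reference, or None if no paragraph does."""
    for number, text in enumerate(paragraphs, start=1):
        for match in ENTITY.finditer(text):
            if match.group(1) not in MARKUP_ESCAPES:
                return number
    return None


# Hypothetical export result (an assumption for illustration): the third
# and fourth paragraphs of the sample note came out as numeric references.
sample = [
    "瀬をはやみ",
    "岩にせかるる滝川の",
    "&#12431;&#12428;&#12390;&#12418;&#26411;&#12395;",
    "&#12354;&#12431;&#12416;&#12392;&#12382;&#24605;&#12405;",
]
print(first_entity_paragraph(sample))
```

Running this against several exports of the same note would show whether the starting paragraph really varies between runs.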





This phenomenon appears not only in Japanese but also in other languages, including Chinese, Korean, Russian, and even French.
I have prepared sample text for several of these languages. Please investigate using these texts.

[Chinese]
凡用兵之法
全國爲上
破國次之
全軍爲上

[Russian]
Сквозь волнистые туманы
Пробирается луна,
На печальные поляны
Льет печально свет она.

[French]
Assis devant l’ordinateur dans son uniforme bleu de prisonnier, Nelson Butler, 46 ans, montre l’écran.
« Je comprends tout ! », s’exclame-t-il fièrement.
Sur la machine, des lignes de codes, des parenthèses, des chiffres, des accolades : l’ordinaire des hiéroglyphes des programmeurs en informatique.
Lui, le prisonnier condamné à perpétuité alors qu’il n’avait que 20 ans, s’estime « béni » d’avoir appris à manier l’ordinateur.


This issue severely affects my workflow for building a website. When exporting HTML, Tinderbox determines whether each note has changed and exports only the changed notes. My next step is then to synchronize the exported files with the FTP site using an FTP tool.

Version Six, however, may export unchanged notes on every export because of this issue. I then have to sort out the files that truly need to be uploaded, decode the HTML entities back to the original characters (because files containing many entities are almost always significantly larger than the originals), and upload them manually. This unnecessary work is a big obstacle.
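
The manual decoding step described above could be scripted. This is a sketch under stated assumptions, not a Tinderbox feature: the function name `decode_non_ascii_entities` is hypothetical, and it relies on Python's standard `html.unescape`. It deliberately leaves alone any reference that decodes to an ASCII character (such as `&lt;` or `&amp;`), because decoding those would change the markup itself.

```python
import html
import re

# Matches decimal, hexadecimal and named character references.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def decode_non_ascii_entities(text):
    """Replace references to non-ASCII characters with the characters
    themselves; keep markup escapes and unknown names unchanged."""

    def replace(match):
        decoded = html.unescape(match.group(0))
        # Only substitute when the reference decodes to exactly one
        # non-ASCII character; everything else is left as written.
        if len(decoded) == 1 and ord(decoded) > 127:
            return decoded
        return match.group(0)

    return ENTITY.sub(replace, text)


print(decode_non_ascii_entities("<p>&#23721; caf&eacute; &lt;b&gt; &amp;</p>"))
```

To apply it to an exported file, read the file as UTF-8, pass the text through the function, and write the result back as UTF-8.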

Version Six has had this issue since its first release, and version 6.3.0 still has it.

I beg you to fix it.
 
Mark Bernstein
YaBB Administrator
designer of Tinderbox
Posts: 2871
Eastgate Systems, Inc.
Re: incorrect HTML entities
Reply #1 - Jun 20th, 2015, 8:24pm
 
First, if you're having problems, contact technical support. Don't suffer in silence; we may not be able to fix everyone's problems overnight, but you'd be surprised.  

Don't assume that your problem has been reported. We've got lots of reports -- 1519 issues in the log, 1098 of them fixed -- but I'm pretty sure this is new. Ideally, send a small file including your templates and a sample note exhibiting this behavior.

Also, the backstage program http://www.eastgate.com/Tinderbox/TinderboxSix.html has been a terrific resource for clarifying problems and finding workarounds.
 
Mark Anderson
YaBB Administrator
User - not staff!

Posts: 5689
Southsea, UK
Re: incorrect HTML entities
Reply #2 - Jun 21st, 2015, 4:50am
 
I can replicate the HTML Entities issue but I don't see how that affects which pages get (re-)exported. My understanding is this depends on $LastModified and perhaps some other factors.

If export depended on comparing existing with current HTML, then changing the HTML export template ought to affect export. It doesn't! If you change the template you need to delete the existing exported pages to ensure they're exported afresh. This is something I do quite regularly during maintenance of aTbRef.

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Mark Bernstein
Re: incorrect HTML entities
Reply #3 - Jun 21st, 2015, 6:13am
 
MarkAnderson: if we re-export a note whose exported text is unchanged, but do so in one case with UTF8 and in another with entities, then the file will be marked as "changed" and our FTP client will have to upload the file again when mirroring to our site.

If we can pin down a test case, I'm confident we can resolve this without much delay.
 
Komori Kou
Re: incorrect HTML entities
Reply #4 - Jun 21st, 2015, 6:41am
 
I reported this issue to technical support last August.
In the emails exchanged at the time, I received a reply saying that it would be corrected in the next release.
There have been a few updates since then, but the issue remains. I imagine either that the fix was postponed or overlooked for some reason, or that a fix was made but is incomplete and the staff are not aware of it. That is why I have now reported it to the forum.


The effect on (re-)exporting can be shown easily. Take a Tinderbox file that is the source of a website, export its whole contents once, wait several minutes without editing the file, and export again. No note should be exported if no note has been edited, but several notes are exported each time.
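
The export-twice test above can be verified from outside Tinderbox by comparing the export folder before and after the second export. The following sketch makes some assumptions: the function names `snapshot` and `changed_files` are hypothetical, and it compares content hashes rather than modification times, since the thread does not say which of the two a given FTP tool uses.

```python
import hashlib
from pathlib import Path


def snapshot(export_dir):
    """Map each file's path (relative to export_dir) to the SHA-256
    digest of its bytes."""
    root = Path(export_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return the sorted paths that are new or whose content differs."""
    return sorted(
        name for name, digest in after.items() if before.get(name) != digest
    )
```

Call `snapshot` on the export folder after the first export and again after the second, then pass both results to `changed_files`. If no note was edited in between, an empty list is the expected result; any file listed had its bytes rewritten, for example from UTF-8 to entities.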
 
Mark Anderson
Re: incorrect HTML entities
Reply #5 - Jun 21st, 2015, 10:10am
 
There seem to be two issues here.

#1 is the now (re-)reported issue of HTML entities being created when they shouldn't be (although the exported source still renders correctly on screen).

#2 seems related to export, but I'm not clear as to the actual problem. Given that export is so quick, as indeed is FTP upload, what's the underlying issue over whether the same page content is re-exported or not? For clarity, I'm not being dismissive or saying there isn't an issue, but rather that it's not unambiguously expressed.

« Last Edit: Jun 21st, 2015, 10:11am by Mark Anderson »  

Komori Kou
Re: incorrect HTML entities
Reply #6 - Jun 21st, 2015, 12:36pm
 
to Mark A.

Your objection may be reasonable for practical purposes. The problem might have been troublesome two decades ago, but it may be marginal in today's environment of powerful PCs, vast storage, and high-speed Internet.

How serious the problem feels to me may depend almost entirely on my own sense of aesthetics.

One exported file, from a note that is one of the sources of my website and contains a significant amount of text, is 57 KB when exported as UTF-8 but 121 KB when exported with HTML entities. Although the expansion ratio varies with the contents of each note, this means that the space on my FTP site could be consumed at double the speed or more. Fortunately, my FTP site still has ample room.
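
The size difference follows directly from how each form is stored. As a rough illustration, assuming the entities are written as decimal numeric references (the thread does not say which form Tinderbox emits), the first line of the sample note can be measured both ways. The helper names `utf8_size` and `entity_size` are hypothetical.

```python
def utf8_size(text):
    """Number of bytes needed to store the text as UTF-8."""
    return len(text.encode("utf-8"))


def entity_size(text):
    """Number of bytes when every non-ASCII character is written as a
    decimal numeric reference such as &#28716; (ASCII stays one byte)."""
    return sum(1 if ord(ch) < 128 else len(f"&#{ord(ch)};") for ch in text)


line = "瀬をはやみ"
print(utf8_size(line), entity_size(line))  # prints "15 40"
```

Each of these five characters takes 3 bytes in UTF-8 but 8 bytes as a reference. A whole page grows by less than that ratio because its markup and ASCII text are the same size either way, which is consistent with the 57 KB versus 121 KB figures above.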

Another possible problem is Find/Replace across multiple files in other editor programs, especially with grep. It is annoying when the same text may be exported either as UTF-8 or as HTML entities.
 
Mark Bernstein
Re: incorrect HTML entities
Reply #7 - Jun 21st, 2015, 1:34pm
 
Aha!  I have relocated your August query -- we did add a new feature for a different issue you raised on the same day, and the issue (1061) is marked as resolved.  

So, we thought it had been taken care of.  We'll re-investigate.
 
Mark Anderson
Re: incorrect HTML entities
Reply #8 - Jun 21st, 2015, 1:44pm
 
Thanks, I understand it now, and it does seem the issue is just #1, as when that's resolved your file size and grep issues go away. Issue #1 is now back on the radar so I don't doubt it will get resolved.

Mark Bernstein
Re: incorrect HTML entities
Reply #9 - Jun 22nd, 2015, 3:36pm
 
This will be fixed in the next backstage build (b156) and in Tinderbox 6.3.1.

Thank you for pursuing this!