Discussion:
Zim File Encoding
Christian Pühringer
2011-06-26 09:58:45 UTC
Hi all,

What encoding is used for article data, metadata, category data, and so on,
and for the title and URL strings in the directory?
I could not find documentation on this.

The simplest option for zim viewers would be to define that everything is
encoded in UTF-8. This should work with all languages.
The other option would be to define the encoding either for the complete zim file
(e.g. in the metadata) or per article (e.g. a tag in the HTML header).
It would make sense to restrict the possible encodings to a small subset, as
otherwise readers will not be compatible with all zim files.
If per-article encoding is to be supported, it would be necessary to
specify the encoding of the directory entries separately.
The disadvantage of this approach is the higher complexity for the reader, in
particular with per-article encoding. Furthermore, the definition
is more complex (for example, it needs to define which encoding is used if no
encoding is specified in an article or in the metadata).
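
To illustrate the extra complexity, here is a rough sketch of the fallback logic a reader would need if per-article and per-file encodings were both allowed. Everything in it is hypothetical: the function name resolve_encoding, the ALLOWED subset and the UTF-8 default are assumptions made for illustration, not part of any zim specification.

```python
# Hypothetical small subset of encodings a reader would agree to support.
ALLOWED = {"utf-8", "utf-16", "iso-8859-1"}


def resolve_encoding(article_charset=None, file_charset=None, default="utf-8"):
    """Pick the encoding for one article.

    Order of precedence (an assumption, it would have to be defined):
    per-article declaration, then per-file declaration, then the default.
    """
    for candidate in (article_charset, file_charset, default):
        if candidate is None:
            continue
        name = candidate.strip().lower()
        if name not in ALLOWED:
            # A reader cannot guess; unsupported encodings must be rejected.
            raise ValueError("unsupported encoding: " + name)
        return name
    raise ValueError("no encoding declared and no default given")
```

Every reader would have to implement these rules identically, and the directory strings would still need their own rule. With a fixed UTF-8 definition none of this is needed.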

I'd prefer to simply define that everything is UTF-8, but I am not sure whether this
has drawbacks I am not aware of.
However, I think it is very important that we define something about encoding,
because otherwise we cannot reliably support zim files in all languages.

Best regards,
Christian
Manuel Schneider
2011-06-26 10:50:41 UTC
Hi Christian,

you are right, this has not been documented in the ZIM File Format
article, even though we discussed it at our very first Developers
Meeting back in 2009:

http://openzim.org/Developer_Meetings/2009-1#Minutes

Everything should be in UTF-8.
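
For a reader this means that strings can be decoded strictly as UTF-8, without any detection step. A minimal sketch, assuming the url and title of a directory entry are stored one after the other as zero-terminated byte strings (the helper name read_cstring_utf8 and the sample bytes are made up for illustration):

```python
from typing import Tuple


def read_cstring_utf8(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read one zero-terminated string starting at offset.

    Returns the decoded text and the offset just past the terminator.
    Decoding is strict, so invalid UTF-8 raises UnicodeDecodeError
    instead of silently producing garbage.
    """
    end = buf.index(b"\x00", offset)  # ValueError if no terminator
    text = buf[offset:end].decode("utf-8")
    return text, end + 1


# Made-up sample data: a url followed by a title, both UTF-8.
data = "Zürich".encode("utf-8") + b"\x00" + "Zürich (Stadt)".encode("utf-8") + b"\x00"
url, pos = read_cstring_utf8(data)
title, pos = read_cstring_utf8(data, pos)
```

Note that offsets count bytes, not characters: "Zürich" is 6 characters but 7 bytes in UTF-8.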

I have just added a section "Encodings" to the ZIM File Format article
to fix this in our documentation:

http://openzim.org/ZIM_File_Format#Encodings

Thanks for asking!


/Manuel
--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch