Discussion:
thoughts about zimwriter
Tommi Mäkitalo
2010-06-19 19:22:28 UTC
Hi,

I've thought about zimwriter and also talked about it at LinuxTag a little.

The zimwriter has an internal plug-in interface, which separates the source of
data from the generation of zim files. There are already some implementations
of the source interface: the database source, which reads articles from a
database; the full-text indexer, which gets its data by reading and indexing a
zim file; and one implementation which creates a zim file from a search result.

We still need more implementations: at least a creator which merges two zim
files, and a creator which reads data from a directory in the file system. The
latter would make Emmanuel's Perl script, which writes the files to the
database, obsolete.

The user has to tell the zimwriter what he wants to do. I could keep on adding
options and features to the zimwriter, but then it will get bigger and bigger
and accumulate more and more options.

My idea is to move the functionality for writing a zim file into a library,
libzimwriter, and to write a separate program for each source implementation.
So, to write a zim file from the database, the user would run zimwriterdb; to
create a full-text index, zimindexer; to merge zim files, zimmerge. All of them
are quite simple programs which just interface to libzimwriter.

As an additional benefit, we reduce the dependency on tntdb: only zimwriterdb
needs tntdb, and none of the other tools do.

The only problem I see is that I have to break the current command-line
interface. I feel that is a small price to pay.

Any opinions?

Tommi
Manuel Schneider
2010-06-19 19:24:07 UTC
Hi Tommi,
Post by Tommi Mäkitalo
The only problem I see is that I have to break the current command-line
interface. I feel that is a small price to pay.
Any opinions?
do it ;-)
--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
e***@public.gmane.org
2010-06-21 13:17:19 UTC
Post by Tommi Mäkitalo
The user has to tell the zimwriter what he wants to do. I could keep on
adding options and features to the zimwriter, but then it will get bigger
and bigger and accumulate more and more options.
My idea is to move the functionality for writing a zim file into a library,
libzimwriter, and to write a separate program for each source implementation.
Ok
Post by Tommi Mäkitalo
So, to write a zim file from the database, the user would run zimwriterdb; to
create a full-text index, zimindexer; to merge zim files, zimmerge. All of
them are quite simple programs which just interface to libzimwriter.
Ok
Post by Tommi Mäkitalo
As an additional benefit, we reduce the dependency on tntdb: only
zimwriterdb needs tntdb, and none of the other tools do.
The only problem I see is that I have to break the current command-line
interface. I feel that is a small price to pay.
Ok, please keep us informed about the details, so that I can keep my toolchain up to date.

Thank you
Emmanuel
bitte_adresse_unten_verwenden
2010-06-23 17:52:12 UTC
Hi All,

I met Manuel and Tommi at LinuxTag 2010, and I'm very interested in openZIM
and its tools.
Thanks to all of you for your excellent work on making Wikipedia available offline.

Unfortunately it is still very difficult to get a current snapshot of Wikipedia
for offline use with openZIM. One reason is the one Tommi mentioned: the need
for input modules. I think it is a good idea to build a library that supplies
the different functionalities in separate modules (as described by Tommi).

Tommi, do you have a description of the internal data structure for zimwriter?
Everyone who wants to implement an input module has to adapt "his" external
format to this internal data structure, right?

And there is another gap:
There is only one up-to-date data source for Wikipedia: the dump, which
unfortunately is in MediaWiki markup.
The static HTML export process seems to be orphaned...
As a last resort we can grab all articles online; a thousand roads lead to Rome :-(

Does anyone have ideas on how to solve this problem?

greetings from Berlin
christopher schnirch - csh
_______________________________________________
dev-l mailing list
https://intern.openzim.org/mailman/listinfo/dev-l
--
christopher schnirch
Please send replies only to the following address:
mailto:@schnirch-berlin.de
Tommi Mäkitalo
2010-06-23 21:19:25 UTC
Hi,
Post by bitte_adresse_unten_verwenden
Tommi, do you have a description of the internal data structure for
zimwriter? Everyone who wants to implement an input module has to adapt
"his" external format to this internal data structure, right?
The interface for input modules can be found in the header
zim/writer/articlesource.h. To implement a new module, the two interfaces
declared there must be implemented. Currently we still have to tell zimwriter
when to use these interfaces, but in the future people can just write a small
main function which instantiates the ArticleSource-derived class and runs the
writer.

When I'm done with this, there will be at least the current functionalities
as examples, and I will also write a wiki article about input modules. It
will be quite easy.

We have already talked several times about another input module: one which
grabs the articles directly via HTTP from a MediaWiki instance. It looks like
the MediaWiki API is sufficient for this. Mirko already looked at the API at
LinuxTag, but it was not quite clear whether it will work. It just looks promising.

Tommi

(Sorry, but my mail client does not accept recipient addresses without a local
part, such as the one you requested in your mail as a reply address.)
