Discussion:
[Toolserver-l] Static dump of German Wikipedia
Manuel Schneider
2010-09-24 15:00:28 UTC
Emmanuel,

maybe you can help him with a ZIM file?

/Manuel

-------- Original Message --------
Subject: [Toolserver-l] Static dump of German Wikipedia
Date: Fri, 24 Sep 2010 01:57:55 +0200
From: Marco Schuster <***@harddisk.is-a-geek.org>
Reply-To: toolserver-***@lists.wikimedia.org
To: Wikimedia developers <wikitech-***@lists.wikimedia.org>,
toolserver-l <toolserver-***@lists.wikimedia.org>

Hi all,

I have made a list of all 1.9M articles in NS0 (including
redirects and short pages) using the Toolserver. Now that I have the
list, I'm going to download every single one of 'em and then publish a
static dump of it. There is a trial period tonight, as I want to see
how this works out; I'd like to begin downloading the whole thing in
3 or 4 days, if no one objects. Data collection will be on the Toolserver
(/mnt/user-store/dewiki-static/articles/); the request rate will be 1
article per second, and I'll download the new files once or twice a day
to my home PC, so there should be no problem with the TS or Wikimedia
server load.
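
A rough sketch of the throttled download described above (not Marco's actual script, which is not shown in this thread). The loop starts at most one request per second; the `fetch` and `save` callables, the function names, and the `min_interval` parameter are assumptions for illustration, and a real run would plug in an HTTP client for /w/index.php?action=render. The small helper redoes the arithmetic: 1.9M articles at one per second come to roughly 22 days.

```python
import time
from typing import Callable, Iterable


def estimated_days(article_count: int, seconds_per_article: float = 1.0) -> float:
    """Wall-clock days needed at a fixed request rate."""
    return article_count * seconds_per_article / 86_400


def throttled_download(
    titles: Iterable[str],
    fetch: Callable[[str], bytes],        # hypothetical: returns the rendered page
    save: Callable[[str, bytes], None],   # hypothetical: writes it to disk
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Fetch each title, starting at most one request per min_interval seconds."""
    done = 0
    next_start = clock()
    for title in titles:
        wait = next_start - clock()
        if wait > 0:
            sleep(wait)                   # hold back until the next slot opens
        next_start = clock() + min_interval
        save(title, fetch(title))
        done += 1
    return done
```

Passing `sleep` and `clock` as parameters is only there so the pacing can be checked without actually waiting.
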
When this is finished in ~21-22 days, I'm going to compress the articles
and upload them to my private server (well, if Wikimedia has an archive
server, that'd be better) as a tgz file so others can play with it.
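
For the packaging step, a minimal sketch using Python's standard `tarfile` module. The function name and the choice to store paths relative to the article directory are assumptions; the thread only says the result will be a tgz file.

```python
import tarfile
from pathlib import Path


def pack_articles(article_dir: str, archive_path: str) -> int:
    """Write every regular file under article_dir into a gzip-compressed tar.

    archive_path should live outside article_dir, otherwise the archive
    would try to include itself.
    """
    root = Path(article_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store names relative to the article directory.
                tar.add(str(path), arcname=path.relative_to(root).as_posix())
                count += 1
    return count
```
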
Furthermore, though I have no idea if I'll succeed, I plan on hacking
a static Vector skin file which will load the articles using jQuery's
excellent .load() feature, so that everyone with JS can enjoy a truly
offline Wikipedia.

Marco

PS: When trying to invoke /w/index.php?action=render with an invalid
oldid, the server returns HTTP/1.1 200 OK and an error message, but
shouldn't this be a 404 or 500?
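
Since a 200 status alone apparently cannot be trusted here, a downloader would have to look at the body as well. A minimal sketch, assuming the error page contains some recognizable text; the actual error message is not quoted in this thread, so the markers are left to the caller and the function name is made up.

```python
from typing import Sequence


def is_usable_render(status: int, body: str, error_markers: Sequence[str]) -> bool:
    """Accept a response only if it is a 200 with a non-empty body that
    contains none of the caller-supplied error markers (case-insensitive)."""
    if status != 200 or not body.strip():
        return False
    lowered = body.lower()
    return not any(marker.lower() in lowered for marker in error_markers)
```

Pages that fail the check could be logged and retried later instead of being written into the dump.
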
--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de

_______________________________________________
Toolserver-l mailing list (Toolserver-***@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list:
https://wiki.toolserver.org/view/Mailing_list_etiquette
Manuel Schneider
2010-09-24 15:05:36 UTC
Well, afaik PediaPress, openZIM and a few others started working to
enhance Extension:Collection so that it can create ZIM files; ZIM is
essentially a special compressed HTML format.

We had a Skype conference two weeks ago, but I am not in the loop on what
has happened since then. The last I heard, Tommi from openZIM was going
to fix the zimwriter interfaces so the filesource plugin can be used for
this.


/Manuel
Post by Marco Schuster
Given the fact that static dumps have been broken for *years* now,
static dumps are at the bottom of WMF's priority list; I thought it
would be best if I just went ahead and built something that can be
used (and, of course, improved).
Marco
you fix it.
--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Emmanuel Engelhart
2010-09-24 18:12:01 UTC
I'm already working on a ZIM containing all main namespace articles from
the Wikipedia in German (thumbnails included). This file will be pretty
big, I guess around 11GB.

Emmanuel