Discussion:
Way to count entries without redirects
Emmanuel Engelhart
2012-01-22 15:09:22 UTC
Hi

Do we have a way to know how many articles we have in a specific
namespace in a ZIM file... but without the redirects?

Regards
Emmanuel
Manuel Schneider
2012-01-22 15:17:00 UTC
Hi Emmanuel,
Post by Emmanuel Engelhart
Do we have a way to know how many articles we have in a specific
namespace in a ZIM file... but without the redirects?
Does this help you?

http://openzim.org/Special:Statistics
http://openzim.org/Special:AllPages


/Manuel
--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
Emmanuel Engelhart
2012-01-22 15:19:27 UTC
Post by Manuel Schneider
Hi Emmanuel,
Post by Emmanuel Engelhart
Do we have a way to know how many articles we have in a specific
namespace in a ZIM file... but without the redirects?
Does this help you?
http://openzim.org/Special:Statistics
http://openzim.org/Special:AllPages
Unfortunately not ;( I meant "in a ZIM file".

Emmanuel
Christian Pühringer
2012-01-22 17:15:46 UTC
Hi,

I think it's currently not possible to get the number of articles without redirects
(except by counting all articles that are not redirects, but this would be pretty
slow).
However, I agree that it would be a useful feature, so we should consider
adding it (perhaps as metadata?).

Best regards,
Christian
_______________________________________________
dev-l mailing list
https://intern.openzim.org/mailman/listinfo/dev-l
Emmanuel Engelhart
2012-01-22 19:14:29 UTC
I also think it would be valuable to store this information somewhere
in the ZIM files.
We would need to save many new metadata entries, one for each type of file
(picture, video, audio, text)... and why not other types (presentation, ...).
IMO the best we can do is to save how many entries we have per
mime-type, so we are sure the format always carries the information we
need (with the finest granularity).
The reader or the zimlib should then do the computation and provide the
code needed for something like getAudioArticleCount().
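On the writer side, counting entries per mime-type amounts to a single tally while the entries are added. A minimal sketch, assuming entries arrive as hypothetical `(url, mime_type)` pairs and that a redirect is represented by a `None` mime-type (a simplification for illustration; the real format marks redirects in its own way):

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def count_per_mime_type(entries: Iterable[Tuple[str, Optional[str]]]) -> Counter:
    """Tally entries per mime-type while building a ZIM file, skipping redirects.

    `entries` is an iterable of (url, mime_type) pairs. In this sketch a
    redirect is assumed to have mime_type None, so it is never counted.
    """
    counts: Counter = Counter()
    for _url, mime in entries:
        if mime is None:  # redirect: excluded from the numbers
            continue
        counts[mime] += 1
    return counts


if __name__ == "__main__":
    demo = [
        ("A/Paris", "text/html"),
        ("A/Paris_(city)", None),  # redirect
        ("I/map.png", "image/png"),
        ("I/photo.jpg", "image/jpeg"),
        ("-/anthem.ogg", "audio/ogg"),
    ]
    print(dict(count_per_mime_type(demo)))
```

Storing the raw per-mime-type tally keeps the finest granularity; coarser numbers (all images, all audio) can be derived later without changing the file.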

What do you think?

Emmanuel

Adding a new metadata entry is certainly the easiest solution, but I'm not
sure it is the best.
So this could potentially add a lot of new metadata entries.
Christian Pühringer
2012-01-23 18:44:53 UTC
Hi Emmanuel,

I am not sure whether I fully understand your proposal:
Is your idea to save the information at the mime-type level instead of at the
namespace level?
(Redirects have a special mime-type, so, as desired, they would not be
included in the numbers.)
Is there a potential issue with the mime-type approach, in that non-article
entries may have the same mime-type as article entries? (e.g. can image text
be HTML?)
If this is not a real issue, storing the count per mime-type is fine with me,
but it would also be fine to store it at the namespace level
(that is, the number of entries in a namespace that are not redirects).

The benefit of storing at the metadata level is that articles which are not
text or images can be handled; the disadvantage is that it is more complex
for the application.
(Therefore I'd prefer it to be implemented in zimlib.)

Where do you want to store the mime-type count information? As metadata or
something else?

Best regards,
Christian
Emmanuel Engelhart
2012-01-24 17:42:57 UTC
Post by Christian Pühringer
Is your idea to save the information at the mime-type level instead of at
the namespace level?
Yes. Since we do not have a namespace for every type of content, we cannot
offer that guarantee, and apps may need to get this information at the
mime-type level rather than at the namespace level.
Post by Christian Pühringer
(Redirects have a special mime-type, so, as desired, they would not be
included in the numbers.)
Is there a potential issue with the mime-type approach, in that non-article
entries may have the same mime-type as article entries? (e.g. can image
text be HTML?)
Anything can happen if the ZIM editor/software is not well coded.
With my script, I decide myself which mime-type each kind of content
(article, image, ...) gets.

I do not see any issue. The only point is that if you want the number
of all image articles (for example), some code somewhere has to know
that image/jpeg, image/gif, image/png, and so on are all image
mime-types, and sum their counts. I think this could be done in the
zimlib.
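The summing step described here can be sketched as adding up every stored mime-type that shares the same top-level type. A minimal sketch; `count_major_type`, `get_image_article_count` and `get_audio_article_count` are hypothetical names modeled on the getAudioArticleCount() idea, not existing zimlib functions:

```python
from typing import Mapping


def count_major_type(counts: Mapping[str, int], major: str) -> int:
    """Sum the counts of all mime-types whose top-level type is `major`.

    For example, major="image" adds up image/jpeg, image/gif, image/png, ...
    Mime-types are compared case-insensitively.
    """
    prefix = major.lower() + "/"
    return sum(n for mime, n in counts.items() if mime.lower().startswith(prefix))


# Hypothetical convenience wrappers in the spirit of getAudioArticleCount().
def get_image_article_count(counts: Mapping[str, int]) -> int:
    return count_major_type(counts, "image")


def get_audio_article_count(counts: Mapping[str, int]) -> int:
    return count_major_type(counts, "audio")


if __name__ == "__main__":
    counts = {"image/jpeg": 5, "image/gif": 3, "image/png": 2, "text/html": 10}
    print(get_image_article_count(counts))  # 10
    print(get_audio_article_count(counts))  # 0
```

Because the grouping lives in the reader library, new subtypes (say, a new image format) are picked up automatically without changing what is stored in the file.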
Post by Christian Pühringer
If this is not a real issue, storing the count per mime-type is fine with
me, but it would also be fine to store it at the namespace level
(that is, the number of entries in a namespace that are not redirects).
The benefit of storing at the metadata level is that articles which are
not text or images can be handled; the disadvantage is that it is more
complex for the application.
(Therefore I'd prefer it to be implemented in zimlib.)
Yes.
Post by Christian Pühringer
Where do you want to store the mime-type count information? As metadata
or something else?
I would propose a new Metadata entry called, for example, "Counter":
http://openzim.org/Metadata

The value would be a string like this:
image/jpeg=5;image/gif=3;image/png=2...
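A minimal sketch of writing and reading such a "Counter" value, assuming the format is exactly `mime=count` pairs separated by `;` as in the example above (the trailing "..." there is just elision). The function names are hypothetical. A mime-type with parameters (e.g. `text/html; charset=utf-8`) contains `;` and `=` and would make this encoding ambiguous, so the sketch assumes bare `type/subtype` values and rejects anything else when writing:

```python
from typing import Dict


def format_counter(counts: Dict[str, int]) -> str:
    """Serialize per-mime-type counts as 'mime=count' pairs joined by ';'."""
    for mime in counts:
        # ';' and '=' are the separators, so they must not appear in a key.
        if ";" in mime or "=" in mime:
            raise ValueError(f"mime-type not representable in Counter: {mime!r}")
    return ";".join(f"{mime}={n}" for mime, n in counts.items())


def parse_counter(value: str) -> Dict[str, int]:
    """Parse a 'mime=count;mime=count' string back into a dict."""
    counts: Dict[str, int] = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. a trailing ';'
        mime, sep, n = item.rpartition("=")
        if not sep or not mime:
            raise ValueError(f"malformed Counter item: {item!r}")
        counts[mime] = int(n)  # raises ValueError if the count is not an integer
    return counts


if __name__ == "__main__":
    value = "image/jpeg=5;image/gif=3;image/png=2"
    parsed = parse_counter(value)
    print(parsed)
    print(format_counter(parsed) == value)  # True
```

A flat string keeps the metadata entry human-readable and needs no new binary structure in the file, at the cost of the separator restriction noted above.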

Emmanuel
Christian Pühringer
2012-01-25 21:49:29 UTC
Hi,
Post by Emmanuel Engelhart
I would propose a new Metadata entry called for example "Counter"
http://openzim.org/Metadata
image/jpeg=5;image/gif=3;image/png=2...
Sounds good, I'd support this proposal.

Christian
Emmanuel Engelhart
2012-01-28 12:57:06 UTC
Post by Christian Pühringer
Post by Emmanuel Engelhart
I would propose a new Metadata entry called for example "Counter"
http://openzim.org/Metadata
image/jpeg=5;image/gif=3;image/png=2...
Sounds good, I'd support this proposal.
Nobody seems to be against it, so I have added this to the format description in the wiki:
https://openzim.org/index.php?title=Metadata&action=historysubmit&diff=1399&oldid=1360

Emmanuel
Manuel Schneider
2012-01-30 18:51:23 UTC
Post by Emmanuel Engelhart
https://openzim.org/index.php?title=Metadata&action=historysubmit&diff=1399&oldid=1360
Great, thanks!


/Manuel
Emmanuel Engelhart
2012-02-27 11:43:41 UTC
Post by Emmanuel Engelhart
Post by Christian Pühringer
Post by Emmanuel Engelhart
I would propose a new Metadata entry called for example "Counter"
http://openzim.org/Metadata
image/jpeg=5;image/gif=3;image/png=2...
Sounds good, I'd support this proposal.
https://openzim.org/index.php?title=Metadata&action=historysubmit&diff=1399&oldid=1360
This feature is now fully implemented on the Kiwix side, in both the ZIM
build script and the software. So, from now on, new ZIM files I build will
include the new "Counter" metadata.

Emmanuel
