The names in parentheses are meant to be transliterations, not translations! Remember the purpose is to allow a language user to recognise the language, not to allow English readers to identify it. It's an interesting idea, whether we might like to give translations for the benefit of English readers, but that's not what we've done so far. -- Toby Bartels 20:55, 1 Mar 2004 (UTC)

Proper English alphabetization

Wikipedias with over 50,000 articles: Nederlands (Dutch) – English – Français (French) – Deutsch (German) – 日本語 (Japanese) – Polski (Polish) – Svenska (Swedish)

With this kind of order, you see, things actually make sense. In the current one, we have it switching around from the proper alphabetical order of the English names to the native tongues. Example: We have German, English, French, Japanese, Polish, and Swedish, in that order. It is clearly not ordered on the English names, because German is first. That can be fixed if we use the German name: Deutsch. So, Deutsch, English, Français, Japanese... Japanese? No, it shouldn't be Japanese, it should be 日本語. And, because those sorts of letters don't appear in the Latin alphabet, Japanese should naturally come last then. But it doesn't. We switch to 'Japanese', and allow it to stay where it is.

There are many examples of this. After Esperanto is Español, which is Spanish. So, we're going by the language's name for itself. But, after that is Suomi. We just jumped implausibly from 'E' to 'S'. That's odd. Not so odd when you notice that Suomi's English name, Finnish, would come right after 'E'.

In any case, I've alphabetized correctly now, so, unless I made a few jarring mistakes, enjoy. ^ ^

EDIT: Another note on this template! It distinguishes between the two forms of written Norwegian: Bokmål and Nynorsk. Before, we had simply 'Norsk' (which is 'Norwegian' in both forms) and 'Nynorsk'. This method is better. Also, I did not put 'Nynorsk Norwegian' or 'Bokmål Norwegian' in parentheses for English speakers because the names are the same in English and Norwegian. It is very easy to distinguish for both Norwegian- and English-speakers. - Gavin

whatever happened to "The names in parentheses are meant to be transliterations, not translations! Remember the purpose is to allow a language user to recognise the language, not to allow English readers to identify it." (see top of this page) -- I don't think many people who don't recognize a language's name for itself will be interested in visiting its WP project... dab () 09:25, 1 Feb 2005 (UTC)

Amount of content needed before a page is listed on the Main Page

Is there some agreement about a minimum number of articles a Wikipedia project must have before it is allowed to be added to this section? Because if there is, I missed it. It seems to me that the level of activity for a Wiki is more important than the number of articles, particularly when a new language project has just started. When I started contributing to the English Wikipedia it had few articles and poor coverage of just about every area. However, by publishing links to it we got more contributors, more activity and more content.

That's why I'd like to see some discussion before links to other Wikipedia projects are removed from this section. User:Sj's removals at the moment seem rather arbitrary. After all a fifty article count doesn't make an encyclopedia much more useful than a five article count. Even if the interface isn't fully translated, a link is useful for attracting people to help. So why not leave them all in with the aim of getting more exposure and hence more activity. -- Derek Ross 23:33, 24 Mar 2004 (UTC)

I would have to agree with Derek, since I am working on the Urdu Wikipedia alone, and it is through this link that a few others have come in and added a couple of things to the site. So this can be a good opportunity for people to pitch in. In the end it is all about spreading knowledge, is it not?

The five languages I removed all had minimal work done on them; a single bilingual contributor could have done that work in a few hours. I like the idea of promoting new and growing wikipedias! But it is important that WP not give the impression we are boasting "many languages" when we really only have content in a few. There are a few thresholds for content:

  • we say "Wikipedia has X articles in [over] L languages." L here is roughly the # of Wikipedias with over [??] langs.
  • we list a language on our interlingual stats page (http://www.wikipedia.org/wikistats/EN/Sitemap.htm). This was a semi-arbitrary decision made at the time; I wrote the maintainer of the page to encourage a reasonable way of selecting sites for stats.
  • we list certain languages from the full language list on the Main Page -- that is, on this page.

If you check the instructions for creating a new language, "Add a link on the English Wikipedia" comes after 5 other important steps.

Reasons for not listing all 168 languages on the full language list on the main page:

  1. Quality: If someone randomly picks a link from this "other languages" list, and is taken to a clumsily organized page with 99% of its links broken and text partly in English, partly in another language -- and the only articles linked-to are one-word or one-sentence stubs -- it reflects badly on the project.
  2. Honesty: We advertise WP has been translated into "50+" languages; and indeed, there are at least that many serious WP projects. If we then add another 20 langs which have no serious, active contributors, a visitor who randomly visits one of the underdeveloped sites will think we are boastful.
  3. Honor: It is a minor honor and a milestone to have a WP version you are working on show up on the main page. This shouldn't be the first thing you do once you've created the first User page on a new language WP, before you translate the main page text. Content comes first.

If you think any of the sites I have removed from the list are under active development, please explain why you think so! +sj+ 20:00, 2004 Mar 26 (UTC)

Note: if you are worried about generally getting others to help fledgling projects, perhaps you want to add a "help fledgling languages" link at the end, with links to the tiny WPs and notes on their progress? +sj+ 20:02, 2004 Mar 26 (UTC)

We should have a hard limit like 500 or 1,000 articles for a wikipedia to be included on the main page. This would make it more fair and easier to maintain. Small wikipedias will creep into the list and possibly not be removed if the enthusiasm and activity on that particular wikipedia drops. I'll make a survey and see where different thresholds put us. Sverdrup❞ 15:31, 11 Aug 2004 (UTC)

I don't know if [1] is complete, but using that data, here are the results:

Threshold    Number of Wikipedias
10K          13
1K           40
500          45
100          59

Sverdrup❞ 16:13, 11 Aug 2004 (UTC)
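A survey like the one above amounts to counting, for each candidate cut-off, how many Wikipedias have at least that many articles. A minimal sketch follows; the wiki names and article counts are invented for illustration (they are not the data behind the table), and the choice of "at least" rather than "more than" is an assumption.

```python
def count_at_least(article_counts, threshold):
    """Count the wikis whose article count is at least `threshold`."""
    return sum(1 for n in article_counts.values() if n >= threshold)


# Hypothetical article counts, purely illustrative.
sample_counts = {
    "wiki_a": 12000,
    "wiki_b": 1500,
    "wiki_c": 700,
    "wiki_d": 150,
    "wiki_e": 40,
}

if __name__ == "__main__":
    # Print one row per candidate threshold, largest first.
    for threshold in (10_000, 1_000, 500, 100):
        print(threshold, count_at_least(sample_counts, threshold))
```

Lowering the threshold can only keep or grow the list, which is why the counts in the table rise as the cut-off drops.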


I noticed the Tagalog version is not listed in your language list whereas it is in the French Wikipedia. And it's not the only one missing in the foreign languages section. The list from the French Wikipedia is much longer. I wish I could have added the other languages myself, but the page is locked... JackJeff 12:00, Apr 2, 2004 (UTC)

If there is agreement that it can be added, anyone can add it to Template:Wikipedialang as that is not protected. This will be transcluded onto the main page. Angela. 12:12, Apr 2, 2004 (UTC)
Ok. Thanks. I'm new and didn't know that. I'll work on it. JackJeff 12:16, Apr 2, 2004 (UTC)

Moved to

Yiddish Wikipedia

Moved to MediaWiki talk:Wikipedialang. There should be a link in the languages section on the bottom.

ייִדיש http://yi.wikipedia.org/wiki/Main_Page

Firstly, keep up the good work! I'm excited there is a growing Yiddish Wikipedia. That said, there are over 150 wp domains that have a "this domain is set aside for language X" main page, but not much else. I, at least, try to avoid listing a WP language subdomain until it clearly has an active fluent contributor/ambassador -- or until it made it through the first four steps of the "how to set up a new-language wikipedia" guidelines [key interface elts translated, main page translated, 2-3 key meta pages such as "how to contribute to this site" translated]. +sj+ 00:35, 2004 Apr 9 (UTC)
I think it's good to list them anyway as that can attract new editors. Dori | Talk 00:46, Apr 9, 2004 (UTC)
Then let's list all of them on the main page, as they do on fr:. I think they do a lot of things well in that list, like including the French name for langs where it isn't obvious. Personally, as noted above, I think it reflects poorly on WP for someone to come to our long list of langs, pick one at random, and see that it's only a half-hearted effort. We might look boastful, and all our statistics become suspect. That said, I don't mind listing all of our created versions on the main page if that's what people want. I do think it's odd to list some and not others, and there should be further discussion... +sj+ 01:49, 2004 Apr 10 (UTC)

Corsican wikipedia

Could someone add a link on the Main page to the Corsican edition (http://co.wikipedia.org) under the label "corsu". Thank you. Paul

DONE! :) ~j

Dear ~j (jen?), see my note under the Yiddish request below. Paul -- please find a fluent Corsican who can at least translate the main page and the basic MediaWiki messages into Corsican before listing it on the Main Page. (Note that it is only two links away, via the "all languages" link.)
:co has three respectable articles: Algebra, Analisa, Pruvverbii; and three other non-stub articles: Sopranomi, Toponimi, and Scienzi_naturale (which was a separate article from the identical Scienze_naturale until a moment ago, when I redirected one to the other). +sj+ 00:44, 2004 Apr 9 (UTC)

That's wise. We'll work to expand the Corsican wikipedia, until it reaches a sufficient amount. Best. Paul

Whitespace changed to make main page validate

Hi, I reformatted the HTML comment at the bottom of the article. The whitespace above and below the comment was the only thing keeping the main page from being valid HTML according to http://validator.w3.org. I hope that's OK.

The problem was that this whole page is included inside a <small> tag on the main page, but the blank lines at the end of the article were converted to <p>, which isn't allowed inside <small>. Maybe I'm being pedantic, but now the main page validates. Wmahan. 22:52, 2004 Apr 13 (UTC)
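The rule described above (a paragraph element is not allowed inside <small>) can be checked mechanically. The sketch below uses Python's standard html.parser to flag block-level start tags that appear while a <small> element is open. The class and function names and the short list of block tags are illustrative assumptions, and this checks only this one rule; it is no substitute for http://validator.w3.org.

```python
from html.parser import HTMLParser

# Illustrative subset of block-level elements; a real validator knows the full DTD.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table"}


class SmallChecker(HTMLParser):
    """Records block-level start tags seen while a <small> element is open."""

    def __init__(self):
        super().__init__()
        self.small_depth = 0
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag == "small":
            self.small_depth += 1
        elif tag in BLOCK_TAGS and self.small_depth > 0:
            self.violations.append(tag)

    def handle_endtag(self, tag):
        if tag == "small" and self.small_depth > 0:
            self.small_depth -= 1


def block_tags_inside_small(html_text):
    """Return the block-level tags that were opened inside <small>."""
    checker = SmallChecker()
    checker.feed(html_text)
    checker.close()
    return checker.violations


if __name__ == "__main__":
    print(block_tags_inside_small("<small>Languages<p></p></small>"))
    print(block_tags_inside_small("<small>Languages</small><p></p>"))
```

The first call reports the stray paragraph; the second, where the paragraph sits outside <small>, reports nothing.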


I think the native word for the Cassubian language is "Kaszëbsczi." This seems to be the case, according to [2] (see the "content-language" thing at top-right). – Minh Nguyễn (talk, blog) w: 23:35, 16 Apr 2004 (UTC)


According to Sanskrit, the Devanāgarī translation for the Sanskrit language is "संस्कृतम्". – Minh Nguyễn (talk, blog) w: 21:31, 17 Apr 2004 (UTC)

Re-ordered language links

Was there some discussion somewhere before the languages were re-ordered into English alphabetical order rather than alphabetical order of the names of the languages for themselves, as was done yesterday? I find the new order extremely disorientating -- I simply could not find cy (Cymraeg) for a long time without checking the edit history and discovering it had moved to the very end, Welsh. Furthermore, nearly every other Wikipedia lists languages either in language code or individual local name order (sometimes with a translation), which results in languages being in pretty much the same location in all language lists, if they're listed at all, and thus easier to find when you're jumping between Wikipedias. -- Arwel 12:46, 3 May 2004 (UTC)

I agree it should be by order of the local name, not the English name. I'm not sure I see the benefit of having the English names on here at all. It looked a lot clearer the old way with only the local one, and is more consistent with most other Wikipedias. Angela. 20:44, May 3, 2004 (UTC)
Using a visible sort-key (i.e., listing the transliteration or English name in parens for langs in non-Latin-1 scripts) makes sense. Having the English name also makes sense to me, for the benighted folk who don't know the native name of a lang they're looking for, but I don't feel strongly about it. +sj+ 19:19, 2004 May 6 (UTC)

Indeed the advantages of having a single order across all WPs outweigh the other issues for me; I'm all for switching back. +sj+ 19:28, 2004 May 6 (UTC) (Note also that fr:, which overall has a very thoughtful layout, orders langs by French name; I wonder what their reasons were. +sj+ 10:30, 2004 May 7 (UTC))

Some pros and cons:

Advantages:
  • For langs with multiple local transliterations (on en:, consider different 'Latin-1'izations for Chinese or Czech), or other issues ([Ki]swahili), the name ordering is unambiguous.
  • Readers who follow such lang links sporadically and are far from fluent in the target lang can immediately find the lang they are looking for.
  • Non-speakers of the target lang, who are nevertheless curious about the target WP, can easily find and visit/browse the target WP, or at least see that it is on the active list (perhaps later remembering to tell their friends who speak that lang about it).

Disadvantages:
  • Ordering is different in different langs; less convenient for users who bounce between many WPs. Also, less consistent; it's nice to have a single layout for the 'other langs' section across all WPs.
  • Readers who are not fluent in the lang of the local WP (on en:, English) have a harder time finding the target lang they are looking for.
  • Speakers of the target lang are mildly offended that the one true spelling isn't used for the ordering.

  • Why not just order all versions by the two-letter or three-letter ISO code? That's how most Wikipedias are doing it. -- Kwekubo 23:30, 8 May 2004 (UTC)
It would likely make it quite hard to find some languages. If you order by the ISO code, and provide the English name of the language before the local name, some languages, such as Croatian (Hrvatski, hr:), will seem horribly out of place alphabetically, and people will have a hard time finding it.

Likewise, if you provide the local name before the English name, some languages, such as Suomeksi (Finnish, fi:), will be horribly out of place alphabetically, and people will again have a hard time finding it.

 – Minh Nguyễn (talk, blog) w: 20:49, 14 May 2004 (UTC)
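The three orderings debated in this thread (by ISO code, by local name, by English name) can be compared side by side. The sketch below is only an illustration: the small language list is hand-picked, the function name is hypothetical, and the sort is a plain code-point comparison after case folding, not a locale-aware collation, so names in non-Latin scripts simply land after the Latin ones.

```python
# (ISO code, local name, English name) for a hand-picked sample of languages.
LANGS = [
    ("de", "Deutsch", "German"),
    ("en", "English", "English"),
    ("fi", "Suomi", "Finnish"),
    ("fr", "Français", "French"),
    ("hr", "Hrvatski", "Croatian"),
    ("ja", "日本語", "Japanese"),
    ("nl", "Nederlands", "Dutch"),
    ("pl", "Polski", "Polish"),
    ("sv", "Svenska", "Swedish"),
]

FIELD_INDEX = {"code": 0, "native": 1, "english": 2}


def order_by(field):
    """Return the ISO codes sorted by the chosen field (simple code-point order)."""
    index = FIELD_INDEX[field]
    ordered = sorted(LANGS, key=lambda entry: entry[index].casefold())
    return [entry[0] for entry in ordered]


if __name__ == "__main__":
    for field in ("code", "native", "english"):
        print(field, order_by(field))
```

In this sample, Finnish and Croatian are the entries that move furthest between orderings, which matches the complaints above about Suomi and Hrvatski looking out of place.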

'ten largest' section

I don't know that I like taking those out of the lineup; I like being neutral about the size of the other WPs, and not trying to privilege the largest ones. That said, if they /are/ to be separated from the rest of the langs -- let's say, to improve the image of WP's multilinguality -- they shouldn't be in order of size, but ordered the same way as the rest of the list. +sj+ 19:31, 2004 May 6 (UTC)

  • looks as if they are now not ordered at all (not alphabetically by name or code, not by size and not reverse by any criterion). IMHO it would be better if there was any ordering 21:41, 6 May 2004 (UTC)
  • It's ordered the same way the rest of the list is -- by English name. If you think we should change that ordering (note that fr: is ordered similarly, by French name), then it should be changed for both the 'ten largest' and for the 'other active' lists. +sj+ 10:28, 2004 May 7 (UTC)

As they are already taken out as the "ten largest", I think they should be sorted by size. In my opinion, the title "ten largest" implies they are. We should make it easy for our readers. wikipedia.org points to the English Main Page, and the vast majority of people who are trying to find another wikipedia are looking for the large ones, like the German and French. Elizabeth A 12:19, 9 May 2004 (UTC)

  • Chinese shouldn't have two separate listings here... it's just syntactically a bit broken to say "Chinese (traditional) - Chinese (simplified)" --there should be no ndash between the two languages, and the "Chinese" shouldn't really be duplicated. "Chinese (traditional | simplified)" makes more sense (perhaps with a comma in place of a "|"). +sj+
    • My choice for format would be "<Zhongwen> (jian | fan) (Chinese)", where Zhongwen/jian/fan are replaced with suitable Chinese chars, the escape chars 65288 and 65289 are the raised equivalent of Latin-1 parens (high enough to properly enclose Chinese characters), and both jian and fan link to the appropriate-script main page for zh:. What do you all think? +sj+ 06:47, 2004 May 22 (UTC)
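For reference, the fullwidth parentheses sit at code points 65288 (U+FF08) and 65289 (U+FF09). The short sketch below looks them up with Python's unicodedata module and prints the numeric entity that would be typed into the template; the helper name wrap_fullwidth is made up for illustration.

```python
import unicodedata

FULLWIDTH_LEFT = 65288   # U+FF08
FULLWIDTH_RIGHT = 65289  # U+FF09


def wrap_fullwidth(text):
    """Enclose text in fullwidth parentheses."""
    return chr(FULLWIDTH_LEFT) + text + chr(FULLWIDTH_RIGHT)


if __name__ == "__main__":
    for code_point in (FULLWIDTH_LEFT, FULLWIDTH_RIGHT):
        ch = chr(code_point)
        # Show the decimal code point, the hex form, the Unicode name,
        # and the numeric character reference.
        print(code_point, hex(code_point), unicodedata.name(ch), f"&#{code_point};")
    print(wrap_fullwidth("简"))
```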

Constructed Languages

I am moving the Conlangs back into the main list for the following reasons:

  1. First of all, Tok Pisin isn't even a constructed language.
  2. Esperanto is a notable language because its Wikipedia contains over 12,000 articles.
  3. It would seem wrong to tell the people who spent their time volunteering to make these Wikipedias that their languages aren't even real (which they are real).
  4. I feel like correcting it.

Hey, isn't there a Klingon wikipedia? That should be in the list too. Andre 04:28, 7 Aug 2004 (UTC)

I've taken a page out of your book (whoever you are) and acted bold in correcting it. Andre 04:54, 7 Aug 2004 (UTC)

I think that these links should be arranged in alphabetical order according to either the ISO code or the spelling in the target language itself. These links are not primarily for English speakers, but for speakers of the other languages to help them find their Wikipedia. --21:31, 9 Jun 2004 (UTC)

Just noticed a message about languages - fakelangs represent a problem in my view, and I had thought that separating natural from fake was a good way to start. Thanks -Stevertigo 06:35, 13 Jul 2004 (UTC)

Shouldn't this page be protected? It's included in the Main Page. -phma 18:08, 8 Aug 2004 (UTC)

Anyone can edit the parts of the Main Page, it's just hard to figure out how for the average vandal. Andre 19:50, 8 Aug 2004 (UTC)

"Quick links"

Why are the quick links part of Wikipedialang? Can't they just go on the normal Main Page? They probably should be.

Cutting down the list

Is there any reason not to remove the following 54 languages? All have less than 1000 articles and are not amongst the most widely spoken languages. Angela. 03:06, Aug 27, 2004 (UTC)

Elsässisch (Alsatian) – Aragonés (Aragonese) – Asturianu (Asturian) – Azərbaycan (Azeri) – Беларуская (Belarusian) – Bislama – বাংলা (Bengali) – Brezhoneg (Breton) – Bosanski (Bosnian) – ᏣᎳᎩ (Cherokee) – Corsu (Corsican) – Kaszëbsczi (Kashubian) – فارسی (Persian) – Føroyskt (Faroese) – Gaeilge (Irish) – Gàidhlig (Scottish Gaelic) – Guarani – Ido – Íslenska (Icelandic) – Lojban – ქართული (Georgian) – ភាសាខ្មែរ (Khmer) – Kurdî (Kurdish) – кыргызча (Kyrgyz) – Lëtzebuergesch (Luxembourgish) – Lietuvių (Lithuanian) – Latviešu (Latvian) – Malagasy – Māori – Македонски (Macedonian) – Malayalam – Монгол (Mongolian) – Nauri (Nauruan) – Nahuatl – Plattdüütsch (Low Saxon) – Langue d'Oc (Occitan) – ਪੰਜਾਬੀ / پنجابی (Punjabi) – Armâneashti (Aromanian) – संस्कृत (Sanskrit) – Sardu (Sardinian) – Srpskohrvatski (Serbo-Croatian) – Slovenčina (Slovak) – Shqip (Albanian) – Basa Sunda (Sundanese) – Kiswahili (Swahili) – Тоҷикӣ (Tajik) – ไทย (Thai) – Tagalog – tlhIngan Hol (Klingon) – toki pona – Tok Pisin – Tatarça (Tatar) – Volapük – ייִדיש (Yiddish)

I suggest keeping all those that have 1000 articles, and also ur, vi, ta, te, mr, min-nan, jv, hi, gu, bn, and ar because those are the languages with the most speakers. This includes 52 languages as shown below. Angela. 03:06, Aug 27, 2004 (UTC)

I object on the grounds that otherwise people will not know that there exists a Wikipedia in their language. I might reconsider when Main Page is no longer redirected from http://www.wikipedia.org/ Node 04:55, 31 Aug 2004 (UTC)

Afrikaans – العربية (Arabic) – Български (Bulgarian) – বাংলা (Bengali) – Català (Catalan) – Cymraeg (Welsh) – Česká (Czech) – Dansk (Danish) – Deutsch (German) – Ελληνικά (Greek) – Esperanto – Español (Spanish) – Eesti (Estonian) – Euskara (Basque) – Suomeksi (Finnish) – Français (French) – Frysk (Western Frisian) – Gallego (Galician) – Gujarati – עברית (Hebrew) – हिन्दी (Hindi) – Hrvatski (Croatian) – Magyar (Hungarian) – Interlingua – Bahasa Indonesia (Indonesian) – Italiano (Italian) – 日本語 (Japanese) – Bahasa Jawa (Javanese) – 한국어 (Korean) – Latina (Latin) – Hō-ló-oē (Southern Min) – मराठी (Marathi) – Bahasa Melayu (Malay) – Nederlands (Dutch) – Norsk (Norwegian) – Polska (Polish) – Português (Portuguese) – Română (Romanian) – Русский (Russian) – Simple English – Slovenščina (Slovenian) – Српски (Serbian) – Svenska (Swedish) – தமிழ் (Tamil) – తెలుగు (Telugu) – Türkçe (Turkish) – Українська (Ukrainian) – اردو (Urdu) – Tiếng Việt (Vietnamese) – Walon (Walloon) – 繁體中文 (Chinese, traditional) – 简体中文 (Chinese, simplified)

Personally I think all the active Wikipedias should be on the list, so to attract editors that speak that language. Andre 03:27, 27 Aug 2004 (UTC)

Agreed. Active, or with more than 10 articles is actually how I feel - something already there to add to. Node 04:55, 31 Aug 2004 (UTC)
I agree with this, though I would make it "active and localized"; I think a completed interface is an important step before listing a lang-wp as "Wikipedia in other languages".
I am currently pruning out the projects with an incomplete interface (that is, with significant untranslated interface-text on the main page) or fewer than 50 non-stub articles (as a measure of activity). +sj+ 03:54, 27 Aug 2004 (UTC)
That's not fair. Be sure at least that they have fewer than 50 non-stub articles if you're going to prune them simply for having a not-100%-translated mainpage. Node 04:55, 31 Aug 2004 (UTC)

Of the ones I suggested removing, the following are inactive. The numbers show the number of new pages in July, or the total number of pages ever. als (2), an (38 total), az (0), bi (2 edits in 30 days), bn (0), br (22 total), bs (13), chr (24 total), co (0), fo (50 total), gd (14), gn (1), jbo (7 total), ka (0), km (2 total), ky (2 total), lv (18), mg (3), mi (4), mk (1), ml (1), mn (0 in August), na (0), nah (11), oc (0), pa (3 total), roa-rup (2), sc (1 total), sh (1 edit in the last 30 days), sk (2), sq (4), sw (49 total), tg (0 edits in the last 30 days), tl (52 total), tlh (44 total), tokipona (30), tpi (27), vo (0), yi (0).

This means the following have some reasonable level of activity, but have less than 1000 articles and are not amongst the most widely spoken. I'm not convinced these should be included.

Asturianu (Asturian) – Беларуская (Belarusian) – Kaszëbsczi (Kashubian) – فارسی (Persian) – Gaeilge (Irish) – Ido – Íslenska (Icelandic) – Kurdî (Kurdish) – Lëtzebuergesch (Luxembourgish) – Lietuvių (Lithuanian) – Plattdüütsch (Low Saxon) – Basa Sunda (Sundanese) – ไทย (Thai) – Tatarça (Tatar)

Of these, Asturian and Belarusian are not fully translated. Asturian has one very new admin, and Belarusian has none. Neither of these should be included yet. Angela. 04:47, Aug 27, 2004 (UTC)

Inactive meaning what? Tatar is widely spoken, as are Sundanese, Thai, Vietnamese, Low Saxon, Farsi/Persian, Kurdish, and Kashubian. Luxembourgish, Icelandic, Lithuanian, and Belarusian are national languages. To not include them would be unfair to their countries. Asturian is also a national language, just not of an independent nation. Node 04:55, 31 Aug 2004 (UTC)
If the rest of them are fully translated and have activity, we should keep them on the list. Andre 18:57, 27 Aug 2004 (UTC)
I've made this change now, so the active, translated ones are included, even where they have less than 1000 articles. Angela. 23:43, Aug 27, 2004 (UTC)
Where are you getting these statistics? Perhaps these have changed greatly in the last month. Please re-add ast: and sk:, which now appear active (20+ edits/day) and fully translated, with 890 and 868 articles, respectively (the latter having a remarkable collection of stubs in Philosophy). Please also add bs: (430 arts, 8 new arts and 1 new image yesterday, a decent collection of top-level articles).
Populous languages with dead wikipedias: Please remove bn: (8 arts total, of which only 3 have bn: content; unfinished interface) and te: (3 arts total; English interface and English introduction) and gu: (7 arts total; English interface) and mr: (5 arts total, only 3 with mr: content; English interface). I see what you mean about these langs having a large number of speakers. I want to feature them prominently to expand their communities. But I fret about having completely dormant or unpopulated WPs mixed in with active ones...
I support having much lower standards for populous langs -- for instance, jv: has low activity and only a handful of non-stubs, but should clearly be included -- but they should still have a translated interface and some main-page content. My thoughts are, how will visitors who are excited to see their language on the lang-list, and follow that link, react to what they see? What will they think of the project as a whole? We don't want a visitor to follow a link to bn: and think, "oh, right, translated into 86 languages, my foot." +sj+ 02:20, 28 Aug 2004 (UTC)
Perhaps we could have a separate line, "Major languages in need of contribution": for large langs with small or inactive communities? +sj+
Thus my proposal of a finer distinction. They will know in advance that their Wikipedia needs some help and that it's not typical of translated Wikipedias... Node 04:55, 31 Aug 2004 (UTC)
The stats are from Wikipedia:Multilingual ranking July 2004. Angela. 15:23, Aug 28, 2004 (UTC)
Protest from the Slovak wiki! :) Please don't remove us! I hope we will have 1000 articles in a few days :) We translated the interface a few weeks ago and now we plan to start a campaign to enlarge the contributor community (removing us could have a REALLY BAD IMPACT on this plan). Liso 13:06, 29 Aug 2004 (UTC)
Information and request: for moving to list of wikipedias which contains more than 1000 articles. Slovak Wikipedia just today reached this milestone. --Valasek 09:20, 8 Sep 2004 (UTC)
I've moved Slovak to Wikipedias with more than 1000 articles. Good job and congratulations! (Interesting/funny note about sk: sk:Category:Filozofia (Cat:Philosophy) has ~750 articles. This means that more than 70% of sk: is about philosophy/philosophers :-) Sverdrup❞ 09:42, 8 Sep 2004 (UTC)
Sorry about that, Liso. I know you've been working hard recently! There was just some confusion about the size of sk: because it had been small and unfinished last month. +sj+ 03:22, 30 Aug 2004 (UTC)
Oh, sorry Sj! I put my comment in the wrong place! :-/ I see (and saw) that your comments were correct! :) I also wanted to say "thank you" on your user page on 29 Aug, but I was confused because your talk++ is too sophisticated for me :) Liso 20:38, 31 Aug 2004 (UTC)

Giving some support to Liso: I don't think wikipedia links should be removed because they are too small. Sure, you're helping Wikipedia's image, but you are also hurting the growth of those small Wikipedias.

Instead of removing those links, why not make a finer distinction? Instead of just having two categories, "over-10000" and "under-10000", we can have 5 categories: "over-10000", "over-1000", "over-100", "struggling", and "planned". This way visitors know exactly what they'll see when they click on a link, and those interested in building up small wikipedias will see a link right on the main page. -- ran (talk) 14:16, Aug 29, 2004 (UTC)

I second that. Currently though "planned" may not be appropriate until the main page minimalists quit bitching (after all, wikipedialang does take up a significant portion of the mainpage)... Node 04:55, 31 Aug 2004 (UTC)

Major langs with small wikipedias

Now on their own line. I would hate for someone to click on bn:, which was previously on the first line of langs displayed, and judge our multilingual content by what they saw there. +sj+ 03:22, 30 Aug 2004 (UTC)

Now that another 50 langs were added back in... I don't know how I feel about the 'under 100 articles' lists, but here's a cut at various people's strong beliefs:

  • Main needs to be shorter. (raul654, sj, angela, many others)
  • Some WP projects with almost no content should be listed more prominently, because they have (for instance) over 30 million speakers (angela)
    (Particularly langs from the Indian subcontinent, it seems)
  • Not all 170 created WP databases should be listed on the main page. (angela, sj, others)
    If unfinished langs are listed on Main, their inclusion should be heavily qualified, so as not to taint the great achievements of other langs. (sj)
    • In particular, if you get someone to set up a wiki for you, but don't finish translating the main page, or worse yet, don't have anyone who really knows the language working on the project, that wiki may not be suitable for listing there.
  • Most WP projects should be listed on the main page, to increase awareness that they exist. (Danny, node)
"Some WP projects with almost no content should be listed more prominently, because they have (for instance) over 30 million speakers"
I frankly don't think this is necessary or wise. For starters, having a special "major languages with small wikipedias" section will soon devolve into a debate about what constitutes "small" and what constitutes "major" (especially the latter, since everyone would be trying to push their native language into the privileged list, regardless of how much progress its Wikipedia has made). Besides, if I'm interested in Wikipedia and I spoke Gujarati (say), I would contribute to its Wikipedia whether or not it's listed as "major". And if I don't speak Gujarati (which I don't), I wouldn't be contributing to it, no matter how prominently it is listed as "major". So why list it? It raises dispute without accomplishing anything.
"If unfinished langs are listed on Main, their inclusion should be heavily qualified, so as not to taint the great achievements of other langs."
They're already in the "under 100" category. How much further can they be qualified? -- ran (talk) 22:57, Sep 1, 2004 (UTC)
Well, those wikis with articles in them are classed as "struggling," and those with just a main page, 99% of links broken, and main page half-English are "planned." Scott Gall 07:15, 11 Feb 2005 (UTC)
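The five-way split proposed in this thread can be written down as a simple classifier. This is only a sketch under stated assumptions: the function name is hypothetical, the cut-offs are read as "more than", and "planned" is approximated as a wiki with no articles yet (the thread describes it as having just a main page, which an article count alone cannot capture).

```python
def size_category(article_count):
    """Map an article count to one of the five proposed categories."""
    if article_count > 10_000:
        return "over-10000"
    if article_count > 1_000:
        return "over-1000"
    if article_count > 100:
        return "over-100"
    if article_count > 0:
        # Has some articles, but not yet more than 100.
        return "struggling"
    # Assumption: no articles at all stands in for "just a main page".
    return "planned"


if __name__ == "__main__":
    for count in (150_000, 12_000, 868, 44, 0):
        print(count, size_category(count))
```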


Just wanted to say I love the arrangement of the languages section of main page. Supersmart design. Good work. jengod 01:01, Sep 3, 2004 (UTC)

The 100,000-article mark

I like the new language setup too. Suggestion: How about another heading for Wikipedias that reach 100,000 articles? German is already there.

I'm about to put up a new template with commented-out space for a 100,000 heading with German under it. Not to worry: I've previewed the template, with German still visible under the 10,000 heading. I'm just putting it up so that the community can quickly put it in place if it sees fit. Dale Arnett 01:35, 3 Sep 2004 (UTC)

Too few langs in that group for it to be useful. +sj+ 19:42, 4 Sep 2004 (UTC)

A Wikipedian said to me "There must not be any other [besides English] Wikipedias with over 100,000 articles, because otherwise they'd have a heading for that.". So I'm adding this section on the grounds that its absence could be misleading. The argument against is readability, but the list (on the main page) still looks good to me. -- Toby Bartels 23:33, 27 Sep 2004 (UTC)

The trouble is that the 100,000 mark has only been achieved by two. Dropping the list of small wikipedias because of the two big ones is not really a good idea. I have bolded the ones with more than 100K, to give them more indication of their size. Norm 13:54, 28 Sep 2004 (UTC)

Wait, why do we have to choose between 100K and 100; why not both? Somebody wrote that it was too big on the Main Page, but how in the world is that determined? Boldface entries could be a compromise if one is necessary; but since the boldface is gone now and there is no discussion here to argue that we cannot simply have both, I will return it to how I had it. -- Toby Bartels 05:15, 12 Oct 2004 (UTC)

Because the main page is already too crowded, and this section takes up A LOT of room - as much as (if not more than) all the sister projects combined, plus the new articles and the donation banner. What you propose is only going to make a bad situation worse. Furthermore, by breaking up the languages into a 4th section, it makes finding a particular language even harder now (with 4 alphabetical lists to go through instead of 3) →Raul654 06:07, Oct 12, 2004 (UTC)

I checked out the Japanese Wikipedia's main page recently and I found out it had a 100,000-article heading. Scott Gall 08:53, 17 Jan 2005 (UTC)


I have a question: is there any reason why we're using HTML entities instead of the actual characters on this page? -- anon

The English Wikipedia (like a few other wikis) only works properly with Latin-1 characters. If you actually put (say) a Chinese character into the edit box, then it will be stored as an entity in the database. (Try doing that in the Wikipedia:Sandbox, or even in a preview, and you'll see for yourself that it's so.) -- Toby Bartels 23:50, 27 Sep 2004 (UTC)
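
As a rough illustration of the behaviour described above, here is a minimal Python sketch of how characters outside Latin-1 can end up as numeric HTML entities. This is not MediaWiki's actual code; it only uses Python's standard `xmlcharrefreplace` error handler and `html.unescape`, and the helper name `to_latin1_with_entities` is made up for this example.

```python
import html

def to_latin1_with_entities(text):
    """Keep Latin-1 characters as-is; replace anything else with a numeric entity."""
    # 'xmlcharrefreplace' substitutes &#NNNN; for every character Latin-1 cannot encode.
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")

stored = to_latin1_with_entities("Français / 日本語")
print(stored)                 # the Latin-1 letters survive, the Japanese becomes entities
print(html.unescape(stored))  # a browser-style decode gives back the original text
```

The accented "ç" is part of Latin-1 and so is left alone, which would explain why only names like the Japanese or Chinese ones show up as entities in the edit box.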

Size of this template

This template has (yet again) ballooned back up in size to 100+ languages. Let me say this clearly - this is not a good thing. It takes up well over an entire screen length on my machine (1024x768 resolution). The full list of 200+ languages is already linked - we don't need to list most of them. The language list itself took up as much space as the entire content-section of the main page. That's very lopsided.

Could you say that more clearly? (: I'm adding langs with over 100 articles back in. If you really want langs with over 30mil speakers to show up, please add them to that smallest list, perhaps changing the subsection title or with an asterisk next to the language name.

Furthermore, extra subcategories are bad - if someone is looking for a particular language, the more lists there are, the more work they have to do to find their language. →Raul654 16:17, Sep 4, 2004 (UTC)

Thoughts on reducing subcats, while improving usability: all langs over 5000 (threshold for more-than-bot-assisted rapid growth?) could be grouped together, and then all langs over 100 (threshold for acceptable-browsing-experience, with some meta- and help pages). We could go back to bolding certain langs, but I don't like the way that displays. Or we could slightly modify the text-color/greyscale/thickness of certain langs, which would suit my tastes better... +sj+ 19:48, 4 Sep 2004 (UTC)

Discussion on parenthesized translations

I call upon Wikipedians to vote as to whether or not to keep the English translations (in parentheses after the native name). Please provide your name and a brief justification for your vote. Alternatively, if you would like to revert to the former method of using Latin script transliterations, there is that choice too. Nicholas 09:17, 24 Sep 2004 (UTC)

Keep Translations

In other words, use Name in native script (Translation in English).

  1. AlanBarrett -- Displaying translations of foreign names seems quite appropriate in the English Wikipedia. There is no hard space constraint.
  2. Toby Bartels -- On an English wiki, including the English name is reasonable. But we should alphabetise by native name (transliterated if necessary). Fall back to transliterations if this becomes too lengthy, but it looks OK now to me.
  3. Node 20:29, 28 Sep 2004 (UTC)
  4. [[User:Ran|ran (talk)]] 08:01, Oct 9, 2004 (UTC)
  5. Donar Reiskoffer 08:41, 20 Nov 2004 (UTC)
  6. The bellman 12:55, 2004 Dec 21 (UTC) because I personally like having a squiz at languages I don't understand.

Use Transliterations Only

In other words, use Name in native script (Transliteration in Latin script), without any translation.

Remove Translations and Transliterations

In other words, use Name in native script, without any translation or transliteration.

  1. Nicholas -- Easier navigation; simpler, less chaotic-looking layout; allows room for "Wikipedias in need of attention" category
    No, it doesn't allow room for said category since it would include around a hundred Wikipedias.
  2. [[User:Norm|Norm]] 11:27, 24 Sep 2004 (UTC)
  3. [[User:Mxn|Minh Nguyễn (talk, blog)]] – Alphabetizing by English translation is confusing, because the native name is listed first. Like it or not, the English Wikipedia is the international portal to Wikipedia. Getting rid of the translations allows the non-English-speaking user to quickly find their language, keeps the list looking clean, and is consistent with the interwiki links list. 01:38, 3 Oct 2004 (UTC)
    To clarify: I still support having transliterations here, because it allows users without the appropriate fonts to find their language. – [[User:Mxn|Minh Nguyễn (talk, blog)]] 02:01, 20 Nov 2004 (UTC)
  4. There's no more need for translations of the language names here than there is for the interlanguage links on articles. Angela. 04:57, Oct 4, 2004 (UTC)
    Yes, there is. Encoding issues, bragging, and the like.
  5. Support. Would save space and look nicer. English translations and transliterations would remain on the complete list. (But how would we order them on the Main Page? By size?) Gdr 23:09, 2004 Oct 21 (UTC)
  6. mav 20:21, 1 Nov 2004 (UTC) If you can't read the name, then that encyclopedia and the link to it is no use to you. This also makes for a more compact list; just organize by alpha order of language code.
    • On the contrary, I can think of a number of not-so-implausible situations where transliterations and translations would be necessary: 1. You speak the language but your computer does not support it. Many Wikipedias have helpful information on how to enable support for text in that language. 2. You see that there are obviously plenty of non-English Wikipedias, but you want to know if there is one in Japanese because your parents, grandparents, and all your older siblings speak Japanese, but you don't. (of course you could look at the complete list, but what reason is there to suppose without any previous knowledge that the complete list will be sans transliterations/translations as well?) 3. You are a language enthusiast (if you think this is a small demographic, then check out the lj communities for language enthusiasts, and we also have a nice community here) wanting to see the Wikipedias in a couple of different languages that you don't really understand, or you want to see their page count, but since you don't understand them you can't identify them from the native language, native script version. and really, what reason is there to remove them? somebody mentioned space to add a section for less than 100 pages Wikipedias, but this would be a huge list that would take up much more than even the current template size perhaps --Node 00:43, 2 Nov 2004 (UTC)


I have now put a scrollbar into this template. This is a possible compromise between the "keep the template small" group and the "smaller languages should be linked" group (like me). The template will appear small on the main page but all languages can be accessed by scrolling the bar. [[User:Norm|Norm]] 11:36, 28 Sep 2004 (UTC)

I'm sorry, but that looks really dumb on the main page. I'm reverting. →Raul654 12:50, Sep 28, 2004 (UTC)


About these ranges that keep on appearing and quickly disappearing on this template: do we need a vote on this? Lately, Maveric149 and Node ue have been undoing each others' edits. (I'm inclined to agree with Node's point of view, but I'd like to see what everyone else thinks about this.) – [[User:Mxn|Minh Nguyễn (talk, blog)]] 21:40, 25 Nov 2004 (UTC)

It is simply inaccurate not to include all language versions with more than 100 articles in a list with that name. It is much better to say exactly what that list is. --mav 06:47, 29 Nov 2004 (UTC)
So what? People get the idea, only incessant nitpickers will notice there is anything to it and even then they probably won't care. The information is successfully transmitted by my version, it is less confusing, and yours presupposes that the reader is an idiot while mine does not. --Node 02:03, 7 Dec 2004 (UTC)
Mav, that's a bit silly, and we really don't need any 9999's. Nothing is unclear, and even the font sizes contribute to my intuitive understanding that some of the wikis are bigger than the others. Btw, I don't agree with the ranges; at 9999 articles, that's a 10K wiki we should put in the current first tier. ✏ Sverdrup 01:34, 17 Dec 2004 (UTC)
(Also, if you are really grumpy about this, let's look at it like a switch from C; since there is no {{mono|break;}}, the upper tier includes all the languages in the list ;-P ✏ Sverdrup 01:36, 17 Dec 2004 (UTC))
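
To spell out the fall-through joke for anyone who doesn't write C: without a break, a matching case runs on into all the cases below it, so a wiki that qualifies for the top heading also "falls through" every lower one. A minimal Python sketch of that reading follows. Python has no switch fall-through, so this imitates it with a loop; the threshold list and the function name are only for illustration.

```python
THRESHOLDS = [10_000, 1_000, 100]  # the "over N articles" headings, largest first

def headings_matched(article_count):
    """Return every "over N" heading a wiki satisfies, imitating C fall-through."""
    matched = []
    falling = False
    for threshold in THRESHOLDS:
        # Once one case matches, every later (smaller) case runs as well.
        if falling or article_count > threshold:
            falling = True
            matched.append(threshold)
    return matched

print(headings_matched(150_000))  # satisfies all three headings
print(headings_matched(990))      # satisfies only the "over 100" heading
```

Read this way, "over 100 articles" is technically true of every wiki on the list, which is the point being made against explicit ranges.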

Replacing the 100's with 50,000's

I'd like to propose dropping the 100 article section at the bottom (100 article encyclopedias aren't terribly useful) and adding a 50,000 group at the top. What do others think of this? →Raul654 16:27, Dec 21, 2004 (UTC)

Personally, I would strongly suggest adding in a 100,000 section after Japanese makes it to 100,000 articles. 50,000 is off from the pattern that's been set by the other sections (100, 1000, 10,000) -- [[User:Ran|ran (talk)]] 02:06, Dec 22, 2004 (UTC)
That is fine with me too. Obviously as the projects grow in size (and the number of projects grow), the ranges that are shown should grow with them. →Raul654 02:11, Dec 22, 2004 (UTC)

Since no one is really objecting, can I take this as implicit agreement? →Raul654 20:02, Dec 22, 2004 (UTC)

I object on the grounds that the 50000 category would be extremely small, and to create such a new category would be basically saying "Well, now that so-and-so has reached *our* level, we're going to move up" when it's not necessary.

You are sorely wrong about 100 article encyclopedias. This section also includes Wikipedias with, for example, 990 articles, which is indeed "terribly useful". We should give a fair chance to all Wikipedias. You are just being a pedant about the length of the mainpage. You've cut just about everything else to, like, nothing, why can't you leave this part alone? --Node 02:35, 24 Dec 2004 (UTC)

Node, the discussion now is about a 100,000 category, not a 50,000 category.

As it currently stands, the language list is misleading, because it moves up in increments of ten, giving the impression that no language has passed the 100,000 mark. Once Japanese makes 100,000 (in a month or two, probably), a 100,000 section will be a welcome addition IMO. -- ran (talk) 02:48, Dec 24, 2004 (UTC)

It says "over 10,000", not "less than 10,000".

Raul is right about the size of the template, but I can't agree with him about the exclusion of these small Wikipedias. With Japanese at 100k, there will still be only 3 WPs at that level. We should wait to do anything like that until there are at least 7 or so.

If you think it's so important for people to know the actual article count of Wikipedias, it might be included after the link to the Wikipedia, but that would take up loads more space.

As you say later, these are growing Wikipedias that need cultivation (Raul is obviously not sympathetic). To start a new category with only 3 members and ditch a preexisting category with quite a few more members is ridiculous. If you're so upset about the size of our language template, look at the one on zh:. People there are discussing whether or not to add a 100k level, but nobody's mentioned dropping the 100 level, and they actually already have a fourth level with the "languages of Chinese ethnic groups", and they manage to fit it in less than a screen. We can make the font size a point smaller for now, but it's not OK to make drastic changes without a complete consensus. If you're going to take the authority given to you by "the vote" and just trump policy so you can have your way, that will be a sad day in the history of en.wikipedia. (the last part was talking to Raul654) --Node 06:51, 26 Dec 2004 (UTC)

Agreed - drop the 100's and add 100,000's. How long until it does (Japanese wikipedia hits 100,000)? →Raul654 02:52, Dec 24, 2004 (UTC)

WPs stuck in the '100s' bin that have 990 articles are not a point. They will be reaching 1000 articles in a matter of weeks. And no, I do not believe collections of <1000 articles can properly be called 'encyclopedias'. They are mostly driven by small communities, and not of general interest. Also, the concern to reduce main page size is real: it is the page with the most hits by far, significantly contributing to our bandwidth use, and also people surfing via cellphone/PDA pay by the kb transferred.
As for the 100'000 threshold: I disagree. We cannot proceed in powers of ten indefinitely. It is easy for a project to grow exponentially at the beginning, but the growth will slow as it matures. The aim of the tiers should be to roughly categorize sizes. At the moment, 50'000 would be the more useful threshold than 100'000 (which still will only have 3 members once ja gets there, one being en itself, i.e. not a link at all). dab () 09:33, 25 Dec 2004 (UTC)

Are you able to read or contribute to any of these Wikipedias? If you were, I don't think you'd be saying that.

I have tried to surf to the mainpage with my cellphone. By far the greatest problems occur with the fact that the default template has a sidebar which isn't widely supported, and the login form isn't compatible with most phones. If you're going to worry about people browsing from handphones, you should worry about the bigger problems first and sort out the minor kinks later.

What you said about why we shouldn't have the 100'000 category in my view also applies to the 50k category. It simply isn't large enough yet to warrant separation. Catalan joining it (i.e., the first language that's not the national language of an independent nation... well, excepting Andorra) was a major milestone, but I still think we should give it some time and wait for it to even out a bit. If you look at it right now, the 1000 section is growing to be around the same size as the 100 section; hopefully with a little time the same thing will happen to the 10000 section, and /then/ perhaps we can consider a new section (not that we can't now - I just think it would be more practical to wait). --Node 06:51, 26 Dec 2004 (UTC)

btw, the Poles are nearly there, too. dab () 11:30, 25 Dec 2004 (UTC)

I disagree with removing the 100-1000 wikipedias. We should be promoting the growth of these small wikipedias by linking to them. -- ran (talk) 04:03, Dec 26, 2004 (UTC)
Exactly. But Raul only cares about en.wikipedia, more specifically the length of the mainpage. There are other ways to minimize template size. You don't have to slice off a big section... --Node 06:51, 26 Dec 2004 (UTC)

While very supportive of other language (and specialty) wikipedias, as an ex-ad person/periodical editor having blocks of dense text does not help the advertisers. A clear link to a list of detailed information actually works better (you get curious people - potential contributors - searching for a specific language, rather than causing pupil drift with walls of words.) I am in favor of removing the <1000 'pedias; as mentioned above many of the active communities are near or will soon pass the 1000 mark. The others will get there, or not, on their own merits. - Amgine 07:56, 26 Dec 2004 (UTC)

for the record, my 'native' WP would be als:. I don't need it to be on the English Main page to find it, and I have no problem with it disappearing from it until it has accumulated a decent number of articles. There is an obvious link to a complete list, and I would actually prefer not touting such a stubby project on the Main page. But I agree that it is possible to disagree with me on this, of course. dab () 11:29, 26 Dec 2004 (UTC)

Yes - I agree that that 100s list should be removed. Any Wikipedia that small is not very useful at all and the main audience for the Main Page is for readers, not writers. If somebody really wants to contribute to a language not listed on the Main Page, then they can easily find it in the complete list. I really don't care if it is replaced by a 50,000 or 100,000 list, but the latter does look a bit nicer. --mav 22:39, 26 Dec 2004 (UTC)

Saying there are 80 languages when 54 are listed is very deceptive. How can you judge their usefulness if you can't read a single one of them? --Node 22:41, 26 Dec 2004 (UTC)
Then say there are 54 or don't mention that at all. We need to make a break somewhere, and 100 is way too small. --mav 23:11, 26 Dec 2004 (UTC)
Why is 100 "too small"? I'm not the one trimming the template, nor was I the one who added the number. I'm just saying, if you're going to take languages off the template, at least update the count too. Geez. --Node 00:47, 27 Dec 2004 (UTC)


the 'Other-langs2.png' icon may be useful on the mainpage (but if so, near the top, not next to this list), but it should not be in this template, as it shows up on Main Page (text only), now. dab () 09:59, 25 Dec 2004 (UTC)

Raul654 is prematurely implementing his proposed solution without a consensus

A major problem is going on here, Raul654 is violating policy by implementing his decision without waiting for consensus. That's not acceptable. If consensus agrees with him, then it's fine if he makes those changes, but so far consensus has NOT BEEN REACHED!!! --Node 07:00, 26 Dec 2004 (UTC)

Node, you are hardly one to talk about not having consensus. Looking at the history of this page, you've gotten in a revert war with pretty much anyone who has tried to change it in any way - this includes (but is not limited to): reverting people over the use of commas in the sizes, reverting people over the use of font tags, reverting people over including the ranges, and reverting people over which sections to include. (And of course, your massive vandalism on the Toki Pona wikipedia which almost led to a Wikimedia-wide ban.) Clearly you have a demonstrable lack of ability to work with others. There are obviously a large number of people here who want to add a higher up number and drop the 100 article "encyclopedias" (if they can be called that), but (once again) you don't seem to quite grasp that your opinion is not the only one. →Raul654 07:07, Dec 26, 2004 (UTC)
A "revert war" usually takes two or three reverts, and that hasn't happened very often except two or three times with you. How many people here have expressed the desire recently to drop the 100+ wikipedias? (Note: this doesn't include Ran, who supports adding a 50k category but not removing 100+.) I am talking about major changes here, not font tags or commas or ranges. And don't bring up tokipona: - you're the one who removed it from this template citing "no original research". I added it back, since I obviously hate the Toki Pona Wikipedia so much. Your blanket insults to small Wikipedias are very harsh. Have you ever considered lightening up a little? --Node 07:18, 26 Dec 2004 (UTC)
Be bold strongly suggests otherwise. There comes a point where inclusionism must be re-evaluated. The dispute resolution guidelines also state that straw polls are not binding, and tend to be divisive. - Amgine
So I can revert him with just as much confidence?
I assume you may, to the 3RR. However, rather than expending your effort in doing so, why don't you work to improve some of the wikipedias you wish to see included? I've broken 700 articles on my personal wiki in about 3 months. I'm sure you could do similarly, diversifying the listing. - Amgine 07:23, 26 Dec 2004 (UTC)
It takes a lot of work to write even one good article. What exactly is your "personal wiki"? If you were able to break 700 articles in 3 months, my guess is that a lot of them are one or two sentences long and things like that. Do you think all the Wikipedias should do that just so they can reach 1000 articles? What I can contribute to most of these Wikipedias is limited since I don't speak most of these languages. However, I do provide them with a great deal of assistance when they ask for it and I can tell you that these people are working very hard. Why add a category with 5 members to remove a category with more members? People may get confused and think we don't even have a Wikipedia in their language. Many of the Wikipedias that take off suddenly do so because people find them on the en.wikipedia mainpage (although not the majority, of course). I'm still not sure exactly what the reason is some people have to suddenly hack off one end of the template and add another level? Why not settle for bolding 50k langs and putting "languages with over 50,000 in bold"? --Node 18:44, 26 Dec 2004 (UTC)
My opinion is MediaWiki is extremely successful, and has spawned a plethora of wonderful projects which will grow, or not, based on their own merits and not the advertising of their existence on Wikipedia (which is why my own wiki is not listed here.) Adding a listing of all possible wikipedia languages is counter-productive - it does not draw attention to a specific language wikipedia as much as it chases away potential contributors by too much detail. - Amgine 07:10, 26 Dec 2004 (UTC)


We have work very hard on our language Wikipedia. Even now this template is still tiny. Please think of us and let it stay!

I second it (from tl:). -- 22:48, 26 Dec 2004 (UTC)
Work harder and your wiki will soon pass the 1000 article threshold. Until then, your wiki will still be listed in the complete list. --mav 22:45, 26 Dec 2004 (UTC)
We are already working hardest!!!! I promise you it takes much effort. Do you use small Wikipedia? Do you know the problem?
The Main Page is for readers. Wikipedias with less than 1000 articles are only of interest to contributors, not readers. --mav 22:53, 26 Dec 2004 (UTC)
This is badly wrong. Again, do you read some small Wikipedias?
No, it's dead-on correct. 99% of our (en's) hits come from people who would never make a single edit. →Raul654 23:00, Dec 26, 2004 (UTC)
What are you talking??? I say that the smaller Wikipedias often is still useful ones.
No, a 100 article "encyclopedia" is not helpful to anyone. And to say again - if most of the people looking at the en main page are not ever going to contribute, there's no point in linking to a language where there is nothing to read (and thus the only thing to do there is to contribute). →Raul654 23:15, Dec 26, 2004 (UTC)
And you know this *how*? Again, this is a RANGE. Some of these Wikipedias have 102 articles, some have 950. Some are useful, some aren't. There is stuff to read at these Wikipedias. --Node 00:45, 27 Dec 2004 (UTC)
Your actions would look less like an attempt at coordinated manipulation in an attempt to evade WP:3RR if they weren't from anon editors with no other contributions whatsoever - David Gerard 22:51, 26 Dec 2004 (UTC)
What do you talk about? I only revert the page twice until now.

good grief. It's not like anyone wants to take away your small WPs. Yes, I do occasionally read (and edit) als:. And I do not care to have it listed on the English main page before it reaches 1000 articles. It's great that you work hard on these small projects, but ask yourself whom this hard work is for. The articles you do have will be linked by interwiki, so googlebot will find them, no problem. Apart from that, you should try to be linked from relevant sites in your language. The Main page, otoh, should be optimized for readers, not for search engines, and not for editors. dab () 12:30, 28 Dec 2004 (UTC)

In some cases, there are few sites in the relevant language. als.wikipedia is a special case because it is a "niche project" of sorts. While there are probably hundreds of Wikipedians capable of editing it, very few do because they feel their efforts are better spent at de.wikipedia (I personally don't condone such an attitude). However, all users speaking Hindi, Kannada, Tamil, Malayalam, Urdu, Vietnamese, Thai, etc. will be very much pleased to see the Wikipedia in their language linked from the mainpage because these are not "niche projects" by any measure. --Node 22:13, 1 Jan 2005 (UTC)


If we are going to have a 50 000+ section (something that I fully support) then the others should be in multiples of 5 as well: 5000+ and 500+. I think this is a much better idea, since 5000+ is significantly more useful than 1000+ but not significantly less useful than 10 000+. Also, the 500+ allows for some of the smaller wikipedias to get some deserved free publicity (if they've managed to get 500, that's quite an achievement) without having as many as a 100+ list would. It also makes the whole thing more uniform.

Compromise, people, compromise.

That's my two cents anyway.

The bellman 12:09, 2004 Dec 28 (UTC)

why not? we could drop the 10**n bins entirely and have 500+, 5000+, 50000+. would that make sense? the bins would be shifted back to 1000+, 10000+, 100000+ at some point in the future (maybe when four or five WPs reach 1E5) dab () 12:43, 28 Dec 2004 (UTC)

I could live with that. --mav 20:30, 28 Dec 2004 (UTC)
I'd like to hear other input, but I think that a 10K encyclopedia is _a lot_ better than a 5K one, at least thinking about those I've seen. (This means, I like the 10K bins better) ✏ Sverdrup 03:43, 31 Dec 2004 (UTC)


if people really care so much about being linked from here, the raising of the threshold may result in lots of 1-word articles being created on the smaller WPs, just so the article count rises above 1000. (this may already have happened on sa:.) Not that I care about this, just an observation. dab () 11:43, 29 Dec 2004 (UTC)

If that is the case, then we should remove that link from this template. --mav 14:48, 29 Dec 2004 (UTC)
Agreed. →Raul654 14:53, Dec 29, 2004 (UTC)
No. I did not mean to say people on Sanskrit created super short articles only to be linked in the >1000 section. I think they create super short articles because nobody is really fluent in Sanskrit, and they are content just to give a short definition, and maybe cut-and-paste some English bits. That they are just above 1000 articles at this moment may or may not be a coincidence. The lesson is just that article count is not really a solid measure. Sorting by number of words would make more sense. dab () 15:06, 29 Dec 2004 (UTC)
The main problem with sa.wiki is not the number of stubs, but rather the number of pages which are simply verses of the Rgveda. The majority of the pages fall into this category and should be merged into longer pages and moved to Wikisource under religious texts. When this is done, there will be perhaps 200~300 articles maximum, with perhaps 50 of them being decent articles and most of the rest being stubs or links only (some pages only contain a list of links to other related pages - and sometimes the related pages are the same way). This issue isn't being addressed well by the adminship at sa.wiki (it *is* a very cumbersome task, and it would take much more effort than it's probably worth). And you are wrong about nobody being fluent in Sanskrit. Many learned Hindus are fluent in Sanskrit, and there are now 7000 native speakers thanks to a movement based in Kerala (if you ask me, the people who speak to their kids only in Sanskrit are off their rocker...) I believe the oldest native speakers are now in their 20s but I may be wrong. There are Wikipedians fluent in Sanskrit including but not limited to Hari Prasad Nadig. --Node 22:19, 1 Jan 2005 (UTC)
while we're on this topic, I note that the accomplishment of Swedish (over 50K articles from a language with less than 10m speakers) isn't as remarkable as I had thought -- look at the Swedish wiki. No page I have been able to find (except maybe a couple linked from the main page) is over half a page long. It's 50,000 stubs! That's disgraceful.
I had been hoping to copy and translate some of the articles on the Swedish language or grammar into English and add them to the English wiki, but I couldn't find any that were nearly as informative as what .en already has! (e.g. swedish language) Sad day for Sweden. I'd like to officially put the entire swedish wiki into Template:Requests for expansion.
Seriously, isn't there some way we can count total words instead of articles, or only count articles over 1000 words, or something? This is pitiful.
Steverapaport 23:20, 29 Dec 2004 (UTC)
What are the criteria for an article? Hmm? Articles with less than 50 characters aren't counted as it is, and it's very difficult to count words in some languages. --Node 00:45, 31 Dec 2004 (UTC)

That does it. We should categorize by number of words. We can pick the numbers from http://en.wikipedia.org/wikistats/EN/Sitemap.htm

  • en 143M
  • de 54M
  • ja 25M
  • fr 22M
  • nl 11M
  • pl 10M
  • es 10M
  • it 9M
  • sv 7M
  • pt 4M
  • zh 4M
  • he 4M
  • bg, ca, eo, et, da, fi, hu, no, ro, ru, sr, sl >1M

i.e. Swedish comes 9th, not 5th. We could make tiers >1M, >2M, >4M, >8M, >16M for example. Anyone into creating an alternate template?

I created Template:Wikipedialang (word count) as an example. It turns out only 24 WPs have >1M words, so we would probably include another tier with >250k or so. But I'll only do this if the template is actually to go live. dab () 12:43, 30 Dec 2004 (UTC)
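
For anyone who wants to see how the >1M, >2M, >4M, >8M, >16M tiers proposed above would sort the listed wikis, here is a small Python sketch. It uses only the rounded figures quoted in the list (in millions of words); the function names are invented for this example, and the wikis listed only as ">1M" are left out because no figure is given for them.

```python
# Word counts in millions, as quoted from wikistats in the list above.
WORDS_M = {"en": 143, "de": 54, "ja": 25, "fr": 22, "nl": 11, "pl": 10,
           "es": 10, "it": 9, "sv": 7, "pt": 4, "zh": 4, "he": 4}
TIERS_M = [16, 8, 4, 2, 1]  # the proposed cut-offs, largest first

def tier_for(words_m):
    """Return the highest cut-off (in millions) that the count exceeds, or None."""
    for cutoff in TIERS_M:
        if words_m > cutoff:
            return cutoff
    return None

def group_by_tier(counts):
    """Map each cut-off to the list of wikis whose best tier it is."""
    groups = {cutoff: [] for cutoff in TIERS_M}
    for lang, words_m in counts.items():
        cutoff = tier_for(words_m)
        if cutoff is not None:
            groups[cutoff].append(lang)
    return groups

for cutoff, langs in group_by_tier(WORDS_M).items():
    print(f">{cutoff}M words: {', '.join(langs) or '(none listed)'}")
```

Because the quoted figures are rounded, wikis sitting right on a cut-off (pt, zh and he at 4M) fall into the >2M tier here but could land in the >4M tier with the exact numbers.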

I vote no for a few reasons here. 1) Statistically, some languages just take more words to say the same thing than do others. Documents in German, for example, will almost always be much longer than their English equivalents for a number of reasons. Documents in Chinese are often very short when compared to their English translations. 2) There is definitely no clean cutoff here. 10M includes only a handful of Wikipedias, if you start at 1M that includes too many (i.e., those which are still fairly small and should probably wait for 10k articles to be ranked on a similar tier), and if you start in between it looks confusing and uneven and ugly. 3) This will simply encourage people to use more words to say the same thing, which will turn Wikipedias into huge confuseatoriums. It is possible in any language as it is in English to extend the length of a sentence a significant amount without adding anything meaningful. --Node 00:45, 31 Dec 2004 (UTC)
I vote yes to your new template, Dab. No sense encouraging people to create high-profile wikis full of nothing. By the way I'm living in Sweden now and I promise to encourage people here to contribute real articles to sv.wikipedia.org. Right behind Italian is still not too bad considering that Italian has over 60 million speakers and Swedish has 9 million.

Steverapaport 13:03, 30 Dec 2004 (UTC)

Going by words would be a bad idea. Different languages divide words in different ways (e.g. many Native American languages may use just one long word where English uses several short ones), and in East Asian languages like Chinese and Japanese, the very concept of "word" is fluid at best, and spaces are not used to separate "words" in those languages, so it's impossible to count the number of words in those Wikipedias. -- ran (talk) 15:31, Dec 30, 2004 (UTC)

bad compared to what? If the concept of 'word' is fluid, how much more fluid is the concept of 'article'? I am aware of this problem, and it is ultimately impossible to measure the relevance of the individual WPs numerically. However, I am convinced that word count is much more reliable than the present practice of "article" count. So I argue my proposal goes from a 'very bad' idea to a 'slightly bad' idea, i.e. it is an improvement. dab () 15:48, 30 Dec 2004 (UTC)
another idea would be to go by number of characters. But this would be really unfair on the Chinese (or, each Chinese character would have to count for like four Latin characters). dab () 15:52, 30 Dec 2004 (UTC)

Perhaps "fluid" isn't strong enough a word... I'd go on to say that the concept of a "word" in Chinese and Japanese exists only among linguists, is completely non-existent and meaningless for laypeople, and certainly cannot be counted in any meaningful way in our current context (unless we get a human being to manually count every article in the Chinese and Japanese wikipedias using an agreed-upon set of definitions on what a "word" is). So talking about the number of "words" in the Chinese or Japanese wikipedia is truly like comparing apples and oranges; it's like taking a barrel of oranges and saying, "there are x apples in this barrel."

How about using bytes instead? That is surely fairer than any of the methods we have so far. -- ran (talk) 17:35, Dec 30, 2004 (UTC)
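
For a sense of what counting bytes would do, here is a small Python sketch. It assumes the text is measured as UTF-8, which is only an assumption for the example, and the sample strings are arbitrary.

```python
def sizes(text):
    """Return (characters, UTF-8 bytes) for a piece of text."""
    return len(text), len(text.encode("utf-8"))

print(sizes("encyclopedia"))  # ASCII letters: one byte per character
print(sizes("百科事典"))        # CJK characters: three bytes each in UTF-8
```

Under UTF-8 each Chinese or Japanese character would count as three Latin letters, which is not far from the "like four" weighting suggested above. A different encoding would give a different ratio.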

True, but bytes mean much less to the ordinary individual than words. Re "the concept of a 'word' in Chinese and Japanese exists only among linguists, is completely non-existent and meaningless for laypeople"—surely you're joking! If this were true, then there would be no Chinese or Japanese dictionaries. Wikistats is able to count the number of words in the ja wikipedia, and come up with a very reasonable number. I also vote yes to the new template, but would prefer it to be divided per 10M, 1M, 100k words. GeorgeStepanek\talk 00:08, 31 Dec 2004 (UTC)

Apparently you people aren't understanding what Ran is saying. While "words" themselves do exist in Chinese and Japanese (ie, lexemes), word division is notoriously difficult and not widely agreed upon. For example, in Chinese, some people insist on separating each character as a different word. Others insist on separating at the lexemic level (ie, the translation of "company" is two characters but one lexeme). Another problem is whether or not to include grammatical particles to indicate aspect, possession, etc. A similar problem occurs in Japanese, although more of the latter (do we include grammatical particles or do we make them count as separate words?). In Greenlandic, entire English sentences can often be said with between one and three words (depending on the complexity and length of the English sentence). Word division in Thai and Lao is next to impossible. --Node 00:45, 31 Dec 2004 (UTC)
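
To make the word-division problem above concrete, here is a minimal Python sketch. It is not how Wikistats counts anything; it only shows what a naive whitespace-based counter and a per-character counter report for an English sentence and a rough Chinese equivalent. The sample sentences and function names are made up for illustration.

```python
def whitespace_word_count(text: str) -> int:
    """Count 'words' the naive way: split on whitespace."""
    return len(text.split())


def character_count(text: str) -> int:
    """Count every non-whitespace character."""
    return sum(1 for ch in text if not ch.isspace())


english = "I work at a company"
chinese = "我在公司工作"  # roughly the same meaning, written without spaces

# A whitespace counter sees five English words but one Chinese "word".
print(whitespace_word_count(english), whitespace_word_count(chinese))

# A character counter sees six Chinese characters, although a lexeme-level
# split done by hand (我 / 在 / 公司 / 工作) would give four units.
print(character_count(chinese))
```

Neither number matches the lexeme-level count, which is the disagreement described above: the result depends entirely on which definition of "word" the counter assumes.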

Thanks, Node, for all of the extra info. I certainly didn't know that about Thai and Lao. :)
George: Why would I be joking? Chinese dictionaries are ordered by characters, not words. If anything, using Chinese characters as the equivalent of words (as Wikistats seems to be doing) would give Chinese (and Japanese, as well) an unfair advantage over European languages, since each Chinese "word" is made up of one or more characters. -- ran (talk) 04:27, Dec 31, 2004 (UTC)
I can well accept that linguists and other experts would have ongoing and furious debate about what is and what is not a word. However, I find the assertion that laypeople would find the concept meaningless to be somewhat bizarre. A word is what you find in a dictionary. Whether it be one character or many, each heading in the dictionary describes a distinct word.
Some languages may decline and conjugate words for time, gender, aspect etc., whereas others use additional words to convey these meanings. For example, Latin uses fewer words than English to convey the same meaning, but they both use words. In this case English would have the advantage over Latin in terms of article length. Likewise, German is a more agglutinative language than English, and may encapsulate several words in one. But Germans still understand the idea of a word. German word processors include functionality to perform word counts.
Now, I accept that it may be much harder to count the number of words in a Chinese and Japanese text than in one that's in an alphabetic language. But that's a technical issue: it doesn't mean that the concept is invalid. Counting bytes is much easier, but that metric can also be misleading. A single Chinese (Unicode) character comprising two bytes may convey as much as a single English word of 4-8 bytes.
The question is which metric best represents the size of the wikipedias, and which would mean the most to the ordinary individuals that use the wikipedias. It's a tough call, but given the revelations about the Swedish wikipedia, my money is on words rather than articles or bytes. You are free to think otherwise. GeorgeStepanek\talk 06:30, 31 Dec 2004 (UTC)
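
For anyone weighing the byte-count option discussed above, a small Python sketch shows how the same text measures differently depending on the encoding. The helper name `encoded_sizes` is invented for this example, and which encoding a given byte statistic is based on is not stated in this discussion, so treat the choice of UTF-8 and UTF-16 here as an assumption.

```python
def encoded_sizes(text: str) -> dict:
    """Return the character count and the byte length in two common encodings."""
    return {
        "characters": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte-order mark is added to the count.
        "utf-16 bytes": len(text.encode("utf-16-le")),
    }


# One common Chinese character: 3 bytes in UTF-8, 2 bytes in UTF-16.
print(encoded_sizes("字"))

# A four-letter English word: 4 bytes in UTF-8, 8 bytes in UTF-16.
print(encoded_sizes("word"))
```

So a byte count favours or penalises a script depending on the encoding, which is one more reason it is hard to treat bytes as a neutral measure.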

  • of course there are words in Japanese, and yes, even in Chinese
  • Yes, they are difficult to count. But you don't have to go to China for that. How many words is train station? two? And Bahnhof? One? How very unfair on the Germans...
  • Luckily, we don't need to come up with an exact, reliable number. This is about rough categorization. And the unfairness will consist in some WPs being left in the lower tier a few weeks longer than others.
  • comparing the number of articles, words, and bytes in zh and ja, I must say that the word count seems quite credible, and certainly good enough for our purposes.
  • the fairest measure would probably be "size of the gzipped article text". But such an approach is certainly too esoteric for the main page statistics directed at the general reader

dab () 09:04, 31 Dec 2004 (UTC)
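
The "size of the gzipped article text" measure mentioned in the list above could be sketched as follows in Python, using the standard-library `gzip` module. The function name and the choice of UTF-8 before compression are assumptions made for this illustration, not part of any existing statistics tool.

```python
import gzip


def gzipped_size(text: str) -> int:
    """Return the number of bytes the text occupies after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


# Highly repetitive text compresses to far less than its raw size, so
# padding an article with repeated filler adds little to this measure.
filler = "word " * 1000
print(len(filler.encode("utf-8")), gzipped_size(filler))
```

The appeal of this measure is that it approximates information content rather than raw length, but as noted above it is hard to explain to a general reader.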

"However, I find the assertion that laypeople would find the concept meaningless to be somewhat bizarre. A word is what you find in a dictionary. Whether it be one character or many, each heading in the dictionary describes a distinct word." George: Have you ever used a Chinese dictionary before? Listen to a native Chinese speaker on this one: Chinese dictionaries are ordered by characters, not by words. Larger dictionaries do have subheadings for words/phrases, but that is not what Wikistats is counting; Wikistats counts characters, which as I have said again and again, are not words. This may seem "bizarre" to you but it's precisely how it works in Chinese.

"But Germans still understand the idea of a word. German word processors include functionality to perform word counts." The German analogy simply is not valid. Germans still recognize the concept of a word with spaces between letters, which word processors can count. Chinese linguists also recognize words (in a vague way), but Chinese laypeople don't separate words out with spaces; all characters are crammed together regardless of word boundaries. So Chinese word processors count characters; they can't count words.

Ultimately it comes down to this: I'm opposed to Wikipedia spreading patent falsehoods and sloppy generalizations. At least what we have now is factually accurate in what it attempts to measure; the new template is simply incorrect in what it claims to do, and the numbers it gives are hence meaningless. Even Microsoft Word's word count feature knows better; it gives "words" and "East Asian characters" under separate headings. -- ran (talk) 23:38, Dec 31, 2004 (UTC)

I moved your comments to what I felt was a more appropriate subsection to continue this debate. I hope this is OK with you. ran, perhaps I haven't expressed myself clearly enough. I agree with your point that Wikistats currently may not be providing accurate wordcounts for Chinese and Japanese, but I do not see this as a fundamental problem. It should certainly be fixed, not least to correct the information provided by Wikistats. (A simple solution would be to introduce a scaling factor for each language based on the average number of characters per word for that language.) I suggest you take this up with the stats team, to correct these errors at source. For the purposes of this template, I feel that it's OK to take the Wikistats info as given. GeorgeStepanek\talk 23:58, 31 Dec 2004 (UTC)
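
The scaling-factor idea in the comment above can be sketched in a few lines of Python. The factors in `CHARS_PER_WORD` are placeholders invented for this example, not measured averages, and the function name is hypothetical; real values would have to come from whoever maintains the statistics.

```python
# Hypothetical average characters per word, for illustration only.
CHARS_PER_WORD = {"zh": 1.5, "ja": 2.0}


def estimated_words(lang: str, raw_count: int) -> int:
    """Scale a raw per-character count down to an estimated word count.

    Languages without an entry are assumed to be counted in words already.
    """
    factor = CHARS_PER_WORD.get(lang, 1.0)
    return round(raw_count / factor)


# A raw count of 3,000,000 characters becomes an estimate of 2,000,000 words
# under the placeholder factor of 1.5 for zh.
print(estimated_words("zh", 3_000_000))
```

The estimate is only as good as the factor, so this shifts the disagreement from "what is a word" to "what is the average word length", but it does keep all languages on one scale.
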
The Cambridge Encyclopaedia of Language states that "[Chinese] characters in fact often represent parts of words (morphemes) as well as whole words". They agree that there is no simple one-to-one correspondence between characters and words, but they still use the word 'word'. It is not a meaningless concept for logographic languages such as Chinese. For polysynthetic languages (such as Eskimo and Mohawk) the issue is open to debate, but these languages are not widely spoken, and are not likely to appear on this template. GeorgeStepanek\talk 00:54, 1 Jan 2005 (UTC)
You have still not addressed the problem that different languages take different numbers of words to say the same thing, regardless of polysyntheticity. A paragraph in German will almost always be longer than its English translation, Chinese texts tend to be shorter than their English equivalents, and the same is true throughout the language spectrum, with polysynthetic languages at one extreme, and isolating languages at the other. Languages like Arabic include as part of a word what we would separate - the translations for "in", "for", "and", "the", etc are all considered part of the word they directly modify. (and the human = wal'insaan = wa-(a)l-insaan) In a single sentence or a single paragraph this may not make a huge difference, but in an entire Wikipedia, this makes a very significant difference. --Node 21:47, 1 Jan 2005 (UTC)
it does, but the effect turns out to be smaller than the variation of article lengths. The word count is of course not accurate, and there is a systematic error in comparing different languages. But take a minute to compare article number, word count, and byte count of the various WPs. You will note that the word count is reasonable, and more reliable than article count. That's all. dab () 22:41, 1 Jan 2005 (UTC)
No consensus has yet been reached. Why are you putting the new template in?
Look, dab: if there's one thing Wikipedia shouldn't do, it shouldn't be giving misinformation, especially about itself. To give the number of Chinese characters as the number of words is precisely that: you might as well give the number of English syllables as the number of English words. If this kind of sloppiness were found anywhere else in Wikipedia it would have been torn to pieces. And yet you're inserting it right into the main page, without any consensus whatsoever. -- ran (talk) 19:40, Jan 2, 2005 (UTC)

Okay, how about this:

Put a footnote at the bottom stating that the counts for Chinese and Japanese refer to characters, not words. -- ran (talk) 19:53, Jan 2, 2005 (UTC)

Suggested modifications to the word count template

OK, how about something like this...

Complete list · Multilingual coordination · Start a Wikipedia in another language

I think this gives a good indication of the comparative sizes of the wikipedias. It is also potentially very concise. GeorgeStepanek\talk 10:09, 31 Dec 2004 (UTC)

good, down to maybe 5M. Smaller ones should still be lumped together alphabetically, because there will be no end to updating and switching otherwise. i.e. we would have
1. en (143).... sv (7)
2. WPs between (say) 500k and 5M, alphabetically.
dab () 11:00, 31 Dec 2004 (UTC)
Why not resequence only if a language jumps up to the next million words: for articles under 5M words this won't happen all that often. I'm not sure how to represent 500k-900k. How about e.g. Igpay Atinlay/Pig Latin (0.7)? GeorgeStepanek\talk 11:51, 31 Dec 2004 (UTC)
I think the information whether a particular WP has 0.6 or 0.7 million articles is not of sufficient interest to sort the list by it. I think we should round to the million, and list languages within the same million alphabetically. But before we get down to figuring this out in detail (feel free to edit the suggestion template, Template:Wikipedialang (word count)) we need a consensus that we actually want to sort by word count rather than by article count. Maybe we should announce this idea on Talk:Main Page, to see if it gets shot down immediately. dab () 11:58, 31 Dec 2004 (UTC)
I have edited your template proposal to include WPs down to 500k in my suggested format. I think this is a good lower bound, as it's the size of a modest pocket encyclopaedia. The use of the ½ symbol is a little unusual, but I think that it conveys the information sufficiently well. GeorgeStepanek\talk 22:08, 31 Dec 2004 (UTC)
I like it, especially the 1/2 idea. We should include 1/4 (i.e. all WPs with >250k words) and then give it a try. dab () 10:49, 1 Jan 2005 (UTC)
Done. GeorgeStepanek\talk 22:40, 1 Jan 2005 (UTC)
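
The ordering rule worked out in this thread (whole millions, then ½ and ¼ tiers, alphabetical within a tier, nothing below 250k) can be written down as a short Python sketch. This is only an illustration of the rule, not the template's actual mechanism. Rounding down to the whole million is an assumption based on the "jumps up to the next million" remark, and apart from the rounded en and sv figures quoted above, the language codes and counts in `sample` are made up.

```python
from typing import Optional


def tier(words: int) -> Optional[float]:
    """Return the tier in millions of words, or None if below 250k."""
    if words >= 1_000_000:
        return float(words // 1_000_000)  # assumption: round down
    if words >= 500_000:
        return 0.5
    if words >= 250_000:
        return 0.25
    return None


def label(t: float) -> str:
    """Format a tier as shown in the proposal: whole number, ½ or ¼."""
    return {0.5: "½", 0.25: "¼"}.get(t, str(int(t)))


def sort_wikipedias(counts: dict) -> list:
    """Order by tier (largest first), then alphabetically within a tier."""
    listed = [(name, tier(n)) for name, n in counts.items() if tier(n) is not None]
    listed.sort(key=lambda item: (-item[1], item[0]))
    return [f"{name} ({label(t)})" for name, t in listed]


# Made-up word counts; "ww", "xx", "yy" and "zz" are placeholder codes.
sample = {"xx": 700_000, "sv": 7_000_000, "zz": 100_000,
          "en": 143_000_000, "yy": 300_000, "ww": 7_500_000}
print(sort_wikipedias(sample))
```

Because entries only move when they cross a tier boundary, the list needs resequencing far less often than a list sorted by exact count.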

In answer to User:node I'm not trying to define how many characters makes an article, I'm referring to the utility/completeness of the thing. Any article on the English Wikipedia under a certain size gets a "stub" tag, and people are asked to expand it. See stub. Steverapaport 22:46, 1 Jan 2005 (UTC)

The template looks very nice now. I especially like the concise way of providing additional information (the 143M word count of en: is not referred to on the Main Page, at present), and the natural way of having 250k, 500k, and then 1M tiers. I suggest we give it a try on the Main Page, together with a short summary of the things discussed here on Talk:Main. dab () 09:32, 2 Jan 2005 (UTC)

whoa, but it leaks. see this paragraph is all in small fontsize now (until the template is fixed, that is). dab () 19:35, 2 Jan 2005 (UTC)