
Which Language Has The Best Counting?

from linguistics on 2020-03-25

BEST SITE

BEST SITE

BEST SITE

FACT: www.languagesandnumbers.com is the best website on the Internet.

FACT: A resource for the numbering systems of over 200 languages (and 71 conlangs) is the most important creation since pineapple on pizza.

FACT: I have no idea why I’m so excited to have found a website with the numbering systems of over TWO HUNDRED languages, but I didn’t realise until now that this was something I wanted.

Today we’re finding out which language has the best counting. But how do you decide which language has the best counting?

You can’t.

Each language counts in its own way. They all have unique quirks, and none of them is better than any other.

But that would make this post too short, so instead we’re going to base it on string length. For each language(1), we’re going to find the longest string we can make (e.g. in English, “seventy-five” is 12 characters long) and compare it with the other languages. The language whose longest string is the shortest wins, and is crowned “Language with the Best Counting (along with all the other languages who count just as good, but this one has the shortest string length in our scoring system)”.

(1) Only the ones I’m interested in instead of doing all of them. Not going to make that mistake again.

Although, we’re only going to compare Indo-European languages that use the Latin alphabet, because I don’t think it’s fair to compare the length of the letter “g” to whatever this is: “सौ”.

Another problem we’ll have is the simple fact that numbers go on forever, so we’ll be setting an arbitrary limit of something like… I dunno, maybe 999,999,999.

Also, I’m not going to write a program to do this. I’d actually like to learn how to count, and a program would ruin the fun of it (because apparently I do this stuff for fun now). We’ll be including all spaces, commas, and hyphens in our counts as well.

Alright(2), let’s start with English as a warm-up.

(2) That’s four paragraphs in a row that start with “A”. Completely unintentional, but I searched for the acronym “AAAA” in case it meant something. One of the results was the “Anonymous Acronym Abuse Association”. Just something to think about.

English

It should be obvious, but our limit doesn’t give the longest string, because the start of it, “nine hundred and ninety-nine million” (36), is shorter than “nine hundred and ninety-seven million” (37). We can be sure, however, that the answer will be in the hundreds of millions, since a number with more digits needs more words.

In English, all three-digit numbers are constructed by stating the hundreds, then the word “and”, then the tens and the units. E.g. 111 is “one hundred and eleven” (22). We can then add a multiplier like “thousand” or “million” to make “one hundred and eleven thousand”, “one hundred and eleven million”, etc. This means we just need to find the longest three-digit string and repeat it three times.

The hundreds digit is always spelt as a single unit word from 1-9. The longest is a tie between “seven”, “eight”, and “three”, which are all 5 characters. I’m going with “eight” because that’s the day I was born.

Next, the tens are formed by adding the “-(t)y” suffix to the end of the multiplier digit root, with the exception of ten. From twenty-one to ninety-nine, the tens and units are joined with a hyphen. So we can ignore any number below 21, because they are all only one word, and I’m not sure if you know this, but two is bigger than one. The more you know, yeah?

The longest ten is actually just “seventy” (7) this time, so combined with our unit it makes “seventy-eight” (13).

Now, putting it all together, we get 878,878,878: “eight hundred and seventy-eight million, eight hundred and seventy-eight thousand, eight hundred and seventy-eight” (114). We could also have used 777,777,777 or 373,373,373 to get the same length.
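For anyone who would rather check the counting than trust it, here is a minimal Python sketch of the English rules above. It assumes British usage (“and” after the hundreds) and the comma style used in this post; the function names are made up for illustration, and it ignores oddities like 1,005 (“one thousand and five”).

```python
UNITS = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_100(n: int) -> str:
    """1-99: one word below 20, otherwise 'ten-unit' joined by a hyphen."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + ("-" + UNITS[unit] if unit else "")


def below_1000(n: int) -> str:
    """1-999: '<unit> hundred and <rest>' (British style)."""
    hundreds, rest = divmod(n, 100)
    if not hundreds:
        return below_100(rest)
    words = UNITS[hundreds] + " hundred"
    return words + (" and " + below_100(rest) if rest else "")


def english(n: int) -> str:
    """1-999,999,999, with a comma after the million and thousand groups."""
    parts = []
    for scale, name in ((1_000_000, " million"), (1_000, " thousand"), (1, "")):
        group, n = divmod(n, scale)
        if group:
            parts.append(below_1000(group) + name)
    return ", ".join(parts)


print(max(len(below_1000(n)) for n in range(1, 1000)))  # 31
print(len(english(878_878_878)))                         # 114

# The same number without the commas and the word "and"
plain = english(878_878_878).replace(",", "").replace(" and ", " ")
print(len(plain))                                        # 100
```

The brute-force `max` over 1-999 is what justifies repeating one three-digit group: no three-digit number beats 31 characters.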

French

I’m going to try and summarise the explanations, so there will probably be a lot missing. If you get confused just go to the website and check for yourself.

Digits and numbers from zero to sixteen are specific words, and the longest unit (1-9) is 4: “quatre” (6).

Seventeen to nineteen are regular numbers, formed from ten followed by a hyphen and the unit. This already beats English, where we only got two-word numbers from 21 upwards. Our new numbers are: “dix-sept” [10+7], “dix-huit” [10+8], and “dix-neuf” [10+9], all the same length of 8. These are also tied with 14: “quatorze” (8) for the longest.

We are going to skip 20-50, because from 61-99 base 20 is used, and we are following our “two words are bigger than one” rule. This makes 70: “soixante-dix” [60+10], 80: “quatre-vingts” [4*20], and 90: “quatre-vingt-dix” [4*20+10]. The longest of these is “quatre-vingt-dix”(3). Adding a unit or a teen on the end, the longest numbers we can make are 94, 97, 98, and 99, but we’ll use 99: “quatre-vingt-dix-neuf” (21).

(3) Although it could have been one of the eighties (“quatre-vingt-” plus a unit) if any of the numbers from one to nine were longer than those from eleven to nineteen.

Now we put the longest unit in front of “cent” to get the hundreds, making 400: “quatre cent” (11), and adding our tens gives 499: “quatre cent quatre-vingt-dix-neuf” (33)(4).

(4) Au fait (by the way), you should have noticed the pattern I’m using to reference numbers by now. It will mostly be: [number in digits]: “[number in words]” (length of number in words)

With that, we can add the multipliers for a million, “millions”, and for a thousand, “mille”, to make 499,499,499: “quatre cent quatre-vingt-dix-neuf millions quatre cent quatre-vingt-dix-neuf mille quatre cent quatre-vingt-dix-neuf” (116).
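The French nineties can be checked the same way. This is a minimal sketch rather than a full French speller: it only builds 90-99 from the pieces described above, then assembles 499,499,499 with the spacing used in this post (spaces around “cent”, “millions” and “mille”). The variable names are illustrative.

```python
# 11-19 as listed above: 17-19 are "dix-" plus a unit.
TEENS = {11: "onze", 12: "douze", 13: "treize", 14: "quatorze", 15: "quinze",
         16: "seize", 17: "dix-sept", 18: "dix-huit", 19: "dix-neuf"}

# 90-99 are 4*20 + (10..19): "quatre-vingt-" followed by "dix" or a teen.
nineties = {90: "quatre-vingt-dix"}
for teen, word in TEENS.items():
    nineties[80 + teen] = "quatre-vingt-" + word

longest = max(len(word) for word in nineties.values())
winners = sorted(n for n, word in nineties.items() if len(word) == longest)
print(longest, winners)  # 21 [94, 97, 98, 99]

# 499 is "quatre cent" + 99, then the group is repeated with the multipliers.
group = "quatre cent " + nineties[99]
number = f"{group} millions {group} mille {group}"
print(len(group), len(number))  # 33 116
```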

Surprisingly, that’s only two characters more than our English string. But in English we had to put in commas and add the word “and” after our hundreds. Out of curiosity, I checked how long the English string would be without those, and it came to a nice round 100.

German

This is where the real challenge starts, as we’ve run out of languages I already know how to count in, meaning you should probably expect some mistakes (but hopefully not).

We’re doing German first because I’ve never really had a close look at a Germanic language (besides English, but that doesn’t count), so I think it would be interesting.

Keeping with the process we used for English and French, our longest unit is 7: “sieben” (6).

The tens are formed by adding the suffix “-zig” to the end of the multiplier digit (slightly shortened in 60: “sechzig” and 70: “siebzig”), except for 10: “zehn” (4), 20: “zwanzig” (7), and 30: “dreißig” (7). Every ten from 20 to 90 is 7 characters long, so we’ll use 90: “neunzig” (7), which is 9: “neun” (4) with the suffix “-zig”.

Tens and units are joined with the word “und” (and), but the unit is said before the ten. So to make 97, we combine “sieben” und “neunzig” to make “siebenundneunzig” (16).

Hundred (“hundert”) and thousand (“tausend”) are not separated from the other numbers by a space. Like with the tens, the unit is said before the multiplier, so 700 is “siebenhundert” (13) and 797 is “siebenhundertsiebenundneunzig” (29).

Millions (and anything bigger) are like hundreds and thousands, but they are separated by spaces. So now we can make 797,797,797: “siebenhundertsiebenundneunzig Millionen siebenhundertsiebenundneunzigtausendsiebenhundertsiebenundneunzig” (105).
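Here is a minimal sketch of the German rules for 1-999, assuming the spellings above plus the fact that a standalone 1 is “eins” while compounds use “ein”. The function names are hypothetical; the brute-force check at the end confirms that 797 is one of the longest three-digit numbers.

```python
UNITS = ["", "ein", "zwei", "drei", "vier", "fünf",
         "sechs", "sieben", "acht", "neun"]
TEENS = ["zehn", "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn",
         "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig",
        "sechzig", "siebzig", "achtzig", "neunzig"]


def below_100(n: int) -> str:
    """1-99; the unit comes before the ten, joined with "und"."""
    if n == 1:
        return "eins"  # standalone 1; compounds use "ein"
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, unit = divmod(n, 10)
    return (UNITS[unit] + "und" if unit else "") + TENS[tens]


def below_1000(n: int) -> str:
    """1-999, written as one word with no spaces."""
    hundreds, rest = divmod(n, 100)
    head = UNITS[hundreds] + "hundert" if hundreds else ""
    return head + (below_100(rest) if rest else "")


group = below_1000(797)
number = f"{group} Millionen {group}tausend{group}"
print(group, len(group))  # siebenhundertsiebenundneunzig 29
print(len(number))        # 105
print(max(len(below_1000(n)) for n in range(1, 1000)))  # 29
```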

Finnish

Finnish is actually a Uralic language instead of Indo-European. But I’m the one who created the rules, so I’m the one who can break them. And fun fact: no numbers are written with a space in Finnish, so if German made you squint, you’d better get your glasses.

The longest units are 7: “seitsemän” (9) and 8: “kahdeksan” (9).

The tens are formed by adding the “-kymmentä” suffix to the digit (10 on its own is “kymmenen”). Numbers from twenty-one to ninety-nine are formed by saying the ten, then the digit.

So our longest ten is 80: “kahdeksankymmentä” (17), and we can make 88: “kahdeksankymmentäkahdeksan” (26).

Hundreds (“sataa”), thousands (“tuhatta”), and millions (“miljoonaa”) are all written the same way, with the unit first and then the multiplier. So 800 is “kahdeksansataa” (14) and 888 is “kahdeksansataakahdeksankymmentäkahdeksan” (40).

Now we can make 888,888,888: “kahdeksansataakahdeksankymmentäkahdeksanmiljoonaakahdeksansataakahdeksankymmentäkahdeksantuhattakahdeksansataakahdeksankymmentäkahdeksan”(5) (136).

(5) I am not going to fix this, it is hilarious.

Not only is that the longest string (or single word) so far, but it’s also the biggest number.
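The same approach works for Finnish, where everything is glued together. This sketch only spells 1-999 and assumes that 10 is “kymmenen”, 11-19 are the unit plus “-toista”, and 100 on its own is “sata”; the function names are made up for illustration.

```python
UNITS = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
         "kuusi", "seitsemän", "kahdeksan", "yhdeksän"]


def below_100(n: int) -> str:
    """1-99, all written as one word."""
    tens, unit = divmod(n, 10)
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "kymmenen"
    if n < 20:
        return UNITS[unit] + "toista"  # 11-19: unit + "toista"
    return UNITS[tens] + "kymmentä" + UNITS[unit]  # the ten, then the digit


def below_1000(n: int) -> str:
    """1-999, with the unit before "sataa"."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        head = ""
    elif hundreds == 1:
        head = "sata"  # 100 on its own
    else:
        head = UNITS[hundreds] + "sataa"
    return head + below_100(rest)


group = below_1000(888)
number = group + "miljoonaa" + group + "tuhatta" + group
print(len(group), len(number))      # 40 136
print(len(number.encode("utf-8")))  # 139: each "ä" takes two bytes
print(max(len(below_1000(n)) for n in range(1, 1000)))  # 40
```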

Welsh

The traditional counting system used in Welsh is vigesimal, i.e. based on twenties, as in the French numerals for 60-99. It is still used to express ages and years, but it has largely been replaced by the decimal system, which is easier for English speakers to learn, so we are going to use that.

There is also some syntactically and phonologically triggered variation in the form of numerals, which I think is interesting, so we’ll go through it.

There are masculine and feminine forms of the numbers 2: “dau” (3) and “dwy” (3), 3: “tri” (3) and “tair” (4) and 4: “pedwar” (6) and “pedair” (6).

5: “pump” (4) and 6: “chwech” (6) also have reduced forms when followed directly by a noun: “pum” and “chwe” respectively.

To keep things simple, we’re sticking to the masculine numbers. So our longest unit is 4: “pedwar” or 6: “chwech” (both 6).

Tens are formed by putting 10: “deg” (3) after the unit, separated by a space. Note that in 60: “chwe deg” (8) we are using the reduced form of “chwech”, because “deg” is a noun. Because of this, 40: “pedwar deg” (10) is now the longest ten, and we can make 46: “pedwar deg chwech” (17).

For some reason, the website shows the tens without a space (so “chwe deg” is written “chwedeg”). I don’t think this is correct, as all the other resources I’ve found on Welsh numbering have the space. And they even say that the tens are made the same way as the numbers 11-19, which have spaces.

Anyway, our longest hundred is 400: “pedwar cant” (11), and it goes before the tens, making 446: “pedwar cant pedwar deg chwech” (29).

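Now we can add the thousands (“mil”) and millions (“miliynau”) to make 444,444,446: “pedwar cant pedwar deg pedwar miliynau pedwar cant pedwar deg pedwar mil pedwar cant pedwar deg chwech” (102). The first two groups end in 4 rather than 6 because “miliynau” and “mil” are nouns, so a 6 right before them would be reduced to “chwe”.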
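The reduced forms are what make 444,444,446 beat 446,446,446, and a small brute-force search over the digits shows it. This sketch uses masculine forms only and deliberately ignores consonant mutations (300 is really “tri chant”), so it is only good for ranking digit choices, not for spelling every Welsh number correctly. The function names are hypothetical.

```python
# Masculine forms only; consonant mutations are ignored on purpose.
UNITS = {1: "un", 2: "dau", 3: "tri", 4: "pedwar", 5: "pump",
         6: "chwech", 7: "saith", 8: "wyth", 9: "naw"}
REDUCED = {5: "pum", 6: "chwe"}  # used directly before a noun


def unit(d: int, before_noun: bool) -> str:
    """Return the digit word, reduced if a noun follows it."""
    return REDUCED.get(d, UNITS[d]) if before_noun else UNITS[d]


def group(h: int, t: int, u: int, before_noun: bool) -> str:
    """Hundreds and tens digits always precede a noun ("cant", "deg");
    the last digit only does if "mil" or "miliynau" follows the group."""
    return f"{unit(h, True)} cant {unit(t, True)} deg {unit(u, before_noun)}"


digits = range(1, 10)
triples = [(h, t, u) for h in digits for t in digits for u in digits]
best_noun = max((group(*d, True) for d in triples), key=len)
best_last = max((group(*d, False) for d in triples), key=len)

number = f"{best_noun} miliynau {best_noun} mil {best_last}"
print(best_noun)  # pedwar cant pedwar deg pedwar
print(len(group(4, 4, 6, True)), len(group(4, 4, 6, False)))  # 27 29
print(len(number))  # 102
```

For the last group, 444 and 446 tie at 29 characters, so `max` simply returns the first one it meets; the total length is 102 either way.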

Czech

Our last language is Czech. I’ve tried to pick a few languages that I don’t think my readers would have any experience with, so you can’t nitpick me.

The longest units are 1: “jedna” (5), 4: “čtyři” (5), and 9: “devět” (5). The digits one and two also have gendered forms: “jeden/jedna/jedno” and “dva/dvě/dvě” (masculine/feminine/neuter).

Tens are formed by adding ten (“cet/desát”) to the end of the multiplier digit root. But for the first time, the longest ten is a number in the teens: 19: “devatenáct” (10), which is made by suffixing “-náct” (-teen).

Hundreds (“sto/stě/sta/set”), thousands (“tisíc/tisíce”) and millions (“milión/miliony/milionů”) all follow the same rule, and are put after the multiplier.

This makes 919,919,919: “devět set devatenáct milionů devět set devatenáct tisíc devět set devatenáct” (76).


That’s every language I can be bothered to learn how to count in. But before we wrap things up, I’d like to talk about character encoding.

As we all know, computers do stuff in terms of “0”s and “1”s, where each “0” or “1” is called a “bit”. A group of 8 bits makes a “byte”. Traditionally, each character you see on a screen took up one byte. This is in accordance with the “American Standard Code for Information Interchange” (ASCII), a character set that specifies which byte represents which character.

In ASCII, the binary “01000001” represents an uppercase “A”, and a space character is also only one byte: “00100000” in binary. The problem with ASCII was that it only included the characters of the English alphabet.

So Unicode was created to handle text in most of the world’s writing systems. Specifically, most people use UTF-8, an encoding that represents each Unicode character in one to four bytes.

And since UTF-8 is backwards compatible with ASCII, our uppercase “A” is still only one byte, but the letter “ě” is two bytes and the character “な” is three bytes.
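These bits and bytes can be inspected from Python 3 (3.8 or later, for the separator argument to `bytes.hex()`). A quick sketch:

```python
# ASCII gives each character a number; show it as an 8-bit binary string.
print(format(ord("A"), "08b"))  # 01000001
print(format(ord(" "), "08b"))  # 00100000

# UTF-8 keeps ASCII characters at one byte and uses more for everything else.
for ch in ["A", "ě", "な"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex(" "))
# A 1 41
# ě 2 c4 9b
# な 3 e3 81 aa
```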

We have a few characters with accents in our strings, so along with counting the characters in each string, I’d also like to count its bytes.

You can do this with a Python 2 REPL and the len function, which (for reasons I’m not bothered to get into) counts the number of bytes in a string by default. I’ve actually been using it this entire time to count, applying a decode function to the string first so that it only counts the characters, which looks like this: len('sedmdesát'.decode('utf-8')).
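The decode call above is Python 2 syntax. In Python 3, strings are Unicode by default, so `len()` counts characters directly and you encode to UTF-8 to count bytes. As a sketch, this recomputes the table below from the strings in this post; the `ROWS` list is just those strings copied in.

```python
# The six winning strings from this post, as plain Python 3 strings.
ROWS = [
    ("English", "878,878,878",
     "eight hundred and seventy-eight million, eight hundred and seventy-eight "
     "thousand, eight hundred and seventy-eight"),
    ("French", "499,499,499",
     "quatre cent quatre-vingt-dix-neuf millions quatre cent "
     "quatre-vingt-dix-neuf mille quatre cent quatre-vingt-dix-neuf"),
    ("German", "797,797,797",
     "siebenhundertsiebenundneunzig Millionen "
     "siebenhundertsiebenundneunzigtausendsiebenhundertsiebenundneunzig"),
    ("Finnish", "888,888,888",
     "kahdeksansataakahdeksankymmentäkahdeksanmiljoonaa"
     "kahdeksansataakahdeksankymmentäkahdeksantuhatta"
     "kahdeksansataakahdeksankymmentäkahdeksan"),
    ("Welsh", "444,444,446",
     "pedwar cant pedwar deg pedwar miliynau pedwar cant pedwar deg "
     "pedwar mil pedwar cant pedwar deg chwech"),
    ("Czech", "919,919,919",
     "devět set devatenáct milionů devět set devatenáct tisíc "
     "devět set devatenáct"),
]

# In Python 3, len() on a str counts characters; encode first to count bytes.
print(len("sedmdesát"), len("sedmdesát".encode("utf-8")))  # 9 10

print(f"{'Language':<9}{'Number':>12}{'Characters':>12}{'Bytes':>7}")
for language, number, words in ROWS:
    chars = len(words)
    size = len(words.encode("utf-8"))
    print(f"{language:<9}{number:>12}{chars:>12}{size:>7}")
```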

Finally, here’s the table with every language’s score in characters and bytes.

Language  Number       Characters  Bytes
English   878,878,878  114         114
French    499,499,499  116         116
German    797,797,797  105         105
Finnish   888,888,888  136         139
Welsh     444,444,446  102         102
Czech     919,919,919  76          84

Admittedly, there’s pretty much no difference between the character and byte counts,(7) but we can now be confident that Czech is the “Language with the Best Counting (along with all the other languages who count just as good, but this one has the shortest string length in our scoring system)”.

(7) It would be a different story if we included languages whose characters always take more than one byte, like Japanese or Korean.