lproven | A little parable about design choice and parochiality

As I've doubtless bored people silly with for years, another of my non-computery interests is languages and linguistics. One subsection of this is conlangs: constructed languages, as opposed to natural languages.

There are 2 conlangs that virtually anyone moderately well-read has heard of, even prior to the Lord of the Rings films: Esperanto and Klingon.

There is a pervasive urban myth that winds me up about their relative success: that more people speak Klingon fluently than Esperanto. This is very far off the mark: there are a hundred or so fluent Klingon speakers and several million Esperanto speakers, including several thousand native Esperanto speakers, the children of parents who share no mutual natural language and were thus raised speaking Esperanto.

It is a testament to the success of Esperanto that it is so well-known, even if it is widely regarded as a failure.

What is less well-known is that Esperanto is just one of dozens of conlangs. It succeeded the previous most successful one, Volapük, a late-19th century conlang that at one time had many thousands of speakers.

One of the reasons that it's done so well is that Esperanto is a much easier language to learn than any natural language. It is very regular: once you know how to make one word form from another (dog -> dogs, I run -> he runs), you know how to make any.
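For the computery readers, here is roughly what that regularity amounts to: a minimal sketch, assuming only two rules of Esperanto (nouns end in -o and add -j for the plural; infinitives end in -i and take -as in the present tense for every person). The function names are mine, and the sketch ignores the accusative and everything else.

```python
def plural(noun):
    # Every noun ends in -o and forms its plural by adding -j.
    return noun + "j"


def present(infinitive):
    # Every infinitive ends in -i; the present tense swaps it for -as,
    # and the verb does not change with the person.
    return infinitive[:-1] + "as"


dogs = plural("hundo")              # dog -> dogs
i_run = "mi " + present("kuri")     # I run
he_runs = "li " + present("kuri")   # he runs
```

Two one-line rules with no table of exceptions: that is the whole point. Try writing the English equivalent and you immediately need special cases for "children", "mice" and "he is".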

But there is much in Esperanto that is complex. It has grammatical cases, gender-specific words, pronouns and a fairly complex system of prepositions.

It is far simpler than Volapük, though, which has the same 4 grammatical cases as German: accusative, nominative, dative and genitive; nouns decline and verbs inflect according to the gender of the speaker and subject and so on.

This is because they were both designed by Europeans for Europeans. They knew English, they knew it didn't really use cases, and they saw the ambiguity and confusion this could cause; and cases are really easy anyway, so put them in, because they make things clearer and less ambiguous.

Europeans knew these were complexities, thus the creation of another conlang, Latino sine flexione, a regularised form of uninflected Latin designed for use in scientific papers.

But the point is that when all you know is various European languages, including even non-Indo-European ones such as Hungarian or Finnish, things like case and inflexion seem obvious. Of course you need to have stuff like this! I have read that it came as a profound shock to Johann Schleyer, the creator of Volapük, to learn that some languages completely did without such things.

Currently, I meet 2 German girls on a daily basis. One, and a friend of the other, have learned much of their English in recent years through living here, after being poor at it at school and hating it.

It profoundly shocks them that native English speakers no longer clearly distinguish between "who" and "whom", for instance. Of course this is important! In German, this is a profound difference. The fact that native Anglophones now ignore it really shakes them. But to us, it's not important.

But the thing is, looking at the structure of conlangs such as Esperanto or Volapük, you can tell that the people that created them had never studied Japanese or Bislama, for instance.

Japanese is famed for its poetry - think of the world-famous haiku style - for its drama (kabuki theatre, noh plays), for its written culture - the pervasive manga and animé that are now profoundly influencing Western art and print and other media - and so on. Everyone knows the Japanese produce books, literature, and a lot of culture that revolves around the written and spoken word.

Well, here are some facts about Japanese that would have blown the minds of the creators of Esperanto and Volapük.

It has no plurals. One dog, two dog, ten dog.

It has no pronouns, by and large. "I am Japanese" -> "Japanese is." "He is Japanese" -> "Japanese is." "They are Japanese" -> "Japanese is." There's a special phrase to indicate that you're talking about yourself, and that is largely it.

It has no articles: no words for "the", "a" or anything like it. "A book" is the same as "the books".

It has no grammatical gender, no cases, little to no inflection, no noun-adjective agreement.

And yet, this is a language that is so very expressive that it created the world's most famously abbreviated form of poetry, an entire verse which in 17 syllables formally must not only cover its subject but also work in a nature reference and a reference to a season.

There are lots of ways in which Japanese works in plenty of complexity, from its 5 to 9 levels of formal politeness to its half a dozen totally separate counting systems for living things, dead things, long thin things, round flat things, etc. etc. (We have some of those, too: you'd never fill a printer with "200 papers", you'd use "200 sheets of paper", or grab "2 or 3 pieces of paper" to write on - different systems for large and small quantities! There are no singular trousers or scissors, only pairs of them, and so on.)

Japanese has more demonstratives than we do: whereas we have just the 2 of "here" and "there", they have "here-by-me", "there-by-you" and "there-between-us".

But then there is Bislama, South Pacific Pidgin English, now a language in its own right and arguably the world's simplest.

It has no plurals either - you use "ol" (all) for plurals: 1 dog, but "all dogs" for several.

But it has just 2 prepositions and conjugations: "long", which covers "on", "in", "at", "to", "by" and every other permutation; and "blong", which means "of" or "from" and all other such associations.

I could go on, but the point here is quite a simple one.

When all you know are examples of things closely related to what you know, then it is natural to form false conclusions about what is normal and natural and simple and clear. If you built your set of assumptions on local values, not knowing there are other value systems, then naturally you will think your local values are universal.

Of course when you install a program, you need to know where it is going to be installed, because only then can you associate text files with /usr/local/bin/vi or .doc with /program files/accessories/wordpad.exe. Of course to display something, programs must execute on a machine with a display. Of course a binary needs to be specific to the computer it's going to run on; anything else is wasteful, since you will need lots of copies. Of course a compiler must produce binaries targeted for a native processor, that's what "compiling" means. Of course an OS kernel needs to be compiled down to object code, or else the OS would be so laughably inefficient as to be unusable. Of course you need to use a low-level language to write a kernel in, because a kernel's job is low-level, shuffling memory and so on.

And yet, on classic MacOS, there was a database tracking the current, assumed mutable, location of program files in the filesystem, so there were no paths to binaries; and the database was part of the filesystem, so it imposed virtually no overhead - indeed it saved the time spent searching paths for binaries, as this never happened, indeed, could not.
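To make the idea concrete, here is a toy sketch in Python. It is not Apple's actual data structure: the class, the "EDIT" creator code and the paths are all made up for illustration. The only point is that documents refer to a program by an identifier, and the system keeps track of where that program currently lives.

```python
class AppRegistry:
    """Toy stand-in for a filesystem-level database of installed programs."""

    def __init__(self):
        # creator code -> current location of the application
        self._location = {}

    def register(self, creator, path):
        self._location[creator] = path

    def move(self, creator, new_path):
        # When the user drags the program elsewhere, the record is updated.
        # Nothing that refers to the creator code has to change.
        self._location[creator] = new_path

    def open_with(self, creator):
        # A direct lookup: no list of directories is searched.
        return self._location[creator]


registry = AppRegistry()
registry.register("EDIT", "Macintosh HD:Applications:TextEditor")
registry.move("EDIT", "Macintosh HD:Utilities:TextEditor")
opener = registry.open_with("EDIT")
```

The document never stored a path, so moving the program broke nothing, and finding it again is one dictionary lookup rather than a walk along a search path.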

On BeOS, this was generalised: the whole FS is a queryable database, meaning that the notion of, for instance, a mail storage file is nonsensical; you just store messages as files and you can query the FS using SQL-like queries to search for messages from Bob sent at lunchtime last Tuesday.
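A rough sketch of what that looks like from the user's side, again in Python. This is not the real BeOS API or its query syntax: AttributeFS, the attribute names and the message files are hypothetical, and a real filesystem would answer the query from an index rather than scanning every file.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class File:
    name: str
    attrs: dict = field(default_factory=dict)


class AttributeFS:
    """Toy filesystem in which every file carries queryable attributes."""

    def __init__(self):
        self.files = []

    def write(self, name, **attrs):
        self.files.append(File(name, attrs))

    def query(self, predicate):
        # Return the names of all files whose attributes match.
        return [f.name for f in self.files if predicate(f.attrs)]


fs = AttributeFS()
# Each message is just a file; there is no mailbox file at all.
fs.write("msg-001", kind="mail", sender="Bob", sent=datetime(2024, 1, 9, 12, 30))
fs.write("msg-002", kind="mail", sender="Alice", sent=datetime(2024, 1, 9, 12, 45))
fs.write("msg-003", kind="mail", sender="Bob", sent=datetime(2024, 1, 10, 18, 0))

tuesday = date(2024, 1, 9)  # an arbitrary Tuesday
from_bob_at_lunch = fs.query(
    lambda a: a.get("kind") == "mail"
    and a.get("sender") == "Bob"
    and a["sent"].date() == tuesday
    and 12 <= a["sent"].hour < 14
)
```

The mail client shrinks to a query plus a viewer, and any other program can ask the same question without knowing anything about mail formats.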

This was the inspiration for the Spotlight search engine introduced in Mac OS X 10.4 in 2005. The filesystem was extended to index file contents upon write; when Microsoft tried to emulate this, they had to kludge around it with a background indexing process that constantly keeps updating an index, sapping performance. Same user effect, but one implementation is simple and fast, one is complex and slow.

Binaries do not need to be native: on Taos, binaries were compiled down to a platform-neutral idealised virtual machine, and the program loader converted them on the fly at load time to the right format for the CPU in use then. The same single binary worked on ARM, x86, PowerPC and everything else.
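The shape of the trick can be sketched in a few lines of Python. This is a toy, not Taos's real instruction set or loader: the three-instruction "portable program", the template tables and the register mappings are invented for illustration, and the output is assembly-like text rather than machine code.

```python
# One portable program, shipped once, written for an idealised machine.
PORTABLE_PROGRAM = [("load", "r0", 2), ("load", "r1", 3), ("add", "r0", "r1")]

# Per-CPU knowledge lives in the loader, not in the binary.
TEMPLATES = {
    "x86": {"load": "mov {0}, {1}", "add": "add {0}, {1}"},
    "arm": {"load": "mov {0}, #{1}", "add": "add {0}, {0}, {1}"},
}
REGISTERS = {
    "x86": {"r0": "eax", "r1": "ebx"},
    "arm": {"r0": "r0", "r1": "r1"},
}


def load(program, cpu):
    """Translate the portable program for the CPU in use, at load time."""
    regs = REGISTERS[cpu]
    native = []
    for op, *args in program:
        # Map virtual registers to real ones; constants pass through unchanged.
        native_args = [regs.get(a, a) for a in args]
        native.append(TEMPLATES[cpu][op].format(*native_args))
    return native


x86_code = load(PORTABLE_PROGRAM, "x86")
arm_code = load(PORTABLE_PROGRAM, "arm")
```

Supporting a new processor means adding one more table to the loader; every existing program then runs on it untouched.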

Compilers, indeed, do not need to target a specific processor; Taos showed this in the 1980s and today Java and the MS .NET CLR implement just-in-time translation and compilation at load time on hundreds of millions of machines, some of which are very low-powered and resource-constrained.

The Lisp machines showed that a kernel could be written in a very abstract high-level language, given a suitable CPU, and still perform very well. Smalltalk shows that an entire GUI can be written in an interpreted very-high-level language and still perform well; on a more modest level, the entire Sugar GUI of the One Laptop Per Child $100 laptop is written in plain old interpreted Python and it goes like stink on a very very low-end PC, a 400MHz x86 with just a few hundred meg of RAM and less than a gig of Flash.

If you cast your net wide enough, you can find examples that contradict almost everything in computing which is normal, natural, and bleeding obvious and demonstrate that doing things in a different way can lead to serious advantages and benefits.

Even when a thousand million computers do it one way and have done for 30 years, that doesn't mean it is the right way.

By looking at and learning about other systems which are nothing to do with what you do and use every day, you can learn useful lessons. Also, more saliently, you can learn to spot the bias and assumptions that you never realised you had. And there are massive benefits to doing so.

Esperanto has got millions of speakers because it is much much easier to learn and to understand than any natural language, and this makes it worth the while to learn it. It's less effort to learn Esperanto than any natural language, so you can speak to people who don't know your language with much less effort, even if by learning something hard and complex like English or Mandarin Chinese you could potentially speak to a lot more people. You can learn Esperanto in months; English or Chinese will take many years of effort.

But just imagine. If the people behind Esperanto and its predecessor Volapük had known a bit about Asian and Pacific languages such as Chinese, Japanese or Bislama, they would have discovered that something like 75% of the complexity of their lovely simple clean un-crufty new international auxiliary "world language" was just unnecessary baggage, carried along by European bias and accustomisation. They could have made something maybe ten times easier, if only they'd bothered to learn a bit more about how other people spoke first, outside of their own continent and its millennia of tradition and culture.

And maybe then Esperanto would have achieved its ambition. Instead of talking on the phone to call centres in Asia where they mangle English, instead of relying on natives who have learned our language when we travel, instead of struggling for years with French and German and Latin at school and never really getting anywhere... instead of all that massive wasted effort, instead of all those failures, we'd all be speaking an auxiliary that we learned with little effort in school.

But no. Because the creators knew what they were doing was correct, because the examples that it was the right, obvious, clear, simple way were all around them. So why bother studying something obscure that a few half-starved savages on some primitive islands on the other side of the world do? If their way had any merit, we'd all be doing it!

It's obvious!