Computer program reveals surprising shared word etymologies — like “cancer” and “chancellor”…

I find word etymologies fascinating. Every word we speak or write sits unassumingly on the surface of a rich history, sometimes spanning millennia.

A little while ago I bought a book called Dictionary of Word Origins which details the history of thousands of words, and in reading it I’m always delighted to learn about the various historical connections between words, especially when the modern forms of the words have little to do with each other. The book even mentions a few particularly surprising examples of this in its introduction, and to this day I use one of those examples (“bacteria” and “imbecile” are etymologically related!) as a go-to fun fact when the need arises. I’m great at parties.

Recently I realized that, using a few publicly available datasets, I might be able to write a program that would automatically identify surprising shared word etymologies. After a bit of trial, error, and data-massaging, I was able to produce some results. If you’re interested in the journey there, keep reading. If you just want to see the results, you can skip ahead to them.
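To make the idea concrete, here is a minimal sketch of how such a search might work. It is not the actual program: the `PARENT` table is a hypothetical hand-written stand-in for a real etymology dataset (its entries are taken from the word histories described in this post), and scoring "surprise" by spelling dissimilarity is an assumption chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy data: each word maps to the word it was derived from.
# A real dataset would be far larger and messier than this.
PARENT = {
    "actor": "ago",
    "coagulate": "coagulo",
    "coagulo": "coagulum",
    "coagulum": "cogo",
    "cogo": "ago",
    "pay": "pacare",
    "pacare": "pax",
    "peace": "pax",
}

# The modern English words we want to compare with each other.
ENGLISH = ["actor", "coagulate", "pay", "peace"]


def root_of(word):
    """Follow parent links until reaching a word with no recorded parent."""
    seen = set()  # guards against accidental cycles in the data
    while word in PARENT and word not in seen:
        seen.add(word)
        word = PARENT[word]
    return word


def surprise(a, b):
    """Assumed metric: spelling dissimilarity, from 0.0 (identical) to 1.0."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def surprising_pairs(words):
    """Return (score, word_a, word_b, root) tuples, most surprising first."""
    pairs = []
    for a, b in combinations(words, 2):
        root = root_of(a)
        if root == root_of(b):
            pairs.append((surprise(a, b), a, b, root))
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    for score, a, b, root in surprising_pairs(ENGLISH):
        print(f"{a} / {b} (shared root: {root}, surprise: {score:.2f})")
```

Comparing only the final roots is a simplification. A fuller version would probably look for any shared ancestor along the way and would need a better measure of surprise than spelling alone, since some related words look different but have obviously related meanings.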

What follows is a hand-picked list of the pairs (or triplets) of words from my program’s output that I found most interesting, along with a brief account of the actual history, which I researched separately.

“Piano” is a shortened form of the Italian word “pianoforte”, which means “soft-loud”. The “piano” part comes from Latin “planus”, meaning “level, flat, even”, which is also the source of the word “plain” and eventually “plainclothed”.

“Potable” and “poison” are one of the many pairs in these results that seem obvious once pointed out: both ultimately come from Latin “potare”, meaning “to drink”. “Potare” also gives English the word “potion”, a close cousin of “poison”.

Both “actor” and “coagulate” derive ultimately from the Latin “ago”, meaning “act”, “do”, “make”, and a bunch of other things.

English “actor” is a short hop away from “ago”, but “coagulate” takes a longer path: “ago” ➔ “cogo” (“collect”) ➔ “coagulum” (“a clot”) ➔ “coagulo” (“to clot”).

Both “estate” and “contrast” ultimately derive from Latin “stare”, meaning (among other things) “stand”.

“Contrast” is a shortened form of the Latin “contrastare” (“contra-” meaning “against”, so “stand against”, which is a literal description of what you do when you compare things).

“Estate” comes to English via the Latin “stare” ➔ “status” (“position, place”), which then gives the English “state” and eventually “estate”.

“Pay” and “peace” are descended from the Latin “pax”, meaning “peace”.

“Pay” takes a slightly longer journey than “peace”, coming from Latin “pacare”, meaning “appease”, as in “appeasing a creditor”. So etymologically, to “pay” someone means to “create peace by settling a debt”.

“Cancer”, “cancel”, and “chancellor” all descend from the Greek “karkinos”, meaning “crab”, which became “cancer” in Latin.

“Cancer” was applied to tumors because the swollen veins around a tumor were said to look like a crab.

“Cancer” had an alternative meaning, “enclosure” (which is, historically, the sense from which the meaning “crab” was derived, because of the way a crab’s pincers form a circle). This alternative meaning helped the word evolve into the Latin “cancellus” – a barrier dividing two parts of a building. Applied metaphorically, this eventually became the English “cancel”.

“Chancellor” comes from the Latin “cancellarius”, originally a court official who, wanting to be separated from the public, stood on one side of a cancellus.

I find etymologies like this that have clear physical roots especially fascinating.

“Fantastic” and “phenotype” both descend from the Greek “phainein”, meaning “show”.

The path from “phainein” to “phenotype” is fairly plain, but “fantastic” takes a longer path via Greek “phantos” (“visible”) ➔ Greek “phantazesthai” (“have visions, imagine”) ➔ Greek “phantastikos” (“imaginary, fantastic”) ➔ Old French “fantastique” (“fantastic”).

The leap from a word meaning “imaginary” to a word meaning “fantastic” struck me as odd initially, but apparently it comes from the sense of the word “imaginary” as “unreal”.

“Legal” and “college” both ultimately descend from Latin “lex”, meaning “law”.

“Legal” takes a short hop from the Latin “legalis”.

The history of “college” is more complicated – “lex” became Latin “lego” (“choose, appoint”) ➔ Latin “collega” (“partner”, or “one chosen to work with another”) ➔ Latin “collegium” (“group of colleagues”). So a “college” is, etymologically speaking, a group of people chosen to work together.

Historically, “college” was often used to refer to a corporation, and it only became associated with universities in the past couple of hundred years.

“Lien” and “ligament” are descended from the Latin “ligare”, meaning “tie”. Both words have taken relatively short paths to their current English forms.

This is another of those delightful cases in which a word with a physical meaning (“ligare”) has taken a metaphorical leap to become a modern word (“lien”).

While it seems like “journal” and “journey” should be close cousins, their nearest common ancestor is in fact quite old – the Latin “diurnus”, meaning “daily”.

A “journal” is a book in which you record the events of the day.

A “journey” was historically the distance that could be traveled in a single day. The “in a single day” bit of that has since been lost, leaving “journey” to just mean “travel”.

I never would have picked “educate” and “subdue” out of a lineup as having a shared etymological root, but sure enough it sits right there – the “du” in the middle of each word, which ultimately derives from Latin “duco”, meaning “lead”.

“Educate” comes from the Latin “eductus”, meaning to “lead or bring out”, and then the Latin “educare” (“raise, train, mould”). I love the image of education as the process of extruding a refined person out of a base of unrefined material.

“Subdue” comes from the Latin “subduco”, meaning “lead under”. Again, a very clear physical description of what the word means – to put beneath you, or bring under control.

