Computational Philology

Beyond Good and Evil,
Beyond Translation

Sentence embeddings, five translators, and a philosopher who predicted his own untranslatability 140 years before I could measure it.

I have read Beyond Good and Evil in four translations. Not because I am thorough, but because I kept switching, unsatisfied. Kaufmann felt like a professor standing between me and the text, footnoting away the danger. Hollingdale felt closer, rawer. Zimmern, the Victorian, softened everything that should bite. Each version claimed to be Nietzsche, yet none felt like the same book.

This is not a complaint. Translation is impossible. Every translator knows this. The question is what kind of impossibility you prefer: the careful academic who explains the joke, or the literary stylist who rewrites it for a new audience. Neither is wrong. Both are betrayals.

So I did what seemed natural: I ran the translations through sentence embeddings to see if the machine could detect what I felt. Could NLP quantify the distance between interpretations? Could it identify where translators diverge most?

What I found was stranger than I expected.

"Was sich am schlechtesten aus einer Sprache in die andere übersetzen lässt, ist das tempo ihres Stils..."
— BGE §28

"That which translates worst from one language into another is the tempo of its style..."

In aphorism 28, Nietzsche argues that the tempo of a language, its rhythm and cadence, is rooted in what he calls the "average tempo of its metabolism." This is not metaphor. He means it literally: a language carries the physiological signature of its speakers. German moves differently than French. The translator who captures the words but loses the tempo has captured nothing.

This aphorism showed the second highest divergence among all five translators in my analysis. The passage about untranslatability was itself the hardest to translate consistently. Either this is a beautiful confirmation or a suspicious coincidence. I choose to find it beautiful.

231 aphorisms aligned · 5 translators · 0.833 max fidelity

The Translators and Their Projects

These are not interchangeable renderings. Each translator brought a theory of what Nietzsche was doing and how to make it work in English. Understanding the numbers requires understanding the people.

Helen Zimmern, 1906

The first major English translation, and it shows. Zimmern knew Nietzsche personally, which sounds impressive until you realize she was filtering him through Victorian sensibilities. Her prose is stiff where his is vicious. She writes "one must consider" where Nietzsche spits. The embedding model detects her as the most distant from the German, and from the other English translators. A century of language evolution separates her from Norman, and the machine sees every year.

Walter Kaufmann, 1966

The academic standard for fifty years. Kaufmann's project was rehabilitation: rescuing Nietzsche from the Nazis, proving he was a serious philosopher, not a proto-fascist lunatic. The translation reflects this mission. Careful. Scholarly. Heavily footnoted. Sometimes you feel Kaufmann interpreting before you can interpret for yourself. The embeddings place him close to Hollingdale (0.886 similarity), which makes sense: same era, same academic context, probably reading each other's work.

R.J. Hollingdale, 1973

My favorite, and the embeddings agree. Hollingdale was self-taught, not a professor but a translator by trade. He rendered almost everything Nietzsche wrote. His approach was literary rather than academic: trust the reader to handle Nietzsche raw, without protective footnotes. The prose moves. He sits at the semantic center of all translations, closest to the German (0.833) and closest to everyone else. Whether this makes him "best" depends on what you want. It makes him the faithful middle, the point other interpretations orbit.

Marion Faber, 1998

Oxford World's Classics. Faber aimed for accuracy over style. The result is reliable but rarely surprising. Good for study, less good for feeling the text. She sits between the academic Kaufmann and the literary Hollingdale, a sensible median.

Judith Norman, 2002

The Cambridge edition, co-edited with Rolf-Peter Horstmann. Norman takes interpretive risks the others avoid, updating dated references and modernizing idioms. The result reads more easily but drifts further from the German. The embeddings detect this: she ends up about as far from the German as Zimmern, though in the opposite stylistic direction and nearly a century later.

The Semantic Centroid

I computed pairwise cosine similarity across all translator embeddings. One name emerged as the gravitational center: Hollingdale.

[Chart: similarity of each translation to the original German]

Hollingdale is not just closest to the German. He is closest to everyone. This does not mean he is "best." It means his translation occupies the semantic middle ground. The others deviate from it in different directions: Zimmern toward Victorian formality, Norman toward modern accessibility, Kaufmann toward academic precision.

What does it mean to be the centroid? Perhaps that Hollingdale made the fewest interpretive choices, stayed closest to the literal while remaining readable. Or perhaps he simply averaged out the possibilities. The machine cannot tell us which. Only that the pattern exists.
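For the curious, the centroid computation is nothing exotic. Here is a minimal sketch, assuming `aphorisms` maps each source (the German plus five translations) to its 231 aligned aphorism texts; the model choice and variable names are my illustration, not necessarily the exact pipeline:

```python
# Minimal sketch: embed aligned aphorisms and find the "centroid" translator.
# `aphorisms` is assumed: {source_name: [231 aligned aphorism strings]}.
from itertools import combinations
import numpy as np
from sentence_transformers import SentenceTransformer

sources = ["german", "zimmern", "kaufmann", "hollingdale", "faber", "norman"]
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")  # illustrative choice
embeddings = {s: model.encode(aphorisms[s], normalize_embeddings=True) for s in sources}

def mean_similarity(a, b):
    """Average cosine similarity between row-aligned, unit-normalized embeddings."""
    return float(np.mean(np.sum(a * b, axis=1)))

# Pairwise similarity over the six sources.
sim = {s: {} for s in sources}
for s1, s2 in combinations(sources, 2):
    sim[s1][s2] = sim[s2][s1] = mean_similarity(embeddings[s1], embeddings[s2])

# The centroid: the translator with the highest average similarity to everyone else.
centrality = {s: np.mean(list(sim[s].values())) for s in sources if s != "german"}
print(sorted(centrality.items(), key=lambda kv: -kv[1]))  # Hollingdale should lead
```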

The Fingerprint

UMAP projection of 1,386 aphorism embeddings (231 aphorisms across 6 sources) reveals distinct clusters. Each translator leaves a signature. The model can classify who translated what without being told.

Hover over the legend to isolate each cluster. German sits at center. Kaufmann and Hollingdale cluster nearby. Norman and Zimmern drift further.

This is perhaps the most striking result. Translators have fingerprints. Their stylistic choices are consistent enough across 231 aphorisms that a dimensionality reduction algorithm can separate them visually. Whatever "voice" means, it shows up in vector space.
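A rough version of that check, reusing the `embeddings` dict from the centroid sketch above; umap-learn and scikit-learn are my assumptions about tooling, not a documented pipeline:

```python
# Sketch: project all 1,386 embeddings to 2-D and test whether a simple
# classifier can recover the translator from the vector alone.
# Reuses `sources` and `embeddings` from the centroid sketch.
import numpy as np
import umap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.vstack([embeddings[s] for s in sources])                   # (1386, d)
y = np.concatenate([[s] * len(embeddings[s]) for s in sources])   # source label per row

coords = umap.UMAP(n_components=2, random_state=42).fit_transform(X)  # the scatter plot

clf = LogisticRegression(max_iter=2000)
print(cross_val_score(clf, X, y, cv=5).mean())  # well above 1/6 if fingerprints are real
```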

Where They Diverge

Some passages translate consistently across all five versions. Others scatter wildly. The variance tells you where the German underdetermines the English, where translators had to make choices no dictionary could dictate.

§35 (σ = 0.305): Voltaire, truth-seeking, embedded French phrases
§28 (σ = 0.288): The meta-aphorism on translation itself
§59 (σ = 0.281): Human superficiality as survival instinct
§102 (σ = 0.233): Discovering reciprocated love (very short)
§83 (σ = 0.226): Instinct and the house fire

The pattern: short aphorisms diverge more. Less context means more ambiguity, more room for interpretive freedom. Passages with embedded French scatter wildly because translators handle code-switching differently. And self-referential passages, where Nietzsche writes about language itself, prove hardest to render consistently.
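The divergence score itself can be defined in more than one way. Here is one plausible version, again reusing the `embeddings` dict from the earlier sketch; whether this matches the exact σ reported above is an assumption:

```python
# Sketch: per-aphorism divergence as the spread of pairwise cosine distances
# among the five English renderings. The exact definition of the reported
# sigma is an assumption; this is one reasonable choice.
from itertools import combinations
import numpy as np

translators = ["zimmern", "kaufmann", "hollingdale", "faber", "norman"]

def divergence(i):
    """How widely the five English versions of aphorism i scatter in embedding space."""
    vecs = [embeddings[t][i] for t in translators]  # already unit-normalized
    dists = [1.0 - float(np.dot(a, b)) for a, b in combinations(vecs, 2)]
    return float(np.std(dists))

ranked = sorted(range(231), key=divergence, reverse=True)
print(ranked[:5])  # indices of the most divergent aphorisms (e.g. §35, §28)
```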

The Orthography Problem

Nietzsche wrote before the 1901 German orthography reform. My source text uses 19th century spellings that modern embedding models do not fully recognize:

giebt → gibt (embedding similarity ~0.52)
Werth → Wert (~0.53)
Theil → Teil (~0.55)
seyn → sein (~0.48)

The model sees "Werth" and "Wert" as different words. I built a normalizer, about fifty substitution rules, that improved German-English alignment by 0.002-0.003 across all translators. Small, but the rankings stayed stable. Hollingdale still wins.
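A few of those rules, as a sketch; the subset below is illustrative, not the full fifty-rule normalizer:

```python
# Sketch: a handful of orthography rules modernizing pre-1901 spellings
# before embedding. Illustrative subset, not the full ~50-rule normalizer.
import re

RULES = [
    (r"\bgiebt\b", "gibt"),
    (r"\bWerth", "Wert"),   # Werth, Werthe, Werthschätzung ...
    (r"\bTheil", "Teil"),   # Theil, Theile, Theilnahme ...
    (r"\bseyn\b", "sein"),
]

def normalize(text: str) -> str:
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(normalize("Es giebt keinen Werth ohne seinen Theil."))
# -> "Es gibt keinen Wert ohne seinen Teil."
```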

The "th" to "t" shift came in 1901. Words derived from Greek lost their classical spellings. "Theater" kept its "th" because Germans still perceived it as foreign. Language politics, then as now.

The Full Matrix

Hover to see exact similarity scores. The Kaufmann-Hollingdale pair (0.886) is the tightest; Norman and Zimmern (0.806) are the most distant from each other.

What This Actually Measures

I should be honest about the limitations. Sentence embeddings capture semantic similarity in modern multilingual web-text space. They do not capture philosophical fidelity to Nietzsche's conceptual framework. The model learned "meaning" from Wikipedia and Reddit, not from Thus Spoke Zarathustra.

I tested a philosophy-tuned model, fine-tuned on Stanford Encyclopedia triplets. It was better at distinguishing "das Vornehme" (the noble) from "das Gemeine" (the common), concepts central to Nietzsche's aristocratic ethics. But worse at cross-lingual alignment. You cannot optimize for both. The trade-off is real.
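For a sense of what that fine-tuning involves, here is a sketch of a triplet-loss setup with sentence-transformers; the example triplet, data construction, and hyperparameters are assumptions, not the exact experiment described above:

```python
# Sketch: fine-tuning a sentence-transformers model with triplet loss on
# philosophy text. Data construction and hyperparameters are assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

# Each triplet: (anchor, positive from the same entry, negative from another entry).
triplets = [
    InputExample(texts=[
        "The noble type of man experiences itself as determining values.",
        "Master morality originates in the self-affirmation of the strong.",
        "Utilitarianism identifies the good with aggregate pleasure.",
    ]),
    # ... thousands more, mined from encyclopedia entries
]

loader = DataLoader(triplets, shuffle=True, batch_size=16)
loss = losses.TripletLoss(model=model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```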

So what am I actually measuring? Relative divergence patterns. Where translators cluster and where they scatter. The finding is not which translation is best. It is that interpretive schools exist, that translator fingerprints are real, and that §28's claim about untranslatable tempo shows up in the math.

"Der Wille zur Wahrheit... wer hat uns eigentlich diese Frage gestellt?"
— BGE §1

"The will to truth... who among us has actually posed this question?"

There is something ironic about subjecting Nietzsche to computational analysis. The philosopher who attacked "the faith in opposite values" now has his words projected into a vector space where similarity is the cosine between two vectors. He would probably despise it. Or find it amusing that we keep trying to systematize what resists systematization.

What I Learned

I started this project to see if embeddings could detect translation differences. They can. The more interesting discovery was §28. I did not go looking for it. The variance analysis surfaced it. Nietzsche's meta-commentary about the impossibility of translation emerged as one of the hardest passages to translate consistently.

The man was right. Some things resist translation. The embeddings agree.