Thoughts On Truth In Our Time

Veritas, Goddess of Truth (source)

Anybody else been thinking about the truth recently?  Not “why doesn’t everyone agree with me, the obviously correct person” – but how do people learn something, decide it is true, and how do falsehoods become truths to so many?

Anyone?  Anyone?  Bueller?

Obviously the 2024 election, shall we say, heightened my concerns but I’ve been mulling this over for a year or so.  A couple of anec-data really stood out to me when talking with folks over that time.

One was being told that NYC had banned gas stoves in a blatantly unconstitutional move, which I kinda doubted, in part because I live in NYC and have a gas stove.  Best I can tell, this is a garbling of a 2023 NYS budget requirement that buildings built starting in 2026 not have gas hookups (though there are numerous exceptions).  The median NYC building is something like 90 years old, so calling this a ban is just not true in either a de jure or de facto sense.  And yet, there are loads of Google hits for “NYC gas stove ban” asserting there is one.

Another was hearing that raising tariffs on Chinese goods would lower grocery prices.  This is not how tariffs work; by definition, they impose a tax, which raises prices.  I’m sure lots of you saw that Google Trends search making the rounds, which suggests this confusion isn’t isolated.  Plus the US doesn’t import much food (some light Googling suggests ~15%), and even that’s mostly from Canada and Mexico (which makes sense, since food is often bulky and perishable).

Both of these were in-person group gatherings with people I had existing relationships with – not randos on the internet.  And I want to emphasize, I’m not concerned about whether these are good policies (I have mixed opinions of both) but about the truth-ness of these assertions and how the people making them came to believe them.

Truth Is Expensive, Lies Are Cheap

It is tempting to blame malicious actors, and I think there is something to that, but at scale I think a lot of the problem is economic.

It took me about an hour to write the preceding 3(-ish) paragraphs, most of it spent digging up citations (for things I was already familiar with), not counting the vague musings I’ve had over the past year.  It would take about 90 seconds to read it out loud.  A typical news segment is 2 minutes long, a Youtube video averages 10 minutes, a podcast averages maybe 40 minutes.  Optimistically that’s a 40x ratio of “time spent researching” to “content”.

It’s much cheaper to just make something up.  Or copy from someone else, who is also incentivized to make something up.

One of the takeaways from Hbomberguy’s (very long) plagiarism video is how plagiarism is a natural consequence of the drive for ever increasing amounts of “content.”  I believe the ease with which falsehoods spread has clear parallels.

While not new to our times (“a lie can travel halfway around the world while the truth is putting on its shoes” is a quote [erroneously] attributed to Mark Twain in the early 1900s), the volume and “democratization” of it is.  Some random Jungian psychologist can start a Youtube channel and become shockingly influential, if you paid attention to US politics you probably encountered a feline feces themed Twitter user at least once, and if you’re a programmer you’ve probably read a lot of some random ex-Microsofty’s blog and know its influence.  You’re reading a random current-Microsofty’s blog right now.  I wouldn’t say any of this is new but it is different.

Authority And Consensus

This joke is older than some of my co-workers, yet evergreen. (source)

There’s a question I’ve pointedly left unanswered thus far – what makes something true?

This is such a tricky question there’s a whole branch of philosophy about it, epistemology, which is interesting in its own right.  Unfortunately, at least as a layperson surveying the field, it doesn’t appear to have many concrete applications.  Perhaps a reader will point me at a useful concretization?

For my purposes I assert that, in the real world, what people consider to be true is perceived consensus amongst recognized authorities.  I believe this definition can be justified, and provides some useful insights.

In terms of justification, for one I think it matches how we behave in reality.  If you feel sick you go to a doctor who diagnoses you.  You trust the doctor’s opinion because some authority has certified them as a doctor, and their training (established by a consensus of other doctors) informs the diagnosis.  At no point are you, the patient, truly capable of evaluating their work but (at least in general) you will accept their conclusions as true.

This definition also explains my experiences in the introduction.  Why did my family/friends/colleagues believe these falsehoods?  Because they heard them on the news, or from a candidate from a major political party, or from a trusted person.  That a claim was broadcast implies it had some verification behind it (even if it didn’t), a mention in a candidate’s speech implies the party as a whole supports it (even if they don’t), and a person you trust is assumed to be applying the same standards recursively (even if, sometimes, they’re just thinking out loud).

It also explains things that aren’t (generally) believed.  There are plenty of people who’ll tell you evolution is fake, or the Earth is only a few thousand years old, or aliens built the pyramids, or Taylor Swift’s dating habits are a scheme to sway election results.  For most claims, it is impractical for the listener to refute or verify them, but because these people are not perceived as authorities their claims are not considered true.  We all watched the Higgs boson’s existence become true as authorities (who I am not fit to judge) came to a consensus (that I am not fit to contribute to) that it had been detected.  We still don’t know the whole “truth” of dark matter (its existence, its composition, how much, do we need it, do we need it, do we need it) despite many authorities expressing opinions, because there is no consensus amongst them.

A useful insight is that truth is built up iteratively – it’s not that something “is” true, it’s that many people come to an agreement that it is.  This relation is also additive, people come to a “truth” for many different reasons – for example (in a scientific context) an initial experiment might not persuade a consensus, but a replication will persuade some more, and a prediction that bore out yet more, and so on.

Another insight is that truth is dynamic – authorities are not eternal, nor are consensuses.  I’m just old enough to remember when Wikipedia was not a thing you could cite, but now (at least, according to school age children I interact with) you can use it when its articles cite sources.  Psychology has provided lots of examples of overturned consensuses of late with the replication crisis, but it’s hardly unique to them – America’s Dad is rather less trusted than when I was a child, to put it mildly.

There is some resemblance to PageRank in these insights, though popularity and relevance aren’t quite what we’re after with “truth”.  Regardless, I do think this definition gives a decent foundation for modeling how truth is actually determined in the real world.
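For anyone who hasn’t run into PageRank: the resemblance is that a node’s score is derived from the scores of the nodes pointing at it, much as a claim’s perceived truth-ness flows from the authorities endorsing it.  A minimal power-iteration sketch (the toy graph and parameters here are mine, purely illustrative – this is not Google’s actual implementation):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal PageRank via power iteration.

    links: dict mapping each node to the list of nodes it links to.
    Returns a dict of node -> score; scores sum to ~1.
    """
    nodes = list(links)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Everyone gets a small baseline, the rest flows along links.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            if not outgoing:
                # Dangling node: spread its score evenly over everyone.
                for other in nodes:
                    new[other] += damping * scores[node] / n
            else:
                share = damping * scores[node] / len(outgoing)
                for target in outgoing:
                    new[target] += share
        scores = new
    return scores

# Toy graph: "c" is endorsed by both "a" and "b", so it ranks highest.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

The analogy to the truth definition above: being linked to (endorsed) by already-high-scoring nodes is what raises your score, which is roughly how perceived consensus amongst recognized authorities compounds.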

In Which AI Does (Not) Save The Day

I explicitly told WordPress to generate a robot with too many fingers, and yet…

It’s 2025, so you can’t write a blog post in tech without an AI angle – I promise this one is actually relevant.

Also I’m a skeptic of the current AI push, as anyone checking my socials would know.

My experience with coding assistants is that they’re very good at producing variations of introductory material (which is already prevalent online) and very bad at the kinds of things I actually spend my time working on, which is always some niche issue.  Maybe they can replace searching for tutorials or documentation, but I doubt they replace programmers.  Perhaps Google should be worried – but I’m not.  (As an aside, I consider this Apple paper good evidence that there’s no “intelligence” in these algorithms, but that’s a topic for another post.)

And yet…

LLMs are really interesting.  They represent an advancement in natural language processing, even if all the “intelligence” stuff is wishful thinking.  Topic modelling, translation, even some of the generative stuff – all massive improvements over prior work, and all runnable on commodity hardware.

How does this relate to truth?  Well, if you had a model of truth (in keeping with my definition above) and wanted to classify a claim’s truth-ness in a consistent way (that is, citing your sources) could you do it with LLMs?  I decided to find out.

The first stumbling block is that while I do believe my definition of truth is how people actually operate, there’s not a big TruthDB™ to query.  But there’s something close (which I’ve sprinkled throughout this post, if you’ve been paying attention) – Wikipedia.  Specifically, articles on Wikipedia which have inline citations to websites.

Why just those articles?

  • Inline citations can be tied to specific claims in articles rather than the whole article
  • Citations can be verified by (a person) checking the link
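As a rough illustration of what “extract inline citations” means, here’s the shape of it in Python.  This is a deliberate simplification – real Wikipedia dumps lean heavily on {{cite web}} templates and named refs, so my actual glue scripts (and anyone serious) would want a proper wikitext parser; the regexes and sample text below are just for demonstration:

```python
import re

# Rough sketch only: a serious extractor should use a real wikitext
# parser (e.g. mwparserfromhell) rather than regexes like these.
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL)
URL_RE = re.compile(r"https?://[^\s|\}\]<]+")

def inline_web_citations(wikitext):
    """Return (claim_fragment, url) pairs for inline web citations."""
    pairs = []
    for match in REF_RE.finditer(wikitext):
        urls = URL_RE.findall(match.group(1))
        # Treat the text just before the ref as the claim it supports.
        start = max(0, match.start() - 120)
        claim = wikitext[start:match.start()].rsplit(". ", 1)[-1].strip()
        for url in urls:
            pairs.append((claim, url))
    return pairs

# Hypothetical snippet of wikitext for demonstration.
sample = ("The Act was amended in 2012.<ref>{{cite web"
          "|url=https://www.govtrack.us/congress/bills/112/hr4310/text}}</ref>")
pairs = inline_web_citations(sample)
```

The payoff of pairing a claim fragment with its URL is that each (statement, citation) pair becomes an independently checkable unit, which is exactly what the entailment step needs.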

So my sketched out pipeline was:

  • Download Wikipedia
  • Extract all relevant citations and articles
  • Topic encode all articles
  • For any claim I wanted to test:
    • Topic encode the claim
    • Take the top N (I used N = 5) most cosine-similar articles and their citations
    • Use an entailment model to check each statement-with-citation against the claim
    • Spit out the results

For the topic modelling I used sentence-transformers/all-MiniLM-L6-v2, for the entailment model I used google/t5_xxl_true_nli_mixture, and for the glue I used some of the worst Javascript, C#, and Python you have ever seen.  It was the holidays, I was in a rush 🤷‍♂️.
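The core of that pipeline, stripped of the model plumbing, looks roughly like this.  To be clear about what’s real and what isn’t: the 2-d vectors and the word-overlap scorer below are toy stand-ins for the actual all-MiniLM-L6-v2 embeddings and the T5 NLI model, just so the ranking logic is visible:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def check_claim(claim_text, claim_vec, articles, entail_score, top_n=5):
    """Rank articles by topic similarity to the claim, then score each
    article's cited statements against it with an entailment function.

    articles: list of (article_vector, [cited_statement, ...]) pairs.
    entail_score: callable(premise, hypothesis) -> float in [0, 1];
      in the real pipeline this wraps google/t5_xxl_true_nli_mixture.
    """
    ranked = sorted(articles, key=lambda art: cosine(claim_vec, art[0]),
                    reverse=True)
    scored = []
    for _, statements in ranked[:top_n]:
        for stmt in statements:
            scored.append((entail_score(stmt, claim_text), stmt))
    scored.sort(reverse=True)  # strongest entailment signal first
    return scored

# Toy demo with fake 2-d "embeddings" and a word-overlap stand-in scorer.
articles = [
    ((1.0, 0.0), ["The act was amended in 2012, not repealed."]),
    ((0.0, 1.0), ["Bananas are botanically berries."]),
]

def overlap(premise, hypothesis):
    p = set(premise.lower().split())
    h = hypothesis.lower().split()
    return len(p & set(h)) / len(h)

hits = check_claim("Obama repealed the act in 2012", (0.9, 0.1),
                   articles, overlap, top_n=1)
```

Swapping the stand-ins for the real models is just a matter of replacing the vectors with encoder outputs and `entail_score` with an NLI call – the retrieve-then-check structure stays the same.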

So does it work?

I came across this video with six false claims, a few of which were (IMO) testable, like…

In 2012, President Barack Hussein Obama repealed the Smith-Mundt act, which had been in place in 1948.  The law prevented the government from putting its propaganda on TV and Radio.

So I ran it through and got… mixed results.  The topic extraction does a decent job of finding relevant citations, but the entailment doesn’t meaningfully rank them and doesn’t identify the contradiction in a US President “repealing” a law – a power that office does not have.

As an example, the highest-ranked related section extracted was this one:

The Act was developed to regulate broadcasting of programs for foreign audiences produced under the guidance by the State Department, and it prohibited domestic dissemination of materials produced by such programs as one of its provisions.  The original version of the Act was amended by the Smith–Mundt Modernization Act of 2012 which allowed for materials produced by the State Department and the Broadcasting Board of Governors (BBG) to be made available within the United States.

Relevant citation was: https://www.govtrack.us/congress/bills/112/hr4310/text

Here’s one other false claim from the same video:

Nazis are socialists.

Here the models did better, selecting many relevant sections and ranking these two as the most “counter” to it.

Nazism, formally National Socialism, is the far-right totalitarian socio-political ideology and practices associated with Adolf Hitler and the Nazi Party (NSDAP) in Germany.

Citations: https://www.bundestag.de/resource/blob/189776/01b7ea57531a60126da86e2d5c5dbb78/parties_weimar_republic-data.pdf & https://www.britannica.com/event/Nazism

By the early 1920s, the party was renamed the National Socialist German Workers’ Party in order to appeal to left-wing workers, a renaming that Hitler initially objected to.

Citation: https://bibliotheques.paris.fr/Monteleson/doc/SYRACUSE/312027/naissance-du-parti-national-socialiste-allemand-les-debuts-du-national-socialisme-hitler-jusqu-en-19?_lg=fr-FR (note my crummy glue scripts failed to realize this was a book citation)

So, in this very limited test, the LLMs did kinda find relevant refutations / citations.  I would never just trust the algorithm, you need a human in the loop (as with the Unfriendly Robot we built in a past life), but there’s some promise here.

The interesting thing here isn’t that these are good citations or refutations, but that the process is automated – automation scales, and gets cheaper per unit as it does.  Plus I hacked this together in about a day and threw a laptop’s worth of compute at it for maybe a week – and most of that was spent processing the Wikipedia dump files.

Can A Better Internet Be Built?

Today things kind of suck:

  • We are awash in spam and scams
  • LLMs are supercharging their generation
  • Disinformation, willful and otherwise, is commonplace
  • Our institutions are woefully failing to meet the challenges of the above

As I said earlier, I do think a lot of that is explained by the economics of disinformation.  But maybe we already have the tools to change those?

It’d have to be something new – I don’t think anything that exists today is quite right.

  • Wikipedia is too narrow and has too much baggage – though I imagine it’d be an important component of bootstrapping something new.
  • Reddit is more focused on what is popular than what is true.
  • Twitter is… well, a lot is wrong with Twitter.  Community notes seem relevant although it’s too opaque about who is a part of the consensus.
  • Bluesky (and Mastodon) are Twitter but not a nazi bar, which is commendable but not useful.

I don’t quite know what that new thing would be, but this little exercise did give me some small hope something could be built.


3 Comments on “Thoughts On Truth In Our Time”

  1. Mike Taylor says:

    There’s a question I’ve pointedly left unanswered thus far – what makes something true? […] I assert that, in the real world, what people consider to be true is perceived consensus amongst recognized authorities.

    There’s some sleight of hand going on here. You’ve asked one question (what makes something true) but answered a different one (what people consider to be true). This becomes a problem later on when you write “truth is built up iteratively … in a scientific context an initial experiment might not persuade a consensus, but a replication will persuade some more”. But Newton’s laws of motion were always true, long before they had become recognised by a consensus of scientists, and before they were even formulated.

    • Fair, I could have been more precise there. I do mean “what makes something perceived to be true” – a little wordier.

      I am focused on what people think is true (and why they think that), as there’s no real way to assert a “ground truth”.

      Like, Newton’s laws of motion aren’t true in a ground truth sense – Relativity supersedes them. And something might (probably will?) supersede Relativity, we just don’t know it yet.

      Both have been perceived to be true at various points in time, and how that happens is interesting to think about.

      • Mike Taylor says:

        “Fair, I could have been more precise there. I do mean “what makes something perceived to be true” – a little wordier.”

        OK, that makes sense. I think the problem here is that I misunderstood what you were aiming for in the first place.

        The case of Newton’s laws is still interesting, though. You’re right, of course, that relativity provides a more precise refinement of those laws; but the discovery of relativity has not made Newton’s laws any less true than they were. They were always exactly as true as they are now – i.e. completely true for almost all practical purposes, and not quite true enough when dealing with astronomical masses and distances.

        So it’s still the case that what’s changed is not whether Newton’s laws are true, or even how true they are — only how true they are perceived to be. And as you point out, that is important.