New from me: Truth-Agnostic Chatbots Show the Need for a Search Alternative

Over at CIGI. Is it a problem that search engine companies, whose only job is to return information that people can trust and use, have hitched their wagon to a technology that produces falsehoods?

Yes. Yes it is. If companies won’t take their internet-cataloguing responsibilities seriously, we need to reconsider whether we should leave search responsibilities to the private sector:

Exactly how reckless are these companies being? Think about it in terms of how a search tool usually functions. When a user inputs a search term, Google (or Bing) serves up a series of links deemed to be relevant to the user. Although its algorithm remains a black box, Google Search is based in part on the assumption that the number of links that refer to a specific webpage can serve as a proxy for its authoritativeness. …

Now, consider what it means to put a generative AI chatbot on top of this format. As people, myself included, have pointed out in the three months since OpenAI unleashed ChatGPT on an unprepared world, generative AI has a tendency to generate falsehoods. This is because it is merely a complex auto-complete machine. The text that a GPT (generative pretrained transformer) generates — to call what it produces answers is to insult actual thought — is created by the GPT’s calculations of what the next word is likely to be, based on the texts on which the model was “trained,” itself the product of underpaid, behind-the-scenes workers, often labouring in horrific conditions.

That it’s a machine for creating what can only really be called bullshit (following the definition of American moral philosopher Harry Frankfurt: speech produced with no regard as to whether it is true or not) has become comically clear in the past several days, with Bing’s GPT producing text that is petulant, threatening, whiny and argumentative, and not at all helpful in serving up the world’s knowledge.

Inserting these chatbots into search introduces an enormous degree of uncertainty and unreliability. It’s tantamount to placing a BS-creation machine between the user and the search results. Google and Microsoft are well aware of how unreliable this tech is. While Google’s gaffe has received most of the attention, Bing has also generated its own share of howlers. And both companies explicitly warn their users that they cannot necessarily trust the output that they, as businesses, are serving them.

It’s audacious: People depend on search engines to find information they can use. Now, these companies are telling users that they can’t necessarily trust the information that they provide. These are not the actions of companies that care about supporting the healthy knowledge ecosystems all societies need to survive and thrive. …

Corporate search’s ChatGPT-driven embrace of generative AI may have exhilarated Microsoft and embarrassed Google, but the rest of us should take the opportunity to reconsider the costs of our information ecosystem. We have entrusted the world’s information to companies that have little regard for the essential service they’re supposed to provide.

Check out the whole piece over at CIGI.

Posted in chatgpt, search | Comments Off on New from me: Truth-Agnostic Chatbots Show the Need for a Search Alternative

New(ish) from me: Tech world sees trust as a weakness not a glue

Over at the Toronto Star, I discuss the question of trust and tech, focused on (wait for it…) Large Language Models and ChatGPT. Far too many of the more optimistic takes on chatbots (say, that they will help people better express themselves) fail to take into account the importance of trust in the production of knowledge, focusing on the words rather than the process:

Consider the hope that ChatGPT might help weaker and marginalized students. In fact, it will almost certainly do the opposite, because it will undermine our trust in these students. …

The value of the essay, or the introductory letter, is inseparable from the process that produces it. It reflects the writer’s thinking. You’re in deep trouble if you can’t trust that something was written by the person who claims to have written it because you can’t be sure they’ve demonstrated their mastery of a subject, or that they are who they claim to be.

In a ChatGPT world I’m going to be more, not less, suspicious of the well-written letter or essay by an applicant from an unfamiliar school or country; more, not less, suspicious of plodding but mostly accurate writing. For those who would save the essay by disaggregating it and grading the parts, know that all writing can be faked if it isn’t happening right in front of you.

What should we do? The first thing is to recognize the problem for the monumental challenge that it is:

Responsible societies place guardrails around innovation processes. Drugs have saved countless lives, but we don’t let Pfizer dump a new compound into our water supply just to see what happens. The smallest academic research projects undergo more vetting than these epoch-altering billion-dollar gambles. At the very least, governments must prevent companies from recklessly testing these technologies publicly without considering their society-altering effects.

When trust is treated as unimportant, bad things happen. Crypto investors learned this the hard way. For the rest of us — students, educators, politicians, citizens — our lesson has just begun.

Posted in chatgpt, data regulation, Digital Regulation | Comments Off on New(ish) from me: Tech world sees trust as a weakness not a glue

The three pillars supporting the chatbot craze

Comment of the day, courtesy of Eevee on Mastodon:

remarkable to watch the curve of computing go from “it will do exactly, precisely what you ask of it” to “here’s a few heuristics for less well-defined problems” to “self-driving is good enough, give us billions of dollars” to “we put autocomplete on our search engine to generate a whole fictional website about what you’re looking for but we don’t really know why”

As a description of the general trajectory of mainstream Silicon Valley marketing pitches, that’s pretty much it. As for how we got to the point where search-engine companies think it’s a good idea to put a bullshit-generating autocomplete machine between the user and the information the user’s looking for, I see it as the result of three mutually reinforcing forces: ideological, economic and political.

Ideological: This descent toward the mystification of computing is driven by dataism. If you believe that everything can be represented by fundamentally neutral data, and that all you need to explain and predict everything is enough data and enough computing power, then you’re going to believe in crazy things like Artificial General Intelligence, and you’re going to think that Large Language Models will produce valid knowledge. Of course, since data is never neutral, and the world in all its complexity will never be completely reducible to digital data, you’ll also end up (re)inventing phrenology, because data is always partial, and we always end up injecting our biases into knowledge-creation, intentionally or not.

Economic: In a dataist world, bullshit like self-driving and Spicy Autocomplete for search is more valuable as a marketing tool than as a functioning technology. Fuelled by dataism, people have shown themselves more than willing to believe that words spewed by a computer are more trustworthy than those linked more directly to actual people. It’s why companies go to such lengths to position themselves as labour-light tech companies, even when it’s the behind-the-scenes labour that makes it all go.

It’s why instructors who would incinerate an academic-paper mill if given the chance think it’s completely fine to use a paper generated via statistical probabilities as a starting point for their students’ education. Chatbot-generated academic papers are empty calories built on correlations. In contrast, papers produced by people working for paper mills may be low-quality work that, like chatbot papers, can be passed off by a student as their own. But they’re created through a recognizable knowledge-creation process.

Both are awful products, but in my years as a student and teacher, I’ve never heard of any class dissecting a paper-mill-generated paper. So why are some teachers using crappy chatbot-created statistical word amalgamations as starting points for discussion, and not paper-mill papers? The difference is the technology: computers are seen as more authoritative than actual people, and so they get a pass.

Also economically, of course, from a search perspective, if you can keep users from actually visiting the sites you’re plundering for your Autocomplete Answers, then you’ll be able to hoard more of that sweet, sweet ad revenue.

Political: If only there were an institution capable of placing binding controls on how tech companies operate, to keep them from experimenting on the general public for the sake of their own bottom line. As I’ve noted elsewhere, if drug companies engaged in the kind of reckless behaviour that OpenAI, Microsoft and Google have – not just in search, but in so many other areas (hiya, Google Street View) – they’d be facing massive fines and possibly even jail time. Unfortunately, the libertarian belief that tech and the internet are special and should be subject to minimal regulation remains a potent force. Because you wouldn’t want to thwart innovation.

Mutually reinforcing

And that’s the problem. Each of these forces reinforces the others. The people running these companies aren’t just in it for the money; they’re in it for the revolution, no matter how stupid that revolution actually is (see: bitcoin, blockchain, web3). Doing things differently would require that they work against not only their perceived economic interests, but also the ideological belief that the world actually is like a giant computer, and that with enough data and a big enough computer, they could crack this nut. That more and more people are embracing dataism as an ideology is a related and significant problem.

And when you add an unhealthy dose of libertarianism to this economic and ideological mix, it becomes harder for society to defend itself against these companies’ reckless actions.

Combine the three – dataism, tech privatization and libertarianism – and you end up with an industry that seems likely to continue pushing bravely toward a dataist future. It’s why, regardless of what happens to Bing and ChatGPT, tech utopianism is likely to persist. Lucky us.

Here’s a picture of a fancy cat:

Posted in chatgpt, machine learning | Comments Off on The three pillars supporting the chatbot craze

Why it’s a mistake to compare calculators to ChatGPT

Courtesy of one of my morning papers, a discussion of how Canadian universities are reacting to ChatGPT, which includes this quote from University of Calgary Associate Professor of Education Sarah Elaine Eaton:

“There’s a complete moral panic and technological panic going on, and I think we need to take a step back and look at other kinds of tech that have been introduced,” Prof. Eaton said.

“We’ve heard people say things like they think this is going to make students stupid, that they’re not going to learn how to write or learn the basics of language. In some ways it’s similar to arguments we heard about the introduction of calculators back when I was a kid.”

The idea that ChatGPT (and technologies like it — this isn’t a one-app issue) and the calculator are similar enough to allow for useful comparisons is rapidly gaining hold. Sam Altman, the CEO of OpenAI, the for-profit company that has billions invested in the mainstreaming of this tech, has embraced the comparison wholeheartedly:

“Generative text is something we all need to adapt to,” he said. “We adapted to calculators and changed what we tested for in math class, I imagine. This is a more extreme version of that, no doubt, but also the benefits of it are more extreme, as well.”

The ChatGPT-calculator comparison is based on the argument that, just as the calculator automated numerical calculation, ChatGPT has automated the writing and research process. The thinking goes, if we can adapt to one type of automation, we can adapt to this.

But are they comparable? Let’s think it through.

For background, I started elementary school in the late 1970s, so pocket calculators happened on my watch. Although my PhD is in political science, I’ve taken undergraduate calculus courses and graduate-level econometrics (for my Economics MA). As a doctoral student, I TAed a quantitative methods course (i.e., I ran the labs where the students honed their stats skills). And I’m currently teaching our graduate methods course. So I’ve been around mathy stuff for a few decades now.

As for the ChatGPT side of things, my recent CIGI piece draws on research I’ve done over the past several years, work that is building toward a co-authored book that should be published later this year: we just handed in our last round of major revisions.

Let’s start on the calculator side of things. If you need a refresher on how calculators function, check out this Wikipedia page or this explainer. Or this one. But for our purposes, what’s important is that a calculator automates the process of calculation in a predictable way, by translating number-key inputs into binary and then performing operations on them via logic gates (e.g., AND, OR, NOT).

In its way, it’s not that different from using an abacus. Importantly, it produces consistent and verifiable outputs. And by examining the calculator’s processors to make sure they’re put together properly, we can confirm that our inputs will give us an accurate answer.
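To make that determinism concrete, here’s a toy sketch in Python (software standing in for silicon, and simplified far beyond any real chip) of how fixed logic gates compose into addition:

```python
# Toy illustration: addition built from logic gates (AND, OR, XOR).
# Real calculator circuits are more sophisticated, but the principle
# holds: fixed gates, fully deterministic output.

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus a carry bit using basic gate operations."""
    total = a ^ b ^ carry_in                    # XOR gates: the sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))  # AND/OR gates: the carry bit
    return total, carry_out

def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition: chain full adders together, bit by bit."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

assert add(2, 2) == 4  # same inputs, same output, every single time
```

Every step is a fixed rule, which is why the same inputs always yield the same output, and why the hardware can be audited in the first place.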

Now think about machine learning. ChatGPT’s output is not predictable, nor do its creators fully understand how it gets from point A to point B. That’s kind of what generative AI means. Its output, however, is based on a statistical analysis of texts converted to data points. Ars Technica has a fantastic and accessible write-up about how all this works, which I highly recommend.
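To see the difference, compare the adder above with this deliberately crude sketch of sampled text generation. Everything in it (the candidate tokens, the probabilities) is invented for illustration; a real LLM scores tens of thousands of tokens using billions of learned parameters:

```python
# Toy illustration of next-token sampling. The candidate tokens and
# their probabilities below are made up for this example; in a real
# model they would come from training on vast amounts of text.
import random

def next_token(context: str) -> str:
    # Pretend the model has already scored a few candidate
    # continuations of `context`.
    candidates = {"4": 0.6, "four": 0.25, "5": 0.1, "fish": 0.05}
    tokens, weights = zip(*candidates.items())
    # The output is sampled, not looked up: it can differ on every run.
    return random.choices(tokens, weights=weights)[0]

print(next_token("2 + 2 ="))  # usually "4", but not always
```

The point isn’t the details, which are fake; it’s that the output is drawn from a probability distribution rather than computed by fixed rules.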

Validity and reliability

All this raises important methodological questions about ChatGPT’s use as an input into research — into its use by scholars, students and others who want to create knowledge. One of the key ideas we teach our undergraduates in political science is that methods and indicators must be both reliable and valid. To be reliable, they must be consistent: the results won’t change no matter how many times you use them. To be valid, they have to accurately represent the thing you want to measure.

Obviously, a properly functioning calculator is highly reliable and produces valid results: no matter how many times I input “2” “+” “2”, and no matter how many different calculators I use, the output will always be “4”, which by definition is the correct answer.

If we take ChatGPT, or any similar chatbot, seriously as an academic tool, its problems with both validity and reliability should become clear. By design, chatbots will not output the same thing every time for any non-trivial question: if we wanted a calculator, we’d have a calculator. But also, answers to the same question will conceivably differ across chatbots. All calculators work the same way, but chatbots, like search engines, reflect the idiosyncrasies of their designs, and of their designers.
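If you wanted to see this for yourself, the test is easy to describe. In this hypothetical sketch, ask_chatbot stands in for whatever chatbot interface you have access to; it is not a real API:

```python
# Hypothetical sketch of a reliability test. `ask_chatbot` is a
# stand-in for some chatbot interface, not a real library call.

def reliability_check(ask_chatbot, prompt: str, runs: int = 10) -> bool:
    """Ask the same question repeatedly and count the distinct answers."""
    answers = {ask_chatbot(prompt) for _ in range(runs)}
    print(f"{len(answers)} distinct answer(s) across {runs} runs")
    # A reliable method, like a calculator, returns exactly one answer.
    return len(answers) == 1
```

A calculator passes this check trivially. A sampling-based chatbot, by design, generally won’t, and different chatbots will fail it in different ways.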

So, how do we know which answers are “correct”? This is a bigger problem for non-calculator questions because, in reality, there may be more than one “correct” answer. (One of the biggest mistakes engineers make is assuming that because the study of society doesn’t need to be numbers- or statistics-based, it’s easier than the hard sciences. The opposite is true.) The short answer is, we can’t, at least not on the chatbot’s terms. If we are not to take its correlation-based outputs as gospel, we need to evaluate them according to other criteria, whether that’s the scientific method or our own unquestioned assumptions about how the world works. In any case, as a method for generating knowledge (which is what research methods are), chatbots are unreliable in the methodological sense.

The issue of validity gets to the question of what, exactly, chatbots are representing. A good survey, for example, will convincingly link its specific questions about, say, annual household income and voting intentions to larger concepts, such as the relationship between income and political affiliation. The problem with chatbots is that their source material isn’t the world, but writing about the world. They’re sampling texts, not measuring things as they exist.

That’s a step further removed from reality than we may want to be as researchers. At the very least, the question of validity as it relates to chatbots needs to be considered more directly than I, at least, have seen in ChatGPT-focused discussions.

Automation and assessment

ChatGPT and calculators are very different in terms of how they produce their output, as well as in the reliability and validity of that output. Those differences should be enough for all academics to be very, very concerned about using ChatGPT in their work without subjecting it to far more critical analysis than it has received.

But it’s the automation of key parts of the educational process that is the real concern for academics, and should be for students and their parents. Does ChatGPT merely automate the writing process, and can we look to the introduction of pocket calculators as a guide to how to ensure that we educators can continue to assess and guide our students’ educational development? Let’s find out!

Calculators, as I’ve noted, allow the user to find the (consistently) correct answer to mathematical questions. As educators, however, we want to teach the process behind the calculation. This is the beating heart of the scientific (and, I would argue, democratic) worldview: That it is good for people to understand how the world works, and not just take it as a given.

As an undergraduate and later a graduate student in the 1990s and 2000s, I experienced the transition to software-focused statistics classes first-hand. The early part of my academic career involved figuring out how to run statistical tests by hand and using rudimentary command-line computer programs to analyze my datasets. By the time I graduated with my PhD in 2011, the statistical packages had become so much easier to use: just point and click.

Although I certainly remember thinking how easy the #kidstoday had it, never, from secondary school through to my PhD, did my teachers and instructors ever focus on the final result. Instead, they graded the process. The tool: the partial mark. Every step of the way, we had to show our work, and we lost points if we missed a step in our calculations, even if we got the final answer right.

Then again, and this is a question for the punch card and slide rule generation: did teachers ever grade math only by focusing on the answer? I mean, for anything above a Grade 2 level?

So while there may have been a moral panic over the introduction of calculators, it’s important to note that the solution — show your work and grade the process — was there from the beginning.

(And just to be clear, while calculators have advanced to become more like computers, the calculator-chatbot comparison is about the introduction of each technology, not what the technologies look like in their mature forms.)

What about chatbots?

Chatbots automate the writing and research process. Putting aside questions of reliability and accuracy for the moment, for educators this technology poses the same problem as the calculator, in that the answer — the final product — is in many ways the least important part of the equation. Just as math teachers want their students to understand how math works, the point of the essay or any similar written assignment is to teach the student how to research and how to collect and evaluate evidence.

The researching and writing of the essay is the means to the end of teaching the student how to, for lack of a better word, think. Even more importantly, and what separates the calculator from the chatbot, the “correct” answer to an essay is far more indeterminate than with calculator-focused math problems.

That said, this is where a comparison with the calculator is actually useful. It highlights the fundamental pedagogical challenge posed by chatbots: How can we disaggregate written assignments into something amenable to a “part marks” approach? Can this even be done?

For some forms of evaluation, sure. In-class exams function in this way: the student has to write for three hours in front of a witness, without computer aid. Some people have suggested oral exams, but we should recall that teachers are not exempt from the human propensity to be convinced by someone who sounds convincing but knows nothing.

For essays, though, that’s the problem. Many teachers already award part marks for an essay. Students in my class typically have to provide the components of a paper in the run-up to the final essay. So, the research question and argument, the literature review and source list, and so on. These steps effectively disaggregate the essay into its component parts.

The problem, of course, is that chatbots can be used to generate all parts of this disaggregated essay. The value in a literature review or article analysis is in the reading, but a student can just pull that off the shelf from a chatbot. Unlike the initial situation with calculators, any written work that takes place beyond the instructor’s watchful eye will be suspect. Partial work and partial marks won’t work.

The sad reality is that, if these for-profit companies are allowed to continue on their current trajectory, it will become increasingly difficult to trust student work produced anywhere but in the classroom or under constant digital surveillance.

Even sadder, and much less widely acknowledged, is the hard reality that we do not currently possess a technology for teaching students how to think and research, or for creating knowledge, that can hold a candle to the written essay. Exams offer a solid check on a student’s knowledge and understanding, but it is the process of writing and research — of trying to understand a reading rather than just accepting a machine-generated summary — that creates both understanding and the capacity to create your own knowledge. People may hate the essay for any number of reasons, but there is currently no Plan B.

Thinking through our responses

The mistake made by Prof. Eaton and OpenAI’s CEO is to assume that the automation of different processes is functionally equivalent. This is not the case. Different technologies have different purposes. Different automation processes produce different results.

A thoughtful, intelligent and useful response to the challenge that ChatGPT poses to the education system must first start with an understanding of the essay as a form of technology, including its fundamental purpose, and then consider how ChatGPT’s automation of the writing process either promotes or inhibits these objectives.

A second step would be to recognize that ChatGPT, despite how it is portrayed, including in the Globe article, is not an inevitability. Technologies are human creations, and the direction of their development is shaped by human actions and regulation. One question we might ask ourselves is: should this technology’s development be left to the whims and prejudices of self-interested Silicon Valley billionaires? Or should we as citizens act to defend ourselves from their hubris?

A third step would be to move beyond easy platitudes about how we’ll just replace the essay with something else. The education field is riddled with half-baked ideas, enthusiastically embraced one year only to be forgotten the next (remember MOOCs? I do). In the face of the reality that universities and instructors are already abandoning the essay as a pedagogical tool, anyone who would downplay the challenge that ChatGPT poses has the responsibility to offer — right here, right now — a workable solution for how we can continue to do the basic job of educating our students.

ChatGPT is a technology with powerful companies and billions of dollars behind it. It is already causing havoc in the education sector, upending a cornerstone of centuries of scientific education and advancement.

We need to take it seriously. Simply dismissing ChatGPT as a souped-up calculator is highly misleading. It does nobody any favours.

Posted in chatgpt, education, machine learning, Uncategorized | Comments Off on Why it’s a mistake to compare calculators to ChatGPT

New from me: Unlike with academics and reporters, you can’t check when ChatGPT’s telling the truth

Over at The Conversation. Please do check it out.

Looking at the comments (protip: never look at the comments), I think it’s important to clarify that my point isn’t that you can trust journalists or academics because they’re academics or journalists. It’s that the method that they follow gives you, the reader, the ability to verify (or refute) their work. Think their sources are dodgy? Is the evidence presented one-sided? By all means, doubt away!

I’m leaving to the side the fact that readers’ own assessment skills may not be the greatest and can lead them down paranoid rabbit holes, but the point stands. The reader can get their assessment wrong, just as the researcher can. That failure, however, lies not with the process, but with the reader’s interpretation of it. Whether the process is applied correctly or incorrectly, the important point is that the presentation of evidence in a particular (scientific) manner is what provides the grounds for critique.

One of the biggest challenges we face in dealing with machine learning and artificial intelligence is that understanding it fully requires that we think about what knowledge is. These are discussions that most people, in both the hard and social sciences, tend not to be very comfortable with. But just as monetary economists need to understand what money is to do their work, if we’re going to thoughtfully incorporate machine-learning processes into our lives, we’re going to have to understand the different ways we can create knowledge. This very much includes considering how knowledge created via the scientific method (show your work, outline your method, test against reality) is fundamentally different from knowledge created statistically, by autocomplete functions analyzing not the world but words about the world. One key difference, as I discuss here, is that unlike science-based knowledge, autocomplete knowledge contains no method within itself to confirm, or deny, its authenticity.

That’s a huge problem that cannot be wished away.

Here’s a photo of two fearsome predators in their natural environment.

Posted in chatgpt, machine learning | Comments Off on New from me: Unlike with academics and reporters, you can’t check when ChatGPT’s telling the truth