Chat GPT, Stats, and Math: What are we going to do?

Along with everyone else freaking out about Chat GPT (John Dickerson did a nice segment on CBS), I thought I would test it out with a few basic questions from classes I teach. This year my focus is on introduction to statistics, so I started by asking it a few open-ended and computational questions.

Question 1: What is the difference between descriptive statistics and inferential statistics?

Chat GPT's answer:

Descriptive statistics involves summarizing and describing a set of data. It is used to describe the characteristics of a sample of data, such as the mean, median, mode, standard deviation, and other measures of central tendency and dispersion. Descriptive statistics does not involve making conclusions about the larger population from which the sample was drawn.
Inferential statistics, on the other hand, involves using a sample of data to make inferences or predictions about the characteristics of a larger population. It involves using statistical techniques to estimate the probability that an observed relationship or difference in the sample data would occur by chance, and to make decisions about whether the observed relationship or difference is likely to be representative of the larger population. Inferential statistics allows researchers to draw conclusions about a population based on data from a sample, and to test hypotheses about relationships between variables.

Overall this is really good, which unsettles me.

Question 2: Use the example of squirrels to describe the difference between descriptive statistics and inferential statistics.

Chat GPT's answer:

Sure! Here is an example of how descriptive statistics and inferential statistics might be used to study squirrels:

Descriptive statistics:
A researcher counts the number of squirrels in a park over the course of a week and records the data. The researcher then calculates the mean, median, and standard deviation of the squirrel counts. These statistical measures are used to describe the characteristics of the sample of data (the squirrel counts).
Inferential statistics:
The researcher wants to determine if there is a difference in the number of squirrels observed in the park on weekdays versus weekends. They collect data on the number of squirrels observed on weekdays and weekends for several weeks and use inferential statistics to test the hypothesis that there is a difference in the mean number of squirrels observed on weekdays versus weekends. The researcher might use a t-test to compare the means of the two samples and determine whether the observed difference is statistically significant, or whether it could have occurred by chance. If the difference is found to be statistically significant, the researcher can conclude that there is a difference in the number of squirrels observed on weekdays versus weekends in the larger population of all days in the park.

I thought I could poke the system into giving me something nonsensical, but this is also really good. The 'Sure!' caught me off guard; it felt a little too human.

Question 3: The gestation period for dogs is normally distributed with a mean of 61.4 days with a standard deviation of 1.5 days. What is the probability that the gestation period of a dog is more than 62 days?

Chat GPT's answer:

To find the probability that the gestation period of a dog is more than 62 days, you can use the following formula to standardize the data:
z = (x - mean) / standard deviation
Where x is the value of interest (in this case, 62 days), mean is the mean of the distribution (61.4 days), and standard deviation is the standard deviation of the distribution (1.5 days).
Substituting these values into the formula gives:
z = (62 - 61.4) / 1.5 = 0.6 / 1.5 = 0.4
This value represents the number of standard deviations that 62 days is above the mean of the distribution. To find the probability that the gestation period of a dog is more than 62 days, you can use a standard normal table (or a calculator or software that can compute probabilities from the standard normal distribution) to find the probability associated with a z-score of 0.4.
The probability that a z-score is greater than 0.4 is equal to 1 minus the probability that a z-score is less than 0.4. Using a standard normal table (or a calculator), you can find that the probability that a z-score is less than 0.4 is 0.6554. Therefore, the probability that a z-score is greater than 0.4 is 1 - 0.6554 = 0.3446.
This means that the probability that the gestation period of a dog is more than 62 days is approximately 0.3446.

This is a pretty standard question in statistics courses, and the system answered it flawlessly, something other systems (PhotoMath, etc.) struggle with.
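For anyone who wants to check the arithmetic without a z-table, here is a minimal sketch using Python's standard library. The variable names are my own; the only inputs are the mean and standard deviation given in the question.

```python
from statistics import NormalDist

# Population model from Question 3: gestation ~ Normal(mean=61.4, sd=1.5)
gestation = NormalDist(mu=61.4, sigma=1.5)

# z-score for 62 days, as in the quoted answer
z = (62 - gestation.mean) / gestation.stdev

# P(X > 62) = 1 - P(X <= 62)
p_more_than_62 = 1 - gestation.cdf(62)

print(f"z = {z:.2f}")
print(f"P(X > 62) = {p_more_than_62:.4f}")
```

This should agree with the 0.3446 in the answer above, up to rounding.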

Question 4: The gestation period for dogs is normally distributed with a mean of 61.4 days with a standard deviation of 1.5 days. A sample of 25 dogs is taken. What is the probability that the mean gestation period of this sample is more than 62 days?

Chat GPT's answer:

To find the probability that the mean gestation period of a sample of 25 dogs is more than 62 days, you can use the central limit theorem. The central limit theorem states that the distribution of the sample mean will be approximately normal, regardless of the shape of the population distribution, as long as the sample size is large enough.
For a normal distribution, the standard deviation of the sampling distribution of the mean is equal to the standard deviation of the population divided by the square root of the sample size. In this case, the standard deviation of the sampling distribution of the mean is:
standard deviation of the sampling distribution = 1.5 / sqrt(25) = 0.3
This is the standard deviation of the distribution of the sample mean. To find the probability that the mean gestation period of a sample of 25 dogs is more than 62 days, you can use the following formula to standardize the data:
z = (x - mean) / standard deviation of the sampling distribution
Where x is the value of interest (in this case, 62 days), mean is the mean of the population distribution (61.4 days), and standard deviation of the sampling distribution is the standard deviation of the distribution of the sample mean (0.3).
Substituting these values into the formula gives:
z = (62 - 61.4) / 0.3 = 0.6 / 0.3 = 2
This value represents the number of standard deviations that 62 days is above the mean of the distribution of the sample mean. To find the probability that the mean gestation period of a sample of 25 dogs is more than 62 days, you can use a standard normal table (or a calculator or software that can compute probabilities from the standard normal distribution) to find the probability associated with a z-score of 2.
The probability that a z-score is greater than 2 is very small, and can be found to be approximately 0.0228 using a standard normal table (or a calculator). This means that the probability that the mean gestation period of a sample of 25 dogs is more than 62 days is approximately 0.0228.

The difference between this and the previous question is slight, but it signals whether a student understands the difference between a population distribution and a sampling distribution. Chat GPT certainly handles it, and even cites the CLT!
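The contrast between Questions 3 and 4 can be sketched in a few lines of Python. This is only an illustration using the numbers from the prompt; the variable names are mine. The one change is that the standard deviation is replaced by the standard error, sigma / sqrt(n).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 61.4, 1.5, 25

# Distribution of a single dog's gestation period (Question 3)
single = NormalDist(mu=mu, sigma=sigma)

# Sampling distribution of the mean of n = 25 dogs (Question 4):
# same center, but the spread shrinks to sigma / sqrt(n)
standard_error = sigma / sqrt(n)
sample_mean = NormalDist(mu=mu, sigma=standard_error)

p_single = 1 - single.cdf(62)
p_mean = 1 - sample_mean.cdf(62)

print(f"standard error = {standard_error:.2f}")
print(f"P(one dog > 62 days)     = {p_single:.4f}")
print(f"P(sample mean > 62 days) = {p_mean:.4f}")
```

The same cutoff of 62 days is 0.4 standard deviations out for one dog but 2 standard errors out for the sample mean, which is why the second probability is so much smaller.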

Question 5: A student wants to determine whether there is a relationship between how many hours they study and their 10 exam grades for their courses. They run a single-tail hypothesis test and find the test statistic is t = 2.987. Should they reject the null hypothesis at α = 0.01? What should they do with this information?

Chat GPT's answer:

To determine whether the student should reject the null hypothesis, you need to know the degrees of freedom (df) and the critical value of the t-distribution at α = 0.01. The critical value is the point beyond which you would reject the null hypothesis if the test statistic falls in that area of the distribution.
If the df and critical value are known, you can use the test statistic to make a decision about the null hypothesis. If the test statistic falls in the critical region of the t-distribution (i.e., if it is greater than the critical value), you would reject the null hypothesis. If the test statistic does not fall in the critical region, you would fail to reject the null hypothesis.
If the student finds that the test statistic is t = 2.987 and the critical value at α = 0.01 is greater than 2.987, they would fail to reject the null hypothesis. This means that there is not enough evidence to conclude that there is a relationship between the number of hours the student studies and their exam grades.
It is important to note that failing to reject the null hypothesis does not mean that the null hypothesis is true. It simply means that the data do not provide sufficient evidence to reject the null hypothesis. The student may want to collect more data and re-run the hypothesis test to see if the results are consistent.

This is a question I tested out last term, and I'm not totally convinced I should keep it, although Chat GPT did pretty well with the given information. To be clear, I was assessing whether a student knew what to do, or what questions to ask, with the given information.
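To see why the degrees of freedom matter so much here, the sketch below computes the one-tailed p-value for t = 2.987 under a few candidate values of df. The df values are assumptions, since the prompt does not state them; df = 8 would correspond to a correlation or regression slope test on 10 pairs (n - 2). The t density is integrated numerically with Simpson's rule so that only the standard library is needed, and the helper names are mine.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=2000):
    """One-tailed p-value P(T > t_stat) for t_stat >= 0.

    Integrates the density on [0, t_stat] with Simpson's rule
    (steps must be even) and subtracts from 0.5 by symmetry.
    """
    h = t_stat / steps
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat, alpha = 2.987, 0.01

# Candidate degrees of freedom (assumed, not given in the prompt)
for df in (7, 8, 9):
    p = upper_tail_p(t_stat, df)
    decision = "reject" if p < alpha else "fail to reject"
    print(f"df = {df}: p = {p:.4f} -> {decision} H0 at alpha = {alpha}")
```

A standard t table puts the one-tailed 0.01 critical value at about 2.998 for df = 7 and about 2.896 for df = 8, so the decision flips right around the sample size in the prompt. That is the kind of follow-up question the prompt is meant to draw out.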

I then tried entering some more basic algebra and calculus questions:

Question 6: Solve 0 = x^2 + 5x + 6

Chat GPT's answer:

To solve the equation 0 = x^2 + 5x + 6, you can use the quadratic formula:
x = (-b +/- sqrt(b^2 - 4ac)) / (2a)
Where a, b, and c are the coefficients of the quadratic equation in the form 0 = ax^2 + bx + c.
In this case, a = 1, b = 5, and c = 6, so the quadratic formula becomes:
x = (-5 +/- sqrt(5^2 - 4*1*6)) / (2*1)
This simplifies to:
x = (-5 +/- sqrt(25 - 24)) / 2
Which simplifies to:
x = (-5 +/- sqrt(1)) / 2
Which simplifies to:
x = (-5 +/- 1) / 2
This gives us two solutions: x = (-6) / 2 = -3 and x = (-4) / 2 = -2.
Therefore, the solutions to the equation 0 = x^2 + 5x + 6 are x = -3 and x = -2.

Seems pretty standard compared to PhotoMath and Wolfram Alpha.
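As a quick check on the quoted steps, here is a small sketch of the quadratic formula in Python. The function name is my own, and it only handles real roots with a nonzero leading coefficient.

```python
from math import sqrt

def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (assumes a != 0), smallest first."""
    disc = b * b - 4 * a * c          # discriminant b^2 - 4ac
    if disc < 0:
        return []                      # no real roots
    root = sqrt(disc)
    # A set collapses a repeated root into a single value
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

roots = solve_quadratic(1, 5, 6)
print(roots)

# Substitute each root back into x^2 + 5x + 6 to confirm it gives zero
for x in roots:
    print(x, x**2 + 5 * x + 6)
```

The same roots fall out of factoring, since x^2 + 5x + 6 = (x + 2)(x + 3), which is the route many instructors would expect for this particular equation.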

Question 7: Find the critical values of f(x) = x^3 - 9x^2

Chat GPT's answer:

The critical values of a function are the values of x at which the function has a local maximum or minimum, or at which the function is not differentiable (i.e., has a vertical tangent).
To find the critical values of a function, you need to find the values of x at which the derivative of the function is equal to zero or is undefined.
The derivative of a function gives the slope of the function at a particular point. If the derivative is equal to zero at a particular point, this means that the slope of the function is zero at that point, which could indicate a local maximum or minimum. If the derivative is undefined at a particular point, this means that the function has a vertical tangent at that point.
To find the critical values of f(x) = x^3 - 9x^2, you can take the derivative of the function:
f'(x) = 3x^2 - 18x
To find the values of x at which the derivative is equal to zero, you can set f'(x) equal to zero and solve for x:
0 = 3x^2 - 18x
This equation can be rewritten as:
3x^2 - 18x = 0
Using the quadratic formula, the solutions to this equation are:
x = (18 +/- sqrt(18^2 - 4*3*0)) / (2*3)
This simplifies to:
x = (18 +/- sqrt(324)) / 6
Which simplifies to:
x = (18 +/- 18) / 6
Which gives us two solutions: x = 0 and x = 3.
Therefore, the critical values of f(x) = x^3 - 9x^2 are x = 0 and x = 3.
It is important to note that these are only the values of x at which the derivative is equal to zero. To determine whether these points correspond to local maxima, minima, or vertical tangents, you need to check the second derivative of the function. If the second derivative is positive at a point where the first derivative is zero, this indicates a local minimum. If the second derivative is negative at a point where the first derivative is zero, this indicates a local maximum. If the second derivative is undefined at a point where the first derivative is zero, this indicates a vertical tangent.

Again, not much different than existing systems in terms of finding the critical values, yet Chat GPT also included detailed information about how to use the second derivative, which most students don't (wouldn't?) do.
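The quoted arithmetic is worth a second look. The line x = (18 +/- 18) / 6 gives x = 0 and x = 36 / 6 = 6, not x = 3, and factoring f'(x) = 3x(x - 6) agrees. A minimal Python sketch of the check, using the second derivative test the answer describes (the helper names are mine):

```python
def f_prime(x):
    """First derivative of f(x) = x^3 - 9x^2."""
    return 3 * x**2 - 18 * x

def f_second(x):
    """Second derivative of f(x) = x^3 - 9x^2."""
    return 6 * x - 18

def classify(x):
    """Apply the second derivative test at x."""
    if f_prime(x) != 0:
        return "not a critical point"
    if f_second(x) > 0:
        return "local minimum"
    if f_second(x) < 0:
        return "local maximum"
    return "inconclusive"

# 0 and 6 come from f'(x) = 3x(x - 6) = 0; 3 is the value in the quoted answer
for x in (0, 3, 6):
    print(f"x = {x}: f'(x) = {f_prime(x)}, f''(x) = {f_second(x)}, {classify(x)}")
```

At x = 3 the first derivative is -27, so it is not a critical value. It is where the second derivative is zero, which may be where the mix-up came from. Detailed prose around a wrong number is easy to skim past.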

Question 8: Find the definite integral of f(x) = 3x^2-2cos(x) on the interval [0, 10]

Chat GPT's answer:

To find the definite integral of a function on an interval, you can use the following formula:
∫f(x)dx = F(b) - F(a)
Where ∫f(x)dx is the integral of the function f(x) on the interval [a, b], F(x) is an antiderivative of f(x) (i.e., a function whose derivative is f(x)), and a and b are the lower and upper limits of the interval, respectively.
To find the definite integral of f(x) = 3x^2 - 2cos(x) on the interval [0, 10], you need to find an antiderivative of f(x) and evaluate it at the upper and lower limits of the interval.
An antiderivative of f(x) = 3x^2 - 2cos(x) is:
F(x) = x^3 - 2sin(x) + C
Where C is an arbitrary constant.
To find the definite integral of f(x) on the interval [0, 10], you can substitute the upper and lower limits of the interval into the antiderivative and subtract the result at the lower limit from the result at the upper limit:
∫f(x)dx = F(10) - F(0) = (10^3 - 2sin(10) + C) - (0^3 - 2sin(0) + C) = 1000 - 2sin(10)
Therefore, the definite integral of f(x) = 3x^2 - 2cos(x) on the interval [0, 10] is 1000 - 2sin(10).

Not much different than other systems, but the additional text and description of the steps is impressive.
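The closed form 1000 - 2sin(10) can be sanity-checked numerically. The sketch below compares it to a Simpson's rule approximation of the integral. The helper name and step count are my own choices, and x is in radians.

```python
from math import cos, sin

def f(x):
    """Integrand from Question 8 (x in radians)."""
    return 3 * x**2 - 2 * cos(x)

def simpson(func, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

exact = 1000 - 2 * sin(10)     # F(10) - F(0) with F(x) = x^3 - 2sin(x)
approx = simpson(f, 0, 10)

print(f"closed form: {exact:.6f}")
print(f"Simpson:     {approx:.6f}")
```

Because sin(10) is negative (10 radians is a bit past 3π), the value comes out slightly above 1000, roughly 1001.09.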

I'm pretty shocked at how detailed and specific the system is when answering mathematical questions. A few observations:

It over-relies on the Quadratic Formula, but that behavior isn't much different from most students'.
Students who just copy and paste these responses would be easy to identify if we compared them with their discussion forum responses or other text we are confident they wrote.
Chat GPT provides quite a bit of added information and context, and some students may not understand how it connects to the prompt.

So what are we going to do about Chat GPT? Some students are likely to use it, and unless faculty have a very good idea of how a student writes, I'm not convinced we could detect its use. One idea for online classes that a colleague of mine (Jennifer Ward) has shared is to have students take an online final as usual, with or without submitting their work. Then, for a considerable number of points (or as a prerequisite to posting their score), have students explain a random selection of their answers verbally over Zoom.

While you could use oral exams, or even proctoring, I like this approach because it gives faculty a chance to assess what a student understands, students get to know faculty better, and it makes the course about people engaging with a topic synchronously. This last aspect might be the most difficult, though: it means scheduling 20-120 individual meetings for each assessment.

I do want to thank Ryan Watkins for his article "Update Your Course Syllabus for chatGPT", which has some good thoughts and suggestions on how to deal with this system. As with Chegg, PhotoMath, and similar systems, I see two approaches faculty could take: systematic prevention and value-based prevention. With systematic prevention, faculty and schools build policies, rules, assignments, criteria, etc. that prevent students from using these systems. Value-based prevention means explaining to students the value of not cheating and of learning the course material for themselves. The suggestion above about interviewing students about their answers would be systematic prevention; however, I see it upholding specific values we want to instill in students: a love of learning, curiosity, etc.

What are your thoughts? How are you going to address Chat GPT in your classes? Are you focused on systematic prevention, but want to move towards value-based prevention? Do you think value-based prevention is rubbish and want to focus on systematic prevention? I welcome your thoughts and appreciate your experience and perspective.

Applied Abstractions