Oh my goodness. GPT-o1 got a perfect score on my Carnegie Mellon University undergraduate #math exam, taking less than a minute to solve each problem. I freshly design non-standard problems for all of my exams, and they are open-book, open-notes. (Problems included below, with links to GPT-o1's answers.)

While eating Pie in the afternoon, I showed the exam to one of our math Ph.D. students (a former International Mathematical Olympiad Gold Medalist from Belarus), and he said "Hmm. Non-Trivial. Good."

Our undergraduate students are also very good. This exam was not easy for them, as the score distribution shows.

Today is the 2-year anniversary of the public release of GPT-4. Two years ago, it caught my eye because it exhibited sparks of insight, similar to what I would see when I talked to clever kids who learned quickly. That gave me the instinct and urgency to start warning people. Today's observation of GPT-o1 being able to ace my hard college exam makes me feel like we're close to the tipping point of AI being able to do moderately-non-routine technical jobs.

I was impressed by every student in my class who got a perfect score. The fastest such person took 30 minutes.

And GPT-o1 only costs $60 per million words output, which means that each problem cost about 5 cents to solve. A total of around 25 cents, for work that most people can't complete in 1 hour.

Problem 1: Consider the recurrence a_n = a_{n-1} + a_{n-2}, with the first initial condition being a_0 = 1. Find all real number values for the second initial condition a_1 such that lim_{n \rightarrow \infty} a_n = 0.
https://lnkd.in/eDPH5GWu

Problem 2: Find coefficients such that the sequence a_n = n \sqrt{2} + 2^n \pi satisfies the following recurrence, for some initial conditions. You don't need to find the initial conditions.
a_k = ___ a_{k-1} + ___ a_{k-2} + ___ a_{k-3}
https://lnkd.in/ebAw3uSx

Problem 3: Fill in the blanks. The middle entry of the result of:
[[0, 0, 1], [1, 0, 0], [2, 3, 4]]^n [[5], [6], [7]]
is the term a_n of this recurrence:
a_k = ___ a_{k-1} + ___ a_{k-2} + ___ a_{k-3}
a_0 = ___
a_1 = ___
a_2 = ___
https://lnkd.in/eZCnsT4n

Problem 4: Consider the recurrence with initial condition a_0 = 1, where for each n \in {1, 2, 3, ...}:
a_n = \sum_{k=0}^{n-1} a_k
Find the generating function f(z) = a_0 + a_1 z + a_2 z^2 + ..., which looks something like f(z) = (1-2z)/(1-z)
https://lnkd.in/ezf69TFX

Problem 5: Prove that the coefficient of x^{2025} in (x + x^2)^0 + (x + x^2)^1 + ... + (x + x^2)^{2025} is a Fibonacci number.
https://lnkd.in/ewTmveuK

My main work nowadays is to build and scale up a community of people (through education) to face the challenges of the AI age together. I thought I had more years. Now we have to move faster.
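For readers who want to check a proposed answer to Problem 3 rather than take it on trust, here is a minimal Python sketch, not the exam's official solution. The matrix and vector are copied from the problem statement. The helper names (`mat_vec`, `middle_entries`, `satisfies_recurrence`) and the candidate coefficients in `COEFFS` are assumptions introduced for illustration. The coefficients come from the characteristic polynomial x^3 - 4x^2 - 2x - 3 of the matrix (via the Cayley-Hamilton theorem), and the code treats them as a claim to test.

```python
# Matrix and vector copied from Problem 3; everything else is illustrative.
M = [[0, 0, 1], [1, 0, 0], [2, 3, 4]]
v = [5, 6, 7]

def mat_vec(matrix, vector):
    """Multiply a square matrix by a column vector using exact integers."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def middle_entries(count):
    """Return the middle entry of M^n v for n = 0, 1, ..., count - 1."""
    entries, current = [], v
    for _ in range(count):
        entries.append(current[1])
        current = mat_vec(M, current)
    return entries

# Candidate coefficients (an assumption to be checked, not a given):
# a_k = 4 a_{k-1} + 2 a_{k-2} + 3 a_{k-3}
COEFFS = (4, 2, 3)

def satisfies_recurrence(seq, coeffs):
    """Check a_k = c1*a_{k-1} + c2*a_{k-2} + c3*a_{k-3} for every k >= 3."""
    c1, c2, c3 = coeffs
    return all(
        seq[k] == c1 * seq[k - 1] + c2 * seq[k - 2] + c3 * seq[k - 3]
        for k in range(3, len(seq))
    )

a = middle_entries(12)
print(a[:3], satisfies_recurrence(a, COEFFS))  # [6, 5, 7] True
```

Agreement on finitely many terms is evidence, not a proof. The proof is the Cayley-Hamilton step, and the script only guards against arithmetic slips.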
Well, these are problems that are analytic, and they tend to be more suitable for large language model AI. I have been playing around with ChatGPT and find that it absolutely fails at physical systems, even simple ones like pendulums. "Hallucinogenic results" is a good descriptor, as is "confidently wrong". Even when I ask it to write Python code, it gets it quite wrong, or deletes my code and thus "simplifies" it. It also can't handle code of any useful size. It is a very sharp tool for small, well-defined problems, but chokes on anything bigger.
It excels at math-competition-type problems, but when it has to think the way a grad student would think about a dissertation, the differences are pretty obvious (based on a few groups' test results so far). I do agree that the time to prepare is now, though. AI will hit human levels in a lot of fields soon, and displaced workers may not find other job options. A population in which 70% are unemployed and not looking for work does not make for a stable society.
That's incredible, but it sounds like professors need to run their exams through the latest models before giving them out to students. I think we’re going back to in-class, on-paper exams.
Students do not need to solve the problems in the AI world. But they still need the skills to check whether the solution given by the AI agent is correct.
All the more reason why schools need to act more urgently on AI literacy, especially in K-12.
It clearly shows where the world is heading!! Competing with AI is useless... we need to figure out ways to help students develop skills to do soooo much more with the aid of AI. It offers an unparalleled opportunity compared to any other time in human history! The future lies in collaboration between human intelligence and artificial intelligence just the way you are doing it Po-Shen Loh :)
Have you tried other models? I have used o3-mini the most, given the limit on o1, and the model is sometimes wrong when solving AMC10 problems.
Game-changer. The implications go beyond academic performance: this brings us closer to AI handling moderately non-routine technical tasks in real-world industries. The cost efficiency alone is staggering.
o1 was likely trained on the answers to your exam, so there is not much to be surprised about. If you had the answers and didn’t have to type with your fingers, you may have had a chance to beat ChatGPT…
Probably, we need to stop putting focus on teaching ‘how to do’ but more ‘what to do’ and even more on ‘why.’ Teach students to formulate problems instead of problem solving.