The new AI was asked to do some simple rocket science. It crashed and burned.

Tiera Fletcher scrutinized an AI chatbot’s rocket science.

“That’s accurate, that’s factual,” she thought as she read the AI-generated explanation of “the rocket equation,” one of the most fundamental equations in rocketry.

She paused as the bot tried to write the rocket equation.

“No,” she said. “Too few variables.”
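
For reference, the equation she was checking against, often called the ideal or Tsiolkovsky rocket equation, ties a rocket’s achievable change in velocity to its exhaust velocity and how much of its mass is propellant:

```latex
% Ideal (Tsiolkovsky) rocket equation
\Delta v = v_e \ln\!\left(\frac{m_0}{m_f}\right)
```

Here \Delta v is the change in velocity, v_e the effective exhaust velocity, m_0 the initial (fueled) mass, and m_f the final (dry) mass. Drop any of those variables, as the chatbot did, and it is no longer the rocket equation.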

Fletcher, a rocket scientist and creator of the outreach program Rocket With The Fletchers, reviewed rocketry text and graphics generated by the latest AI technologies to see whether the computer programs could teach people how rockets fly.

The results were poor. ChatGPT, OpenAI’s recently launched chatbot, failed to accurately reproduce even rocketry’s most basic equations in almost every example, and some of its descriptions of those equations were wrong as well. Another AI program failed the assignment too, and image generators produced rocket engines that looked impressive but would fail catastrophically if anyone ever built them.

OpenAI declined NPR’s interview request, but on Monday it released a new version with “enhanced factuality and mathematical capabilities.” A quick NPR test showed it had improved, but it still made errors in critical calculations and stumbled on simple math problems.

Independent researchers say these failures, particularly when set against rocketry’s more than half a century of reliance on computers, reveal a fundamental weakness that may limit the emerging AI programs: they cannot figure out the facts.

“Some people have a notion that we will overcome the truth problem of these systems by merely feeding them more data,” says AI expert Gary Marcus, author of Rebooting AI.

“They’re missing something more fundamental,” Marcus explains.

Liftoff calculation
Spaceflight has depended on computers since the 1960s. An automated launch sequence guided the Saturn V rockets that carried astronauts toward orbit. Today, computers fly rockets because they can monitor and adjust their complicated systems far faster than humans can.

“We cannot run rockets without computers,” argues MIT rocket scientist Paulo Lozano. Computers help design and test new rockets faster, cheaper, and better. “Computers matter,” he says.

The latest AI programs are remarkable in their own right. Internet users worldwide have been testing ChatGPT since its release in November. Doctors have used it to write letters to insurance companies, Buzzfeed announced it would use the program to generate tailored quizzes, and colleges and universities worry about students using the chatbot to cheat.

So perhaps AI could help with rocket science, too.

But ChatGPT failed to accurately reproduce even the simplest rocketry concepts. It also botched the rocket equation and thrust-to-weight ratios, which quantify whether a rocket can actually fly.
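
For reference again, the thrust-to-weight ratio is the simpler of the two quantities: the engine’s thrust divided by the vehicle’s weight, which must exceed 1 for a rocket to lift off vertically.

```latex
% Thrust-to-weight ratio; liftoff requires TWR > 1
\mathrm{TWR} = \frac{T}{m\,g_0}
```

Here T is thrust, m the vehicle’s mass, and g_0 standard gravity (about 9.81 m/s^2).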

After spending several minutes studying a half-dozen of the AI’s rocketry answers, Lozano concluded, “Oh sure, this is a fail.”

Image-generating programs like OpenAI’s DALL·E 2 failed as well. Asked to draw a rocket engine, they produced complicated schematics that loosely resembled engines but lacked the openings needed to let hot gases escape. The graphics programs Midjourney and Stable Diffusion likewise produced cryptic motor designs, with pipes that went nowhere and shapes that would never fly.

Sorry, Dave.
According to Sasha Luccioni, a research scientist at Hugging Face, the strange results reflect how sharply the new AI programs diverge from the computer systems rocketry has relied on for decades. “The computer operates extremely differently,” she says.

Traditional computers for designing and flying rockets are programmed directly with the relevant equations, and programmers test those programs thoroughly to make sure they behave as intended.
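
In that traditional style, the physics is written into the code explicitly. Here is a minimal sketch of the idea, using the rocket equation shown earlier rather than any real flight software:

```python
import math

def delta_v(exhaust_velocity_m_s: float, wet_mass_kg: float, dry_mass_kg: float) -> float:
    """Ideal rocket equation: velocity change available from a given mass ratio."""
    if dry_mass_kg <= 0 or wet_mass_kg < dry_mass_kg:
        raise ValueError("need wet_mass_kg >= dry_mass_kg > 0")
    return exhaust_velocity_m_s * math.log(wet_mass_kg / dry_mass_kg)

# Example: 3,000 m/s exhaust velocity, 500 t fully fueled, 150 t dry
print(delta_v(3000.0, 500_000.0, 150_000.0))   # roughly 3,612 m/s
```

The equation lives in the code, so the program gives the same, checkable answer every time it runs.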

The new systems, by contrast, effectively create their own rules. They find patterns in databases of millions or billions of texts or images, then use those patterns as rules for generating new writing or graphics they predict people will want.
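
As a loose illustration of that idea, not of how ChatGPT itself is built, here is a toy sketch: count which word tends to follow which in a pile of text, then reuse those counts as the “rules” for generating new text.

```python
import random
from collections import defaultdict, Counter

# Toy illustration only: learn word-to-word patterns, then generate from them.
corpus = "the rocket equation relates mass and velocity . the rocket burns fuel ."

following = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    following[prev][nxt] += 1          # pattern "learned": nxt often follows prev

def generate(start="the", length=8):
    word, output = start, [start]
    for _ in range(length):
        options = following.get(word)
        if not options:
            break
        # Pick the next word in proportion to how often it was seen to follow.
        word = random.choices(list(options), weights=list(options.values()))[0]
        output.append(word)
    return " ".join(output)

print(generate())   # e.g. "the rocket burns fuel . the rocket equation relates"
```

Nothing in that process checks whether a generated sentence is true; it only checks whether it resembles the patterns in the training text.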

The results can approximate human creativity remarkably well. ChatGPT has written poetry and riffed on peanut butter sandwiches stuck in VCRs. Luccioni thinks AI like this might help artists come up with new ideas.

“They generate, they hallucinate, they build new word combinations depending on what they learned,” Luccioni explains.

But when the program is asked to write out the rocket equation, its limitations become apparent.

Luccioni believes the program is remixing physics textbooks it has been exposed to. It cannot verify the text it mashes together, so its output may contain errors.

Asked to repeat the same information, the program can also contradict itself. Because it has trained itself on millions of texts, Luccioni says, it is statistically very likely to state that Paris is the capital of France. But since it is essentially guessing the next word in its conversation with its human counterpart, it occasionally names a different city instead. (That may explain why ChatGPT produced so many different rocket equations, some better than others.)
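
A rough sketch of that guessing step, again a simplification rather than OpenAI’s actual code: the model assigns probabilities to candidate next words and samples one, so the most likely answer usually wins, but not always.

```python
import random
from collections import Counter

# Hypothetical next-word probabilities after "The capital of France is" --
# made-up numbers for illustration, not real model output.
next_word_probs = {"Paris": 0.92, "Lyon": 0.04, "Marseille": 0.03, "Toulouse": 0.01}

def sample_next_word(probs):
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights)[0]

# Over many samples "Paris" dominates, but other cities occasionally slip through.
print(Counter(sample_next_word(next_word_probs) for _ in range(1000)))
```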

Luccioni says these flaws should be expected: ChatGPT was designed for writing, not math. Its strength is responding to human input and interacting with people.

“It gets things wrong, because it’s not actually designed to get things right,” explains University of Washington linguistics professor Emily M. Bender, who studies AI systems. “It sounds plausible.”

Bender worries about the combination of ChatGPT’s linguistic fluency and its lack of real knowledge. Some have suggested using the program to draft legal documents or defenses for minor offenses. “AI doesn’t know the laws, it doesn’t know what your current circumstance is,” Bender says. “It can piece together training data to build a legal contract, but that’s not what you want.” Given that same lack of knowledge, using ChatGPT for medical or mental health services could be disastrous.

Accuracy
It is unclear whether ChatGPT can ever be made reliably factual. Meta, the parent company of Facebook, released an AI system meant to write scientific articles, but it generated fabricated references and was shut down within days.

Because these systems construct human-sounding prose by statistically analyzing massive databases, Bender questions whether there is any simple way to make them select only “correct” information.

“It can’t be error-free,” she explains.

Luccioni and Bender both suggest that different training procedures could improve AI systems, and researchers are working on exactly that. Yejin Choi, an AI researcher at the University of Washington and the Allen Institute for Artificial Intelligence, has trained an AI system on a virtual textbook of vetted content, and its ability to handle new situations improved.

“Really, beneath the surface, there’s these massive unsaid assumptions about how the world works,” Choi told Short Wave.

Hyper-autocomplete
AI expert Gary Marcus worries that the public is vastly overestimating these new programs. He argues we are easily fooled by anything that appears human. “These algorithms are essentially autocomplete on steroids,” he adds.

Marcus agrees with Bender that the new systems’ tendency to make mistakes may be so deeply ingrained that they cannot be made more truthful. Because the programs essentially teach themselves, it is unclear how to improve their results.

Marcus believes there’s no fundamental theoretical explanation of how they work.

He believes AI may ultimately need a more direct way of establishing what is true.

He argues we need a new architecture that reasons over facts. “That ought to be there.”
