ysabetwordsmith ([personal profile] ysabetwordsmith) wrote 2025-05-18 04:18 pm
Entry tags: Artificial Intelligence

Professors Staffed a Fake Company Entirely With AI Agents

As Business Insider first reported, the results were dismal. The best-performing model was Anthropic's Claude 3.5 Sonnet, which struggled to finish just 24 percent of the jobs assigned to it. The study's authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.

Google's Gemini 2.0 Flash, meanwhile, averaged a time-consuming 40 steps per finished task, but only had an 11.4 percent rate of success — the second highest of all the models. The worst AI employee was Amazon's Nova Pro v1, which finished just 1.7 percent of its assignments at an average of almost 20 steps.



While corporations may wish to replace human employees with software, it is not yet feasible for complex tasks. Only the simplest jobs are really at risk.


[personal profile] lilyhargrave 2025-05-18 09:24 pm (UTC)
That is somewhat reassuring to know!

[personal profile] mdlbear 2025-05-19 08:48 am (UTC)

Another article on the same site says that AI Chatbots Are Becoming Even Worse At Summarizing Data. Why am I not surprised?


Re: Well ...

[personal profile] gatheringrivers 2025-05-20 09:03 am (UTC)
...Hapsburg AI.

LOL! I'm going to have to remember that, thank you for putting it that way! :)

[personal profile] gatheringrivers 2025-05-19 08:59 am (UTC)
Sadly, it won't stop them from TRYING at least.