

It’s been a few months since I used Copilot, but it used a model that’s worse than GPT-4/4o, which is itself a big step down from the reasoning models.
Try out Cline, aider, or one of the tools devs actually use with the latest models from Anthropic/Google/OpenAI.
https://aider.chat/docs/leaderboards/
I didn’t look through all the issues, but there were things like:
The agent was blocked by configuration issues from accessing the necessary dependencies to successfully build and test. Those are being fixed and we’ll continue experimenting.
It’s been out less than a week; let’s see how it’s doing in a year.
Not sure what you mean; if anything, we’re seeing results at an increasing pace. There’s a lot more complexity going into it than ‘increasing text/GPUs’, though.
https://arcprize.org/leaderboard
AlphaEvolve recently achieved what you are after.
AlphaEvolve discovered a new scheduling heuristic for Google’s Borg cluster management system, recovering an average of 0.7% of global compute resources that were previously stranded due to resource fragmentation.
With Google’s annual capital expenditures in the tens of billions, this efficiency translates to hundreds of millions of dollars saved annually.
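A quick back-of-envelope check of that arithmetic. The capex figures ($20B, $50B, $80B) are assumptions picked to span “tens of billions”, and the sketch also assumes recovered compute offsets the same fraction of capex, which is a simplification rather than how Google actually accounts for it:

```python
# Hypothetical capex scenarios in USD; the comment only says "tens of billions".
CAPEX_SCENARIOS_USD = [20e9, 50e9, 80e9]
RECOVERED_FRACTION = 0.007  # 0.7% of compute resources recovered (from the quote)


def annual_savings(capex_usd: float, fraction: float = RECOVERED_FRACTION) -> float:
    """Rough savings, assuming recovered compute offsets the same fraction of capex."""
    return capex_usd * fraction


for capex in CAPEX_SCENARIOS_USD:
    print(f"capex ${capex / 1e9:.0f}B -> ~${annual_savings(capex) / 1e6:.0f}M per year")
```

At 0.7%, any capex above roughly $14.3B gives at least $100M, so “hundreds of millions” holds across most of the “tens of billions” range under this simplification.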