GitHub AI

Posted on 12 February 2026

In this post I want to document my personal experience with GitHub Copilot. Considering it’s one of the more well-funded, “high-profile” uses of the technology, it really concerns me when things go this badly. I think people like to hand-wave AI into the camp of “it just works”, “it’s magical” or “it’s a new frontier”. However, when you boil it down, it’s just software at the end of the day, and it breaks. It takes engineers to build, test and maintain.

What often catches me off guard with AI systems is that they seem to break in ways that I, as a software engineer, can’t straightforwardly explain. I don’t have access to the code that runs GitHub Copilot, but in my head I have an idea of how I would put it together with models and MCP servers.

This uncertainty around how these AI systems break, combined with what always comes across as a lack of proper testing, a lack of good software engineering practices and a general push to get AI to do things that normal software can already do, makes me quite concerned for the future, especially as we are starting to see these systems driving critical infrastructure and important services.

Anyway, on to my GitHub Copilot experience:

I contributed some PRs to the Zulip messenger project last year and mostly forgot about them until I landed on the GitHub home page, with the Copilot chat window staring back at me, and was reminded of them. Usually I would navigate to the project, click on PRs and filter by my user. This time, however, I gave the AI a go with the below prompt, which was my first ever prompt for GitHub Copilot.

The AI chewed on the question for a while before presenting me with the below response:

Asking AI about my PRs

What is impressive to me is that this answer is wrong on three different levels:

The completely wrong PR

I gave up and ended up searching the “old-fashioned way”, which returned the correct PR.

Manual search for PR

What really befuddles me here is that I assumed the AI would call a “search PR” tool via an MCP server and construct the same query I did in order to perform this operation, or perhaps use a RAG database to find the correct PR. In either case, I just don’t understand how it got this simple query so fundamentally wrong.
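For context, the manual search boils down to a handful of search qualifiers. Below is a minimal sketch in Python of how a “search PR” tool might build that query, assuming GitHub’s public REST search endpoint. The function name, the `zulip/zulip` repository and the `example-user` username are placeholders of mine, and none of this is a claim about how Copilot is actually wired up internally.

```python
from urllib.parse import urlencode

# Assumption: GitHub's public REST endpoint for searching issues and PRs.
SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def build_pr_search_url(repo: str, author: str) -> str:
    """Build a search URL for every PR opened by `author` in `repo`."""
    # These are the same qualifiers the search box on the website accepts.
    query = f"repo:{repo} is:pr author:{author}"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


# Hypothetical repository and username, for illustration only.
url = build_pr_search_url("zulip/zulip", "example-user")
print(url)
```

The point of the sketch is how little there is to get wrong: one repository qualifier, one type qualifier and one author qualifier.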

Anyway, I do not wish to labour the point. I just wanted to document my experience and be able to bring this example up with evidence in the future. I would be very interested in people’s thoughts on this behaviour, so feel free to drop me an email!


Side note: This blog post was published on a Swedish train on the way back from the IKEA museum!

🐾 Copyright (C) Tom Cope 2020 - 2026 | All Rights Reserved 🏳️‍🌈