Would be karma for all the unnecessary flights we have taken as a species.
In particular anyone who does 'mileage runs' and emits huge amounts of CO2 just so they have the 'privilege' to sit in a slightly nicer chair in a dull airport lounge.
>In particular anyone who does 'mileage runs' and emits huge amounts of CO2 just so they have the 'privilege' to sit in a slightly nicer chair in a dull airport lounge.
I doubt anyone is doing this? At best they're grinding out flights so they can get free first/business class seats later.
People do this to meet minimum requirements for mileage tiers, e.g. I know someone who was close to Diamond status on Delta and went to Miami and back without leaving the airport area just for the miles.
Look at the Flyertalk BA forum from 2005-2020. It was a huge thing, and not always for upgrades, because BA have been stingy with upgrades for a long, long time. Lounge access, baggage allowance, priority boarding, etc. were a huge part of it.
Popular mileage runs were London to Honolulu with lots of sectors along the way, iirc!
This explains why China's defense capabilities are outpacing the west in 2026. The defense behemoth who castrates users by denying them the all-powerful TrackPoint will be doomed to irrelevance very soon.
100% agreed. I wish someone would make a test for how reliably the LLMs follow tool use instructions etc. The pelicans are nice but not useful for me to judge how well a model will slot into a production stack.
At first, when I got started with LLMs, I read and analyzed benchmarks, looked at the example prompts people used, and so on. But many times a new model does best on the benchmark, you expect it to be better, and then in real work it completely drops the ball. Since then I've stopped even reading benchmarks; I don't care an iota about them, they always seem more misleading than helpful.
Today I have my own private benchmarks, with tests I run myself and private test cases I refuse to share publicly. I've built these up over the last year to year and a half: whenever I find something my current model struggles with, it becomes a new test case in the benchmark.
Nowadays it's as easy as `just bench $provider $model`: it runs my benchmarks against that model, and I get a score that actually reflects what I use the models for; it more or less matches my experience of actually using them. I recommend that people who use LLMs for serious work try the same approach and stop relying on public benchmarks, which (seemingly) are all gamed by now.
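The skeleton of such a private harness is simple. Here's a minimal sketch in Python; the names (`TestCase`, `run_benchmark`), the pass/fail scoring, and the stub model are my assumptions, not the commenter's actual setup, and a real version would call a provider's API instead of the stub:

```python
# Hypothetical sketch of a private LLM benchmark harness.
# A real run would replace stub_model with an API call to $provider/$model.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    prompt: str                    # task given to the model
    check: Callable[[str], bool]   # private pass/fail check on the output

def run_benchmark(model: Callable[[str], str], cases: list[TestCase]) -> float:
    """Run every private test case against a model; return the pass rate."""
    passed = sum(1 for c in cases if c.check(model(c.prompt)))
    return passed / len(cases)

# Example: a stub "model" and two trivial private cases.
cases = [
    TestCase("Return only the word OK", lambda out: out.strip() == "OK"),
    TestCase("Add 2 and 2, digits only", lambda out: out.strip() == "4"),
]
stub_model = lambda prompt: "OK" if "OK" in prompt else "4"
print(run_benchmark(stub_model, cases))  # 1.0
```

The `just bench` recipe would just wrap a call like this, passing the provider and model name through to whatever client library you use.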
Would you be willing to give a rough outline of one or a few test cases? I am having a bit of a hard time imagining what and how you are testing. Is it like "change the signature of function X in file @Y to take parameter Z" and then comparing the result with what you expect?
So why is Claude not cheaper than ChatGPT? Why won't they let me remove my payment info afterwards? Most other platforms like Steam let you do that. I don't want my shit sitting there waiting for the inevitable breach.
Everything is perception though. You are looking at this with your own perception, biases, and heuristics just like everyone else. There is no 'right' way to hire.
You’re right, but on the other hand, once you have a basic understanding of security, architecture, etc., you can prompt around these issues. You need a couple of years of experience, but that’s far less than the 10-15 years of experience you needed in the past.
If you spend a couple of years with an LLM really watching and understanding what it’s doing and learning from mistakes, then you can get up the ladder very quickly.
I find that security, architecture, etc is exactly the kind of skill that takes 10-15 years to hone. Every boot camp, training provider, educational foundation, etc has an incentive to find a shortcut and we're yet to see one.
A "basic" understanding in critical domains is extremely dangerous and an LLM will often give you a false sense of security that things are going fine while overlooking potential massive security issues.
Somewhere on an HN thread I saw someone claiming that they "solved" security problems in their vibe-coded app by adding a "security expert" agent to their workflow.
All I could think was, "good luck" and I certainly hope their app never processes anything important...
Found a problem? Slap another agent on top to fix it. It’s hilarious to see how the pendulum’s swung away from “thinking from first principles as a buzzword”. Just engineer, dammit…
But if you are not saving "privileged" information, who cares? I mean, think of all the WordPress sites out there. Surely vibecoding is not SO much worse than some plugin monstrosity... At the end of the day, if you are not saving user info or special sauce for your company, it's no issue. And I bet a huge portion of apps fall into this category...
> If you spend a couple of years with an LLM really watching and understanding what it’s doing and learning from mistakes, then you can get up the ladder very quickly.
I don't feel like most providers keep a model for more than 2 years. GPT-4o got deprecated in 1.5 years. Are we expecting coding models to stay stable for longer time horizons?