AI is creeping into the Linux kernel - and official policy is needed ASAP

Lee Duna@lemmy.nz · 8 months ago


ExLisper · 8 months ago

I guess the policy is that the code is reviewed. What does it matter whether it was AI-generated or not? If someone submits bullshit AI-generated code, he will be ignored in the future.

soc@programming.dev · 8 months ago

I would be deeply uncomfortable working in an environment where one couldn't ask the author of a change for insights or rationale, because the author let some machine write it and therefore lacks any deeper understanding.

8 months ago

For me it’s grounds to deny a merge request. Can’t explain your code? Then it’s evidently not clear enough. Come back when it is.

Senal@programming.dev · 8 months ago

Volume and Moderation.

Generating slop is significantly quicker.

You get an increase in the volume of people pushing slop, which then has to be reviewed. In addition to the increase in submissions, you also get an increase in the fidelity/general complexity of the submissions.

Reviewing a PR generated by LLMs used by amateurs is more involved than reviewing an equivalent PR written directly by said amateur.

Straight-up coding mistakes aren't most of the issue; it's the complex architectural and logical bugs that are going to be the problem.

Stuff that's functional but logically/architecturally unsound is much harder to spot, and it's significantly easier to generate these kinds of issues with an LLM than to write them out by hand.

If someone submits bullshit AI-generated code, he will be ignored in the future.

Like this, for example: a seemingly reasonable functional argument that is logically unsound, in that it focuses on a narrow "happy path" and ignores where the actual issues are.

1. To get to the stage where you can block this person, you need to review the code first and identify whether there is an issue.

Doing this for LLM-generated code takes longer, on average.

It's also now possible for less skilled people to generate a higher volume of code that looks more reasonable, which increases the total number of reviews needed.

So the existing process of reviewing people and code is now several times more difficult and resource-consuming.

Which is generally what people want addressed.

Can LLMs help? Possibly.

Are there issues that are going to become a large resource problem if we don't actually address them? Yes.

ExLisper · 8 months ago

Ok, so you're suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of the roadmap and are not rejected right away, and that those big, complex, architecture-level patches are submitted with high frequency. Somehow I doubt all of it.

I think the slop patches are small fixes suggested by some AI code analysis tools, that architectural and complex changes are part of a well-defined roadmap and don't come out of nowhere, and that code that doesn't follow conventions is easily spotted and rejected. The linked article talks only about marking the code as AI-generated (IMHO useless but harmless) and the increasing volume of AI slop patches. The idea that maintainers spend time analyzing complex LLM-generated code submitted by random amateurs, looking for possible architectural bugs, sounds like a fantasy to me.

Senal@programming.dev · 8 months ago

TL;DR:

You asked why it mattered whether it's LLM-generated or not, and I provided examples where it does matter. Nothing you've said in your reply seems to refute that, so I'll just assume we've agreed on this point.

The rest of this reply is just me replying to your additional arguments.

Ok, so you're suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of the roadmap and are not rejected right away, and that those big, complex, architecture-level patches are submitted with high frequency. Somehow I doubt all of it.

I mean, I didn't say any of that, but feel free to doubt a position you just made up.

I think the slop patches are small fixes suggested by some AI code analysis tools.

There’s no reason to believe that LLM usage is limited to small patches.

that architectural and complex changes are part of a well-defined roadmap and don't come out of nowhere, and that code that doesn't follow conventions is easily spotted and rejected.

In a well-maintained project, sure (ish), but let's just say you're right about the plan/roadmap phase.

The spotting and rejection you mentioned are now significantly more time- and resource-consuming, for the reasons I stated in my previous reply.

Also, when I used the word "architecturally" I was referring to the logical domain of the patch and the things it interacts with; I wasn't implying that LLMs would get a chance at re-architecting an entire project as large as the Linux kernel.

At least I'd hope not.

The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches.

I'm not sure of the usefulness of this kind of marking in practice, but I can tell you one way in which it might be useful.

The way you need to go about evaluating LLM-generated code versus human-written code can be different.

And before you get on your high horse, I'm not saying we shouldn't be doing a good job reviewing in general; of course we should.

Review and testing resources are limited in most practical settings, so we should be focusing on utilising those resources as efficiently as possible.

There are tools specifically geared towards checking LLM-generated code for specific mistakes; this marking would enable more efficient allocation of review resources, over and above the baseline code-quality tests.

The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me

Which is clear from your answers: if you don't understand how pull request review works in practice, you're going to struggle to make a coherent argument that requires that understanding.

To answer the statement directly, there's sometimes no efficient way to tell which patches are from amateurs, even without LLMs.

The issue isn't even limited to amateurs. I would like to assume a competent dev of any skill level wouldn't be submitting patches they don't understand, but that's just not always the case.

And again, think architecture with a 'little a' rather than a 'big A'.

Logical flow and domain understanding in a relatively limited scope, rather than system-wide structural change.

The difference between tactics and strategy.

ExLisper · 8 months ago

Are you a Linux kernel contributor?

Senal@programming.dev · 8 months ago

No.

You?

Edit: If any of my answers made it seem like I was, let me know and I'll adjust them; that was not my intention.

ExLisper · 8 months ago

No. Let’s wait for someone who knows what they are talking about.

Senal@programming.dev · 8 months ago

You mean like a software developer who has to deal with PRs from sources that may or may not include LLM-generated code?

If that's the case, I might know someone…

Wait… unless your original assertion was very specifically about only Linux kernel development, and not about the principles that apply to software PR review and LLMs as a whole?

In that case, I don't have anyone to hand, and you should probably mark it "Active Linux Kernel Contributors Only".

It’s clearer that way.

communism@lemmy.ml · 8 months ago

The issue is that it's easy for AI-generated code to be subtly wrong in ways that are not immediately obvious to a human. The Linux kernel is written in C, a language that lets you do nearly anything, and it is also inherently a privileged piece of software, which makes Linux bugs more serious to begin with.

The other problem is, of course, that you can block someone submitting AI slop, but there are a lot of people in the world. If there's a barrage of AI slop patches from lots of different people, it's going to be a real problem for the maintainers.

ExLisper · 8 months ago

The issue is that it’s easy for AI generated code to be subtly wrong in ways that are not immediately obvious to a human.

Same with human-written code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn't follow conventions, you reject it regardless of what generated it.

The other problem is, of course, you can block someone submitting AI slop but there’s a lot of people in the world. If there’s a barrage of AI slop patches from lots of different people it’s going to be a real problem for the maintainers.

You don't need an official policy to reject a barrage of AI slop patches. If you receive too many patches to process, you change the submission process. It doesn't matter whether the patches are AI slop or not.

Spamming maintainers is obviously bad, but saying that anything AI-generated in the kernel is a problem in itself is bullshit.

communism@lemmy.ml · 8 months ago

saying that anything AI generated in the kernel is a problem in itself is bullshit.

I never said that.

Same with human-written code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn't follow conventions, you reject it regardless of what generated it.

You may think that, but preliminary controlled studies do show that more security vulnerabilities appear in code written by programmers who used an AI assistant: https://dl.acm.org/doi/10.1145/3576915.3623157

More research is needed, of course, but I imagine that because humans are capable of more sophisticated reasoning than LLMs, the process of a human writing the code and deriving the implementation from a human mind is what leads to producing, on average, more robust code.

I'm not categorically opposed to the use of LLMs in the kernel, but it is obviously an area where caution needs to be exercised, given that it's a kernel millions of people use.

jaykrown@lemmy.world · 8 months ago

It's about the people. If the AI-generated code is subtly wrong, then it's on the community to test it and spot it. That's why it's important to have protocols and testing. The funny thing is, you can also use AI to highlight bad code.

CallMeAnAI@lemmy.world · 8 months ago

Slippery slope bullshit. Completely ignoring that humans do all this dumb shit.

tazeycrazy@feddit.uk · 8 months ago

I think this needs to be the policy for all generated content. You can't stand behind the AI, blaming it for its errors. It's like the CEO of a multinational bank blaming the intern for a market crash.