Cisco Security Leader Sergii Demianchuk on Why AI Slop in Code Is a Supply-Chain Problem, Not a Style Problem

June 27, 2026

A pull request that no one reasoned about is not a style issue. It is an unsigned package entering your build. For most of the last decade, the security industry has worked to answer a single question about the code it ships: where did this come from, and can we trust it? Software bills of materials, dependency attestation, and provenance tracking all exist to make that question answerable. Generative AI has quietly reopened it from a direction nobody planned for. The risk is no longer only the dependency you imported without checking. It is the function a teammate generated, never read, and merged because it looked plausible. The code compiles, the tests pass, and the chain of human judgment that security depends on has a hole in it that no scanner is looking for.

Sergii Demianchuk is a Senior Software Engineering Technical Leader at Cisco with 17 years of experience in network and application security, vulnerability management, secure software development lifecycles, and software bill-of-materials systems. He judged the AI Slop Scan Hackathon from a vantage point most detection work ignores: not whether AI-generated text reads badly, but whether code whose reasoning cannot be traced can ever be trusted in a secure system.

AI Slop Scan, organized by Hackathon Raptors, asked teams to build tools that detect, measure, or mitigate AI-generated low-quality content across code review, documentation, marketplace reviews, and general writing. Forty-three teams shipped. Most of the public conversation about AI slop treats it as an aesthetic complaint, the dull sheen of machine-written prose. Demianchuk reads the code-focused entries through a harsher lens because, in the systems he secures, unreviewed code is not ugly. It is a liability with no owner, and the absence of human reasoning behind it is a measurable loss of trust.

His career has been spent making trust auditable. Intrusion detection, vulnerability assessment frameworks, and secure SBOM work all share one premise: you cannot defend what you cannot trace. “Security has spent twenty years learning to ask ‘where did this come from and who vouched for it,’” he says. “An SBOM is just that question, formalized for dependencies. AI-generated code asks the same question in a place we never had to ask it before, which is the code your own team wrote, or rather, the code your own team accepted without writing. When a developer generates a function and merges it without understanding it, the provenance of that logic is gone. Nobody can attest to why it does what it does. That is not a quality problem. That is a trust boundary with a hole in it.”

Measuring the thing security actually cares about

The submission that aligned most closely with Demianchuk’s frame was Signal-OSS, a scanner for code review that measures the information density of pull requests, commit messages, and code comments rather than guessing at authorship. It does not ask whether an AI wrote a PR. It asks whether the PR carries any signal, the markers of a person who understood the change.

To a security architect, that is the more useful question by a wide margin. “Authorship is the wrong variable, and I am glad a team said so out loud,” Demianchuk says. “I do not care whether a human or a model typed the characters. A senior engineer using a model well produces trustworthy code, and a careless one produces garbage by hand. What I care about is whether there is evidence that someone reasoned about this change, because that reasoning is the thing I audit when something breaks. Signal-OSS measures the presence of that reasoning. In security terms, it is measuring whether a change is attestable. That is exactly the property an SBOM gives you for a dependency, applied to the review itself.”

He extends the point to where it bites in production. A code review is supposed to be a checkpoint where a human vouches for a change. When the review itself is hollow (an approving comment generated to look like diligence), the checkpoint is theater. “The dangerous version of AI slop in code is not the obviously bad PR,” he says. “It is the one with a confident, fluent description that restates the diff and explains nothing, approved by a review that is equally fluent and equally empty. Now you have a change in your codebase that two layers of process pretended to vet, and neither one did. A tool that flags hollow reviews is a tool that finds the exact place your audit trail is lying to you.”
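
The idea of a hollow review can be made concrete with a small sketch. This is not Signal-OSS's method, which the article does not detail. It is a minimal illustration under two stated assumptions: that a review engaging with a change will usually name something from the diff, and that asking a question counts as engagement. The function names and the regular expression are hypothetical.

```python
import re

# Assumption: a token "looks like code" if it is snake_case or camelCase.
# This is a hypothetical heuristic, not a rule taken from any real tool.
CODE_TOKEN = re.compile(
    r"\b[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+\b"  # snake_case
    r"|\b[a-z]+[A-Z][A-Za-z0-9]*\b"                # camelCase
)


def code_tokens(text: str) -> set[str]:
    """Return the set of identifier-like tokens found in the text."""
    return set(CODE_TOKEN.findall(text))


def review_is_hollow(review: str, diff: str) -> bool:
    """Flag a review that names nothing from the diff and asks nothing."""
    names_something = bool(code_tokens(review) & code_tokens(diff))
    asks_something = "?" in review
    return not (names_something or asks_something)


diff = "+def parse_token(raw):\n+    return raw.strip()\n"
empty_review = "LGTM, looks great, nice cleanup."
engaged_review = "parse_token strips whitespace but never rejects an empty string."

print(review_is_hollow(empty_review, diff))    # flagged as hollow
print(review_is_hollow(engaged_review, diff))  # not flagged
```

A heuristic this crude is easy to game, since a generated review can simply echo an identifier. It is meant only to show that "did the reviewer engage with the change" is a checkable property, separate from who or what wrote the review.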

The difference between describing a change and explaining it

That distinction, between a description that restates and one that explains, was the explicit thesis of another entry Demianchuk found pointed at the right target. Showreceipts, from team BharatShowreceipts-DELTA, introduced a metric for separating pull-request descriptions and documentation that merely report a change from those that justify it. Its framing was blunt: slop reports, humans think. A hollow description narrates the diff in prose. A real one explains why this approach, what was rejected, and what could break.

For Demianchuk, that “why” is not documentation hygiene. It is the security-relevant payload. “When I do incident response, the first thing I need is intent,” he says. “What was this change trying to do, and what did the author believe about the system when they made it. A description that just restates the diff gives me nothing, because I can already read the diff. The intent is what is missing, and intent is exactly what an AI-generated description omits, because the model never had any. So a metric that scores whether a description carries genuine reasoning is, whether the team framed it this way or not, scoring whether the change will be auditable after it ships. That is a security property hiding inside what looks like a writing-quality tool.”

He is precise about why this matters more for AI-generated code than for the human-written kind it imitates. A human who writes a thin description usually still holds the reasoning in their head and can reconstruct it. “When the description is AI-generated and the author never read the code, there is no head to go back to,” he says. “The reasoning does not exist anywhere. You cannot recover it in an incident, because it was never created. That is the new failure mode, and it is genuinely new. We are not talking about lazy documentation. We are talking about decisions that were never made by anyone, sitting in production, waiting.”
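
The gap between restating a diff and explaining it can be sketched in a few lines. The article does not describe how Showreceipts computes its metric, so what follows is an illustrative stand-in: it measures how much of a description's vocabulary already appears in the diff, and counts a hand-picked list of reasoning markers. The marker list and both function names are assumptions made for this example.

```python
import re

# Assumption: a hypothetical, hand-picked list of phrases that tend to
# accompany stated reasoning. A real metric would need far more than this.
RATIONALE_MARKERS = (
    "because", "instead of", "rather than", "so that",
    "trade-off", "risk", "rejected",
)


def words(text: str) -> list[str]:
    """Lowercased word and number tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def restatement_ratio(description: str, diff: str) -> float:
    """Fraction of description tokens that already appear in the diff."""
    desc = words(description)
    if not desc:
        return 0.0
    diff_vocab = set(words(diff))
    return sum(w in diff_vocab for w in desc) / len(desc)


def rationale_count(description: str) -> int:
    """Number of whole-phrase reasoning markers in the description."""
    lowered = description.lower()
    return sum(
        len(re.findall(rf"\b{re.escape(marker)}\b", lowered))
        for marker in RATIONALE_MARKERS
    )


diff = "-timeout = 30\n+timeout = 60\n"
restated = "Changed timeout from 30 to 60."
explained = (
    "Raised timeout to 60 because the upstream API takes up to 45s under load; "
    "rejected retries instead of a longer wait since they double traffic."
)

for text in (restated, explained):
    print(round(restatement_ratio(text, diff), 2), rationale_count(text))
```

On this pair, the restating description draws half its tokens from the diff and carries no reasoning markers, while the explaining one draws few and carries several. Keyword counts are a weak proxy for intent, and a fluent model can emit "because" without having a reason, so a score like this is a prompt for a human reviewer rather than a verdict.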

Provenance for claims, not just code

A third project pushed the provenance idea past code and into evidence. Papyrus, from team OneAbove, audits the reference layer of a document: it verifies that citations actually exist, classifies how they fail, and scores whether evidentiary claims align with their sources. It does not try to detect AI authorship. It checks whether the things a document asserts are backed by what it cites.

Demianchuk treats this as the same discipline applied one level up. “Fabricated citations are the textbook failure of language models, and they are a provenance failure exactly like a forged dependency signature,” he says. “A claim that cites a source that does not exist, or does not say what it is cited for, is an unverified input wearing the costume of a verified one. Papyrus is doing supply-chain verification on assertions instead of packages. In a security context that is not academic. The moment AI-generated reports start informing decisions, in compliance, in threat intelligence, in any place a document becomes an input to action, the question of whether its evidence is real becomes an attack surface. A tool that checks the reference layer is checking provenance, and provenance is the whole game.”
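
Treating the reference layer as a supply chain suggests a check shaped like dependency verification: does the cited source exist, and does the claim match it? The sketch below is not Papyrus's implementation. It assumes the sources are already available as an in-memory dictionary (a real tool would have to resolve them), and it uses plain word overlap as a stand-in for claim-to-source alignment. All names, labels, and the 0.5 threshold are hypothetical.

```python
import re
from dataclasses import dataclass

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are",
             "that", "by", "for", "on", "with"}


@dataclass
class Citation:
    claim: str      # the sentence making the assertion
    source_id: str  # the key the document cites


def content_words(text: str) -> set[str]:
    """Lowercased word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def audit(citation: Citation, sources: dict[str, str],
          threshold: float = 0.5) -> str:
    """Classify a citation as missing_source, unsupported, or supported."""
    source_text = sources.get(citation.source_id)
    if source_text is None:
        return "missing_source"      # the cited source cannot be found
    claim = content_words(citation.claim)
    if not claim:
        return "unsupported"         # nothing checkable in the claim
    overlap = len(claim & content_words(source_text)) / len(claim)
    return "supported" if overlap >= threshold else "unsupported"


sources = {"doc-1": "Tokens expire after 15 minutes and must be refreshed by the client."}

print(audit(Citation("Tokens expire after 15 minutes.", "doc-1"), sources))
print(audit(Citation("Passwords are stored in plaintext.", "doc-1"), sources))
print(audit(Citation("Sessions are encrypted at rest.", "doc-2"), sources))
```

Word overlap cannot tell "tokens expire" from "tokens never expire", so a serious alignment check would need something stronger than this. The part worth keeping is the separation of failure classes: a source that does not exist and a source that does not say what it is cited for are different failures, just as a missing package and a tampered one are.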

What a security architect would build next

Demianchuk’s reading of the field converges on a set of design principles he would hand to anyone building detection for AI-generated code, drawn from how the security industry already handles trust at scale.

Score attestability, not authorship. The defensible question is whether a change carries evidence that someone reasoned about it, the same property an SBOM provides for a dependency. Whether a human or a model produced the characters is unanswerable and beside the point.

Treat the hollow review as the real threat. An empty PR is visible. A confident, fluent review that vouches for nothing is the failure that survives process, because it is designed to look like diligence. Detection should target the checkpoint that is lying, not just the obvious junk.

Preserve intent, because intent is what incident response runs on. Tools that reward descriptions explaining why a change exists are protecting the one artifact that cannot be reconstructed after the fact when the author never reasoned about the code.

Verify the evidence layer as a supply chain. Citations, sources, and referenced facts are inputs, and unverified inputs are an attack surface. Checking that claims match their sources is provenance work, regardless of what the tool is called.

Assume the trust gap moves inward. The industry learned to distrust external dependencies. The new gap is internal: code a team accepted without authoring. Detection has to look at the code your own people merged, not only the code you imported.

The verdict, and the boundary that just moved

For Demianchuk, AI Slop Scan was a preview of a boundary the security field is going to have to redraw. For two decades the discipline assumed that code written inside the organization carried implicit human judgment, and that the untrusted material came from outside, in dependencies and inputs. Generative tooling broke that assumption quietly. Now the untrusted material can originate inside the team, in changes no human reasoned about, and the old perimeter does not see it.

“We built an entire security culture around the idea that our own engineers understood the code they committed,” he says. “That assumption is no longer safe, and almost no tooling has caught up to that. The slop problem, looked at from a security chair, is the leading edge of it. These teams were building detectors for low-effort content, but the useful ones were really building provenance tools, ways to tell whether the reasoning behind a change exists at all. That is going to matter far beyond hackathons. The first serious breach traced back to a confidently generated, never-reviewed function that everyone assumed someone understood is coming, and when it does, the question on the incident bridge will be the oldest one in security: who vouched for this, and how do we know. The teams that learn to answer that for AI-generated code are building the security tooling of the next decade, whether they know it yet or not.”

AI Slop Scan was organized by Hackathon Raptors, a Community Interest Company supporting innovation in software development. The event challenged 43 teams to build tools that detect, measure, or mitigate AI-generated low-quality content across code review, documentation, marketplace reviews, and general writing. Sergii Demianchuk, a Senior Software Engineering Technical Leader at Cisco with 17 years in network and application security, served as a judge for the event.