Will Your AI-Built Apps Actually Work? 4 Steps Enterprises Must Take

Dan Faulker is the CEO of SmartBear, helping teams build, test, and ship quality software.

AI is creating software code faster than ever, and it is increasingly testing that code, too. That's not a good scenario for the overall quality of software development. If AI is talking to AI and gets any piece wrong, who's to know? This is a classic black-box problem. And with reports of AI code generators pretending to pass tests when they haven't, it's an urgent issue for commercial software.

One of the key skills in testing is knowing what to focus on. An AI code generator that produces an unbounded number of tests is useless: the signal gets lost in the noise, and in practical terms, the test runs would never complete. So if development teams are pushed to hit production goals and deploy large volumes of AI-generated code that is increasingly tested by the same AI system that produced it, quality will suffer.

Testing Applications, Not Just Code

Testing code is only one part of ensuring application integrity: making sure an application actually works as intended. Equally important is the application-level testing that follows code-level testing, and that is falling behind as AI code assistants put code creation into hyperspeed. This gap exists because AI code-generation tools are not designed to perform commercial-grade application testing. Their strength lies in producing and validating code, not in rigorously testing how entire applications behave under real-world conditions.

Last Line Of Defense

As AI accelerates both the volume and speed of development, pressure on quality will rise, and so will application failures unless testing keeps pace. If AI code generation is writing code, reviewing it, generating and executing its own tests, and reporting the results, then application-level testing becomes the final line of defense to ensure systems actually work as intended at scale.

In this environment, how can enterprises be confident that applications reaching production will perform as expected? Here are four steps to consider, whether you're building or buying in an AI-driven development landscape.

1. Make quality non-negotiable. The more applications are developed as black boxes through a series of prompts, the more care and scrutiny must be applied to the finished product. QA needs to become a first-class function again.

In many industries, organizations are required to demonstrate their quality procedures through audits. While every vendor conducts testing, none can test everything, and most acknowledge this. The reality is that complete testing is impossible, so each vendor's approach reflects its own risk tolerance and level of QA maturity.

As a result, the responsibility increasingly shifts to the application user. Organizations must understand their own risk profile, risk tolerance and risk-benefit tradeoffs when relying on applications to run their businesses and serve their customers.

2. Test applications where your users are. Ensure applications perform reliably across browsers, operating systems and devices, and under realistic loads in real-world environments. That's where true application integrity either holds or breaks down.

The accelerating pace of AI-driven code and application development makes user-centric testing more critical than ever, as systems can change far more rapidly than before. Research from my company shows that 64% of software and quality assurance experts are concerned that applications aren't adequately tested across all deployment environments. Without sufficient visibility across development, staging and production, speed can quickly turn into unmanaged risk.

3. Increase testing. Find ways to increase application-level testing to keep pace with the velocity of AI code generation and to ensure adequate test coverage. One option is autonomous testing performed by a party outside the AI-driven code-creation part of development.

4. Don't forget the human in the loop. Autonomous testing should not become completely independent overnight. Certain levels of nuance and business insight belong solely to humans, and humans are better at evaluating what outcomes are needed and whether the application is delivering them. Automation plus human expertise results in stronger quality assurance processes.

Reaching Full Potential

AI can generate code, but it isn't sufficient on its own to test it. Pairing AI-driven development with stronger QA practices that ensure application integrity significantly increases the likelihood that applications will meet, or even exceed, expectations.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?









