Changing the definition of software testing in the AI era

As enterprises incorporate AI not only into their products and services but also into how they operate to bring these new, more powerful experiences to market, they will be expected to fundamentally transform their operational and execution procedures. The impact of AI on how an enterprise operates and generates business value will be profound, and in a software-driven world, the act of building software, as well as the software itself, will be driven by AI.

One of the most interesting impacts of AI on software development is the change in what it means to test software and certify that it is ready and acceptable for use by customers and users. AI-driven software differs from traditional software because its quality criteria call for very different testing metrics: the product or service must continue to deliver high quality in the real world, and if quality dips, a recovery process must be launched and the issue resolved. This is easier said than done and requires a thoughtful change in how software is tested, quality assured, verified and maintained.

Evolving the 'test discipline'

The purpose and motivation of the test discipline change fundamentally. Previously, the gate where a piece of software was certified as ready and done was considered the final quality check. In the AI era, this gate becomes simply one milestone, followed by successive milestones that track and measure the quality of the AI-enabled software while it is being used by real users. The following are five key differences in how software testing needs to be done in the AI era.

Functional testing to meet business criteria

Testing for AI-driven software needs to ensure that the software continues to respect and adhere to business logic, business constraints and business-relevant KPIs. Akin to floor and ceiling functions, functional testing of AI-driven software should ensure that the software never exceeds or violates the key business principles required to protect the enterprise, its employees, its customers and its users.
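
The floor-and-ceiling idea can be sketched as a small functional test. This is only an illustration: the discount model, the 0 to 30 percent limits and every name below are hypothetical and stand in for whatever constraints a real business would define.

```python
import random

# Hypothetical business constraint: a discount may never fall below 0%
# or exceed 30%, whatever the model suggests.
MIN_DISCOUNT = 0.0
MAX_DISCOUNT = 0.30


def model_discount(basket_value: float) -> float:
    """Stand-in for an AI model; it can return out-of-range values."""
    return 0.05 + basket_value / 1000.0


def guarded_discount(basket_value: float) -> float:
    """Clamp the model output to the business floor and ceiling."""
    raw = model_discount(basket_value)
    return max(MIN_DISCOUNT, min(MAX_DISCOUNT, raw))


def check_business_bounds(trials: int = 1000, seed: int = 0) -> bool:
    """Functional test: no sampled input may yield a discount outside the bounds."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = rng.uniform(-500.0, 5000.0)
        discount = guarded_discount(value)
        if not MIN_DISCOUNT <= discount <= MAX_DISCOUNT:
            return False
    return True
```

The point of the design is that the test asserts the business rule on the guarded output, so it keeps passing or failing for the right reason even when the underlying model is retrained or replaced.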

Training bias and assumption validation

One of the most significant risks that AI introduces into software development comes from biases and assumptions in the training data set. Such biases and assumptions can have a dangerous and unexpected impact on the behavior of the software, putting user, customer and employee satisfaction in jeopardy and potentially hurting the enterprise financially, through damage to its brand reputation, or both. The testing discipline has to evolve to look out for such biases and assumptions in the training data set and alert the product development team, protecting the enterprise from potentially severe ramifications.
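
One simple signal a test team could look for is a large gap in the rate of positive labels between groups in the training data. The sketch below assumes hypothetical field names ("group", "approved") and an arbitrary 0.2 gap threshold; it is one possible check, not a complete fairness audit.

```python
from collections import defaultdict


def positive_rate_by_group(rows, group_key, label_key):
    """Return the share of positive labels for each group in the data set."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[label_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def flag_imbalance(rates, max_gap=0.2):
    """Flag the data set when group rates differ by more than max_gap (assumed threshold)."""
    return max(rates.values()) - min(rates.values()) > max_gap


# Hypothetical training rows, invented purely for illustration.
TRAINING_ROWS = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]
```

Calling positive_rate_by_group(TRAINING_ROWS, "group", "approved") and passing the result to flag_imbalance would raise a flag for this sample, which is the kind of alert the product development team would then investigate.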

AI experimentation during product development

During the product development phase, the goal of the testing discipline is to enable fast integration of, and iteration over, multiple AI models driven by different AI technologies, algorithms and configurations. The testing discipline should not only measure the accuracy and quality of the various AI models but also ensure that anomalous or unexpected behavior is reported and fed back to the data science team. In addition, it needs to develop the maturity to run multiple experiments in parallel, trying out various models and using their observed behavior to select the best performing ones.
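
As a rough sketch of running such experiments side by side, the example below scores several candidate models on the same holdout set in parallel and picks the most accurate one. The candidates are trivial threshold rules and the holdout data is invented; a real setup would plug in trained models and far richer metrics.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labeled holdout set of (feature, expected_label) pairs.
HOLDOUT = [(1, 0), (4, 0), (6, 1), (9, 1), (5, 1), (3, 0)]

# Hypothetical candidate models, each a callable from feature to label.
CANDIDATES = {
    "threshold_5": lambda x: int(x >= 5),
    "threshold_7": lambda x: int(x >= 7),
    "always_one": lambda x: 1,
}


def accuracy(model, data):
    """Fraction of holdout examples the model labels correctly."""
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def run_experiments(candidates, data):
    """Score every candidate on the same data, one task per candidate."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(accuracy, model, data)
            for name, model in candidates.items()
        }
        return {name: future.result() for name, future in futures.items()}


def best_model(scores):
    """Name of the candidate with the highest score."""
    return max(scores, key=scores.get)
```

Because every candidate is scored against the same data with the same metric, the comparison stays fair, and the full score table can be passed back to the data science team rather than only the winner.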

AI bugs

The testing discipline needs to integrate the finding and reporting of AI bugs into the software development process. This can be done through specific test cases that probe boundary and edge conditions, complemented by random or previously unseen data that tests the quality of the software leveraging AI. In addition to the quality of the AI model embedded in the software, the impact of the combination of the AI model and business logic on the expected behavior needs to be measured and understood. Given that the business logic and the AI model are implemented by separate teams, it is critical that the integration be tested and the behavior of the software clearly documented.
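
A minimal sketch of such a bug hunt follows, mixing hand-picked boundary inputs with random ones and recording every failure as a reportable item. The risk model, the approval rule and all thresholds are hypothetical, and the model is deliberately fragile for negative incomes so that the harness has something to find.

```python
import random


def risk_model(income: float) -> float:
    """Stand-in AI model: a risk score that is meant to lie in [0, 1]."""
    return 1.0 / (1.0 + income / 50_000.0)


def approve(income: float, amount: float) -> bool:
    """Hypothetical business logic layered on top of the model output."""
    return risk_model(income) < 0.5 and amount <= 10_000


def find_ai_bugs(random_trials: int = 200, seed: int = 0):
    """Probe boundary and random inputs; return (income, description) pairs."""
    rng = random.Random(seed)
    boundary = [0.0, 50_000.0, -50_000.0, -100_000.0, 1e12]
    randoms = [rng.uniform(0.0, 500_000.0) for _ in range(random_trials)]
    bugs = []
    for income in boundary + randoms:
        try:
            score = risk_model(income)
        except ZeroDivisionError as exc:
            bugs.append((income, f"crash: {exc!r}"))
            continue
        if not 0.0 <= score <= 1.0:
            # Also record what the integrated business logic decided.
            approved = approve(income, 1_000.0)
            bugs.append((income, f"score {score} out of range, approved={approved}"))
    return bugs
```

In this toy setup the random inputs pass, while two boundary inputs fail: one crashes the model, and the other yields an invalid score that the business logic nevertheless approves. The second is the kind of integration defect that neither team would catch by testing its own component alone.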

Post-release monitoring

AI-driven software introduces a new paradigm of quality measurement, because the behavior of AI-enabled software can move into unexpected territory very quickly. This can happen as the software encounters and learns from new data, or when a market or user shift means that the type of users and the environment of usage no longer match the assumptions made when the embedded AI model was trained. The test discipline needs to ensure that new biases do not creep into the system and that existing biases do not grow to the point of derailing the software. In addition, the enterprise needs to watch out for unexpected behaviors that can alienate and annoy users, triggering mass abandonment of the enterprise's products and services.
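
One way to watch for the kind of shift described above is to compare the distribution of a feature in live traffic against the training data. The sketch below uses the population stability index (PSI) as an example technique; the article does not prescribe any particular method, and the bin count and any alert threshold are assumptions to be tuned per product.

```python
import math


def population_stability_index(expected, actual, bins=4, eps=1e-6):
    """PSI between training-time values (expected) and live values (actual)."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected values must not all be identical")
    width = (hi - lo) / bins

    def shares(values):
        # Bucket values into equal-width bins defined by the training range;
        # values outside that range fall into the first or last bin.
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: training data versus shifted live traffic.
TRAINING_VALUES = list(range(100))
LIVE_VALUES = [v + 40 for v in TRAINING_VALUES]
```

A PSI near zero means live usage still looks like the training data, while a large value, as LIVE_VALUES would produce here, is a cue to re-examine the model before users notice degraded behavior.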

Impact on traditional test metrics and test processes

As enterprises and their product development teams get used to building and delivering AI-enabled software products, they need to understand the impact of AI integration on traditional test metrics. Test metrics are already evolving with the move toward continuous testing, as evidenced by startups such as SeaLights, which takes a more holistic view of metrics by looking at tests across all testing levels.

With AI, the way metrics are measured and computed will need to be adapted to account for the new and more complex testing process required to deliver and continuously maintain the quality of AI-enabled software. It is critical that engineering and data science work hand in hand from the same set of test metrics, so that both share the same ground truth and can align and organize to deliver the best AI-enabled software.

This article is published as part of the IDG Contributor Network.
