We are in the early days of generative AI adoption, but senior executives already have high hopes for the technology: 79% of CEOs think Gen AI will help them increase efficiencies, and 52% see it leading to greater opportunities for growth, according to a recent Deloitte survey. Moreover, a McKinsey Global Survey found that 40% of executives expect their organizations to increase their investment in artificial intelligence because of Gen AI advances.
Beyond the hype, the risks associated with generative AI are well documented: inconsistent quality, "hallucinations," biased or inappropriate outputs, copyright infringement, and security lapses. All these uncertainties are understandably giving organizations pause as they weigh Gen AI opportunities.
Governance and regulatory issues aside, executives weighing Gen AI initiatives also need to be aware of the specific operational challenges that make generative AI particularly difficult for many, if not most, organizations to deploy and manage effectively. To that end, here are a couple of practical operational considerations to keep in mind as you look at ways to incorporate generative AI into your business.
Version Control
Organizations have different options for how they leverage generative AI, including building and running Gen AI models in-house or consuming third-party models through an API call. In the latter case, where you're making an API call to an external provider for a generative AI model, you'll need to track the model's evolution from version to version. OpenAI, for example, provides multiple versions of GPT. If you build on GPT, then every time a new version is released, you'll have to decide whether to call the new version, say, GPT-4, or stay with a prior one.
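Pinning an explicit model version in your API calls makes that decision deliberate rather than accidental. Here's a minimal sketch using OpenAI's Python SDK; the snapshot name and prompt are illustrative:

```python
# pip install openai  (v1.x SDK)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A bare alias like "gpt-4" can silently resolve to newer weights over time;
# a dated snapshot such as "gpt-4-0613" stays fixed until you choose to move.
PINNED_MODEL = "gpt-4-0613"  # illustrative snapshot name

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Summarize our return policy."}],
)
print(response.choices[0].message.content)
```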
You also need to update your API calls when an external provider stops supporting a version. If a vendor gives you a six-month grace period to migrate, you need to switch to the new version in a timely manner or your API calls will start failing. Think about what that could mean for an application like a customer service chatbot that breaks without warning because you neglected to make the necessary updates.
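One defensive pattern is to catch the failure when a retired version disappears and fail over to a supported one while alerting your team. A rough sketch, again with OpenAI's Python SDK; `alert_ops` is a hypothetical placeholder for your own alerting hook:

```python
from openai import OpenAI, NotFoundError

client = OpenAI()

PRIMARY_MODEL = "gpt-4-0613"  # pinned snapshot (illustrative)
FALLBACK_MODEL = "gpt-4"      # floating alias kept as a safety net

def alert_ops(message: str) -> None:
    # Hypothetical hook: wire this to your real paging/alerting system.
    print(f"[ALERT] {message}")

def chat(prompt: str) -> str:
    """Call the pinned model; fail over (and alert) if it has been retired."""
    try:
        resp = client.chat.completions.create(
            model=PRIMARY_MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
    except NotFoundError:
        # The pinned version is gone: alert the team, but keep serving
        # traffic on the fallback rather than failing outright.
        alert_ops(f"{PRIMARY_MODEL} unavailable; falling back to {FALLBACK_MODEL}")
        resp = client.chat.completions.create(
            model=FALLBACK_MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
    return resp.choices[0].message.content
```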
We have seen these kinds of challenges play out with core technologies like Kubernetes, which updates approximately every three or four months. Many enterprises struggle to keep up with that pace of updates, even with support for each new release extending over a year. Consequently, lots of organizations wind up running Kubernetes versions that are not supported, creating risks and vulnerabilities.
Similar issues arise even if you develop your generative AI model in-house, or if you take a hybrid approach of using an external model and fine-tuning it on your own data. You still need to manage versions, because every time a new version of the model is released, even if it's your own model, you'll need to repeat the fine-tuning process and rerun all the verifications around its output.
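In practice, that verification step can be automated as a regression gate: replay a fixed evaluation set against the candidate version and only promote it if the outputs still meet your acceptance checks. A simplified sketch, with an illustrative evaluation set and model names:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative evaluation set: (prompt, substring an acceptable answer must contain)
EVAL_SET = [
    ("What is our refund window?", "30 days"),
    ("Do we ship internationally?", "yes"),
]

def passes_regression(candidate_model: str) -> bool:
    """Replay the evaluation set against a candidate version; True if all pass."""
    failures = 0
    for prompt, must_contain in EVAL_SET:
        resp = client.chat.completions.create(
            model=candidate_model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce randomness so runs are comparable
        )
        answer = resp.choices[0].message.content
        if must_contain.lower() not in answer.lower():
            failures += 1
            print(f"FAIL: {prompt!r} -> {answer!r}")
    return failures == 0

# Gate the rollout: only route production traffic to a version that passes.
if passes_regression("gpt-4-1106-preview"):  # illustrative version name
    print("Candidate version passed; safe to promote.")
```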
Infrastructure Requirements
Of course, if you're running generative AI models in-house, you need to be aware of the hardware requirements. These models tend to be very big – bigger than most organizations are used to dealing with – and LLMs require substantial computational resources, both in terms of processing power and memory.
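A back-of-the-envelope calculation shows why: the weights alone take roughly the parameter count times the bytes per parameter, before accounting for activations, key-value caches, and serving overhead. A quick sketch in Python:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the model weights alone (fp16 = 2 bytes per parameter).
    Ignores activations, KV cache, and serving overhead, which add more."""
    return params_billions * bytes_per_param  # billions of params * bytes = GB

for size in (7, 13, 70):
    print(f"{size}B params @ fp16: ~{weight_memory_gb(size):.0f} GB")
# 7B  -> ~14 GB  (fits on a single 24 GB GPU)
# 13B -> ~26 GB  (beyond most single consumer GPUs)
# 70B -> ~140 GB (needs multiple high-end accelerators)
```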
You're most likely going to need to run your Gen AI models on high-end GPUs for speed. Putting aside cost and a potential repeat of the crypto-boom GPU shortage, these high-powered processors come with their own set of challenges: driver management, code version compatibility, and performance tuning to keep operations running smoothly, on top of their high energy requirements.
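Before putting a model into service, it's worth sanity-checking that the driver, CUDA build, and framework versions actually line up. A quick check using PyTorch, assuming that's your serving framework:

```python
# Sanity-check the GPU stack: mismatches between driver, CUDA build, and
# framework versions are a common source of hard-to-debug serving failures.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Built with CUDA:", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")
```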
Finally, aside from the normal 24/7 availability requirements for customer-facing applications, you also need to ensure that your serving infrastructure can support the kind of real-time response that, say, a Gen AI-powered chatbot requires. LLMs often have inherent latency due to their complex architecture and resource-intensive computations, and minimizing latency to provide real-time or near-real-time responses can be a significant challenge.
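One common way to improve perceived responsiveness is to stream tokens to the user as they're generated rather than waiting for the full completion, and to track time-to-first-token as a latency metric. A sketch using OpenAI's streaming API; the model name and prompt are illustrative:

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=[{"role": "user", "content": "Explain our shipping options."}],
    stream=True,  # yield chunks as tokens are generated
)

first_token_at = None
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        print(delta, end="", flush=True)

if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.2f}s")
```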
In that light, if your experience with machine learning to date has been primarily focused on supporting batch operations, you'll need to consider whether your current infrastructure and tooling will reliably support real-time use cases.
As generative AI continues to evolve and establish a foothold across industries, it's essential to understand and address the operational challenges inherent in the technology. Gen AI offers tremendous potential for innovation and new value, but only if organizations can address the practical requirements for realizing that potential.
Join Verta CEO Manasi Vartak for a discussion of the emerging best practices for governing the Large Language Models (LLMs) that underpin Gen AI tools like ChatGPT. Register to save your seat for the live event or view the on-demand recording.