Artificial Intelligence and Machine Learning are getting embedded in the fabric of every product, be it our daily news feed, AI-assisted therapy, or automated investing. As we directly or indirectly delegate more decision-making to AI & ML models, the question arises:
Are the software systems supporting AI & ML models as robust and reliable as they need to be?
In this post, we talk about an often-overlooked but critical issue in AI & ML, one without which we cannot begin to trust models to run our products and businesses. We talk about model versioning.
Building an ML model seems pretty straightforward from the outside: you get some data, you write code, and then you debug until it works, right? Not quite.
Whereas in traditional software development, the goal is to correctly and efficiently implement a known function, in AI & ML, the goal is to learn a function. As a result, ML model development is all about experimentation. A data scientist will process data, create features, train and test the model, and depending on the results, go back and update the data or model. Rinse-and-repeat.
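To make that loop concrete, here is a minimal sketch of the rinse-and-repeat cycle using scikit-learn; the dataset, model choice, and parameter grid are placeholders for whatever a real project would use, not a prescription.

```python
# A minimal sketch of the experiment loop: tweak hyperparameters, train,
# evaluate, repeat. Dataset and parameter grid are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

results = []
for n_estimators in (50, 100, 200):      # experiment knob #1
    for max_depth in (4, 8, None):       # experiment knob #2
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0
        )
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        # Without a versioning system, this record lives only in memory
        # (or a notebook cell) and is easily lost.
        results.append(
            {"n_estimators": n_estimators, "max_depth": max_depth, "accuracy": acc}
        )

print(max(results, key=lambda r: r["accuracy"]))
```

Every pass through this loop produces a model worth remembering, which is exactly what makes the bookkeeping hard.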
To quantify how many models are built during a typical modeling project, my group analyzed data from Kaggle competitions during my Ph.D. The table below shows the number of submissions made by winning teams across ten different Kaggle competitions.
We found that winning teams routinely build hundreds of models before arriving at the winning one. Moreover, since teams only submit their "best" models, we estimate that the total number of models built by a team is an order of magnitude larger.
What is surprising is that, in spite of the iterative nature of modeling, Kaggle teams don't always think ahead to versioning. They track their models reactively and selectively, often losing the ability to examine how past experiments performed and to let those results inform future ones.
Without robust tools to version models, data scientists and ML engineers have resorted to many creative, but inadequate, workarounds. Here are a few that might seem familiar.
Many data scientists (myself included) have at some point used folders to track different versions of our models. Several blog posts build on this method and recommend fixed folder structures for tracking datasets, models, outputs, and so on. While this approach might work for an individual researcher, it breaks down when data is stored only on a local machine, when different teams prefer different organizational schemes, or when programmatic access to this data is necessary (e.g., "find me the model with the highest accuracy for this project").
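To illustrate why programmatic access is where the folder scheme strains, here is a sketch of what "find the best model" ends up looking like; the `experiments/<run>/metrics.json` layout is a hypothetical convention I'm assuming, not a standard anyone agrees on.

```python
# Sketch: answering "which model has the highest accuracy?" when each
# experiment lives in its own folder. Assumes a hypothetical layout
#   experiments/<run_name>/metrics.json  -- a convention, not a standard.
import json
from pathlib import Path

best_run, best_acc = None, float("-inf")
for metrics_file in Path("experiments").glob("*/metrics.json"):
    with open(metrics_file) as f:
        metrics = json.load(f)
    # Breaks as soon as a teammate names the key "acc" or "val_accuracy",
    # or keeps their runs on a different machine entirely.
    acc = metrics.get("accuracy", float("-inf"))
    if acc > best_acc:
        best_run, best_acc = metrics_file.parent.name, acc

print(best_run, best_acc)
```

The query works only as long as everyone on the team follows the same layout and naming, forever.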
It's true that spreadsheets are very powerful, and being able to throw together a quick pivot table is magical. But the heterogeneity of ML experiments cannot be captured in a spreadsheet: the code changes dozens of times, the number of hyperparameters varies with almost every run, uniquely naming architectures is near-impossible, and, most importantly, each data scientist needs to remember to update the spreadsheet after every experiment. That never happens; just look at team wikis.
The screenshot above is from an actual Kaggle kernel, and it left me speechless. Data scientists deserve better than having to track versions as code comments; a software engineer asked to work this way would likely quit.
"Can't I just use Git for this?" I get this question a lot! I ❤ git and I can't imagine writing code without it. But code versioning, fundamentally, solves only part of the model versioning problem. This is because:
Model = Code + Dataset + Config + Environment
Git is fantastic for the first piece, not the rest. You must version code, but you can’t only version code. Figure out a way to version your data. Figure out a way to version your hyperparameters and other config parameters. Figure out environment versioning. Now, you’re getting somewhere.
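As a sketch of what "more than code" means in practice, here is one way to snapshot the other three pieces alongside a Git commit; the file paths, content-hashing approach, and `model_snapshot.json` output are illustrative assumptions, not a prescribed scheme.

```python
# Sketch: capturing Dataset + Config + Environment alongside the Git commit
# hash so an experiment can be reproduced later. Paths and filenames are
# illustrative assumptions.
import hashlib
import json
import subprocess
import sys

def sha256_of(path):
    """Content hash of a data file -- a cheap stand-in for data versioning."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

config = {"learning_rate": 0.01, "n_estimators": 200}  # hypothetical config

snapshot = {
    "code": subprocess.check_output(                     # code version
        ["git", "rev-parse", "HEAD"], text=True).strip(),
    "dataset": sha256_of("data/train.csv"),              # data version
    "config": config,                                     # config version
    "environment": {                                      # environment version
        "python": sys.version,
        "packages": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"], text=True).splitlines(),
    },
}

with open("model_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```

Doable by hand, but stitching these pieces together, keeping them consistent, and making them queryable is exactly the job a purpose-built system should take off your plate.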
As models become the new code and run large parts of our products and businesses, we require much more rigorous and robust tooling for models. And the first order of business is to get a model versioning system in place. Not just a patched-up system, but one that is purpose-built for models — one that is designed with the empirical nature of model development in mind and is equipped to handle the interplay between models, data, config, and environment.
At MIT, we built ModelDB, one of the first purpose-built, open-source model management systems. ModelDB takes a model-centric view of the world and provides the ability to track models, hyperparameters, metrics, tags, and pipelines — any metadata you might need about a model. In addition, it provides GUI and API-based access to all model information. Since its release, ModelDB has been used by research groups, AI-first companies, and financial institutions.
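To give a flavor of what API-based tracking looks like, here is a minimal sketch using the ModelDB/Verta Python client; the host URL, project names, and values are placeholders, and exact method names may differ between ModelDB releases and client versions.

```python
# Minimal sketch of logging a model's metadata with the ModelDB/Verta
# Python client. Host, names, and values are placeholders; method names
# may vary across client versions.
from verta import Client

client = Client("http://localhost:3000")            # your ModelDB/Verta host
proj = client.set_project("Churn Prediction")
expt = client.set_experiment("Random Forest Sweep")

run = client.set_experiment_run("rf-200-trees")
run.log_hyperparameters({"n_estimators": 200, "max_depth": 8})
run.log_metric("accuracy", 0.91)
run.log_tag("baseline")
# Later, runs can be queried by metric or hyperparameter via the API,
# or browsed in the GUI.
```

The point is less the specific calls and more that every experiment leaves a queryable record without the data scientist maintaining folders or spreadsheets by hand.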
At Verta, we have extended ModelDB to provide full model reproducibility, including data, code, config, and environment versioning. Similar to ModelDB, models are first-class citizens in the Verta platform and the system is built expressly to support the full model lifecycle.
Between ModelDB and Verta, we have logged hundreds of thousands of models and counting! Data scientists using model versioning are seeing tangible productivity improvements: better visibility into their experiments, never losing work again, and easier collaboration.
Just like Git and GitHub/GitLab/BitBucket fundamentally changed the way software is developed, so will model versioning systems change the way models are developed.
About Manasi:
Manasi Vartak is the founder and CEO of Verta.ai, an MIT-spinoff building software to enable production machine learning. Verta grew out of Manasi’s Ph.D. work at MIT CSAIL on ModelDB. Manasi previously worked on deep learning as part of the feed-ranking team at Twitter and dynamic ad-targeting at Google. Manasi is passionate about building intuitive data tools like ModelDB and SeeDB, helping companies become AI-first, and figuring out how data scientists and the organizations they support can be more effective. She got her undergraduate degrees in computer science and mathematics from WPI.
About Verta:
Verta builds software for the full ML model lifecycle starting with model versioning, to model deployment and monitoring, all tied together with collaboration capabilities so your AI & ML teams can move fast without breaking things. We are a spin-out of MIT CSAIL where we built ModelDB, one of the first open-source model management systems.
Thanks to Natalie Bartlett, Ravi Shetye, and Shantanu Joshi.