At the inaugural TWIMLCon in San Francisco, I led an unconference session on a topic that is on the minds of many companies: whether to build ML infrastructure in-house, buy it, or rely on open source.
The session turned out to be very popular, with lots of lively discussion. Here are the notes from the session (anonymized for privacy):
Most teams that find themselves in need of ML Infrastructure start by looking at reference implementations of ML Platforms from large tech companies like Uber (Michelangelo) and Airbnb (Bighead). These platforms provide a good rubric of what a typical ML platform might require. However, since none of these implementations are open-source, every team must decide whether to build, buy, or piece together open-source components for its infrastructure.
The general consensus among attendees was that current open-source offerings for ML Infrastructure, though progressing rapidly, are not mature enough for most teams (unlike, say, MySQL for databases or Airflow for data pipelines). They either require an immense amount of setup and hacking or offer inadequate functionality. In addition, someone on the team must become an expert in the platform and keep up with its frequent changes.
The consensus was that building in-house makes sense if your application has a very specialized use case (e.g., prediction latency must be <20 ms, or your models must work with a legacy model serving system). In this case, a team might spend a long time customizing an off-the-shelf offering, making the Buy decision less worthwhile. In addition, if your setup requires a lot of customization, it is worth building the required competency in-house instead of depending on a third party.
We took an informal poll on how long it took a team to build their ML Platform in-house. The answers spanned quite a spectrum:
Depending on the complexity of the platform, building in-house can take anywhere from a couple of solid ML Engineers working over a few quarters (small shop) to a dozen engineers and two years (for a serious deployment).
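To make the spread in that poll concrete, the two ends of the range can be converted into person-months. This is only a rough sketch: it assumes "a couple" means 2 engineers and "a few quarters" means 3 quarters, neither of which was stated precisely in the session, and the `BuildScenario` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass

MONTHS_PER_QUARTER = 3


@dataclass(frozen=True)
class BuildScenario:
    """Hypothetical helper: one end of the build-effort range."""
    name: str
    engineers: int
    months: int

    def person_months(self) -> int:
        # Total effort = headcount x duration
        return self.engineers * self.months


# ASSUMPTION: "a couple of engineers over a few quarters" read as 2 engineers x 3 quarters
small_shop = BuildScenario("small shop", engineers=2, months=3 * MONTHS_PER_QUARTER)
# From the notes: a dozen engineers and two years
serious = BuildScenario("serious deployment", engineers=12, months=24)

for scenario in (small_shop, serious):
    print(f"{scenario.name}: {scenario.person_months()} person-months")
```

Under these assumptions, the larger effort is roughly an order of magnitude more than the smaller one, which is worth keeping in mind when weighing the opportunity cost of building.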
An attendee noted that sometimes it may be unclear what a team requires from their ML Infrastructure. In that case, trying to build a prototype in-house can help identify the gaps and loopholes to better inform the Build vs. Buy decision.
The strongest reason to buy an ML Platform vs. building in-house is that building in-house represents an opportunity cost.
The time that your team spends building ML Infrastructure is time not spent on something else, e.g., product features or better instrumentation.
— TWIMLCon Attendee
If building infrastructure is not going to help you differentiate your business (that is, it is a low-leverage activity), don’t build in-house. Particularly if you’re on a small team, building infrastructure is not the best use of your resources; they could be better spent on product development and servicing customers.
When asked to list the justifications to upper management for buying an ML Platform, attendees listed the following:
First, know what you are looking for. This was probably the point most highlighted by multiple attendees. For instance, have answers to the following questions:
Once you know what you want:
As you might imagine, this process can take several weeks, so plan accordingly.
The biggest, and rather surprising, piece of advice was: push your customers on their needs first. Often, the customer is still trying to understand the landscape of tools out there and see what might match their needs.
So helping the customer identify and articulate their needs is extremely valuable.
Offer to partner with customers on trials to identify where your platform might have gaps with respect to their use case. You’d rather know this early on than once a trial is done. Communicate clearly what integrations you offer now and are willing to offer in the future.
Articulate accurately what you do and do not do — if a platform does everything, it does nothing.
— Platform Buyer
Vendors are over-promising and under-delivering. If you can do the opposite, you will stand out!
Thank you to TWIMLCon for hosting this Unconference session and to everyone who attended. If I missed anything or if you have any questions, please reach out at manasi@verta.ai.
Unrelated to TWIMLCon, my company works with many teams who have faced the build-vs-buy-vs-open-source dilemma and we have created a cheat sheet of tradeoffs involved in ML Infrastructure.
Submit your email to check it out and feel free to reach out for questions.
About Manasi:
Manasi Vartak is the founder and CEO of Verta, an MIT-spinoff building software to enable production machine learning. Verta grew out of Manasi’s Ph.D. work at MIT CSAIL on ModelDB. Manasi previously worked on deep learning as part of the feed-ranking team at Twitter and dynamic ad-targeting at Google. Manasi is passionate about building intuitive data tools like ModelDB and SeeDB, helping companies become AI-first, and figuring out how data scientists and the organizations they support can be more effective. She got her undergraduate degrees in computer science and mathematics from WPI.
About Verta:
Verta builds software for the full ML model lifecycle, from model versioning to model deployment and monitoring, all tied together with collaboration capabilities so your AI & ML teams can move fast without breaking things. We are a spin-out of MIT CSAIL, where we built ModelDB, one of the first open-source model management systems.