Advances in machine learning (ML) and AI are emerging on a near-daily basis—meaning that industry, academia, government, and society writ large are evolving their understanding of the associated risks and capabilities in real time. As enterprises seek to capitalize on the potential of AI, it’s critical that they develop, maintain, and advance state-of-the-art ML practices and processes that will offer both strong governance and the flexibility to change as the demands of technology requirements, capabilities, and business imperatives change.
That’s why it’s essential to have strong ML operations (MLOps) tooling, practices, and teams: the people who build and deploy the software development practices that keep ML models running effectively and with agility. Capital One’s core ML engineering teams demonstrate firsthand the benefits that collaborative, well-managed, and adaptable MLOps teams can bring to enterprises in the rapidly evolving AI/ML space. Below are key insights and lessons learned during Capital One’s ongoing technology and AI journey.
Standardized, reusable components are critical
Most MLOps teams have people with extensive software development skills who love to build things. But the continuous build of new AI/ML tools must also be balanced with efficiency, governance, and risk mitigation.
Many engineers today are experimenting with new generative AI capabilities. It’s exciting to think about the possibilities that something like code generation can unlock for efficiency and standardization, but auto-generated code also requires sophisticated risk management and governance processes before it can be accepted into any production environment. Furthermore, a one-size-fits-all approach to things like generating code won’t work for most companies, which have industry-, business-, and customer-specific circumstances to account for.
As enterprise platform teams continue to explore the evolution of ML tools and techniques while prioritizing reusable tools and components, they can look to build upon open-source capabilities. One example is scikit-learn, a Python library of supervised and unsupervised learning algorithms with a strong user community behind it, which can serve as a foundation that teams customize into reusable components for specific enterprise needs.
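To make the idea of building on an open-source foundation concrete, here is a minimal sketch of what a shared, reusable component might look like on top of scikit-learn. The names QuantileClipper and build_standard_classifier are hypothetical, invented for illustration; they are not part of scikit-learn and do not describe Capital One’s actual tooling. The sketch assumes scikit-learn and NumPy are installed and uses synthetic data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class QuantileClipper(TransformerMixin, BaseEstimator):
    """Hypothetical shared component: clip each feature to quantiles learned in fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn per-feature clipping bounds from the training data only.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        # Apply the bounds learned in fit to any new data.
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


def build_standard_classifier(max_iter=1000):
    """Hypothetical factory that teams call instead of hand-rolling preprocessing."""
    return Pipeline([
        ("clip", QuantileClipper()),
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=max_iter)),
    ])


# Illustrative usage on synthetic data (not real enterprise data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = build_standard_classifier().fit(X, y)
predictions = clf.predict(X[:5])
```

Because the custom step follows scikit-learn’s own estimator conventions, it plugs into the standard Pipeline alongside the library’s built-in pieces, so a team that adopts the factory inherits the same preprocessing without rebuilding it.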
Cross-team communication is vital
Most large enterprises have data scientists and engineers working on projects across different parts of the company. This can make it difficult to know where new technologies and tools are being built, resulting in arbitrary uniqueness: tools that differ from team to team for no strategic reason.
This underscores the importance of creating a collaborative team culture where communication about the big picture, strategic goals, and initiatives is prioritized—including the ability to find out where tools are being built and evolved. What does this look like in practice?
Ensure your team knows what tools and processes it owns and contributes to.
Make it clear how their work supports the broader company’s mission.
Demonstrate how your team can feel empowered not to build something from scratch.
Incentivize reuse and standardization.
It takes time and effort to create a culture of “innersourcing” innovation and build communications mechanisms for clarity and context, but it’s well worth it to ensure long-term value creation, innovation, and efficiency.
Tools must map to business outcomes
Enterprise MLOps teams have a broader role than building tools for data scientists and engineers: they need to ensure those tools both mitigate risk and enable more streamlined, nimble technology capabilities for their business partners. Before setting off on building new AI/ML capabilities, engineers and their partners should ask themselves a few core questions. Does this tool actually help solve a core problem for the business? Will business partners be able to use it? Will it work with existing tools and processes? How quickly can we deliver it, and is there something similar that already exists that we should build upon first?
Having centralized enterprise MLOps and engineering teams ask these questions can free up the business to solve customer problems, and to consider how technology can continue to support the evolution of new solutions and experiences.
Don’t simply hire unicorns, build them
There’s no question that delivering for the needs of business partners in the modern enterprise takes significant amounts of MLOps expertise. It requires both software engineering and ML engineering experience and, especially as AI/ML capabilities evolve, people with highly specialized skill sets, such as deep graphics processing unit (GPU) expertise.
Instead of hiring a “unicorn” individual, companies should focus on building a unicorn team with the best of both worlds. This means having deep subject matter experts in science, engineering, statistics, product management, DevOps, and other disciplines. These are all complementary skill sets that add up to a more powerful collective. The ability to work effectively as a team, a curiosity for learning, and empathy for the problems you’re solving are just as important as each individual’s unique domain skills.
Develop a product mindset to produce better tools
Last but not least, it’s important to take a product mindset when building new AI and ML tools for internal customers and business partners. That means treating what you build not as a task or project to be checked off a list, but as a product: understanding the customer you’re building for and taking a holistic approach that works back from their needs.
Often, the products MLOps teams build, whether a new feature library or an explainability tool, look different from what traditional product managers deliver, but the process for creating great products should be the same. Focusing on customer needs and pain points helps everyone deliver better products; it’s a muscle that many data science and engineering experts have to build, but it ultimately helps us all create better tooling and deliver more value for the customer.
The bottom line is that today, the most effective MLOps strategies are not just about technical capabilities, but also involve intentional and thoughtful culture, collaboration, and communication strategies. In large enterprises, it’s important to be cognizant that no one operates in a vacuum. As hard as it may be to see in the day-to-day, everything within the enterprise is ultimately connected, and the capabilities that AI/ML tooling and engineering teams bring to bear have important implications for the entire organization.
This content was produced by Capital One. It was not written by MIT Technology Review’s editorial staff.