Databricks Simplifies Machine Learning Model Management at Scale with MLflow Model Registry
AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 – Databricks, the leader in unified data analytics, today announced Model Registry, a new capability within MLflow, an open-source platform for the machine learning (ML) lifecycle created by Databricks. The new component enables a comprehensive model management process by providing data scientists and engineers with a central repository to track, share, and collaborate on machine learning models. The Model Registry manages the full lifecycle of models and their stage transitions from experimentation to staging and deployment. Since Databricks introduced MLflow at Spark+AI Summit 2018, the project has grown to more than 140 contributors and 800,000 monthly downloads, making it the leader in ML lifecycle management.
“Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands,” said Matei Zaharia, co-founder and CTO at Databricks. “The new additions in MLflow, developed collaboratively with hundreds of contributors, are enabling organizations worldwide to improve ML development and deployment. With hundreds of thousands of monthly downloads, we are encouraged that the community’s contributions are making a positive impact.”
Databricks’ MLflow offering can already log metrics, parameters, and artifacts as part of experiments, package models and reproducible ML projects, and deploy models flexibly within the platform or to any cloud inference service or container. The MLflow Model Registry builds on these capabilities by helping organizations collaborate on models and streamline the ML development lifecycle, from a logged model to actual deployment, through:
- One collaborative hub: The MLflow Model Registry facilitates the sharing of expertise and knowledge about building and deploying machine learning models across development teams by making models more discoverable and providing collaborative features for jointly improving common ML tasks.
- Flexible CI/CD pipelines: The MLflow Model Registry allows teams to remain in control of machine learning models by either automatically transitioning a model into production based on predefined conditions, or manually controlling and validating lifecycle stage changes from experimentation to testing and production.
- Visibility and governance: Large enterprises often have thousands of ML models in the experimentation, testing, and production phases at any point in time. The MLflow Model Registry provides full visibility and enables governance of each by keeping track of model history and managing who can approve changes.
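The two control modes described above, automatic promotion on a predefined condition and manual, approver-gated stage changes with a recorded history, can be illustrated with a minimal sketch in plain Python. This is not the MLflow API: the `ToyRegistry` class, the `Stage` names, the `promote_if` helper, the `churn-model` name, and the AUC threshold are all hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple


class Stage(Enum):
    # Illustrative lifecycle stages (assumed names, not taken from the source).
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: Dict[str, float]
    stage: Stage = Stage.NONE
    # Each entry records (from_stage, to_stage, approver) for governance.
    history: List[Tuple[Stage, Stage, str]] = field(default_factory=list)


class ToyRegistry:
    """Hypothetical in-memory registry; a stand-in for a real model registry."""

    def __init__(self, approvers: Set[str]) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}
        self._approvers = set(approvers)

    def register(self, name: str, metrics: Dict[str, float]) -> ModelVersion:
        # New versions of the same model get increasing version numbers.
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, dict(metrics))
        versions.append(mv)
        return mv

    def transition(self, mv: ModelVersion, target: Stage, approved_by: str) -> None:
        # Manual control: only known approvers may change a model's stage.
        if approved_by not in self._approvers:
            raise PermissionError(f"{approved_by} may not approve stage changes")
        mv.history.append((mv.stage, target, approved_by))
        mv.stage = target

    def promote_if(
        self,
        mv: ModelVersion,
        target: Stage,
        condition: Callable[[Dict[str, float]], bool],
        approved_by: str = "ci-bot",
    ) -> bool:
        # Automatic control: promote only when the predefined condition holds.
        if condition(mv.metrics):
            self.transition(mv, target, approved_by)
            return True
        return False


registry = ToyRegistry(approvers={"ci-bot", "alice"})
v1 = registry.register("churn-model", {"auc": 0.91})
registry.transition(v1, Stage.STAGING, approved_by="alice")  # manual step
promoted = registry.promote_if(
    v1, Stage.PRODUCTION, lambda m: m["auc"] >= 0.90  # automated gate
)
```

In this sketch the `history` list plays the role of the model history mentioned above, and the approver check stands in for managing who can approve changes.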
The Model Registry is available on Databricks and provides the benefits of its Unified Data Analytics Platform including enterprise-level security, scale, and fine-grained access controls. As part of the MLflow open source offering, the Model Registry component is now also available to the open source community through GitHub. For more information on MLflow, visit www.databricks.com/mlflow.
Databricks helps data teams solve the world’s toughest problems. As the leader in Unified Data Analytics, Databricks helps organizations make all their data ready for analytics, empower data-driven decisions across the organization, and rapidly adopt machine learning to outpace the competition. The company’s global customer base includes thousands of organizations, among them Comcast, Shell, Expedia, and Regeneron. Databricks is venture-backed and was founded by the original creators of popular open source projects, including Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.