Advanced Computing in the Age of AI | Friday, March 29, 2024

AWS’s Web-based IDE for ML Development: SageMaker Studio 

AWS, Azure, Google Cloud, IBM Cloud, Oracle – they’re all vying to become the dominant force of gravity in the public cloud services market, and among the most fiercely fought over areas of cloud leadership is AI/machine learning enablement. Given that AI’s TAM is roughly ∞* and that the FAANGs are out ahead of everyone on AI expertise, it makes sense they would commercialize the technologies they use and that they’ve developed to attract enterprise AI customers to their platforms.

A centerpiece of AWS’s AI market strategy is SageMaker, a managed service that provides developers and data scientists who aren’t necessarily ML experts with the tools to build, train and deploy ML models. Launched two years ago, SageMaker is designed to take the heavy lifting out of each step of the machine learning process.

Since its inception, the product suite has been expanded into SageMaker Studio, which AWS CEO Andy Jassy, at the annual re:Invent conference in Las Vegas this week, described as a web-based IDE (integrated development environment) for machine learning that lets developers collect and store code, notebooks, data sets, settings and project folders in one place.

“For the first time, SageMaker Studio starts to pull together the tools that developers are used to using with traditional software (development) – debuggers, profilers, automation, management – into a single pane of glass which can be used to build, train, deploy and manage on machine learning models in a way, which is way easier and way more accessible for even more developers and even more data scientists,” said AWS AI VP Matt Wood, on stage with Jassy.

The product suite components include:

SageMaker Notebooks: Based on open source Jupyter Notebooks used by developers to create and share documents containing live code, equations, visualizations and narrative text, SageMaker Notebooks enable users “to easily create and share Jupyter notebooks without having to manage any infrastructure,” according to Julien Simon, AWS AI and ML evangelist for EMEA. “You can also quickly switch from one hardware configuration to another.”

Jassy said SageMaker Notebooks offer the advantage of being paired with elastic compute. Previously, “if it turns out that you need more compute or less compute, you actually have to go and spin up another instance. And then you have to do all the work to transfer the contents from the first notebook to the new notebook, it's just a little bit tedious… So now, you can just spin up a notebook with a click, it happens in seconds. If it turns out that you need more compute than you thought in this notebook, you just tell us the CPU that you want with that notebook.”

SageMaker Experiments: Designed to organize the chaos caused by the highly iterative nature of ML development, this component is intended to help data scientists track and compare thousands of ML jobs, including training, data processing and model evaluation jobs.

“When you're doing machine learning, you're trying all kinds of experiments and you're iterating like crazy across lots of different parameters and dimensions,” Jassy said. “And as you iterate a lot, it creates all these artifacts and they live all over the place,” making it difficult to find and share them.

SageMaker Experiments is designed to capture input variables, parameters and configurations automatically. “Now you can not only browse your active experiments and see them in real time,” he said, “but you can also search for older experiments – by name, by input parameters, by data set user, by algorithm or even the results.”
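The idea of recording each run and searching it later can be illustrated with a minimal, in-memory sketch. This is not the SageMaker Experiments API; the `Trial` and `ExperimentLog` names, their fields and the sample runs are all hypothetical and exist only to show the pattern Jassy describes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Trial:
    """One training run (hypothetical record, not a SageMaker type)."""
    name: str
    dataset: str
    algorithm: str
    params: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ExperimentLog:
    """In-memory log that stores trials and filters them by attribute."""

    def __init__(self) -> None:
        self.trials: List[Trial] = []

    def record(self, trial: Trial) -> None:
        self.trials.append(trial)

    def search(self, **criteria: Any) -> List[Trial]:
        """Return trials whose fields (or input params) equal every criterion."""
        def matches(trial: Trial) -> bool:
            for key, wanted in criteria.items():
                # Look for a top-level field first, then fall back to params.
                actual = getattr(trial, key, trial.params.get(key))
                if actual != wanted:
                    return False
            return True
        return [t for t in self.trials if matches(t)]

    def best(self, metric: str) -> Trial:
        """Return the trial with the highest value for the given metric."""
        scored = [t for t in self.trials if metric in t.metrics]
        return max(scored, key=lambda t: t.metrics[metric])


# Made-up runs, for illustration only.
log = ExperimentLog()
log.record(Trial("run-1", "housing-v1", "xgboost", {"max_depth": 4}, {"accuracy": 0.81}))
log.record(Trial("run-2", "housing-v1", "linear", {"l2": 0.1}, {"accuracy": 0.74}))
log.record(Trial("run-3", "housing-v2", "xgboost", {"max_depth": 6}, {"accuracy": 0.86}))

xgboost_runs = log.search(algorithm="xgboost")  # search by algorithm
deep_runs = log.search(max_depth=6)             # search by input parameter
top_run = log.best("accuracy")                  # search by result
```

A managed service would persist these records and capture them automatically; the sketch only shows why a single searchable log beats artifacts scattered "all over the place."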

SageMaker Debugger: Built for complex training issues, Simon said this component “automatically introspects your models, collects debugging data and analyzes it to provide real-time alerts and advice on ways to optimize your training times and improve model quality.”

Jassy said SageMaker Debugger helps developers interpret their models, to lift the fog of ML training caused by working across dozens of parameters. “A lot of time, you don't really know which dimensions are really impacting the model,” he said. “Looking at a trained model is … totally opaque, it's gibberish to the naked eye. People want to have a better idea of what's driving their models, so they can adjust it and fix it and so they can explain it.”

The debugger tool includes a capability called “feature prioritization,” which Jassy said “puts a spotlight on the actual dimensions or features that are having impact on the model,” enabling developers to see what's driving the model, to see which dimensions have been left out of an under-performing neural network model not producing expected predictions, and to detect if the model is overly reliant on a few numbers or dimensions, causing bias.
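The article does not say how SageMaker Debugger computes “feature prioritization.” One common, generic way to see which features drive a model is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below assumes that technique; the toy model, the data and every function name are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows for which the model's prediction equals the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(
    predict: Callable[[Row], int],
    rows: List[Row],
    labels: List[int],
    n_features: int,
    seed: int = 0,
) -> List[float]:
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild the rows with only feature j replaced by its shuffled values.
        shuffled = [
            list(row[:j]) + [value] + list(row[j + 1:])
            for row, value in zip(rows, column)
        ]
        importances.append(baseline - accuracy(predict, shuffled, labels))
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical stand-in for a trained model: it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

scores = permutation_importance(toy_model, rows, labels, n_features=2)
# scores[0] is the accuracy lost by shuffling feature 0; scores[1] is exactly
# 0.0 because the toy model never reads feature 1.
```

A model that leans heavily on one or two features shows up as a few large scores and many near-zero ones, which is the kind of over-reliance the passage above warns about.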

SageMaker Model Monitor: Another tool designed to open the ML model black box, the SageMaker Model Monitor is built to detect quality deviations – a.k.a. “concept drift” – in models that have been operational for extended periods.

Jassy said concept drift can happen when there’s a change in the external conditions on which an ML model is based, such as the impact of higher interest rates on the real estate market. This requires model changes, “but it turns out that the overwhelming majority of models are (extremely) complicated…, and it's really hard to find the concept drift.”

The tool identifies concept drift by creating a set of baseline statistics on the data used to train a model. “Then we analyze all the predictions, compare it to the data used to create the model, and then we give you a way to visualize where there appears to be concept drift… You can take charge of that, and figure out how to make adjustments.”
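The baseline-and-compare approach can be sketched in a few lines. This is not how Model Monitor is implemented; it assumes a deliberately simple rule (flag a feature when the mean of live data moves more than a chosen number of baseline standard deviations), and the function names, threshold and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def build_baseline(training: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Summarize each training feature by its mean and standard deviation."""
    return {
        name: {"mean": mean(values), "std": pstdev(values)}
        for name, values in training.items()
    }


def detect_drift(
    baseline: Dict[str, Dict[str, float]],
    live: Dict[str, List[float]],
    threshold: float = 2.0,
) -> Dict[str, float]:
    """Return features whose live mean shifted more than `threshold` baseline stds."""
    drifted = {}
    for name, stats in baseline.items():
        if not live.get(name):
            continue  # no live observations for this feature
        spread = stats["std"] or 1.0  # avoid dividing by zero for constant features
        shift = abs(mean(live[name]) - stats["mean"]) / spread
        if shift > threshold:
            drifted[name] = shift
    return drifted


# Made-up numbers echoing the interest-rate example in the article.
training = {
    "interest_rate": [3.0, 3.2, 3.1, 2.9, 3.3],
    "square_feet": [1500, 1600, 1550, 1450, 1650],
}
live = {
    "interest_rate": [6.8, 7.0, 7.1],
    "square_feet": [1520, 1580, 1610],
}

baseline = build_baseline(training)
drifted = detect_drift(baseline, live)
# Only "interest_rate" is flagged: its live mean sits far outside the training
# range, while "square_feet" stays within a fraction of one standard deviation.
```

A production monitor would track richer statistics than a mean shift, but the flow is the same: compute a baseline once, then score incoming data against it.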

SageMaker Autopilot: This tool is designed to automate model building, including algorithm selection, data preprocessing, model tuning and infrastructure.

“With Autopilot, here's what happens,” Jassy said, “you send us your CSV file with the data you want a model for, or you can just point to the S3 (AWS storage) location. And Autopilot does all the transformation of the models and puts in a format so we can do machine learning. It selects the right algorithm, and then it trains 50 unique models with a little bit different configurations of the variables, because you don't know which ones are going to lead to the highest accuracy. By the way, even if you know how to build machine learning models, having to train 50 models takes quite a bit of time… And then what we do is we give you, in SageMaker Studio, a model leaderboard where you can see all 50 models ranked in order of accuracy.”
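The leaderboard step Jassy describes (evaluate many candidate configurations, then rank them by accuracy) can be sketched as follows. This is an illustration only, not Autopilot's method: the candidates are fixed threshold rules standing in for trained models, and all names and data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (input value, true label)


def leaderboard(
    candidates: Dict[str, Callable[[float], int]],
    validation: List[Example],
) -> List[Tuple[str, float]]:
    """Score every candidate on the validation set and rank by accuracy."""
    ranked = []
    for name, predict in candidates.items():
        correct = sum(1 for x, y in validation if predict(x) == y)
        ranked.append((name, correct / len(validation)))
    # Highest accuracy first; ties keep their insertion order.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def make_threshold_model(threshold: float) -> Callable[[float], int]:
    """Stand-in for a trained model: predicts 1 at or above the threshold."""
    return lambda x: 1 if x >= threshold else 0


validation = [(0.1, 0), (0.3, 0), (0.45, 0), (0.55, 1), (0.7, 1), (0.9, 1)]
candidates = {
    f"threshold={t}": make_threshold_model(t) for t in (0.2, 0.5, 0.8)
}

board = leaderboard(candidates, validation)
# The 0.5 threshold classifies every validation example correctly, so it
# ranks first; the other two candidates each get 4 of 6 right.
```

With 50 candidates instead of three, the same ranking makes it easy to pick the best configuration without inspecting each model by hand.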

* ∞ is the symbol for infinity.

EnterpriseAI