🚀 Deploy ML Models Using TensorFlow & AWS SageMaker

From Notebook to Scalable Production in the Cloud


Forward-thinking IT Operations Leader with cross-domain expertise spanning incident & change management, cloud infrastructure (Azure, AWS, GCP), and automation engineering. Proven track record in building and leading high-performance operations teams that drive reliability, innovation, and uptime across mission-critical enterprise systems. Adept at aligning IT services with business goals through strategic leadership, cloud-native transformation, and process modernization. Currently spearheading application operations and monitoring for digital modernization initiatives. Deeply passionate about coding in Rust, Go, and Python, and solving real-world problems through machine learning, model inference, and Generative AI. Actively exploring the intersection of AI engineering and infrastructure automation to future-proof operational ecosystems and unlock new business value.

Gradient Descent Weekly — Issue #10

Training a model in TensorFlow is the easy part.
Getting it into production without becoming a DevOps engineer? That’s the game.

In this issue, we’ll walk through the real-world approach to deploying TensorFlow models using AWS SageMaker — step by step, with clarity, context, and command-line confidence.

Whether you’re a solo dev or scaling for enterprise, SageMaker offers a powerful managed service that handles:

  • Model hosting

  • Auto-scaling

  • Monitoring

  • Versioning

  • A/B testing

Let’s turn that .h5 or SavedModel into a production-ready endpoint.

🧱 Why TensorFlow + SageMaker?

Tool         What It Handles
TensorFlow   Model development, training, export
SageMaker    Hosting, scaling, monitoring, and inference

This duo gives you a full-stack MLOps story:

  • Build in TensorFlow

  • Train locally or in SageMaker

  • Deploy in a few lines of code

  • Integrate with the rest of the AWS ecosystem (S3, CloudWatch, Lambda, etc.)

🛠️ Step-by-Step: Deploying TF Models with SageMaker

🔹 Step 1: Train and Save Your TensorFlow Model

Train your model as usual:

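model.fit(X_train, y_train, epochs=10)
# TensorFlow Serving expects a numbered version directory under the model name
model.save("my_tf_model/1")

This writes a TensorFlow SavedModel (saved_model.pb plus a variables/ folder) into a versioned subdirectory, which is the layout the SageMaker TensorFlow Serving container looks for.

🔹 Step 2: Upload Your Model to S3

SageMaker loads model artifacts from Amazon S3, and TensorFlowModel expects them packaged as a single model.tar.gz archive.

import tarfile

import boto3
import sagemaker
from sagemaker import get_execution_role

s3 = boto3.client('s3')
bucket_name = 'my-ml-model-bucket'
model_dir = 'my_tf_model'

# Package the whole SavedModel tree (my_tf_model/1/...) into one archive
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add(model_dir)

# Upload the archive and keep its S3 URI for the deploy step
s3.upload_file('model.tar.gz', bucket_name, f'{model_dir}/model.tar.gz')
s3_path = f"s3://{bucket_name}/{model_dir}/model.tar.gz"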
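Before uploading, it can help to confirm that the archive really has a numbered version directory holding saved_model.pb. The sketch below is one way to check. The helper name find_saved_models is hypothetical (it is not part of any SDK), and it only tests the naming convention assumed here.

```python
import re
import tarfile


def find_saved_models(archive_path):
    """Return archive members that look like <version>/saved_model.pb.

    Hypothetical helper: it checks only the assumed convention of a
    numeric version directory that directly contains saved_model.pb.
    """
    pattern = re.compile(r"(^|/)\d+/saved_model\.pb$")
    with tarfile.open(archive_path, "r:gz") as tar:
        return [name for name in tar.getnames() if pattern.search(name)]


# Example, assuming model.tar.gz is in the working directory:
#   if not find_saved_models("model.tar.gz"):
#       raise SystemExit("No <version>/saved_model.pb found in the archive")
```

An empty result usually means the model was saved straight into the top-level folder without a version subdirectory.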

🔹 Step 3: Deploy Using TensorFlowModel

from sagemaker.tensorflow import TensorFlowModel

role = get_execution_role()

tf_model = TensorFlowModel(
    model_data=s3_path,
    role=role,
    framework_version="2.12",
    sagemaker_session=sagemaker.Session(),
    entry_point="inference.py",  # optional: drop this line to use the container's default request handling
)

predictor = tf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

✅ Tip: Use inference.py to define custom preprocessing of request input and postprocessing of model output.
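For a sense of what such a file might contain, here is a minimal inference.py sketch. It assumes the input_handler / output_handler convention used by the SageMaker TensorFlow Serving container, and the CSV branch assumes one comma-separated row of numeric features per request. Check both against the container documentation for your framework version.

```python
import json


def input_handler(data, context):
    """Convert the incoming request into a TensorFlow Serving JSON body."""
    if context.request_content_type == "application/json":
        # JSON is passed through unchanged
        return data.read().decode("utf-8")
    if context.request_content_type == "text/csv":
        # Assumption: a single comma-separated row of numeric features
        text = data.read().decode("utf-8").strip()
        row = [float(value) for value in text.split(",")]
        return json.dumps({"instances": [row]})
    raise ValueError(f"Unsupported content type: {context.request_content_type}")


def output_handler(response, context):
    """Return the TensorFlow Serving response body and its content type."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, context.accept_header
```

Keeping the handlers this thin makes them easy to exercise locally with stand-in objects before any endpoint is created.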

🔹 Step 4: Send Real-Time Predictions

input_data = {
    "instances": [[5.1, 3.5, 1.4, 0.2]]
}
response = predictor.predict(input_data)
print(response)

Boom—you now have a fully managed inference endpoint in AWS.
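Callers that do not have the SageMaker SDK installed (a Lambda function or a backend service, say) can reach the same endpoint through the boto3 sagemaker-runtime client. The sketch below assumes the endpoint accepts the JSON "instances" format shown above. The helper name invoke is hypothetical, and the client is passed in so the function can be tried with a stub.

```python
import json


def invoke(runtime_client, endpoint_name, rows):
    """Send feature rows to the endpoint and return the parsed JSON reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"instances": rows}),
    )
    # The response body is a stream; read it fully before parsing
    return json.loads(response["Body"].read())


# Example (needs AWS credentials and a live endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   print(invoke(runtime, predictor.endpoint_name, [[5.1, 3.5, 1.4, 0.2]]))
```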

🧠 Behind the Scenes: What SageMaker Does

  • Provisions managed ML compute instances (EC2 under the hood)

  • Loads your TensorFlow model from S3

  • Exposes an HTTPS endpoint that authenticates requests through IAM

  • Scales up/down based on load (if enabled)

  • Health-checks the serving container and replaces unhealthy instances

  • Publishes metrics and logs to CloudWatch

No infrastructure glue required. You write ML, AWS handles ops.
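To make that list more concrete, deploy() roughly corresponds to three lower-level SageMaker API calls: create_model, create_endpoint_config, and create_endpoint. The sketch below only builds the request arguments for those calls. The helper name build_deploy_requests is hypothetical, and the image URI, S3 path, and role ARN are placeholders you would need to supply.

```python
def build_deploy_requests(name, image_uri, model_data_url, role_arn,
                          instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for the three boto3 SageMaker calls."""
    create_model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    create_endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return create_model, create_endpoint_config, create_endpoint


# Example (needs AWS credentials; all argument values are placeholders):
#   import boto3
#   sm = boto3.client("sagemaker")
#   model_req, config_req, endpoint_req = build_deploy_requests(...)
#   sm.create_model(**model_req)
#   sm.create_endpoint_config(**config_req)
#   sm.create_endpoint(**endpoint_req)
```

In most cases the SDK call is all you need; the lower-level view mainly helps when reading CloudTrail logs or writing infrastructure-as-code.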

⚙️ Optional Enhancements

✅ Enable Auto Scaling

client = boto3.client('application-autoscaling')

# Register the endpoint variant as a scalable target. This only sets the
# instance-count bounds; a scaling policy is still needed before anything scales.
client.register_scalable_target(
    ServiceNamespace='sagemaker',
    # 'AllTraffic' is the default variant name created by deploy()
    ResourceId=f'endpoint/{predictor.endpoint_name}/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=5,
)
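Registering the target sets the limits, but scaling only starts once a policy is attached. A common choice is target tracking on invocations per instance. The sketch below builds the arguments for put_scaling_policy. The helper name, the policy name, the target of 100 invocations per instance, and the cooldown values are all illustrative assumptions to tune for your workload.

```python
def target_tracking_policy(endpoint_name, variant="AllTraffic",
                           invocations_per_instance=100.0):
    """Build keyword arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Illustrative target: average invocations per instance per minute
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more
        },
    }


# Example (needs AWS credentials and a registered scalable target):
#   client.put_scaling_policy(**target_tracking_policy(predictor.endpoint_name))
```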

✅ Add Monitoring with SageMaker Model Monitor

Catch:

  • Input drift

  • Prediction drift

  • Outliers

  • Bias

You can even trigger retraining automatically via Lambda.
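One possible shape for that trigger is sketched below. It assumes you have a CloudWatch alarm on a Model Monitor metric, an EventBridge rule that forwards alarm state changes to Lambda, and an existing SageMaker Pipeline that retrains the model. The factory make_handler and the pipeline name "retrain-pipeline" are hypothetical.

```python
def make_handler(sagemaker_client, pipeline_name):
    """Create a Lambda handler that starts a retraining pipeline on ALARM.

    sagemaker_client is expected to be boto3.client("sagemaker").
    """
    def handler(event, context):
        # EventBridge alarm events carry the new state under detail.state.value
        state = event.get("detail", {}).get("state", {}).get("value")
        if state != "ALARM":
            return {"started": False}
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=pipeline_name,
        )
        return {"started": True, "executionArn": response["PipelineExecutionArn"]}

    return handler


# In the Lambda module (pipeline name is a placeholder):
#   import boto3
#   lambda_handler = make_handler(boto3.client("sagemaker"), "retrain-pipeline")
```

Passing the client in keeps the handler testable without AWS access.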

💀 Common Pitfalls to Avoid

  • ❌ Not packaging the full SavedModel directory (saved_model.pb and variables/) for S3

  • ❌ Ignoring input/output format expectations (send a JSON body with an "instances" key)

  • ❌ Deploying massive models on tiny instances (check GPU needs!)

  • ❌ Leaving endpoints running when not in use (💸💸💸)
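For that last pitfall, a small teardown helper is an easy habit to build. The sketch below assumes predictor is the object returned by deploy(). The helper name tear_down is hypothetical, while delete_model and delete_endpoint are methods on the SDK's Predictor.

```python
def tear_down(predictor):
    """Delete the model record first, then the endpoint that serves it."""
    # The model is deleted first because the predictor looks up model
    # names through the endpoint configuration, which delete_endpoint removes.
    predictor.delete_model()
    predictor.delete_endpoint()


# Example, at the end of an experiment:
#   tear_down(predictor)
```

Billing for a real-time endpoint is tied to the running instances, so deleting the endpoint is the step that stops the charges.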

📦 When to Use Batch Inference Instead

If:

  • Your predictions aren’t real-time

  • You have 100,000+ records to process in one go

  • You want to pay only while a job runs instead of for an always-on endpoint

Use SageMaker Batch Transform instead:

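transformer = tf_model.transformer(instance_count=1, instance_type="ml.m5.large")
transformer.transform(
    data="s3://bucket/input.csv",
    content_type="text/csv",
    split_type="Line",  # send each line as a separate record
)
transformer.wait()  # block until the job finishes; results are written under transformer.output_path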

🧭 Final Thoughts: Don’t Just Train, Ship

Training models is science.
Deploying models is engineering.
And SageMaker makes that engineering production-grade.

By combining TensorFlow’s training power with SageMaker’s deployment muscle, you turn your models into living, breathing services—complete with scaling, observability, and control.

This is how you build not just prototypes, but products.

🔮 Up Next on Gradient Descent Weekly:

  • Monitoring Your Deployed Models: Metrics That Matter.