🚀 Deploy ML Models Using TensorFlow & AWS SageMaker
From Notebook to Scalable Production in the Cloud

Forward-thinking IT Operations Leader with cross-domain expertise spanning incident & change management, cloud infrastructure (Azure, AWS, GCP), and automation engineering. Proven track record in building and leading high-performance operations teams that drive reliability, innovation, and uptime across mission-critical enterprise systems. Adept at aligning IT services with business goals through strategic leadership, cloud-native transformation, and process modernization. Currently spearheading application operations and monitoring for digital modernization initiatives. Deeply passionate about coding in Rust, Go, and Python, and solving real-world problems through machine learning, model inference, and Generative AI. Actively exploring the intersection of AI engineering and infrastructure automation to future-proof operational ecosystems and unlock new business value.
Gradient Descent Weekly — Issue #10
Training a model in TensorFlow is the easy part.
Getting it into production without becoming a DevOps engineer? That’s the game.
In this issue, we’ll walk through the real-world approach to deploying TensorFlow models using AWS SageMaker — step by step, with clarity, context, and command-line confidence.
Whether you’re a solo dev or scaling for enterprise, SageMaker offers a powerful managed service that handles:
Model hosting
Auto-scaling
Monitoring
Versioning
A/B testing
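Let’s turn that trained model into a production-ready endpoint. SageMaker’s TensorFlow Serving container loads the SavedModel format, so re-save any Keras .h5 file as a SavedModel first.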
🧱 Why TensorFlow + SageMaker?
| Tool | What It Handles |
| --- | --- |
| TensorFlow | Model development, training, export |
| SageMaker | Hosting, scaling, monitoring, and inference |
This duo gives you a full-stack MLOps story:
Build in TensorFlow
Train locally or in SageMaker
Deploy in a few lines of code
Integrate seamlessly with the AWS ecosystem (S3, CloudWatch, Lambda, etc.)
🛠️ Step-by-Step: Deploying TF Models with SageMaker
🔹 Step 1: Train and Save Your TensorFlow Model
Train your model as usual:
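```python
model.fit(X_train, y_train, epochs=10)
# SavedModel inside a numeric version folder, the layout TF Serving expects
model.save("my_tf_model/1")  # Keras 3 (TF 2.16+): model.export("my_tf_model/1")
```
This creates a TensorFlow SavedModel (saved_model.pb plus a variables/ folder) under my_tf_model/1/. SageMaker’s TensorFlow Serving container loads models from numbered version folders like this one.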
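SageMaker's TensorFlow Serving container looks for a SavedModel inside a numeric version folder, for example my_tf_model/1/saved_model.pb. Before packaging, a quick local check can confirm the export has that shape. This is a minimal standard-library sketch; `find_servable_versions` is a hypothetical helper name, not part of TensorFlow or the SageMaker SDK.

```python
from pathlib import Path


def find_servable_versions(model_dir):
    """Return the numeric version folders under model_dir that hold a saved_model.pb."""
    versions = []
    for child in Path(model_dir).iterdir():
        # TF Serving only treats all-digit folder names as model versions
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)
```

Calling `find_servable_versions("my_tf_model")` after saving should return `[1]`; an empty list means the SavedModel is not in a version folder and the endpoint will likely fail to load it.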
🔹 Step 2: Upload Your Model to S3
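SageMaker loads model artifacts from Amazon S3, and the TensorFlow container expects them packaged as a single model.tar.gz archive rather than a raw directory.
```python
import tarfile

import boto3
import sagemaker
from sagemaker import get_execution_role  # used in Step 3

s3 = boto3.client("s3")
bucket_name = "my-ml-model-bucket"
model_dir = "my_tf_model"  # contains my_tf_model/1/saved_model.pb

# SageMaker's TensorFlow container expects one model.tar.gz, not a raw folder
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add(model_dir, arcname=model_dir)

# Upload the archive; Step 3 points model_data at this S3 URI
s3_key = f"{model_dir}/model.tar.gz"
s3.upload_file("model.tar.gz", bucket_name, s3_key)
s3_path = f"s3://{bucket_name}/{s3_key}"
```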
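SageMaker's `TensorFlowModel` expects `model_data` to point at a model.tar.gz archive, and a packaging mistake (such as archiving the files without their version folder) usually only surfaces when the endpoint fails to start. A local look inside the archive can catch it earlier. A minimal sketch using only the standard library; `servable_paths_in_archive` is a hypothetical helper.

```python
import tarfile
from pathlib import PurePosixPath


def servable_paths_in_archive(archive_path):
    """List <version>/saved_model.pb entries inside a model.tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    found = []
    for name in names:
        parts = PurePosixPath(name).parts
        # Keep entries shaped like .../<numeric version>/saved_model.pb
        if len(parts) >= 2 and parts[-1] == "saved_model.pb" and parts[-2].isdigit():
            found.append(name)
    return sorted(found)
```

For an archive built from my_tf_model/1/, this should list `my_tf_model/1/saved_model.pb`; an empty list is a sign to repackage before uploading.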
🔹 Step 3: Deploy Using TensorFlowModel
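```python
import sagemaker
from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlowModel

# Works inside SageMaker notebooks/Studio; elsewhere, pass an IAM role ARN string
role = get_execution_role()

tf_model = TensorFlowModel(
    model_data=s3_path,  # S3 URI of model.tar.gz from Step 2
    role=role,
    framework_version="2.12",
    sagemaker_session=sagemaker.Session(),
    entry_point="inference.py",  # optional: delete this line to use the default handlers
)

predictor = tf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)
```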
✅ Tip: Use inference.py to define custom preprocessing and postprocessing; the TensorFlow Serving container calls its input_handler and output_handler functions around each prediction.
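As a rough sketch of such a file: the SageMaker TensorFlow Serving container calls `input_handler(data, context)` before forwarding a request to TF Serving and `output_handler(data, context)` on TF Serving's response. The pass-through logic below is illustrative only, assuming JSON requests that already carry an `instances` list; real preprocessing (scaling, tokenizing, label mapping) would go where the comments indicate.

```python
import json


def input_handler(data, context):
    """Pre-process a request before it is forwarded to TensorFlow Serving."""
    if context.request_content_type == "application/json":
        payload = json.loads(data.read().decode("utf-8"))
        # Custom preprocessing would go here; this sketch passes instances through
        return json.dumps({"instances": payload["instances"]})
    raise ValueError(f"Unsupported content type: {context.request_content_type}")


def output_handler(data, context):
    """Post-process the TensorFlow Serving response before it is returned."""
    if data.status_code != 200:
        raise ValueError(data.content.decode("utf-8"))
    # Custom postprocessing (e.g. mapping scores to class labels) would go here
    return data.content, context.accept_header
```

Rejecting unknown content types explicitly gives callers a clear error instead of a confusing failure inside TF Serving.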
🔹 Step 4: Send Real-Time Predictions
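```python
# Sent as JSON in TF Serving's {"instances": [...]} request format
input_data = {
    "instances": [[5.1, 3.5, 1.4, 0.2]]
}
response = predictor.predict(input_data)
print(response)  # a dict such as {"predictions": [...]}
```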
Boom—you now have a fully managed inference endpoint in AWS.
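Outside a notebook (for example, from a backend service or a Lambda function), callers usually do not hold the `predictor` object. They can reach the same endpoint through the low-level `sagemaker-runtime` client's `invoke_endpoint` call, which signs the request with the caller's IAM credentials. A minimal sketch; `predict_via_runtime` is a hypothetical helper, and the client is passed in so it can be stubbed in tests.

```python
import json


def predict_via_runtime(client, endpoint_name, instances):
    """Call a SageMaker endpoint through the sagemaker-runtime API (no SageMaker SDK)."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"instances": instances}),
    )
    # Body is a streaming object; read it and decode the JSON payload
    return json.loads(response["Body"].read())
```

In practice, `client` would be `boto3.client("sagemaker-runtime")` and `endpoint_name` the value of `predictor.endpoint_name`.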
🧠 Behind the Scenes: What SageMaker Does
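Provisions managed ML compute instances (EC2 under the hood)
Loads your model from S3 into a TensorFlow Serving container
Exposes an HTTPS endpoint secured by IAM (SigV4-signed requests)
Scales up/down based on load (once auto scaling is configured)
Health-checks the serving container and, when you run two or more instances, spreads them across Availability Zones
Sends endpoint metrics and logs to CloudWatch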
No infrastructure glue required. You write ML, AWS handles ops.
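Those CloudWatch metrics can also be pulled programmatically. SageMaker publishes endpoint metrics such as `Invocations` and `ModelLatency` in the `AWS/SageMaker` namespace, keyed by `EndpointName` and `VariantName`. The helper below is a hypothetical convenience that only builds the arguments for CloudWatch's `get_metric_statistics` call; the 5-minute period and 1-hour window are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone


def endpoint_metric_query(endpoint_name, variant_name="AllTraffic",
                          metric_name="Invocations", statistic="Sum",
                          hours=1, now=None):
    """Build kwargs for CloudWatch get_metric_statistics on one endpoint variant."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # aggregate into 5-minute buckets
        "Statistics": [statistic],
    }
```

Usage would look like `boto3.client("cloudwatch").get_metric_statistics(**endpoint_metric_query(predictor.endpoint_name))["Datapoints"]`; for latency, pass `metric_name="ModelLatency", statistic="Average"` (reported in microseconds).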
⚙️ Optional Enhancements
✅ Enable Auto Scaling
```python
import boto3

client = boto3.client("application-autoscaling")
# The SageMaker Python SDK names the deployed variant AllTraffic by default
resource_id = f"endpoint/{predictor.endpoint_name}/variant/AllTraffic"

# Bounds only (1 to 5 instances); scaling starts once a scaling policy is attached
client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=5,
)
```
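Registering a scalable target only sets the minimum and maximum instance counts; instances are added or removed once a scaling policy is attached. One common choice is target tracking on the predefined `SageMakerVariantInvocationsPerInstance` metric (average invocations per instance per minute). The helper name, the target of 70, and the cooldowns below are illustrative assumptions to tune against your own load tests.

```python
def target_tracking_policy(endpoint_name, variant_name="AllTraffic", target_invocations=70.0):
    """Build kwargs for application-autoscaling put_scaling_policy (hypothetical helper)."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Average invocations per instance per minute to aim for
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,  # seconds to wait before removing capacity
            "ScaleOutCooldown": 60,  # seconds to wait before adding more capacity
        },
    }
```

Applied as `client.put_scaling_policy(**target_tracking_policy(predictor.endpoint_name))`. A longer scale-in cooldown is a deliberate bias: it avoids dropping instances during brief lulls in traffic.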
✅ Add Monitoring with SageMaker Model Monitor
Catch:
Input drift
Prediction drift
Outliers
Bias
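You can even automate retraining, for example by alarming on Model Monitor’s CloudWatch metrics and having the alarm trigger a Lambda function that starts a training job.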
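Model Monitor analyzes captured traffic, so data capture has to be enabled on the endpoint first. With the SageMaker Python SDK that means passing a `sagemaker.model_monitor.DataCaptureConfig` to `deploy(data_capture_config=...)`; with the low-level `create_endpoint_config` API it is the `DataCaptureConfig` block sketched below. The helper name, the 20% default sampling rate, and the JSON-only content type are assumptions for illustration.

```python
def data_capture_config(destination_s3_uri, sampling_percentage=20):
    """Build the DataCaptureConfig block for create_endpoint_config (hypothetical helper)."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both request payloads and model responses
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        "CaptureContentTypeHeader": {"JsonContentTypes": ["application/json"]},
    }
```

Capturing outputs as well as inputs is what makes prediction drift visible, not just input drift.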
💀 Common Pitfalls to Avoid
❌ Not uploading the full SavedModel directory (including its numeric version folder), packaged as model.tar.gz, to S3
❌ Ignoring input/output format expectations (wrap inputs in an "instances" list)
❌ Deploying massive models on tiny instances (check GPU needs!)
❌ Leaving endpoints running when not in use (💸💸💸)
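For the endpoint you just deployed, `predictor.delete_endpoint()` tears it down. To spot forgotten endpoints across an account and region, the SageMaker `list_endpoints` API can filter by status. A minimal sketch with a hypothetical helper name, following `NextToken` pagination by hand; in practice, pass `boto3.client("sagemaker")` as `client`.

```python
def list_in_service_endpoints(client):
    """Return the names of all endpoints currently InService (candidates for cleanup)."""
    names, token = [], None
    while True:
        kwargs = {"StatusEquals": "InService"}
        if token:
            kwargs["NextToken"] = token
        page = client.list_endpoints(**kwargs)
        names.extend(ep["EndpointName"] for ep in page["Endpoints"])
        token = page.get("NextToken")
        if not token:  # no more pages
            return names
```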
📦 When to Use Batch Inference Instead
If:
Your predictions aren’t real-time
You have 100,000+ records to process in one go
You want to save costs
Use SageMaker Batch Transform instead:
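```python
transformer = tf_model.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
)  # reuses tf_model from Step 3; output goes to the default bucket unless output_path is set
transformer.transform(
    data="s3://bucket/input.csv",  # placeholder input location in S3
    content_type="text/csv",
    split_type="Line",  # treat each line of the CSV as one record
)
```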
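With `split_type="Line"`, Batch Transform treats each line of the input file as one record, so the CSV should hold one row of numeric features per line. This sketch also assumes no header row, which matches the default TensorFlow Serving CSV handling (a custom inference.py could change that). `write_batch_csv` is a hypothetical helper for preparing the file before uploading it to S3.

```python
import csv


def write_batch_csv(rows, path):
    """Write feature rows as a headerless CSV, one record per line."""
    with open(path, "w", newline="") as f:
        # "\n" line endings so each record is exactly one line for split_type="Line"
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows(rows)
    return path
```

Once the job finishes, Batch Transform writes results for input.csv to the output path as input.csv.out.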
🧠 Final Thoughts: Don’t Just Train, Ship
Training models is science.
Deploying models is engineering.
And SageMaker makes that engineering production-grade.
By combining TensorFlow’s training power with SageMaker’s deployment muscle, you turn your models into living, breathing services—complete with scaling, observability, and control.
This is how you build not just prototypes, but products.
🔮 Up Next on Gradient Descent Weekly:
- Monitoring Your Deployed Models: Metrics That Matter.