Master the essentials of MLOps to deploy, monitor, and maintain machine learning models in production. Learn model versioning, CI/CD pipelines, monitoring, and retraining strategies.

Building a machine learning model is one thing. Deploying it to production and keeping it working reliably is another thing entirely.
Most organizations can train a model that reaches 95% accuracy in a Jupyter notebook. But moving that model to production, where it has to handle real data, scale to millions of predictions, and maintain its performance over time, requires an entirely different skillset.
This is where MLOps comes in. MLOps (Machine Learning Operations) applies DevOps principles to machine learning systems. It is about automating the entire lifecycle: from data preparation through model training, validation, deployment, monitoring, and retraining.
Without MLOps, you end up with models that degrade silently, data pipelines that break unexpectedly, and no way to debug what went wrong. With MLOps, you have ML systems that are reproducible, reliable, and maintainable.
The traditional software pipeline:

Code → Build → Test → Deploy → Monitor → Maintain

Clear stages, deterministic outcomes, and version control at every step.

The ML pipeline:

Data → Feature Engineering → Model Training → Evaluation → Deployment → Monitoring → Retraining

More complex because:
- Data changes over time, so the same pipeline can produce a different model tomorrow
- Training is often non-deterministic, which makes results hard to reproduce
- Model quality degrades silently as production data drifts away from the training data
MLOps bridges this gap by treating the ML system like a software system:
Data Pipeline → Feature Store → Model Training → Model Registry → Deployment → Monitoring → Retraining
      ↓               ↓                ↓                 ↓              ↓            ↓            ↓
   Version         Version          Version           Version        Version      Metrics    Automated
   Control         Control          Control           Control        Control      Tracking    Triggers

Every component is versioned, tested, and monitored.
ML models are only as good as their training data. Data pipelines must be reproducible and versioned.
Data Pipeline Architecture:
Raw Data Source
↓
Data Validation (Schema, Quality)
↓
Feature Engineering
↓
Feature Store (Versioned)
↓
Training Dataset (Versioned)

Example: a data pipeline with DVC (Data Version Control)

git init
dvc init

Track a data file:

dvc add data/raw/training_data.csv
git add data/raw/training_data.csv.dvc .gitignore
git commit -m "Add training data v1"

DVC stores the data itself in remote storage (S3, GCS) and tracks versions just like Git:
# Visualize the pipeline DAG
dvc dag

# Check out a previous data version
git checkout <commit-hash>
dvc checkout

This guarantees reproducibility: given a commit hash, you can recreate the exact training dataset.
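For instance, DVC's Python API can read a file exactly as it existed at a given revision; a minimal sketch (the commit hash is a placeholder):

import dvc.api
import pandas as pd

# Open data/raw/training_data.csv exactly as it existed at the given Git
# revision; DVC fetches the matching bytes from remote storage if needed.
with dvc.api.open("data/raw/training_data.csv", rev="<commit-hash>") as f:
    df = pd.read_csv(f)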
A feature store is a centralized repository for features (derived data used by models). It solves several problems, most importantly training/serving skew: without one, features are easily computed one way in the training pipeline and a slightly different way in the serving path.
Feature Store Architecture:
Raw Data
↓
Feature Computation
↓
┌─────────────────────────────┐
│ Feature Store │
├─────────────────────────────┤
│ Batch Features (Historical) │
│ Real-time Features (Online) │
└─────────────────────────────┘
        ↓                    ↓
Training Pipeline    Serving Pipeline

Example: the Feast feature store
from datetime import timedelta

from feast import Entity, FeatureService, FeatureView, Field, FileSource
from feast.types import Float32, Int32, Int64

# Define the entity (the key features are joined on)
user = Entity(name="user_id", join_keys=["user_id"])

# Define the feature view
user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="user_id", dtype=Int64),
        Field(name="total_purchases", dtype=Float32),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_signup", dtype=Int32),
    ],
    source=FileSource(path="data/user_features.parquet"),
)

# Define the feature service
user_service = FeatureService(
    name="user_service",
    features=[user_features],
)

Fetch features for training:
import pandas as pd
from feast import FeatureStore

fs = FeatureStore(repo_path=".")

# Get historical features for training (the entity dataframe must
# include an event_timestamp column for point-in-time joins)
training_df = fs.get_historical_features(
    entity_df=pd.read_csv("data/user_ids.csv"),
    features=[
        "user_features:total_purchases",
        "user_features:avg_order_value",
        "user_features:days_since_signup",
    ],
).to_df()

Fetch features for serving (real time):
# Get the latest feature values for a prediction
features = fs.get_online_features(
    features=[
        "user_features:total_purchases",
        "user_features:avg_order_value",
        "user_features:days_since_signup",
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()

The same features, computed identically, for both training and serving.
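One operational note: the online store only has values to return after features have been materialized into it. A minimal sketch, reusing the fs store from above:

from datetime import datetime

# Load feature values computed since the last run into the online store
fs.materialize_incremental(end_date=datetime.utcnow())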
Model training must be reproducible and tracked.
Training Pipeline:
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Start an MLflow run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    mlflow.log_param("random_state", 42)

    # Load features
    X, y = load_features()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train the model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=42,
    )
    model.fit(X_train, y_train)

    # Evaluate
    train_accuracy = model.score(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)

    # Log metrics
    mlflow.log_metric("train_accuracy", train_accuracy)
    mlflow.log_metric("test_accuracy", test_accuracy)

    # Log the model artifact
    mlflow.sklearn.log_model(model, "model")

MLflow tracks the parameters, metrics, and artifacts (including the serialized model) of every run, so experiments can be compared and reproduced later.
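As a quick illustration (assuming runs have been logged to the local tracking store), logged runs can then be queried and compared programmatically:

import mlflow

# Query logged runs as a pandas DataFrame, best test accuracy first
runs = mlflow.search_runs(order_by=["metrics.test_accuracy DESC"])
print(runs[["run_id", "params.n_estimators", "metrics.test_accuracy"]].head())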
Model Registry:

import mlflow

# Register the model from a completed run
model_uri = "runs:/abc123/model"
mv = mlflow.register_model(model_uri, "fraud_detection")

# Transition to staging
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud_detection",
    version=1,
    stage="Staging",
)

# Transition to production
client.transition_model_version_stage(
    name="fraud_detection",
    version=1,
    stage="Production",
)

The model registry provides versioned models, explicit stage transitions (Staging, Production), and a clear record of which model version is serving at any time.
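Downstream services can then resolve the production model by stage instead of hard-coding a version; a minimal sketch (X_new stands in for new feature rows and is hypothetical):

import mlflow

# Load whichever version currently holds the Production stage
model = mlflow.sklearn.load_model("models:/fraud_detection/Production")
predictions = model.predict(X_new)  # X_new: new rows to score (hypothetical)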
Before deployment, a model must pass rigorous tests.
Validation Checks:
import time

import numpy as np
from sklearn.metrics import precision_recall_curve

def validate_model(model, X_test, y_test):
    """Validate a model before deployment."""
    # 1. Performance threshold
    accuracy = model.score(X_test, y_test)
    assert accuracy >= 0.90, f"Accuracy {accuracy} below threshold"

    # 2. Precision/recall balance (the curve needs scores, not hard labels)
    y_scores = model.predict_proba(X_test)[:, 1]
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    assert precision.mean() >= 0.85, "Precision too low"
    assert recall.mean() >= 0.80, "Recall too low"

    # 3. Fairness check (no accuracy gap across groups)
    for group in ["group_a", "group_b"]:
        group_mask = X_test["group"] == group
        group_accuracy = model.score(X_test[group_mask], y_test[group_mask])
        assert abs(group_accuracy - accuracy) < 0.05, f"Bias detected in {group}"

    # 4. Prediction stability (same input, same output)
    predictions_1 = model.predict(X_test)
    predictions_2 = model.predict(X_test)
    assert np.array_equal(predictions_1, predictions_2), "Non-deterministic predictions"

    # 5. Latency check (average single-row prediction time)
    start = time.time()
    for _ in range(1000):
        model.predict(X_test[:1])
    latency = (time.time() - start) / 1000
    assert latency < 0.1, f"Latency {latency}s exceeds threshold"

    return True

Deploy the model as a versioned, reproducible artifact.
Deployment Architecture:
Model Registry
↓
Model Serving (REST API)
↓
┌─────────────────────────────┐
│ Load Balancer │
├─────────────────────────────┤
│ Replica 1 │ Replica 2 │ ... │
└─────────────────────────────┘
↓
Monitoring & Logging

Example: deployment with BentoML
import bentoml
from sklearn.ensemble import RandomForestClassifier

# Save the trained model to the BentoML model store
model = RandomForestClassifier()
model.fit(X_train, y_train)
bentoml.sklearn.save_model("fraud_detector", model)

# Define the service
@bentoml.service
class FraudDetectionService:
    model_ref = bentoml.models.get("fraud_detector:latest")

    def __init__(self):
        self.model = bentoml.sklearn.load_model(self.model_ref)

    @bentoml.api
    def predict(self, features: dict) -> dict:
        prediction = self.model.predict([list(features.values())])
        return {"fraud_probability": float(prediction[0])}

Deploy (assuming the service class above lives in service.py):

bentoml serve service:FraudDetectionService

Combined with bentoml containerize, this yields a containerized, versioned model service ready for production.
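A quick way to smoke-test the running service is BentoML's HTTP client; a minimal sketch (the feature names and values are the illustrative ones from the feature store example):

import bentoml

# Call the running service; predict() maps to the @bentoml.api method above
client = bentoml.SyncHTTPClient("http://localhost:3000")
result = client.predict(
    features={
        "total_purchases": 12.0,
        "avg_order_value": 54.3,
        "days_since_signup": 90,
    }
)
print(result)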
Models degrade in production. Monitoring detects issues before they affect users.
What to Monitor:
┌─────────────────────────────────────────┐
│ Model Performance Metrics               │
├─────────────────────────────────────────┤
│ • Accuracy, Precision, Recall           │
│ • Latency, Throughput                   │
│ • Error rate                            │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Data Drift Detection                    │
├─────────────────────────────────────────┤
│ • Feature distribution changes          │
│ • Prediction distribution changes       │
│ • Outlier detection                     │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ System Metrics                          │
├─────────────────────────────────────────┤
│ • CPU, Memory, Disk usage               │
│ • Request latency, error rate           │
│ • Model serving availability            │
└─────────────────────────────────────────┘

Example: Data Drift Detection
from scipy.stats import ks_2samp

def detect_drift(reference_data, current_data, threshold=0.05):
    """Detect whether current data has drifted from the reference data."""
    for feature in reference_data.columns:
        # Kolmogorov-Smirnov test compares the two distributions
        statistic, p_value = ks_2samp(
            reference_data[feature],
            current_data[feature],
        )
        if p_value < threshold:
            print(f"DRIFT DETECTED: {feature} (p-value: {p_value})")
            return True
    return False
# Monitor in production
reference = load_training_data()
current = load_recent_predictions()

if detect_drift(reference, current):
    # Trigger retraining
    trigger_retraining_pipeline()

Models degrade over time. Retraining should be automated and triggered by data drift or performance degradation.
Retraining Pipeline:
Monitor Model Performance
↓
Detect Drift or Degradation
↓
Trigger Retraining
↓
Train New Model
↓
Validate New Model
↓
A/B Test (Optional)
↓
Deploy or Rollback

Example: an automated retraining trigger
import schedule
import time

def check_model_health():
    """Check whether the model needs retraining."""
    # Get the model's current performance
    current_accuracy = evaluate_model_on_recent_data()
    baseline_accuracy = 0.90

    # Check for drift
    has_drift = detect_data_drift()

    # Trigger retraining when accuracy falls more than 5% below the
    # baseline, or when drift is detected
    if current_accuracy < baseline_accuracy * 0.95 or has_drift:
        print("Triggering retraining...")
        trigger_retraining_job()
        return True
    return False

# Schedule a daily check
schedule.every().day.at("02:00").do(check_model_health)

while True:
    schedule.run_pending()
    time.sleep(60)

Here is the complete MLOps workflow:
1. Data Preparation
   ├── Collect raw data
   ├── Version it with DVC
   └── Validate schema & quality
2. Feature Engineering
   ├── Compute features
   ├── Store them in the Feature Store
   └── Version the feature definitions
3. Model Training
   ├── Load features from the Feature Store
   ├── Train the model with MLflow
   ├── Log parameters, metrics, artifacts
   └── Register the model in the Model Registry
4. Model Validation
   ├── Performance tests
   ├── Fairness checks
   ├── Latency tests
   └── Approve for deployment
5. Deployment
   ├── Build the container image
   ├── Deploy to staging
   ├── Run smoke tests
   ├── Deploy to production
   └── Monitor health
6. Monitoring
   ├── Track predictions
   ├── Detect data drift
   ├── Monitor performance metrics
   └── Alert on anomalies
7. Retraining
   ├── Detect drift or degradation
   ├── Trigger retraining pipeline
   ├── Validate the new model
   └── Deploy or rollback

The problem: You can't reproduce past results or debug issues.
Why it happens: Teams focus on versioning code and forget about the data.
How to avoid it: Version data alongside code (for example with DVC) so every model can be traced back to the exact dataset that produced it.
The problem: The model performs well in training but poorly in production.
Why it happens: Features are computed differently in training than in serving.
How to avoid it: Use a feature store so training and serving read the same features, computed by the same code.
The problem: A bad model gets deployed to production.
Why it happens: Rushing to deploy without thorough testing.
How to avoid it: Gate every deployment behind automated validation checks (performance, fairness, stability, latency), as in the validate_model example above.
The problem: Model performance degrades silently.
Why it happens: There is no monitoring or drift detection.
How to avoid it: Monitor performance metrics and prediction distributions in production, and alert on data drift.
The problem: The model goes stale and performance degrades.
Why it happens: Retraining is manual and infrequent.
How to avoid it: Automate retraining and trigger it on drift detection or performance degradation.
The problem: You can't recreate past results or debug issues.
Why it happens: Random seeds are not set and dependencies are not pinned.
How to avoid it: Set seeds everywhere, pin dependency versions, and version the training environment, as in the sketch below.
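A minimal seed-pinning sketch (framework-specific seeds, for example torch.manual_seed, would be added as needed):

import os
import random

import numpy as np

# One place to pin every relevant seed so training runs are repeatable
def set_global_seed(seed: int = 42) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)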
ML Code + Data + Config → Reproducible Model
Version everything: code, data, hyperparameters, environment.

Data Pipeline → Training → Validation → Deployment → Monitoring → Retraining
Automate the whole pipeline; manual steps are error-prone and don't scale.

Model Performance + Data Drift + System Metrics → Alerts
Catch issues before they affect users.
Unit Tests → Integration Tests → Validation Tests → A/B Tests
Each layer catches different issues; see the CI sketch below.
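For example, the validation checks can run as a gate in CI; a minimal pytest sketch (load_candidate_model and load_holdout_data are hypothetical helpers; validate_model is the check from the validation section above):

# tests/test_model.py -- a validation gate that must pass before deployment
def test_candidate_model_passes_validation():
    model = load_candidate_model()    # hypothetical helper
    X_test, y_test = load_holdout_data()  # hypothetical helper
    assert validate_model(model, X_test, y_test)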
Why this model? Why these features? Why this threshold?
Document the answers; your future self will thank your present self.

Canary Deployment → A/B Tests → Rollback Strategy
Assume something will go wrong and have a plan; a minimal canary-routing sketch follows.
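As one sketch of the canary stage (stable_model and canary_model are hypothetical, already-loaded models; the 5% split is arbitrary):

import random

# Send a small, fixed share of traffic to the candidate model and the
# rest to the stable one; widen the share as confidence grows
CANARY_FRACTION = 0.05

def route_prediction(features, stable_model, canary_model):
    model = canary_model if random.random() < CANARY_FRACTION else stable_model
    return model.predict([features])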
Start simple: version code and data (Git + DVC) and track experiments with MLflow.
Add complexity gradually: a model registry, automated validation, then a feature store.
Full MLOps: monitoring, drift detection, and automated retraining.
Don't over-engineer at the start. Begin with a minimum viable MLOps setup and evolve it as your system grows.
MLOps is about bringing software engineering discipline to machine learning. It is not just about deploying models; it is about building reliable, maintainable, and scalable ML systems.
Key components: versioned data pipelines, a feature store, tracked and registered models, automated validation, reproducible deployment, monitoring with drift detection, and automated retraining.
Implementing MLOps requires upfront investment, but it pays dividends in reliability, maintainability, and team velocity. Start with a pilot project, establish best practices, and scale gradually.
The difference between a model that works and a model that works reliably in production is MLOps.