Overview
This project showcases a serverless, machine learning-powered cybersecurity solution deployed entirely on AWS. It detects anomalous network activity, such as DDoS attacks, unauthorized access, and phishing attempts, using a trained model hosted on SageMaker. The system automates data ingestion, preprocessing, model training, deployment, and real-time inference. It is secured with IAM policies, monitored via CloudWatch, and designed for scalability and automation.
Data Storage and Management
Amazon S3
- Stores raw network traffic logs collected from various sources.
- Holds preprocessed datasets and extracted features used for training.
- Archives model artifacts including training outputs and serialized models.
- Bucket access is tightly controlled via IAM, and the bucket serves as the input/output location for SageMaker and Lambda.
Machine Learning Pipeline
Amazon SageMaker
- Trains an XGBoost model to classify network activity as normal or malicious.
- Deploys the trained model as a real-time inference endpoint.
- Automates the ML lifecycle with SageMaker Pipelines:
  - Data preprocessing
  - Feature engineering
  - Model training
  - Evaluation
  - Deployment
- Endpoint is integrated with Lambda for real-time threat detection.
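The Lambda-to-endpoint integration described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the endpoint accepts one headerless CSV row and returns a single probability as text, which is how the built-in XGBoost container typically behaves with a binary logistic objective. The function names, the 0.5 threshold, and the `StubRuntime` class are hypothetical. In a real Lambda, the client would be `boto3.client("sagemaker-runtime")`.

```python
import io


def to_csv_row(features, feature_order):
    """Serialize features as one headerless CSV row in a fixed column order."""
    return ",".join(str(float(features[name])) for name in feature_order)


def score_features(runtime_client, endpoint_name, features, feature_order,
                   threshold=0.5):
    """Invoke the endpoint and turn the returned probability into a verdict.

    Assumes the response body is a single number such as b"0.97".
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_row(features, feature_order),
    )
    score = float(response["Body"].read().decode("utf-8").strip())
    return {"score": score, "malicious": score >= threshold}


class StubRuntime:
    """Local stand-in for the SageMaker runtime client (dry runs only)."""

    def __init__(self, body):
        self._body = body

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs  # keep the request for inspection
        return {"Body": io.BytesIO(self._body)}
```

Passing the client in as an argument keeps the scoring logic testable without AWS credentials. The column order must match the order used when the model was trained, which is why it is an explicit parameter rather than taken from dictionary order.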
Data Preprocessing and Feature Engineering
AWS Lambda
- Triggered when raw logs are uploaded to S3, and processes them.
- Extracts features such as IP entropy, packet size variance, and protocol usage.
- Outputs structured datasets back to S3 for SageMaker ingestion.
- Configured with IAM roles for secure access to S3 and SageMaker.
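As a rough sketch of the feature extraction listed above, the code below computes the three named features for one window of log records. The record field names (`src_ip`, `packet_size`, `protocol`), the choice of Shannon entropy over source IPs, and the fixed protocol list are assumptions made for illustration. The project's real log schema is not shown here.

```python
import math
from collections import Counter
from statistics import pvariance


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(records, protocols=("TCP", "UDP", "ICMP")):
    """Summarize one window of log records into a flat feature dict."""
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    protocol_counts = Counter(r["protocol"] for r in records)
    features = {
        # Low entropy: few distinct sources. High entropy: many sources.
        "src_ip_entropy": shannon_entropy(r["src_ip"] for r in records),
        # Population variance of packet sizes within the window.
        "packet_size_variance": pvariance([r["packet_size"] for r in records]),
    }
    # Protocol usage expressed as the share of records per protocol.
    for proto in protocols:
        features[f"proto_share_{proto.lower()}"] = protocol_counts[proto] / total
    return features
```

For example, a window with two source addresses seen equally often has an entropy of exactly 1 bit. A traffic burst from many distinct addresses would push that value up sharply.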
Monitoring and Logging
Amazon CloudWatch
- Captures logs from Lambda preprocessing and SageMaker inference.
- Tracks performance metrics such as latency, invocation count, and error rates.
- Extensible with CloudWatch Alarms to notify on anomalies or failures.
- Logs include flagged threats and prediction confidence scores.
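One way to produce the log entries and metrics described above is sketched below. The field names, metric names, and dimension are hypothetical. The only AWS facts relied on are that anything a Lambda function prints ends up in CloudWatch Logs, and that `put_metric_data` accepts a list of entries with `MetricName`, `Dimensions`, `Value`, and `Unit`.

```python
import json


def build_detection_log(source_ip, score, threshold=0.5):
    """Return one JSON log line recording the verdict and its confidence."""
    return json.dumps(
        {
            "source_ip": source_ip,
            "confidence": round(score, 4),
            "flagged": score >= threshold,
        },
        sort_keys=True,
    )


def build_metric_data(endpoint_name, latency_ms, flagged):
    """Build a MetricData list in the shape put_metric_data expects."""
    dimensions = [{"Name": "EndpointName", "Value": endpoint_name}]
    return [
        {
            "MetricName": "InferenceLatency",  # hypothetical metric name
            "Dimensions": dimensions,
            "Value": float(latency_ms),
            "Unit": "Milliseconds",
        },
        {
            "MetricName": "FlaggedThreats",  # hypothetical metric name
            "Dimensions": dimensions,
            "Value": 1.0 if flagged else 0.0,
            "Unit": "Count",
        },
    ]
```

Inside a Lambda handler, the log line would simply be printed, and the metric list would be passed as `MetricData` to `boto3.client("cloudwatch").put_metric_data(Namespace=..., MetricData=...)` under a namespace of your choosing. A CloudWatch Alarm on the flagged-threat count could then drive notifications.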
Security and Permissions
AWS IAM
- IAM roles scoped for least privilege:
  - Lambda: execution role with access to S3 and SageMaker.
  - SageMaker: role with access to training data and model artifacts.
- Policies enforce encryption at rest and in transit.
- All services interact securely via IAM-authenticated API calls.
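To make the least-privilege and encryption-in-transit points concrete, here is a sketch of what such policy documents might look like, built as Python dictionaries. The `raw/` and `features/` prefixes, the statement IDs, and the function names are assumptions. The action names (`s3:GetObject`, `s3:PutObject`, `sagemaker:InvokeEndpoint`) and the `aws:SecureTransport` condition key are standard AWS identifiers.

```python
import json


def lambda_execution_policy(bucket, endpoint_arn):
    """Identity policy for the Lambda role: only the actions it needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadRawLogs",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/raw/*",
            },
            {
                "Sid": "WriteFeatures",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/features/*",
            },
            {
                "Sid": "InvokeModel",
                "Effect": "Allow",
                "Action": ["sagemaker:InvokeEndpoint"],
                "Resource": endpoint_arn,
            },
        ],
    }


def tls_only_bucket_policy(bucket):
    """Bucket policy that denies any request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def to_policy_json(policy):
    """Render a policy dict as the JSON text that IAM and S3 accept."""
    return json.dumps(policy, indent=2)
```

Note that the second document is a resource policy attached to the bucket rather than to a role. Encryption at rest is usually configured on the bucket itself (default encryption), with policies used to reject requests that bypass it.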
Architecture Summary
- Storage – Amazon S3: Raw logs, preprocessed datasets, model artifacts; IO for Lambda/SageMaker.
- Compute – AWS Lambda: Automates preprocessing & feature extraction; orchestrates IO.
- Machine Learning – Amazon SageMaker: Trains & deploys model; manages lifecycle with Pipelines.
- Monitoring – Amazon CloudWatch: Logs/metrics; alarms for anomalies.
- Security – AWS IAM: Least-privilege access; encryption at rest/in transit.
Summary of Architecture Flow
- Raw logs are uploaded to S3.
- Lambda is triggered to preprocess data and extract features.
- Preprocessed data is stored back in S3 and fed into SageMaker for training.
- Trained model is deployed as a SageMaker endpoint.
- Inference requests are sent to the endpoint via Lambda or other triggers.
- CloudWatch logs all activity and monitors for anomalies.
- IAM roles enforce secure access across all services.
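The hand-off between the first three steps of this flow (upload, trigger, write-back) might look like the sketch below. It relies on the documented shape of S3 event notifications, where object keys arrive URL-encoded. The `raw/` and `features/` prefixes and the helper names are hypothetical.

```python
import posixpath
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Yield (bucket, key) pairs from an S3 event notification.

    Keys in S3 events are URL-encoded (spaces arrive as '+'), so they
    are decoded before use.
    """
    for record in event.get("Records", []):
        s3 = record["s3"]
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def feature_key_for(raw_key, raw_prefix="raw/", feature_prefix="features/"):
    """Map a raw-log key to the key where its feature CSV is written."""
    if not raw_key.startswith(raw_prefix):
        raise ValueError(f"unexpected key outside {raw_prefix!r}: {raw_key!r}")
    stem, _extension = posixpath.splitext(raw_key[len(raw_prefix):])
    return f"{feature_prefix}{stem}.csv"
```

Writing features under a different prefix than the one that triggers the function matters: if the output landed under the trigger prefix, each write would invoke the Lambda again and create a loop.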
Skills Demonstrated
- End-to-end ML pipeline automation with SageMaker Pipelines
- Feature engineering and preprocessing with Lambda
- Secure data handling with least-privilege IAM roles and encryption at rest and in transit
- Real-time inference with SageMaker endpoints and logging with CloudWatch
- Scalable, serverless architecture design
- Threat detection modeling with XGBoost