Sotheby's Auction Project
Project type
End to End Data Science / Data Engineering Project
GitHub
Project Overview
The project has six stages:
1) A web scraper written in Python to collect the data from the site.
2) An Amazon Machine Image (AMI) used to launch instances that run the script.
3) A fleet of 256 AWS EC2 instances that run the script in parallel across the site's many pages to speed up scraping.
4) An AWS S3 bucket to store the raw data.
5) An Apache Spark script, run on AWS Glue, that performs ETL on the raw data to produce a clean, enriched version.
6) A data lake in AWS S3 with a Glue Data Catalog, queried with Athena over the cleaned data.
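As a rough illustration of stage 3, here is a minimal sketch of one way the pages could be divided among the 256 instances. It is not the project's actual code: the function name `shard_pages`, the round-robin scheme, and the example URLs are all assumptions made for illustration.

```python
from typing import List, Sequence


def shard_pages(page_urls: Sequence[str], num_workers: int, worker_index: int) -> List[str]:
    """Return the pages that one worker should scrape.

    Round-robin assignment: worker k takes pages k, k + n, k + 2n, ...
    so every page goes to exactly one worker and shard sizes differ by at most one.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return list(page_urls[worker_index::num_workers])


if __name__ == "__main__":
    # Hypothetical page list; the real URLs are not part of this write-up.
    pages = [f"https://example.com/auctions?page={i}" for i in range(1, 1001)]
    my_pages = shard_pages(pages, num_workers=256, worker_index=0)
    print(len(my_pages), my_pages[:2])
```

Each instance would only need to know its own index (for example, passed in at launch) to work out its share of the pages without any coordination between instances.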
To Do
Perform EDA on the data to learn more about it
Create and manage ML models on Amazon SageMaker to predict auction prices
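A first EDA pass could start with simple descriptive statistics on sale prices. The sketch below uses only the Python standard library; the function name `summarize_prices` and the sample values are made-up placeholders, not results from the scraped data.

```python
import statistics
from typing import Dict, List


def summarize_prices(prices: List[float]) -> Dict[str, float]:
    """Return basic descriptive statistics for a list of sale prices."""
    if not prices:
        raise ValueError("prices must not be empty")
    return {
        "count": len(prices),
        "min": min(prices),
        "median": statistics.median(prices),
        "mean": statistics.fmean(prices),
        "max": max(prices),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not real auction results.
    sample_prices = [1200.0, 4500.0, 800.0, 150000.0, 2300.0]
    for name, value in summarize_prices(sample_prices).items():
        print(f"{name}: {value}")
```

Comparing the median with the mean is a quick check for skew: a single very expensive lot pulls the mean up far more than the median, which matters when choosing a target transformation for a price model.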
I scraped data from the Sotheby's auction site and performed ETL to load it into a data lakehouse on AWS.
The next stages of this project are exploratory data analysis, followed by training machine learning models on Amazon SageMaker to predict the prices of future auctions.

