Python | SQL | Docker | Google Cloud Storage | BigQuery | Airflow | Spark | Cloud Dataflow | Cloud Bigtable
A data pipeline, in this case, is a set of processes used to collect, process, and store data. By gathering data from a variety of sources and storing it in a central location, the pipeline makes the data easier to access and analyze. This can help TransJakarta make better business decisions by giving it a clearer understanding of its customers, operations, and markets.
Use Docker to create a PostgreSQL container, with pgAdmin to connect to and manage the database.
Use Python in a Jupyter Notebook to ingest the .csv data into the PostgreSQL database, as sketched below.
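A minimal ingestion sketch, assuming the PostgreSQL container from the previous step listens on localhost:5432 with user/password root/root, a database named transjakarta, and a source file data/transjakarta_trips.csv; all of these names are placeholders to be replaced with the actual setup:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection details are assumptions: they must match the environment
# variables used when starting the PostgreSQL container with Docker.
engine = create_engine("postgresql://root:root@localhost:5432/transjakarta")

# Read the CSV in chunks so a large file does not have to fit in memory.
chunks = pd.read_csv("data/transjakarta_trips.csv", chunksize=100_000)

for i, chunk in enumerate(chunks):
    # The first chunk replaces any existing table, later chunks are appended.
    chunk.to_sql(
        "trips",
        con=engine,
        if_exists="replace" if i == 0 else "append",
        index=False,
    )
    print(f"Loaded chunk {i} ({len(chunk)} rows)")
```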
Ingest the data into Google Cloud Storage with Airflow, for example with a DAG like the sketch below.
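A minimal Airflow DAG sketch for this step; the bucket name, file paths, DAG id, and schedule are placeholders, not the project's actual values:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.local_to_gcs import (
    LocalFilesystemToGCSOperator,
)

# Bucket, paths, and schedule are assumptions for illustration only.
with DAG(
    dag_id="transjakarta_csv_to_gcs",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    upload_csv = LocalFilesystemToGCSOperator(
        task_id="upload_csv_to_gcs",
        src="/opt/airflow/data/transjakarta_trips.csv",
        dst="raw/transjakarta_trips.csv",
        bucket="transjakarta-datalake",
    )
```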
Import an official Drone Racing League (DRL) dataset with race statistics from 2016-2022
Run queries to explore and understand relationships within the DRL dataset
Analyze different types of race statistics to answer questions like "What was the fastest time in a certain event?" or "How did a pilot's performance improve over time?" (a sample query is sketched below)
Build out a Data Studio report to visualize pilot performance
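A sketch of this kind of exploration using the BigQuery Python client; the project, dataset, table, and column names (drl.races, event_name, pilot_name, race_time_seconds, event_year) are assumptions that would need to be replaced with the actual schema of the imported DRL dataset:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses the default GCP project and credentials

# Table and column names are placeholders; swap them for the real
# DRL dataset schema after importing it into BigQuery.
query = """
    SELECT
        event_name,
        pilot_name,
        MIN(race_time_seconds) AS fastest_time
    FROM `my-project.drl.races`
    WHERE event_year BETWEEN 2016 AND 2022
    GROUP BY event_name, pilot_name
    ORDER BY fastest_time
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.event_name, row.pilot_name, row.fastest_time)
```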
Cloud Dataflow for a scalable data-ingestion system that can handle late data (see the streaming sketch below)
Cloud Bigtable as a scalable, low-latency time-series database
TensorFlow for a scalable ML pipeline
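A rough Apache Beam sketch of how the streaming part could fit together when run on Dataflow: events are grouped into fixed windows with an allowed-lateness trigger so late data still updates the results, and the per-window aggregates are written to Bigtable as time-series rows. The Pub/Sub topic, project, Bigtable instance and table, column family, and row-key layout are all assumptions:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow


class AddWindowEnd(beam.DoFn):
    """Attach the window end time to each aggregate for use in the row key."""

    def process(self, kv, win=beam.DoFn.WindowParam):
        yield kv, win.end.to_utc_datetime().isoformat()


def to_bigtable_row(element):
    """Turn a ((vehicle_id, count), window_end) aggregate into a DirectRow.

    The row key pattern vehicle_id#window_end and the 'stats' column
    family are assumptions about the Bigtable table layout.
    """
    (vehicle_id, count), window_end = element
    row = DirectRow(row_key=f"{vehicle_id}#{window_end}".encode())
    row.set_cell("stats", b"trip_count", str(count).encode())
    return row


def run():
    # On Dataflow, also pass --runner=DataflowRunner, --project,
    # --region, and --temp_location as pipeline options.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/trips")
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode()))
            | "KeyByVehicle" >> beam.Map(lambda e: (e["vehicle_id"], 1))
            # Fixed 1-minute windows; re-fire for events up to 10 minutes late.
            | "Window" >> beam.WindowInto(
                window.FixedWindows(60),
                trigger=trigger.AfterWatermark(late=trigger.AfterCount(1)),
                accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
                allowed_lateness=600,
            )
            | "CountPerVehicle" >> beam.CombinePerKey(sum)
            | "StampWindow" >> beam.ParDo(AddWindowEnd())
            | "ToRows" >> beam.Map(to_bigtable_row)
            | "WriteToBigtable" >> WriteToBigTable(
                project_id="my-project",
                instance_id="transjakarta-bt",
                table_id="vehicle_stats",
            )
        )


if __name__ == "__main__":
    run()
```

Keying each row by entity id plus window end keeps every vehicle's measurements contiguous in Bigtable, which is a common time-series key layout for low-latency range scans.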