Capstone Project
DataTribe Collective Data Engineering Capstone Project
Welcome to the DataTribe Collective Data Engineering learnings repository!
This is a shared space for engineers to learn, build, and showcase their data engineering projects.
You can find the project repository here.
How to Contribute
- Fork the repository
- Create a folder named after your GitHub username or full name.
- Add your project inside your folder. The tree structure will look like this:
your-name/
├── README.md ← Describe your project and solution idea clearly
├── dags/
├── scripts/
├── data/
└── notebooks/
- Commit and push your changes.
- Open a Pull Request to the main branch.
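The folder layout above can be created by hand, but a small script keeps it consistent. The following is a minimal sketch, not part of the repository: the function name scaffold_project, the placeholder README text, and the .gitkeep files (a common convention so that Git keeps otherwise empty directories) are all assumptions made for illustration.

```python
from pathlib import Path

# Subfolders taken from the tree shown above.
SUBFOLDERS = ("dags", "scripts", "data", "notebooks")


def scaffold_project(repo_root: str, folder_name: str) -> Path:
    """Create <repo_root>/<folder_name>/ with the suggested layout.

    Hypothetical helper: the name and behavior are illustrative only.
    """
    project_dir = Path(repo_root) / folder_name
    for sub in SUBFOLDERS:
        sub_dir = project_dir / sub
        sub_dir.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add a placeholder file.
        (sub_dir / ".gitkeep").touch()

    readme = project_dir / "README.md"
    if not readme.exists():
        # Placeholder text only; replace it with your real description.
        readme.write_text(
            f"# {folder_name}\n\nDescribe your project and solution idea here.\n",
            encoding="utf-8",
        )
    return project_dir
```

Calling scaffold_project(".", "your-name") from the root of your fork would create the folder in place; existing files are left untouched, so it is safe to run more than once.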
Problem Idea and Data Sources
- Choose a problem idea that interests you. It may include data ingestion, transformation, and/or visualization.
- Open datasets can be found here.
Guidelines
- Follow the structure above to keep things organized.
- Include a clear README.md inside your folder.
- Use the GCP-related tools you have learned, or any other tools you are comfortable with. There are no strict rules.
- Keep all your code and data inside your folder only.
- Projects will be reviewed by DataTribe before merging.
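Two of these guidelines (a README.md in your folder, and nothing outside your folder) are easy to check before opening a Pull Request. The sketch below is one possible self-check, not a tool the reviewers are known to use. The function name check_contribution is hypothetical, and it assumes you pass it every file path your contribution adds, written relative to the repository root with forward slashes.

```python
from pathlib import PurePosixPath


def check_contribution(changed_paths, folder_name):
    """Return a list of problems found in a contribution's file paths.

    Hypothetical helper: assumes changed_paths lists every file the
    contribution adds, relative to the repository root.
    """
    problems = []
    all_parts = [PurePosixPath(p).parts for p in changed_paths]

    # Every file must live under the contributor's own folder.
    for path, parts in zip(changed_paths, all_parts):
        if not parts or parts[0] != folder_name:
            problems.append(f"outside {folder_name}/: {path}")

    # The folder must contain a top-level README.md.
    if (folder_name, "README.md") not in all_parts:
        problems.append(f"missing {folder_name}/README.md")

    return problems
```

An empty list means both checks passed. For example, check_contribution(["your-name/README.md", "other/file.py"], "your-name") reports other/file.py as lying outside your folder.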
Extras
- Try to utilize GitHub Actions to automate the build process
- Use a Makefile, Poetry, or requirements.txt as needed. It’s good practice to include a Makefile for easy setup and execution of your project.
Communication
- Join the DataTribe Collective Discord server to connect with fellow learners, ask questions, and share your progress in the #data-learning-path channel.
- If you are still figuring out how to get started, feel free to reach out. People can form groups and commit to working together, but try to work individually as much as possible.
- Datasets can be huge, so start with a smaller dataset and then scale up.
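One simple way to start small is to cut a sample from a large file before building the pipeline around it. The sketch below assumes a UTF-8 CSV file with a header row. The function name write_sample and its default of 1000 rows are illustrative choices, not project requirements. It keeps only the first rows, which may not be representative of the full dataset, so a random sample may suit some projects better.

```python
import csv
from itertools import islice


def write_sample(src_path, dst_path, n_rows=1000):
    """Copy the header and the first n_rows data rows of a CSV file.

    Hypothetical helper: assumes a UTF-8 CSV with a header row.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)

        count = 0
        # islice stops after n_rows without reading the rest of the file.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
        return count
```

Because the file is read row by row, memory use stays small even when the source file is very large.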
Community Goals
- Share practical, real-world data engineering workflows
- Learn from each other's approaches and stacks
- Encourage open discussion and feedback