dltHub education: We've got 2 courses coming up soon, and a 3rd one in planning (dlt+ Fabric). If you would like to participate, sign up here: https://dlthub.com/events
dltHub
Software Development
Supporting a new generation of Python users when they create and use data in their organizations
About
Since 2017, the number of Python users has been increasing by millions annually. The vast majority of these people leverage Python as a tool to solve problems at work. Our mission is to make them autonomous when they create and use data in their organizations. To this end, we are building an open source Python library called data load tool (dlt). Our users run dlt in their Python scripts to turn messy, unstructured data into regularly updated datasets. It empowers them to create data pipelines that are highly scalable, easy to maintain, and straightforward to deploy, without having to wait for help from a data engineer. We are dedicated to keeping dlt an open source project surrounded by a vibrant, engaged community. To make this sustainable, dltHub stewards dlt while also offering additional software and services that generate revenue (similar to what GitHub does with Git). dltHub is based in Berlin and New York City. It was founded by data and machine learning veterans. We are backed by Dig Ventures and many technical founders from companies such as Hugging Face, Instana, Matillion, Miro, and Rasa.
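To give a concrete (hypothetical) flavour of that workflow, here is a minimal sketch of loading a nested Python record into a local DuckDB file with dlt; the pipeline, dataset, and table names are made up for illustration:

```python
import dlt

# Hypothetical messy record: nested dict plus a list, as it might come from an API.
data = [{"id": 1, "user": {"name": "Ada"}, "tags": ["a", "b"]}]

pipeline = dlt.pipeline(
    pipeline_name="quickstart",   # illustrative name
    destination="duckdb",         # loads into a local DuckDB file
    dataset_name="example",
)

# dlt infers the schema, normalizes the nested structures, and loads the tables.
load_info = pipeline.run(data, table_name="records")
print(load_info)
```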
- Website
- https://dlthub.com/
- Industry
- Software Development
- Company size
- 11–50 employees
- Headquarters
- Berlin
- Type
- Privately held
- Founded
- 2022
Locations
- Primary
Berlin, DE
Employees at dltHub
Updates
-
dltHub reposted this
Is data ingestion giving you trouble? It doesn't have to be that complicated. 🔧

Despite advances in data engineering, ingestion remains a major pain point for data teams. Why?
- UI-based tools lacking scalability for production.
- Open-source solutions creating messy, hard-to-maintain code.
- In-house solutions falling short on security, monitoring, and reliability.

💡 But there's a better way. Tools like dltHub and Prefect bring software engineering best practices to data ingestion, making scalable, code-first pipelines easier than ever.

Learn how:
- dlt defines connectors and pipelines as code.
- Prefect handles orchestration with automation and scheduling.
- Together, they enable robust, modular data pipelines.

👉 Read the full article and learn how to build robust, scalable data ingestion pipelines: https://lnkd.in/dXaBGTCY

#DataEngineering #ELT #DataPipelines #DataPlatform
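For readers who want to see what "pipelines as code plus orchestration" looks like in practice, here is a minimal, hedged sketch (not the code from the linked article): a made-up dlt resource loaded to DuckDB inside a Prefect task and flow. The names, destination, and retry settings are illustrative assumptions.

```python
import dlt
from prefect import flow, task

@dlt.resource(name="events", write_disposition="append")
def events():
    # Hypothetical inline data; a real connector would yield API pages here.
    yield [{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]

@task(retries=2)
def load_events() -> None:
    # dlt: the connector and pipeline are plain code.
    pipeline = dlt.pipeline(
        pipeline_name="events_pipeline",
        destination="duckdb",       # swap for your warehouse destination
        dataset_name="raw_events",
    )
    print(pipeline.run(events()))

@flow
def ingestion_flow() -> None:
    # Prefect: scheduling, retries, and observability around the load.
    load_events()

if __name__ == "__main__":
    ingestion_flow()
```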
-
What do Reddit users say about migrating to Iceberg?

Apache Iceberg is getting a lot of attention. People love its interoperability, freedom from vendor lock-in, and better scalability. But is it actually delivering on those promises, or is it just adding complexity? Here's what folks in the trenches are saying:

💭 "We're using it as half the business is Athena on a data lake, and the other half is Snowflake and dbt boys. So Iceberg allows the silos to meet in the middle somewhat." – wallyflops

💭 "We have hundreds of terabytes of event data and need to remove some lines due to GDPR. Having a ton of metadata (which Iceberg basically is) and tools like hidden partitions, z-ordering, etc., helps a lot." – data_grind

💭 "We are switching (eventually over one or two years) to Iceberg. The goal? Allow data to be queried from other compute engines—Trino and Snowflake primarily." – SupermarketMost7089

💭 "Switching to Iceberg can be useful, but it really depends on your data stack and use case. A media company switched to Iceberg to enable analytics teams to ingest data from various sources without being tied to one specific processing engine." – Signal-Indication859

Thinking about making the switch? Your company might want to check out dlt's Iceberg options: whether you're leaning towards Filesystem or Athena, dlt has you covered.

Or maybe Iceberg still feels like a bag of trade-offs? No worries—there's a whole Reddit thread debating it. Jump in, see what people are saying, and let's discuss. 👇

#ApacheIceberg #DataEngineering #BigData #DataLakes
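If you want to try the Filesystem route mentioned above, here is a minimal sketch based on dlt's Iceberg table format on the filesystem destination. The resource, dataset, and bucket configuration are illustrative assumptions; check the dlt docs for the required extras (e.g. pyiceberg) and current option names.

```python
import dlt

# Hypothetical resource; table_format="iceberg" asks the filesystem destination
# to write this table in Iceberg format (requires the relevant dlt extras).
@dlt.resource(table_format="iceberg", primary_key="id", write_disposition="append")
def users():
    yield [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",
    destination="filesystem",  # bucket_url is set in config.toml / secrets.toml
    dataset_name="lake",
)
print(pipeline.run(users()))
```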
-
Iceberg, dlt, SQLMesh, serverless, modelling automation. Do I need to say more? Check out this beautiful project! https://lnkd.in/e2B6bRDJ
🚀 Excited to share my serverless lakehouse implementation!

I've built a complete data platform that demonstrates how to implement a serverless lakehouse architecture combining the HOOK methodology with Unified Star Schema for analytics.

The project:
- Transforms AdventureWorks data through a clean, three-layer Analytical Data Storage System (ADSS) architecture (DAS→DAB→DAR)
- Uses the HOOK methodology for business alignment without complex ETL
- Implements a full Unified Star Schema with extended Puppini Bridge functionality
- Provides point-in-time analysis through intelligent temporal resolution
- Runs completely serverless using DuckDB, Iceberg, SQLMesh by Tobiko, dlt by dltHub, and Streamlit

This implementation demonstrates how we can achieve both technical excellence and business alignment in data modeling - without sacrificing either. The solution generates 200+ models programmatically via configuration, making it incredibly maintainable and extensible.

If you're interested in business-aligned data modeling or implementing the HOOK methodology in practice, check out the repository!

GitHub Repo: https://lnkd.in/dvS5-mwZ

#DataEngineering #HOOK #DataModeling #DataArchitecture #DataWarehouse #Lakehouse #DuckDB #Serverless #Iceberg #UnifiedStarSchema #AnalyticalDataStorageSystem #SQLMesh #dlt #Streamlit
-
How do you run dlt on Airflow? Check out this comprehensive guide from our consulting partners Untitled Data Company: https://lnkd.in/eVv4WRdc
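As a hedged companion to the guide, dlt ships an Airflow helper that decomposes a pipeline run into Airflow tasks. The sketch below follows that pattern with made-up names; the exact arguments and deployment details may differ from what the guide recommends.

```python
import dlt
import pendulum
from airflow.decorators import dag
from dlt.helpers.airflow_helper import PipelineTasksGroup

@dlt.resource(name="events")
def events():
    # Hypothetical data; a real DAG would pull from your source system.
    yield [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def load_events_dag():
    # Groups one dlt pipeline run (extract/normalize/load) into Airflow tasks.
    tasks = PipelineTasksGroup("events_pipeline", use_data_folder=False, wipe_local_data=True)

    pipeline = dlt.pipeline(
        pipeline_name="events_pipeline",
        destination="duckdb",      # swap for your warehouse destination
        dataset_name="events_data",
    )
    tasks.add_run(pipeline, events(), decompose="serialize", trigger_rule="all_done", retries=0)

load_events_dag()
```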
-
Managed cloud databases mark up compute by 35-70x. It was fine when humans ran queries. AI doesn't need coddling.

Iceberg lets you:
✅ Run queries anywhere (Trino, DuckDB, Athena—your choice)
✅ Cut insane compute costs
✅ Keep control over your data, not rent access to it

It's your budget. Stop setting it on fire. https://lnkd.in/efVyNpVt
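"Run queries anywhere" can be as small as pointing DuckDB at an Iceberg table. A hedged sketch follows; the table location is hypothetical, and reading from S3 additionally needs the httpfs extension plus credentials.

```python
import duckdb

con = duckdb.connect()
# DuckDB's iceberg extension provides iceberg_scan() for reading Iceberg tables.
con.execute("INSTALL iceberg; LOAD iceberg;")

# Hypothetical local table path; for s3:// paths also INSTALL/LOAD httpfs and
# configure credentials (e.g. via CREATE SECRET).
rows = con.execute(
    "SELECT count(*) AS events FROM iceberg_scan('warehouse/lake/events')"
).fetchall()
print(rows)
```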
-
Make sure to check out the dlt-SQLMesh integration that scaffolds your project and takes you from loading to transforming in one CLI command. Check out this gem! https://lnkd.in/e9qX_Mjt
Wow, it's amazing what we've achieved in SQLMesh in the last two quarters! We're at our Tobiko biannual offsite and Iaroslav Zeigerman is going through what our core team has accomplished. Amazing features like true multi-engine support, linting, blueprinting, ClickHouse and Athena support, Snowflake dynamic tables, and dltHub integration! #SQLMesh is moving ahead at lightning speed. Can't wait to see where we'll be at the next offsite!!
-
dltHub reposted this
Migrate your data stack without breaking it. Iceberg is AI-ready, composable, and modular - just start using it! Read more: https://lnkd.in/e_EETaKV
-
🔥 Why pay cloud overhead? Test pipelines locally with BYOC.

In data engineering, we often default to spinning up cloud instances, but there isn't always a need. Your local machine is ready to handle more than you might expect. BYOC (Bring Your Own Compute) lets you run workloads on your own machine instead of relying on cloud servers.

How BYOC can help:
🔹 Local development: running workloads on your own hardware isn't just about saving costs - it's about transforming your development workflow. Think faster iterations, immediate feedback, and zero waiting time for cloud resources to spin up.
🔹 Zero-cost iterations: test, fail, adjust, repeat - all without touching your cloud budget. It's like having an infinite sandbox for your data experiments.

The workflow is beautifully simple (see the sketch below):
1. Build your pipeline locally with dlt
2. Test, query, and analyze with dlt Cache
3. Push to production only when everything's perfect

Why this matters:
- Faster development cycles
- Complete control over your compute
- No surprise cloud bills
- Instant feedback loops

What's particularly exciting is how this fits into modern data practices. No more hoping your code works in production - you've already proven it locally.

Have you experimented with BYOC? Are you interested in trying it? Discuss below!
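Here is a minimal local-first sketch of steps 1-2 above, assuming DuckDB as the zero-cost local destination; "dlt Cache" from the post is a dltHub product and is not shown here, and all names are illustrative.

```python
import dlt

@dlt.resource
def orders():
    # Hypothetical sample rows standing in for a real source.
    yield [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 13.5}]

# Step 1: build and run the pipeline entirely on your machine.
pipeline = dlt.pipeline(
    pipeline_name="local_dev",
    destination="duckdb",   # local file, no cloud warehouse involved
    dataset_name="staging",
)
pipeline.run(orders())

# Step 2: test and query the loaded data immediately, no cloud round-trip.
with pipeline.sql_client() as client:
    with client.execute_query("SELECT count(*) FROM orders") as cursor:
        print(cursor.fetchall())
```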
-
Try the dltHub AI code assistant!
dltHub built a custom AI code assistant that helps you create and maintain dlt pipelines. We are particularly excited about their MCP server block, which brings in context about your pipeline runs, database tables, and more. Check out their blog post: https://lnkd.in/gYCVRqx6