Joe Reis
Salt Lake City, Utah, United States
82K followers
500+ connections
About
Best Selling Co-author of Fundamentals of Data Engineering (O'Reilly 2022), Data Engineer…
Other similar profiles
-
Usama Fayyad
San Francisco Bay Area -
Sathish Rajamani
New York City Metropolitan Area -
Rob Marano
New York, NY -
Brad Feldman
Los Angeles Metropolitan Area -
Hanns-Christian Hanebeck
Supply Chain | Innovation | Next-Gen Visibility | Blockchain | AI & Optimization | Strategy
Dallas, TX -
Dr. Ricky A. Gallaway, PhD, MBA, PMP, CSM, ITILv3
Managing Director @ TCC | International Business Strategist | Leadership Coach | Innovative Entrepreneurial Development: The New H.E.I.R-A of Mother Africa - Shaping Visionaries in Key Sectors | Author | ΚΑΨ
United States -
Paul Rauner
Risk and Emergency Management Professional
St Louis, MO -
Simone Di Somma
Italy -
Gabriel Orthous
Roswell, GA -
Arif Ansari
Fontana, CA -
Atif Ghauri
New York City Metropolitan Area -
Judah Phillips
Greater Boston -
Aspen Olmsted
Melrose, MA -
Brent Dykes
Author of Effective Data Storytelling | Founder + Chief Data Storyteller at AnalyticsHero, LLC | Forbes Contributor
Lehi, UT -
Dr. Ernesto Lee
Fort Lauderdale, FL -
Mark DeSantis
Pittsburgh, PA -
Tisson Mathew
Lake Oswego, OR -
Tom Chappelear
Omaha, NE -
Ike Nassi
Los Gatos, CA -
Charles Miglietti
Greater Boston
Explore more posts
-
Erik Widman, Ph.D.
Whoever claims building RAG systems is easy has never built one at scale. 🤯 There is a significant gap in the literature and existing frameworks. RAG architectures are designed for POCs, not for hundreds of thousands of documents or supporting tens of thousands of users. Designing these systems requires a cross-functional team with software engineering and data science skills blended with a product mindset. Earlier this summer, I spoke at the Databricks AI Summit, highlighting guiding principles we are taking at CVS Health for building enterprise-grade RAG systems. Every company is unique, and while there is no one-size-fits-all solution, here are three principles that will help you on your journey:
👉 Spend time on discovery: It's tempting to dive in and start building immediately, but you may get lost in the desert without doing your due diligence and understanding what delivers the most ROI. Our team spent approximately a quarter on discovery.
👉 Design for modularity: The LLM space is changing rapidly, and you want to build reusable components that can easily be swapped out or upgraded when something better comes along.
👉 Humans + Machines: Technology alone cannot solve the knowledge search problem. Develop processes and governance to help teams curate the data that will be ingested into your system.
These lessons are just the tip of the iceberg. Watch my full Databricks presentation and leave your comments and questions below to learn more tips and tricks. #RAGSystems #EnterpriseAI #DataEngineering #MachineLearning #AIArchitecture #AI #DataScience #LLM #SoftwareEngineering #AIAtScale #KnowledgeSearch #TechLeadership #AIFrameworks #AIInnovation #DatabricksAISummit #ProductMindset #CVSHealth
62
5 Comments -
Samantha Magnus
🛑 ❗ Leaving technical names in your dashboard smacks of inconsideration for your audience. 👀 I’ve seen both inexperienced and more seasoned dashboard developers guilty of this underscore-toting offence and it makes me cringe every time! 😬 You can’t expect your business users to correctly guess what each graph is showing from alphanumeric-and-underscore soup. 🍜 Even for fairly straightforward terms like sales_num, the mature dashboard designer will ensure it is changed to a properly capitalized and spaced business label before it ever rests on business users' eyes! 👩💼 👨💼 Check out my article for more info 👇
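A minimal sketch of the renaming step described above, assuming column names are underscore-separated. The `business_label` helper and the `LABEL_OVERRIDES` table are hypothetical names introduced only for illustration; they do not come from the post or from any dashboard tool.

```python
import re

# Hypothetical abbreviation table: plain title-casing would leave "num" as "Num".
LABEL_OVERRIDES = {"num": "Number", "qty": "Quantity", "amt": "Amount"}


def business_label(column_name: str) -> str:
    """Turn a technical column name into a capitalized, spaced label."""
    # Split on underscores or whitespace and drop empty fragments.
    words = [w for w in re.split(r"[_\s]+", column_name.strip()) if w]
    # Expand known abbreviations; capitalize everything else.
    return " ".join(LABEL_OVERRIDES.get(w.lower(), w.capitalize()) for w in words)


if __name__ == "__main__":
    for name in ("sales_num", "total_order_qty"):
        print(f"{name} -> {business_label(name)}")
```

An override table like this would likely still need review by someone who knows the business vocabulary, since automatic capitalization cannot guess domain terms.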
11
1 Comment -
Katharine McKee
Amazon launching a “discount” level of the platform for goods under $20 that will ship directly from factories in China and arrive in 9-11 days is an interesting sign of the times. Clearly demand is there when you look at the value of Temu and Shein but more importantly this is an interesting data model that Amazon hasn’t explored before. I’m very curious to see how this shakes out long term. Would love to hear your thoughts. #amazon #temu #shein #discount
11
9 Comments -
Sateesh Pabbathi
If your team includes Spark experts or data scientists, Databricks might be your go-to for all transformations. ADF can remain the orchestrator for simpler tasks. If your transformations are extremely complicated, Databricks’ advanced Spark tuning is beneficial. Question: How comfortable is your team with Spark internals (shuffle, partition, caching)?
3
-
Nithin Ramachandran
The thirst for data in all organizations is insatiable. While everyone argues over which LLM is the best, pragmatic data leaders are, and should be, concerned about how to serve data and semantic intelligence to these thirsty models. It’s time to talk about platform interoperability. That is the surest answer to the predicament every CDAO finds themselves in. It’s a topic that does not get as much of a voice as it should.
46
6 Comments -
Fabric Developer
Finally Michael John Peña and David Ding get to talk about #RealtimeAnalytics in #Fabric. Out of all the amazing features of the platform, this might be the biggest differentiator. The simplicity of the full integration is truly inspiring. https://lnkd.in/gX8nU3Pv
-
Dr. Filip Floegel
🧩 Reading [Google Research's latest article](https://lnkd.in/ejk3_wMH) on understanding relationships between datasets got me thinking about the bigger picture in data management. This research brings out the importance of uncovering and automating complex relationships within raw data—a task that’s essential but often challenging. ✨ Palantir’s Ontology approach came to mind here as a powerful example of putting these ideas into action. By creating a digital twin of business entities and their relationships, Palantir enables companies to interact with data at a higher, business-relevant level. This abstraction layer transforms raw data into meaningful insights. ⚙️ To bring further structure, tools like Featuretools, Woodwork, Compose, and EvalML offer an open-source framework set that standardizes features and prepares data for machine learning. With **Featuretools** for automated feature engineering and **Woodwork** for consistent data typing, this toolkit enables the creation of reusable data products right from raw schemas and tables. 🌐 Imagine combining Google’s research-driven insights, Palantir’s digital twin framework, and the Featuretools ecosystem. Together, they could enable a truly dynamic data ecosystem—one where relationships are not just mapped but also standardized, enriched, and primed for decision-making. #DataManagement #Ontology #FeatureEngineering #GoogleResearch #Palantir #MachineLearning #OpenSource
1
-
Sateesh Pabbathi
ADF charges per pipeline activity/data movement, while Databricks charges for cluster compute hours (plus DBU costs). If your transformations are minimal but you do lots of data copying, ADF might be cheaper. If you run big transformations, Databricks may be more cost-effective or faster at scale. Question: Do you see yourself spinning up large ephemeral clusters, or do you have continuous tasks?
2
-
Sateesh Pabbathi
Schema Evolution: Why It Matters: Data never stands still: new fields emerge, old ones become obsolete. Formats like Avro and Parquet handle schema changes gracefully. Without proper schema evolution, data lakes turn messy, making downstream analytics unreliable. Future-proofing your storage strategy means planning for change! Question: How do you handle schema changes in your data pipelines?
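As a rough illustration of what handling a schema change can look like, here is a simplified sketch in plain Python. It does not use the Avro or Parquet libraries; it only mimics the general idea of reading an older record against a newer schema by filling defaults for added fields and ignoring removed ones. `READER_SCHEMA`, `resolve`, and the field names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical reader schema: field name -> default applied when a record lacks the field.
READER_SCHEMA: Dict[str, Any] = {
    "order_id": None,
    "amount": 0.0,
    "currency": "USD",  # assumed to be a newer field; older records do not carry it
}


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record onto the reader schema.

    Fields missing from the record take the schema default; fields the
    schema no longer lists (obsolete ones) are dropped.
    """
    return {name: record.get(name, default) for name, default in reader_schema.items()}


if __name__ == "__main__":
    # A record written before "currency" existed and while "legacy_flag" was still in use.
    old_record = {"order_id": 1, "amount": 9.5, "legacy_flag": True}
    print(resolve(old_record, READER_SCHEMA))
```

Giving every newly added field a default is what lets older data stay readable in this sketch; a field added without a default would have nothing to fall back on.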
-
Amara Moosa
Want to ditch data analysis confusion & become a rockstar? Our latest post defines reproducibility & shows you how to document your work for clear, repeatable results. https://wix.to/kEpcU5r #newblogpost #DataScience #Reproducibility #dataanalystskills #analytics
2
1 Comment -
Craig Norton
OPTIMAL PERFORMANCE INDEX SYSTEM (OPIS)
Two shoes from the same brand and line, the Brooks Ghost 13 and the Ghost 16, are not the same shoe. I wonder if your specialty running store knows this. Our structural parameter testing shows the difference.
Weight: The Ghost 13 weighed 3.23 percent less than the Ghost 16. It weighed in at 9 ounces and the Ghost 16 at 9.3 ounces.
Midfoot Stability: The Ghost 13 provided 34.38 percent more midfoot stability than the Ghost 16. It tested to have a TCI (torsional control index) of 43 inch pounds (moderate range) and the Ghost 16 a TCI of 32 inch pounds (minimal range).
Hindfoot Stability: The Ghost 13 provided 21.08 percent more hindfoot stability than the Ghost 16. It tested to have a VCI (vertical compression index) of 7.86mm (soft heel midsole) and the Ghost 16 a VCI of 9.96mm (softer heel midsole).
Vertical Support (loaded heel to toe drop): The Ghost 13 provided 102.94 percent more vertical support than the Ghost 16. It tested to have moderate vertical support with a 4.14mm loaded heel to toe drop. The Ghost 16 tested to have minimal vertical support with a 2.04mm loaded heel to toe drop.
Potential Stability: The Ghost 13 provided 98.77 percent more potential stability than the Ghost 16. It tested to have a potential shoe stability index of 42.93, placing it in our database at the higher level of the neutral stability category. The Ghost 16 tested to have a potential shoe stability index of 21.6, placing it significantly lower in the neutral stability category.
Conclusion: Based on weight and energy return, the Ghost 16 will be a higher performing running shoe. The Ghost 13 will provide more injury prevention for neutral/mild pronators due to a significantly higher stability rating and vertical support. Again, does your specialty running store know this?
7
1 Comment -
J2 Ventures
Excited to see J2 portfolio company Kusari featured in Google’s white paper, “Securing the AI Software Supply Chain.” Kusari’s open source tool, GUAC (Graph for Understanding Artifact Composition), aggregates and queries metadata across the AI supply chain. GUAC is Google’s system of choice in understanding their large, open source software supply chains. DM Kusari CEO Tim Miller or check out GUAC’s GitHub page if you want to learn more about their approach to managing the AI model development lifecycle. #aisecurity #sbom #google #opensource #supplychain
41
3 Comments -
Doug Gray
https://lnkd.in/gSAcfbxa Part 10 of my series Why Data Science Projects Fail is available today on INFORMS Analytics Magazine. Tom Davenport once said that "Models make the enterprise smarter, but models embedded in mission-critical business processes and systems make the enterprise more economically efficient." That is the end game of data science models--- generating significant business value and economic impact, and to achieve that, repeatably, reliably, and sustainably, requires that models are deployed as systems. Today's installment addresses the most challenging part of implementing model-based systems: Getting from Sandbox Model to Production System. Part 10 encapsulates many aspects of data science project complexity that were addressed in prior installments such as data availability, project management, and change management, and introduces new dimensions that determine how long it will take and how much it will cost to get from a sandbox model to a fully deployed and operational enterprise-grade production system. Levels of Dynamism, Integration, Mission-Criticality, and Problem and Model Complexity are addressed as four of the key determinants of how easy or difficult it will be to get your model from the sandbox to being a fully deployed, enterprise system, and integrated as part of the overall corporate technology ecosystem. The article provides examples of projects that span the continuum of relatively easy and inexpensive to deploy to extraordinarily complex and expensive to deploy. Successfully deploying and operating data science models as part of enterprise systems requires a holistic, comprehensive team of not only data scientists, but also data engineers and analysts, business process (domain SMEs) analysts, managers and executives, cloud engineers, ML Pipeline & Ops engineers, change management pros, software engineers and system integrators to fully realize the benefits such models promise.
21
2 Comments
Others named Joe Reis in United States
-
Joe Reis
Wayzata, MN -
Joe Reis
Vice President of Sales Agilight North America at GENLED Brands
Greater Boston -
Joe Reis
Manager of IT Governance and Compliance at L'Oréal
Bridgewater, NJ -
Joe Reis
FrontEnd Developer (React | Vite.js | Next.js | Tailwind CSS | MongoDB | Node.js | Express.js)
Shelton, CT
74 others named Joe Reis in United States are on LinkedIn