Near Real-Time Data: The Beauty of Pub/Sub Systems and Getting Started with Kafka
Every machine learning project is different, but all of them rely on the same thing: data. Without it, an AI project is just an idea. The success of some projects hinges on "near real-time data," a term that has joined the elite ranks of "cloud" and "big data" in the recent buzzword canon. But what makes near real-time data pipelines so different from traditional methods? We'll begin with a brief explanation of the Publish/Subscribe (Pub/Sub) method of data transmission and how it enables us to ingest, process, transform, and emit thousands of messages (or more) per second. We'll then walk through a basic, single-node Kafka implementation together, and I'll share some of the stumbling blocks we encountered at Chubb during our own Kafka rollout.
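To give a feel for the Pub/Sub model ahead of the walkthrough, here is a minimal sketch in plain Python. This is an illustrative toy, not Kafka's API: the `Broker` class, the `"claims"` topic name, and the message shape are all assumptions made for the example. The point it demonstrates is the core Pub/Sub property that producers publish to a named topic without knowing who, if anyone, is consuming.

```python
from collections import defaultdict

class Broker:
    """A toy in-memory broker: producers publish messages to named
    topics, and every callback subscribed to a topic receives each
    message published there."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The producer never references a consumer directly; this
        # decoupling is what lets Pub/Sub systems fan messages out
        # to many independent downstream processors.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("claims", received.append)   # a consumer registers interest
broker.publish("claims", {"id": 1, "amount": 2500})  # a producer emits a message
```

In a real Kafka deployment the broker is a separate server process, topics are partitioned and persisted to disk, and consumers pull messages at their own pace rather than receiving synchronous callbacks; we'll see those differences in the hands-on portion.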
Laptop: recommended for the hands-on walkthrough portion, but not required.
Born-again Data Science convert. After six years of designing skyscrapers for top-10 engineering firms, I decided to finally align my career with my interests. I quit my comfortable engineering job to enroll in a full-time Data Science Immersive course at General Assembly in NYC, and began teaching Data Science immediately upon graduation. Along the way, I've developed dozens of data science passion projects, ranging from predicting West Nile Virus outbreaks to analyzing the deteriorating state of the traditional U.S. higher education system. I've been building bespoke ETL pipelines and solving data problems at Chubb since February 2019.