The 2014 PyCon in Montreal is over, and 138 videos from the conference have already been posted. They cover a huge variety of topics. To avoid scrolling through the entire list every time I want to see what I missed by staying home, here is a subset of session and tutorial videos of interest to data science and machine learning fans. Running times are listed as h:mm:ss. Summary text is from the pyvideo.org site.
Diving into Open Data with IPython Notebook & Pandas, 0:30:55
I’ll walk you through Python’s best tools for getting a grip on data: IPython Notebook and pandas. I’ll show you how to read in data, clean it up, graph it, and draw some conclusions, using some open data about the number of cyclists on Montréal’s bike paths as an example.
Know Thy Neighbor: Scikit and the K-Nearest Neighbor Algorithm, 0:20:56
One of the great features of Python is its machine learning capabilities. Scikit is a rich Python package which allows developers to create predictive apps. In this presentation, we will guess what type of music Python programmers like to listen to, using Scikit and the k-nearest neighbor algorithm.
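For readers who have not met k-nearest neighbors before, here is a minimal from-scratch sketch of the idea: label a new point by majority vote among the k closest training points. It is not taken from the talk and does not use scikit-learn; the `knn_predict` helper, the two features, and the listener data are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of (features, label) pairs, where features is a tuple of
    numbers with the same length as query.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical data: (hours coding per day, cups of coffee) -> favourite genre.
listeners = [
    ((8.0, 4.0), "electronic"),
    ((7.5, 5.0), "electronic"),
    ((9.0, 3.0), "electronic"),
    ((2.0, 1.0), "jazz"),
    ((1.5, 0.0), "jazz"),
    ((3.0, 1.0), "jazz"),
]

prediction = knn_predict(listeners, (7.0, 4.0), k=3)
print(prediction)
```

In practice a library implementation would also handle feature scaling and ties, which this sketch ignores.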
Enough Machine Learning to Make Hacker News Readable Again, 0:28:49
It’s inevitable that online communities will change, and that we’ll remember the community with a fondness that likely doesn’t accurately reflect the former reality. We’ll explore how we can take a set of articles from an online community and winnow out the stuff we feel is unworthy. We’ll explore some of the machine learning tools that are just a “pip install” away, such as scikit-learn and nltk.
How to Get Started with Machine Learning, 0:25:50
An introduction to machine learning that clarifies what it is, what it’s not, and how it fits into the picture of all the hot topics around data analytics and big data.
Realtime predictive analytics using scikit-learn & RabbitMQ, 0:28:58
scikit-learn is an awesome tool allowing developers with little or no machine learning knowledge to predict the future! But once you’ve trained a scikit-learn algorithm, what now? In this talk, I describe how to deploy a predictive model in a production environment using scikit-learn and RabbitMQ. You’ll see a realtime content classification system to demonstrate this design.
Mining Social Web APIs with IPython Notebook, 3:25:24
Social websites such as Twitter, Facebook, LinkedIn, Google+, and GitHub have vast amounts of valuable insights lurking just beneath the surface, and this workshop minimizes the barriers to exploring and mining this valuable data by presenting turn-key examples from the thoroughly revised 2nd Edition of Mining the Social Web.
Bayesian statistics made simple, 3:15:29
An introduction to Bayesian statistics using Python. Bayesian statistics are usually presented mathematically, but many of the ideas are easier to understand computationally. People who know Python can get started quickly and use Bayesian analysis to solve real problems. This tutorial is based on material and case studies from Think Bayes (O’Reilly Media).
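To give a flavor of what "easier to understand computationally" can mean, here is a small sketch of a single Bayesian update over two hypotheses. The two-bowl setup and its numbers are my own illustration, not necessarily what the tutorial uses, and `bayes_update` is a hypothetical helper name.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumed setup: bowl 1 holds 30 vanilla and 10 chocolate cookies,
# bowl 2 holds 20 of each. A bowl is chosen at random, then one cookie is drawn.
prior = {"bowl_1": Fraction(1, 2), "bowl_2": Fraction(1, 2)}

# Probability of drawing a vanilla cookie under each hypothesis.
likelihood_vanilla = {"bowl_1": Fraction(30, 40), "bowl_2": Fraction(20, 40)}

# After seeing one vanilla cookie, how likely is it that we drew from bowl 1?
posterior = bayes_update(prior, likelihood_vanilla)
print(posterior["bowl_1"])  # 3/5
```

Using `Fraction` keeps the arithmetic exact, so the result can be checked by hand against Bayes' theorem.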
Beyond Defaults: Creating Polished Visualizations Using Matplotlib, 3:08:23
When people hear of matplotlib, they think rudimentary graphs that will need to be touched up in Photoshop. This tutorial aims to teach attendees how to exploit the functionality provided by various matplotlib libraries to create professional-looking data visualizations.
Data Wrangling for Kaggle Data Science Competitions — An etude, 3:22:04
Let us mix Python analytics tools, add a dash of Machine Learning Algorithmics & work on Data Science Analytics competitions hosted by Kaggle. This tutorial introduces the intersection of Data, Inference & Machine Learning, structured in a progressive mode, so that the attendees learn by hands-on wrangling with data for interesting inferences using scikit-learn (scipy, numpy) & pandas.
Hands-on with Pydata: how to build a minimal recommendation engine, 3:21:00
In this tutorial we’ll set ourselves the goal of building a minimal recommendation engine, and in the process learn about Python’s excellent Pydata and related projects and tools: NumPy, pandas, and the IPython Notebook.
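As a rough idea of what "minimal recommendation engine" might look like, here is a sketch of user-based collaborative filtering with cosine similarity. It is my own illustration rather than the tutorial's code, and it uses only the standard library instead of NumPy and pandas so it runs anywhere; the ratings, names, and the `cosine` and `recommend` helpers are all hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "ben": {"Alien": 4, "Brazil": 5, "Dune": 5},
    "chloe": {"Alien": 1, "Clue": 5, "Dune": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {
        item: scores[item] / weights[item] for item in scores if weights[item] > 0
    }
    return sorted(predicted, key=predicted.get, reverse=True)


suggestions = recommend("ana", ratings)
print(suggestions)
```

With real data the same logic is usually expressed as array operations on a user-by-item matrix, which is where NumPy and pandas come in.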
Python for Social Scientists, 3:27:00
Many provocative social questions can be answered with data, and datasets are more available than ever. Start working with it here. First we’ll download and visualize one data set from the World Bank Indicators page together, using Matplotlib. Then you’ll have time on your own to pick another data set from any online source and plot that. At the end every person/pair will share what they found.
Exploring Machine Learning with Scikit-learn, 3:24:14
This tutorial will offer an introduction to the core concepts of machine learning, and how they can be easily applied in Python using Scikit-learn. We will use the scikit-learn API to introduce and explore the basic categories of machine learning problems, related topics such as feature selection and model validation, and the application of these tools to real-world data sets.
Diving deeper into Machine Learning with Scikit-learn, 3:15:13
This tutorial session is a hands-on workshop on applied Machine Learning with the scikit-learn library. We will dive deeper into scikit-learn model evaluation and automated parameter tuning. We will also study how to scale text classification models for sentiment analysis or spam detection and use IPython.parallel to leverage multi-CPU or ad-hoc cloud clusters.