Yup. (A much delayed post.) I ran into this problem while working with various tree classifier packages on a course project. It’s one of those problems that causes errors whose messages don’t really point to a label punctuation issue.
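One way to sidestep this kind of error is to clean the labels before they reach the classifier. The post does not say which packages or labels were involved, so what follows is only a rough sketch in Python: the function name `sanitize_labels` and the rules it applies (underscores for punctuation, an `x_` prefix for names starting with a digit) are assumptions for illustration, not what any particular package requires.

```python
import re


def sanitize_labels(labels):
    """Return a list of labels made of letters, digits, and underscores only.

    Hypothetical helper: the cleaning rules are assumptions, not the
    requirements of any specific tree classifier package.
    """
    cleaned = []
    used = set()
    for label in labels:
        # Collapse each run of punctuation or whitespace into one underscore,
        # then drop underscores left at either end.
        name = re.sub(r"[^0-9A-Za-z_]+", "_", label).strip("_")
        # Prefix names that are empty or start with a digit.
        if not name or name[0].isdigit():
            name = "x_" + name
        # Two different labels can clean to the same name, so add a
        # numeric suffix until the name is unused.
        candidate, n = name, 1
        while candidate in used:
            candidate = f"{name}_{n}"
            n += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    print(sanitize_labels(["body-acc (x)", "body.acc.x", "3rd col"]))
```

Keeping a mapping from the original labels to the cleaned ones (for example with `dict(zip(labels, sanitize_labels(labels)))`) makes it easy to restore the readable names in plots and reports afterwards.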
Visualization Links
These are cut’n’pasted from my Coursera Computational Investing class forum. Not yet examined, just here to follow up on.
I don’t know what Prof. Tucker chose for his company, but highstock (http://www.highcharts.com/stock/demo/) is, in my opinion, the best around (free for non-commercial products). In any case, here is a list of other good visualization libraries:
http://g.raphaeljs.com/
http://www.jqplot.com/
http://plugins.jquery.com/project/gchart
http://vis.stanford.edu/protovis/
http://polymaps.org/
http://code.google.com/p/flot/
Machine Learning is coming
Are you prepared?
Take a look at Everything You Wanted to Know About Machine Learning, But Were Too Afraid To Ask; it’s a good place to start. It is an article on the BigML blog about an article (meta article?). Charles Parker summarizes Pedro Domingos’ short paper A Few Useful Things to Know about Machine Learning, which is what to read next.
I’m not real sure about this next one: A First Encounter with Machine Learning, a 93-page PDF from Max Welling. The first paragraph of the preface explains the focus of the book. This may be an early version though; there are “??” references to figures and the bibliography is empty. Be warned, it is very “mathy”.
Prepared? Prepared for what? Yeah, this is where we’re going. Andrew Ng’s 10-week Machine Learning class starts on Coursera April 22. Lots of good reviews for this online.
Not yet scheduled but also very interesting looking is Geoffrey Hinton’s Neural Networks for Machine Learning, also on Coursera. Hinton was recently swallowed up by Google, but maybe not entirely. Let’s hope Coursera runs this again this year.
Tons more links to follow are in the Stack Overflow post Overwhelmed by Machine Learning—is there an ML101 book?
Free Statistics Textbook
SAS competitor StatSoft has a free version of their monster statistics textbook available online. It covers a lot of ground, going wide rather than deep. I see it as more of a reference guide than a class textbook that would have derivations and proofs. So in that sense it’s more useful than a textbook. The online book is not easy to find from the company’s home page, so follow the link here to go directly to the book.
DaVinci Data Detectives
It all started last December with Richard Hackathorn’s post to the discussion boards of the Boulder/Denver BigData and Data Science and Business Analytics Meetups proposing a study group for an upcoming online data analysis class.
Are you a data detective? Love to discover the secrets hidden deep within data? …Do something useful with it? Starting January 22 for 8 weeks, Coursera is offering “Data Analysis” taught by Jeff Leek. Using the R statistical language, this course is a practical hands-dirty introduction to crucial statistics, like linear regression, principal components analysis, cross-validation, and p-values. Further it’s FREE! Watch the one-minute video at http://www.coursera.o….
Taking a Coursera course is a lonely venture! Join this local study group, where the goal is for all of us to earn that certificate, becoming certified data detectives! Starting on January 24, we will meet on Thursdays over lunch, 11:30 to 1:00, at the Vault of the DaVinci Institute http://www.davinciins… 511 E South Boulder Road, Louisville, CO 80027. Pack a brown bag, stop at Subway, or pre-order from Ralphie’s Tavern (next door). Join us for the fun (and a bit of hard work)!
Richard is right: online courses can be very lonely places. Course discussion forums and wiki pages help, but are no substitute for meeting your classmates in the real world. The study group has been a great success. I learned of Roger Peng’s Computing for Data Analysis course through the group, and it was a great way to learn programming in R. The weekly study sessions have helped clear up confusion and keep us motivated. One of our members is now using R and some of the techniques from this class in their day job for a task formerly done with Excel.
Data Detectives ends on 3/14 with our 8th session and final week of the Data Analysis class. For many of us the data science learning won’t stop there though. We have Andrew Ng’s Machine Learning and Bill Howe’s Introduction to Data Science coming up soon.
Cowabunga! Data Science Power! (can we get a pizza?)
O’Reilly Webcast: Deep Learning
Jeremy Howard was the guest on O’Reilly’s 3/5 webcast. The focus was on how deep learning techniques are being applied in Kaggle competitions. The highlighted competition was won by a team that had no knowledge of the problem domain and used what was described as a deeper than usual neural network. The team was led by Geoffrey Hinton, who happens to teach Coursera’s Neural Networks for Machine Learning. This class was offered in 2012 but is not yet scheduled to run in 2013. Hmmm.
In O’Reilly’s webcast archive I found only a description of this talk, with no playback link of the kind many of their other webcasts have. They did however send me a URL for playback (http://event.on24.com/r.htm?e=
Evolution of Regression Modeling
Part 1 of this series from Salford Systems was held on 3/1. This was the first of a four-part series, with sessions held two weeks apart. Sessions 1 and 3 are lecture format; 2 and 4 will be hands-on with Salford’s modeling software. They offer a free 10-day trial of the software and promise an additional 30 days on request. So time the download right and the trial can cover both hands-on sessions. For a data analysis noob like me, this series gives a great view of what comes beyond the basics. Links to the video and slides for this and other Salford webinars can be found here.