Since the beginning of this summer I have been practicing a lot with Scikit-learn to improve my knowledge of Machine Learning both on theory and practice. Last week I also tried to tackle some of the Kaggle competitions although they are really tough if you wish to get into the top 50 best scores. Perhaps I will post something about the experience in the future.
Scikit-learn is a (almost) ready to use package for Machine Learning in Python. It is so very user friendly and in some cases not that much coding around is needed to achieve interesting results such as in the case of the cars dataset.
The cars dataset, from the UCI Machine Learning Repository, is a collection of about 1700 entries of cars each with 6 features that can be easily recognized by the name (buying, maint, doors, persons, lug_boot, safety). Check the dataset description for more detailed information. The feature to be predicted is “class” and the possible values are unacc,acc,good,v-good. Most of the features are categorical, therefore they need to be encoded into numbers. Pandas is great for quick features encoding.