From Fashionista to Pythonista
Chloé Van Vreckem
Ladies, forget about high heels. Big Data is Hot!
We have all heard it: “The sexiest job in the next 10 years will be statistician” (Hal Varian, chief economist at Google). The line has been reused and paraphrased many times in well-known magazines as:
Data scientist is considered to be “the Sexiest Job of the 21st Century”.
So the time has come to learn a programming language: to communicate with computers, to handle and manipulate “big” data, and to extract valuable customer insights (for our client companies).
Python is one of those programming languages; if you can speak Python, you are known as a Pythonista. To become a renowned Pythonista, the following steps are recommended:
1. BASICS AND LIBRARIES: “The little black dress”
Like the mandatory little black dress, Pythonistas have their basic rules and libraries.
As in any new language, you need to know the vocabulary and the grammar first. Words should be spelled correctly and combined into correct “sentences”. You can find a lot of free tutorials on the internet to set up your computer and to learn the basics of Python.
Key to Python’s usefulness is its simplicity. Add existing modules and packages, and the possibilities in data handling, data analysis and data modelling become endless. Must-install libraries are NumPy, Pandas, Matplotlib and Scikit-learn: NumPy (Numerical Python) for numerical arrays and basic linear algebra functions; Pandas for manipulating structured data and making it ready for modelling and/or machine learning; Matplotlib for plotting a variety of graphs to visualize the data; and Scikit-learn for machine learning on the features created in Pandas.
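To give a feel for the first two libraries, here is a minimal sketch, assuming NumPy and Pandas are installed. All numbers and column names are invented for illustration; plotting with Matplotlib and modelling with Scikit-learn are left out to keep it short.

```python
import numpy as np
import pandas as pd

# NumPy: basic linear algebra on arrays (values invented for illustration).
prices = np.array([12.5, 8.0, 20.0])
quantities = np.array([2, 5, 1])
total_spend = float(np.dot(prices, quantities))  # 12.5*2 + 8.0*5 + 20.0*1

# Pandas: structured data manipulation on a small, made-up table.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "age": [23, 35, 41, 29],
        "spend": [40.0, 15.5, 60.0, 22.0],
    }
)

# A derived feature (hypothetical name) that a model could learn from later:
# does this customer spend more than the average customer?
customers["high_spender"] = customers["spend"] > customers["spend"].mean()

print(total_spend)
print(customers)
```

The derived column is the kind of “feature created in Pandas” that a Scikit-learn model would take as input.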
Personally, I really like the blogs written by analytical experts here: www.analyticsvidhya.com. “A complete tutorial to learn data science with Python from scratch” is an excellent starting point for your first exploration. Gradually you can learn more by clicking on the recommended links on those topics you want to investigate more.
2. BUILDING BLOCKS: “The clothing patterns”
Don’t start by “draping” the clothes; make your patterns first.
It is OK not to be experienced in the beginning. We don’t start by “draping clothes” or, in the case of data science, with difficult data handling techniques. Start with hardcoded values, write your script as a story in structured building blocks of code, and add plenty of comments explaining what the code actually does. Writing your own functions, as short as possible, to reuse later is more sophisticated, but it is not required from the beginning.
A script is always made up of building blocks. Start small and extend gradually, and make sure that every block executes properly.
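A script built this way could look like the sketch below. It uses plain Python only; the purchase amounts and the function name `average` are made up for illustration.

```python
# Block 1: the raw data, hardcoded for now (made-up purchase amounts).
purchases = [19.99, 5.50, 42.00, 12.25]


# Block 2: a short, reusable function with a note on what it does.
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Block 3: use the function and check that the block ran properly.
mean_purchase = average(purchases)
assert min(purchases) <= mean_purchase <= max(purchases)

print(f"Average purchase: {mean_purchase:.2f}")
```

Each block can be run and checked on its own before the next one is added, which is what keeps a growing script manageable.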
3. REUSE OF CODE AND SCRIPTS: “The culture of green clothing”
Recycle Python code like you reuse fabrics and accessories.
You don’t have to build everything from scratch. It is not a crime to reuse some of your former building blocks in your new collection of commands. As an example, we have worked out a small exercise.
We ran a summer campaign for an international distributor of hair and beauty products. The newest beauty trend is extremely colourful mascara. In stores, a fancy display with test products was installed. We compared a control group (seeing only this display) with two test groups: one group received a discount voucher, and the other received the same discount voucher combined with a free sample of our newest purple mascara. The target group was female customers who bought skincare in store. The objective is to assess the impact of discount vouchers and test samples on buying behaviour (whether the customer buys the trend or not) within the next month, and to identify the profile of the customer who is interested in this trend. To receive the dataset and Python code, you can click here. Feel free to copy-paste and execute the code in your own Python interpreter and to recycle this code on your own dataset.
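As a rough idea of what such a comparison could look like, here is a minimal sketch with Pandas. It is not the campaign code or the campaign dataset: the rows are invented toy data, and the column names `group` and `bought_trend` are assumptions made for this example.

```python
import pandas as pd

# Invented toy data: one row per customer, NOT the real campaign dataset.
data = pd.DataFrame(
    {
        "group": [
            "control", "control", "control",
            "voucher", "voucher", "voucher",
            "voucher_sample", "voucher_sample", "voucher_sample",
        ],
        # 1 = bought the trend product within the month, 0 = did not.
        "bought_trend": [0, 0, 1, 0, 1, 1, 1, 1, 1],
    }
)

# Share of customers in each group who bought the trend product.
conversion = data.groupby("group")["bought_trend"].mean()

# Difference between each test group and the control group.
uplift = conversion.drop("control") - conversion["control"]

print(conversion)
print(uplift)
```

With real data, the same few lines can be recycled by loading your own file into `data`. A difference between groups on a small sample is only a first indication, so a statistical test would be a sensible next block to add.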
4. GET INSPIRATION AND EXPERIENCE: “The fashion magazines”
The latest examples in Python blogs and conferences are the latest trends in fashion magazines.
Use ideas (building blocks) from other programmers and incorporate them into your own design. Online you can find a lot of well-explained, hands-on examples of Python code, as well as blogs and communities where specialists share their challenges and solutions with you. You can grow your programming skills by learning from their experience and by experimenting yourself with open-source datasets and code. Two popular examples of open-source communities where you can learn to do data science are GitHub and Kaggle.
5. DESIGN AND BE THE TREND: “Be avant-garde”
Building useful, elegant and clever programs for others to use is a very creative and valuable activity.
Once you have found your muse and turned yourself into a Pythonista who is skilled in the art of programming, you will be able to look at data or an information problem and develop a program to solve it.
By taking part in public programming challenges (e.g. those organized by Kaggle), you can even win some money! However, once you know how to program, you will find it a pleasant, intuitive and creative process, so money won’t be your driving force.
On the other hand… it may come in handy, so you can buy those must-have high-heeled shoes that were offered to you in a personalized ad by your favourite online store.