The point of this page is to provide information on the statistics and programming resources that I wish I had known about when I was a first year. It's a work in progress, so recommendations are always welcome. This is also a great resource for data science tutorials.
I taught myself R in the first year of graduate school (if I can, you can). In many ways it is my preferred statistical language. A primary reason for this is that the software is free. I also like R because its syntax resides somewhere between Python and STATA (though more experienced programmers would rightfully link it to C), making it easy for me to shift back and forth as necessary. And to be honest, I've always just admired the minimalism of simply calling something R.
A downside to R is that it generally takes more typing to process a command than you might experience in STATA--but the difference is minimal and you'll get over it quickly. R does have a less robust and centralized corpus of help material for social scientists compared to STATA. With that said, I can generally find what I need in a short amount of time.
Time to Play
- Basics of R
- Quick R
- Try R Tutorial
- R for Data Science
- Tips and Tricks
- More tips and tricks
- Cross Validated: Resources galore
- R for STATA Users
- Intro Stats with R
- R Markdown
Worth learning! (I use knitr)
- Tidy Text Mining
- Basic Text Mining
- Introduction to parallel computing
- R and fsQCA introduction
- Data Mining
- Shiny (for apps)
- pdfTools (scrape texts from pdf)
- Spatial visualizations
- Intro to Github for R users
- Resources for Bayesian Stats
If you are in the Triangle Area of NC, join our Python in Social Science group! Email me for more details.
I have a lot of fun with Python. I'm far from expert, but Stack Overflow is a goldmine of idiosyncratic help for just about every challenge you might face in Python (or any other coding language, for that matter). Python also has some of the best tutorials online, and IPython Notebook (now Jupyter) is one of the cleanest interfaces for creating lectures and interactive blog posts.
If you are just starting out, you should know that Python comes with a standard library of modules/packages (just like R), but most of the modules you'll need will have to be installed separately before you can import them. This is why I encourage installing Anaconda (see below). As a sociologist, the modules that I first learned and use most often include: numpy, pandas, scikit-learn, beautifulsoup, and matplotlib (I provide more detail below). It could also be useful to familiarize yourself with GitHub; I found this tutorial fun and useful for doing so, though there are many other options.
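To make the standard-library versus add-on distinction concrete, here is a minimal sketch that checks which of the modules named above are already available on your machine. It uses only the standard library. The helper name `is_installed` is my own, and the import names are assumptions worth double-checking against each package's documentation (scikit-learn is imported as `sklearn`, and current releases of Beautiful Soup as `bs4`).

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate the module without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

# Import names for the packages mentioned above (assumed, see the note above).
packages = ["numpy", "pandas", "sklearn", "bs4", "matplotlib"]

# Build a simple report: module name -> whether it is available.
status = {name: is_installed(name) for name in packages}

for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing would need to be installed (for example through Anaconda) before an `import` statement for it will work.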
- Install Python
Not necessary for Mac users.
- Install Jupyter Notebook
Jupyter Notebook (previously IPython Notebook) is a web-based interactive environment where you can execute and combine code with text, stats, plots, and more. Here are some tips & tricks.
- Install Anaconda
Makes installing packages much easier.
General Data Analysis
- Our big list of Tutorials (made with Neal Caren)
- Codecademy's Python Course
- Coursera's Data Science Courses
- Python Crash Course for Scientists
- Python for Data Science
- Pandas (here are short and long tutorials, video tutorials, recipe books, etc.)
- Plotting data
Most people use the matplotlib package, but there are other options (see here for plotting time series data). There is also a package for using R's ggplot in Python. See also: reproducing plots for publication.
APIs & Scraping Websites
- Brian Keegan's Analysis of Violence in Chicago
Some nice graph examples included.
LDA is a common type of topic modelling. Here is a lecture by David Blei, providing more detail.
- Scikit-learn Cookbook
- Intro to text comparison
- Python for Digital Humanities
- Fuzzy matching texts
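
For a quick feel for what fuzzy matching does before working through the tutorial, here is a minimal sketch using `difflib` from Python's standard library. This is only one of several possible approaches and is not necessarily the one the linked tutorial uses. The function names and the list of organizations are hypothetical, made up for illustration.

```python
from difflib import SequenceMatcher, get_close_matches

def similarity(a, b):
    """Return a 0-1 similarity score, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def best_match(name, candidates, cutoff=0.6):
    """Return the candidate most similar to `name`, or None if none reaches `cutoff`."""
    # Compare lowercased strings, but hand back the original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.strip().lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

# Hypothetical example data: a clean list and two messy entries to match against it.
organizations = [
    "Duke University",
    "University of North Carolina",
    "North Carolina State University",
]

for messy in ["Duke Univeristy", "Univ. of North Carolina"]:
    print(messy, "->", best_match(messy, organizations))
```

The `cutoff` value controls how strict the match is; 0.6 is simply the default that `get_close_matches` uses, so it is worth tuning against your own data.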