Taylor Whitten Brown

Sociology PhD Candidate, Duke University

I am a PhD candidate in the Department of Sociology at Duke University (2nd year), with an MA in sociology from the University of North Carolina, Chapel Hill and an MSc in evidence-based social intervention from the University of Oxford. I am currently away from my department, being hosted by INCITE at Columbia University while I conduct research in NYC.

I study how group inequalities persist within markets despite cultural change. Specifically, I focus on gender inequality and the role of status systems in sustaining disparity. I'm also very involved with computational social science and its application to social media studies. My MSc focused on the design and evaluation of randomized controlled trials (@DSPI_Oxford), and my MA focused on community norm consensus and its link to individual behavior--specifically intimate partner violence.

Before starting my PhD, I fulfilled an appointment at the National Science Foundation in Washington DC. I have also worked in international development, completed internships in Italy and in Ghana, and served in leadership positions for a number of women's organizations.  

 

On a personal note, I enjoy witnessing and creating art (see About Me).

The point of this page is to provide information on the statistics and programming resources that I wish I had known about when I was a first-year student. It's a work in progress, so recommendations are always welcome. This is also a great resource for data science tutorials.

R


I taught myself R in the first year of graduate school (if I can, you can). In many ways it is my preferred statistical language. A primary reason for this is that the software is free. I also like R because its syntax resides somewhere between Python and STATA (though more experienced programmers would rightfully link it to C), making it easy for me to shift back and forth as necessary. And to be honest, I've always just admired the minimalism of simply calling something R.

A downside to R is that it generally takes more typing to process a command than you might experience in STATA--but the difference is minimal and you'll get over it quickly. R does have a less robust and centralized corpus of help material for social scientists compared to STATA. With that said, I can generally find what I need in a short amount of time.

Installation

- Install R (it's free!)
- Install RStudio


Getting Started

- Basics of R
- Quick R
- Try R Tutorial
- R for Data Science
- Tips and tricks
- More tips and tricks
- Cross Validated: Resources galore
- R for STATA Users
- Intro Stats with R
- R Markdown
        Worth learning! (I use knitr)

Time to Play

- ggplot2
- Tidy Text Mining
- Basic Text Mining
- Introduction to parallel computing
- R and fsQCA introduction
- Data Mining
- Shiny (for apps)
- pdfTools (scrape texts from pdf)
- Spatial visualizations
- Intro to Github for R users
- Resources for Bayesian Stats




PYTHON


If you are in the Triangle Area of NC, join our Python in Social Science group! Email me for more details.

I have a lot of fun with Python. I'm far from an expert, but Stack Overflow is a goldmine of idiosyncratic help for just about every challenge you might face in Python (or any other coding language, for that matter). Python also has some of the best tutorials online, and IPython Notebook (now Jupyter) is one of the cleanest interfaces for creating lectures and interactive blog posts.

If you are just starting out, you should know that Python comes with a standard library of modules/packages (just like R), but most of the modules you'll need have to be installed separately before you can import them. This is why I encourage installing Anaconda (see below). As a sociologist, the modules that I first learned and use most often include: numpy, pandas, scikit-learn, beautifulsoup, and matplotlib (I provide more detail below). It could also be useful to familiarize yourself with GitHub; I found this tutorial fun and useful for doing so, though there are many other options.
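
To make the standard-library point concrete, here is a minimal sketch (not a definitive recipe). It uses one built-in module directly, then checks which of the packages named above are available on your machine. The ages are made-up numbers, and the helper name is_installed is my own invention for illustration; importlib.util.find_spec is the real standard-library call doing the work.

```python
import importlib.util
import statistics  # part of the standard library: nothing to install

# Standard-library modules ship with Python itself.
ages = [23, 31, 27, 45, 38]  # made-up numbers, for illustration only
print("mean age:", statistics.mean(ages))

# Third-party packages must be installed first (Anaconda bundles many of them).
# The import name can differ from the package name:
# scikit-learn is imported as "sklearn" and beautifulsoup as "bs4".
packages = ["numpy", "pandas", "sklearn", "bs4", "matplotlib"]


def is_installed(module_name):
    """Return True if Python can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


availability = {name: is_installed(name) for name in packages}
for name, found in availability.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

Anything reported as missing needs to be installed before an import statement for it will work.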

Installation

- Install Python
        Not necessary for Mac users.
- Install Jupyter Notebook

Jupyter Notebook (previously IPython Notebook) is a web-based interactive environment where you can execute and combine code with text, stats, plots, and more. Here are some tips & tricks.


- Install Anaconda
        Makes installing packages much easier.

Getting Started

- Our big list of Tutorials (made with Neal Caren)
- Codecademy's Python Course
- Coursera's Data Science Courses
- Python Crash Course for Scientists
- Python for Data Science
- Pandas (here are short and long tutorials, video tutorials, recipe books, etc.)
- Plotting data

Most people use the matplotlib package, but there are other options (see here for plotting time series data). There is also a package for using R's ggplot in Python. See also: reproducing plots for publication.

General Data Analysis

- Brian Keegan's Analysis of Violence in Chicago
        Some nice graph examples included.


APIs & Scraping Websites

- Coursera's API courses
        Scroll to the bottom of the page
- Twitter's API
- Neal Caren's Webscraping Tutorial
- Beautifulsoup and a tutorial

Text Analysis

- Intro to Text Analysis
- Intro to Latent Dirichlet Allocation (LDA)

LDA is a common type of topic modelling. Here is a lecture by David Blei, providing more detail.


- NLTK
- Scikit-learn Cookbook
- Intro to text comparison
- Python for Digital Humanities
- TextBlob
- Fuzzy matching texts
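
For a feel of what fuzzy matching involves, here is a minimal sketch using difflib from Python's standard library. This is not necessarily the approach taken in the linked tutorial. The names in the lists are made up to stand in for messy survey or archival data, the helper functions similarity and best_match are my own, and the 0.6 cutoff is an arbitrary assumption you would tune for your data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidates, cutoff=0.6):
    """Return the most similar candidate, or None if nothing clears the cutoff."""
    score, match = max((similarity(name, c), c) for c in candidates)
    return match if score >= cutoff else None


# Clean reference names and messy variants (illustrative data only).
canonical = ["Duke University", "Columbia University", "University of Oxford"]
messy = ["duke univeristy", "Columbia Univ.", "Stanford"]

for name in messy:
    print(f"{name!r} -> {best_match(name, canonical)!r}")
```

Character-based ratios like this are sensitive to word order, so a reordered name (say, "Oxford University" against "University of Oxford") can score lower than you would expect. Always spot-check the matches by hand.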


OTHER TOOLS


Here are some other useful apps, libraries, etc. that might come in handy.

- Need to scrape tables from pdfs? (Need to scrape pdf texts? See 'pdftools' in the R section above)
- Learn VIM for text editing