DATA SCIENCE NIGHTS @NORTHWESTERN
  • Coming up
  • What are data science nights?
  • Archives
  • Contact

Getting started with data science

 This is a list of resources we've found helpful for getting started with data science, including
  • installing useful software
  • guides for learning the basics
Let us know if you have suggestions for additions!

Software installation

This covers:
  • R and RStudio
  • Python 3.6 (Anaconda 5.0.1)
  • Git  

R (popular statistical/data science programming language) and RStudio (powerful/convenient tool for writing and testing R scripts)
Go to this page and follow the steps to install both R and RStudio. Instructions for both Windows and Mac are included.
Some notes:
  • When installing R on Windows, run the installer with administrative privileges (navigate to where you downloaded the .exe file, right-click the file, and select “Run as administrator”)
  • When downloading R on Mac, be sure to choose the package suitable for your OS X version. For now, you may ignore the messages to install Clang (C compiler), GNU Fortran (Fortran compiler), and XQuartz. However, these may be useful later if you need to build R packages from source code.
  • When downloading R, the choice of mirror doesn’t really matter (just pick one in the US)
  • When downloading RStudio, choose the free version
  • Ignore the SDSFoundations sections at the end of the document  

Python (popular programming language) installation using Anaconda (Python distribution and package manager)
The Anaconda installation of Python includes several useful tools for Python programming. It also provides Anaconda Navigator, a graphical tool for browsing all the utilities in Anaconda, which you can explore after installation.

Detailed instructions for installing Anaconda for either Windows or Mac can be found here.

Notes:
  • The Mac instructions cover both a graphical installation and a command line installation. Choose just one set of instructions. The graphical installation is more intuitive.
  • Python, and thus Anaconda, comes in two partially incompatible flavors: Python 2 and Python 3. For Data Science Nights we strongly encourage using the future-proof Python 3 (e.g., Python 3.6).
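After installation, one quick way to confirm which flavor of Python you ended up with is to ask the interpreter itself. The snippet below is only a minimal sketch: the helper names is_python3 and describe_python are made up for illustration and are not part of Anaconda or of the linked instructions.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3


def describe_python(version_info=sys.version_info):
    """Format the major and minor version, e.g. 'Python 3.6'."""
    return "Python {}.{}".format(version_info[0], version_info[1])


# Report on the interpreter that is running this script.
if is_python3():
    print(describe_python() + " is installed (Python 3, as recommended)")
else:
    print(describe_python() + " is installed; consider installing Python 3")
```

Run it from the same environment you plan to use for Data Science Nights (for example, from a terminal opened through Anaconda Navigator), since a computer can have more than one Python installed.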

git (version control software)
Instructions taken from here.

Mac:
  • Check to see if git is already installed by opening Terminal and typing in git --version.
  • If installation is needed, download git from here

Windows:
  • Download from here and run the .exe file. When given choices during the installation process, all default options should be fine.
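The git --version check mentioned for Mac works the same way on Windows once git is installed. As a rough sketch, it can also be run from Python using only the standard library; the function name git_version is an illustrative choice, not something provided by git or Anaconda.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if git is unavailable."""
    # shutil.which looks for the git executable on the PATH.
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = git_version()
if version:
    print(version)
else:
    print("git was not found; install it using the instructions above")
```

If the script reports that git was not found right after installing, try opening a new terminal window first, since the PATH is usually only refreshed for newly opened terminals.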

git is the name of the software that manages version control, and GitHub is a website where code can be stored and shared using git. To use GitHub, you’ll need to sign up for a (free) account.

git can be used from the command line or through a graphical user interface such as GitHub Desktop.

GitHub Desktop (optional GUI for git/GitHub)
  • Install from here for Mac or Windows

Learning resources

NICO's Intro to programming for big data (Python): https://github.com/amarallab/Introduction-to-Python-Programming-and-Data-Science

The University of Edinburgh's Coding Club has a set of tutorials on GitHub, Markdown, Python, data visualization, and more.
Generally, Northwestern students, staff, and faculty have access to:
  • Lynda courses at lynda.com (see here for access instructions)
  • DataCamp R and Python courses (see here for access instructions)

Learn git with a tutorial or the git documentation