Practice Problem Set 5A: Strings

  1. Run the following code to load the data. View the codebook and other information on the data here.

  2. Separate the project title into the main title and sub-title.

  3. Create a subset of the data with projects that have the word STEM in their title.

  4. Count the number of grants by directorate. Do you see anything odd?

  5. Clean up the directorate variable and create the table again. Arrange the table in descending order.

  6. Compute the length of each value of the org_zip variable. Look at the distribution of the lengths.

  7. Split the org_zip variable into two variables: the first containing the 5-digit zip code and the second containing the 4-digit add-on code.

  8. Challenge problem: Create a graph identifying the most commonly occurring word across the abstracts from grants that are on Ted Cruz’s list.
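
As a warm-up for problems 2 through 7, here is a minimal sketch of the string operations involved, written in plain Python on three invented records. Everything in it is an assumption made for illustration: the toy values, the helper function names, the choice of a colon as the title/sub-title separator, and the decision to match STEM only as an upper-case whole word. Only the column names directorate and org_zip come from the problems above. The real data may need different rules, so check the codebook before adapting any of this.

```python
import re
from collections import Counter

# Hypothetical toy records; the values are invented and are NOT from the real data.
grants = [
    {"title": "Collaborative Research: STEM Pathways for Rural Students",
     "directorate": "EDU ", "org_zip": "303320001"},
    {"title": "Stemming Soil Erosion in Coastal Wetlands",
     "directorate": "geo", "org_zip": "77005"},
    {"title": "Quantum Sensing",
     "directorate": "GEO", "org_zip": "021394307"},
]

def split_title(title):
    """Split on the first colon (assumed separator) into (main, sub).
    sub is None when the title has no colon."""
    main, sep, sub = title.partition(":")
    return main.strip(), (sub.strip() if sep else None)

def has_stem(title):
    """True when STEM appears as a whole, upper-case word (assumed rule)."""
    return re.search(r"\bSTEM\b", title) is not None

def clean_directorate(value):
    """Normalize stray whitespace and inconsistent capitalization."""
    return value.strip().upper()

def split_zip(org_zip):
    """Return (first five characters, remaining add-on characters or None)."""
    base, addon = org_zip[:5], org_zip[5:]
    return base, (addon or None)

titles = [split_title(g["title"]) for g in grants]
stem_grants = [g for g in grants if has_stem(g["title"])]

# most_common() returns (value, count) pairs sorted by count, descending.
directorate_counts = Counter(
    clean_directorate(g["directorate"]) for g in grants
).most_common()

# Distribution of zip code lengths: maps length -> number of records.
zip_lengths = Counter(len(g["org_zip"]) for g in grants)
zip_parts = [split_zip(g["org_zip"]) for g in grants]
```

Note that the zip codes are kept as strings throughout. Converting them to numbers would drop leading zeros, as in the third toy record.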