stuff not in lec notes:
Data Exploration Best Practices
- open in text editor to inspect
- check for:
    - comments at the beginning or end of the file
    - column separators (tabs, semicolons, commas)
    - special missing value codes
    - expected number of rows
e.g. crab.txt has 174 lines, so expect 173 rows (174 minus 1 header line); verify with nrow(data) == 173
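
Rough sketch of the row-count check in R. The file here is a made-up temp file (not crab.txt), and the column names, tab separator and NA code are assumptions for illustration only; swap in whatever the text editor inspection showed.

```r
# Hypothetical stand-in file: 1 header line plus 3 data rows, tab-separated.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tx\ty",
             "1\t28.3\t3.05",
             "2\t22.5\t1.55",
             "3\t26.0\tNA"), path)

# Count the raw lines, then subtract the header to get the expected row count.
n_lines <- length(readLines(path))
expected_rows <- n_lines - 1

# Read using the separator and missing-value code found during inspection.
data <- read.table(path, header = TRUE, sep = "\t", na.strings = "NA")

# Stops with an error if the data frame has a different number of rows.
stopifnot(nrow(data) == expected_rows)
```

If the file has comment lines at the beginning or end, those count as raw lines too, so subtract them as well before comparing.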
Reading a Five-Number Summary
- compare distance Q1 to median with distance median to Q3
    - if equal: symmetric in the middle
- compare distance min to Q1 with distance Q3 to max
    - if unequal: skewed outside the middle (in the tails)
- tip: if the mean is smaller than the median → likely skewed left (mean larger than median → likely skewed right)
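
Quick sketch of these comparisons in R, on made-up numbers (not from any course data set). Note that fivenum() returns Tukey's hinges, which can differ slightly from the quartiles given by quantile(); the names attached to the result are my own labels.

```r
# Made-up sample with one large value on the right (illustration only).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20)

# fivenum() returns min, lower hinge, median, upper hinge, max.
s <- fivenum(x)
names(s) <- c("min", "Q1", "median", "Q3", "max")

# Middle: distance Q1 to median vs. distance median to Q3.
middle <- c(lower = s[["median"]] - s[["Q1"]],
            upper = s[["Q3"]] - s[["median"]])

# Outside the middle: distance min to Q1 vs. distance Q3 to max.
tails <- c(lower = s[["Q1"]] - s[["min"]],
           upper = s[["max"]] - s[["Q3"]])

print(middle)  # lower 2, upper 2: symmetric in the middle
print(tails)   # lower 2, upper 13: longer upper tail

# Mean vs. median as a check on the direction of skew.
mean(x) > median(x)  # TRUE here, consistent with skew to the right
```

Here the middle looks symmetric but the tails do not, so the skew only shows up outside the middle.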