Introduction
The dataset I will be using ('Guber1999data.csv') contains average SAT (Scholastic Aptitude Test) scores by U.S. state: the average Verbal score, the average Math score, and the average total score. It also includes several state-level variables that might affect the average SAT score: the amount of money spent on students, the student-teacher ratio, teacher salary, and the percent of students who take the SAT.
I will use this data to visualize the average SAT scores for the 50 U.S. states. I will also determine which of these variables (money spent on students, student-teacher ratio, teacher salary, and percent of students who take the SAT) help predict a state's average total SAT score.
To estimate the effect of each variable on a state's average total SAT score, I fit a linear model. The dependent variable was the average total SAT score (SATT), and the independent variables were the amount spent (Spend), the student-teacher ratio (StuTeaRat), teacher salary (Salary), and the percent of students who take the SAT exam (PrcntTake).
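The model described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this analysis: it assumes the CSV file has a header row with the column names SATT, Spend, StuTeaRat, Salary, and PrcntTake, and the helper names (`load_rows`, `fit_ols`, `solve_linear_system`) are hypothetical. It estimates the coefficients by ordinary least squares through the normal equations and does not compute standard errors or p-values.

```python
import csv

# Column names as given in the text; the file layout itself is an assumption.
RESPONSE = "SATT"
PREDICTORS = ["Spend", "StuTeaRat", "Salary", "PrcntTake"]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, response=RESPONSE, predictors=PREDICTORS):
    """Return {'Intercept': b0, predictor: slope, ...} from least squares."""
    # Design matrix with a leading column of ones for the intercept.
    design = [[1.0] + [float(row[p]) for p in predictors] for row in rows]
    y = [float(row[response]) for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return dict(zip(["Intercept"] + list(predictors), beta))


def load_rows(path):
    """Read a CSV with a header row into a list of dicts (one per state)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Assuming the file sits in the working directory, `fit_ols(load_rows("Guber1999data.csv"))` would return one intercept and four slopes. A statistics package would normally be preferred in practice, since it also reports the significance tests that the discussion relies on.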
Based on the results of the model (which can be seen on the right), the percent of students who take the SAT is the most significant predictor of the average total SAT score. According to this linear model, a one-percentage-point increase in the percent of students who take the exam is associated with a 2.904-point decrease in the average total SAT score, holding the other variables constant. In other words, there is a negative relationship between the average total SAT score and the percent of students who take the SAT. Although the other independent variables are not as significant, the model does show that the student-teacher ratio has a negative relationship with the average total SAT score, while both the amount spent and teacher salary have a positive relationship with it.
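To make the size of the PrcntTake coefficient concrete, the short sketch below applies the reported slope of -2.904 to a hypothetical comparison. The 10-percentage-point gap is an illustrative assumption, not a figure from the data, and the function name is made up for this example.

```python
# Slope for PrcntTake as reported in the text (SAT points per percentage point).
PRCNT_TAKE_SLOPE = -2.904


def predicted_score_change(delta_prcnt_take, slope=PRCNT_TAKE_SLOPE):
    """Model-implied change in average total SAT score, other predictors held fixed."""
    return slope * delta_prcnt_take


# Hypothetical comparison: two otherwise identical states whose participation
# rates differ by 10 percentage points.
print(round(predicted_score_change(10), 2))  # -29.04
```

Under the model, the state with the higher participation rate would be predicted to average about 29 points lower on the total score.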
Conclusions
Average SAT scores vary from state to state in the United States, but data visualization makes it easy to see which states have high average SAT scores and which have low ones. We can also visualize the percentage of students in each state who take the SAT, since a large percentage of students taking the exam could affect that state's average SAT score.
The linear model showed that the percent of students in a state who take the SAT is a significant predictor of that state's average total SAT score, while the other variables (amount of money spent on students, student-teacher ratio, and teacher salary) are not significant predictors. The substantive effects plots further support the findings of the linear model and show the negative relationship between the percent of students who take the SAT and the average total SAT score.
We can also see this relationship on the maps. North Dakota is the only state with an average total SAT score above 1100, but only 1-15% of students in the state took the exam. Similarly, South Carolina has an average total SAT score between 801 and 850, but 46-60% of its students took the SAT. Massachusetts and Connecticut had 75-90% of students take the SAT, yet both states had average total SAT scores between 901 and 950.
Overall, a state's average total SAT score is related to the percent of students in that state who take the SAT. This may be because, in states where a small percentage of students take the SAT, most of those test takers earn higher scores, which leads to a higher average total SAT score for the state. In states where a large percentage of students take the exam, there is more potential for lower scores to bring down the state's average.