Last weekend, my friend Ian and I were talking, and we got onto the topic of finding data online to fact-check newspapers. As readership moves online, newspapers are getting squeezed and have less money for in-depth research. We hear more stories about Britney Spears and her ilk not just because of her hot pants, but because she's cheaper to cover. And sometimes, after a newspaper publishes something, we find out later that its facts weren't exactly right.
One of the things he wanted to fact-check was the national debt and how it related to our politicians. I imagine he was incensed at the state of our financial affairs and wanted to know who the fools were that got us here. Specifically, he wanted to see the Senate majority plotted against the national debt.
Over the last couple of days, I took some time to find the data, write a Ruby script to scrape it, and plot it on a JavaScript web page. Here's the resulting graph:
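The script itself isn't shown here. To give a rough idea of what the scraping step involves, here's a minimal sketch in Python (not the Ruby original) that pulls the cell text out of an HTML table using only the standard library's `html.parser`. It assumes the data sits in plain `<tr>`/`<td>` rows. The `SAMPLE` markup and its values are made-up placeholders, not the actual layout or figures of the Treasury page, and fetching the page (for example with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so "  $1,000 \n" becomes "$1,000".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_rows(html):
    """Return every non-empty table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup with placeholder values (not real Treasury figures).
SAMPLE = """
<table>
  <tr><th>Date</th><th>Amount</th></tr>
  <tr><td>09/30/2000</td><td>$1,234,567.89</td></tr>
  <tr><td>09/30/2001</td><td> $2,345,678.90 </td></tr>
</table>
"""

for row in scrape_rows(SAMPLE):
    print(row)
```

Each row comes back as a list of strings, so the numbers still need cleaning before they can be graphed.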
The first surprising thing was how fast the national debt has grown in the last 30 years. The growth has been exponential. While spending has certainly increased, I think the interest on the debt has had a significant effect on that growth.
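To see why interest alone can make debt grow exponentially, here's a simplified model in Python. Each year the debt is multiplied by (1 + rate), because the interest is paid with new borrowing, and then any new borrowing beyond interest is added. The constant rate and all parameter values are hypothetical assumptions for illustration, not estimates of actual US figures.

```python
def project_debt(start, rate, primary_deficit, years):
    """Year-by-year debt under a simplified model where interest is paid
    by borrowing more.

    start           -- debt in year 0 (any unit)
    rate            -- assumed constant interest rate, e.g. 0.05 for 5%
    primary_deficit -- new borrowing each year, excluding interest
    years           -- number of years to project
    """
    debt = [start]
    for _ in range(years):
        # Last year's debt, plus a year of interest on it, plus new borrowing.
        debt.append(debt[-1] * (1 + rate) + primary_deficit)
    return debt


# With no new borrowing beyond interest, the debt still compounds:
# after 30 years at 5% it is 1.05**30, roughly 4.3 times larger.
interest_only = project_debt(start=100.0, rate=0.05, primary_deficit=0.0, years=30)
print(round(interest_only[-1], 1))

# With no interest, a steady deficit only grows the debt linearly.
no_interest = project_debt(start=0.0, rate=0.0, primary_deficit=10.0, years=3)
print(no_interest)
```

The contrast between the two runs is the point: a fixed deficit adds the same amount every year, while interest adds an amount proportional to the balance already owed.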
The second surprising thing was how thoroughly the Democrats dominated the era between 1930 and 1980, with only a sliver of Republican majority in the 1950s. Before that, the Republicans dominated. Apparently, political ideology shifts back and forth.
One thing I have to note about the graph is that the first stretch of red in the early 1800s is not the modern Republican party we know today. It was a different party that shared the name, I think also known as the Jeffersonian Republicans. The gaps before and after were periods with no Democrats or Republicans; instead there were Pro-Administration and Anti-Administration senators, Pro-Jackson and Anti-Jackson senators, Whigs, and others. In addition, the Senate didn't have 100 seats at the very beginning, so in some periods the majority party held fewer than 50 seats.
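Because the Senate's size changed over time, a seat count alone doesn't say whether the largest party held an outright majority. Here's a minimal Python sketch of that check, assuming the party-division data has already been reduced to a hypothetical `{party: seats}` dictionary per Congress. The party names and counts below are placeholders.

```python
def majority_info(seats):
    """Summarize one Congress's party division.

    seats -- {party_name: seat_count} for every seat counted in the total.
    Returns the largest party, its seats, whether that is more than half of
    all seats listed, and its lead over the runner-up.
    """
    total = sum(seats.values())
    ranked = sorted(seats.items(), key=lambda kv: kv[1], reverse=True)
    party, count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return {
        "party": party,
        "seats": count,
        "outright_majority": count * 2 > total,
        "lead": count - runner_up,
    }


# Placeholder numbers: in a 32-seat Senate, 18 seats is a majority...
print(majority_info({"Party A": 18, "Party B": 14}))
# ...while in a 100-seat Senate, 45 seats is only a plurality.
print(majority_info({"Party A": 45, "Party B": 44, "Other": 11}))
```

The `lead` value also captures how slim a majority is, which is what decides whether control can flip at the next election.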
Oh, and because I'm lazy, I didn't label the axes. The left y-axis is the number of Senate seats held by the majority party. The right y-axis is the national debt in dollars.
We can see from the graph that the explosive rise in the national debt occurred in the last two or three decades, and both parties held the Senate majority during that period. Not only that, but the majority party had only a slight majority, so control could tip to the other party from one Congress to the next.
Seeing that the growth was exponential, I plotted it on a log scale. Ian quipped that "it's terrifying that it makes sense to plot the national debt in log scale."
You can see more details here. Remember, a straight line on a log plot means exponential growth. We can see that there were times in history when the US debt dropped or rose particularly rapidly. I was surprised to see that in the mid-1830s, the US looked like it had cleared its debt entirely. I don't know enough about history to say whether the US defaulted or paid the debt back. The significant rises in debt seem to correlate with the major wars: the 1860s for the Civil War, the late 1910s for WWI, and the 1940s for WWII.
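The "straight line means exponential growth" rule follows from taking logs: if the debt grows by a fixed fraction r each year, then log10(debt) rises by the same amount, log10(1 + r), every year. Here's a small Python sketch that checks this and converts a slope back into a yearly growth rate and a doubling time. The 7% series is hypothetical.

```python
import math


def log_slopes(values):
    """Year-over-year change in log10(value). A constant result means the
    series grows exponentially, so it plots as a straight line on a log axis."""
    return [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]


def growth_rate_from_slope(slope):
    """Convert a log10 slope per year back into a yearly growth rate."""
    return 10 ** slope - 1


def doubling_time(rate):
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)


# Hypothetical series growing 7% a year.
series = [100 * 1.07 ** t for t in range(5)]
slopes = log_slopes(series)
print([round(s, 4) for s in slopes])  # the same slope every year
rate = growth_rate_from_slope(slopes[0])
print(round(rate, 4), round(doubling_time(rate), 1))
```

Read in reverse, a steeper straight stretch on the log plot simply means a higher constant growth rate during that period.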
Now the last graph is not for the faint of heart. It's a graph of the rate of change of the national debt on a log scale.
Because the changes in the last couple of decades have been so dramatic, they dwarf the changes in earlier periods on a linear scale. Here, we can see that for most of US history, the change in the debt fluctuated up and down. There were periods of heavy spending, but also periods of cutting back. Sometimes one party was responsible, and sometimes the other. But in recent years, we've just been accruing debt. See that little dip in the late 1990s/early 2000s? That was the big savings from the Clinton era. Because it's a log scale, a small dip near the top of the scale represents a huge amount compared with values lower down. If we had the 1920s level of debt, those savings would almost have cleared it.
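A log axis can't show zero or negative values, so a plot of the debt's rate of change on a log scale has to work with the size of each year's change and track its sign separately. Here's a minimal Python sketch of that, assuming the data has been reduced to a hypothetical `{year: debt}` dictionary; the figures are placeholders, not Treasury data. The last lines show why a small dip high on a log axis is a big number: going from 10^12 down to 9×10^11 moves the curve only log10(10/9), about 0.046, yet it is a drop of 10^11.

```python
import math


def yearly_changes(debt_by_year):
    """{year: debt} -> {year: debt[year] - debt[year - 1]} for every year
    whose previous year is also present."""
    return {
        year: debt - debt_by_year[year - 1]
        for year, debt in sorted(debt_by_year.items())
        if year - 1 in debt_by_year
    }


def log_magnitude(change):
    """Return (log10 of |change|, sign). A log axis can't show zero or
    negative values, so the sign is kept separately (e.g. for coloring)."""
    if change == 0:
        return None, 0
    return math.log10(abs(change)), (1 if change > 0 else -1)


# Placeholder figures, not real Treasury data.
debt = {2000: 1000.0, 2001: 1100.0, 2002: 1150.0, 2003: 1140.0}
changes = yearly_changes(debt)
for year, change in changes.items():
    print(year, change, log_magnitude(change))

# A small step high on a log axis is a large absolute amount.
axis_drop = math.log10(1e12) - math.log10(9e11)  # about 0.046 on the axis
absolute_drop = 1e12 - 9e11                      # 1e11 in real units
print(round(axis_drop, 3), absolute_drop)
```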
So all this has been interesting, but does it fall under the "all that jazz" category? What does it have to do with the web and programming? I've been thinking about all the public data that's out there, and how to get to it. My conclusion is that it's pretty damn hard. It took four steps to tell this story: I had to find the data, scrape it, clean it, and then graph it. Of the four, scraping and cleaning were the hardest. They took a good 4+ hours, and I'm a programmer. Most other people who were curious enough could use Excel, but last I checked, Excel didn't scrape data from web pages. Hello, cut and paste.
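Much of the cleaning step is turning scraped text into numbers and lining the two sources up on a common key. Here's a minimal Python sketch, assuming the debt rows arrive as hypothetical `(year, amount_text)` pairs and the Senate data as a `{congress_number: majority_seats}` dictionary. It relies on the 1st Congress beginning in 1789 and each Congress lasting two years; the year-to-Congress mapping is approximate because it ignores exact start dates within the year.

```python
import re


def parse_dollars(text):
    """Turn a scraped amount such as '$1,234,567.89' into a float.
    Returns None for blank or non-numeric cells."""
    cleaned = re.sub(r"[$,\s]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None


def congress_for_year(year):
    """Congress whose term covers most of `year`: the 1st Congress began
    in 1789 and each Congress spans two years. Approximate, since it
    ignores exact start dates."""
    return (year - 1789) // 2 + 1


def join_by_year(debt_rows, majority_by_congress):
    """Combine (year, amount_text) rows with {congress: majority_seats}.
    Rows whose amount can't be parsed are dropped; missing Senate data
    comes back as None."""
    joined = []
    for year, amount_text in debt_rows:
        debt = parse_dollars(amount_text)
        if debt is None:
            continue
        joined.append((year, debt, majority_by_congress.get(congress_for_year(year))))
    return joined


# Placeholder inputs, not real figures.
rows = [(1789, "$10.00"), (1790, "  "), (1791, "$1,250.50")]
print(join_by_year(rows, {1: 12}))
```

Keeping missing Senate data as `None` rather than dropping the row makes gaps in one source visible instead of silently shrinking the other.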
I think it should be much easier for citizens of a country where we elect our government officials to be informed and see this data for themselves. We used to rely on journalists to give us the straight dope on these facts. But, as I mentioned before, newspapers have been in decline, and as a result there's less money for good reporting that keeps watch on the government. Beyond watching the government, I expect people generally have questions that are best answered by graphs of public data, and those answers aren't just yes or no.
As an example, another friend of mine, Matt, is single and looking for the ladies. However, Columbia, MD, is a tough dating environment: everyone's either under 18 or over 40. So if he could move, which counties in Maryland have the highest number of single females?
Do any of you have similar questions that could be answered by graphing public data?
Data Sources:
http://www.treasurydirect.gov/govt/reports/pd/histdebt/histdebt.htm
http://www.senate.gov/pagelayout/history/one_item_and_teasers/partydiv.htm
Believe it or not, Howard County really isn't unusual in terms of population age. The median age is 35.5, with about 20% of all residents between 20 and 34 years old. I'm sure that can't compare with some urban areas, but it's not as bad as some make it out to be. As for where the singles live, the census data isn't as forthcoming with that information.
Median age by county:
Allegany 39.1
Anne Arundel 36.0
Baltimore 37.7
Calvert 35.9
Caroline 37.0
Carroll 36.9
Cecil 35.5
Charles 34.6
Dorchester 40.7
Frederick 35.6
Garrett 38.3
Harford 36.2
Howard 35.5
Kent 41.3
Montgomery 36.8
Prince George's 33.3
Queen Anne's 38.8
St. Mary's 34.2
Somerset 36.5
Talbot 43.3
Washington 34.7
Wicomico 35.8
Worcester 43.0
Baltimore City 35.0
http://factfinder.census.gov/servlet/QTTable?_bm=y&-context=qt&-qr_name=DEC_2000_SF1_U_DP1&-ds_name=DEC_2000_SF1_U&-tree_id=4001&-redoLog=true&-all_geo_types=N&-_caller=geoselect&-geo_id=05000US24001&-geo_id=05000US24003&-geo_id=05000US24005&-geo_id=05000US24009&-geo_id=05000US24011&-geo_id=05000US24013&-geo_id=05000US24015&-geo_id=05000US24017&-geo_id=05000US24019&-geo_id=05000US24021&-geo_id=05000US24023&-geo_id=05000US24025&-geo_id=05000US24027&-geo_id=05000US24029&-geo_id=05000US24031&-geo_id=05000US24033&-geo_id=05000US24035&-geo_id=05000US24037&-geo_id=05000US24039&-geo_id=05000US24041&-geo_id=05000US24043&-geo_id=05000US24045&-geo_id=05000US24047&-geo_id=05000US24510&-search_results=01000US&-format=&-_lang=en
Haha, awesome. I haven't had a chance to grab it for Howard County, but I'll be sure to tell Angert.