Circos Data Visualization How-to Book

Earlier this year we looked at a powerful data visualization tool called Circos, developed by Martin Krzywinski from the British Columbia Genome Science Center. The previous post showed how this tool can be used to display complex connectivity pathways in the human neocortex, so-called connectograms.

Circos Book Cover

The Circos tool can be used interactively on the above website. In that mode you upload jobs as tabular data and configuration files and have some limited control over the rendering of the resulting charts. For full expressive power and flexibility, Circos can also be downloaded freely and run on your own computer, with extensive customization control over the resulting charts.

I have been asked to review a new book titled “Circos Data Visualization How-to”, published by Packt Publishing. Its main goal is to guide you through the above download and installation process and get you started with Circos charts and their modification. Here is a brief review of this book.

Although originally developed for visualizing genomic data, Circos has been applied to many other complex data visualization projects, including in the social sciences. One such study was done by Tom Schenk, who analyzed the relationships between college majors and the professions those graduates ended up in. It appears that this work inspired the author to write this book to help others use Circos.

I downloaded the book in Kindle format and read it on the Mac due to the color graphics and the much larger screen size. It’s well structured and around 70 pages in printed form. The book focuses first on the download and install part, then has a series of examples from first chart to more complex ones using customization such as colors, ribbons, heat maps or dynamic binding.

Flow Chart for creation of Circos charts

Circos is essentially a set of Perl modules combined with the GD graphics library.

The first part is on installing Circos, with a chapter each on Windows 7 and on Linux or Mac OS. Working on a Mac, I went the latter route. I ended up right in the weeds, and it took me about 4 hours to get everything installed and working. The description is derived from a Linux install and is generally somewhat terse. It assumes you have all prerequisite tools installed on your Mac, or at least that you are savvy enough to figure out what’s missing and where to get it. I had to dust off some of my Unix skills and go hunting via Google for solutions to a list of install problems:

  • directory permissions (I needed to wrap the exact instructions with sudo)
  • installing Xcode tools from Apple for my platform (make was not preinstalled)
  • understanding cause of error messages (Google searches, Google group on Circos)
  • locating and installing the GD graphics library (helpful installing-circos-on-os-x tips by Paulo Nuin)
  • version and location issues (many libraries are in ongoing development; some sources have moved)

Others may find this part a lot easier, but I would say there should be an extra chapter for the Mac with tips and explanations for some of these speed bumps. On the plus side, the Google group seems to be very active and I found frequent and recent answers by Circos author Martin Krzywinski.

The next part of the book is easy to understand. One creates a simple hair-to-eye color relationship diagram. Then configuration files are introduced to customize colors and chart appearance. All required data and configuration files are also contained in the companion download from the Packt Publishing book page.

Chart of relationship between hair and eye colors

The last part of the book goes into more advanced topics such as customizing labels, links and ribbons, formatting links with rules, reducing links through bundling, and adding data tracks as heat maps or histograms. This is the meat for those who intend to use Circos in more advanced ways. I did not spend a lot of time here, but found the examples to be useful.

Contributions by State and Political party during 2012 U.S. Presidential Elections

This section ends abruptly. One gets the feeling that there are other subtleties that could be explored and explained. A summary or outlook chapter would have been nice to wrap up the book and give perspective. For example, I would have liked to hear from the author how much time he spent with the various features during the college-majors-to-professions project.

In summary: This book will get you going with Circos on your own machine. Installing can be a challenge on a Mac, depending on how familiar you are with Unix and the open source tool stack. The examples for your first Circos charts are easy to follow and explain the data and configuration files. The more advanced features are briefly touched upon, but require more experimentation and time to understand and appreciate.

Circos author Martin Krzywinski writes on his website: “To get your feet wet and hands dirty, download Circos and read the tutorials, or dive into a full course on Circos.” The How-to book by Tom Schenk helps with this process, but you still need to come prepared. If you are a Unix power user this should feel familiar. If you are a Mac user who rarely ever opens a Terminal, then you might be better off just using Circos via the tableviewer web interface.

Lastly, I would recommend buying the electronic version of this book, as you can copy and paste the code and make use of the companion code and documents. A printed version of this book would be of very limited use.

 
Posted on December 6, 2012 in Education, Scientific

 

2012 Election Result Maps

The New York Times has covered the 2012 U.S. presidential election in great detail, including the much heralded FiveThirtyEight blog (named after the 538 electoral votes) by forecaster Nate Silver. His poll-aggregation model has produced very accurate forecasts, calling 99 of 100 state outcomes correctly across the 2008 and 2012 elections combined (49 of 50 in 2008 and all 50 in 2012).

A popular visualization is the map of the 50 states in colors red (Republican) and blue (Democrat) plus green (Independent). Since most states allocate all their electoral votes to the candidate with the most votes in that state, this state map seems the most important.

2012 Election Result By State (Source: NYTimes.com)

This map hardly changed from 2008; only Indiana and North Carolina changed color. Hence the electoral vote result in 2012 (332 Dem, 206 Rep) is similar to that of 2008 (365 Dem, 173 Rep). The visual perception of this map, however, is that there is roughly the same amount of red and blue, with slightly more red than blue. This perception becomes even stronger when looking at the results by county.

2012 Election Results By County (Source: NYTimes.com)

Why is the outcome so strongly in favor of blue (Democrat) when it looks like the majority of the area is red? The answer is found in the very uneven population density of the 50 states. Although the two states are roughly the same size, California’s (slightly more blue) population density is about 40x higher than Montana’s (mostly red). At the extreme end of this scale, the most densely populated state, New Jersey, has about 1000x as many people living per square mile as the least densely populated state, Alaska. Urban areas have a much higher density of voters than rural areas. The demographics are such that urban areas tend to vote more blue (Democrat), while rural areas tend to vote more red (Republican). The size of the colored area in the above chart would only be a good indicator if the population density were uniform. A great way to compensate visually for this difference can be seen in the third chart published by the NYTimes.

2012 Election Delta By County (Source: NYTimes.com)

Now the size of the colored circles is proportional to the number of surplus votes for that color in that county. The few blue circles around most major cities are larger and outweigh the many small red circles in rural areas, both visually and in the numerical total. The original map is interactive, giving tooltips when you hover over the circles. For example, in Los Angeles County alone there were about 1 million more blue (Democrat) votes than red (Republican).
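The sizing rule behind such a surplus-vote map can be sketched in a few lines of Python. This is only an illustration, not the NYTimes implementation: it assumes that the circle area (rather than the radius) is proportional to the surplus votes, and the function name, the 40-pixel maximum radius and the vote counts in the usage note are hypothetical.

```python
import math

def surplus_circle(dem_votes, rep_votes, max_surplus, max_radius=40.0):
    """Return (color, radius) for one county's surplus-vote circle.

    Assumption: circle AREA is proportional to the vote surplus, so the
    radius grows with the square root of the surplus. max_surplus is the
    largest surplus on the map and is drawn with radius max_radius.
    """
    surplus = abs(dem_votes - rep_votes)
    # An exact tie falls through to "red" here, but its radius is 0 anyway.
    color = "blue" if dem_votes > rep_votes else "red"
    radius = max_radius * math.sqrt(surplus / max_surplus)
    return color, radius
```

With made-up counts, surplus_circle(2_000_000, 1_000_000, 1_000_000) gives a blue circle at the full radius, while a county with a quarter of that surplus gets half the radius. Scaling by area rather than radius keeps large counties from looking even more dominant than their vote margin warrants.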

2012 Election in Los Angeles County

This optical summation leads to intuitively correct results for the popular votes. The difference in popular vote was about 3.5 million more blue (Democrat) votes or roughly 3%. We see more blue in this delta circle diagram.

Of course, the president is not elected by the popular vote, but by the electoral votes of each state. So no matter how big the Democrat advantage in California may be, there won’t be more than the 55 electoral votes for California. This winner-take-all dynamic of electoral votes by state leads to the outsized influence of swing states, which are near the 50%-50% mark in the popular vote. A small lead in the popular vote can lead to a large gain in electoral votes. In extreme cases, a candidate can win the electoral vote and become president despite losing the popular vote (as happened in 2000 with George W. Bush’s very narrow win in Florida).

Another variation on this theme of visually combining votes and population density information comes from Chris Howard. (This was referenced in an article on theatlanticcities.com by Emily Badger on the spatial divide of urban vs. rural voting preferences, which has other election maps as well.) The idea is to use shades of blue and red on a county-level map, with darker shades indicating higher population density.

2012 Election by county with shading by population density (Source: Chris Howard)

A final visualization comes from Nate Silver’s blog post on November 8. While the percentages in this result, which was still preliminary at the time, may be slightly off (not all votes had been counted yet), the electoral vote counts remain valid.

2012 Election By State Cumulative (Source: Fivethirtyeight Blog)

It shows which swing state [electoral votes] put the blue ticket over the winning line (Colorado [9]) and which other swing states could have been lost without losing the presidency (Florida [29], Ohio [18], Virginia [13]). It also gives a crude, but somewhat telling indication of where you might want to live if you want to surround yourself with people with blue or red preferences.
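The arithmetic behind this reading of the chart can be checked with a short Python sketch. It uses only the numbers quoted in this post (332 Democratic electoral votes and the four swing-state counts) plus the 270-vote majority of the 538 total; the variable names are my own.

```python
WINNING_LINE = 270      # majority of the 538 electoral votes
DEM_TOTAL_2012 = 332    # Democratic electoral votes in 2012, as quoted in this post

# Swing states that could have been lost without losing the presidency.
spare_states = {"Florida": 29, "Ohio": 18, "Virginia": 13}
# The state that put the blue ticket over the winning line.
tipping_state, tipping_votes = "Colorado", 9

without_spares = DEM_TOTAL_2012 - sum(spare_states.values())
without_tipping = without_spares - tipping_votes

print(without_spares, without_spares >= WINNING_LINE)     # 272 True
print(without_tipping, without_tipping >= WINNING_LINE)   # 263 False
```

Losing all three of Florida, Ohio and Virginia would still have left 272 electoral votes, but losing Colorado as well would have dropped the total to 263, below the winning line.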

 
Posted on November 15, 2012 in Socioeconomic

 

Superstorm Sandy – Visualizing Hurricanes

Time-lapse animation of Sandy Oct-28 from geostationary orbit, 1 frame per minute, 11 hours of daylight. Although “only” a category 1 hurricane, this superstorm has enormous size. Tropical storm force winds extend out over an area 900 miles in diameter.

Living in South Florida makes you alert to tropical storms during the hurricane season from June to November. Exactly 7 years ago, at the end of October 2005, the eye of category 3 hurricane Wilma swept over our home in West Palm Beach in South Florida – the most powerful natural weather event I have ever witnessed. We have avoided a direct hit since then; Isaac brought us a massive rain event earlier this year, but again no direct hit. To be sure, the flooding associated with hurricanes is often worse than the wind damage. For example, when hurricane Katrina hit New Orleans in August 2005, most of the devastation came from flooding after the levees were breached. But the first question is always where the storms will make landfall and how strong they are when they hit your area.

Tropical storms are being tracked and forecast in great detail, in particular by the National Hurricane Center of the National Weather Service. There are many great visualizations illustrating the path, windspeed, rainfall, extent of tropical storm force winds, etc. Due to the convenience for browsing, I have almost completely switched to following hurricane or weather updates from the iPad. (In this case I’m using the Hurr Tracker app from EZ Apps.)

Last week a new tropical storm emerged in the Caribbean and was named ‘Sandy’. A few days ago, with Sandy’s center over the Bahamas, the path looked like this:

Path of hurricane Sandy as of Oct-25 (Hurr Tracker iPad app)

Note the use of color for wind speed and the cone of uncertainty in the lower segment, as well as the rings around the center indicating the size of the area with storm-force winds.

We were naturally curious whether South Florida was likely to get hit; another image gave us some relief:

5 Day tracking map for hurricane Sandy

Now a few days later, while we did get some strong northerly winds and pounding surf leading to beach erosion, Sandy was not a particularly disturbing event for South Florida. At the same time, however, Sandy is forecast to make landfall on the Jersey shore within about 24 hours during the night from Monday to Tuesday.

One interesting set of maps uses a color code to display the probability of an area experiencing winds of a certain speed, say at least tropical storm force winds (>= 39 mph). The following map was issued this afternoon and shows in purple the very large area (mostly offshore) with a near 100% probability of tropical storm force winds.

Tropical storm force wind speed probabilities for hurricane Sandy as of Oct-28

This indicates how large Sandy is: an area the size of Texas with tropical storm force winds! Meteorologists are concerned for the Northeast because Sandy is converging with two other weather events, a storm from the West and cold air coming down from the North. This is expected to intensify the weather system, similar to the Perfect Storm of 1991. Because of the timing around Halloween, Sandy has also been called a ‘Frankenstorm’.

One of the most chilling pictures is this animated GIF from WeatherBELL. A story in the Atlantic earlier today writes this:

Dr. Ryan Maue, a meteorologist at WeatherBELL, put out this animated GIF of the storm’s approach yesterday. “This is unprecedented –absolutely stunning upper-level configuration pinwheeling #Sandy on-shore like ping-pong ball,” he tweeted. It shows how cold air to the north and west of the storm spin Sandy into the mid-atlantic coastline.

(Click the image if the animation doesn’t play in your browser.)

Animation of hurricane Sandy moving into the NorthEast (Source: WeatherBELL)

Understandably, this forecast of superstorm Sandy has the authorities worried. The full moon tomorrow exacerbates the tides, and New York City is expecting a storm surge of up to 11 ft. Cities across the Northeast are taking precautions as of this writing. For example, the New York City subway system is shutting down tonight and several hundred thousand people in low-lying coastal areas are under mandatory evacuation orders. More than 5000 flights to the area on Monday have been cancelled. Take a look at the expected 5 day precipitation forecast in the Northeast. Some areas may get up to 10 inches of rain and/or snow!

5 day precipitation forecast with Sandy’s impact for the Northeast

The first priority is to use such visualizations to communicate the weather impact and allow people to take necessary precautions. One can use similar hurricane charts to visualize other uncertain events, such as the future outcomes of development projects. We will look at this in an upcoming post on this Blog.

 

Addendum 11/4/12: The NYTimes has provided some interactive graphics detailing the location and size of power outages caused by superstorm Sandy in the New York and New Jersey area. The New York City outages have been summarized in this chart, normalized to the percentage of all customers. As can be seen, the efforts to restore power over the first 6 days have been fairly successful, especially in Manhattan and Staten Island, less so in Westchester.

6 day tracking map of power outages caused by Sandy in New York City

 
Posted on October 28, 2012 in Recreational, Scientific

 

Trends in Health Habits across the United States

This week Scientific American published an interesting article about trends in health habits across the United States. The article includes both a large composite chart and a page with an interactive chart. Both are well done and a great example of using a visualization to help tell a story. I personally find the most useful part of the graphic to be the comparison column on the right, with shades of color indicating the degree of improvement (blue) or deterioration (red).

US health habits 1995 vs. 2010 (Source: Scientific American)

From the article:

Americans are imbibing alcohol and overeating more yet are smoking less (black lines in center graphs).

Some of the behaviors have patterns; others do not. Obesity is heaviest in the Southeast (2010 maps). Smoking is concentrated there as well. Excess drinking is high in the Northeast.

Comparing 2010 and 1995 figures provides the greatest insight into trends (maps, far right). Heavy drinking has worsened in 47 states, and obesity has expanded in every state. Tobacco use has declined in all states except Oklahoma and West Virginia. The “good” habit, exercise, is up in many places—even in the Southeast, where it has lagged.

A more detailed visual analysis is possible using the interactive version of these graphs on the related subpage Bad Health Habits are on the rise. Here one can compare up to three arbitrary states against top, median, and bottom performing states by health habit.

The following examples show tobacco use, exercise and obesity by state with line charts for the three arbitrarily selected states of Florida, California and Hawaii.

Tobacco Trend By State

Exercise Trend By State

Obesity Trend By State

Leading the exercise statistics are citizens in states offering attractive outdoor sports opportunities, like Oregon or Hawaii. Such correlation seems intuitive in both causal directions: People interested in exercise tend to move to those states with the most attractive outdoor sports. And people living in those states may end up exercising more due to the opportunity.

When looking at the average trend line, exercise seems to have leveled off after a bump in the early 2000s, whereas the decline in smoking over the last decade continues unabated.

15 years is half a generation. During that time, Americans have smoked less in almost every state and exercised more in many states, but obesity is sharply on the rise in every state! From a health and policy perspective the latter seems to be the most alarming trend. Most people want the next generation to be better off than the previous one. This has to some extent been true with wealth, at least until the great recession of 2008. But these data show that at the population level, more wealth does not necessarily mean more health.

 
Posted on October 19, 2012 in Medical

 

Inequality and the World Economy

The last edition of The Economist featured a 25-page special report on “The new politics of capitalism and inequality” headlined “True Progressivism”. It is the most recommended and commented story on The Economist this week.

We have looked at various forms of economic inequality on this Blog before, as well as other manifestations (market share, capitalization, online attention) and various ways to measure and visualize inequality (Gini-index). Hence I was curious about any new trends and perhaps ways to visualize global economic inequality. That said, I don’t intend to enter the socio-political debate about the virtues of inequality and (re-)distribution policies.

In the segment titled “For richer, for poorer” The Economist explains:

The level of inequality differs widely around the world. Emerging economies are more unequal than rich ones. Scandinavian countries have the smallest income disparities, with a Gini coefficient for disposable income of around 0.25. At the other end of the spectrum the world’s most unequal, such as South Africa, register Ginis of around 0.6.

Many studies have found that economic inequality has been rising over the last 30 years in many industrial and developing nations around the world. One interesting phenomenon is that while the Gini index of many countries has increased, the Gini index of world inequality has fallen. This is shown in the following image from The Economist.

Global and national inequality levels (Source: The Economist)

This is somewhat non-intuitive. Of course the countries differ widely in terms of population size and level of economic development. At a minimum it means that a measure like the Gini index is not simply additive when aggregated over a collection of countries.
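A small Python sketch shows how this can happen. The incomes below are made up purely for illustration (two equally sized "countries" of two people each), and the gini helper is my own implementation of the standard rank-based formula, not code from The Economist.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes (0 = perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum over the ascending incomes, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical incomes: a poor and a rich country, "then" and "now".
poor_then, rich_then = [1, 2], [10, 20]
poor_now, rich_now = [4, 12], [9, 21]   # the poor country has grown strongly

for label, poor, rich in [("then", poor_then, rich_then),
                          ("now", poor_now, rich_now)]:
    print(label,
          "poor:", round(gini(poor), 3),
          "rich:", round(gini(rich), 3),
          "pooled:", round(gini(poor + rich), 3))
```

In this toy example the Gini rises inside both countries, yet the Gini of the pooled population falls from roughly 0.49 to roughly 0.29, because the gap between the two countries shrinks by more than the gaps within them grow.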

Another interesting chart displays a world map with color coding the changes in inequality of the respective country.

Changes in economic inequality over the last 30 years (Source: The Economist)

It’s a bit difficult to read this map without proper knowledge of the absolute levels of inequality, such as we displayed in the post on Inequality, Lorenz-Curves and Gini-Index. For example, a look at a country like Namibia in southern Africa indicates a trend (light-blue) towards less inequality. However, Namibia was for many years the country with the world’s largest Gini (1994: 0.7; 2004: 0.63; 2010: 0.58 according to iNamibia) and hence still has much larger inequality than most developed countries.

World Map of national Gini values (Source: Wikipedia)

So global Gini is declining, while in many large industrial countries Gini is rising. One region where Gini is declining as well is Latin America. Between 1980 and 2000 Latin America’s Gini grew, but in the last decade it has declined back to 1980 levels (~0.5), despite the strong economic growth throughout the region (Mexico, Brazil).

Gini of Latin America over the last 30 years (Source: The Economist)

Much of the coverage in The Economist tackles the policy debate and the questions of distribution vs. dynamism. On the one hand reducing Gini from very large inequality contributes to social stability and welfare. On the other hand, further reducing already low Gini diminishes incentives and thus potentially slows down economic growth.

In theory, inequality has an ambiguous relationship with prosperity. It can boost growth, because richer folk save and invest more and because people work harder in response to incentives. But big income gaps can also be inefficient, because they can bar talented poor people from access to education or feed resentment that results in growth-destroying populist policies.

In other words: Some inequality is desirable, too much of it is problematic. After growing over the last 30 years, economic inequality in the United States has perhaps reached a worrisome level as the pendulum has swung too far. How to find the optimal amount of inequality and how to get there seem like fascinating policy debates to have. Certainly an example where data visualization can help an otherwise dry subject.

 
Posted on October 15, 2012 in Socioeconomic

 

Software continues to eat the world

One year ago Marc Andreessen, co-founder of Netscape and of the venture capital firm Andreessen Horowitz, wrote an essay for the Wall Street Journal titled “Why Software Is Eating The World”. It is interesting to look back at this piece and some of its predictions, made at a time when Internet company LinkedIn had just gone public and Groupon was just filing for an IPO.

Andreessen’s observation was simply this: Software has become so powerful and computer infrastructure so cheap and ubiquitous that many industries are being disrupted by new business models enabled by that software. Examples listed were books (Amazon disrupting Borders), movie rental (Netflix disrupting Blockbuster), the music industry (Pandora, iTunes), animated movies (Pixar), photo-sharing services (disrupting Kodak), job recruiting (LinkedIn), telecommunication (Skype), video gaming (Zynga) and others.

On the infrastructure side one can bolster this argument by pointing at the rapid development of new technologies such as cloud computing or big data analytics. Andreessen gave one example of the cost of running an Internet application in the cloud dropping by a factor of 100 in the last decade (from $150,000 / month in 2000 using LoudCloud to about $1500 / month in 2011 using Amazon Web Services). Microsoft now has infrastructure with Windows Azure where procuring an instance of a modern server at one (or even multiple) data center(s) takes only minutes and costs you less than $1 per CPU hour.

Likewise, the number of Internet users has grown from some 50 million around 2000 to more than 2 billion with broadband access in 2011. This is certainly one aspect fueling the enormous growth of social media companies like Facebook and Twitter. To be sure, not every high-flying startup goes on to be as successful after its IPO. Three months after its IPO, Facebook trades at half its opening-day value. Groupon trades at less than 20% of its IPO value some 9 months ago. But LinkedIn has sustained and even modestly grown its market capitalization. And Google and Apple both trade near or at their all-time highs, with Apple today at $621b becoming the most valuable company of all time (not inflation-adjusted).

The growing dominance and ubiquitous reach of software shows in other areas as well. Take automobiles. Software is increasingly being used for comfort and safety in modern cars. In fact, self-driving cars – once the realm of science fiction such as flying hover cars – are now technically feasible and knocking on the door of broad industrial adoption. After driving 300,000 miles in testing, Google is now deploying its fleet of self-driving cars for the benefit of its employees. Engineers even take self-driving cars to the racetracks, such as up on Pikes Peak or the Thunderhill raceway. Performance is now at the level of very good drivers, with the benefit of not having the human flaws (drinking, falling asleep, texting, showing off, etc.) which cause so many accidents. Expert drivers still outperform the computer-driven cars. (That said, even human experts sometimes make mistakes with terrible consequences, such as this crash on Pikes Peak this year.) The situation is similar to chess in the mid-nineties, when computers became so proficient that finally even the world champion was defeated.

In this post I want to look at some other areas specifically impacting my own life, such as digital photography. I am not a professional photographer, but over the years my wife and I have owned dozens of cameras and have followed the evolution of digital photography and its software. Of course, there is an ongoing development towards chips with higher resolution and lenses with better optics and faster controls. But the major innovation comes from better software. Take High Dynamic Range (HDR), which compensates for stark contrast in lighting, such as in a portrait photo against a bright background. Or take stitching multiple photos together into a panorama, with Microsoft’s PhotoSynth taking this to a new level by building 3D models from multiple shots of a scene.

One recent innovation comes in the form of the new Sony RX100 camera, which science writer David Pogue raved about in the New York Times as “the best pocket camera ever made”. My wife bought one a few weeks ago and we both have been learning all it can do ever since. Despite the many impressive features and specifications about lens, optics, chip, controls, etc., what I find most interesting is the software running on such a small device. The intelligent Automatic setting will decide most settings for your everyday use, while one can always direct priorities (aperture, shutter, program) or manually override most aspects. There are a great many menus, and learning to use all the capabilities of this extremely feature-rich camera is not trivial. Some examples of the more creative software come in modes such as ‘water color’ or ‘illustration’. The original image is processed right then and there to generate effects as if it were a painting or a drawing. Both original and processed photo are stored on the mini-SD card.

Flower close-up in ‘illustration’ mode

One interesting effect is to filter to just one of the main colors (Yellow, Red, Green, Blue). Many of these effects are shown on the display, with the aperture ring serving as a flexible multi-functional dial for more convenient handling with two hands. (Actually, the camera body is so small that it is a challenge to use all dials while holding the device; just like the BlackBerry keyboard made us write with two thumbs instead of ten fingers.) The point of such software features is not so much that they are radically new; you have been able to do this with good photo-editing software for many years. The point is that with the ease and integration of having them at your fingertips you are much more likely to use them.

Example of suppressing all colors except yellow

The camera allows you to register faces and detects those in images. You can set it up such that it will take a picture only when it detects a small/medium/large smile on the subject being photographed. One setting allows you to take a self-portrait, with the timer starting to count down as soon as the camera detects one (or two) faces in the picture! It is an eerie experience when the camera starts to “understand” what is happening in the image!

There is an automatic panorama stitching mode where you just hold the button and swipe the camera left-right or up-down while the camera takes multiple shots. It automatically stitches them into one composite, so no more uploading of the individual photos and stitching on the computer required.

Beach panorama stitched on the camera using swipe-&-shoot

I have been experimenting with panorama photos since 2005 (see my collection or my Panoramas from the Panamerican Peaks adventure). It’s always been somewhat tedious and results were often mixed (lens distortions, lighting changes sun vs. cloud or objects moving during the individual frames, not holding the camera level, skipping a part of the horizon, etc.) despite crafty post-processing on the computer with image software. I have read about special 360 degree lenses to take high-end panoramas, but who wants to go to those lengths just for the occasional panorama photo? From my experience, nothing moves the needle as much as the ease and integration of taking panoramas right in the camera as the RX100 does.

Or take the field of healthcare. Big Data, Mobility and Cloud Computing make possible entirely new business models. Let’s just look at mobility. The smartphone is evolving into a universal healthcare device for measuring, tracking and visualizing medical information. Since many people have their smartphone with them at almost all times, one can start tracking and analyzing personal medical data over time. And for almost any medical measurement, “there is an app for that”. One interesting example is this optical heart-rate monitor app Cardiio for the iPhone. (Cardio + IO ?)

Screenshots of Cardiio iPhone app to optically track heart rate

It is amazing that this app can track your heart rate just by analyzing the changes of light reflected from your face with its built-in camera. Not even a plug-in required!

Another system comes from Withings, this one designed to turn the iPhone into a blood pressure monitor. A velcro sleeve with battery mount and cable plugs into the iPhone and an app controls the inflation of the sleeve, the measurement and some simple statistics.

Blood pressure monitor system from Withings for iPhone

Again, it’s fairly simple to just put the sleeve around one upper arm and push the button on the iPhone app. The results are systolic and diastolic blood pressure readings and heart rate.

Sample blood pressure and pulse measurement

Like many other monitoring apps this one also keeps track of the readings and does some simple form of visual plotting and averaging.

Plot of several blood pressure readings

There is also a separate app which will allow you to upload your data and create a more comprehensive record of your own health over time. Withings provides a few other medical devices such as scales to add body weight and body fat readings. The company tagline is “smart and connected things”.

One final example is an award-winning contribution from a student team from Australia called StethoCloud. This system is aimed at diagnosing pneumonia. It consists of an app for the iPhone, a simple stethoscope plug-in for the iPhone and, on the back-end, some server-based software analyzing the measurements in the Windows Azure cloud according to standards defined by the World Health Organization. The winning team (in Microsoft’s 2012 Imagine Cup) built a prototype in only 2 weeks and needed only a minimal upfront investment.

StethoCloud system for iPhone to diagnose pneumonia

This last example perhaps best illustrates the opportunities of new software technologies to bring unprecedented advances to healthcare – and to many other fields and industries. I think Marc Andreessen was spot on with his observation that software is eating the world. It certainly is in my world.

 

Posted by on August 20, 2012 in Industrial, Medical, Socioeconomic

 


Olympic Medal Charts


The 2012 London Olympic Games ended this weekend with a colorful closing ceremony. Media coverage was unprecedented, with other forms of competition around who had the most social media presence or which website had the best online coverage of the games.

In this post I’m looking at the medal counts over the history of the Olympic Games (summer games only: 27 Games over the last 116 years, with none held in 1916, 1940, and 1944). In London, nearly 11,000 athletes from 205 countries competed for more than 900 medals in 302 events. The New York Times has an interactive chart of the medal counts on their London 2012 Results page:

Bubble size represents the number of medals won by the country, bubble position is roughly based on a world map and bubble color indicates the continent. Moving the slider to a different year changes the bubbles, which gives a dynamic grow or shrink effect.

Below this chart is a table listing all gold, silver, bronze winners for each sport in that year, grouped by type of sport such as Gymnastics, Rowing or Swimming. Selecting a bubble will filter this to entries where the respective country won a medal. This shows the domination of some sports by certain countries, such as Diving (8 events, China won 6 gold and 10 total medals) or Cycling – Track (10 events, Great Britain won 7 gold and 9 total medals). In two sports, domination by one country was 100%: Badminton (5 events, China won 5 gold and 8 total medals), Table Tennis (4 events, China won 4 gold and 6 total medals).

There is also a summary table ranking the countries by total medals. For 2012, the United States clearly won that competition, winning more gold medals (46) than all but 3 other countries (China, Russia, Britain) won total medals.

Top 10 countries for medal count in 2012

Of course countries vary greatly by population size. It is remarkable that a relatively small nation such as Jamaica (~2.7 million) won 12 medals (4, 4, 4), while India (~1.25 billion) won only 6 medals (0, 2, 4). In that sense, Jamaica is about 1000x more medal-decorated per population size than India! In another New York Times graphic there is an option to compare medal counts adjusted for population size, i.e. with the medal count normalized to a standard population size of, say, 100 million.

Directed graph comparing medal performance adjusted for country size

Selecting any node in this graph will highlight countries with better, worse or comparable relative medal performance. (There are different ways to rank based on how different medals are weighted.)
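The population adjustment described above is simple enough to sketch in a few lines of Python. The function name below is my own, the population figures are the rough ones quoted in this post, and the 100 million reference population is just the example size mentioned above, so treat this as an illustration rather than the method used by the New York Times.

```python
def medals_per_100m(medals, population):
    """Scale a medal count to a reference population of 100 million."""
    return medals / population * 100_000_000


# Rough figures quoted in this post (2012 medals, approximate populations).
jamaica = medals_per_100m(12, 2_700_000)
india = medals_per_100m(6, 1_250_000_000)

# How many times more medals per capita Jamaica won compared to India.
ratio = jamaica / india

print(f"Jamaica: {jamaica:.1f} medals per 100 million people")
print(f"India:   {india:.2f} medals per 100 million people")
print(f"Ratio:   {ratio:.0f}x")
```

With these inputs the ratio comes out somewhat above 900, which is where the "about 1000x" figure comes from.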

The Guardian Data Blog has taken this a step further and written a piece called “alternative medals table”. This post not only discusses multiple factors like population, GDP, or number of athletes and how to deal with them statistically; it also provides all the data and many charts in a Google Docs spreadsheet. One article combines GDP adjustment with cartographical mapping across Europe:

Medals GDP Adjusted and mapped for Europe

If you want to do your own analysis, you can get the data in shared spreadsheets. To do a somewhat more historic analysis, I used a different source, namely Wolfram’s curated data source accessible from within Mathematica. Of course, once you have all that data, you can examine it in many different directions. Did you know that 14,853 Olympic medals have been awarded so far in 27 summer Olympiads? The average was 550 medals per Games, growing by about 29 medals from one Games to the next, with nearly 1000 awarded in each of 2008 and 2012.

A lot of attention was paid to who would win the most medals in London. China seemed in contention for the top spot, but in the end the United States won the most medals, as it did in the last 5 Olympiads. Only 7 countries have ever won the most medals at an Olympiad. Greece (1896), France (1900), the United Kingdom (1908), Sweden (1912), and Germany (1936) did so just once. The Soviet Union (which no longer exists) did it 8 times. And the United States did it 14 times. China, which has only participated since 1984, has yet to win the most medals at any Olympiad.

Aside from the top rank, I was curious about the distribution of medals over all countries. The numbers of both nations and events have increased, as is shown in the following paired bar chart:

Number of participating nations and total medals per Summer Games

The number of nations grew steadily with only two exceptions, during the thirties and the seventies; presumably economic hardship meant many nations could not afford to participate. 1980 also saw the boycott of the Moscow Games by the United States and several other delegations over geopolitical disagreements. At just over 200, the number of nations seems to have stabilized.

The number of medals depends primarily on the number of events at each Olympiad. This year there were 302 events in 26 sports. The total medal count isn’t necessarily exactly triple that, since some events award more than one bronze (such as in Judo, Taekwondo, and Wrestling). Case in point: in 2012 there were 968 medals awarded, 62 more than 3 × 302 = 906.

What is the distribution of those medals over the participating nations? One measure would be the percentage of nations winning at least some medals. Another measure showing the degree of inequality in a distribution is the Gini index. Here I plotted the percentage of nations medaling and the Gini index of the medal distribution over all participating nations for every Olympiad:

Percentage and Gini-Index of medal distribution by nations

Up until 1932, three out of four nations won at least some medals. Then the percentage dropped to around 40%, and lower still since the sixties. That means 6 of 10 nations go home without any medals. During the same time period the inequality grew from a Gini of about .65 to near .90. One exception was the Third Games in 1904 in St. Louis. With only 13 nations competing, the United States dominated so many sports that the result was an extreme Gini of .92. All of the last five Games resulted in a Gini of about .86, so this still very large degree of medal-winning inequality seems to have stabilized.
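For readers who want to reproduce these two measures, here is a minimal Python sketch. It is not the Mathematica code behind the chart above. It uses one common textbook formula for the Gini index on sorted values, and the medal counts in the example are made up purely for illustration.

```python
def share_medaling(medal_counts):
    """Fraction of participating nations that won at least one medal."""
    winners = sum(1 for m in medal_counts if m > 0)
    return winners / len(medal_counts)


def gini(medal_counts):
    """Gini index: 0 means medals are spread evenly, values near 1 mean
    a few nations hold almost all of them."""
    xs = sorted(medal_counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical Games with ten nations; these counts are NOT real data.
example = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]

print(f"Nations medaling: {share_medaling(example):.0%}")
print(f"Gini index:       {gini(example):.2f}")
```

Note that nations with zero medals must be included in the list, since they are exactly what drives the index up.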

It would be interesting to extend this to the level of participating athletes. Of course we know which athlete ranks at the top as the most decorated Olympic athlete of all time: Michael Phelps with 22 medals.

 

Posted by on August 15, 2012 in Recreational

 


 