Category Archives: Scientific

Visualizing Voting Preferences for World Values

The other day I listened to a presentation by Melinda Gates prepared for the United Nations to deliver an update about progress towards the Millennium Development Goals (MDG). The eight goals of the MDG had been embraced by the UN back in 2000 with a target date of 2015. So it is reasonable to ask whether the world is on track to reach each of these eight goals. To summarize, from the MDG Wikipedia page:

  1. Eradicating extreme poverty and hunger
  2. Achieving universal primary education
  3. Promoting gender equality and empowering women
  4. Reducing child mortality rates
  5. Improving maternal health
  6. Combating HIV/AIDS, malaria, and other diseases
  7. Ensuring environmental sustainability
  8. Developing a global partnership for development

A good listing of reports, statistics and updates can be found on the UN website here.

Sample Vote for 6 of 16 MDG choices

At the end of Melinda’s presentation is a link to a UN global survey on the MDG goals after 2015. I took this survey and found the visualization of voting results quite interesting. First, one is asked to select six out of a list of sixteen (6 of 16) goals which one thinks have the highest impact for a future better world. (The survey methodology is described in more detail here.) Here is a sample vote:

A nice touch is that for each of the sixteen goals there is a different color and when you check that goal, one of the sixteen areas on the stylized globe is filled with that color. Personal data such as name is optional, but some demographic information is required, including age, gender, educational level and country. Next, one can look at a summary of all currently tallied votes and compare them interactively to one’s own vote (checkmarks on the right).

WorldVoteOverview

It is perhaps not surprising that I voted very similarly to others in similar demographic cohorts.

  • Country: I picked five of the Top five goals like all other voters living in the US. I included ‘Political freedoms’ in my top six, which in the US only ranks 11th.
  • Age: I shared five of the Top six goals with people in my age group (world-wide). The one I did not check was ranked 4th (Better job opportunities). When you mouse over one of the goals, the display changes to highlight this goal in all columns:
Interactive Vote Analysis with highlighted goal

  • Gender: Here I picked four of the Top five goals (did not include the ‘Better job opportunities’).
  • Education: I voted very similarly to people with very high HDI (Human Development Index, a visualization of which we covered in a previous post), sharing five of the Top six.

From the above, it seems somewhat surprising that voters in the US did not ascribe a higher value to ‘Better job opportunities’, given how much economic values and topics like unemployment seem to dominate the media. That said, these votes should be a reflection of which goals are most valuable for making the world a better place – not just your own home country. Worldwide it seems that other, more fundamental goals are judged by voters in the US to be more important than ‘Better job opportunities’.

Another chart on the results page shows a heat map of the world’s countries based on how many votes have been submitted. I thought it was interesting that Ghana had submitted about twice as many votes as all of the US, and Nigeria about 7x as many. The country with the most voters at this time is India, but not far ahead of Nigeria.

CountryTotals

A fairly useless dynamic animation in this map is a map pin drop of four people who voted similarly to me. I found this too anecdotal to be of any real interest, and it was downright annoying that I couldn’t turn it off and just focus on the vote heat map. The map could also show more useful information: for example, the total number of votes should be displayed in the legend. I vaguely remember that it was several hundred thousand from 194 countries prior to starting the survey, but couldn’t get that data to display again without clicking on Vote Again:

MyWorldVotes

 

Posted by on September 21, 2013 in Education, Medical, Scientific

 

Circos Data Visualization How-to Book

Earlier this year we looked at a powerful data visualization tool called Circos, developed by Martin Krzywinski from the British Columbia Genome Science Center. The previous post looked at an example of how this tool can be used to show complex connectivity pathways in the human neocortex, so-called Connectograms.

Circos Book Cover

The Circos tool can be used interactively on the above website. In that mode you upload jobs via tabular data and configuration files and have some limited control over the rendering of the resulting charts. For full expressive power and flexibility, Circos can also be downloaded freely and used on your own computer, with extensive customization control over the resulting charts.

I have been asked to review a new book titled “Circos Data Visualization How-to“, published by Packt Publishing here. Its main goal is to guide you through the above download and installation process and get you started with Circos charts and their modification. Here is a brief review of this book.

Although originally developed for visualizing genomic data, Circos has been applied to many other complex data visualization projects, including in the social sciences. One such study was done by Tom Schenk, who analyzed the relationships between college majors and the professions those graduates ended up in. It appears as if this work inspired the author to write this book to help others with using Circos.

I downloaded the book in Kindle format and read it on the Mac due to the color graphics and the much larger screen size. It’s well structured and around 70 pages in printed form. The book focuses first on the download and install part, then has a series of examples from first chart to more complex ones using customization such as colors, ribbons, heat maps or dynamic binding.

Flow Chart for creation of Circos charts

Circos is essentially a set of Perl modules combined with the GD graphics library.

The first part is on Installing Circos, with a chapter each on Windows 7 and on Linux or Mac OS. Working on a Mac, I went the latter route. I ended up right in the weeds and it took me about 4 hours to get everything installed and working. The description is derived from a Linux install and is generally somewhat terse. It assumes you have all prerequisite tools installed on your Mac or at least that you are savvy enough to figure out what’s missing and where to get it. I had to dust off some of my Unix skills and go hunting for solutions via Google to a list of install problems:

  • directory permissions (I needed to wrap the exact instructions with sudo)
  • installing Xcode tools from Apple for my platform (make was not preinstalled)
  • understanding cause of error messages (Google searches, Google group on Circos)
  • locating and installing the GD graphics library (helpful installing-circos-on-os-x tips by Paulo Nuin)
  • version and location issues (many libraries are in ongoing development; some sources have moved)

Others may find this part a lot easier, but I would say there should be an extra chapter for the Mac with tips and explanations for some of these speed bumps. On the plus side, the Google group seems to be very active and I found frequent and recent answers by Circos author Martin Krzywinski.

The next part of the book is easy to understand. One creates a simple hair-to-eye color relationship diagram. Then configuration files are introduced to customize colors and chart appearance. All required data and configuration files are also contained in the companion download from the Packt Publishing book page.

Chart of relationship between hair and eye colors

The last part of the book goes into more advanced topics such as customizing labels, links and ribbons, formatting links with rules, reducing links through bundling, and adding data tracks as heat maps or histograms. This is the meat for those who intend to use Circos in more advanced ways. I did not spend a lot of time here, but found the examples to be useful.

Contributions by State and Political party during 2012 U.S. Presidential Elections

This section ends abruptly. One gets the feel that there are other subtleties that could be explored and explained. A summary or outlook chapter would have been nice to wrap up the book and give perspective. For example, I would have liked to hear from the author how much time he spent with various features during the college major to professions project.

In summary: This book will get you going with Circos on your own machine. Installing can be a challenge on Mac, depending on how familiar you are with Unix and the open source tool stack. The examples for your first Circos charts are easy to follow and explain data and configuration files. The more advanced features are briefly touched upon, but require more experimentation and time to understand and appreciate.
Circos author Martin Krzywinski writes on his website: “To get your feet wet and hands dirty, download Circos and read the tutorials, or dive into a full course on Circos.” The How-to book by Tom Schenk helps with this process, but you still need to come prepared. If you are a Unix power user this should feel familiar. If you are a Mac user who rarely ever opens a Terminal then you might be better off just using Circos via the tableviewer web interface.
Lastly, I would recommend buying the electronic version of this book, as you can cut and paste the code and leverage the companion code and documents. A printed version of this book would be of very limited use.

 

Posted by on December 6, 2012 in Education, Scientific

 


Superstorm Sandy – Visualizing Hurricanes

Time-lapse animation of Sandy Oct-28 from geostationary orbit, 1 frame per minute, 11 hours of daylight. Although “only” a category 1 hurricane, this superstorm has enormous size. Tropical storm force winds extend out over an area 900 miles in diameter.

Living in South Florida makes you alert to tropical storms during hurricane season from June to November. Exactly 7 years ago, at the end of October 2005, the eye of category 3 hurricane Wilma swept over our home in West Palm Beach in South Florida – the most powerful natural weather event I have ever witnessed. We have avoided a direct hit since then; we got a massive rain event from Isaac earlier this year, but again no direct hit. To be sure, often the flooding associated with hurricanes is worse than the wind damage. For example, when hurricane Katrina hit New Orleans in August 2005, most of the devastation came from flooding after the levees were breached. But the first question is always where the storms will make landfall and how strong they are when they hit your area.

Tropical storms are tracked and forecast in great detail, in particular by the National Hurricane Center of the National Weather Service. There are many great visualizations illustrating the path, wind speed, rainfall, extent of tropical storm force winds, etc. Due to the convenience for browsing, I have almost completely switched to following hurricane or weather updates from the iPad. (In this case I’m using the Hurr Tracker app from EZ Apps.)

Last week a new tropical storm emerged in the Caribbean and was named ‘Sandy’. A few days ago with Sandy’s center over the Bahamas, the path looked like this:

Path of hurricane Sandy as of Oct-25 (Hurr Tracker iPad app)

Note the use of color for wind speed and the cone of uncertainty in the lower segment, as well as the rings around the center indicating the size of the area with storm-force winds.

We were naturally curious whether South Florida was likely to get hit, and another image gave us some relief:

5 Day tracking map for hurricane Sandy

Now a few days later, while we did get some strong northerly winds and pounding surf leading to beach erosion, Sandy was not a particularly disturbing event for South Florida. At the same time, however, Sandy is forecast to make landfall on the Jersey shore within about 24 hours during the night from Monday to Tuesday.

One interesting set of maps uses a color code to display the probability of an area experiencing winds of a certain speed, say at least tropical storm force winds (>= 39 mph). The following map was issued this afternoon and indicates in purple the very large area (mostly offshore) with near 100% probability of exceeding tropical storm force winds.

Tropical storm force wind speed probabilities for hurricane Sandy as of Oct-28

This indicates how large Sandy is – an area the size of Texas with tropical storm force winds! Meteorologists are concerned for the Northeast due to Sandy converging with two other weather events, a storm from the West and cold air coming down from the North. This is expected to intensify the weather system, similar to the Perfect Storm of 1991. Because of the timing around Halloween, Sandy has also been called a ‘Frankenstorm’.

One of the most chilling pictures is this animated GIF from WeatherBELL. A story in the Atlantic earlier today writes this:

Dr. Ryan Maue, a meteorologist at WeatherBELL, put out this animated GIF of the storm’s approach yesterday. “This is unprecedented –absolutely stunning upper-level configuration pinwheeling #Sandy on-shore like ping-pong ball,” he tweeted. It shows how cold air to the north and west of the storm spin Sandy into the mid-atlantic coastline.

(Click the image if the animation doesn’t play in your browser.)

Animation of hurricane Sandy moving into the NorthEast (Source: WeatherBELL)

Understandably this forecast of superstorm Sandy has the authorities worried. The full moon tomorrow exacerbates the tides and New York City is expecting up to 11 ft storm surge. Cities across the Northeast are taking precautions as of this writing. For example, the New York City subway metro transit system is shutting down tonight and several hundred thousand people in low-lying coastal areas are under mandatory evacuation order. More than 5000 flights to the area on Monday have been cancelled. Take a look at the expected 5 day precipitation forecast in the Northeast. Some areas may get up to 10 inches of rain and/or snow!

5 day precipitation forecast with Sandy’s impact for the Northeast

The first priority is to use such visualizations to communicate the weather impact and allow people to take necessary precautions. One can use similar hurricane charts to visualize other uncertain events, such as the future outcomes of development projects. We will look at this in an upcoming post on this Blog.

 

Addendum 11/4/12: The NYTimes has provided some interactive graphics detailing the location and size of power outages caused by superstorm Sandy in the New York and New Jersey area. The New York City outages have been summarized in this chart, normalized to the percentage of all customers. As can be seen, the efforts to restore power over the first 6 days have been fairly successful, especially in Manhattan and Staten Island, less so in Westchester.

6 day tracking map of power outages caused by Sandy in New York City

 

Posted by on October 28, 2012 in Recreational, Scientific

 


Keystroke Biometrics using Mathematica

A few weeks ago Paul-Jean Letourneau posted an article on Wolfram’s Blog about using Mathematica to collect and analyze keystroke metrics as a way to identify individuals. The article analyzes how you type, measuring the time intervals between your typing the individual characters using a little interactive widget, collecting and visualizing the data while you repeatedly type in the word “wolfram”.

Keystroke metrics of 50 trials typing the word “wolfram”

 

It is somewhat interesting at this point to analyze one’s own typing style. For example, there appears to be a bi-modal distribution of the time intervals between keystrokes, with the sequence “r-a” taking me almost twice as long (~130ms) as most other sequences (~60-70ms). There is also a ‘learning’ effect visible in my 50 trials, where the speed improves noticeably after about 20 repetitions or so. However, there are occasional relapses into a much slower typing pattern throughout the rest of the trials.
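To make the measurement concrete, here is a minimal Python sketch (not the author's Mathematica code) of how one trial turns into six intervals. The timestamps are made-up values chosen only to resemble the pattern described above, and the function names are my own.

```python
WORD = "wolfram"


def key_intervals(timestamps_ms):
    """Return the time gaps between consecutive key presses."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def label_intervals(word, intervals):
    """Attach each interval to its character pair, e.g. 'r-a'."""
    pairs = [a + "-" + b for a, b in zip(word, word[1:])]
    if len(pairs) != len(intervals):
        raise ValueError("need exactly one interval per character pair")
    return dict(zip(pairs, intervals))


# Hypothetical key-press times (in ms) for one trial of typing "wolfram".
trial = [0, 65, 130, 195, 260, 390, 455]

labeled = label_intervals(WORD, key_intervals(trial))
slowest_pair = max(labeled, key=labeled.get)

print(labeled)
print("slowest pair:", slowest_pair)
```

A seven-letter word always yields six intervals, which is why each trial can later be treated as a point in six-dimensional space.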

However, what I thought was more interesting is the subsequent analysis the author did across a set of 42 such series he obtained from his colleagues (noting humorously that “it just so happens that Wolfram is a company full of data nerds”). He then proceeds to analyze and visualize that data in various ways.

Distribution Histogram of keystroke intervals

He observes the bimodal nature of the distribution with peaks around 75ms and 150ms for different pairs of characters. In fact, averaging over all those pair typing times, a correlation is found indicating that when people type slower they are more consistent.

(Negative) Correlation of pairwise typing speed and consistency

The analysis continues with the observation that each measurement can be seen as a point in a six-dimensional space (six pair-transitions in a word with seven characters). When a person types this same word 50 times you get a cluster of 50 points in six-dimensional space. Different individuals will produce different clusters. So one can use the (built-in) function FindClusters to determine such clusters. However, since people have a certain amount of inconsistency in their typing, it is possible that sometimes one person’s typing will show up in another person’s cluster and vice versa. To measure how well the clusters distinguish individuals, one can implement various measures. The author implements the Rand index, a measure of the similarity between two data clusterings. This gives a numeric accuracy on a scale from 0 to 1 for the ability to distinguish between two people. Across 42 people there are 21*41=861 different unordered pairs, but the author chose to look at all 42*42=1764 ordered pairs, because the FindClusters results depend on the order of the input data, so Rand[i,j] may differ from Rand[j,i]. This yields the following histogram of Rand quality scores:
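For readers unfamiliar with the Rand index, the following is a small Python sketch of the textbook definition: the fraction of point pairs on which two clusterings agree. It is an illustration only; the author's Mathematica implementation may differ in detail, and the labels below are invented.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs that both clusterings treat the same way.

    A pair counts as an agreement when it is grouped together in both
    clusterings or kept apart in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: six trials, three typed by each of two people.
true_typist = [0, 0, 0, 1, 1, 1]
found_cluster = [0, 0, 1, 1, 1, 1]  # one trial lands in the wrong cluster
score = rand_index(true_typist, found_cluster)

# Pair counts quoted in the text for 42 people.
n_people = 42
unordered_pairs = n_people * (n_people - 1) // 2
ordered_pairs = n_people * n_people

print(round(score, 3), unordered_pairs, ordered_pairs)
```

In this toy case a single misplaced trial already lowers the score to 10 agreeing pairs out of 15, or about 0.67.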

Histogram of Rand quality score for all pairs

This clearly shows that keystroke metrics for one word are not sufficient to reliably distinguish between arbitrary pairs of people. The average quality score is only 0.67. On the other hand, about 400 (~23%) of those quality scores are a perfect 1.0, so for about a quarter of the pairs this one word alone would suffice to reliably distinguish the two people typing. About half as many scores are 0.0, meaning that the clusters overlap so much that no distinction is possible. The remaining scores are distributed mostly between 0.5 and 1.0, meaning you would just guess right more often than wrong.

The author wraps up the post with this paragraph:

Using this fun little typing interface, I feel like I actually learned something about the way my colleagues and I type. The time to type two letters with the same finger on the same hand takes twice as long as with different fingers. The faster you type, the more your typing speed will fluctuate. The more your typing speed fluctuates, the harder it will be to distinguish you from another person based on your typing style. Of course we’ve really just scratched the surface of what’s possible and what would actually be necessary in order to build a keystroke-based authentication system. But we’ve uncovered some trends in typing behavior that would help in building such a system.

An interactive CDF widget embedded in the article allows you to collect and visualize the timing of your own typing. Source code as well as the test data is also shared if you want to further explore the details of this interesting analysis.

 

Posted by on July 20, 2012 in Linguistic, Scientific

 


Visualign Blog – View Stats for first year and a half

I started this Data Visualization Blog back at the end of May 2011. WordPress provides decent analytics to measure things like views, referrer, clicks, etc. The built-in stats show bar charts by day/week/month, views by country, top posts and pages, search engine terms, comments, followers, tags and so on. I have accumulated the view data and wanted to share some analysis thereof.

At this point there are 17,000 views and 56 posts (about 1 post per week). The weekly views have grown as follows:

Weekly Views of Visualign Blog

The WordPress dashboard for monthly views looks like this:

Assuming an exponential growth process, this amounts to a doubling roughly every 3 months. This may not sound like much, but if it were to continue, it would lead to a 16x increase per year or a 4096x increase in 3 years. Throughout the first year this model has been fairly accurate and allowed me to predict when certain milestones would be reached (such as 10k views, reached in Apr-2012, or 100k views, predicted by Jan-2013).
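The arithmetic behind these numbers can be sketched in a few lines of Python. This assumes a clean doubling every 3 months applied to the cumulative view count, which is a simplification of whatever trend line the charts actually used; the function names are my own.

```python
import math


def growth_factor(months, doubling_months=3.0):
    """Multiplicative growth after `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(target, current, doubling_months=3.0):
    """Months needed to grow from `current` to `target` views."""
    return doubling_months * math.log2(target / current)


per_year = growth_factor(12)      # four doublings: 2**4
three_years = growth_factor(36)   # twelve doublings: 2**12

# Starting from the roughly 17,000 views mentioned above.
months_to_100k = months_to_reach(100_000, 17_000)

print(per_year, three_years, round(months_to_100k, 1))
```

Under these assumptions the 100k milestone would be between 7 and 8 months away, which is in the same range as the Jan-2013 prediction.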

However, the underlying process is not a simple exponential growth process. Instead it is the result of multiple forces, some increasing, some decreasing, such as level of interest of fresh content for target audience, rather short half-life of web content, size of audience, frequency of emails or tweets with links to the content etc. So I expect growth to slow down and consequently the 100k views milestone to be pushed out past Jan-2013.

Views come from some 112 countries, albeit very unevenly distributed.

Views by Country (10244 views since Feb-25, 2012)

The Top 2 countries (United States and United Kingdom) contribute nearly half of the views, the Top 10 (9%) countries nearly 75% of all views. The fairly high Gini index of this distribution (~0.83) indicates strong dependency on just a few countries. The only surprise for me in the Top 10 list was South Korea, ranking fifth and slightly ahead of India. Germany is probably a bit over-represented due to my German business partner (RapidBusinessModeling) and related network.
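For those who want to compute a Gini index for their own stats, here is a minimal Python sketch using the standard rank-based formula. The view counts in the example are invented for illustration and are not this Blog's data.

```python
def gini(values):
    """Gini index of non-negative values: 0 = perfectly even,
    approaching 1 = concentrated in a single entry."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical views per country: two large countries and a long tail.
views_by_country = [4000, 900, 300, 200, 150, 100] + [5] * 106

print(round(gini(views_by_country), 2))
print(gini([5, 5, 5, 5]))  # perfectly even distribution
```

With many countries contributing only a handful of views each, the index climbs quickly towards 1 even if the top few countries are not extreme outliers.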

Views by country with Top 10 list

One interesting analysis comes from looking at the distribution of views over weekdays. Not every weekday is the same. Thursday is the busiest day, Saturday the quietest. After a little more than one year, averaging over some 56 weeks, the distribution looks like this.

Weekday variation of Blog views averaged over 1st year

Of course, time zone boundaries may cause some distortions here, but it looks like the view activity builds during the week until it hits a peak on Thursday. Then it falls sharply to a low on Saturday, and builds from there again. This fits with intuition: One would expect the weekend days to be low as well as Monday and Friday to be lower than the mid-week days. It’s tempting to correlate that with the amount of work or research getting done by professionals. The underlying assumption is that people discover or revisit my Blog when it fits into their work.

A large fraction (> 65%) of referrals comes from search engines. Within those, it’s mostly Google (>90% summed across many countries) with just a small amount of others like Bing. It’s safe to say that without Google search my Blog would have practically no views. Chances are that your first exposure to this Blog came from a Google search as well. One unexpected insight for me was to see a high ratio of image to text searches, typically 3:1 or 4:1. In some ways it shouldn’t be surprising that a blog on data visualizations gets discovered more often by searching for visual elements than for text. It also jibes with the enormous growth of image related sites such as Instagram or Pinterest. I just would not have expected the ratio to be that high.

The beginning is always slow. But any exponential growth sooner or later leads to rather large numbers. So the real question is how one can keep the exponential growth process going. I’d love to hear your comments. If you want to compare this against your own Blog stats, I have shared the underlying data as a Google doc here. I have no idea how this compares to other blog stats in similar domains. If you know of any other public Blog stats analysis, please comment with a pointer below. Thanks.

Addendum 7/11/2012: Today my Blog reached 20,000 views. I noticed over the last few weeks that the deviation from an exponential growth model was getting quite large. For an exponential trend line R² = 0.9886.

Daily views with 20,000 total view milestone

Modeling the weekly views with a linear growth rate instead gives the total views quadratic growth. Curve fitting the total views with a 2nd order polynomial yields a very good fit (R² = 0.9977).
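Why linear weekly views imply quadratic totals can be checked with a short Python sketch. The starting level and weekly increase below are made-up parameters, not values fitted to this Blog's data.

```python
from itertools import accumulate


def weekly_views(week, start=50.0, slope=10.0):
    """Hypothetical linear model: views in a given week (week 1, 2, ...)."""
    return start + slope * week


def closed_form_total(n, start=50.0, slope=10.0):
    """Sum of the linear model over weeks 1..n, a quadratic in n."""
    return start * n + slope * n * (n + 1) / 2


weeks = range(1, 57)  # about 56 weeks, as in the text
weekly = [weekly_views(w) for w in weeks]
totals = list(accumulate(weekly))

# Second differences of a quadratic sequence are constant (the slope).
second_diffs = [totals[i + 2] - 2 * totals[i + 1] + totals[i]
                for i in range(len(totals) - 2)]

print(totals[-1], closed_form_total(56), set(second_diffs))
```

The running total matches the closed-form quadratic, and its second differences are all equal to the weekly increase, which is the signature of quadratic growth.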

Total views growth curve with quadratic curve fit

Linear growth of weekly views is compatible with approximately linear increase in content (steady frequency of about 1 post / week) and thus increased chance of Google search indexing new content (with Google search the main source of view traffic). Quadratic growth of total views is also nonlinear, but far slower than exponential growth. For example, the 100,000 view milestone is now projected to be reached in 08/2013 instead of in 01/2013, i.e. in 13 months as compared to 7 months.

Addendum 11/1/2012: The Blog reached 30,000 views on Oct-19 and here is a chart of the monthly views through Oct-2012:

Monthly Blog views through Oct-2012

August and September have been slow, presumably seasonal variation. I also didn’t post between late August and mid October. The view data of the last couple of months no longer support the theory of significant growth in view frequency. Instead, multiple dynamic factors come into play. At times views spike due to a mention or a post of temporary interest – such as the recent post on visualizing superstorm Sandy. But such spikes quickly fade away according to the very limited half-life of web information these days. The undulating 4 week trailing average in weekly views below visualizes this clearly. The net effect has been a plateau in view frequency around 3000 per month.
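A trailing average like the one in the chart is simple to compute. Here is a minimal Python sketch, assuming a 4 week window that is shortened at the start of the series; the weekly numbers are invented, and the actual chart may treat the first weeks differently.

```python
def trailing_average(values, window=4):
    """Average of each value with up to `window - 1` preceding values."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical weekly views with a one-week spike from a popular post.
weekly = [700, 720, 690, 710, 1500, 730, 700, 690, 710]

for raw, smooth in zip(weekly, trailing_average(weekly)):
    print(raw, round(smooth))
```

The spike lifts the average for exactly four weeks and then drops out again, which is the kind of undulation visible in the chart.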

Weekly Views with average Nov 2012

I continue to see most of the referrals coming from Google searches, still with a majority of those being image searches. Engagement growth has been anemic, with relatively few comments, back links or other forms of engagement. It seems to me that growth proceeds in phases, with growth spurts interspersed with plateaus of varying length. One such growth spurt has been reported by Andrei Pandre on his Data Visualization Blog through the use of Google+. Perhaps it’s time to extend this Blog to Google+ as well.

Variation of views by weekday

With regard to variation of views by weekday, the qualitative pattern remains. Tuesday is now emerging as the day with the most views, with Monday, Wednesday, and Thursday slightly behind, but still above average. Friday is slightly below average, Saturday is the lowest day with only half the views and Sunday in between.

I’m not sure whether to conclude from that that important posts should be published on a particular weekday. Again, most views come from Google searches and are accumulated over time, so perhaps only the height of the initial spike will vary somewhat based on the publishing weekday.

 

Posted by on June 12, 2012 in Scientific

 

Venn Diagrams

The private library Blog had a post with some word play relating to sound, spelling and meaning of words in the English language. From their post on Homographic Homophones:

English is one of the most difficult languages in the world for a non-native speaker to learn. One of the reasons why this is so is that English has a large number of words that are pronounced the same as other words (i.e., they are homophones) even though they have quite different meanings. Homophones such as pare, pair and pear, for example, have the same pronunciation but are spelled differently and have different meanings (heterographic homophones). Other homophones – tender (locomotive), tender (feeling) and tender (resignation), for instance – are spelled the same and pronounced the same (homographic homophones) but have different meanings (i.e., they are homonyms).

Got all that?  Wikipedia has a nice Venn diagram that may help you sort it out:

Venn Diagram displaying meaning, spelling, and pronunciation of words (Source: Wikipedia)

Of course, you could also list the above combinations in a table. If you’re interested, Carol Moore has done just that on her Buzzy Bee riddle page.

A beautifully symmetric 5 set Venn diagram drawn from ellipses has been proposed by Branko Grünbaum and drawn by Wikipedia contributor Cmglee:

Symmetrical_5-set_Venn_diagram (Source: Wikipedia)

Such set-based diagrams invite a more mathematical notation. Cmglee annotates his image with this snippet:

Labels have been simplified for greater readability; for example, A denotes A ∩ Bᶜ ∩ Cᶜ ∩ Dᶜ ∩ Eᶜ (or A ∩ ~B ∩ ~C ∩ ~D ∩ ~E), while BCE denotes Aᶜ ∩ B ∩ C ∩ Dᶜ ∩ E (or ~A ∩ B ∩ C ∩ ~D ∩ E).
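The labeling scheme can be spelled out with a short Python sketch that enumerates every region of a 5-set Venn diagram and expands each short label into the full set expression. The function names are my own and purely illustrative.

```python
from itertools import product

SETS = "ABCDE"


def region_label(membership):
    """Short label: the names of the sets the region lies inside."""
    return "".join(name for name, inside in zip(SETS, membership) if inside)


def region_expression(membership):
    """Full expression, using ~X for the complement of set X."""
    return " ∩ ".join(name if inside else "~" + name
                      for name, inside in zip(SETS, membership))


# Every in/out combination except "outside all five sets".
regions = [m for m in product([True, False], repeat=len(SETS)) if any(m)]

print(len(regions), "regions")
for membership in regions[:3]:
    print(region_label(membership), "=", region_expression(membership))
```

Five sets give 2^5 - 1 = 31 regions inside the diagram, one for each non-empty combination of sets.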

If you search the Wolfram Demonstrations Project for ‘Venn Diagram’, you get several interactive diagrams.

Venn Diagram Demonstration Projects (Source: Wolfram Demonstrations Project)

These diagrams are interactive. For example, they allow you to click on any subset and then have that set highlighted and the corresponding mathematical set notation displayed accordingly. Interesting and fun to learn.

Speaking of fun: Venn diagrams are also effectively used in many different areas, two of which I’d like to leave you with here:

Data Science Venn Diagram (Source: drewconway.com)

And last but not least, Stephen Wildish’s Pancake Venn Diagram:

 

Posted by on June 10, 2012 in Linguistic, Scientific

 


Connectograms and Circos Visualization Tool

Yesterday (May 16) the Public Library of Science (PLoS) published a fascinating article titled “Mapping Connectivity Damage in the Case of Phineas Gage“. It analyzes the brain damage which the famous trauma victim sustained after an accident drove a steel rod through his skull. Railroad worker Phineas Gage survived the accident and continued to live for another 12 years, albeit with significant behavioral changes and anomalies. Those changes were severe enough for him to have to discontinue his work and also become estranged from his friends, who stated he was “no longer Gage”. This has become a much studied case about the impact of brain damage on behavior anomalies. Since the accident happened more than 150 years ago there are no autopsy data or brain scans from Phineas Gage’s brain. So how did the scientists reconstruct the likely damage?

For a few years now there has been growing interest in the human connectome. Just like the genome is a map of human genes, the connectome is a map of the connectivity in the human brain. The human brain is enormously complex. Most estimates put the number of neurons at around a hundred billion and the synaptic interconnections in the hundreds of trillions! Using diffusion-weighted imaging (DWI) and magnetic resonance imaging (MRI) one can identify detailed neuron connectivity. This is such a challenging endeavor that it drives the development of many new technologies, including data visualization. The image resolution and post-processing power of modern instruments are now sufficient to create detailed connectomes that show major pathways of neuronal fibers within the human brain.

The authors, from the Laboratory of Neuro Imaging (LONI) in the Neurology Department at UCLA, studied the connectomes of a population of N=110 healthy young males (similar in age and dexterity to Phineas Gage at the time of his accident). From this they constructed a typical healthy connectome and visualized it as follows:

Circular representation of cortical anatomy of normal males (Source: PLoS ONE)

Details of the graphic are explained in the PLoS article. The outermost ring shows the various brain regions by lobe (fr – frontal, ins – insula etc.). The left (right) half of the connectogram represents the left (right) hemisphere of the brain, and the brain stem sits at the bottom, at the 6 o’clock position of the graph.
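The layout rule just described can be sketched in a few lines of Python. This is only an illustration of the geometry, not how the LONI authors or Circos actually compute it; the region list, its ordering and the function names are hypothetical.

```python
import math

# Hypothetical, shortened list of regions, ordered from the top of the
# circle (12 o'clock) down to the bottom (6 o'clock).
REGIONS = ["frontal", "insula", "temporal", "parietal", "occipital", "brain stem"]

def connectogram_angles(left_regions, right_regions):
    """Assign each region an angle in degrees, clockwise from 12 o'clock.

    Right-hemisphere regions fill the right half (0..180 degrees) and
    left-hemisphere regions the left half (180..360 degrees), so that
    mirrored regions sit at the same height on opposite sides.
    """
    angles = {}
    n_right = len(right_regions)
    for i, name in enumerate(right_regions):
        # centre of the i-th slot, walking clockwise from the top
        angles[("R", name)] = (i + 0.5) * 180.0 / n_right
    n_left = len(left_regions)
    for i, name in enumerate(left_regions):
        # mirror image: walking counter-clockwise from the top
        angles[("L", name)] = 360.0 - (i + 0.5) * 180.0 / n_left
    return angles

def to_xy(angle_deg, radius=1.0):
    """Convert a clockwise-from-top angle into x/y plot coordinates."""
    theta = math.radians(angle_deg)
    return radius * math.sin(theta), radius * math.cos(theta)

angles = connectogram_angles(REGIONS, REGIONS)
for key in sorted(angles, key=angles.get):
    x, y = to_xy(angles[key])
    print(key, round(angles[key], 1), round(x, 2), round(y, 2))
```

With this convention the frontal regions end up near the top of the circle and the two brain stem entries flank the 6 o’clock position, one on each side.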

Connectograms are circular representations introduced by LONI researchers in their NeuroImage article “Circular representation of human cortical networks for subject and population-level connectomic visualization”:

This article introduces an innovative framework for the depiction of human connectomics by employing a circular visualization method which is highly suitable to the exploration of central nervous system architecture. This type of representation, which we name a ‘connectogram’, has the capability of classifying neuroconnectivity relationships intuitively and elegantly.

Back to Phineas Gage: His skull has been preserved and is on display at a museum. Through sophisticated spatial and neurobiological reasoning the researchers reconstructed the pathway of the steel rod and thus the damaging effects on white matter structure.

Phineas Gage Skull with reconstructed steel rod pathway and damage (Source: PLoS ONE)

Based upon this spatial model of the damaged brain, overlaid on the typical brain connectogram from the healthy population, they created another connectogram indicating the connections between brain regions that were lost or damaged in the accident.

Mean connectivity affected in Phineas Gage by the accident damage (Source: PLoS ONE)

From the article:

The lines in this connectogram graphic represent the connections between brain regions that were lost or damaged by the passage of the tamping iron. Fiber pathway damage extended beyond the left frontal cortex to regions of the left temporal, parietal, and occipital cortices as well as to basal ganglia, brain stem, and cerebellum. Inter-hemispheric connections of the frontal and limbic lobes as well as basal ganglia were also affected. Connections in grayscale indicate those pathways that were completely lost in the presence of the tamping iron, while those in shades of tan indicate those partially severed. Pathway transparency indicates the relative density of the affected pathway. In contrast to the morphometric measurements depicted in Fig. 2, the inner four rings of the connectogram here indicate (from the outside inward) the regional network metrics of betweenness centrality, regional eccentricity, local efficiency, clustering coefficient, and the percent of GM loss, respectively, in the presence of the tamping iron, in each instance averaged over the N = 110 subjects.

The point of the above quote is not to be precise in terms of neuroscience. Experts can interpret these images and advance our understanding of how the brain works – I’m certainly not an expert in this field, not even close. The point is to show how advances in imaging and data visualization technologies enable inter-disciplinary research which just a decade ago would have been impossible to conduct. There is also a somewhat artistic quality to these images, which reinforces the notion of data visualization being both art and science.
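For readers curious what two of the network metrics named in the quote measure, here is a minimal sketch on a made-up toy network. It assumes an unweighted, undirected graph, whereas the article works with real fiber-density data, so it illustrates the definitions only. The node labels, edges and helper names are all hypothetical.

```python
from collections import deque

# Hypothetical toy network: nodes stand in for brain regions, edges for
# fiber pathways. Labels and connections are invented for illustration.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

def build_adjacency(edges):
    """Turn an edge list into a dict mapping each node to its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def eccentricity(adj, node):
    """Largest shortest-path distance (in hops) from `node` to a reachable node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # plain breadth-first search
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def remove_edges(edges, lost):
    """Return the edge list without the 'damaged' edges (order-insensitive)."""
    lost = {frozenset(e) for e in lost}
    return [e for e in edges if frozenset(e) not in lost]

healthy = build_adjacency(EDGES)
damaged = build_adjacency(remove_edges(EDGES, [("B", "C")]))
for node in sorted(healthy):
    print(node,
          clustering_coefficient(healthy, node), eccentricity(healthy, node),
          clustering_coefficient(damaged, node), eccentricity(damaged, node))
```

Removing a single edge changes the metrics of nodes that are not directly attached to it, which is the kind of network-wide effect the inner rings of the connectogram summarize.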

The tool used for these visualizations is called Circos. It was originally developed for genome and cancer research by Martin Krzywinski at the Genome Sciences Centre in Vancouver, Canada. Circos can be used for circular visualizations of any tabular data, and the above connectome visualization is a great application. Martin’s website is very interesting in terms of both visualization tools and projects. I have already started using Circos – which is available both for download and in an online tableviewer version – for some visualization experiments which I may blog about in the future.
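To give a feel for how a table can become a circular diagram, here is a rough sketch of the underlying idea: every row and column label gets a segment on the circle, and every non-zero cell becomes a ribbon whose width matches the cell value. This is my own simplified illustration, not the actual Circos or tableviewer code. The function name and sample numbers are made up, and the printed lines only loosely imitate a whitespace-separated link list, so check the Circos documentation before feeding anything like this to the real tool.

```python
def table_to_ribbons(row_labels, col_labels, table):
    """Turn a table of non-negative numbers into ribbon coordinates.

    Returns (ribbons, segment_sizes). Each ribbon is a tuple
    (row_label, row_start, row_end, col_label, col_start, col_end),
    with positions measured along the segment of each label.
    """
    cursor = {label: 0 for label in list(row_labels) + list(col_labels)}
    ribbons = []
    for r, row in zip(row_labels, table):
        for c, value in zip(col_labels, row):
            if value <= 0:
                continue  # empty cells produce no ribbon
            # reserve `value` units on the row segment ...
            r_start = cursor[r]
            cursor[r] += value
            r_end = cursor[r]
            # ... and then on the column segment
            c_start = cursor[c]
            cursor[c] += value
            c_end = cursor[c]
            ribbons.append((r, r_start, r_end, c, c_start, c_end))
    return ribbons, cursor

# Made-up sample table: rows A and B, columns X and Y.
ribbons, sizes = table_to_ribbons(["A", "B"], ["X", "Y"], [[3, 1], [2, 4]])
for ribbon in ribbons:
    print(" ".join(str(field) for field in ribbon))
print(sizes)
```

The final size of each segment is simply the sum of all ribbon widths attached to it, so labels with larger table totals take up more of the circle.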

 

Posted on May 17, 2012 in Scientific

 
