Practice Data (Read 2507 times)
Mark Anderson
Feb 21st, 2013, 1:15pm
The Guardian's DataBlog has just posted a list of all Brit award winners since 1977. That's likely not of interest as such, but they've kindly made the source data available via Google Docs:
  • Go here. That gives you a nice set of real-world data (about 100 notes) to play with.
  • Open the above Google doc's 'file' menu.
  • Select "Download As" -> "Plain Text".
  • A .TSV file (tab delimited) is downloaded.
  • Open a new Tinderbox file and drag the TSV file onto it.
  • Explore!  See what you can make of it.
It's nice real data. It is 'dirty' insofar as (probably) multiple compilers have used different formatting for the same data (songs in single vs. double quotes, etc.). How would you go about cleaning that?
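
One way to approach the quote problem, if you would rather clean the file before it ever reaches Tinderbox, is a small pre-processing script. This is only a sketch in Python, not Tinderbox action code, and it assumes the inconsistency is limited to a single pair of quotes wrapping the whole value. The function name normalise_title is made up for illustration.

```python
# Map curly quotes to straight ones (assumption: the export may contain them).
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalise_title(value: str) -> str:
    """Return the value with one pair of wrapping quotes removed, if present."""
    text = value.strip().translate(_QUOTE_MAP)
    # Strip only a matching pair that wraps the whole value, so an
    # apostrophe inside a title (e.g. Don't) is left alone.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        text = text[1:-1].strip()
    return text
```

Run over the song column, this should make 'Song A' and "Song A" compare as equal. A title that legitimately starts and ends with an apostrophe would lose both, so check the results by eye.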

As column 1 of the source spreadsheet is the year, the default import is lots of notes with a year $Name, many the same year. Can you improve that by altering the input file? Do you need to? How might you move the year data to an attribute and add a more useful note name?
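
If you do decide to alter the input file, one option is to reorder the columns so that something more descriptive than the year comes first. The Python sketch below assumes the TSV has a header row and plain tab-separated cells with no embedded tabs. The function name and the column name you pass in (for example "Winner") are assumptions, so check them against the real header. Whether Tinderbox then maps the year column to an attribute on import is worth testing rather than taking on trust.

```python
def move_column_first(tsv_text: str, column: str) -> str:
    """Reorder tab-separated text so `column` is first; other columns keep their order."""
    lines = tsv_text.splitlines()
    header = lines[0].split("\t")
    idx = header.index(column)  # raises ValueError if the column is absent
    out = []
    for line in lines:
        cells = line.split("\t")
        # Pad short rows so indexing is safe on ragged data.
        cells += [""] * (len(header) - len(cells))
        out.append("\t".join([cells[idx]] + cells[:idx] + cells[idx + 1:]))
    return "\n".join(out) + "\n"
```

Plain string splitting is used instead of the csv module on purpose: the csv module treats double quotes as field quoting by default, which would silently change exactly the messy quoting you may want to inspect.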

How many different awards are there? Has anyone won the same award twice? In consecutive years or not?
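
The same questions can be cross-checked outside Tinderbox, which is handy for confirming that your action code gives the right answers. A minimal Python sketch follows, assuming each row has been reduced to a (year, award, winner) tuple; that layout and the function names are assumptions for illustration, not the spreadsheet's actual structure.

```python
from collections import defaultdict


def distinct_awards(rows):
    """Count the different award names in a list of (year, award, winner) tuples."""
    return len({award for _, award, _ in rows})


def repeat_winners(rows):
    """Map (award, winner) to a sorted list of years, for pairs seen in 2+ years."""
    years = defaultdict(set)
    for year, award, winner in rows:
        years[(award, winner)].add(int(year))
    return {key: sorted(ys) for key, ys in years.items() if len(ys) > 1}


def has_consecutive(years):
    """True if any two neighbours in a sorted list of years are one year apart."""
    return any(b - a == 1 for a, b in zip(years, years[1:]))
```

Note that the counts are only as good as the cleaning: two spellings of the same award or winner will be treated as different until they are normalised.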

How about a timeline? Different awards on different timeline bands?

All these questions are an excuse to go and practice action code on real data. As it's not your data, you can concentrate on the process rather than the outcome, and as the data is real-world messy, you've some realistic minor pitfalls to negotiate.

There are no specific answers here. Just an encouragement to go and hone your Tinderbox code skills on some data you don't have to expend your own time compiling (the most tiresome part of creating a demo).
« Last Edit: Feb 21st, 2013, 1:18pm by Mark Anderson »  

Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)