Knowing Your Data

This week’s first tutorial, “Cleaning Data with OpenRefine”, has a subheading titled “getting to know your data”. This phrase resonated with me, especially when thinking about last week’s tutorials. The health records that we organized into tabulated data last week were not metadata, unlike the data that we downloaded for this week’s tutorial. In the case of the health records, we truly did need to ‘know’ the data because we had to create the structures ourselves, whereas OpenRefine lets us worry less about the smaller details (like tabs, paragraph markers, etc.) that concerned us last week.

That being said, the organization and structure of the data that comes into our hands during research projects determine how we can know our data in the first place. Information with an informal structure that has the potential to be tabulated and organized, like the health records we were working with, necessitates a completely different strategy than this week’s data.

In making this observation, I keep in mind that this week’s data is metadata and therefore a different kind of data from our health records. But can metadata not also be communicated in an informal way? Here we have formal metadata from the Powerhouse Museum, which has already been made sense of for us, although it is sometimes messy, as we saw while using OpenRefine. Take, for example, a museum’s early 20th century descriptions of artifacts that are still in its collection. Even though the structure of this metadata is unconventional, it still fits within the definition of metadata.

So it is not the type of data that pigeon-holes it into one tool or another, but rather the structure of the data we have. Being metadata does not necessarily mean that it is structured the way the Powerhouse Museum has structured it for us. And where metadata is unstructured, a tool like OpenRefine has little value to us.

Written on February 29, 2016