metadata vs data, an artificial but existential distinction
Two interesting blog posts argue that the distinction between data and metadata is artificial, a merely functional difference. They caught my eye, so I thought I’d join in. We were also talking about this just the other day in connection with data.gov, the site cataloging government data feeds. We felt that the site might be more accurately named metadata.gov, since it stores data about data. (I don’t think it necessarily should be, because that would be more confusing to most people.)
I would argue that the physical world naturally distinguishes data and metadata, and not just in computer software. Human memory seems to do so, recording just the most important snippets of events and occurrences so that we can reconstruct them afterwards and, more importantly, find them again in their complete form. I remember places, but just enough to be able to find them again (and when I’m lucky, enough to be able to name and describe them to other interested people). I remember ideas in articles and books, but just enough to be able to summarize them later, and maybe find the full passage when I need it. What my brain remembers is clearly data, but a distinct subset of the data.
Thought of in that way, metadata is the most pertinent, useful data about the data. That certainly makes it functional data. However, defining exactly which data that is proves difficult, because it clearly depends on whom you ask. I find some things more pertinent than other people do, and vice versa. But even in the physical world, I think it is entirely natural to separate the data from the metadata, as that is the only way the finite capacities of our minds can store our existence in a functional way.
So the distinction between metadata and data is a functional difference, but not merely one. Rather, the difference is existential: without it in the physical world, we wouldn’t be able to function. In the computer world, though, we could do without it. Hopefully projects like FluidDB (I haven’t checked it out yet), along with new approaches to the limitations that keep metadata around (B-trees are still much faster than full scans), will open up interesting new possibilities in the digital world.
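To make the aside about B-trees and full scans a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the records, the function names, and the use of a sorted list with binary search (standing in for a real B-tree) are all made up for this example and have nothing to do with how FluidDB or any particular database works.

```python
from bisect import bisect_left

# Hypothetical records standing in for "the data".
records = [
    {"title": "walden", "body": "A long text about living by a pond."},
    {"title": "ulysses", "body": "A long text about one day in Dublin."},
    {"title": "emma", "body": "A long text about matchmaking."},
    {"title": "dracula", "body": "A long text told through letters."},
]


def full_scan(title):
    """Check every record in turn; return (record, records_examined)."""
    examined = 0
    for rec in records:
        examined += 1
        if rec["title"] == title:
            return rec, examined
    return None, examined


# The "metadata": a small sorted list of (title, position) pairs kept
# beside the data. A real database would typically use a B-tree here;
# a sorted list plus binary search is a simplified stand-in.
index = sorted((rec["title"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]


def indexed_lookup(title):
    """Binary-search the index, then jump straight to the record."""
    i = bisect_left(keys, title)
    if i < len(keys) and keys[i] == title:
        return records[index[i][1]]
    return None
```

Both lookups return the same record, but the scan has to walk through the data itself, while the indexed version only consults the much smaller structure describing it. That is the sense in which the index is metadata we keep around for practical reasons.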
Of course, that doesn’t keep it from being messy. Where the distinction lies between data and metadata is, well, in the eye of the beholder.
