The 5 V’s of Data

You may have heard of the 3 “V”s of Big Data.

What’s Big Data?  It’s the voluminous amount of information that corporations mine through various means (like Internet searches, surveys and preference settings) for a variety of purposes (like targeted marketing, new product development and predictive analytics).

Big Data’s 3 “V”s used to be just one “V” – Volume.  A great deal of data was collected, but most of it was simply stored.  When a person or organization requested information, or decided to research a topic, the databases would be accessed so that a report could be produced in response.  The basic cycle has not changed: data is collected, stored and analyzed, and the results are used or sold to entities that can benefit from the information.  One simple example of this process is the mailing list.  If you subscribed to a magazine, made a contribution to an organization, or even opened a bank account, your mailing information could be sold to other magazines, organizations or financial institutions for marketing purposes.

The process is much faster now thanks to technology, and 2 more “V”s have been introduced – Velocity and Variety.  Velocity refers to the speed at which data is processed, as well as the manner in which it’s processed.  Data could be “batched” (a large amount of data is collected before any of it is processed – think “buffering” when you’re watching a video) or “streamed” (each piece of data is processed as it’s collected).  Variety simply refers to the different types of information that are being collected, analyzed and processed into new information.
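If you’d like to see the difference between “batched” and “streamed” in action, here’s a minimal sketch in Python.  The function names and the survey scores are made up for illustration; real systems are far more involved, but the idea is the same: batching waits for everything before producing one answer, while streaming updates its answer each time a new value arrives.

```python
from typing import Iterable, Iterator, List


def process_batch(readings: List[float]) -> float:
    """Batch: wait until every reading is collected, then compute one average."""
    return sum(readings) / len(readings)


def process_stream(readings: Iterable[float]) -> Iterator[float]:
    """Stream: update the running average as each reading arrives."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count


# Hypothetical survey scores, invented for this example.
survey_scores = [4.0, 5.0, 3.0, 4.0]

batch_result = process_batch(survey_scores)           # one answer at the end
stream_results = list(process_stream(survey_scores))  # one answer per reading
```

Both approaches end on the same average; the streamed version simply has an answer available after every reading instead of only at the end.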

Some experts say that there is now a 4th “V” – Veracity – which speaks to the uncertainty of the data.  For instance, if you take a survey regarding your experience at a local restaurant, are you answering the survey truthfully and objectively, or are all your answers clouded by your experience of the screaming child two tables away from you with the parents who seemed oblivious to the cacophony?

So let’s add a 5th “V” (okay, 2 “V”s) –  Validation/Verification.  Because of the sheer volume of data, the ways in which we get it, the speed at which it’s processed and the potential for uncertainty, data must be validated, and sources verified.  In any educational research project, from a classroom assignment to a doctoral dissertation, questions will be raised regarding the validity of the data and how the results of the research are verified.  Metrics must be validated, and assertions must be verified.

Now that there are 5 elements, an emergent property or principle of the system can be defined.  Yes, that starts with a “V” as well.  Value.  Is the analyzed data relevant to the task at hand, or is it tangential to the central question?  From a business standpoint, value generates revenue.

What does all this have to do with education?  Did you know that you can predict how many students will be in your school next year by analyzing the grade-to-grade enrollment change over a period of time?  The methodology was researched back in 1981, yet there are schools, especially faith-based ones, that continue to simply “hope and pray” that enrollment declines will “turn the corner.”
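Here’s a rough sketch of what a grade-to-grade projection could look like in Python.  It uses a simple “survival ratio” approach: for each grade, divide the number of students who showed up in the next grade the following year by the number who started in the grade.  That approach is an assumption on my part and may not match the 1981 methodology in its details.  The enrollment numbers, the function names and the entry-grade estimate are all hypothetical.

```python
from typing import List


def survival_ratios(history: List[List[int]]) -> List[float]:
    """history[y][g] is the enrollment in grade index g during year y (oldest first).

    Returns one ratio per grade transition: students who appeared in the next
    grade the following year, divided by students who started in the grade.
    """
    n_years = len(history)
    n_grades = len(history[0])
    ratios = []
    for g in range(n_grades - 1):
        moved_up = sum(history[y + 1][g + 1] for y in range(n_years - 1))
        started = sum(history[y][g] for y in range(n_years - 1))
        ratios.append(moved_up / started)
    return ratios


def project_next_year(history: List[List[int]], entry_grade_estimate: float) -> List[float]:
    """Project next year's enrollment by grade from the most recent year.

    The entry grade has no grade below it to draw from, so its size must be
    supplied separately (an assumption of this sketch).
    """
    ratios = survival_ratios(history)
    latest = history[-1]
    projected = [float(entry_grade_estimate)]
    for g, ratio in enumerate(ratios):
        projected.append(latest[g] * ratio)
    return projected


# Hypothetical three-grade school, three years of history (oldest first).
enrollment_history = [
    [20, 20, 20],
    [20, 18, 20],
    [20, 18, 18],
]

ratios = survival_ratios(enrollment_history)
projection = project_next_year(enrollment_history, entry_grade_estimate=20)
total_projected = sum(projection)
```

The more years of history you feed in, the less a single unusual year will skew the ratios.  The entry-grade estimate is the weakest link, since it depends on outside factors like local birth rates and recruitment rather than on students you already have.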

So here’s your test question:  What controversial vehicles in the education space today are focused on collecting as much data as possible and analyzing it as quickly as possible to provide a performance assessment relative to learning effectiveness?  Do you have them, and do you use them?  Or, are you just “hoping and praying” too?