Open Data Formats

Good day blog readers. It's Wednesday morning and my class of teachers that was supposed to show up has not (which is actually a surprise, since they are usually early! Wait, I have been informed there are some national exams going on...), so I have decided to introduce the layman to a very specific technical issue that plagues information technology in the developing world: open data formats and other open standards. Don't worry, I know it sounds boring, but hopefully you will be interested enough by the end, and maybe you will even have learned something! I can only hope.

First off, what are open data formats? What is a data format? What is data? Where am I?! Who are you?! Data in this context is everything created by a computer that actually matters to you, the person using it. I am talking about documents, movies, pictures, emails, all of that jazz. And while we are at it, we can even throw in other types of data, such as the websites you browse on a regular basis. A data format is the special set of rules that determines how your data is written into that Word document file or JPEG photo file from your camera. It tells the computer, hey, I am a JPEG picture, here's how you read me. Without a specified data format, the 1's and 0's making up that latest Springsteen song on your computer are just that: 1's and 0's. With the proper format, and a program that knows how to read it, all of a sudden you've got the Boss pumping out of your Bose and all is well.
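
For the curious, here is roughly what "hey, I am a JPEG picture" looks like in practice. Many formats start with a few fixed "magic" bytes that announce what they are. The little Python sketch below is my own illustration, not any official tool: it compares the first bytes of a file against a handful of well-known signatures. The table is deliberately incomplete, and the names `SIGNATURES`, `identify` and `identify_file` are made up for this example.

```python
# A minimal sketch: guess a file's format from its first few bytes
# ("magic numbers"). The signature table is illustrative, not exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"fLaC", "FLAC audio"),
    (b"OggS", "Ogg container"),
    # Legacy Word .doc files live inside an OLE2 "compound file".
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc)"),
    # ODF documents (and .docx) are ZIP archives under the hood.
    (b"PK\x03\x04", "ZIP container (e.g. ODF or .docx)"),
    # Only MP3s that begin with an ID3v2 tag are caught by this check.
    (b"ID3", "MP3 audio (with ID3v2 tag)"),
]


def identify(header: bytes) -> str:
    """Return a best-guess format name for the given leading bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown (just 1's and 0's)"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return identify(f.read(16))


if __name__ == "__main__":
    print(identify(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG image
```

Note that the signature only tells the computer what kind of file it is looking at. To actually decode the picture or the song, a program still needs the full rulebook (the specification), and that is exactly where open versus proprietary matters.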

The problem with data formats is that the people who create the rules don't always share the rulebook with others. This causes a problem called lock-in. A perfect example, though slightly generalized: a few court systems around the US recently realized that all of their government documents had been created using Microsoft Word and other Microsoft Office products. Well, that's nice; Microsoft Word is a great word processing application, capable of skillfully manipulating documents from the simplest essay to the largest book. However, the .doc format is proprietary. Microsoft Corporation owns the rulebook for how to create and manipulate documents stored as .doc (for many years the default format for everything created in Word). Other programmers who want to write applications that can use .doc documents must pay royalty fees to Microsoft, and then they are given the rulebook with permission to read it and implement their own solution. The only other alternative is to reverse engineer the rules by reading the binary itself, those pesky 1's and 0's that are so unreadable to the average human. This is what the OpenOffice.org team has done, though the legality of reverse engineering is questionable and the results are not always perfect. Should Microsoft go belly-up or start doing interesting things with its licensing fees, what would the courts do? They are locked in to a Microsoft-only solution.

Word documents are not the only prolific proprietary formats out there. GIF images have long been in a questionable state, because the LZW compression they rely on was patented by Unisys (CompuServe created the format itself; the patents expired by 2004). Adobe Creative Suite formats are proprietary (though PDFs are fairly open now), and the biggest? MP3 is a patent-encumbered format, with every program and piece of hardware that legally uses it needing to pay licensing fees. The same holds true for the formats your DVD movies come in. Many, many "everyday" data formats are proprietary.

How does this harm the developing world? Well, the second part of the term "licensing fee" is the little word "fee." You could also say "royalty fee," or, my preferred term, "stupid fee." This fee trickles up into the cost of software. As many of us know, there are plenty of free software projects out there that can easily substitute for paid-for software in terms of functionality, but because they are free, they cannot pay the licensing fees and therefore do not always support proprietary formats. Let's follow the trickle effect through a case example:

A non-profit in the United States emails a .doc file to an NGO in Kenya. It is the application for a grant that would enable the NGO to complete a successful AIDS-prevention project. To open it, the organization needs a copy of Microsoft Word, which as far as they know only runs on Microsoft Windows (the organization is unaware of other options, Microsoft being so entrenched in Africa). They cannot afford Microsoft Windows and therefore get a pirated copy, as well as a pirated copy of Word. Pirated, as in illegally copied and technically stolen. They open the document, fill out and return the application, and their grant request is fulfilled. They begin collecting their data and seeing trends that will allow them to seriously assist People Living With AIDS in their area. A successful NGO!

In possession of pirated copies of software, the NGO is unable to practice proper computer security by updating its software to protect against the latest viruses, and even the simplest viruses, such as those transmitted through USB flash drives, wreak havoc on systems across Kenya. Of course, being low on budget and barely able to afford the computer itself, they are also unable to buy hard drives to back up their information regularly. A virus sweeps in and destroys all of their work. All just to be able to open a .doc emailed from a group in America that itself did not know better.

There are alternatives, however. There are alternatives to every major proprietary format. Instead of MP3, use FLAC or Ogg Vorbis. Instead of GIF, use PNG. Instead of .doc, use RTF or, better yet, the Open Document Format (ODF). If it's a document that only needs to be read, use PDF. This will have a trickle-down effect for developing nations. People like me can come in and start promoting Open Source and Free alternatives to proprietary software, including Windows itself.

An alternative operating system called Linux runs fantastically, especially on older hardware, but one of its drawbacks is that its creators cannot always bundle applications that read proprietary formats, since they want to avoid licensing fees (or lawsuits, which might follow should they use less-than-legal reverse-engineered technologies). It seems like a stretch, but I promise you, the effect would be real. Ultimately, people do not care how their computer works, as long as it does. From an infrastructure support point of view, the only thing preventing people in the developing world from switching is a Microsoft lock-in directly tied to a data format lock-in. We need to break out, because data lock-in is holding back development. There, I said it.

Comment from Anonymous:

Huh.

Very insightful -- I never thought about emailing a .doc as a security issue, but it makes sense.

In your scheme to fix this Microsoft lock, step one is using open formats. It sounds like step two is education. Coincidence? I think not.

Copyright © 2010 Voices of Africa. All Rights Reserved.