Saturday, September 12, 2015

Storing Approximate or Partial Dates in Database

One of the common issues that I have in programming is how to store questionable dates. There are all different forms of questionable dates, but I have primarily been focused on partial dates where I may or may not have the date. The most common property is the a person's birthday. After a few different ways of storing the date, I have found that having an additional column to store the certainty of the date works best for me thus far.

In MS SQL, I have an additional binary column. I will have two columns, DOB (date) and DOBAppr (binary(1)). DOBAppr contains the certainty of DOB. If DOBAppr is 0 then the date is known exactly. If 1, the day is uncertain. In the application, I will only display Month and Year if DOBAppr is 1. If 2, the month is uncertain. This is uncommon to have only the year and day but the possibility is there. One example is when someone does not explicitly share their DOB but has mentioned that they shared the same day (and you know how old they are). If 3, the day and month is unknown. So basically, I have set it up like a binary flag. 4 is unknown year, 5 is unknown year and date, 6 is unknown year and month, 7 is all unknown.

In the previous solution, I used to broken the date into multiple fields, day (tinyint), month (tinyint), and year (smallint). If the value was unknown, I would leave the field as null or use 0 if null is not allowed. Although slightly easier to build the tables, there was quite a bit of overhead to programming. There are other insignificant inefficiencies. This solution has always just bugged me because it was just not a perfect fit even though it worked sufficiently.

Recently, I have expanded on my application and found other uses to using a flags. I was trying to store data on music genres and one of the field is origins. Often in sources, I only get date ranges or decades. So by simply expanding my flags to include the 4th-bit for decades, 5th-bit for centuries, and 6th-bit for millenniums.

While on this expansion, there is also another flag that I could use in the future like approximation (ie circa, c, ca, circ, or cca). Quite similar to partial dates, but this would also mean that the value provided may be incorrect. For example for circa 2015, this could include 2014 or 2016 (or a wider range depending on the object). For partial dates, 2015 would have high confidence that it is correct (or at least a very small chance for error). Although I have no implemented this yet, the solution still gives this flexibility. I probably would use the last bit.

No comments:

Post a Comment