Data typing is one important aspect of databases. Data typing lets you specify the range of data stored in a database column so you can control the integrity of the data.
SQLite is one database that tries to have no data typing so you can place any data in any column of any database table. Along they way they introduced strong and soft data typing to solve problems. Eventually you realise that data typing is really useful and that some databases give you a free form data type that lets you ignore data types when you do not need them.
MySQL is an example of a database that has strong data typing. You can specify integer, string, or one of several other data types. Within a data type you get more controls. An integer can have a maximum length, can be signed or unsigned, and can allow nulls.
Data typing is a way of describing the data in the database. The database software then implements rules to make sure the data conforms to a type. The database can then use the data type to speed up searches and sorting.
You specify that a column contains a boolean value, perhaps the choice of "no" or "yes". The database can translate "no" to a binary zero and "yes" to a binary one. The one digit binary field takes up just 4 percent of the space required for "yes". If you switch from regular 8 bit characters to 16 bit unicode characters, the string "yes" doubles in size while the binary digit stays the same size.
Databases provide facilities to display the data descriptions so that you can see what is in a database. You can retrieve the data descriptions to help format input fields on Web page forms. The variety of data description information available from a database varies from database to database.
In Australia, postal codes are four digits long at, at some time in the future, will become six digits long. Years from now, when you look in to one of our databases, you will be able to look for fields named postcode then look at their length to see if the post code is the old format or the new format. The data description information will tell you that the four digit format can hold integer values up to 9999. The data description will not tell you that post codes cannot be less than 1000, that the integers 0 through to 999 are not valid.
A boolean column can be stored as a binary string one bit long. Eight boolean fields can be stored as adjoining binary bits in one byte, which saves space.
The alternatives to a binary bit string one bit long are a byte with one bit set or a one byte integer. If you have one boolean field in a table then the database will most likely use a whole byte to store the boolean value. When you have more than one boolean column in a table, some databases will use one byte per boolean column while other databases will use one byte per eight boolean fields.
When boolean binary values are in adjacent columns that are in the required sort order, the database rows can be sorted many times faster. The columns are sorted as one binary string. Some databases move the data around to help sorting while others leave the data in place then sort one field at a time.
Strong data typing helps you understand the data in the database. There are many other advantages of strong data typing. The disadvantages of strong data typing occur when you want a quick database to store unplanned data from a Web page. Suddenly you have unplanned data content.
The Quick Web Site
Imagine setting up a Web site to help photographers store images while travelling. The photographer types in the date, time, aperture setting, lens focal length, and then uploads the image.
You store the focal length as an integer in a field named focal_length. People can type in 90 when they use a 90 mm lens but cannot accidentally type in a text description. Your database stores the integer value. From that day forward, no matter what happens to your programs, people using the database will always know that the field named focal_length contains a number.
You could increase the information about the data in your database by renaming focal_length to focal_length_mm but that is not necessary as everyone, including Americans, use millimetres to signify focal length. Some databases let you store comments at the table or column level so you can describe how you intend to use the data.
When your customers switch from film to digital, they will use shorter lenses. Suddenly they move from 38 mm lenses to 7.8 mm lenses. You now have to change your database. focal_length has to change from integer to a format that allows decimal points.
One of your customers buys a zoom lens then types in the focal length as 80 – 200. You have to change your database to accept strings for the focal_length column or split the focal length column in to two separate columns to hold the minimum and maximum length.
You will have difficulty controlling the content of the field named focal_length. Some cameras have their lenses marked only as "wide", "portrait", or "tele". If you started with SQLite, you would have none of the input problems as all SQLite fields accept free format string information.
Free Form Field Limitations
Free format fields sound great until you try to match data. How do you find all the images made at a specific focal length when the focal_length field might contain "50", "50 mm", "50mm", "f50", or "40 – 200 set at 50"?
If your fields are too free form, people can use the fields to insert code that can break your Web site or corrupt your database. You need to make sure you keep text data as text data and never let it break in to code. That means you have to limit what people include in text fields.
PHP has few limitations on typing as it has automatic conversion from type to type. You would think the PHP approach would fit a database with no type restrictions. In fact you want just the opposite.
PHP's free form fields are great when you first get data from a Web page because the data could contain anything. You then want to narrow down the range of content and accept only what is valid for a field. You use the free form input to let people use a variety of notations for data then you recognise the variations and convert them to a standard form.
Once the data complies with your requirement, you want the data stored in a reliable form that accepts only valid data.
Your raw text input might contain a date that can be in several formats. You might split the input field in to separate fields for day, month, and year. One you have the data validated, you then want to store the date in a known international format so you can always access the date from any software. PHP's free form input is important during the input stage but a database's strong typing is the best protection for long term storage.
If you are paranoid about storing data long term and afraid that a particular database software might disappear leaving you with a database you cannot read, the short term solution is to export the data as SQL statements. Use the SQL export as the backup file. You can then feed the SQL in to other databases.
During the export, use the full SQL statement format so that you can import in to tables that have an altered field structure.
The export SQL statements do not fully describe the exported database so include the table definitions in the same file or a companion file. You will then have a record of all the information used to describe the data in the database.
The longer term solution for data export is to export to XML files. You want an XML file that contains all the information needed to rebuild a database from nothing. Contact me using the form on this page when you want information about replacing SQL with XML.