SQLite

SQLite is free open source database software you can incorporate in to your applications.

For all those American marketing people out there, here is the first sentence translated in to your language. SQLite is cost effective intellectual property that enables you to embed storage management in to your strategic solutions.

If you are an CFO, here is that critical sentence again. SQLite provides an immediate infinite ROI so who cares what it does.

PHP

SQLite is now built in to PHP so you always have a database in PHP no matter which installation options you choose.

Data Typing

Data typing is an important aspect of databases. For a full explanation, read Data Typing. SQLite started out with the target of having absolutely no data typing so you can place any data in any column of any database table. Along they way they introduced strong and soft data typing to solve problems. Eventually people will introduce a data typing layer over SQLite to protect the integrity of the data.

Data typing is a way of describing the data in the database. The database software then implements rules to make sure the data conforms to a type. The database can then use the data type to speed up searches and sorting. You specify that a column contains a boolean value, perhaps the choice of "no" or "yes". The database can translate "no" to a binary zero and "yes" to a binary one. The one digit binary field takes up just 4 percent of the space required for "yes". When binary values are in adjacent columns that are in the required sort order, the database rows can be sorted many times faster. A binary integer comparison is faster than a string comparison, which makes searches faster.

Strong data typing helps you understand the data in the database. There are many other advantages of strong data typing. The disadvantage of strong data typing occurs when you want a quick database to store unplanned data from a Web page.

Imagine setting up a Web site to help photographers store images while travelling. The photographer types in the date, time, aperture setting, lens focal length, and then uploads the image.

You store the focal length as an integer in a field named focal_length. People can type in 90 when they use a 90 mm lens but cannot accidentally type in a text description. Your database stores the integer value. From that day forward, no matter what happens to your programs, people using the database will always know that the field named focal_length contains a number.

You could increase the information in your database by renaming focal_length to focal_length_mm but that is not necessary as everyone, including Americans, use millimetres to signify focal length.

When your customers switch from film to digital, they will use shorter lenses. Suddenly they move from 38 mm lenses to 3.8 mm lenses. You now have to change your database. focal_length has to change from integer to a format that allows decimal points.

One of your customers buys a zoom lens then types in the focal length as 80 – 200. You have to change your database to accept strings or split the focal length field in to two fields.

You can no longer control the content of the field named focal_length. Some cameras have their lenses marked only as "wide", "portrait", or "tele". If you started with SQLite, you would have none of the input problems as all SQLite fields accept free format string information.

Free format fields sound great until you try to match data. How do you find all the images made at a specific focal length when the focal_length field might contain "50", "50 mm", "50mm", "f50", or "40 – 200 set at 50"?

Hard Data Typing

SQLite already uses integer for primary keys. To be specific, they restrict the primary key to 32 bit signed integers. If they use strong typing in one part of the database, there is no reason to stop people from using the same strong typing elsewhere. The also need to let users have unsigned integer because so many systems are built around unsigned 32 bit integers. One real advantage of SQLite is to store temporary data but we cannot do that if the temporary data has an unsinged integer key.

Soft Data Typing

The SQLite developers hate data typing but then introduce data typing for sorts. There are several types of fields that sort in to the wrong order if stored as text fields. Data typing keeps the data in the right order. SQLite tries to compensate by applying an approximate type for sorting. That is a process often called soft typing.

Sorting

One quick look at the SQLite soft typing shows you that you will still have problems sorting data. Often you will have to extract the data from SQLite in to a PHP array and use one of PHP's excellent array sorting functions to get the right order. If you used a better database, you would have the correct sort available without resorting to PHP fuctions.

Searching

Your SQL select statements find data through indexes. The indexes work by storing some of a table's columns in a sorted data structure. That means the SQL select can get direct to specific field values or ranges of values. The whole approach falls apart if the data can not be arranged in a useable sequence. The database then has to use a time consuming full length search. That makes SQLite useless for many large databases.

Size

When you search a database, you can search through an index or by reading every row of a table. An index search is equivalent to reading as few as five rows of a database table. If your database tables all have five or less rows, an index will not save you time. When you search more than five rows, an index saves significant time.

That larger the database, the slower the index becomes. Indexes are faster if closely matched to a narrow data type. SQLite does not have narrow data types which means their indexes will not work at maximum speed for some types of data.

When you have a large database with many fields that are not text fields, use a different database to get maximum speed. The basic MySQL MyISAM tables are the next step up in flexibility and speed.

XML Parameter Files

Many applications store start up parameters in XML files. For some types of data, relational database tables provide a better approach. If you use MySQL to store start up parameters then send the application to another person, the other person has to install MySQL before installing your application.

SQLite can be built in to your application so you can send the application out to anyone. Leaving out the MySQL installation makes life easier for everyone.

My approach is to automatically install MySQL on every machine in preparation for MySQL based applications. That makes my life easier as I often use MySQL based applications during the installation process. If everyone automatically installed MySQL, we would not need SQLite. For the next few years at least, SQLite is a great alternative.

PHP Arrays

I store lots of data in PHP arrays so I have quick access in any sequence. The arrays can be easily passed from Web page to Web page using sessions. If the data is too big for sessions, I use MySQL tables. SQLite provites a great alternative because some ISPs will not install MySQL.

I know the feeling. You are asked to fix your brother's wife's father's Web site so the poor fellow will not go broke. You know the solution is a database and that MySQL will fit. The Web site is hosted at an ISP that uses only Microsoft products. They installed PHP because they would miss out on 60% of new web site business if they did not have PHP. When you ask bout MySQL, they try to sign you up for a $100 per month Microsoft SQL Server contract that says, in the fine print, that you will have to pay many times that amount if you actually store data in the database.

Talking the ISP in to upgrading to the latest PHP is easier than talking them in to installing MySQL. The latest PHP has SQLite installed as standard. Using SQLite in PHP will be easier than using MySQL when MySQL is not yet installed.

Stand Alone Applications

Some applications are too small for a multiple product installation. I used to write small utility programs in Microsoft Word and Excel macro language. If the data contained more than 65,000 rows, I could not store it in an Excel table. That meant using a file or a database.

Microsoft's Access let me store up to 2 gigabytes. SQLite lets me store up to 2 terabytes and SQLite is available on every operating system I use. Which do you think I will choose in the future?

Database tables seem like overkill for small applications until you send the applications out in to the world. I wrote a small utility that looked at files and, amongst other things, looked for duplicates. During the processing, the utility produces a database that contains viewable lists of information about the processing and the results. You can check what happened and verify how it happened.

I ran the utility against a small partition on one of my workstation disks. The utility found less than 100,000 files using a total of only 10 Gigabytes. The utility produced a small database that was too big for Excel and approaching the limits of Access.

Potential users of the utility now have 250 Gigabyte disks approaching full use. When they run the utility, the resultant database will exceed the 2 Gigabyte limit of Access. That makes SQLite's 2 terabytes useful.

Terabytes

SQLite handles 2 Terabytes. How big is a Terabyte? When will a Terabyte be too small?

A Terabyte is just 1,000 Gigabytes. You can buy 250 Gigabyte disks. Plenty of people have filled up a 250 Gigabyte disk. There are now low cost RAID 5 controllers that will handle 8 disk drives. If you place one of those cards in my current medium size home computer, replace the two 0.12 Terabytes disks (120 Gigabyte disks) with five 0.25 Terabytes disks (250 Gigabyte disks), you will produce a one Terabyte disk array.

My current cable Internet access would let me fill that Terabyte with downloads in just three months.

Disks will reach 0.3 terabytes early this year and 0.4 terabytes next year. In 2006 I expect my home computer will have RAID 5 on the motherboard with three SATA cables connected to three 0.5 Terabyte disks. That will give me one Terabyte from only three disks.

Would I ever use a Terabyte? There is at least one database of more than 10 Terabytes that I would happily download if I had the disk space spare.

If you use the Microsoft Word clipart facility, you know that the 640 Megabyte disk expands to over a gigabyte on disk. Video clips mean that a future clipart disk will be a 4.7 Gigabyte DVD that expands to 10 Gigabyte on disk. Video means that Terabyte will rapidly replace gigabyte on the desktop. By 2010, Gigabyte will be relegated to small memory chips in cameras and handheld devices.

Whatever you use now, the best plan is to allow for a 10 fold growth before you have a chance to upgrade your database. If you start with a 100 Megabyte database, plan for a 1,000 Megabyte database. If you have a 100 Gigabyte database now, it will creep up to a Terabyte before you have a chance to change databases. You are then too close to the SQLite limit. If I had an SQLite database anywhere near 100 Gigabytes, I would immediately plan the shift to database software that will handle more than the SQLite limit.

Division and Replication

When your databases grow larger, you find they usually exceed the speed of underlying disks before they exceed the space on the disks. That means you need to spread your database out over several physical files on separate disks. RAID will help buy is only a temporary measure. The next step is to spread a database over several servers.

SQLite stores a whole database in one file which means you are limited to one RAID array on one server. If you replace SQLite with MySQL, each table is in a separate file which means you can spread your tables over several RAID arrays.

MySQL has replication facilities to let you store multiple copies of a database across several servers.

Nulls

SQLite does the right things with nulls if you set the right option. Databases are inconsistent in the way they handle null values. SQLite defaults to copying the inconsistency of PostgreSQL, Oracle, and IBM's DB2. You can also set an option to make SQLite handle nulls consistently. Read the full explanation at http://www.hwaci.com/sw/sqlite/nulls.html.

Conclusion

SQLite is a viable alternative to XML parameter files, PHP arrays, and, for small stand alone applications, an alternative to MySQL. SQLite will handle medium size databases, up to 2 terabytes, but does not have everything you need to manage those databases.