Database

Introduction to relational databases


This is but a very small introduction to databases. Purpose of this page is to explain, in general terms, the most basic terminology of databases, with a strong focus on relational databases. Reading through this page once should get you armed and ready to tackle a more in-depth study of databases, or help you to understand what's happening if you're using a database.

Database

Generally speaking, a database is a system used for storing data. Usually, these data will have some kind of relation to one another An address book will store a person's name, address, telephone number, e-mail, etc, and obviously it should be clear what address belongs to what person. An other database my reflect the staff of a company, and show who works at what department, who is in charge of that department, which division the department belongs and who is the supervising manager.

The purpose of a database is not only to store data. There must be ways to modify data, add new data, delete old data, and there must be mechanisms to use the data so that it becomes useful information : sort the data in one way ore another, only show data that matches certain criteria (How many woman are there in the Sales department ? What's their average age ? ...) etc.

In general, one can distinguish between the place where data are stored (the database : a file, a number of files, or a location on a hard disk), and the software that is used to access the database, look at the data, manipulate the data, etc. We'll get to that later - I just mention it because if, when I say 'database' and you think 'MS-Access', some of this can be confusing, as MS-Access, unlike a typical database system, puts everything (data, user interface (forms), reports, etc.) in 1 single file.

Relational Database

A relational database system is a database system that stores data in records. These records form tables, where each row is a record containing the data, and the field names become the column headers. The data is typically stored in more than one table (to avoid data redundancy - see 'Normal forms' further down), and a relation between the records in separate tables is established by using unique identifiers, so-called "keys". An example will make this clear :

PERSON-table

NameFirst_NameAddressPerson_ID
BosmansJosBovenveld 18 9260 Gentbrugge1
VercammenGreetBietenlaan 36 2569 Hoekinghe2

DEPARTMENT-table

DEPT_IDDepartment
1Human Resources
2Finance
3ICT

PEOPLE_DEPARTMENT_table

Person_IDDEPT_IDDepartment
11Head Of Department
12Head Of Department
23Programmer

this table shows who works where, by matching PERSON_ID to DEPT_ID, eg person "1" works in departments "1" and "2" as "Head Of Department", and "Greet Vercammen" is programmer in the ICT department.

Wikipedia : Relational Database

client-server model and SQL

Relational databases and the applications that use the data in it usually follow the client-server model : the database system runs on a server, and clients connect to the database over a network and send requests to it to store, retrieve, or modify the data. These requests are expressed in SQL, "Structured Query Language", a language designed specifically to work with relational databases and the data they hold.

Normal Form

Relational databases avoid data redundancy. This means every piece of data should only be present in the database once. Consider for instance a database that contains data about employees. You have a table with employees addresses, date of birth, ..., and the department where they work. Lets assume that, for some reason, the head of the Human Resources Dept. and the head of the Finance Dept are the same person. That person would have to be listed twice.

As a result, he'll show up twice on a list of employees. Even worse : any modification (eg a change of address) would have to be entered twice, once for each record. And the problem is : you have no way of knowing whether there is not a 3th record with the same data. You risk updating 1 address and not the other, creating an inconsistency in the database : an address list based on the data in the database will return 2 addresses for that person, and you don't know which one is correct.

Relational Databases solve this problem by distributing data over separate tables, so that each piece of data only exists once. The process of creating separate tables for separate but related data, is called 'normalizing the database'. There's a methodology to create so-called 'normal forms" : Database Normalization.

Queries / Structured Query Language

Queries are (often very small) small pieces of SQL code to retrieve, insert or modify data from a relational database. As the data is spread out over multiple tables, the query needs to indicate from with tables the data needs to be retrieved, and how (by which keys) these tables relate to each other. This is called a 'join" - you join tables based on records' unique keys.
Here are some SQL tutorials :

SQL is a standardized, vendor-independent language, but most rdbms vendors have created vendor-specific extensions and enhancements to it. These are mainly intended to offer additional programming features to your sql statements (eg conditional statements, loops, ...) to avoid having to create complex contraptions in standard SQL to get the same result.

Views

Views can be seen as 'virtual tables'. If you execute a SELECT query, the result is a dataset. If you store that query (in the database), you can re-use it later, so the view gives you immediate access to the dataset defined by that query. It behaves pretty much as a database table, i.e you can use it in other queries.

A different view, paper on the use of views in relational databases. In Dutch.

Stored Procedures

Stored Procedures are procedures that are stored in the database itself. They have a name, so that you can execute a procedure by calling it by name. They are used to manipulate the data automatically, either by a call from an application, or triggered by a given event (see triggers).

Triggers

Triggers are sort of event-handlers, and are usually defined to trigger stored procedures - things like 'as soon as a new record is inserted in this table, execute stored_procedure_1 ; and stored_procedure_1 then might be a procedure to create new records in other tables, with corresponding keys, or a procedure to check for and handle possible errors.

Because normalization can lead to lots of tables and therefore complex relations and elaborate joins in sql queries, causing degraded database performance, some database developers chose to de-normalize their design and use triggers and stored procedures to work around the consequences of not having a fully normalized database, e.g. by copying data, using triggered insert or update statements, etc.

Links : resources and references

Databases and SQL
Hands On
MySQL en MySQL Front-ends
Koen Noens
July 2002 (rewritten November 2007)