Data dictionary HAND CRAFTED
key definitions and concepts
A data dictionary is a design tool that helps database developers to be able to define the exact characteristics of each database field and to describe the data that is allowed to be stored or formatted in this specific field in order for it to be considered reasonable.
Data types represent how data is stored. Data types can include:
- Text (character, string, varchar)
- Number (integer, long integer, floating point, currency)
- Date (long/medium/short, timestamp)
- Boolean (True/False, Yes/No)
Naming conventions exist to make it easier for a reader of a database to understand what they are viewing. There are a few basic naming conventions used when working with Microsoft Access databases (and RDBMS in general).
- Prefixes are used for forms (frm), tables (tbl), queries (qry) and reports (rpt).
- Prefixes are also used for each individual field within a table. The prefixes chosen need to relate to the overall table name. For example the FirstName field within a Customer table would be called cusFirstName
- By using CamelCase formatting, a database object or field can use more than one word to create a meaningful name, but they need to be joined together without a space, but including a capital letter. With the previous example, we used cusFirstName instead of cus First Name.
- CamelCase is used because many programming functions that can be used by an RDBMS (outside of the scope of our learning) need file or field names to be without a space otherwise programs will execute incorrectly.
Databases also use electronic validation to ensure that when data is entered into a field, the data is considered reasonable. It does not need to be correct (only manual validation can ensure this) but it does need to reasonable to allow future processing. For example an age field should only be able to store values between 0 and 125. Any value other than these (for example "old enough" or even "sixty" as it is a text value) is not reasonable and would not be accepted into the database.
An input mask is a method used by database designers to ensure that specific values are entered into a database field. These are commonly helpful for Post Codes (ensuring only numbers are entered, and only 4), mobile phone numbers (ensuring the formatting is consistent e.g. 0400 000 000) and dates (ensuring dates are entered consistently - e.g. dd/mm/yy).
Input mask formats are entered in a specific way. If a number value needs to be entered, but it can be any number, then the number 9 is used. If a specific number needs to be entered, then it is shown. For example if you need to enter in only a Victorian post code, you could have an input mask of "3999". This will allow only the last three digits to be entered and the first will always be a 3. When a mobile phone number is entered, a desired format may be "0999 999 999". The user will not be able to over type the 0, or the spaces and they will only be able to type nine numbers into the database record.
When creating a data dictionary, using a predefined template can make life easier for designers. Outlined below is how to use the template:
- Table name: Name of the table (taking into account table naming conventions)
- Field name: Name of the field (taking into account field naming conventions)
- Data type: The specific data type - how the data is going to be stored.
- Field size: Specifically related to text fields, this allows a maximum field length to be provided. By limiting field size for data entry, you can ensure that a database's file size does not become too large.
- Input mask: List specifically if there are any input masks to guide data input. Remember conventions shown earlier (9 for numbers, 0 for leading zeroes, spaces for mobile phone numbers).
- Caption: When users are entering data, they do not need to see cusCustomerFirstName even though it may be the field name into which the database is storing the value. Find something meaningful such as "First Name:" which will be visible for data entry into tables and forms, and be much more user friendly for the data entry operator.
- Description: A brief (few word) description of what the data in the field represents.
- Validation rule: If there are any validation rules, these should be listed in the data dictionary. These can include: lists (only specific values can be selected from), between or greater than or less than a specific value.
- Validation text: If the validation rule is broken, Microsoft Access (and many other RDBMS) will provide an error message, but it might not be overly meaningful to the data entry operator. Here we can specify clearly the message that we would like the end user to read, and for them to be able to correct their error and resubmit.
key skills to demonstrate knowledge (#7)
Create a Data Dictionary from data examples or solution requirements provided
You will need to be able to create a data dictionary in response to some sort of stimulus material, whether this be through Case Study information (solution requirements) or examples of data stored.
We can start with the database given to us in 3NF (Third Normal Form) that we have created as part of the normalisation process of database design.
We also need to apply a list of business rules that have been defined as part of analysing the data provided in the spreadsheet. They should really be listed as part of the solution requirements, but occasionally business rules are clarified during a design phase of a project. Here are some of the business rules for Pizzas on the Go:
- Order IDs have a five digit number, Customer IDs have a four digit number. Both of these IDs have leading zeroes (e.g. 00012 instead of 12).
- Pizza IDs have a three letter code
- Crusts must be either Deep, Thin or Gluten-Free
- For each pizza order line item between 1 and 10 pizzas can be ordered. (If a customer wants to order 11 Hawaiian pizzas they will need to enter two orders.)
- All pizzas are priced between $4.95 and $10.95
- Only postcodes 3400 and 3401 are acceptable for pizza deliveries (though there may be more than two suburbs that contain these postcodes)
Given these rules, it is now time to create our data dictionary for our five tables: tblOrders, tblOrderDetails, tblPizzas, tblCustomers, and tblPostcodes
Note the use of prefixes for the table (tbl) and fields (ord) as well as the use of different data types for each field. There are two input masks: ordOrderID has an input mask of five underlines which indicates an underline will be placed in the data entry field for the operator to type over. ordOrderDate has an input mask of dd-MMM-yy which means that each date will be stored in the 9-Apr-16 format. Captions are short and meaningful, as are the descriptions. No validation is needed for this table.
The odetOrderID field needs to be stored in an identical way to the ordOrderID field because of the one to many relationship. The odetPizzaID will be stored identically to the PizzaID field to be created in another table, again to allow for a database relationship to be created. Validation rules and validation text details have been created for the Crust type (odetCrustType) and Quantity (odetQty) fields.
Note that the pizPizzaID has the same data dictionary values as the odetPizzaID in the tblOrderDetails table. An input mask of 99.99 has been created for pizPizzaPrice as well as validation rules and text. (Note: The numeric data type for pizPizzaPrice will need to be clarified when we actually create the database to ensure that the decimal places can include cents and are not just rounded to the nearest integer value.)
Shown above is a standard customers table apart from possibly the recording of membership details in a Boolean (yes/no) field. There are two input masks required for the cusCustomerID and cusMobile fields.
There are only two fields in this small table, however both need to be correctly defined. pcSuburb needs to be created identically to cusSuburb to allow a relationship to be generated. pcPostCode needs to have a validation rule and validation text created to ensure that only people from 3400 and 3401 codes can get their tasty pizzas delivered.
Test data
Normally at this stage of a project, test data would be created in order to ensure that all validation fields are working correctly. As part of User Acceptance Testing, the key business users would attempt to insert valid and invalid records into the database and confirm whether the project has been successful.
Sample test data that would be useful to create for this case study:
- A crust type that is: deep, thin, gluten free (each of the three valid responses) and cheesy based (invalid answer)
- A quantity of 0, 1, 10 and greater than 10 on a specific order detail line. The 0 and >10 options should fail validation, but the 1 and 10 should be accepted.
- Price per pizza: $4 (or any value lower than $4.95), $7.50 (or any acceptable value between $4.95 and $10.95) and $12.50 (or any value greater than $10.95) - here you would be testing an incorrect value that is too high, too low and a value that should be correctly entered into the database
- Postcodes for suburbs that are 3400, 3401 and any other value (e.g. 3000) - the 3400/3401 should be entered successfully but the 3000 record should be rejected.
- Testing the relationships (otherwise known as referential integrity) should be completed - where for each Order Detail line there must be an Order and a Pizza existing in the appropriate tables (tblOrders and tblPizzas) and for each Order there needs to be a Customer who has an address in an acceptable Suburb existing in the appropriate tables (tblCustomers and tblPostcodes). This can be tested by trying to insert records into tables where they have no related record.
- Working out in which order to enter values into the database can be tricky. From the relationships in the Pizzas to Go database, the data needs to be entered in a specific order: first tblPostcodes and tblPizza, then tblCustomers, then tblOrders and finally tblOrderDetails.
- What this means is that when entering initial values into the database, it is logical to enter in the postcodes that you will be delivering to as well as the types of pizzas on the menu before taking orders from customers.
Now that the data dictionary has been created by hand, we can build the database in our RDBMS, moving from the Design stage to Development