Separate speaker notes to accompany the slide presentation on data validation (data editing):


Slide #1:

This presentation will deal with the concepts of checking input data to make sure it is valid and the types of validation that can be done.

Slide #2:

Editing or validating data means checking it as it comes into the system. In a good system, data is never entered more than once, so validating it on entry helps assure the integrity of the data that will be processed.

A systems flowchart shows input, processing and output.  The diagram on this slide is a systems flowchart.

A transaction is the data keyed in to record that something happened. It could be a payment, the addition of a new employee, or receipts being added to inventory. Each of these is called a transaction.

This is a very basic approach. Its problem is that the errors are written to paper and then must be keyed in all over again. While some errors are being fixed, others may be introduced, and the transaction ends up on the error report again.

Slide #3:

The difference here is that data entry is done to disk with no editing taking place. In the previous example, the data was keyed in at a screen and edited whenever the person keying it in indicated it was time to check.

There are two types of data collection. In the first, the data entry person keys in data with an emphasis on speed and accuracy. They are given no chance to interact with the data or make corrections; the data is simply being collected. This means the person keying in the data does not need to understand it, only to enter it. People doing this job are hired for their speed and accuracy rather than for any knowledge of the data.

Slide #4:

The person doing data entry where the program gives feedback is usually knowledgeable about the data: a payroll clerk inputs payroll data, and an accounts receivable clerk inputs accounts receivable data. Here the emphasis is on knowing the data (speed and accuracy are still important, but knowledge of the data is key). When the system finds an error, it can report it to the user, who can correct it on the spot. Only data that is hopelessly in error and cannot be corrected is sent to the error transaction report.

There is still a problem with this kind of processing.  Errors that make it to the error report still have to be keyed in again.

Slide #5:

This shows invalid transactions being written to disk, where they can be fixed by displaying them on a screen and making the changes. This avoids the re-entry problem. Note that in many instances I still create a report to provide a paper trail.

It should be noted that in many companies there is also a trail of good transactions written to paper.  This is less common in our paperless society.

Slide #6:

The style of report varies tremendously and the information required may be far more than indicated on this slide.

Slide #7:

Starting with this slide, we will cover examples.

In this example, I am checking whether the name field contains anything. Before entering the validate routine, I set the invalid indicator to no. If I find a problem, I change it to yes. When I am done validating a record, any record with an invalid indicator of yes is not written to the good transaction file.

Note that no processing needs to be done if the name is there. Different languages check for an empty field in different ways.
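The name check and the invalid indicator can be sketched in Python as below. This is a minimal sketch, not the slide's exact logic: it assumes a record is a dict with a hypothetical "name" key, and it treats a missing, empty, or all-spaces field as "no name".

```python
from typing import List, Tuple


def validate_name(record: dict) -> Tuple[str, List[str]]:
    invalid_indicator = "no"          # set to no before validating the record
    errors: List[str] = []
    # Assumption: a blank field may arrive as "" or as a run of spaces.
    name = record.get("name", "")     # hypothetical field name
    if name.strip() == "":
        invalid_indicator = "yes"     # a problem was found
        errors.append("Name is missing")
    # No processing is needed when the name is present.
    return invalid_indicator, errors


print(validate_name({"name": "Smith"}))   # ('no', [])
print(validate_name({"name": "   "}))     # ('yes', ['Name is missing'])
```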

Slide #8:

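Sometimes you want only uppercase letters in a string field. You can check that each character is in the range A - Z. When you make the comparison, what is actually compared is the ASCII or EBCDIC code for the uppercase letters. In ASCII the letters A to Z are consecutive codes; in EBCDIC they fall in three separate groups, so a simple range test there would also accept a few codes that are not letters.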
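A character-by-character range test might look like this in Python. It is a sketch under one stated assumption: an empty field is treated as invalid. Python compares characters by Unicode code point, and for A through Z those are the same values as ASCII (65 through 90).

```python
def is_all_uppercase_letters(field: str) -> bool:
    """Return True if every character falls in the range "A" through "Z"."""
    if field == "":
        return False                  # assumption: an empty field is invalid
    for ch in field:
        # OR: the character is bad if it is outside the range on either end.
        if ch < "A" or ch > "Z":
            return False
    return True


print(is_all_uppercase_letters("SMITH"))   # True
print(is_all_uppercase_letters("Smith"))   # False
```

An explicit range test is used rather than Python's built-in `str.isupper()`, because `isupper()` only looks at cased characters: `"ABC1".isupper()` is `True` even though the field contains a digit.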

Slide #9:

Sometimes it is easier to test for good data and do no processing when the test is true. The error processing is then handled in the else. In this example, that was the easiest approach. In the example on the previous slide, where I was testing for the range A - Z, the valid test would be >= "A" AND <= "Z". We will examine this a little more closely on the next slide.

Slide #10:

Checking to see if a character is an uppercase letter is really checking to see if it is in a specific range.

The use of AND and OR logic deserves close attention.

I use OR when I want to test whether a value is outside the range on either end, and AND when I want to make sure it is inside the range. With the AND test, if either part is false, the code is not valid. See the flowchart on the next slide.
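The two forms of the test can be compared side by side in Python. The in-range AND test and the out-of-range OR test are exact opposites of each other (De Morgan's law: NOT (A AND B) equals (NOT A) OR (NOT B)). The function names here are illustrative only, and `check_code` shows the "test for good data, handle errors in the else" style described above.

```python
def in_range(code: str) -> bool:
    # AND: valid only when BOTH conditions are true.
    return code >= "A" and code <= "Z"


def out_of_range(code: str) -> bool:
    # OR: invalid when EITHER end of the range is violated.
    return code < "A" or code > "Z"


def check_code(code: str) -> str:
    # Test for good data; do nothing when it is true, handle the error in the else.
    if code >= "A" and code <= "Z":
        pass
    else:
        return "invalid"
    return "valid"


# A common mistake: joining the out-of-range tests with AND. No character can
# be both below "A" and above "Z", so this test never finds an error.
def buggy_out_of_range(code: str) -> bool:
    return code < "A" and code > "Z"


print(check_code("Q"), check_code("q"))   # valid invalid
```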

Slide #11:

This shows the logic of the test discussed on the previous slide. Again, the test differs depending on whether you are looking for what is outside the range or for what is inside it.

Slide #12:

Again, notice that I use AND to test that the value is within the range and OR to test whether it is out of the range on either side.

Slide #13:

This shows the test to make sure that full-time employees have pay in the range of 10 to 25.
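A possible Python version of the full-time pay check follows. It is a sketch with two assumptions not taken from the slide: the employee type code for full time is a hypothetical `"F"`, and both 10 and 25 are treated as valid (inclusive bounds).

```python
from typing import List


def validate_pay(employee_type: str, pay: float) -> List[str]:
    errors: List[str] = []
    if employee_type == "F":          # hypothetical code for a full-time employee
        # OR: out of range if below the low end or above the high end.
        if pay < 10 or pay > 25:
            errors.append(f"Full-time pay {pay} is outside the range 10 to 25")
    return errors


print(validate_pay("F", 18))     # []
print(validate_pay("F", 9.5))    # ['Full-time pay 9.5 is outside the range 10 to 25']
```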

Slide #14:

This produces an error when zip code 02184 is entered with a state other than MA. The error is written on the state test because that is where the problem is found. I did not specify any processing for zip codes other than 02184.
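This cross-field check, where one field is only tested when another has a particular value, can be sketched in Python as follows. The function name is hypothetical. The zip code is kept as a string, since converting it to a number would drop the leading zero (`int("02184")` is `2184`).

```python
from typing import List


def validate_zip_state(zipcode: str, state: str) -> List[str]:
    errors: List[str] = []
    if zipcode == "02184":
        # The error belongs on the state test, where the problem is detected.
        if state != "MA":
            errors.append(f"Zip code {zipcode} is not valid for state {state}; expected MA")
    # No processing is specified for other zip codes.
    return errors


print(validate_zip_state("02184", "MA"))   # []
print(validate_zip_state("02184", "NH"))   # one error message
```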

Slide #15:

Today's date is usually obtained from the system date on the computer or on the server.

Many languages provide date functions that allow this comparison to be made. The date function converts the dates into a form that can be compared meaningfully.
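In Python, the standard `datetime` module provides this conversion. The sketch below assumes two things the slide does not state: the rule is "the date entered must not be later than today", and dates are keyed as MM/DD/YYYY. `today` is a parameter only so the check can be tested; by default it uses the system date.

```python
from datetime import date, datetime
from typing import List, Optional


def validate_not_future(date_text: str, today: Optional[date] = None) -> List[str]:
    if today is None:
        today = date.today()          # system date of the machine running the program
    try:
        # Converting the text to a date also rejects impossible dates such as 02/30/2024.
        entered = datetime.strptime(date_text, "%m/%d/%Y").date()
    except ValueError:
        return [f"{date_text} is not a valid date"]
    if entered > today:
        return [f"{date_text} is later than today's date"]
    return []
```

The conversion matters: compared as plain text, "12/01/2023" > "01/15/2024" is true, because the strings are compared character by character rather than as dates.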

Slide #16:

Note that in this example I am not considering overtime hours at all. I just want to make sure that the regular hours, vacation hours, and sick hours add up to 40.

Slide #17:

This shows two errors being produced: one if there is data in pay per hour, and one if there is no data in salary. Salaried workers should have an entry in salary and no entry in pay per hour.
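The two independent tests for a salaried worker might be written in Python as below. This sketch assumes an empty field arrives as `None`; a real record layout might use spaces or zero instead. Because the tests are separate `if` statements rather than an `if`/`else`, a record can produce both errors.

```python
from typing import List, Optional


def validate_salaried(salary: Optional[float], pay_per_hour: Optional[float]) -> List[str]:
    errors: List[str] = []
    if pay_per_hour is not None:      # assumption: None means the field is empty
        errors.append("Salaried worker should have no pay per hour")
    if salary is None:
        errors.append("Salaried worker must have a salary")
    return errors


print(validate_salaried(52000, None))   # []
print(validate_salaried(None, 20))      # both errors
```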

Slide #18:

This slide discusses batch editing, a way to catch transposed digits or other keying errors in data for which no validation test can be constructed. It is used with numeric data to make sure that payments, receipts, etc. are entered correctly.
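One common form of batch editing is a control total. The sketch below assumes that is the technique shown, since the notes do not spell out the mechanism. The amounts in a batch are totaled before entry, and the program compares that figure with the total of the amounts actually keyed in. Amounts are handled as `Decimal` so that money adds exactly; the function name is hypothetical.

```python
from decimal import Decimal
from typing import List


def check_batch(control_total: str, amounts: List[str]) -> List[str]:
    expected = Decimal(control_total)                       # total computed before entry
    actual = sum((Decimal(a) for a in amounts), Decimal("0"))
    if actual != expected:
        difference = abs(actual - expected)
        return [f"Batch out of balance by {difference}: entered {actual}, control total {expected}"]
    return []


print(check_batch("1534.50", ["1234.00", "300.50"]))   # []
print(check_batch("1534.50", ["1243.00", "300.50"]))   # out of balance by 9.00
```

In the second call, 1234.00 was keyed as 1243.00. Swapping two digits always changes a number by a multiple of 9, so an out-of-balance amount divisible by 9 is a useful hint that a transposition occurred.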

Slide #19:

A check digit is designed to catch transposed digits within an ID. Many formulas can be used to calculate a check digit, and they differ in which errors they are able to detect.
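As one example of such a formula, here is a weighted modulus-11 check digit, the scheme used by ISBN-10. This is a swapped-in illustration, not necessarily the formula on the slide. Each digit is multiplied by a weight that decreases from left to right. The check digit brings the weighted total up to a multiple of 11, and "X" stands for 10. For IDs of up to 10 digits including the check digit, this scheme detects every single wrong digit and every swap of two digits. Simpler schemes such as the Luhn formula used on credit cards miss some transpositions (09 and 90).

```python
def mod11_check_digit(base: str) -> str:
    """Weighted modulus-11 check digit for a string of decimal digits."""
    weight = len(base) + 1            # weights run len(base)+1 down to 2
    total = 0
    for ch in base:
        total += int(ch) * weight
        weight -= 1
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_id(full_id: str) -> bool:
    """Recompute the check digit from the leading digits and compare it with the last character."""
    base = full_id[:-1]
    if len(full_id) < 2 or not all(c in "0123456789" for c in base):
        return False
    return mod11_check_digit(base) == full_id[-1]


print(mod11_check_digit("030640615"))   # 2
print(is_valid_id("0306406152"))        # True
print(is_valid_id("0306406512"))        # False (the 1 and 5 were transposed)
```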

Slide #20:

This shows data validation/editing within the context of a program. The portion that deals with validating data is on the next slide.

Slide #21:

Valid data will have something in the name field, and the amount field will not be greater than 5000.

Note that I write the errors as I encounter them. That is because one record could have multiple errors, and I have decided to report each one individually. Alternatively, I could have stored the errors in memory and processed them all together.

When I complete the validate routine, I return to the process record loop and check whether the invalid indicator is still no. If it is, no errors were encountered, so I write the transaction to the file of good transactions. If there were errors, no further processing is done, because I already handled the errors as I encountered them.

Then I read another record and start the processing again.

A real program would check for far more than two errors, but the methodology is the same. Each error is checked individually because I want to find all possible errors in a record.
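The whole flow (read a record, validate it, write errors as they are found, and write the record to the good transaction file only if the invalid indicator is still no) can be sketched in Python as below. This is a minimal sketch with stated assumptions: records are dicts with hypothetical "name" and "amount" keys, and in-memory lists stand in for the good transaction file and the error report.

```python
from typing import List, Tuple


def validate(record: dict, record_number: int, error_report: List[str]) -> str:
    """Check one record, writing each error as it is found; return the invalid indicator."""
    invalid_indicator = "no"                     # set to no before validating
    if str(record.get("name", "")).strip() == "":
        invalid_indicator = "yes"
        error_report.append(f"Record {record_number}: name is missing")
    amount = float(record.get("amount", 0))
    if amount > 5000:
        invalid_indicator = "yes"
        error_report.append(f"Record {record_number}: amount {amount} is greater than 5000")
    return invalid_indicator


def process_records(records: List[dict]) -> Tuple[List[dict], List[str]]:
    good_transactions: List[dict] = []           # stand-in for the good transaction file
    error_report: List[str] = []                 # stand-in for the error report
    for record_number, record in enumerate(records, start=1):   # read the next record
        if validate(record, record_number, error_report) == "no":
            good_transactions.append(record)     # no errors: write the good transaction
        # Otherwise nothing more is done; the errors were already written.
    return good_transactions, error_report


good, errors = process_records([
    {"name": "Jones", "amount": 1200},
    {"name": "", "amount": 7500},                # two errors in one record
    {"name": "Lee", "amount": 5000},             # 5000 is not greater than 5000
])
print(len(good), len(errors))   # 2 2
```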