Week 8
This week we look at Chapter 6, Association Analysis: Advanced Analysis. Two separate videos have been produced for the chapter.
Be sure to read the chapter in your textbook. A separate problem document has been posted to Blackboard.
The document is purposely created in Microsoft Word so you can enter your answers into the document. You will need to use Microsoft Excel for a portion of your answers. Document attached. YOUR ANSWERS MUST APPEAR WITHIN THE PROBLEM DOCUMENT.
10% WILL BE DEDUCTED IF YOU CREATE A NEW OR SEPARATE DOCUMENT.
10% WILL BE DEDUCTED IF YOU CREATE A "TITLE PAGE" TYPE OF DOCUMENT.
Chapter 6 Problems
Use the following table to answer question #1.
1a. Using an Excel spreadsheet, create a binarized version of the data set with the following categories (note: these are also the column (item) names to use in the spreadsheet):
Sky Fair, Sky Stormy, Status Impaired, Status Sober, Violation None, Violation Speeding, Violation Stop,
Violation Signal, Restraint = No, Restraint = Yes, Crash Major, Crash Minor
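If you want to check your spreadsheet, here is a minimal Python sketch of the binarization. Each record becomes one 0/1 row with one column per item. It assumes each record is a dict with the hypothetical keys Sky, Status, Violation, Restraint, and Crash. The value spellings (Fair, Stormy, Stop, Signal, and so on) are assumptions inferred from the item names above, not taken from the table.

```python
# Column (item) names adapted from question 1a.
ITEMS = ["Sky Fair", "Sky Stormy", "Status Impaired", "Status Sober",
         "Violation None", "Violation Speeding", "Violation Stop",
         "Violation Signal", "Restraint = No", "Restraint = Yes",
         "Crash Major", "Crash Minor"]

def item_name(attribute, value):
    """Build the column name for one attribute value (hypothetical naming rule)."""
    # Restraint items are written "attribute = value"; the others "attribute value".
    if attribute == "Restraint":
        return f"{attribute} = {value}"
    return f"{attribute} {value}"

def binarize(records):
    """Turn each categorical record into a 0/1 row ordered like ITEMS."""
    rows = []
    for record in records:
        present = {item_name(a, v) for a, v in record.items()}
        unknown = present - set(ITEMS)
        if unknown:
            raise ValueError(f"unrecognized values: {sorted(unknown)}")
        rows.append([1 if item in present else 0 for item in ITEMS])
    return rows
```

For example, `binarize([{"Sky": "Fair", "Status": "Sober", "Violation": "None", "Restraint": "Yes", "Crash": "Minor"}])` produces one row with a 1 in the Sky Fair, Status Sober, Violation None, Restraint = Yes, and Crash Minor columns.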
Paste the Excel spreadsheet into this document here.
1b. What is the maximum width of each transaction in the binarized data?
1c. How did you determine the answer for item 1b?
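In the binarized table, each row is a transaction. Its width is the number of items it contains, which is the number of 1s in the row. The short Python sketch below computes that. The two rows are hypothetical placeholders laid out in the 1a column order. A row with exactly one 1 in each attribute's group of columns has a width equal to the number of attributes.

```python
def transaction_widths(rows):
    """Width of each transaction = number of items present (row sum of a 0/1 table)."""
    return [sum(row) for row in rows]

# Hypothetical 0/1 rows in the column order of question 1a;
# replace them with the rows from your own binarized spreadsheet.
rows = [
    [1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
]
widths = transaction_widths(rows)
max_width = max(widths)
```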
1d. Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?
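To check your hand count, here is a minimal Python sketch of level-wise Apriori counting. Candidates of size k + 1 are generated by merging pairs of frequent k-itemsets that share their first k − 1 items. Each candidate is then pruned if any of its k-subsets is infrequent. The totals include the 1-itemset candidates. That is one common counting convention, so confirm the one used in the chapter. The function name `apriori_counts` is hypothetical, and the transactions you pass in should be the sets of item names from your binarized data.

```python
from itertools import combinations

def apriori_counts(transactions, minsup):
    """Level-wise Apriori; returns (candidate itemsets generated, frequent itemsets found)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]  # level-1 candidates: every item
    total_candidates, total_frequent = 0, 0
    k = 1
    while candidates:
        total_candidates += len(candidates)
        # Support as a fraction, so minsup=0.3 means 30%.
        frequent = [c for c in candidates
                    if sum(c <= t for t in transactions) / n >= minsup]
        total_frequent += len(frequent)
        frequent_set = set(frequent)
        # Merge frequent k-itemsets that agree on their first k - 1 items.
        ordered = sorted(tuple(sorted(f)) for f in frequent)
        next_candidates = []
        for a, b in combinations(ordered, 2):
            if a[:-1] == b[:-1]:
                candidate = frozenset(a + (b[-1],))
                # Apriori pruning: every k-subset must itself be frequent.
                if all(frozenset(s) in frequent_set
                       for s in combinations(sorted(candidate), k)):
                    next_candidates.append(candidate)
        candidates = next_candidates
        k += 1
    return total_candidates, total_frequent
```

Call it as `apriori_counts(transactions, 0.3)`, where each transaction is a set such as `{"Sky Fair", "Status Sober", "Crash Minor"}`.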
1e. Again using Excel, create a data set that contains only the following asymmetric binary attributes:
(Weather = Bad, Impaired, Traffic violation = Yes, Restraint = No, Crash Severity = Major).
For Traffic violation, only the value None is assigned 0; all other violation values are assigned 1.
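Here is a minimal Python sketch of the 1e conversion rules for one record. It assumes the same hypothetical dict keys as above (Sky, Status, Violation, Restraint, Crash). It also assumes that Weather = Bad corresponds to Sky = Stormy. The question does not state that mapping, so adjust it if your table defines "bad" weather differently.

```python
def to_asymmetric(record):
    """Map one categorical record to the five asymmetric binary attributes of 1e."""
    return {
        # Assumption: "Weather = Bad" means Sky = Stormy.
        "Weather = Bad": int(record["Sky"] == "Stormy"),
        "Impaired": int(record["Status"] == "Impaired"),
        # Only Violation = None maps to 0; every other violation value maps to 1.
        "Traffic violation = Yes": int(record["Violation"] != "None"),
        "Restraint = No": int(record["Restraint"] == "No"),
        "Crash Severity = Major": int(record["Crash"] == "Major"),
    }
```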
Copy and paste the Excel spreadsheet here:
Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?
1f. Compare the numbers of candidate and frequent itemsets generated in 1d and 1e. What is your analysis?
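One way to frame the comparison is the size of the search space in each data set. Over d items, the full itemset lattice has 2^d − 1 non-empty itemsets. In the symmetric binarized data, however, every record has exactly one value per attribute. Any itemset that contains two values of the same attribute (for example, Sky Fair and Sky Stormy) therefore has zero support. Only itemsets that use at most one value per attribute can ever be frequent. The Python sketch below computes both bounds. These are upper bounds, not the answers to 1d and 1e, which depend on the table.

```python
from math import prod

def lattice_size(num_items):
    """Number of non-empty itemsets that can be formed from num_items items."""
    return 2 ** num_items - 1

def attribute_consistent_itemsets(values_per_attribute):
    """Non-empty itemsets that use at most one value of each attribute."""
    # Each attribute contributes either nothing or exactly one of its k values.
    return prod(k + 1 for k in values_per_attribute) - 1

# 1d: Sky, Status, Violation, Restraint, Crash have 2, 2, 4, 2, 2 values (12 items).
symmetric = (lattice_size(12), attribute_consistent_itemsets([2, 2, 4, 2, 2]))
# 1e: five asymmetric attributes with one item each.
asymmetric = (lattice_size(5), attribute_consistent_itemsets([1, 1, 1, 1, 1]))
```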
2. Find all the frequent subsequences with support ≥ 50% in the sequence database shown below. Assume that no timing constraints are imposed on the sequences.
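Here is a minimal Python sketch for checking answers to this kind of question. With no timing constraints, a pattern is contained in a data sequence if its elements are subsets of distinct data elements, in order. The enumeration grows patterns in two ways: it appends a new one-item element, or it adds a larger item to the last element. It extends only patterns that are already frequent. This is a pattern-growth enumeration and not necessarily the algorithm presented in the chapter. The function names are hypothetical, and the database must be entered by hand as lists of sets.

```python
from itertools import chain

def is_subsequence(pattern, sequence):
    """True if each pattern element is a subset of a distinct, later sequence element."""
    pos = 0
    for element in pattern:
        # Greedily use the earliest remaining element that contains this one.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(pattern, database):
    """Fraction of data sequences that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

def frequent_subsequences(database, minsup):
    """Map each frequent pattern (a tuple of frozensets) to its support."""
    items = sorted(set(chain.from_iterable(chain.from_iterable(database))))
    results, frontier = {}, []
    for x in items:
        pattern = (frozenset([x]),)
        s = support(pattern, database)
        if s >= minsup:
            results[pattern] = s
            frontier.append(pattern)
    while frontier:
        next_frontier = []
        for pattern in frontier:
            last = pattern[-1]
            # Sequence extension: append a new one-item element {x}.
            extensions = [pattern + (frozenset([x]),) for x in items]
            # Itemset extension: add an item larger than every item in the last element.
            extensions += [pattern[:-1] + (last | {x},) for x in items if x > max(last)]
            for candidate in extensions:
                s = support(candidate, database)
                # Support cannot grow as a pattern grows, so only frequent
                # patterns are extended further.
                if s >= minsup:
                    results[candidate] = s
                    next_frontier.append(candidate)
        frontier = next_frontier
    return results
```

Each sequence is a list of sets, for example `[{1, 2}, {3}]`. Call `frequent_subsequences(database, 0.5)` to list every pattern with support ≥ 50%.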
3. For each of the sequences w = <e₁ e₂ … eᵢ … eᵢ₊₁ … eₗₐₛₜ> given below, determine whether it is a subsequence of the sequence
<{1, 2, 3} {2, 4} {2, 4, 5} {3, 5} {6}>
subject to the following timing constraints:
mingap = 0 (interval between last event in eᵢ and first event in eᵢ₊₁ is > 0)
maxgap = 3 (interval between first event in eᵢ and last event in eᵢ₊₁ is ≤ 3)
maxspan = 5 (interval between first event in e₁ and last event in eₗₐₛₜ is ≤ 5)
ws = 1 (time between first and last events in eᵢ is ≤ 1)
• w = <{1} {2} {3}>
• w = <{1, 2, 3, 4} {5, 6}>
• w = <{1} {2, 4} {6}>
• w = <{1, 2} {3, 4} {5, 6}>
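Here is a brute-force Python sketch for checking your reasoning on small inputs. It assumes the five elements of the data sequence occur at times 1, 2, 3, 4, and 5, because the question does not state timestamps. Each pattern element is matched to a time window [l, u] with u − l ≤ ws whose combined events contain it. The search then enforces mingap, maxgap, and maxspan as defined above. The function names are hypothetical, and this is not an efficient algorithm.

```python
def candidate_windows(element, timed_sequence, ws):
    """All (l, u) with u - l <= ws whose events jointly contain element."""
    times = [t for t, _ in timed_sequence]
    windows = []
    for l in times:
        for u in times:
            if l <= u <= l + ws:
                covered = set().union(*(items for t, items in timed_sequence if l <= t <= u))
                if set(element) <= covered:
                    windows.append((l, u))
    return windows

def is_timed_subsequence(pattern, timed_sequence, mingap, maxgap, maxspan, ws):
    """Depth-first search for windows that satisfy all four timing constraints."""
    options = [candidate_windows(e, timed_sequence, ws) for e in pattern]

    def search(i, chosen):
        if i == len(pattern):
            return True
        for l, u in options[i]:
            first_l = chosen[0][0] if chosen else l
            if u - first_l > maxspan:          # maxspan: last event - first event of e1
                continue
            if chosen:
                prev_l, prev_u = chosen[-1]
                if l - prev_u <= mingap:       # mingap: gap must be strictly greater
                    continue
                if u - prev_l > maxgap:        # maxgap: first of e_i to last of e_i+1
                    continue
            if search(i + 1, chosen + [(l, u)]):
                return True
        return False

    return search(0, [])

# Assumed timestamps 1..5 for the data sequence in question 3.
DATA = [(1, {1, 2, 3}), (2, {2, 4}), (3, {2, 4, 5}), (4, {3, 5}), (5, {6})]
```

Each pattern is a list of sets. For example, `is_timed_subsequence([{1}, {2}, {3}], DATA, mingap=0, maxgap=3, maxspan=5, ws=1)` checks the first bullet.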

