Case Study: Key Word In Context (KWIC) Product Line

  1. KWIC problem description

  2. How does a KWIC system work?

  3. Variants in KWIC product line

  4. Applying XVCL to handle variants in KWIC product line

  5. Constructing KWIC systems with XVCL processor


1. KWIC problem description

The KWIC [Key Word in Context] index system accepts an ordered set of lines; each line is an ordered set of words, and each word is an ordered set of characters. Any line may be “circularly shifted” by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a list of all circular shifts of all lines in alphabetical order."

On the Criteria for Decomposing Systems into Modules. David Parnas. CACM, 1972

As Parnas points out, the KWIC systems are small systems (except under extreme circumstances such as huge data base). However, we use this well-known example to illustrate principles of our XVCL-based method, treating this problem as if it were a large project.

2. How does a KWIC system work?

The following example demonstrates how a KWIC system works: 

pInputs: Sequence of lines
Pipes and Filters
Architectures for Software Systems
pOutputs: Sequence of lines, circularly shifted and alphabetized
and Filters Pipes
Architectures for Software Systems
Filters Pipes and
for Software Systems Architectures
Pipes and Filters
Software Systems Architectures for
Systems Architectures for Software

3. Variants in KWIC product line

All the KWIC systems are similar in the sense that they satisfy the basic requirements as stated above. However, there are also differences across KWIC systems related to functional requirements, design decisions and implementation details. These differences (variants) yield a KWIC product line. We shall describe some of the variants, giving each variant a unique name for future reference.

Functional variants

·         NOISE_WORDS: In some KWIC systems, circular shifts that start with noise words (such as a, the, and, etc.) should be eliminated. Noise words may differ from one KWIC system to another.

·         INPUT_METHOD: Lines can be read from a file, or from system console. Values FILE and CONSOLE denote these two options.

·         OUTPUT_METHOD: Indexed lines can be output to a file, or to system console. Values FILE and CONSOLE denote these two options.

·         CASE_SENSITIVE: Case sensitivity may or may not be taken into account during sorting lines. Values SENSITIVE and INSENSITIVE denote these two options.

Variant Design Decisions

Some variants in KWIC domain that affect architectural design are:

·         SHIFT_PROCESSING: Line shifting can be performed on each line as it is read from the input device or after all the lines have been read. Values EACHLINE and ALLLINES denote these two options.

·         SHIFT_DATA: Circular shifts can be stored explicitly (as a set of strings) or implicitly (as pairs of index and offset). Values EXPLICIT and IMPLICIT denote these two options.

Implementation-level variants

·         PACKAGE: KWIC components for different systems may belong to different Java packages.

·         NEW_ATTRIBUTES: This variant caters for the situations when new, unexpected changes in requirements require us to add more attributes to a KWIC class.

·         NEW_METHODS: This variant caters for the situations when new, unexpected changes in requirements require us to add more methods to a KWIC class.

·         NEW_IMPORTS: This variant caters for the situations when new, unexpected changes in requirements require us to include more packages to a KWIC system.

Using UML component diagram and extension mechanism, we can model the impact of variants on KWIC architecture as follows (only functional variants and variant design decisions are modeled):

4. Applying XVCL to handle variants in KWIC product line

KWIC x-framework:

KWIC x-frames (the whole package can be downloaded here):

KWIC SPC: An specification x-frame (SPC) for a specific KWIC system

x_MasterControl: An x-frame for constructing the MasterControl component

x_CircularShift: An x-frame for constructing the CircularShiftl component

x_AlphabeticShift: An x-frame for constructing the AlphabeticShift component

x_LineStoraget: An x-frame for constructing the LineStorage component

x_Input: An x-frame for constructing the Input component

x_Output: An x-frame for constructing the Output component

FileIO: An x-frame for a generic file input/output class

BubbleSort: An x-frame for a generic double-bubble sort algorithm

InsertionSort: An x-frame for a generic insertion sort algorithm

Pairs: An x-frame for making pairs of index and offset

ErrorHandling: An x-frame for handling errors in KWIC systems.

MemoryIndicator: An x-frame for checking the "health" status of system memory usage

5. Constructing KWIC systems with XVCL processor

You need an XVCL processor v2.0 (beta 1), JDOM (8.0 or above) and 
a JDK 1.4 (or above) to run the KWIC example. Make sure that all needed packages are in your class path.

The command for XVCL processing is as follows:
      java xvcl.XVCL kwic_spc.s

(A sample batch file xvcl.bat is included in this example.)

Please note that this example sets the XVCL DTD path to file:///f:\XVCL_1.0_beta\dtd\xvcl.dtd. 
You may change it to reflect your XVCL settings.

Concrete KWIC components (Java classes) will be generated after the XVCL processing. 

You may then compile the constructed program and execute the KWIC:
      javac *.java -d classes
      cd classes
      java kwic/kwic [Input_File] [Output_File]

The file Input_File contains input text lines. The file Output_File contains the sorted circularly-shifted lines.

You may try to modify the KWIC_SPC file to generate different versions of the KWIC systems, 
and observe how the XVCL works.