Case Study: eliminating redundant code in the Buffer library

ˇˇ

This case study is described in Jarzabek, S. and Li, S. "Eliminating Redundancies with a ˇ°Composition with Adaptationˇ± Meta-programming Technique," Proc. ESEC-FSE'03, European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM Press, September 2003, Helsinki, pp. 237-246; the paper received ACM SIGSOFT distinguished paper award.

 

  1. Significance of the result of this case study

  2. "Feature combinatorics" problem

  3. Description of the original Buffer library
        3.1 Classes depicted in Figure 1
        3.2 Classes depicted in Figure 2

  4. Groups of similar classes in the Buffer library

  5. Examples of similar fragments

  6. Description of the Buffer x-framework

  7. Step-by-step explanation  of constructing x-frame [T]Buffer

  8. The original Buffer library versus the XVCL solution

  9. References

ˇˇ

1. Significance of the result of this case study

In this experiment, we demonstrate  XVCL's potential to reduce program complexity by eliminating redundant codes. We also show how XVCL can be effectively applied on top of OO techniques, enhancing program maintainability and reusability.

Redundant code obstructs program understanding and contributes to high maintenance costs. While most experts agree on that, opinions on how serious the problem of redundancies really is and how to tackle it - differ. In this case study, we analyzed redundancies in the Java Buffer library, JDK 1.4.1, recently released by Sun. We found that at least 68% of code in the Buffer library is redundant in the sense that it recurs in many classes in the same or slightly modified form. We effectively eliminated that 68% of code at the meta-representation built with XVCL. If we take into account both executable code and comments, then we can eliminate 72% of code. Due to the smaller code base, lack of the redundant code and ability to trace the impact of changes, our XVCL solution scores higher than the Buffer library in terms of ease of maintenance. In this experiment, we designed our meta-representation so that we could produce buffer classes in exactly the same form as they appear in the original Buffer library. While we were tempted to re-design the buffer classes, we chose not to do so, to allow for seamless integration of the XVCL solution into contemporary programming methodologies and systems. This decision did not affect essential results reported in this study.

More details of the case study can be found in the paper Eliminating Redundancies with a ˇ°Composition with Adaptationˇ± Meta-programming Technique  

Defining redundancy: Code fragments that we studied typically contain definitions of classes, class attributes, constructors or methods. Redundancy occurs among similar code fragments. It is difficult to define redundancy in general and descriptive terms. Therefore, we shall accept the following pragmatic definition of redundancy for the purpose of this study: Redundancy occurs in a group of similar code fragments, whenever we can unify all the differences among those fragments at the meta-level, with the XVCL technique. We also require that the result is beneficial for maintenance, that is, such a unified meta-representation should be easier to understand and maintain than the original program with redundant code fragments.    

[back to top]
ˇˇ

2. "Feature combinatorics" problem

This experiment was inspired by Don Batory's analysis of the "feature combinatorics" problem [2], Biggerstaff's work on limits of concrete component reuse [4], and discussions of OO techniques in books by Paul Bassett [1] and Czarnecki/Eisenecker [5]. Batory studied C++ data structure class libraries, where typical features relate to data structure, memory allocation scheme, access mode, concurrency, etc. In general, features may relate to any characteristic of a program such as functionality, design solutions, platform, etc. The problem manifests itself as follows: Features may appear in classes in many different combinations. As we need a unique class for each legal combination of features, we must develop and maintain a large number of similar classes. Batory concludes that "today's method of constructing libraries is inherently unscalable ... Libraries should not enumerate complex components with numerous features ... to be scalable, libraries must offer much more primitive building blocks and be accompanied by generators that compose blocks to yield the data structures needed by application programmers." In the same paper, Batory applied a generation technique of GenVoca [3] to produce classes with required combination of features from a much simpler set of primitive building blocks than the original classes. In 1994, Biggerstaff further analyzed the library scaling problem and limits of concrete component reuse caused by "feature combinatorics" [4]. The "feature combinatorics" problem hampers the scalability of class/component libraries, reuse and programmers' productivity in general. 

Redundant code in classes is one of the symptoms of the "feature combinatorics" problem. As the number of feature combinations increases, classes in a library not only grow in number, but also become polluted with numerous redundant code fragments. Redundancies complicate change and  contribute to high maintenance cost. Redundant code obstructs program understanding during the maintenance in, at least, two ways: (1) a programmer must maintain more code than he/she would have to maintain should the redundancies be removed, and (2) when one logical source of change affects many replicated code fragments scattered throughout a program, to implement a change, a programmer must find and update all the instances of the replicated fragment. The situation complicates further if an affected fragment must be changed in slightly different ways, depending on the context.

 [back to top]

3. Description of the original Buffer library

A buffer contains data in a linear sequence for reading and writing. A Buffer class library in our case study is a part of java.nio.* (new Input/Output) packages JDK 1.4.1. Buffer classes differ in features such as a buffer element type, memory allocation scheme, byte ordering and access mode (Table 1). Each legal combination of features yields a unique buffer class. That is why, even though all the buffer classes play essentially the same role, there are 74 classes in the Java  Buffer library, as shown in Figures 1 and 2. 

Table 1. Features in the Buffer library

Level in class hierarchy Feature dimension Features
level 1 buffer data element type byte, char, int, float double, long, short
level 2 memory allocation scheme direct, nondirect
level 2 byte ordering native, non-native, Big_endian, Little_endian
level 3 access mode

writable, read-only

The Buffer class at the top and the seven classes at level 1 in both figures are the same. Subclasses of each of the seven classes are indicated at level 2 in both figures. For example, class ByteBuffer has two classes, named HeapByteBuffer and DirectByteBuffer at level 2 in Figure 1 (we do not count class MappedByteBuffer which is just an interface implementation class). Class CharBuffer has five subclasses, three of which  (HeapCharBuffer, DirectCharBufferS, DirectCharBufferU) are at level 2 in Figure 1 and two (ByteBufferAsCharBufferB and ByteBufferAsCharBufferL) are at level 2 in Figure 2 (we do not count class StringCharBuffer). Each of the 5 classes IntBuffer, DoubleBuffer, FloatBuffer, LongBuffer, ShortBuffer, has 5 subclasses at level 2 in Figures 1 and 2. Each of  the classes at level 2 in Figures 1 and 2 has a subclass shown at level 3.Class sub-hierarchies for DoubleBuffer, FloatBuffer, LongBuffer, ShorBuffer are analogical to sub-hierarchies for classes CharBuffer and IntBuffer, so for the sake of brevity we do not depict them in Figure 1.

ˇˇ

see Buffer source see MappedByteBuffer source go to the SPC for this level goto the SPC for Heap[T]Buffer goto the SPC for Heap[T]Buffer goto the SPC for Heap[T]Buffer goto the SPC for Direct[T]Buffer[SU] goto the SPC for Direct[T]Buffer[SU] goto the SPC for Direct[T]Buffer[SU] goto the SPC for Direct[T]Buffer[SU] goto the SPC for Direct[T]Buffer[SU] goto the SPC for Heap[T]BufferR goto the SPC for Heap[T]BufferR goto the SPC for Heap[T]BufferR goto the SPC for Direct[T]BufferR[SU] goto the SPC for Direct[T]BufferR[SU] goto the SPC for Direct[T]BufferR[SU] goto the SPC for Direct[T]BufferR[SU] goto the SPC for Direct[T]BufferR[SU]

Figure 1. The core 49 classes in the buffer library

see Buffer source see StringCharBuffer source go to the SPC for this level goto the SPC for ByteBufferAs[T]Buffer[BL] goto the SPC for ByteBufferAs[T]Buffer[BL] goto the SPC for ByteBufferAs[T]Buffer[BL] goto the SPC for ByteBufferAs[T]Buffer[BL] goto the SPC for ByteBufferAs[T]BufferR[BL] goto the SPC for ByteBufferAs[T]BufferR[BL] goto the SPC for ByteBufferAs[T]BufferR[BL] goto the SPC for ByteBufferAs[T]BufferR[BL]

Figure 2. The remaining buffer classes

[back to top]

3.1 Classes depicted in Figure 1

At level 1 in the class hierarchy, we see seven classes that differ in buffer element data types.

Byte ordering matters for buffers whose elements consist of multiple bytes, in case of our library - all the element types but byte. For a variety of historical reasons, different CPU architectures use different native byte ordering conventions. For example, Intel microprocessors put the least significant byte into the lowest memory address (which is called Little_Endian ordering), while Sun UltraSPARC processors put the most significant byte first (which is called Big_Endian ordering).

When using the direct memory access scheme, we must know if buffer elements are stored using native or non-native byte ordering. Twenty new classes at level 2 result from combining memory access and byte ordering features. (We do not count MappedByteBuffer which is just a helping class.) For each buffer class at level 1, we have one Heap* class at level 2 that implements the nondirect memory access scheme for that buffer. Classes with suffixes 'U' and 'S' implement direct memory access scheme with native and non-native byte ordering, respectively. We have only one class DirectByteBuffer, as byte ordering does not matter for byte buffers.

Buffers discussed so far are writable. Twenty classes at level 3 implement read-only variants of buffers.

[back to top]

3.2 Classes depicted in Figure 2

Figure 2 shows 25 classes derived from CharBuffer, IntBuffer, DoubleBuffer, FloatBuffer, LongBuffer and ShortBuffer. The two subclasses, ByteBufferAsCharBufferB and ByteBufferAsCharBufferL at level 2, can reform ByteBuffer to CharBuffer, using Big_Endian ordering for the former while Little_Endian for the latter. Buffers for Int, Double, Float, Long and Short types, also have these similar subclasses as shown at level 2. StringCharBuffer class in shade at level 2 is mainly for make a non-direct, read-only CharBuffer from a CharSequence. This class will not be analyzed for clarity.

At level 3, there are 12 read-only subclasses, of which each is derived from a corresponding class at level 2.

[back to top]

4. Groups of similar classes in the Buffer library

According to the similarities, 71 classes (all classes except Buffer, MappedByteBuffer, StringCharBuffer) are categorized into 7 groups as follows:

  1. [T]Buffer: contains 7 buffer classes of type T ( level 1 in Figures 1 and 2).  T denotes one of the buffer element types, namely, Byte, Char, Int, Double, Float, Long, Short.

  2. Heap[T]Buffer: contains 7 Heap classes of type T (level 2 in Figure 1

  3. Direct[T]Buffer[S|U]: contains 13 Direct classes (level 2 in Figure 1 U denotes native and S - non-native byte ordering.

  4. Heap[T]BufferR: contains 7 Heap read-only classes (level 3 in Figure 1).

  5. Direct[T]BufferR[S|U]: contains 13 Direct read-only classes (level 3 in Figure 1.

  6. ByteBufferAs[T]Buffer[B|L]: contains 12 classes  (level 2 in Figure 2 providing views T of a Byte buffer with different byte orderings (B or L). T here denotes buffer element type except Byte. B denotes Big_endian and L - Little_endian byte ordering.

  7. ByteBufferAs[T]BufferR[B|L]: contains 12 read-only classes  (level 2 in Figure 2 providing views T of a Byte buffer with different byte orderings (B or L). T here denotes buffer element type except Byte. B denotes Big_endian and L - Little_endian byte ordering.

[back to top]

5. Examples of similar fragments

You may view examples of similar code fragments(PDF) in different groups of classes.

[back to top]

6. Description of the Buffer x-framework

We built an x-framework from which we generate  buffer classes. We "normalized" the Buffer x-framework to eliminate the redundant code. 

For each of the seven groups of similar classes, we designed x-frames to generate all the classes in that group. We named a "master x-frame" for each group after the name of the corresponding class group (Figure 3). We created generic, reusable x-frames representing redundant codes. These x-frames, as well as x-frames representing unique code fragments, are included after adaptations into different classes rather than repeated as in the original buffer classes. 

 The SPC BufferAll.s navigates the whole process of generating all the buffer classes.

Click on the x-frame nodes in Figure 3  to view the x-frames for seven  groups of similar classes. 

Figure 3. SPC x-frames in the Buffer x-framework

[back to top]

7. Step-by-step explanation  of constructing x-frame [T]Buffer

You may view a step-by-step explanation of how we constructed x-frames for the [T]Buffer group of classes that includes classes ByteBuffer, CharBuffer, IntBuffer, DoubleBuffer, FloatBuffer, LongBuffer, and ShorBuffer.

[back to top]

8. Comparison of the original Buffer library versus the XVCL solution

Table 2. Original Buffer library vs. XVCL solution

Figure 4. Buffer library: summary chart

Table 2 shows the results of comparing the original Buffer classes with the XVCL solution in terms of LOC, with and without comments. XVCL solution eliminates 68% of the code as compared to the original buffer classes. If we take into account both executable code and comments, then we can eliminate 72% of code. We prefer to include comments in the statistics, as both code and comments are needed to understand a program, and both must be maintained. So the size of the executable code with comments is a better indicator of maintenance effort than code alone. Also, it benefits programmers if they apply XVCL to manage both executable code and comments.

The main return on investment in applying XVCL is due to savings during library maintenance. The principle of XVCL design is clean separation of various sources of change affecting a program. Each source of change (e.g., variation of the features listed in Table 1) can be traced to codes affected by this change. Lack of redundant code reduces the number of points at which affected classes must be modified. Changes done to one meta-fragment consistently propagate to all the contexts in which the fragment is used. If the impact of change is not uniform in all such contexts, the exceptions can be handled at the specific adaptation point, without affecting the code fragment involved. From each meta-class we can trace how different feature combinations affect the code. The meta-component architecture explicitly reflects the impact of change on program elements.

Design in terms of meta-components is different from the inheritance-based program design. It takes some time to adjust to this way of thinking about a program. But as this perspective so well enhances concerns that matter in maintenance, eventually this shift of the viewpoint pays-off.

[back to top]

9. References

[1] Bassett, P. 1997. Framing software reuse - lessons from real world, Yourdon Press, Prentice Hall
[2] Batory, D., Singhai, V., Sirkin, M. and Thomas, J. "Scalable software libraries,", ACM SIGSOFT'93: Symp. on the Foundations of Software Engineering, Los Angeles, California, Dec. 1993, pp. 191-199
[3] Batory, D. and Geraci, B.J. "Validating component compositions and subjectivity in GenVoca generators," Trans. on Software Engineering, 23, 2, 1997, pp. 67-82
[4] Biggerstaff, T. "The library scaling problem and the limits of concrete component reuse, " 3rd Int. Conf. on Software Reuse, ICSR'94, 1994, pp. 102-109
[5] Czarnecki, K. and Eisenecker, U. Generative Programming: Methods, Tools, and Applications," Addison-Wesley, 2000

[6] Jarzabek, S. and Li, S. "Eliminating Redundancies with a ˇ°Composition with Adaptationˇ± Meta-programming Technique," Proc. ESEC-FSE'03, European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM Press, September 2003, Helsinki, pp. 237-246; the paper received ACM SIGSOFT distinguished paper award

 


[back to top]