Exploiting the web to manage software development artifacts:
the Labyrinth Project

Fabiano Cattaneo†, Alfonso Fuggetta‡†, Luigi Lavazza‡†, Giuseppe Valetto†

‡Politecnico di Milano

P.za L. Da Vinci, 32 - 20133 Milano (Italy)
Tel. +39-02-2399.1
E-mail: {fuggetta|lavazza}@elet.polimi.it

† CEFRIEL

Via Fucini, 2 - 20133 Milano (Italy)
Tel. +39-02-23954.1
E-mail: {cattaneo|valetto}@cefriel.it

1. Introduction

Software development activities are characterized by the production and maintenance of a large number of documents. Examples are requirements, design, and test specifications. These documents are interrelated by complex semantic relationships. For instance, a design document "depends" on the requirements specification document describing the user’s expectations about the system to be built. To maintain this complex set of information, software developers often employ specific tools such as configuration management environments (e.g., CCC, PVCS) and data management and tracking systems (e.g., DOORS, RTM). To store and manage document-related information, these tools typically rely on a database system. Certainly, database technology provides reliable and mature support for data management. However, its centralized architecture and the complexity of its features do not match the trends and characteristics of the media of the future: the Internet and the WWW.

The key characteristics of the Internet are openness, scalability, and distribution. The WWW makes it possible to manage and distribute, in a very flexible way, a huge amount of information to numerous users worldwide. We believe that the management of software development documents (and documents in general) should be reconsidered according to this emerging and challenging vision. For this reason we have initiated the Labyrinth project, which aims at creating an innovative infrastructure to manage complex document collections, and their relationships, over the WWW.

As is, the WWW provides basic mechanisms for distributing information ubiquitously in a simple and inexpensive way. However, on the Web, data and the relations among them have no particular semantics: the former are simply text, images, etc., the latter are just navigational links. The problem of building Web sites with a complex, well-defined structure and links with specific meaning (as needed in software development processes) is generally tackled by employing database systems to manage the published data [3]. The Labyrinth project aims at providing clear structure and additional semantics to sets of Web resources (datawebs in the following) without resorting to a DBMS, using only WWW technologies (HTTP, HTML, Java, etc.).

2. Project Objectives

The Labyrinth approach provides Entity/Relationship structure and semantics to dataweb documents by supplementing them unobtrusively with additional WWW pages and hyperlinks that indicate the superimposed structure. In contrast with other similar approaches [1] [4], structural meta-information in Labyrinth is completely disjoint from the dataweb contents. This makes the Labyrinth approach relatively seamless and allows great flexibility (e.g., the schema is itself stored on the Web, and can thus be easily changed to meet applications' needs).

Each Web resource in a dataweb (e.g. a Web page, document, image, etc.) is considered as an instance of an Entity in an E/R schema and is described by an HTML page storing information that qualifies it (e.g., author, product name, and release date). The Entity concept is used to logically cluster Web resources and uniformly model their properties and contents. Relationships are sets of HTML pages: they provide sets of hyperlinks among Entity instances, which supplement the usual navigational hyperlinks found in HTML pages. The purpose of Relationships is to make explicit the meaningful associations between Entity data (e.g. dependency, composition, etc.).

The Labyrinth project has been inspired by two major requirements:

Internet-wide distribution: on the WWW, related information is typically widely dispersed over any number of sites; it must be possible to construct a structured dataweb organizing and enabling the retrieval of information that spans all over the Web.
Lightweight approach: our architecture must impose minimal overhead on Web sites participating in a dataweb, as well as on clients accessing the information. It must neither mandate installing, configuring, running and maintaining complex or platform-dependent additional software, nor require any modification to typical Web-based interaction. The goal is to achieve maximum scalability, generality, and simplicity of use.

In order to meet these requirements, we have decided to:

Employ standard WWW technologies: we intend to use only technologies and protocols that are native and/or widely disseminated on the Web, avoiding any proprietary solutions. In particular, we mean to avoid using any kind of data repository extraneous to the WWW, such as any DBMS to store and maintain data and their relationships.
Achieve minimal deployment: we are striving towards zero-deployment both on the client and on the server side. For clients, this means using only Web browsers. For servers, it means being able to dynamically install the computational and configuration components of the system as they are needed.

3. Architecture and implementation

Labyrinth is logically organized as follows:

Information concerning the schema is maintained in the home site of the Labyrinth dataweb (this is the only type of data that is centralized).
For each entity (relationship) defined in the schema, there is an HTML document (called directory). It contains the list of the URLs of the instances of the entity/relationship.
For each item, i.e., instance of an entity defined in the schema, there is one HTML document (called shadow document) that contains the item's meta-information as attribute values and hyperlinks to the actual document. The same holds for instances of relationships.
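The organization above can be illustrated with a minimal sketch. It is not taken from the Labyrinth prototype: the class name `ShadowDocument`, the function `directory_html`, the HTML layout and the `site1.example` URLs are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class ShadowDocument:
    """Meta-information page for one entity instance (hypothetical layout)."""
    entity: str        # name of the Entity this item instantiates
    url: str           # URL of the shadow document itself
    target_url: str    # URL of the actual Web resource being described
    attributes: dict = field(default_factory=dict)

    def to_html(self) -> str:
        # Attribute values are rendered as a definition list.
        rows = "\n".join(
            f"  <dt>{escape(name)}</dt><dd>{escape(str(value))}</dd>"
            for name, value in self.attributes.items()
        )
        return (
            f"<html><body>\n<h1>{escape(self.entity)}</h1>\n"
            f"<dl>\n{rows}\n</dl>\n"
            f'<a href="{escape(self.target_url)}">actual document</a>\n'
            "</body></html>"
        )


def directory_html(entity: str, shadows: list) -> str:
    """Directory page: the list of URLs of all instances of one entity."""
    items = "\n".join(
        f'  <li><a href="{escape(s.url)}">{escape(s.url)}</a></li>'
        for s in shadows
    )
    return (
        f"<html><body>\n<h1>Directory: {escape(entity)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n</body></html>"
    )


# Hypothetical instance of the "User document" entity.
doc_a = ShadowDocument(
    entity="User document",
    url="http://site1.example/shadows/user-doc-a.html",
    target_url="http://site1.example/docs/user-doc-a.html",
    attributes={"title": "User document A", "approved": "yes"},
)
print(doc_a.to_html())
print(directory_html("User document", [doc_a]))
```

In this sketch the actual document is never modified: all meta-information lives in the shadow page, which only links to it.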

Figure 1 illustrates a fragment of an example of a Labyrinth dataweb (derived from an example in [5]) aiming at representing product features, and relationships of features (among themselves and with requirements and product descriptions provided by user documents). Notice that this is just an example and not the structure of all the datawebs that are supported by Labyrinth. In fact Labyrinth could be used to represent other artifacts and relationships in the software development domain, as well as documents and relationships of completely different domains.

It is possible to observe that the schema is on the home site, while site 1 hosts the directories of the "Product description" and "User document" entities, and of the "documents" relationship. Also on site 1 are instances (i.e., shadow documents) of these items (note that "User document" has two instances). Instances of "User document" refer to actual documents stored on site 1, while the instance of "Product description" refers to a document located on site 2. The shadow document referring to "User document A" will contain values of attributes like the title and author of "User document A", whether it is compliant with any standard, whether it has been formally approved, etc. Instances of relationships are represented as HTML pages containing URLs that simply connect shadow documents.

We have produced an early prototype running on a network of standard WWW servers equipped with a set of Java servlets. Labyrinth clients are networked computers equipped with Java-enabled browsers. The current prototype provides a user-driven interaction mode, in which users access the dataweb and submit requests to Labyrinth from their browsers through standard Internet formats and protocols (HTML and HTTP). It also provides a system-driven interaction mode, in which the Labyrinth servers inform users of updates to the pages they are viewing, thus implementing a push-style behavior. This is achieved by means of the event-based, lightweight middleware Jedi [2], implemented in Java and working according to a publish/subscribe paradigm. The architecture of Labyrinth is shown schematically in Figure 2. We intend to further explore the use of asynchronous communication to implement a server-to-server event exchange protocol, which will allow cooperation and coordination services in Labyrinth just like dataflow support in traditional DBMSs.
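The push-style behavior can be illustrated with a generic, in-process publish/subscribe sketch. This is not the Jedi API, which is not described here: the class `EventDispatcher`, its methods and the URL used are hypothetical, and a real deployment would deliver events over the network rather than through local callbacks.

```python
from collections import defaultdict


class EventDispatcher:
    """Toy publish/subscribe dispatcher (an assumption, not Jedi's interface)."""

    def __init__(self) -> None:
        # Maps a page URL to the callbacks of the users currently viewing it.
        self._subscribers = defaultdict(list)

    def subscribe(self, page_url, callback) -> None:
        self._subscribers[page_url].append(callback)

    def publish(self, page_url, description) -> int:
        """Notify every subscriber of page_url; return how many were notified."""
        callbacks = self._subscribers.get(page_url, [])
        for callback in callbacks:
            callback(page_url, description)
        return len(callbacks)


# A viewer registers interest in a shadow document, then the server
# publishes an update event for that page.
received = []
dispatcher = EventDispatcher()
dispatcher.subscribe(
    "http://site1.example/shadows/user-doc-a.html",
    lambda url, what: received.append((url, what)),
)
dispatcher.publish("http://site1.example/shadows/user-doc-a.html", "attributes updated")
```

The publisher does not need to know who is viewing a page; it only names the page that changed, which is what keeps the two sides decoupled.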

Figure 1. Logical structure of Labyrinth.

4. Benefits provided by Labyrinth

The Labyrinth approach superimposes onto a dataweb a separate hypertextual lattice that carries explicit information (like type, cardinality, constraints, associations etc.) concerning the properties and structure of that dataweb. Thanks to the information provided by such infrastructure, several operations are supported:

Structural and semantic browsing are possible, as an alternative to the traditional navigational browsing of the Web. This means, for instance, that starting from a user requirements document page a developer can browse the documents that cooperate to satisfy the requirement.
Labyrinth guides the insertion, deletion and update of documents in a dataweb in conformance to the E/R schema imposed over that dataweb.
It is simpler and more effective to search the dataweb using traditional searching facilities (we use the AltaVista search engine). In Labyrinth the search is made on the infrastructure, which contains schema-compliant information. In this way we can easily find "source code files that have been modified by Mr. Brown on April 3rd 1998".
It is possible to exploit the schema and the infrastructure to augment searches and carry out queries similar to those supported by traditional databases. For instance, we can find "the set of all artifacts affected by a change to a given user requirements document" by following the relationships departing from it.
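The last kind of query can be sketched as a graph traversal over relationship instances. The function `affected_artifacts`, the triple representation of relationship instances and the sample names below are assumptions for illustration; in Labyrinth the links would be read from the relationship pages rather than from an in-memory list.

```python
from collections import deque


def affected_artifacts(start, relationships):
    """Return every item reachable from start by following relationships.

    relationships is an iterable of (relationship_name, source, target)
    triples, each standing for one relationship instance that links two
    shadow documents.
    """
    outgoing = {}
    for _name, source, target in relationships:
        outgoing.setdefault(source, []).append(target)

    affected = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in outgoing.get(current, []):
            # Skip items already visited, so cycles terminate.
            if neighbour != start and neighbour not in affected:
                affected.add(neighbour)
                queue.append(neighbour)
    return affected


# Hypothetical relationship instances.
links = [
    ("satisfies", "req1", "design1"),
    ("implements", "design1", "code1"),
    ("tests", "code1", "test1"),
    ("implements", "design2", "code2"),
]
print(sorted(affected_artifacts("req1", links)))
```

Restricting the traversal to certain relationship names (for example only dependency links) would be a natural refinement of this sketch.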

Figure 2. Overall architecture of Labyrinth.

Finally, a Labyrinth dataweb can incorporate and associate documents located anywhere on the Web, rather than simply mapping the content of a proprietary Web site or Intranet. A Labyrinth Entity can refer to any WWW resource, including pages not originally published by the dataweb creators. For instance, a page describing a C++ source file could contain a link to a page on object-oriented design patterns, if this is considered useful to understand the code.

5. Conclusions

We have presented the Labyrinth approach to structuring and assigning well-defined semantics to a collection of Web resources. It is based on a few key principles:

Keeping the actual data separate from the meta-information that enables the semantic interpretation and the definition of explicit meaningful relationships among the data;
Managing the information according to its assigned structure and semantics on an Internet-wide scale;
Providing lightweight computational facilities that scale well and minimize deviations from the architectural model and the interaction paradigm of the WWW.

Our current research work is centered on two main topics:

  1. Labyrinth is halfway between common databases and the Web. These two types of environments have very different features as far as consistency and concurrency issues are concerned. As a consequence, it is not obvious what features Labyrinth should offer concerning consistency and concurrency. We are exploring a controlled form of inconsistency. That is, updates to the dataweb may leave it in a state that is not consistent with the schema. However, when accessing a document, the user is informed if it is not consistent with the schema and may be guided towards the resolution of the inconsistency. We maintain that this is acceptable since it is coherent with both the kind of behavior currently exhibited by data published over the Internet and the kind of applications that we intend to support.
  2. Part of our future work will be dedicated to considering integration with emerging Web technologies and standards, such as XML, RDF, WebDAV and others. They do not replace Labyrinth, but can fruitfully be used to simplify and improve the implementation of Labyrinth components.
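The controlled form of inconsistency discussed in the first topic can be illustrated with a small sketch in which an item is always accepted and its violations are only reported when it is accessed. The schema representation (sets of `required` and `optional` attribute names) and the function `schema_violations` are assumptions for illustration, not the mechanism under study.

```python
def schema_violations(entity_schema, item_attributes):
    """Return readable violations; an empty list means the item conforms."""
    required = entity_schema.get("required", set())
    optional = entity_schema.get("optional", set())
    problems = []
    for name in sorted(required):
        if name not in item_attributes:
            problems.append(f"missing attribute: {name}")
    for name in sorted(item_attributes):
        if name not in required and name not in optional:
            problems.append(f"unknown attribute: {name}")
    return problems


# Hypothetical schema entry for the "User document" entity.
user_document = {"required": {"title", "author"}, "optional": {"approved"}}

# The item stays in the dataweb even though it lacks an author; the
# viewer is simply told what must be fixed.
item = {"title": "User document A", "reviewer": "unknown"}
for problem in schema_violations(user_document, item):
    print("warning:", problem)
```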

References

  1. Anderson, K.M. "Integrating Open Hypermedia Systems with the World Wide Web". In Proc. ACM Hypertext’97, Southampton, UK, April 1997.
  2. Cugola, G., Di Nitto, E. and Fuggetta, A. "Exploiting an event-based infrastructure to develop complex distributed systems". In Proc. 20th International Conference on Software Engineering, Kyoto, Japan, April 1998.
  3. Fraternali, P. and Paolini, P. "A Conceptual Model and a Tool Environment for Developing More Scalable, Dynamic, and Customizable Web Applications", EDBT '98 6th Intl. Conference on Extending Database Technology, Valencia, Spain, March 1998.
  4. Takahashi, K. "Metalevel Links: More Power to your Links". Communications of the ACM. 41(7): 103-105, July 1998.
  5. Turner, C.R., Fuggetta, A., Lavazza, L. and Wolf, A.L. "A Conceptual Basis for Feature Engineering". To appear in Journal of Systems and Software.