Avoid Duplication

To understand an aspect of the domain or solution, it is very often necessary to describe it from different perspectives, which requires creating Multiple views. This usually has a side effect: duplicated information. For example, when the flow of use case steps is controlled by a business rule, the same rule typically appears somewhere in the system specification as well. Both documents tell the same story, only at different levels of abstraction, so they rely on the same business rule. Copying the rule into both specifications is a typical instance of the WET antipattern: "Write Everything Twice", "We Enjoy Typing" or "Waste Everyone's Time". The name hits the nail on the head, because WET is one of the biggest sources of frustration. Instead, the information should be kept in one place only and referenced from wherever it is needed. This is a generally applicable principle which has been known to the software industry for decades and can be found under various names, such as Single source of truth, Don't repeat yourself (DRY) or Once and only once.

Single source of truth (SSOT) is the practice of structuring information models and associated schemata such that every data element is stored exactly once. Any possible linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only. Because all other locations of the data only refer back to the primary "source of truth" location, updates to the data element in the primary location propagate to the entire system without the possibility of a duplicate value somewhere being forgotten.
(source: Wikipedia)

The example above demonstrates duplication within a set of vertical views, where each view describes the same problem, only at a different level of detail. But duplication may also exist among horizontal views, where the information is at the same level of detail but described using different techniques. In both cases, the best way to prevent duplication is to reuse information: capture it in a single repository and reference it from each view. This topic is elaborated in more detail in Part III.

Duplication at Different Levels of Detail

The issue with describing something at various levels of detail is that the same information is stated multiple times, each time "using different words". It is then very hard to avoid duplicating "the idea". For example, the behavior of a button could be described from the user perspective or from the system perspective:

  • Level 1: "It refreshes the list of orders and displays the first 20 items."
  • Level 2: "It calls API endpoint /api/orders, refreshes the tbl-orders-widget and selects the first page."

Neither description duplicates the other literally, yet when the behavior ("refreshing the orders and jumping to the first page") changes, it must be changed in both views.

A similar example is shown in the following picture. Both sequence diagrams describe what happens when the product is changed. The diagram on the left is a high-level overview for business people, while the one on the right includes technical details and is suitable for IT people. They both describe the same thing, yet they are not the same.

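[Figure: two sequence diagrams of the product change, a high-level overview for business people (left) and a detailed technical version for IT people (right)]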

Duplication at the Same Level of Detail

Unlike the previous case, duplication of information described at the same level of detail is more explicit and can be avoided more easily. It typically occurs when the same thing is described from multiple perspectives (for example, using different diagrams), or when descriptions of different aspects share the same components. In the following picture, the purpose of the model on the left is to outline which systems are related and how, while the model on the right describes a concrete use case.

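[Figure: a model outlining which systems are related and how (left) and a model describing a concrete use case (right)]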

As you can see, without taking any measures, we would end up with two models that share the same components. If any of these components changes, both diagrams need to be updated.

Minimizing Duplications

Duplication of information can never be avoided completely; we can only do our best to minimize it. The following list includes practices that help analysts avoid the most common sources of duplication:

  1. Don't copy information, reference it
  2. All artifacts are captured in the repository just once and have a unique identifier
  3. All artifacts have a single responsibility
  4. No artifact includes specification of some other artifact
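
The practices above can be illustrated with a minimal sketch. It is not a prescribed implementation: the class names (Artifact, Repository, View), the identifier BR-001 and the rule text are all hypothetical and chosen only for illustration. The repository stores each artifact once under a unique identifier, and the views hold references instead of copies.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One atomic artifact, e.g. a business rule, a term or a system (hypothetical model)."""
    id: str
    kind: str
    text: str


class Repository:
    """Stores every artifact exactly once under a unique identifier."""

    def __init__(self):
        self._artifacts = {}

    def add(self, artifact):
        # Refuse a second artifact with the same identifier instead of silently copying it.
        if artifact.id in self._artifacts:
            raise ValueError(f"duplicate artifact id: {artifact.id}")
        self._artifacts[artifact.id] = artifact

    def get(self, artifact_id):
        return self._artifacts[artifact_id]


@dataclass
class View:
    """A view (use case, system specification, ...) that holds references only."""
    name: str
    refs: list = field(default_factory=list)

    def render(self, repo):
        # Resolve each reference at rendering time, so the text is never copied into the view.
        lines = [self.name]
        for ref in self.refs:
            artifact = repo.get(ref)
            lines.append(f"  [{artifact.id}] {artifact.text}")
        return "\n".join(lines)


repo = Repository()
repo.add(Artifact("BR-001", "business rule", "Orders above 1000 EUR require approval."))

use_case = View("Use case: Place order", refs=["BR-001"])
system_spec = View("System specification: Order service", refs=["BR-001"])

# Changing the rule in its single location updates every view that references it.
repo.get("BR-001").text = "Orders above 2000 EUR require approval."
print(use_case.render(repo))
print(system_spec.render(repo))
```

Both views print the updated rule, although only one place was edited. A real repository would typically be a modeling tool or a database rather than an in-memory dictionary; the sketch only shows the referencing idea.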

Why should the artifacts not be duplicated and only referenced?

Let's take a list of systems in the organization as an example. Large enterprises operate dozens of systems and applications and nobody knows them all. What's more, they frequently change, which often causes people to use wrong names or to struggle to understand the goal of each application. Having all systems stored in one place is beneficial since it is then possible to uniquely reference them, so it is always clear what system is meant and each change is automatically propagated to all occurrences.

Could anything be done not to duplicate terms?

From the point of view of Effective Analysis, a term is just one type of artifact, so it should be atomic and referenceable by an identifier like all the other types. It is then easy to create a "local" project glossary by referencing terms from the "global" glossary, without describing the terms over and over again. This not only saves time but also helps to avoid duplication. The same applies to business rules.
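
As a rough sketch of this idea, a project glossary can be derived from the global one by listing term identifiers only. The identifiers (TERM-001 and so on), the definitions and the function name project_glossary are assumptions made up for this example.

```python
# Global glossary: each term is defined exactly once under a unique identifier.
# Identifiers and definitions are hypothetical examples.
GLOBAL_GLOSSARY = {
    "TERM-001": ("Order", "A customer's request to purchase one or more products."),
    "TERM-002": ("Product", "An item offered for sale."),
    "TERM-003": ("Invoice", "A request for payment issued for an order."),
}


def project_glossary(term_ids, global_glossary=GLOBAL_GLOSSARY):
    """Build a local glossary by resolving references to the global glossary."""
    missing = [t for t in term_ids if t not in global_glossary]
    if missing:
        raise KeyError(f"unknown term ids: {missing}")
    # Map term name to definition; nothing is retyped, only looked up.
    return {global_glossary[t][0]: global_glossary[t][1] for t in term_ids}


# The project stores only the identifiers of the terms it uses.
ORDER_PROJECT_TERMS = ["TERM-001", "TERM-002"]

for name, definition in project_glossary(ORDER_PROJECT_TERMS).items():
    print(f"{name}: {definition}")
```

Because the project keeps identifiers rather than definitions, a correction made in the global glossary shows up in every project glossary the next time it is generated.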