Data Models

Day-to-day analysis work is almost always related to information systems, whose primary purpose is to manipulate information, that is, data. A large part of what analysts do is therefore describing and analyzing data, which makes knowing the various data modeling techniques essential for them. Whether the focus of the work is business analysis or systems analysis, data is everywhere, though each time at a different level of abstraction. It may represent domain objects that describe real-world entities, or it may be data attributes in an information system. Both need to be documented, but they require different modeling approaches.

By level of abstraction, data models are divided into three categories, as defined by ANSI:

  1. Conceptual data model
    • Describes the semantics of a domain. For example, it may be a model of the area of interest of an organization or industry. It consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes.
  2. Logical data model
    • Describes the semantics as represented by a particular data manipulation technology. It consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.
  3. Physical data model
    • Describes the physical means by which data are stored. It is concerned with partitions, CPUs, tablespaces, and the like.

Conceptual Data Model

A conceptual data model, often referred to as a domain model, is usually composed of classes, which represent domain entities, and the relationships between them. The domain model is elaborated in more detail in this chapter.

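To make "entity classes and relationship assertions" concrete, the sketch below writes a tiny conceptual model down as plain data in Python. It is only a notation for the model, not an implementation: the `EntityClass` and `Relationship` types, the descriptions, and the `undefined_references` helper are hypothetical, chosen for illustration and reusing the Company and Person entities from the Good Practices list.

```python
from dataclasses import dataclass

# Hypothetical notation for a conceptual model: no tables, columns or data
# types, only entity classes and relationship assertions between pairs of them.


@dataclass(frozen=True)
class EntityClass:
    name: str
    description: str


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    cardinality: str  # e.g. "1:1", "1:n", "m:n"
    description: str


entities = {
    e.name: e
    for e in (
        EntityClass("Company", "An organization that can employ people."),
        EntityClass("Person", "A human being who can work for organizations."),
    )
}

relationships = [
    Relationship("Company", "Person", "m:n", "Represents people working for a company."),
]


def undefined_references(entities, relationships):
    """Return relationship endpoints that do not name a known entity class."""
    return [
        name
        for rel in relationships
        for name in (rel.source, rel.target)
        if name not in entities
    ]


print(undefined_references(entities, relationships))  # []
```

Because the model carries only names, descriptions and cardinalities, it says nothing about how Company or Person will eventually be stored, which is exactly what keeps it at the conceptual level.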

Logical Data Model

Unlike the conceptual data model, which is completely independent of any data manipulation technology, the logical data model reflects whether an object-oriented, relational, or other technology is used. Depending on the selected approach, the model is captured either as a class diagram (for an object-oriented implementation) or as an entity-relationship model (ER diagram, for a relational implementation), so that the aspects specific to the underlying technology are documented as well.

Class Diagram

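In an object-oriented logical model, an m:n relationship such as Company-->Person typically becomes a pair of collection-valued references. The Python sketch below is a hypothetical textual stand-in for such a class diagram; the class names follow the Good Practices examples, while the `employees`/`employers` attributes and the `hire` method are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# eq=False keeps identity-based equality, so the mutual references below
# cannot trigger recursive field-by-field comparisons.
@dataclass(eq=False)
class Person:
    name: str
    # Back reference to the employing companies; excluded from repr.
    employers: list[Company] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Company:
    name: str
    employees: list[Person] = field(default_factory=list, repr=False)

    def hire(self, person: Person) -> None:
        """Record the m:n association on both sides, ignoring duplicates."""
        if person not in self.employees:
            self.employees.append(person)
            person.employers.append(self)


ibm = Company("IBM")
john = Person("John Smith")
ibm.hire(john)
ibm.hire(john)  # second call is a no-op

print([p.name for p in ibm.employees])   # ['John Smith']
print([c.name for c in john.employers])  # ['IBM']
```

Keeping both directions in sync inside one method is just one possible design; the class diagram itself would show only the association and its multiplicities, not this bookkeeping.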

Entity-Relationship Diagram

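In a relational logical model, the same m:n relationship cannot be stored in either entity's table directly and is usually resolved with a junction (associative) table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`Companies`, `People`, `Employments`, `company_id`, `person_id`) are hypothetical, with table names in the plural as recommended in the Good Practices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE People (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Junction table resolving the m:n Company-->Person relationship
CREATE TABLE Employments (
    company_id INTEGER NOT NULL REFERENCES Companies(company_id),
    person_id  INTEGER NOT NULL REFERENCES People(person_id),
    PRIMARY KEY (company_id, person_id)
);
""")

conn.execute("INSERT INTO Companies (company_id, name) VALUES (1, 'IBM')")
conn.execute("INSERT INTO People (person_id, name) VALUES (1, 'John Smith')")
conn.execute("INSERT INTO Employments (company_id, person_id) VALUES (1, 1)")

# Navigate the relationship through the junction table.
rows = conn.execute("""
SELECT p.name, c.name
FROM Employments e
JOIN People p ON p.person_id = e.person_id
JOIN Companies c ON c.company_id = e.company_id
""").fetchall()
print(rows)  # [('John Smith', 'IBM')]
```

The composite primary key prevents recording the same employment twice, and the foreign keys reject employments that point at a non-existent company or person.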

Note: The ER model is sometimes wrongly called a physical data model. The physical model does not describe tables in a concrete database; it describes the physical storage: the real machines and CPUs and their locations.

Good Practices

Data modeling is a broad discipline, and a list of all good practices and antipatterns alone would fill a whole book. For this reason, consider the following list only a short summary of what we see analysts doing wrong over and over again, or not doing at all:

  1. Each entity must have a self-explanatory name: a noun such as Car, Order, Account, or Contact.
    • Tables should be named in the plural: Cars, Orders.
  2. Each entity must have a short, unambiguous description explaining its meaning from the business perspective. To make the description clear, it can also include examples of what is and what is not considered an instance of the given entity.
  3. Do not include business rules, state transitions, etc. in entity descriptions. Reference them instead to avoid duplication.
    • For example, "When the Order is not paid within 48 hours after it was created, it is canceled." is a business rule that should be referenced from the Order description rather than copied into it.
  4. The same rules apply to entity attributes.
    • For example, "Contact:Mobile - Represents a contact's mobile phone number. Only one mobile phone number is supported per Contact, so it can be home, work, or any other type."
  5. The same rules apply to relationships between entities.
    • For example, "Company-->Person [m:n] - Represents people working for a company. An organization can employ many people (including 0), and a single person can work for multiple organizations simultaneously (or for none)."
  6. Include examples.
    • Entity Car - "VW Passat 2.0 TDI"
    • Relationship Company-->Person - "John Smith working for IBM"
  7. Follow the rule of 7 ± 2 elements per diagram, and split the diagram into several smaller diagrams if the number of entities exceeds nine.
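Some of the rules above can be checked mechanically when the entity catalog is kept in a structured form. The following is a hypothetical reviewer, not a tool referenced by this text: the `Entity` type, the `review_diagram` function and the sample description are assumptions, and it covers only the easily automated rules (a name, a description and examples for each entity, and at most nine entities per diagram).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Upper bound of the 7 +/- 2 rule for a single diagram.
MAX_ENTITIES_PER_DIAGRAM = 9


@dataclass
class Entity:
    name: str
    description: str = ""
    examples: list[str] = field(default_factory=list)


def review_diagram(entities: list[Entity]) -> list[str]:
    """Return findings for one diagram; an empty list means no issues found."""
    findings = []
    if len(entities) > MAX_ENTITIES_PER_DIAGRAM:
        findings.append(
            f"{len(entities)} entities exceed {MAX_ENTITIES_PER_DIAGRAM}; split the diagram"
        )
    for entity in entities:
        if not entity.name.strip():
            findings.append("entity without a name")
            continue
        if not entity.description.strip():
            findings.append(f"{entity.name}: missing business description")
        if not entity.examples:
            findings.append(f"{entity.name}: no examples")
    return findings


diagram = [
    # Hypothetical description, for illustration only.
    Entity("Car", "A vehicle managed by the organization.", ["VW Passat 2.0 TDI"]),
    Entity("Order"),
]
for finding in review_diagram(diagram):
    print(finding)
# Order: missing business description
# Order: no examples
```

Whether a name is truly self-explanatory, or whether a description duplicates business rules, still needs a human reviewer; a check like this only catches the omissions.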