Conceptual Modeling Using ER Diagrams
Basic Concepts
ER Modeling is a powerful tool to analyze data need, which is widely used in conceptual design to map real world entities and business rules into high level groupings, in terms of Entity (set) and Relationship (set), for a database design.
Entity (or Entity Set)
An Entity (set) usually represents people in a certain role (such as Student or Customer), a type of concrete objects (such Product), an organization of people or things (such as Department or Category), and more abstract concepts (such as Course or Account). Each instance of the entity set is described with the same set of attributes, whose value is defined on a domain based on its real world meaning.
A rectangle is used to represent an entity, with a number of ovals each of which represents one of its attributes. An attribute that can uniquely identify an instance of the entity is selected as the key attribute, which is labeled by an underline.
There are three roles that an attribute can play:
-
Identifies: primary key
-
There might be multiple candidates that are qualified for the PK role
-
Describes: informative attributes
-
Relates: foreign key(s)
Relationship (or Relationship Set)
A Relationship (set) represents a business rule that defines the association between entity sets (such as a Course may be offered through a number of Course Offerings or Sections, a Section can only be use to offer one Course).
A relationship (set) is represented with a diamond. Most relationships are binary, that means two entity sets are associated with each other through such a relationship. A binary relationship can be represented as follows.
A relationship set may also have descriptive attributes of its own. A grade attribute may be added to the Enrolls-in relationship set to keep track of students’ grades for the section of courses they’ve taken.
To fully define a relationship set, one should specify the following aspects:
-
Cardinality: How many instances from each entity set may be associated through a given relationship? (Such as 1:1, 1:M, or M:M)
-
Optionality or participation: Can an instance in an entity set that is involved in this relationship exist without being associated with any instance in the related entity set? Or is it mandatory for every instance in an entity set to participate in (at least) one instance of a relationship?
-
Consider the relationship between course offering (or Section) and Student:
-
Is it allowed to keep a student’s record if he/she is not enrolled in any section?
-
On the other hand, is it possible to keep a section record that has no students enroll in it?
-
Role: What roles does each entity set play in a relationship between them? It is especially important to specify different roles in a reflexive relationship. (See ER models for Coach and Game below)
Relationship or Entity?
A relationship represents an association between two entities: it may also be considered as an entity since an entity can be abstract. Concepts like enrollment and transaction can either be considered as entities or relationships. It won’t affect the final result when an ER model is reduced into a relational model. In this sense, a relationship set may have its own attributes which are normally descriptive. Such as a Letter Grade may be given to a Student who enrolls in a certain class Section.
Modeling Entity Sets
The ER modeling process usually starts with identifying entity sets. Don’t worry too much about you may miss some in the first pass. The process is typically iterative: identifying entity sets, identifying relationships between them, diagramming and documenting.
Let’s consider the NBA application, unless specified otherwise.
How to Determine What Attributes to Include: Rules of Thumb
The grouping of attributes (essentially data items that need to be persisted) into an entity (set) is based on data needs from the real world application. Let’s first look at the Player records and apply the three rules we introduced earlier.
-
Include all data items that are needed:
-
First of all, data in all columns are needed
-
Personal information such as SSN or contact information may be needed as well
-
How to keep track of who is on the injured list; and, who is the Captain?
-
Are all the column presenting data in the smallest parts?
-
The Name entry is not atomic
-
use two attributes: First Name and Last Name
-
The Position field is not atomic either. But it appears to be a little different: not every player can play more than one position. It is a multi-valued attribute, not a composite attribute.
-
How about the College field?
-
Is there any calculated field (or derived attribute)?
-
Experience: use debut season/date
Domain (or Data Type and Value Range)
Choose an appropriate data type for each field.
(To be expanded)
Select a Primary Key for Each Entity
Consider the Player entity set. First look at the candidates:
-
Names could be unique for all NBA players we know, but it is quite possible that there will be name conflictions in the future.
-
Number is unique within a team. It’s possible to use both number and the identifier of the team to identify a player, but what if a player got transferred?
-
SSN may be selected if it is included, but it’s also possible that may be changed
It’s a common practice to add an ID attribute, when reducing the ER model into its relational counterpart, into each table (or relation) to work as the primary key. To comply with this convention, you may add an ID attribute to each entity set you identified, although it is not there in the business domain.
One-to-Many Relationships
It’s rather straightforward to model the relationships between Team and Player entities, and between Team and Coach, based on the knowledge we have about how a NBA team is formed. There is a one-to-many relationship between Player (or Coach) and Team: one player (or coach) may belong to only one team, but a team may have multiple players (or coaches).
But the following discussion about a membership relationship in general is quite interesting:
-
Let’s start with the one-to-many relationship we’ve already discussed: ended up with two relations (or tables)
-
What if a player may join more than one team at a time? End up with three relations
-
What if a team can contain only one player and a player can be on one team at any given time?
-
A team may exist without a player: two relations
-
A team may not exist without a player: one relation
A relationship represents an association between two entities: it may also be considered as an entity since an entity can be abstract. You may treat Enrollment as an entity in its own light or consider it as a relationship between a Student entity and a Course Section entity. Either way, there will be three relations in you DB design. Similarly, you may treat a Game either as a relationship between two teams, or an entity by itself. More important things to consider include:
-
How many Team entities do we need: one for Host teams and one for Guest teams, or just one?
-
How can we handle
-
Two relationships between two entity sets (Game and Team)
-
A relationship (Game) between instances from the same entity set (Team)
Let’s first try to treat Game as an entity.
You may start with identifying Game as an entity set, not a relationship set. It’s perfectly legal since an entity set may represent something abstract, such as an event or transaction. The ER model you come up with may look like this:
How many Team entity sets do we need? Do we need two: one for host and another for guest teams? Then it seems like there are exactly the same contents in both of them: a team (such as Atlanta Hawks) may involve in one game as a host and as a guest in some others.
If we need just one Team entity, the rectified model would be:
So we will have two relationships between the two entity sets, Game and Team.
Many-to-Many Relationship
Now let’s treat Game as a relationship between two instances of the Team entity set. It’s easy to see it turns out to be a many-to-many relationship, and is reflexive. It’s interesting to compare the resultant relational models through two different approaches. Will we arrive at the same design?
A first attempt may lead to a naïve ER model to the left. It’s easy to see that we will not need two team entity sets. The correct ER model will look as shown below.
It turns out to be a reflexive relationship with a many-to-many mapping. Each of the two teams that associate with one another through an instance of this relationship set (i.e. a game) plays a different role—host or guest—as specified with the role names at the two ends.
Don’t worry about which one of the two different ER models we should use. Both will reduce to the same relational model as we will see shortly.
Reflexive Relationship
Can we conclude from the Game relationship that a separate relation is needed to keep track of a reflexive relationship? Look at the Coach example.
Coach
Let’s assume that an assistant coach or trainer needs to report to either an(other) assistant coach or the head coach. We can identify a reflexive relationship ReportTo that works on the Coach entity.
As represented in the ER model on the left, an instance of the Coach entity set in the role of assistant is associated with another instance in the supervisor through the ReportsTo relationship. This relationship is many-to-one (from assistant to supervisor). There is no need to use a relation to keep track of this relationship. Adding a supervisor attribute to the Coach relation would work: a record for an assistant will have the value of his supervisor’s CoachId in that field, wheraeas the head coach will simply have a null value.
Although Position appears in the Player roster report as a single field, it needs to be modeled as a multi-valued attribute. It takes the following entity sets to model:
-
A PositionType entity that defines the (official) position types used in the NBA games. This entity may include position names and the corresponding descriptions.
-
A PlayerPosition entity that specifies positions that a player can play. This entity may be referred to as weak since none of its instances may exist without a corresponding instance in the Player entity. Each PlayerPosition instance is identified by the associated Player’s ID and a sequence number such as 1 and 2.
Weak Entity
A weak entity is an entity that cannot exist without a hosting entity, such as a Course Section (depending on a Course) and Order Item (depending on an Order). It is similar to a multi-valued attribute, such as Player Position. A weak entity set doesn’t have enough attributes to form a primary key, nor does it need to. What it has is a partial key, such as section number, order item sequence number. It can borrow the primary key from the hosting (strong) entity. This PK, plus the partial key that exists locally in the weak entity can form an identifier for its instances.
An example of a weak entity set appears in modeling the Transaction needs. From the attached sample report for transactions for the week April 8 to 14, 2004 we can see that a transaction may have a number of items in it. For instance, the second transaction in the report listed (1) activated a player, and (2) put another player on the injured list. The number of items that are included in one transaction varies from one case to another.
Dallas activates Tony Delk and places Josh Howard (sore right hamstring) on the injured list.
To normalize the DB design, i.e. making sure every record in a table has exactly the same number of fields, we need to use a TransactionItem entity to give the detailed information. This entity set is related to the Transaction as demonstrated below:
The box for the TransactionItem entity is double-lined to denote that it is a weak entity, i.e. an entity that does not stand alone. It depends on another entity set, referred to as an identifying entity, to provide key attributes. The relationship, also double-lined is referred to as the identifying relationship.
The TransactionItem entity, as a weak entity, needs to borrow the PK from the identifying entity, Transaction, to form a key: TranxId plus ItemNumber. It needs three descriptive attributes to keep track of the type of transaction (using TranxTypeId), the Player involved (using PlayrtId), and a comment (needed in the case of place on the injured list to explain the reason).
The using of TranxTypeId instead of the type name itself implies that an auxiliary entity, TransactionType is needed.
Auxiliary Entities: Use a table or just an attribute?
An auxiliary entity may be used to formalize the values that are accepted to certain field, such as C, F, and G for PositionType, or activate, sign, place on injured list, etc for TransactionType. Others that may be used in the NBA sample include:
-
Title (for coaches)
-
Conference/division names
-
(College)
Share with your friends: |