The perform_join function uses an inner join unless the user specifies the join type.
Arguments
- erd_object
An object of class "ERD", which encapsulates the data frames and the relationships between them. This object is generated using the
create_erd
function.- tables_to_join
A character vector listing the names of tables to join. The first table in this list serves as the main table to which subsequent tables are joined. The tables are joined in the order specified and utilize the relationships defined with the first table.
- specified_joins
An optional named list where each element's name corresponds to a table in
tables_to_join
and the value specifies the type of join to perform with that table. The default value isNULL
, which activates automated mode (which uses inner joins).
Value
A data frame resulting from the join operations conducted between the specified tables, consistent with the relationships indicated in the ERD object. Additionally, the types of joins used are printed to the console.
Details
This orchestrates the joining of multiple tables based on a specified
Entity-Relationship Diagram (ERD) object. This function extracts the
relationships and join criteria defined within the ERD object and executes
the appropriate join operations using R's dplyr
package.
The function can operate in two modes: automated and user-specified joins. In automated mode, join types are determined by the relationship symbols in the ERD object. In user-specified mode, the types of joins are explicitly provided by the user.
Implementation Details:
- Join Variables: For each pair of tables to be joined, the function extracts the relevant join variables from the ERD object.
- Join Type: Depending on the relationship symbol associated with each pair
of tables, the function decides whether to perform an inner join or a left
join. This decision is implemented by dynamically invoking the corresponding
dplyr function (inner_join
or
left_join
).
- Aggregation: The function uses do.call
to dynamically execute
the appropriate join operation, accumulating the result in the
main_table
variable, which is ultimately returned.
Notes:
- The function iteratively applies the join operations, using the first
table in tables_to_join
as the main table.
- The join operations are performed in the order specified in
tables_to_join
.
- When specified_joins
is NULL
, the function operates in
automated mode, determining the type of join based on relationship symbols.
- The names in specified_joins
should match the table names in
tables_to_join
for user-specified mode to function correctly.
Examples
# Load Packages -----------------------------------------------------------
library(ERDbuilder)
library(dplyr)
# Define entities ---------------------------------------------------------
students_tbl <- data.frame(
st_id = c("hu1", "de2", "lo3"),
dep_id = c("water", "evil", "values"),
student = c("Huey", "Dewey", "Louie"),
email = c("hubert.duck", "dewfort.duck", "llewellyn.duck"),
dob = c("04-15", "04-15", "04-15")
)
courses_tbl <- data.frame(
crs_id = c("water101", "evil205", "water202"),
fac_id = c("02do", "03pe", "04mi"),
dep_id = c("water", "evil", "water"),
course = c("Swimming", "Human-chasing", "Dives")
)
enrollment_tbl <- data.frame(
crs_id = c("water101", "evil205", "evil205", "water202"),
st_id = c("hu1", "hu1", "de2", "de2"),
final_grade = c("B", "A", "A", "F")
)
department_tbl <- data.frame(
dep_id = c("water", "evil", "values"),
department = c("Water activities", "Evil procurement", "Good values")
)
faculty_tbl <- data.frame(
faculty_name = c("Scrooge McDuck", "Donald", "Pete", "Mickey"),
title = c("Emeritus", "Full", "Assistant", "Full"),
fac_id = c("01sc", "02do", "03pe", "04mi"),
dep_id = c("water", "water", "evil", "values")
)
head(students_tbl)
#> st_id dep_id student email dob
#> 1 hu1 water Huey hubert.duck 04-15
#> 2 de2 evil Dewey dewfort.duck 04-15
#> 3 lo3 values Louie llewellyn.duck 04-15
head(courses_tbl)
#> crs_id fac_id dep_id course
#> 1 water101 02do water Swimming
#> 2 evil205 03pe evil Human-chasing
#> 3 water202 04mi water Dives
head(enrollment_tbl)
#> crs_id st_id final_grade
#> 1 water101 hu1 B
#> 2 evil205 hu1 A
#> 3 evil205 de2 A
#> 4 water202 de2 F
head(department_tbl)
#> dep_id department
#> 1 water Water activities
#> 2 evil Evil procurement
#> 3 values Good values
head(faculty_tbl)
#> faculty_name title fac_id dep_id
#> 1 Scrooge McDuck Emeritus 01sc water
#> 2 Donald Full 02do water
#> 3 Pete Assistant 03pe evil
#> 4 Mickey Full 04mi values
## Define relationships----------------------------------------
relationships <- list(
courses = list(
enrollment = list(crs_id = "crs_id", relationship = c("||", "|<")),
department = list(dep_id = "dep_id", relationship = c(">|", "||")),
faculty = list(fac_id = "fac_id", relationship = c(">0", "||"))
),
enrollment = list(
students = list(st_id = "st_id", relationship = c(">0", "||")
)
),
students = list(
department = list(dep_id = "dep_id", relationship = c(">|", "||"))
),
faculty = list(
department = list(dep_id = "dep_id", relationship = c(">|", "||"))
)
)
## Create ERD object
erd_object <-
create_erd(
list(
students = students_tbl,
courses = courses_tbl,
enrollment = enrollment_tbl,
department = department_tbl,
faculty = faculty_tbl
),
relationships)
## Render ERD -----------------------------------------------------------
render_erd(erd_object, label_distance = 0, label_angle = 15, n = 20)
# Perform joins -----------------------------------------------------------
perform_join(erd_object, c("courses", "enrollment", "department"))
#> Performing join: Using inner_join for table enrollment
#> Performing join: Using inner_join for table department
#> crs_id fac_id dep_id course st_id final_grade department
#> 1 water101 02do water Swimming hu1 B Water activities
#> 2 evil205 03pe evil Human-chasing hu1 A Evil procurement
#> 3 evil205 03pe evil Human-chasing de2 A Evil procurement
#> 4 water202 04mi water Dives de2 F Water activities