pacman::p_load(visNetwork, lubridate, ggraph, knitr, kableExtra,
               tidyverse, tidygraph, dplyr, jsonlite, ggplot2)

This exercise attempts Question 1 of Mini-Challenge 2 from the VAST Challenge 2023. It focuses on using visual analytics to identify temporal patterns in the FishEye knowledge graph.
The R packages needed for analysis and visualization are installed (if missing) and loaded with pacman::p_load().
The JSON file “mc2_challenge_graph.json” is imported into a list object named main.
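The import step could look like the sketch below. It assumes jsonlite::fromJSON() is the reader (jsonlite is among the loaded packages). To stay runnable on its own, it writes a tiny stand-in file to a temporary path instead of reading the real mc2_challenge_graph.json; the toy field names and values are assumptions, not the actual data.

```r
library(jsonlite)

# Stand-in for mc2_challenge_graph.json: a tiny node-link graph (hypothetical values).
toy_json <- '{
  "nodes": [{"id": "A"}, {"id": "B"}],
  "links": [{"source": "A", "target": "B",
             "hscode": "301110", "arrivaldate": "2032-01-05"}]
}'
json_path <- tempfile(fileext = ".json")
writeLines(toy_json, json_path)

# fromJSON() returns a list; arrays of records are simplified to data frames.
main <- fromJSON(json_path)
names(main)
```

With the real file, only the path passed to fromJSON() would change.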
The nodes data table is extracted from the main list object and saved as a tibble data frame called main_nodes.
The edges data table is extracted from the main list object and saved as a tibble data frame called main_edges.
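A minimal sketch of the two extraction steps follows. The list element names (nodes, links) and the column names (id, source, target, hscode, arrivaldate) are assumptions made for illustration, and the toy list stands in for the imported object.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Toy stand-in for the imported list (hypothetical field names and values).
main <- list(
  nodes = data.frame(id = c("A", "B", "C")),
  links = data.frame(
    source      = c("A", "C"),
    target      = c("B", "B"),
    hscode      = c("301110", "850110"),
    arrivaldate = c("2032-01-05", "2033-06-01")
  )
)

# Nodes: keep only the identifier column.
main_nodes <- as_tibble(main$nodes) %>%
  select(id)

# Edges: parse the date string and derive the Year used for later filtering.
main_edges <- as_tibble(main$links) %>%
  mutate(ArrivalDate = ymd(arrivaldate),
         Year = year(ArrivalDate)) %>%
  select(source, target, ArrivalDate, Year, hscode)
```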
From the main_edges table, only rows whose hscode starts with 301 to 309 (the codes that cover fish-related products) and whose Year falls between 2032 and 2034 (the three years with the highest weights) are selected.
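The filter can be sketched as below. It assumes hscode is stored as a character string and that the weights column counts records per source, target, hscode and Year; the aggregation rule is an assumption, since the original chunk is not shown.

```r
library(dplyr)
library(stringr)

# Toy edges (hypothetical values).
main_edges <- tibble(
  source = c("A", "A", "C", "D", "A"),
  target = c("B", "B", "B", "B", "B"),
  hscode = c("301110", "301110", "850110", "306170", "304620"),
  Year   = c(2032, 2032, 2033, 2034, 2031)
)

main_edges_aggregated <- main_edges %>%
  # "^30[1-9]" keeps codes whose first three digits are 301 through 309.
  filter(str_detect(hscode, "^30[1-9]"),
         Year %in% 2032:2034) %>%
  group_by(source, target, hscode, Year) %>%
  # Assumed weighting: number of records per group.
  summarise(weights = n(), .groups = "drop")
```

In this toy data the 850110 row fails the code test and the 2031 row fails the year test, so two aggregated rows remain.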
New nodes data is prepared from the source and target fields of main_edges_aggregated, so that every entity appearing on either end of a link is included exactly once.
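One way to build such a node list is sketched here. The label column is an assumption, added because the later summary code joins nodes_df by id and selects label.

```r
library(dplyr)

# Toy aggregated edges (hypothetical values).
main_edges_aggregated <- tibble(
  source  = c("A", "D", "A"),
  target  = c("B", "B", "D"),
  weights = c(2, 1, 3)
)

# union() keeps each entity once, whether it appears as source, target, or both.
nodes_df <- tibble(
  id = union(main_edges_aggregated$source, main_edges_aggregated$target)
) %>%
  mutate(label = id)
```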
An interactive graph is built for the fishing links in 2032. It shows that, out of all the entities, a few act as hubs for most of the relationships.
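An interactive graph of this kind could be produced with visNetwork roughly as follows. The column names from, to, weights and Year mirror those used in the summary code; the toy values, the edge-width mapping and the layout seed are assumptions.

```r
library(dplyr)
library(visNetwork)

# Toy data (hypothetical values).
edges_df <- tibble(
  from    = c("A", "D", "A", "C"),
  to      = c("B", "B", "D", "B"),
  weights = c(2, 1, 3, 1),
  Year    = c(2032, 2032, 2032, 2033)
)
nodes_df <- tibble(id = c("A", "B", "C", "D"), label = id)

# Keep one year; visNetwork scales edge width by a column named "value".
edges_2032 <- edges_df %>%
  filter(Year == 2032) %>%
  mutate(value = weights)
nodes_2032 <- nodes_df %>%
  filter(id %in% c(edges_2032$from, edges_2032$to))

graph_2032 <- visNetwork(nodes_2032, edges_2032) %>%
  visLayout(randomSeed = 123) %>%   # reproducible layout
  visEdges(arrows = "to") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
graph_2032
```

highlightNearest dims everything except a clicked node and its neighbours, which helps when looking for hubs by eye.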
After examining the network above, the five entities that appear busiest are selected and stored in a new data frame.
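The selection here is made by inspecting the graph. As a numerical cross-check, not the method used in this exercise, entities could also be ranked by the total weight they receive. The sketch below assumes toy data and keeps the top two rather than five.

```r
library(dplyr)

# Toy 2032 edges (hypothetical values).
edges_2032 <- tibble(
  from    = c("A", "D", "A", "C", "E"),
  to      = c("B", "B", "D", "B", "A"),
  weights = c(2, 1, 3, 2, 1)
)

# Rank recipients by total incoming weight and keep the busiest ones.
busiest <- edges_2032 %>%
  group_by(to) %>%
  summarise(total_weight = sum(weights), .groups = "drop") %>%
  slice_max(total_weight, n = 2, with_ties = FALSE)

# Store the links received by the selected entities in a new data frame.
top_ids_2032 <- edges_2032 %>%
  filter(to %in% busiest$to)
```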
A new network graph is built to highlight the relationships among the selected entities. All five appear to have interacted with one another at least once.
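A sketch of how the selected entities might be highlighted, and how direct links among them can be listed, is given below. The group names, colours and toy values are assumptions.

```r
library(dplyr)
library(visNetwork)

# Toy selection and links (hypothetical values).
selected <- c("B", "D")
top_ids_2032 <- tibble(
  from    = c("A", "D", "C", "A"),
  to      = c("B", "B", "B", "D"),
  weights = c(2, 1, 2, 3)
)

# Tag each node so the selected entities get their own colour.
nodes_top <- tibble(id = union(top_ids_2032$from, top_ids_2032$to)) %>%
  mutate(label = id,
         group = if_else(id %in% selected, "selected", "other"))

graph_top <- visNetwork(nodes_top, top_ids_2032) %>%
  visGroups(groupname = "selected", color = "#D7261E") %>%
  visGroups(groupname = "other", color = "lightgrey") %>%
  visEdges(arrows = "to") %>%
  visLegend()

# Links where both ends are selected entities, i.e. direct interactions.
direct_links <- top_ids_2032 %>%
  filter(from %in% selected, to %in% selected)
```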
The table below shows that these five entities alone accounted for about 37% of all events that happened in 2032 (the sum of the percentage_of_total column). distinct_count is the number of unique entities from which each one received fishery-related goods. Together, these five entities, which can be regarded as central entities, received goods from 185 senders when distinct_count is summed across rows, so a sender that supplies more than one of them is counted once per recipient.
top_details_2032 <- as.data.frame(
  top_ids_2032 %>%
    group_by(to) %>%
    summarise(distinct_count = n_distinct(from),
              total_weight = sum(weights)) %>%
    left_join(nodes_df, by = c("to" = "id")) %>%
    mutate(percentage_of_total = round(
      total_weight / sum(edges_df$weights[edges_df$Year == 2032]) * 100, 2)) %>%
    select(label, distinct_count, total_weight, percentage_of_total)
) %>%
  arrange(desc(total_weight))

#| fig-height: 4
top_details_2032 %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE) %>%
  column_spec(4, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#D7261E")

| label | distinct_count | total_weight | percentage_of_total |
|---|---|---|---|
| Pao gan SE Seal | 29 | 4544 | 10.32 |
| Caracola del Sol Services | 47 | 4521 | 10.27 |
| Mar del Este CJSC | 40 | 3796 | 8.63 |
| hǎi dǎn Corporation Wharf | 39 | 1874 | 4.26 |
| Costa de la Felicidad Shipping | 30 | 1709 | 3.88 |
As in 2032, zooming into the 2033 graph reveals a few entities with very heavy traffic, which can be regarded as central entities.
After examining the network above, two new entities are added to the five selected in 2032, giving a total of seven entities that appear busiest; these are stored in a new data frame.
A new network graph is built to highlight the relationships among the selected entities. Not all seven have interacted directly, but they are connected through intermediary entities.
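Whether two selected entities are linked directly or only through intermediaries can be checked with shortest-path lengths. The sketch below uses igraph (a dependency of tidygraph) on toy data; the entity names are hypothetical, and direction is ignored as a simplifying assumption.

```r
library(igraph)

# Toy links: S1, S2 and S3 each trade only with intermediary X (hypothetical).
edges <- data.frame(
  from = c("S1", "X", "S3"),
  to   = c("X", "S2", "X")
)
selected <- c("S1", "S2", "S3")

g <- graph_from_data_frame(edges, directed = FALSE)

# Path length 1 means a direct link; 2 means exactly one intermediary.
d <- distances(g, v = selected, to = selected)
d
```

A value of Inf in the matrix would indicate a pair with no connecting path at all.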
These seven entities accounted for about 43% of all events that happened in 2033 (the sum of the percentage_of_total column) and received fishery-related goods from 242 different entities (33% of all entities).
top_details_2033 <- top_ids_2033 %>%
  group_by(to) %>%
  summarise(distinct_count = n_distinct(from),
            total_weight = sum(weights)) %>%
  left_join(nodes_df, by = c("to" = "id")) %>%
  mutate(percentage_of_total = round(
    total_weight / sum(edges_df$weights[edges_df$Year == 2033]) * 100, 2)) %>%
  select(label, distinct_count, total_weight, percentage_of_total) %>%
  arrange(desc(total_weight))

#| fig-height: 4
top_details_2033 %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE) %>%
  column_spec(4, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#D7261E")

| label | distinct_count | total_weight | percentage_of_total |
|---|---|---|---|
| Caracola del Sol Services | 45 | 5023 | 10.44 |
| Pao gan SE Seal | 34 | 4885 | 10.15 |
| Mar del Este CJSC | 49 | 4466 | 9.28 |
| hǎi dǎn Corporation Wharf | 46 | 2046 | 4.25 |
| Madagascar Coast AG Freight | 22 | 1482 | 3.08 |
| AquaDelight N.V. Coral Reef | 20 | 1424 | 2.96 |
| Costa de la Felicidad Shipping | 26 | 1277 | 2.65 |
As in the previous years, the 2034 graph shows a few entities with very heavy traffic, which can be regarded as central entities.
This time, one new entity is added to the previous selection, giving a total of eight entities that can be regarded as central entities; these are stored in a new data frame.
A new network graph is built to highlight the relationships among the selected entities. A similar pattern, in which these entities interact through other entities, can also be observed here.
In 2034, the eight entities that can be regarded as central entities accounted for about 38% of all events that happened in that year (the sum of the percentage_of_total column) and received fishery-related goods from 238 different entities (32% of all entities).
top_details_2034 <- top_ids_2034 %>%
  group_by(to) %>%
  summarise(distinct_count = n_distinct(from),
            total_weight = sum(weights)) %>%
  left_join(nodes_df, by = c("to" = "id")) %>%
  mutate(percentage_of_total = round(
    total_weight / sum(edges_df$weights[edges_df$Year == 2034]) * 100, 2)) %>%
  select(label, distinct_count, total_weight, percentage_of_total) %>%
  arrange(desc(total_weight))

#| fig-height: 4
top_details_2034 %>%
  kbl() %>%
  kable_paper("hover", full_width = FALSE) %>%
  column_spec(4, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#D7261E")

| label | distinct_count | total_weight | percentage_of_total |
|---|---|---|---|
| Caracola del Sol Services | 41 | 3845 | 8.74 |
| Pao gan SE Seal | 29 | 3244 | 7.37 |
| Mar del Este CJSC | 51 | 3232 | 7.35 |
| hǎi dǎn Corporation Wharf | 53 | 2330 | 5.30 |
| Madagascar Coast AG Freight | 22 | 1546 | 3.51 |
| AquaDelight N.V. Coral Reef | 20 | 1438 | 3.27 |
| Costa de la Felicidad Shipping | 22 | 1245 | 2.83 |
Throughout this exercise, it can be observed that certain entities act as central entities within the network. They consistently interact with a large number of other entities from 2032 to 2034, indicating a strong and continuous presence across the whole network.
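The persistence described above could also be checked numerically by counting, for each recipient, the number of years in which it appears. This is a minimal sketch on toy data and assumes the same from, to, weights and Year columns as the summary code.

```r
library(dplyr)

# Toy edges across three years (hypothetical values).
edges_df <- tibble(
  from    = c("A", "C", "A", "C", "E", "A"),
  to      = c("B", "B", "B", "D", "B", "D"),
  weights = c(2, 1, 3, 1, 2, 1),
  Year    = c(2032, 2032, 2033, 2033, 2034, 2034)
)

# Recipients active in every year, ordered by total weight, are the
# candidates for persistent central entities.
recipient_years <- edges_df %>%
  group_by(to) %>%
  summarise(years_active = n_distinct(Year),
            total_weight = sum(weights),
            .groups = "drop") %>%
  arrange(desc(years_active), desc(total_weight))
recipient_years
```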