Forecasting Salmon Runs: A Time Series Analysis of Willamette Falls Fish Ladder Data

R
Modeling
Quarto
Author

Zoe Zhou

Published

March 18, 2025

Willamette River Fish Bypass, Image by dannymoore from Pixabay

Overview

The goal of this analysis is to visualize time series data of fish passage at the Willamette Falls fish ladder to identify trends, seasonal patterns, and annual variations in fish passage. Additionally, the analysis includes forecasting using the Holt-Winters method to predict future trends in salmon runs, providing critical information for watershed management and fish recovery efforts.

Background

Fish passage at Willamette Falls is critical for the survival of Oregon’s wild salmon and steelhead, which are on the brink of extinction due to dams blocking access to spawning habitats and degrading river ecosystems. With populations at just 1–2% of their historic levels, this analysis helps uncover migration trends and forecast future passage, providing valuable insights to guide recovery efforts, improve watershed management, and ensure these iconic species are protected for future generations.

Data Summary

This project explores adult fish passage data recorded at the Willamette Falls fish ladder on the Willamett River, Oregon from 2001 to 2010. The dataset provides valuable insights into the migration patterns of five salmon species: Chinook, Jack Chinook, Steelhead, Coho, and Jack Coho.

Data Citation: U.S. Army Corps of Engineers, NWD, et al. DART Adult Passage Counts Daily for All Species. Data accessed via Columbia River DART (Data Access in Real Time) on January 25, 2023.

Analysis Outline

  1. Exploratory Data Analysis
  2. Data Preprocessing
  3. Visualization of Time Series
  4. Visualization of Seasonplots for each species
  5. Plot Annual Totals for Fish Passage
  6. Apply Holt-Winters Forecasting
  7. Summarize Trends and Patterns

Set Up

The following libraries will be used for data manipulation, visualization, and building regression models.

Code
library(tidyverse)
library(here)
library(lubridate)
library(tsibble)
library(feasts)
library(fable)
library(kableExtra)
library(patchwork)
library(skimr)

Import data, replace NA values with zero, convert Date column from character to data and convert df to tsibble using the as_tsibble() function.

Code
# Import data
fish_df <- read_csv(here("posts","2025-04-12-fishladder",'data', 'willamette_fish_passage.csv')) %>% 
  #replace(is.na(.),0) %>% 
  janitor::clean_names()

# Convert to ts
fish_ts <- fish_df %>% 
  mutate(date = mdy(date)) %>% 
  as_tsibble(key = NULL,
             index = date) %>% 
  replace(is.na(.), 0)

Preliminary Data Exploration

The original dataset contains multiple empty columns with over 3,000 missing values. Therefore, this study focuses on three salmon species—Coho, Jack Coho, and Steelhead salmon—for further analysis.

Click here to expand summary table
Code
skim(fish_df)
Data summary
Name fish_df
Number of rows 3652
Number of columns 16
_______________________
Column type frequency:
character 2
logical 8
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
project 0 1 16 16 0 1 0
date 0 1 6 8 0 3652 0

Variable type: logical

skim_variable n_missing complete_rate mean count
chinook_run 3652 0 NaN :
wild_steelhead 3652 0 NaN :
sockeye 3652 0 NaN :
shad 3652 0 NaN :
lamprey 3652 0 NaN :
bull_trout 3652 0 NaN :
chum 3652 0 NaN :
pink 3652 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
chinook 1543 0.58 250.48 476.52 1.0 7.00 35.0 236.00 3883.0 ▇▁▁▁▁
jack_chinook 2433 0.33 11.22 16.37 1.0 2.00 6.0 14.00 170.0 ▇▁▁▁▁
steelhead 203 0.94 85.97 115.68 -2.0 8.00 28.0 127.00 709.0 ▇▂▁▁▁
coho 2694 0.26 76.03 160.84 -2.0 3.25 15.0 57.75 1290.0 ▇▁▁▁▁
jack_coho 2971 0.19 21.03 37.02 1.0 2.00 7.0 22.00 380.0 ▇▁▁▁▁
temp_c 1382 0.62 12.94 6.04 0.6 7.80 11.7 17.80 26.7 ▂▇▆▃▃

Decomposition Summary

To better understand the structure of time series data, it’s useful to decompose to separately explore contributions of the different components (trend, seasonality, etc). This study employs Seasonal and Trend Decomposition using LOESS (Locally Estimated Scatterplot Smoothing), a method for estimating nonlinear relationships. It applies a weighted moving average across all points in the dataset, with weights determined by their distance from the point being averaged. STL() function from the feasts is used to obtain decomposition.

Click here to expand decomposition plots
Code
# Prepare data for plotting
fish_long <- fish_ts %>% 
  select('date', 'steelhead','coho', 'jack_coho') %>% 
  pivot_longer(
    cols = -date,
    names_to = "species",
    values_to = "counts"
  )

# Obtain decomposition
jack_dcmp <- fish_long %>% 
  filter(species =='jack_coho') %>% 
  model(STL(counts~season(period = "1 year") + trend(window = 25)))

# Visualize components
components(jack_dcmp) %>% 
  autoplot()+
  theme_classic()

Code
# Obtain decomposition
coho_dcmp <- fish_long %>% 
  filter(species =='coho') %>% 
  model(STL(counts~season(period = "1 year") + trend(window = 25)))

# Visualize components
components(coho_dcmp) %>% 
  autoplot()+
  theme_classic()

Code
# Obtain decomposition
jack_dcmp <- fish_long %>% 
  filter(species =='steelhead') %>% 
  model(STL(counts~season(period = "1 year") + trend(window = 25)))

# Visualize components
components(jack_dcmp) %>% 
  autoplot()+
  theme_classic()

Data Visualization

Code
# Create plot
ggplot(fish_long, aes(x=date, y=counts, color=species)) +
  geom_line() +
  labs(title = "Time Series Plots of Salmon Adult Passage",
       x = "Time",
       y = "Counts") +
  scale_color_brewer(palette = "Set2")+
  theme_minimal()+
  facet_wrap(~species,ncol=1)+
  theme(legend.position='none')

Figure 1: Time Series of Adult Passage for Coho, Jack Coho, and Steelhead salmon.

From the time series plots in Figure 1, we can observe the following observations:

  • Steelhead consistently has higher counts compare to Coho and Jack Coho salmon. There is relatively low abundance of Jack Coho salmon, with smaller and less frequent peaks compared to the other species.
  • All three species exhibit seasonal peaks in their counts. Steelhead salmon displays broader and more frequent peaks over time, suggesting a longer migration period.
  • There is increasing trend in the counts of Coho salmon after 2008, with higher peaks observed in subsequent years.

A seasonplot can help point out seasonal patterns, and help to glean insights over the years. We’ll use feasts::gg_season() to create an exploratory seasonplot, which has month on the x-axis, salmon counts on the y-axis, and each year is its own series (mapped by line color).

Code
fish_long %>% 
  gg_season(y=counts, pal=hcl.colors(n=10)) + 
  theme_light()+
  labs(title='Seasonplots of Daily Counts for Salmon Passage by Species',
    x="month",
       y="Species Counts")

Figure 2: Seasonplots of Daily Counts for Salmon Passage by Species

Figure 2 displays the seasonal variation in the counts of salmon species across all years, with each line representing a specific year and highlighting recurring monthly patterns. Since the daily counts were too stochastic and busy, especially for steelhead salmon, I opted to use monthly means for clearer interpretation.

Seasonplots with monthly total:

Code
fish_month <- fish_long %>% 
  index_by(yr_mo = ~yearmonth(.)) %>% 
  group_by(year = year(yr_mo), month = month(yr_mo, label=TRUE), species) %>% 
  summarize(month_total = sum(counts, na.rm=TRUE), month_mean = mean(counts, na.rm=TRUE), .groups = "drop") %>% 
  ungroup()

fish_month %>%
  mutate(year = factor(year)) %>%
  ggplot(aes(x = month, y = month_total, colour = year, group = year)) +
  geom_line() + # Draw lines connecting points for each year
  # ggplot will automatically use a discrete scale because 'year' is a factor
  theme_light() +
  labs(
    title = 'Monthly Total Counts for Salmon Passage by Species', # Adjusted title
    x = "Month",
    y = "Species Counts",
    colour = "Year" # Legend title
  ) +
  facet_wrap(~species, ncol = 1, scales = "free_y") # Facet by species

Figure 3: Seasonplots of Monthly Total Counts for Salmon Passage by Species

Figure 2 and 3 shows the monthly variation in counts for three salmon species across multiple years:

  • Both Coho and Jack Coho salmon show a sharp and well-defined seasonal peak in September, with Coho consistently having significantly higher peak counts compared to Jack Coho, as also seen in the original time series.
  • Steelhead salmon exhibit a broader seasonal pattern, with counts steadily increasing from January to a peak in May or June, followed by a rapid decline, suggesting a longer migration period.
  • Interannual variability is noticeable, particularly in Coho, where a sharper increase in counts is observed in later years.
Code
# Prep data for plotting
fish_yr <- fish_long %>% 
  index_by(year = ~year(.)) %>%
  group_by(species) %>% 
  summarize(annual_total = sum(counts)) %>% 
  ungroup()
# Convert year column to factor
fish_yr$year = as.factor(fish_yr$year)

fish_yr %>%
  ggplot(aes(x = year, y = annual_total, color = species, group = species)) +  # Explicitly group by species
  geom_line() +
  theme_minimal() +
  labs(
    title = "Annual Total Counts of Fish Passage by Species",
    x = "Year",
    y = "Counts"
  ) +
  scale_color_brewer(palette = 'Set2')

Figure 4: Plot of Annual Counts of Fish Passage by Species
  • Annual Steelhead counts show significant interannual variability, with drop of 25000 counts between 2002 and 2003.
  • Jack Coho salmon consistently has the lowest counts among the three species, with minimal variation over the years.
  • Coho counts remain relatively low until 2008, after which there’s a sharp increas, reaching a significant peak with annual counts over 25,000 in 2010.

We use the ETS() function from the fable package to generate predictions. Exponential smoothing calculates weighted averages of past observations, with the weights decreasing exponentially as the observations become older. Given the change in variance over time, we specify multiplicative seasonality, and set restrict as FALSE to allow function to find best model.

The results show negative counts for Steelhead and Jack Coho salmon, which is unrealistic and indicates that this forecasting method is not suitable for salmon data.

Code
salmon_ts <- fish_ts %>% select(date, coho, jack_coho, steelhead)

# Reshape the data into a long format
salmon_long <- salmon_ts %>%
  pivot_longer(cols = c(coho, jack_coho, steelhead), 
               names_to = "species", 
               values_to = "counts")

# Fit the ETS model for each species

ets_models <- salmon_long %>%
  model(ets = ETS(counts~season("M"), restrict=FALSE))

# Generate forecasts for each species 
forecasts <- ets_models %>%
  forecast(h = "5 years")

# Plot the forecasts along with historical data
forecasts %>%
  autoplot(salmon_long, level = NULL) +
  labs(
    title = "Historical and Forecasted Salmon Passage by Species",
    y = "Counts",
    x = "Date"
  ) +
  theme_minimal()

Code
kable(ets_models, col.names=c("Species","ETS Model(Error, Trend, Seasonality)"), caption = "ETS Models by Species")
ETS Models by Species
Species ETS Model(Error, Trend, Seasonality)
coho
jack_coho <ETS(A,N,M)>
steelhead <ETS(A,N,M)>

The differences in the ETS models arise because the ETS() function automatically selects the best model based on the data for each species. For Jack Coho and Steelhead, the data likely exhibited clear seasonal patterns with varying magnitudes, leading to the selection of a model with multiplicative seasonality. In contrast, Coho data may not show significant seasonal or trend components, so no model were selected.

Summary

Forecast results suggest that forcing a seasonal model may not be appropriate. Further exploration is needed to find the correct model for forecasting. Here we use a autocorrelation function to confirm our interpretation.

Autocorrelation function (ACF) shows how correlated observations within a series are with pervious observations on the same variable. In the context of salmon passage, the ACF helps identify patterns in the monthly average counts of steelhead salmon passing through the ladder over time. This analysis is useful for guiding selection of models to forecast future values in a time series.

Click here to expand ACF plots

For Steelhead salmon, I do not expect exponential smoothing (ETS) to be the most effective forecasting method due to its broader and more variable seasonal pattern and significant interannual variability.

Code
# autocorrelation for steelhead
fish_bymonth <- fish_long %>% 
  index_by(yr_mo = ~yearmonth(.)) %>% 
  group_by(species) %>% 
  summarize(month_total = sum(counts, na.rm=TRUE)) %>% 
  ungroup()

fish_bymonth %>% 
  filter(species == 'steelhead') %>% 
  ACF(month_total) %>% 
  autoplot() +
  theme_minimal()+
  labs(
    title="ACF plot for Steelhead Salmon Passage",
    x="lag (in month)",
    y="autocorrelation coefficient"
  )

For Coho salmon, I do not expect exponential smoothing (ETS) to be the most effective forecasting method due to its broader and more variable seasonal pattern and significant interannual variability.

Code
# autocorrelation for coho
fish_bymonth %>% 
  filter(species == 'coho') %>% 
  ACF(month_total) %>% 
  autoplot() +
  theme_minimal()+
  labs(
    title="ACF plot for Coho Salmon Passage",
    x="lag (in month)",
    y="autocorrelation coefficient"
  )

Code
# autocorrelation for jack coho
fish_bymonth %>% 
  filter(species == 'jack_coho') %>% 
  ACF(month_total) %>% 
  autoplot() +
  theme_minimal()+
  labs(
    title="ACF plot for Jack Coho Salmon Passage",
    x="lag (in month)",
    y="autocorrelation coefficient"
  )

For Jack Coho salmon, the highest correlation coefficient occurs at lags of 12 months, indicating seasonality. This suggests that exponential smoothing could be a suitable forecasting method.