This program is designed to analyze water quality over time in rivers and estuaries, focusing on the concentration of chlorophyll-a (CHLa), a key indicator of aquatic health. It uses data from long-term water monitoring projects and applies several mathematical models, both linear and non-linear, to predict CHLa levels based on factors like water temperature and flow rates.
Logarithmic Models: These models use natural logarithms to examine how changes in water temperature combined with changes in water flow affect CHLa concentration. Here, we try three models using different predictor variables derived as water temperature-based ratios:
- ln(CHLa) ~ ln(temperature:discharge)
- ln(CHLa) ~ ln(temperature:discharge 4-day rolling average)
- ln(CHLa) ~ ln(temperature:inverse specific conductivity)
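As a rough illustration of the first log-log form, the sketch below fits `ln(CHLa) ~ ln(temperature:discharge)` with base R's `lm()`. The data are synthetic, and the column names `discharge` and `TtoQ` are placeholders chosen for this example, not necessarily the names used in the script.

```r
# Minimal sketch with synthetic data; not the project's actual data or code.
set.seed(123)
n <- 200
dat <- data.frame(
  SurfaceTemp = runif(n, 5, 30),    # synthetic surface temperature
  discharge   = runif(n, 20, 400)   # synthetic discharge (hypothetical column name)
)

# Temperature:discharge ratio used as the predictor (hypothetical column name)
dat$TtoQ <- dat$SurfaceTemp / dat$discharge

# Generate CHLa from a known log-log relationship plus noise
dat$CHLa <- exp(4 + 0.6 * log(dat$TtoQ) + rnorm(n, sd = 0.25))

# Fit ln(CHLa) ~ ln(T:Q)
fit_log <- lm(log(CHLa) ~ log(TtoQ), data = dat)

coef(fit_log)                # intercept and slope on the log scale
summary(fit_log)$r.squared   # proportion of variance explained
```

The other two models would follow the same pattern with a different ratio in place of `TtoQ`.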
Nonlinear, Tangential Model: This more complex model uses a mathematical function called "tanh" (hyperbolic tangent) to predict CHLa levels. It considers how temperature and discharge influence CHLa up to a maximum saturation point, beyond which changes have a progressively smaller effect.
- CHLa ~ CHLamax * tanh((alpha * T:Q)/CHLamax)
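One way a saturating model of this form could be fitted is with base R's `nls()`. The sketch below uses synthetic data, and the variable name `TQ` and the starting values are assumptions made for this example; the script's own fitting code may differ.

```r
# Minimal sketch with synthetic data; not the project's actual data or code.
set.seed(42)
n <- 150
true_max   <- 40   # saturation level used to simulate the data
true_alpha <- 2    # initial slope used to simulate the data

dat <- data.frame(TQ = runif(n, 0, 60))  # synthetic T:Q ratio (hypothetical name)
dat$CHLa <- true_max * tanh(true_alpha * dat$TQ / true_max) + rnorm(n, sd = 2)

# Fit CHLa ~ CHLamax * tanh((alpha * T:Q) / CHLamax).
# Starting values are rough guesses: the largest observed CHLa for the
# plateau and a slope of 1.
fit_tanh <- nls(
  CHLa ~ CHLamax * tanh((alpha * TQ) / CHLamax),
  data  = dat,
  start = list(CHLamax = max(dat$CHLa), alpha = 1)
)

coef(fit_tanh)  # estimated CHLamax and alpha
```

`nls()` is sensitive to starting values, so real data may need starting guesses taken from a plot of CHLa against the ratio.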
The program outputs its findings in an easy-to-read HTML format, which includes summary statistic tables for all years in the data set; for each model, regression statistics for all years combined as well as individual statistics by year; model regression plots with best-fit lines; model-specific time series plots; and time series plots for CHLa and dissolved inorganic nitrogen (DIN).
The script is designed to dynamically adapt to changes in the dataset, accommodating new data from subsequent years without requiring manual updates to its structure or content.
- Extract all the contents of the received zipped folder to a single folder.
- Open the R Project File: Locate the file named `algal-blooms.Rproj` within the extracted folder and open it.
- Check the Current Working Directory: In the RStudio console, check the current working directory with the command `getwd()`. Ensure that the current working directory is set to the location where you extracted the project files. If it is not, proceed to the next step.
- Set the Working Directory: To set the working directory to the location of the project files, use the following command in the RStudio console: `setwd("path/to/your/extracted/folder")`
- Verify the Working Directory: Run `getwd()` again to confirm that the working directory has been set correctly.
The following files are included in the original zipped project folder and should all be kept together in the project working directory.
- 'algal-blooms.Rproj' (R project file)
- 'CHLA_script.rmd' (R Markdown file containing analysis script)
- 'CHLa_data.xlsx' (source data Excel workbook file)
- 'functions.R' (R script containing custom functions, including documentation on use)
- 'README.md'
- 'Img' (folder containing image files)
- 'output' (folder that contains script output, e.g., csv, jpeg files)
- 'CHLa_script.html' (most recent knitted HTML output)
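A quick way to confirm that the key files sit together in the working directory is sketched below. The helper `check_project_files()` is hypothetical (it is not part of `functions.R`), and the file names are copied from the list above.

```r
# Hypothetical helper, not part of the project: reports which of the
# expected project files are present in a directory.
expected_files <- c(
  "algal-blooms.Rproj",
  "CHLA_script.rmd",
  "CHLa_data.xlsx",
  "functions.R"
)

check_project_files <- function(dir = getwd(), files = expected_files) {
  present <- file.exists(file.path(dir, files))
  setNames(present, files)  # named logical vector: TRUE if the file was found
}

status <- check_project_files()
if (!all(status)) {
  message("Missing from ", getwd(), ": ",
          paste(names(status)[!status], collapse = ", "))
}
```

If anything is reported missing, the working directory is probably not the extracted project folder.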
This program requires the following packages. An installation procedure has been written into the script.
- `tidyverse`
- `ggplot2`
- `kableExtra`
- `openxlsx`
- `zoo`
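The script has its own installation procedure. For reference, a generic way to check for missing packages before running it might look like the sketch below. The helper `missing_packages()` is a hypothetical name, and the install call is left commented out so nothing is installed unintentionally.

```r
# Generic sketch; the script's own installation code may differ.
required_pkgs <- c("tidyverse", "ggplot2", "kableExtra", "openxlsx", "zoo")

# Hypothetical helper: returns the packages that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```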
The script expects to read in an Excel workbook file with 3 tabs:
- tab 1: ReadMe
- tab 2: table with rows corresponding to weekly observations and the following columns:
> colnames(df.james)
[1] "date" "site" "SurfaceTemp"
[4] "CHLa" "DIN" "SpCond"
[7] "InvSpCond" "TtoCond"
- tab 3: table with rows corresponding to days of year (1-365) and the following columns:
> colnames(df.discharge)
[1] "DOY" "2010" "2011" "2012" "2013" "2014" "2015"
[8] "2016" "2017" "2018" "2019" "2020" "2021" "2022"
[15] "2023"
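To show how a wide table like tab 3 could feed the 4-day rolling average model, the sketch below reshapes a small stand-in for `df.discharge` to long format and computes a trailing 4-day mean. The stand-in values, the names `discharge_long` and `discharge_4d`, the trailing alignment, and the per-year grouping are all assumptions for this example; the script may handle these differently.

```r
library(tidyr)
library(zoo)

# Small stand-in for tab 3. In practice the table could be read with, e.g.,
# openxlsx::read.xlsx("CHLa_data.xlsx", sheet = 3)
df.discharge <- data.frame(
  DOY    = 1:6,
  `2010` = c(10, 12, 14, 16, 18, 20),
  `2011` = c(5, 5, 5, 5, 9, 9),
  check.names = FALSE   # keep the year column names as-is
)

# Wide (one column per year) to long (one row per year and day of year)
discharge_long <- pivot_longer(
  df.discharge,
  cols      = -DOY,
  names_to  = "year",
  values_to = "discharge"
)
discharge_long <- discharge_long[order(discharge_long$year, discharge_long$DOY), ]

# Trailing 4-day mean within each year; the first three days of each year are NA
discharge_long$discharge_4d <- ave(
  discharge_long$discharge,
  discharge_long$year,
  FUN = function(x) zoo::rollmeanr(x, k = 4, fill = NA)
)

discharge_long
```

Because the columns are discovered with `cols = -DOY` rather than listed by name, a new year column added to the workbook would be picked up without editing the code.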
The script assumes that columns 3-5 in tab 2 of the Excel file correspond to surface temperature, CHLa, and DIN, in that order. Column indexing and specific column names can be modified at line 57 of the CHLA_script file:
`colnames(df.james)[3:5] <- c("SurfaceTemp", "CHLa", "DIN")`
Custom functions are used throughout to simplify, streamline and reduce the amount of code in the main script. The use of custom functions is annotated in the script. The underlying code and documentation on their usage are contained in the `functions.R` file.
The script can be most easily navigated using either (1) the OUTLINE functionality, for moving between designated sections and subsections, or (2) the "chunk table of contents." The latter allows for more precise navigation within the document by choosing a specific code chunk. (See red outlines below)
The code is highly modular. After reading in data and generating CHLa and DIN time series plots, each of the following four sections (one per model) follows nearly the same structure. This is less the case with the final (nonlinear) model due to its more complex model formula and the need for additional arguments. As a result, fewer custom functions are used with model 4.
Once the script has been successfully executed, there are a few ways that plots can be exported and saved.
Locate the plot in the RStudio Plots pane, using the left and right arrows to move between plots. Use the 'Export' button above the plot.
You can also export programmatically by adding the following function call immediately after the code that produced the desired plot:
`ggsave(filename = "output/name_of_plot.jpg", device = "jpeg", dpi = 300)`
`dpi` sets the resolution (dots per inch). `device` sets the output format (other output types include "pdf" and "png").
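When no `plot` argument is given, `ggsave()` saves the last plot displayed. Passing the plot object explicitly, along with a width and height, makes the export more reproducible. The sketch below uses a placeholder plot built from R's built-in `mtcars` data; the object name `p`, the file name, and the dimensions are illustrative only.

```r
library(ggplot2)

# Placeholder plot; in the script this would be one of the model plots
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Make sure the output folder exists before writing to it
out_dir <- "output"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "name_of_plot.png")

ggsave(
  filename = out_file,
  plot     = p,       # save this specific plot rather than the last one shown
  device   = "png",
  width    = 8,
  height   = 5,
  units    = "in",
  dpi      = 300
)
```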