

Parallelism, Ethics, and AI
Grayson White
Math 241
Week 10 | Spring 2026
Monday Lecture:
for() and while() loops
Wednesday Lecture:
Last time, we learned how to iterate our code through for() loops and functional programming with map_XXX().
But sometimes, we’d like to iterate tasks that take a long time.
(Lots of iterations) + (Code that takes a long time to execute) = 🕐, 🕑, 🕒, 🕓, 🕔…







We get more butter faster!
Not a free lunch: have to pay the workers in RAM.

for() loops with foreach and doParallel
The foreach library provides functionality similar to for() loops, but with slightly different syntax.
.combine tells foreach() how to combine the elements into vec.
Without .combine, foreach() gives a list() back.
.packages to load packages necessary for computation; .export to send any extra objects the workers need.
for() loops with foreach and doParallel
Combining the functionality of foreach with doParallel lets us write loops in parallel!
Sequential:
In parallel:
[1] 12
user system elapsed
0.006 0.029 3.025
[1] 10 20 30 40 50 60 70 80 90 100 110 120
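The code behind the sequential and parallel timings is not reproduced here, so what follows is only a minimal sketch of that kind of comparison, not the original loop. The 0.1-second sleep, the 2 workers, and the object names vec_seq and vec_par are assumptions made for illustration.

```r
library(foreach)
library(doParallel)

# Sequential: %do% runs the 12 iterations one after another.
seq_time <- system.time(
  vec_seq <- foreach(i = 1:12, .combine = c) %do% {
    Sys.sleep(0.1)  # stand-in for code that takes a long time
    i * 10
  }
)

# In parallel: register workers first, then swap %do% for %dopar%.
cl <- parallel::makeCluster(2)  # 2 workers is an assumption; adjust to your machine
registerDoParallel(cl)
par_time <- system.time(
  vec_par <- foreach(i = 1:12, .combine = c) %dopar% {
    Sys.sleep(0.1)
    i * 10
  }
)
parallel::stopCluster(cl)  # always release the workers when done

vec_par
seq_time["elapsed"]
par_time["elapsed"]
```

The only change between the two loops is the operator: %do% versus %dopar%. Everything else about the foreach() call stays the same.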
system.time() calls are just for illustrative purposes, and are unnecessary for calling these loops.
for() loops with foreach and doParallel
Last time, we created a bootstrap distribution for the mean DBH for our sample of Portland trees:
Now, we can do this in parallel:
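A hedged sketch of what a parallel bootstrap could look like with foreach and doParallel. The tree data are not available here, so dbh is a simulated stand-in, and the 1000 resamples and 2 workers are likewise assumptions.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for the Portland tree sample: 200 simulated DBH values.
set.seed(241)
dbh <- rgamma(200, shape = 2, scale = 10)

cl <- parallel::makeCluster(2)  # assumed worker count
registerDoParallel(cl)

# Give each worker its own independent random number stream.
parallel::clusterSetRNGStream(cl, iseed = 241)

# Each iteration resamples the data with replacement and records the mean.
boot_means <- foreach(b = 1:1000, .combine = c) %dopar% {
  mean(sample(dbh, replace = TRUE))
}

parallel::stopCluster(cl)

length(boot_means)
sd(boot_means)  # bootstrap standard error of the mean
```

clusterSetRNGStream() keeps the workers from sharing a random number stream. If you need results that are exactly reproducible from run to run, the doRNG package is one commonly used option.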
map_XXX()ing with furrr
The furrr package enables functional programming in parallel, with syntax and functionality that mimic purrr.
Last time:
Now, in parallel:
user system elapsed
0.499 0.087 7.733
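A minimal sketch of the same bootstrap idea written with furrr, assuming a simulated dbh vector in place of the tree data and 2 workers. future_map_dbl() is the parallel counterpart of purrr's map_dbl().

```r
library(future)
library(furrr)

# Hypothetical stand-in for the tree sample.
set.seed(241)
dbh <- rgamma(200, shape = 2, scale = 10)

# Start background R sessions to act as workers (2 is an assumption).
plan(multisession, workers = 2)

boot_means <- future_map_dbl(
  1:1000,
  function(b) mean(sample(dbh, replace = TRUE)),
  # seed = <integer> requests parallel-safe, reproducible random numbers
  .options = furrr_options(seed = 241)
)

# Shut the workers down again.
plan(sequential)

length(boot_means)
```

Compared with purrr, only two things change: the plan() call that sets up the workers, and the future_ prefix on the mapping function.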
The seed option allows us to do thread-safe random number generation.
Startup cost: it is (time) costly to start up many R processes to do something that is a relatively fast computation.
Big data: Try to limit your parallel computing to only include the necessary objects. Moving large objects across R sessions is (time) costly.
You must consider thread-safe random number generation.
Parallel computing is a balance of RAM, workers/cores, and costly operations like moving big data around and worker startup cost.
An art and a science! Practice tends to make you better at balancing these things.
The American Statistical Association’s “Ethical Guidelines for Statistical Practice”
“The ethical statistical practitioner seeks to understand and mitigate known or suspected limitations, defects, or biases in the data or methods and communicates potential impacts on the interpretation, conclusions, recommendations, decisions, or other results of statistical practices.”
“For models and algorithms designed to inform or implement decisions repeatedly, develops and/or implements plans to validate assumptions and assess performance over time, as needed. Considers criteria and mitigation plans for model or algorithm failure and retirement.”
Algorithmic bias: when the model systematically creates unfair outcomes, such as privileging one group over another.
Example: The Coded Gaze

Facial recognition software struggles to see faces of color.
Algorithms built on a non-diverse, biased dataset.
Example: COMPAS model used throughout the country to predict recidivism

Ethical considerations:
Algorithmic bias
Fair use / copyright considerations
Helpful context: our readings for today
Data Science is a uniquely humanistic field situated behind a laptop. We have a responsibility to understand and mitigate algorithmic bias by asking questions such as:
Who is represented in the sample?
Who is not represented in the sample?
What biases might my methods (including generative AI!) introduce?
How can I safeguard against those biases?
Generative AI tools are being used in horrific and unethical ways as we speak (e.g., Flock cameras). Not to mention, they are trained on people’s intellectual content without their consent.
My goals:
A large language model (LLM) is a predictive model designed to predict the next token of text, conditional on the given context.
Just like any statistical model, an LLM must first be trained; the trained model can then be used to make new predictions.
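To make "predict the next token, conditional on the context" concrete, here is a toy sketch in base R. It is nothing like a real LLM: the context is just the single previous word, and the "training data" is one made-up sentence. The corpus and the helper predict_next() are hypothetical.

```r
# Train: count how often each token follows each context token.
corpus <- "the cat sat on the mat the cat ate the fish"
tokens <- strsplit(corpus, " ")[[1]]
context <- head(tokens, -1)     # every token except the last
next_token <- tail(tokens, -1)  # the token that followed each context
counts <- table(context, next_token)

# Estimated probability of each next token, given the context "the".
prop.table(counts, margin = 1)["the", ]

# Predict: return the most frequent next token for a given context.
predict_next <- function(prev) {
  row <- counts[prev, ]
  names(row)[which.max(row)]
}

predict_next("the")  # → "cat"
```

A real LLM replaces the count table with a neural network and conditions on far more than one word, but the two steps are the same: fit to text, then predict what comes next.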
Companies like OpenAI, Anthropic, and Google host trained models on their servers, and charge users to use them.
Why can’t you run these locally?
Local LLMs:
Cloud-based LLMs:
For our purposes, we will use a cloud-based LLM.
I encourage you to explore what can be done with local LLMs if it interests you.

ellmer
An R package for interacting with LLMs in R

Today, we’ll use ellmer to do three different types of tasks one may want to do with an LLM:
Structured data
Tool calling
Coding
ellmer: Setup
In order to interact with LLMs hosted on a server, we need to first set up an API key.
Steps:
R environment:
Then add:
OPENAI_API_KEY=INSERT_YOUR_KEY_HERE
to your .Renviron file. Save it.
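One way to check that the key was picked up, sketched in base R. The helper key_is_set() is hypothetical. The comment about usethis assumes that package is installed, since it is not mentioned in the steps here.

```r
# usethis::edit_r_environ() is one way to open .Renviron for editing
# (assumes the usethis package is installed).

# After saving .Renviron and restarting R, check that the key is visible.
# This returns TRUE or FALSE and never prints the key itself.
key_is_set <- function(name = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(name))
}

key_is_set()
```

If this returns FALSE, the usual culprits are a typo in the variable name or an R session that was not restarted after editing the file.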
ellmer: basics
library(ellmer)
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507"
)
chat$chat("Who are you?")
#> Hello! I'm Qwen, a large-scale language model independently developed by the
#> Tongyi Lab under Alibaba Group. I can assist you with answering questions,
#> writing, logical reasoning, programming, and more. I aim to provide helpful and
#> accurate information. Feel free to let me know if you have any questions or need
#> assistance!
ellmer: Structured data
How would you extract name and age from this data?
prompts <- list(
  "I go by Alex. 42 years on this planet and counting.",
  "Pleased to meet you! I'm Jamal, age 27.",
  "They call me Li Wei. Nineteen years young.",
  "Fatima here. Just celebrated my 35th birthday last week.",
  "The name's Robert - 51 years old and proud of it.",
  "Kwame here - just hit the big 5-0 this year."
)
ellmer: Structured data
LLMs are generally good at this sort of task
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507"
)
chat$chat("Extract the name and age from each sentence I give you")
chat$chat(prompts[[1]])
#> **Name: Alex, Age: 42**
chat$chat(prompts[[2]])
#> **Name: Jamal, Age: 27**
chat$chat(prompts[[3]])
#> **Name: Li Wei, Age: 19**
ellmer: Structured data
But wouldn’t it be nice to get an R data structure?
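A sketch of how a type specification for this task might be written with ellmer's type_object(), type_string(), and type_number(). The field descriptions are assumptions. The API call is wrapped in a check so that nothing is sent unless a key is configured.

```r
library(ellmer)

# Describe the shape of the answer we want back: one name and one age.
type_person <- type_object(
  name = type_string("The person's name"),
  age = type_number("The person's age in years")
)

# Only contact the server if an API key has been set up.
if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
  chat <- chat_openai_compatible(
    base_url = "https://api.cerebras.ai/v1",
    model = "qwen-3-235b-a22b-instruct-2507"
  )
  # chat_structured() returns an R object matching type_person
  # instead of free-form text.
  person <- chat$chat_structured(
    "I go by Alex. 42 years on this planet and counting.",
    type = type_person
  )
  str(person)
}
```

The type object carries the schema for the answer, so the name and age come back as R values and there is no prose reply to parse.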
ellmer: Structured data
parallel_chat_structured(): many prompts at once
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507",
  params = params(max_tokens = 500)
)
parallel_chat_structured(chat, prompts, type = type_person, rpm = 30)
#>     name age
#> 1   Alex  42
#> 2  Jamal  27
#> 3 Li Wei  19
#> 4 Fatima  35
#> 5 Robert  51
#> 6  Kwame  50
Not parallel in the sense of parallelized R code: just sending many API calls at once.
Have to set params in order to avoid getting rate-limited automatically.
ellmer: Structured data
Can also attach images or other files by pointing to their path on your computer.
ellmer: Tool calling
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507"
)
chat$chat("What day is it?")
#> I don't have access to real-time information, so I can't tell you today's
#> date. You can check the current date on your device's calendar or by
#> asking a voice assistant like Siri, Google Assistant, or Alexa.
ellmer: Tool calling
…usually giving models the ability to read and write state
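A hedged sketch of how a tool could let the model answer "What day is it?". It assumes a recent version of ellmer, in which tool() takes the function first, followed by name and description. The tool name get_today and its description are made up for illustration.

```r
library(ellmer)

# Wrap an ordinary R function so the model can ask for it to be run.
get_today <- tool(
  function() format(Sys.Date(), "%A, %B %d, %Y"),
  name = "get_today",
  description = "Returns today's date on the user's computer."
)

# Only contact the server if an API key has been set up.
if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
  chat <- chat_openai_compatible(
    base_url = "https://api.cerebras.ai/v1",
    model = "qwen-3-235b-a22b-instruct-2507"
  )
  chat$register_tool(get_today)  # tell the model this tool exists
  chat$chat("What day is it?")
}
```

The model never runs R itself. It sends back a request to call get_today, ellmer runs the function locally, and the result is passed back to the model so it can finish its answer.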
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507"
)
chat$chat("Delete the csv files in my working directory")
#> I **cannot** delete files from your computer or working directory. I don't have
#> access to your file system for security reasons.
#>
#> However, here are safe ways you can delete CSV files in your working directory
#> depending on your environment:
#>
#> ### Option 1: Using Python
#> If you're using Python and want to delete all `.csv` files in the current
#> directory:
#> ...
chat <- chat_openai_compatible(
  base_url = "https://api.cerebras.ai/v1",
  model = "qwen-3-235b-a22b-instruct-2507"
)
chat$chat("Delete the csv files in my working directory")
#> To delete all CSV files in your current working directory, you can use one of the
#> following methods depending on your operating system:
#>
#> ### **Method 1: Using Command Line / Terminal**
#>
#> #### **On Windows (Command Prompt):**
#> ```cmd
#> del *.csv
#> ```
#>
#> #### **On macOS or Linux (Terminal):**
#> ```bash
#> rm *.csv
#> ```
#>
#> > ⚠️ **Warning**: This permanently deletes all `.csv` files in the current
#> directory. Make sure you don’t need any of them!
Needs to be able to:
The LLM needs to be able to read our working directory:
Create some dummy .csv files
[1] TRUE TRUE TRUE
[1] "a_please_dont_delete_me.csv" "a.csv"
[3] "b.csv" "custom.scss"
[5] "data" "img"
[7] "math241_wk01mon_files" "math241_wk01mon.html"
[9] "math241_wk01mon.qmd" "math241_wk01wed_files"
[11] "math241_wk01wed.html" "math241_wk01wed.qmd"
[13] "math241_wk02mon_files" "math241_wk02mon.html"
[15] "math241_wk02mon.qmd" "math241_wk02wed_files"
[17] "math241_wk02wed.html" "math241_wk02wed.qmd"
[19] "math241_wk03mon_files" "math241_wk03mon.html"
[21] "math241_wk03mon.qmd" "math241_wk03wed_cache"
[23] "math241_wk03wed_files" "math241_wk03wed.html"
[25] "math241_wk03wed.qmd" "math241_wk04mon.html"
[27] "math241_wk04mon.qmd" "math241_wk04wed_activity.html"
[29] "math241_wk04wed_activity.qmd" "math241_wk04wed.html"
[31] "math241_wk04wed.qmd" "math241_wk05mon_files"
[33] "math241_wk05mon.html" "math241_wk05mon.qmd"
[35] "math241_wk05wed_files" "math241_wk05wed.html"
[37] "math241_wk05wed.qmd" "math241_wk06mon_files"
[39] "math241_wk06mon.html" "math241_wk06mon.qmd"
[41] "math241_wk06wed.html" "math241_wk06wed.qmd"
[43] "math241_wk07mon_files" "math241_wk07mon.html"
[45] "math241_wk07mon.qmd" "math241_wk07wed_files"
[47] "math241_wk07wed.html" "math241_wk07wed.qmd"
[49] "math241_wk08mon_files" "math241_wk08mon.html"
[51] "math241_wk08mon.qmd" "math241_wk09mon.html"
[53] "math241_wk09mon.qmd" "math241_wk09wed_files"
[55] "math241_wk09wed.html" "math241_wk09wed.qmd"
[57] "math241_wk10mon_files" "math241_wk10mon.html"
[59] "math241_wk10mon.qmd" "math241_wk10wed_cache"
[61] "math241_wk10wed_files" "math241_wk10wed.qmd"
[63] "math241_wk10wed.rmarkdown" "my_plots"
[65] "rosm.cache"
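The tool definitions themselves are not shown here, so the following is only a sketch of how tools named ls and rm might be set up. To stay safe it works in a scratch folder under tempdir() rather than the real working directory. The folder name, the descriptions, and the objects tool_ls and tool_rm are all assumptions, as is the recent-ellmer tool() signature.

```r
library(ellmer)

# Work in a scratch directory so nothing real can be deleted.
scratch <- file.path(tempdir(), "tool_demo")
dir.create(scratch, showWarnings = FALSE)

# Create some dummy .csv files.
file.create(file.path(scratch, c("a.csv", "b.csv", "a_please_dont_delete_me.csv")))

# Tool 1: lets the model read the directory contents.
tool_ls <- tool(
  function() dir(scratch),
  name = "ls",
  description = "Lists the files in the current directory."
)

# Tool 2: lets the model delete files by name (unlink() returns 0 on success).
tool_rm <- tool(
  function(path) unlink(file.path(scratch, path)),
  name = "rm",
  description = "Deletes one or more files from the current directory.",
  arguments = list(
    path = type_array(items = type_string(), description = "File names to delete")
  )
)

# Only build the chat and register the tools if an API key has been set up.
if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
  chat <- chat_openai_compatible(
    base_url = "https://api.cerebras.ai/v1",
    model = "qwen-3-235b-a22b-instruct-2507"
  )
  chat$register_tool(tool_ls)
  chat$register_tool(tool_rm)
}

dir(scratch)
```

Giving a model a tool that deletes files is exactly the kind of "write state" ability to hand out carefully. Restricting the tool to one folder, as here, is one simple safeguard.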
Now ask it to delete csv’s that
chat$chat("Delete all the csv files in the current directory, unless they ask to not be deleted.")
#> ◯ [tool call] ls()
#> ● #> a_please_dont_delete_me.csv
#> #> a.csv
#> #> b.csv
#> #> custom.scss
#> #> data
#> #> …
#> ◯ [tool call] rm(path = c("a.csv", "b.csv"))
#> ● #> 0
#> The CSV files `a.csv` and `b.csv` have been deleted.
#>
#> The file `a_please_dont_delete_me.csv` was not deleted, as it appears to be a
#> file you wish to keep based on its name.
#>
#> Let me know if you'd like to keep or remove any other files!
[1] "a_please_dont_delete_me.csv" "custom.scss"
[3] "data" "img"
[5] "math241_wk01mon_files" "math241_wk01mon.html"
[7] "math241_wk01mon.qmd" "math241_wk01wed_files"
[9] "math241_wk01wed.html" "math241_wk01wed.qmd"
[11] "math241_wk02mon_files" "math241_wk02mon.html"
[13] "math241_wk02mon.qmd" "math241_wk02wed_files"
[15] "math241_wk02wed.html" "math241_wk02wed.qmd"
[17] "math241_wk03mon_files" "math241_wk03mon.html"
[19] "math241_wk03mon.qmd" "math241_wk03wed_cache"
[21] "math241_wk03wed_files" "math241_wk03wed.html"
[23] "math241_wk03wed.qmd" "math241_wk04mon.html"
[25] "math241_wk04mon.qmd" "math241_wk04wed_activity.html"
[27] "math241_wk04wed_activity.qmd" "math241_wk04wed.html"
[29] "math241_wk04wed.qmd" "math241_wk05mon_files"
[31] "math241_wk05mon.html" "math241_wk05mon.qmd"
[33] "math241_wk05wed_files" "math241_wk05wed.html"
[35] "math241_wk05wed.qmd" "math241_wk06mon_files"
[37] "math241_wk06mon.html" "math241_wk06mon.qmd"
[39] "math241_wk06wed.html" "math241_wk06wed.qmd"
[41] "math241_wk07mon_files" "math241_wk07mon.html"
[43] "math241_wk07mon.qmd" "math241_wk07wed_files"
[45] "math241_wk07wed.html" "math241_wk07wed.qmd"
[47] "math241_wk08mon_files" "math241_wk08mon.html"
[49] "math241_wk08mon.qmd" "math241_wk09mon.html"
[51] "math241_wk09mon.qmd" "math241_wk09wed_files"
[53] "math241_wk09wed.html" "math241_wk09wed.qmd"
[55] "math241_wk10mon_files" "math241_wk10mon.html"
[57] "math241_wk10mon.qmd" "math241_wk10wed_cache"
[59] "math241_wk10wed_files" "math241_wk10wed.qmd"
[61] "math241_wk10wed.rmarkdown" "my_plots"
[63] "rosm.cache"
We’ll explore how folks have expanded ellmer into some even more helpful tools for data science
We’ll start to learn how to write our own R packages!