Setting Up an AWS Lab for Graphics Programming
21 Jun 2024

This post builds upon Setting Up an AWS Lab, in as much as we assume you have a working IAM role that you can use to execute AWS API calls. We will use this foundation to outline a Terraform workspace to spin up GPU and non-GPU instances for graphics programming.
Our main motivation will be to build everything we need to work through Introduction to 3D Game Programming with DirectX 12 and https://learnopengl.com/.
You can go ahead and continue reading, just be mindful that whenever we write things such as
./run-cmd-in-shell.sh aws sts get-caller-identity
the ./run-cmd-in-shell.sh is a wrapper script we described in that previous post that allows us to run commands with a valid set of temporary IAM role credentials.
Also keep in mind that our configurations are very 1Password-centric, since no one should routinely keep any sort of IAM user credentials in plain text or in a file on disk - but we also don't yet need to configure SSO access or anything of the sort.
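For reference, the wrapper is roughly shaped like the sketch below. The op:// item paths and the ROLE_ARN variable are placeholders for whatever your own setup from the previous post uses.

#!/usr/bin/env bash
# Hypothetical sketch of run-cmd-in-shell.sh: pull IAM user keys from
# 1Password, trade them for temporary role credentials, then run the command.
set -euo pipefail

# Placeholder 1Password paths -- adjust to your own vault layout.
export AWS_ACCESS_KEY_ID="$(op read 'op://private/aws/access-key-id')"
export AWS_SECRET_ACCESS_KEY="$(op read 'op://private/aws/secret-access-key')"

CREDS=$(aws sts assume-role \
    --role-arn "${ROLE_ARN}" \
    --role-session-name lab \
    --query 'Credentials' --output json)

AWS_ACCESS_KEY_ID=$(jq -r '.AccessKeyId' <<< "${CREDS}")
AWS_SECRET_ACCESS_KEY=$(jq -r '.SecretAccessKey' <<< "${CREDS}")
AWS_SESSION_TOKEN=$(jq -r '.SessionToken' <<< "${CREDS}")
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN

exec "$@"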
Table of Contents
- Table of Contents
- Requirements
- Our Starting Point
- Our Terraform code
- Knowing the limits of ssh -X
- Spinning Up a GPU
- Configuring our EC2 to Learn OpenGL
- Day to Day Tips
- Conclusion
Requirements
We recommend you use github.com/tfutils/tfenv to manage the Terraform versions you will need.
For this example we did
tfenv init
tfenv install 1.8.4
tfenv use 1.8.4
You’ll also need
brew install jq
Our Starting Point
We will begin our travels with the following ChatGPT-generated program
/*
g++ -c opengl_test.cpp -I/usr/include/GL
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
*/
#include <GL/glut.h>
void display() {
glClear(GL_COLOR_BUFFER_BIT);
glBegin(GL_TRIANGLES);
glColor3f(1.0f, 0.0f, 0.0f); // Red
glVertex2f(-0.5f, -0.5f);
glColor3f(0.0f, 1.0f, 0.0f); // Green
glVertex2f(0.5f, -0.5f);
glColor3f(0.0f, 0.0f, 1.0f); // Blue
glVertex2f(0.0f, 0.5f);
glEnd();
glFlush();
}
int main(int argc, char** argv) {
glutInit(&argc, argv);
glutCreateWindow("OpenGL Test");
glutDisplayFunc(display);
glutMainLoop();
return 0;
}
This program came about because we were working through an example in CUDA by Example: An Introduction to General-Purpose GPU Programming, the one about the Julia sets, and we wanted to learn more about graphics and animations.
So our first test was to get that program to run and see what it produced.
Our Terraform code
We will go right ahead and show all of the Terraform we have been using. We will go over it as we proceed, but we are opting to give you all of the TF from the beginning; otherwise it can get confusing what needs to be added, changed, or removed as we discover things.
A copy of the code we use can be found in github.com/seafoodfry/seafoodfry-code/gpu-sandbox.
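To give you a feel for how we drive the workspace before diving in: machine counts are exposed as Terraform variables (var.dev_machines, var.gpus, and var.windows_gpu_machines, all used later in this post), so a plan for a single non-GPU dev box looks roughly like this sketch:

./run-cmd-in-shell.sh terraform init
./run-cmd-in-shell.sh terraform plan \
    -var dev_machines=1 \
    -var gpus=0 \
    -var windows_gpu_machines=0 \
    -out a.plan
./run-cmd-in-shell.sh terraform apply a.plan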
Knowing the limits of ssh -X
In the next section we will run a non-GPU Linux EC2 instance. The AMI we ended up using came from trying out the EC2 launch wizard in the console and seeing what it offered: we saw ami-0ca2e925753ca2fb4 listed as one of the recommended AMIs.
Running the following command,
./run-cmd-in-shell.sh aws ec2 describe-images --image-ids ami-0ca2e925753ca2fb4
gave us this output:
{
"Images": [
{
"Architecture": "x86_64",
"CreationDate": "2024-05-24T03:27:51.000Z",
"ImageId": "ami-0ca2e925753ca2fb4",
"ImageLocation": "amazon/al2023-ami-2023.4.20240528.0-kernel-6.1-x86_64",
"OwnerId": "137112412989",
"PlatformDetails": "Linux/UNIX",
"Description": "Amazon Linux 2023 AMI 2023.4.20240528.0 x86_64 HVM kernel-6.1",
"ImageOwnerAlias": "amazon",
"Name": "al2023-ami-2023.4.20240528.0-kernel-6.1-x86_64",
"DeprecationTime": "2024-08-22T03:28:00.000Z"
...
}
]
}
So we looked for the newest version with
./run-cmd-in-shell.sh aws ec2 describe-images --owners amazon --filters "Name=platform-details,Values=Linux/UNIX" "Name=architecture,Values=x86_64" "Name=creation-date,Values=2024-05*" "Name=description,Values=*Amazon Linux*" --query 'Images[?!contains(Description, `ECS`) && !contains(Description, `EKS`) && !contains(Description, `gp2`)]' > out.json
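This is where the jq requirement earns its keep: since the --query above returns a top-level JSON array, you can skim the candidates in out.json with something like

jq -r '.[] | [.CreationDate, .ImageId, .Name] | @tsv' out.json | sort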
We ended up going with
{
"Architecture": "x86_64",
"CreationDate": "2024-05-30T00:51:59.000Z",
"ImageId": "ami-04064f2a9939d4f29",
"ImageLocation": "amazon/amzn2-ami-kernel-5.10-hvm-2.0.20240529.0-x86_64-ebs",
"ImageType": "machine",
"Public": true,
"OwnerId": "137112412989",
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"State": "available",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"SnapshotId": "snap-07f3b72092a551eb6",
"VolumeSize": 8,
"VolumeType": "standard",
"Encrypted": false
}
}
],
"Description": "Amazon Linux 2 Kernel 5.10 AMI 2.0.20240529.0 x86_64 HVM ebs",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "amzn2-ami-kernel-5.10-hvm-2.0.20240529.0-x86_64-ebs",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"DeprecationTime": "2025-07-01T00:00:00.000Z"
},
Note that we chose an Amazon Linux 2 AMI for compatibility with our GPU instances.
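As an aside: if you'd rather not hunt for AMI IDs by hand, AWS also publishes the latest Amazon Linux AMI IDs under public SSM parameters. The parameter name below is our best guess for the ebs variant we chose; adjust it to match the AMI name you want.

./run-cmd-in-shell.sh aws ssm get-parameter \
    --name /aws/service/ami-amazon-linux-latest/amzn2-ami-kernel-5.10-hvm-x86_64-ebs \
    --query 'Parameter.Value' --output text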
We are using the G4dn series because they are the cheapest GPU instances one can use, especially with the "spot instance discount".
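If you want to see what that discount looks like today, you can query the spot price history yourself (g4dn.xlarge below is just an assumed size; swap in whichever one you plan to run):

./run-cmd-in-shell.sh aws ec2 describe-spot-price-history \
    --instance-types g4dn.xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
    --output table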
In the next section we will also talk about why you should use Amazon Linux 2 and not Amazon Linux 2023.
Testing the limits of ssh -X
Our first attempt at getting into graphics was to simply run ssh with X11 forwarding enabled, via a command such as
ssh -X ec2-user@${EC2}
First set var.dev_machines to 1 and both var.gpus and var.windows_gpu_machines to 0, and then run
./run-cmd-in-shell.sh terraform init
Now set the my_ip variable as follows
export TF_VAR_my_ip=$(curl https://cloudflare.com/cdn-cgi/trace | grep ip | awk -F= '{print $2}')
(We could have also set it using -var my_ip="x.x.x.x".)
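A quick sanity check that the variable captured a bare IPv4 address:

echo "${TF_VAR_my_ip}"   # expect something like 203.0.113.7 (a placeholder, yours will differ)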
Then,
./run-cmd-in-shell.sh terraform plan -out a.plan
And apply the plan
./run-cmd-in-shell.sh terraform apply a.plan
You will get a plain old t3 EC2 running Amazon Linux 2.
Note: we initially tried doing the X11 forwarding on Amazon Linux 2023 (AL2023), but we discovered that we couldn't even install apps such as xeyes to test that our SSH forwarding was working.
After a lot of googling, ChatGPTing, and Claude-ing, we sort of gave up on this approach and found the AWS docs page Prerequisites for Linux NICE DCV servers, which, in the "Install a desktop environment and desktop manager" section under the "Amazon Linux 2" tab, mentions the following:
Currently, NICE DCV is not compatible with Amazon Linux 2023. AL2023 does not include a graphical desktop environment which is required for NICE DCV to run.
This gave us the hint that AL2 was the OS to use - and that we should try something like NICE DCV but more on that later.
Anyway, that's why we ended up using the AMI ami-04064f2a9939d4f29.
You can get more info on it by running
./run-cmd-in-shell.sh aws ec2 describe-images --image-ids ami-04064f2a9939d4f29
Anyway, back to our adventure.
If you look at the user data for the non-GPU Linux EC2 we just created, it installs freeglut, which is the library used by the "CUDA by Example" book and by the opengl_test.cpp file we included above.
Once the EC2 is up, go on and SSH into it
export EC2="<public DNS name>"
ssh -X ec2-user@${EC2}
At this point, if you type
xeyes
then you should see a cute lil pop-up. Otherwise, you need to go figure out how to get X11 forwarding through SSH configured.
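(If xeyes is missing altogether, on Amazon Linux 2 it ships in the xorg-x11-apps package, as far as we can tell:)

sudo yum install -y xorg-x11-apps   # provides xeyes and friends on AL2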
But assuming you saw the eyes, let's go and copy our test program onto the EC2 and try compiling it
scp ./opengl_test.cpp ec2-user@${EC2}:/home/ec2-user
The compilation instructions are in the header of the file, but for completeness: you compile with
g++ -c opengl_test.cpp -I/usr/include/GL
and link with
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
Both of these should work; otherwise, freeglut was not installed correctly.
Running the program should then give you this error:
$ ./opengl_test
freeglut (./opengl_test): ERROR: Internal error <FBConfig with necessary capabilities not found> in function fgOpenWindow
You can google and AI all you want, but this error essentially means that we need a "remote desktop". Spoiler: NICE DCV will provide us with the perfect environment to actually do graphics programming on Linux instances. (We will talk about Windows in a later post when we cover DirectX.)
Also, we will proceed with running NICE DCV on a GPU instance: after figuring out that NICE DCV might be our solution, we read Prerequisites for Linux NICE DCV servers and saw that not only is AL2023 unsuitable for graphics programming, but attempting graphics programming without a GPU also requires installing extras such as the XDummy driver, which allows the X server to run with a virtual framebuffer when no real GPU is present.
Spinning Up a GPU
At this point we will go all in on running NICE DCV on a GPU machine. The following section outlines how we looked for AMIs compatible with GPUs - you always want an AMI with the NVIDIA drivers preinstalled instead of installing them yourself!
Finding a GPU AMI
While checking out what AMIs were recommended through the launch wizard, we came across the AMI ID ami-0296a329aeec73707, published by Amazon with the title "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0 (Amazon Linux 2) 20240521".
We can query info about it as follows:
./run-cmd-in-shell.sh aws ec2 describe-images --owners amazon --image-ids ami-0296a329aeec73707
We kept searching for AMIs with the following query
./run-cmd-in-shell.sh aws ec2 describe-images --owners 898082745236 --filters "Name=platform-details,Values=Linux/UNIX" "Name=architecture,Values=x86_64" "Name=name,Values=*Amazon Linux 2*" "Name=creation-date,Values=2024-05*" "Name=description,Values=*G4dn*" > out.json
and found this candidate
{
"Architecture": "x86_64",
"CreationDate": "2024-05-22T09:42:47.000Z",
"ImageId": "ami-0c4b8684fc96c1de0",
"ImageLocation": "amazon/Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2) Version 78.2",
"ImageType": "machine",
"Public": true,
"OwnerId": "898082745236",
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"State": "available",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"Iops": 3000,
"SnapshotId": "snap-0af15a9e4c4b2e59c",
"VolumeSize": 105,
"VolumeType": "gp3",
"Throughput": 125,
"Encrypted": false
}
}
],
"Description": "Supported EC2 instances: G4dn, G5, G6, Gr6, P4d, P4de, P5. PyTorch-2.1, TensorFlow-2.16. Release notes: https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2) Version 78.2",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"DeprecationTime": "2026-05-22T09:42:47.000Z"
},
Running the GPU
Once we had our AMI for G4dn GPUs, we went ahead and created an instance.
Again, the commands are:
./run-cmd-in-shell.sh terraform init
Then get your current IPv4 address,
export TF_VAR_my_ip=$(curl https://cloudflare.com/cdn-cgi/trace | grep ip | awk -F= '{print $2}')
(We could have also set it using -var my_ip="x.x.x.x".)
Then,
./run-cmd-in-shell.sh terraform plan -out a.plan
Then apply the plan
./run-cmd-in-shell.sh terraform apply a.plan
And to clean up
./run-cmd-in-shell.sh terraform destroy
You can find the user data logs with
cat /var/log/cloud-init-output.log
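You can also watch the user data execute in real time on first boot, which is handy given how long the DCV setup and reboot take:

ssh ec2-user@${EC2} tail -f /var/log/cloud-init-output.log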
Setting up NICE DCV
There are two good sources of docs here
- What is DCV. You’ll really need to read the docs though!
- Deploy an EC2 instance with NICE DCV
Read them both, but follow the instructions in the first link! We did that, and that's how we came up with the user data that was passed to the GPU EC2.
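To give you an idea of its shape, here is a hypothetical, abridged sketch of the DCV portion of that user data, following the Amazon Linux 2 install docs. Exact package lists and tarball versions go stale, so treat the AWS docs as the source of truth.

# A graphical desktop is a prerequisite; see the prerequisites page for the full package list.
sudo yum install -y gdm gnome-session xorg-x11-server-Xorg glx-utils
# Import the NICE GPG key and install the DCV server packages.
sudo rpm --import https://d1uj6qtbmh3dt5.cloudfront.net/NICE-GPG-KEY
wget https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-el7-x86_64.tgz
tar -xvzf nice-dcv-el7-x86_64.tgz
cd nice-dcv-*-el7-x86_64/
sudo yum install -y nice-dcv-server-*.rpm nice-xdcv-*.rpm nice-dcv-gl*.rpm
# Start DCV on boot, then reboot so the desktop and drivers come up cleanly.
sudo systemctl enable dcvserver
sudo reboot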
Once the EC2 is ready we will perform the following checks. First check, taken from Prerequisites for Linux NICE DCV servers
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') glxinfo | grep -i "opengl.*version"
Then we will perform a couple more commands from Post-Installation checks
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') xhost | grep "SI:localuser:dcv$"
This one is ok if it doesn’t return anything.
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') xhost | grep "LOCAL:$"
This one should return something.
The next check should return no errors, maybe just an info item.
sudo dcvgldiag
To check that the DCV server is running do
sudo systemctl status dcvserver
And to get the fingerprint of its self-signed certificate (we'll need it when we actually sign in)
dcv list-endpoints -j
Now, we need to give ec2-user an actual password
sudo passwd ec2-user
And create a session.
dcv create-session dcvdemo
At this point we are ready to use NICE DCV.
We ended up storing the "health check" commands in a shell script called /home/ec2-user/dcv-diagnostics.sh, and the commands to get the DCV server's certificate fingerprint, create a session, and set the ec2-user password in a second script.
That way we could spin up the EC2, wait some 3 minutes for everything in the user data to execute and for the machine to reboot, then run those two scripts and be up and running.
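For reference, the second script can be as simple as the sketch below (dcv-setup.sh is our placeholder name; the commands are exactly the ones we just walked through):

#!/usr/bin/env bash
# Hypothetical dcv-setup.sh: bootstrap a DCV session after each boot.
sudo passwd ec2-user       # interactive; this is the password you will log in with
dcv list-endpoints -j      # shows the endpoint and self-signed cert fingerprint
dcv create-session dcvdemo # the session name you will enter in the DCV client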
If you were to try running the OpenGL test program from earlier (it relies on freeglut),
scp ./opengl_test.cpp ec2-user@${EC2}:/home/ec2-user
The compilation instructions are in the header of the file, but for completeness: you compile with
g++ -c opengl_test.cpp -I/usr/include/GL
and link with
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
Running ./opengl_test inside a DCV session should now actually show you the triangle.
Configuring our EC2 to Learn OpenGL
Following the instructions on learnopengl.com/, we began by going over to github.com/Dav1dde/glad to obtain the OpenGL function loader we will use alongside GLFW.
We went to the author’s website and chose the following options:
- Language: C/C++
- API: gl Version 4.6
- Profile: Core
- Options: check the box for “Generate a loader”
- Click Generate
We stored the generated files under a local directory called learning-opengl and pushed them up to our EC2 as follows.
scp -r learning-opengl/ ec2-user@${EC2}:/home/ec2-user/src
The next step was to download, compile, and install github.com/glfw/glfw/releases. Through some trial and error, we found the process goes as follows:
mkdir glfw
cd glfw/
wget -O glfw.zip https://github.com/glfw/glfw/releases/download/3.4/glfw-3.4.zip
unzip glfw.zip
cd glfw-3.4/
To see the available generators (the possible arguments for cmake -S . -B build -G <generator>), you can run cmake --help.
In our case "Unix Makefiles" was the default generator, so we proceeded with
cmake -S . -B build
After some failures to compile, we eventually ended up installing these additional packages,
sudo yum install -y libX11-devel libXrandr-devel libXinerama-devel libXcursor-devel libXi-devel
sudo yum install -y wayland-devel wayland-protocols-devel libxkbcommon-devel
And finally, we could compile it,
cd build/
make
sudo make install
Compile the test program,
g++ -std=c++11 -I./glad/include -c test_glad.c
g++ -std=c++11 -I./glad/include -c glad.c
And link it,
g++ -o test_glad test_glad.o glad.o -lGL -lglfw3 -lX11 -lpthread -lXrandr -lXi -ldl
- -lGL: links against the OpenGL library.
- -lglfw3: links against the GLFW library.
- -lX11: links against the X11 library.
- -lpthread: links against the POSIX threads library.
- -lXrandr: links against the X11 RandR extension library.
- -lXi: links against the X11 XInput extension library.
- -ldl: links against libdl for dynamic loading (dlopen and friends).
Our glad code ended up as follows:
ls -R glad
include/ src/
glad/include:
KHR glad
glad/include/KHR:
khrplatform.h
glad/include/glad:
glad.h
glad/src:
glad.c
And we have been using a Makefile like this to compile our code for the book
CC := g++
# Compiler flags:
# -g adds debugging information to the executable file
# -Wall turns on most compiler warnings
# -Wextra https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wextra
# -Werror Make all warnings into errors.
GLAD_LIB := ./../glad
CFLAGS := -g -O2 -Wall -Wextra -Wshadow -Werror -std=c++17 -I$(GLAD_LIB)/include -I./../include/nothings-stb #-std=gnu++latest # std=c++20 -std=c++17 -std=c++14 -std=c++11
LFLAGS := -lGL -lglfw3 -lX11 -lpthread -lXrandr -lXi -ldl
TARGET = main.out
SOURCES = $(wildcard *.cpp)
GLAD_SOURCE = $(GLAD_LIB)/src/glad.c
GLAD_OBJ := $(notdir $(GLAD_SOURCE:.c=.o))
OBJECTS = $(SOURCES:.cpp=.o) $(GLAD_OBJ)
# Rule to link the program.
# $@ is the name of the target.
# $^ stands for all the items in the dependencies list.
$(TARGET): $(OBJECTS)
$(CC) -o $@ $^ $(LFLAGS)
# Rule to compile every .cpp to an .o
# The -c flag says to generate the object file,
# the -o $@ says to put the output of the compilation in the file named on the left side of the :,
# the $< is the first item in the dependencies list, and CFLAGS are the flags passed to the compiler.
%.o: %.cpp
$(CC) $(CFLAGS) -c $< -o $@
$(GLAD_OBJ): $(GLAD_SOURCE)
$(CC) $(CFLAGS) -c $< -o $(notdir $@)
clean:
rm -f *.o $(TARGET)
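With that Makefile in place, the day-to-day loop on the EC2 is just:

make          # compiles every .cpp plus glad.c and links main.out
./main.out    # run inside the DCV session so a window can open
make clean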
Day to Day Tips
To push code up to the EC2 you can use scp:
scp -r learning-opengl/ ec2-user@${EC2}:/home/ec2-user/src
or to pull it back down:
scp -r ec2-user@${EC2}:/home/ec2-user/src/learning-opengl/chapter-01-triangle/shaders_class learning-opengl/chapter-01-triangle/
But even easier is to do
rsync -rvzP learning-opengl ec2-user@${EC2}:/home/ec2-user/src
and
rsync -rvzP ec2-user@${EC2}:/home/ec2-user/src/learning-opengl/ learning-opengl
Jupyter Tips
This tip is extra, but we wanted to have it handy: if for some reason you end up needing some Jupyter in your life, there are ready-to-use images in quay.io/organization/jupyter. That repository is documented in jupyter-docker-stacks.
docker run -p 8888:8888 quay.io/jupyter/scipy-notebook:python-3.11
You can then view it locally by running
ssh -L 8888:127.0.0.1:8888 ubuntu@${EC2}
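The notebook server prints a tokenized login URL when it starts. If you launched the container detached, you can fish the URL out of the logs (the container ID below is a placeholder):

docker logs <container-id> 2>&1 | grep '127.0.0.1:8888'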
Downloading and Using Source Code in MacOS
There is this wonderful feature on Macs where any file downloaded from the internet gets a "quarantine" attribute to prevent it from being executed. For example, we ran into this when downloading the source code for developer.nvidia.com/cuda-example.
After unzipping it, we checked whether the quarantine attribute was attached to the files using the following command
ls -l@
And we removed it with the following command
xattr -r -d com.apple.quarantine cuda_by_example
Conclusion
The GPU lab we have created should allow you to work through the Learn OpenGL tutorials. You now have freeglut, OpenGL, and GLFW all installed on an EC2 via user data, along with NICE DCV. Plus, a Makefile to build the projects you write while reading through the tutorials.
In the next post we will talk about the Windows GPU instance you might have noticed is part of the gpu-sandbox Terraform workspace. That machine is there to help us learn DirectX/Direct3D by following the book "Introduction to 3D Game Programming with DirectX 12". That book was our other entry point into the GPU Gems books.