Setting Up an AWS Lab for Graphics Programming
21 Jun 2024

This post builds upon Setting Up an AWS Lab, in as much as we assume you have a working IAM role that you can use to execute AWS API calls. We will use this foundation to outline a Terraform workspace to spin up GPU and non-GPU instances for graphics programming.
Our main motivation will be to build everything we need to work through Introduction to 3D Game Programming with DirectX 12 and https://learnopengl.com/.
You can go ahead and continue reading, just be mindful that whenever we write things such as
./run-cmd-in-shell.sh aws sts get-caller-identity
the ./run-cmd-in-shell.sh is a wrapper script we described in that previous post that allows us to run commands with a valid set of temporary IAM role credentials.
Also keep in mind that our configurations are very 1Password-centric, since no one should routinely keep any sort of IAM user credentials in plain text or in a file on disk - but we also don't yet need to configure SSO access or anything of the sort.
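For reference, the wrapper is roughly shaped like the sketch below. The op:// item paths and the ROLE_ARN variable are placeholders for whatever your own setup from the previous post uses.

#!/usr/bin/env bash
# Hypothetical sketch of run-cmd-in-shell.sh: pull IAM user keys from
# 1Password, trade them for temporary role credentials, then run the command.
set -euo pipefail

# Placeholder 1Password paths -- adjust to your own vault layout.
export AWS_ACCESS_KEY_ID="$(op read 'op://private/aws/access-key-id')"
export AWS_SECRET_ACCESS_KEY="$(op read 'op://private/aws/secret-access-key')"

CREDS=$(aws sts assume-role \
    --role-arn "${ROLE_ARN}" \
    --role-session-name lab \
    --query 'Credentials' --output json)

AWS_ACCESS_KEY_ID=$(jq -r '.AccessKeyId' <<< "${CREDS}")
AWS_SECRET_ACCESS_KEY=$(jq -r '.SecretAccessKey' <<< "${CREDS}")
AWS_SESSION_TOKEN=$(jq -r '.SessionToken' <<< "${CREDS}")
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN

exec "$@"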
Table of Contents
- Table of Contents
- Requirements
- Our Starting Point
- Our Terraform code
- Knowing the limits of ssh -X
- Spinning Up a GPU
- Configuring our EC2 to Learn OpenGL
- Day to Day Tips
- Conclusion
Requirements
We recommend you use github.com/tfutils/tfenv to manage the Terraform versions you will need.
For this example we did
tfenv init
tfenv install 1.8.4
tfenv use 1.8.4
You’ll also need
brew install jq
Our Starting Point
We will begin our travels with the following ChatGPT-generated program
/*
g++ -c opengl_test.cpp -I/usr/include/GL
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
*/
#include <GL/glut.h>
void display() {
glClear(GL_COLOR_BUFFER_BIT);
glBegin(GL_TRIANGLES);
glColor3f(1.0f, 0.0f, 0.0f); // Red
glVertex2f(-0.5f, -0.5f);
glColor3f(0.0f, 1.0f, 0.0f); // Green
glVertex2f(0.5f, -0.5f);
glColor3f(0.0f, 0.0f, 1.0f); // Blue
glVertex2f(0.0f, 0.5f);
glEnd();
glFlush();
}
int main(int argc, char** argv) {
glutInit(&argc, argv);
glutCreateWindow("OpenGL Test");
glutDisplayFunc(display);
glutMainLoop();
return 0;
}
This program came about because we were working through an example in CUDA by Example: An Introduction to General-Purpose GPU Programming, the one about the Julia sets, and we wanted to learn more about graphics and animations.
So our first test was to get that program to run and see what it produced.
Our Terraform code
We will go right ahead and show all of the Terraform we have been using. We will go over it as we proceed, but we are opting to give you all of the TF from the beginning; otherwise it can get confusing what needs to be added, changed, or removed as we discover things.
A copy of the code we use can be found in github.com/seafoodfry/seafoodfry-code/gpu-sandbox.
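To give you a feel for how we drive the workspace before diving in: machine counts are exposed as Terraform variables (var.dev_machines, var.gpus, and var.windows_gpu_machines, all used later in this post), so a plan for a single non-GPU dev box looks roughly like this sketch:

./run-cmd-in-shell.sh terraform init
./run-cmd-in-shell.sh terraform plan \
    -var dev_machines=1 \
    -var gpus=0 \
    -var windows_gpu_machines=0 \
    -out a.plan
./run-cmd-in-shell.sh terraform apply a.plan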
Knowing the limits of ssh -X
In the next section we will run a non-GPU Linux EC2 instance. The AMI we ended up using came from trying out the EC2 launch wizard in the console and seeing what it offered: we saw ami-0ca2e925753ca2fb4 listed as one of the recommended AMIs.
Running the following command,
./run-cmd-in-shell.sh aws ec2 describe-images --image-ids ami-0ca2e925753ca2fb4
gave us this output:
{
"Images": [
{
"Architecture": "x86_64",
"CreationDate": "2024-05-24T03:27:51.000Z",
"ImageId": "ami-0ca2e925753ca2fb4",
"ImageLocation": "amazon/al2023-ami-2023.4.20240528.0-kernel-6.1-x86_64",
"OwnerId": "137112412989",
"PlatformDetails": "Linux/UNIX",
"Description": "Amazon Linux 2023 AMI 2023.4.20240528.0 x86_64 HVM kernel-6.1",
"ImageOwnerAlias": "amazon",
"Name": "al2023-ami-2023.4.20240528.0-kernel-6.1-x86_64",
"DeprecationTime": "2024-08-22T03:28:00.000Z"
...
}
]
}
So we looked for the newest version with
./run-cmd-in-shell.sh aws ec2 describe-images --owners amazon --filters "Name=platform-details,Values=Linux/UNIX" "Name=architecture,Values=x86_64" "Name=creation-date,Values=2024-05*" "Name=description,Values=*Amazon Linux*" --query 'Images[?!contains(Description, `ECS`) && !contains(Description, `EKS`) && !contains(Description, `gp2`)]' > out.json
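This is where the jq requirement earns its keep: since the --query above returns a top-level JSON array, you can skim the candidates in out.json with something like

jq -r '.[] | [.CreationDate, .ImageId, .Name] | @tsv' out.json | sort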
We ended up going with
{
"Architecture": "x86_64",
"CreationDate": "2024-05-30T00:51:59.000Z",
"ImageId": "ami-04064f2a9939d4f29",
"ImageLocation": "amazon/amzn2-ami-kernel-5.10-hvm-2.0.20240529.0-x86_64-ebs",
"ImageType": "machine",
"Public": true,
"OwnerId": "137112412989",
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"State": "available",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"SnapshotId": "snap-07f3b72092a551eb6",
"VolumeSize": 8,
"VolumeType": "standard",
"Encrypted": false
}
}
],
"Description": "Amazon Linux 2 Kernel 5.10 AMI 2.0.20240529.0 x86_64 HVM ebs",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "amzn2-ami-kernel-5.10-hvm-2.0.20240529.0-x86_64-ebs",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"DeprecationTime": "2025-07-01T00:00:00.000Z"
},
Note that we chose an Amazon Linux 2 AMI for compatibility with our GPU instances.
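As an aside: if you'd rather not hunt for AMI IDs by hand, AWS also publishes the latest Amazon Linux AMI IDs under public SSM parameters. The parameter name below is our best guess for the ebs variant we chose; adjust it to match the AMI name you want.

./run-cmd-in-shell.sh aws ssm get-parameter \
    --name /aws/service/ami-amazon-linux-latest/amzn2-ami-kernel-5.10-hvm-x86_64-ebs \
    --query 'Parameter.Value' --output text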
We are using the G4dn series because they are the cheapest GPU instances one can use, especially with the "spot instance discount".
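If you want to see what that discount looks like today, you can query the spot price history yourself (g4dn.xlarge below is just an assumed size; swap in whichever one you plan to run):

./run-cmd-in-shell.sh aws ec2 describe-spot-price-history \
    --instance-types g4dn.xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
    --output table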
In the next section we will also talk about why you should use Amazon Linux 2 and not Amazon Linux 2023.
Testing the limits of ssh -X
Our first attempt at getting into graphics was to simply run ssh with X11 forwarding enabled, via a command such as
ssh -X ec2-user@${EC2}
First set var.dev_machines to 1 and both var.gpus and var.windows_gpu_machines to 0, and then run
./run-cmd-in-shell.sh terraform init
Now set the my_ip variable as follows
export TF_VAR_my_ip=$(curl https://cloudflare.com/cdn-cgi/trace | grep ip | awk -F= '{print $2}')
(We could have also set it using -var my_ip="x.x.x.x".)
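A quick sanity check that the variable captured a bare IPv4 address:

echo "${TF_VAR_my_ip}"   # expect something like 203.0.113.7 (a placeholder, yours will differ)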
Then,
./run-cmd-in-shell.sh terraform plan -out a.plan
And apply the plan
./run-cmd-in-shell.sh terraform apply a.plan
You will get a plain old t3 EC2 running Amazon Linux 2.
Note: we initially tried doing the X11 forwarding on Amazon Linux 2023 (AL2023), but we discovered that we couldn't even install apps such as xeyes to test that our SSH forwarding was working.
After a lot of googling, ChatGPTing, and Claude-ing, we sort of gave up on this approach and found the AWS docs page Prerequisites for Linux NICE DCV servers, which, in the "Install a desktop environment and desktop manager" section under the "Amazon Linux 2" tab, mentions the following:
Currently, NICE DCV is not compatible with Amazon Linux 2023. AL2023 does not include a graphical desktop environment which is required for NICE DCV to run.
This gave us the hint that AL2 was the OS to use - and that we should try something like NICE DCV but more on that later.
Anyway, that's why we ended up using the AMI ami-04064f2a9939d4f29.
You can get more info on it by running
./run-cmd-in-shell.sh aws ec2 describe-images --image-ids ami-04064f2a9939d4f29
Anyway, back to our adventure.
If you look at the user data for the non-GPU Linux EC2 we just created, it installs freeglut, which is the library used by the "CUDA by Example" book and by the opengl_test.cpp file we included above.
Once the EC2 is up, go on and SSH into it
export EC2="<public DNS name>"
ssh -X ec2-user@${EC2}
At this point, if you type
xeyes
then you should see a cute lil pop-up. Otherwise, you need to go figure out how to get X11 forwarding through SSH configured.
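(If xeyes is missing altogether, on Amazon Linux 2 it ships in the xorg-x11-apps package, as far as we can tell:)

sudo yum install -y xorg-x11-apps   # provides xeyes and friends on AL2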
But assuming you saw the eyes, let's go and copy our test program onto the EC2 and try compiling it
scp ./opengl_test.cpp ec2-user@${EC2}:/home/ec2-user
The compilation instructions are in the header of the file, but for completeness: you compile with
g++ -c opengl_test.cpp -I/usr/include/GL
and link with
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
Both of these should work; otherwise, freeglut was not installed correctly.
Running the program should then give you this error:
$ ./opengl_test
freeglut (./opengl_test): ERROR: Internal error <FBConfig with necessary capabilities not found> in function fgOpenWindow
You can google and AI all you want, but this error essentially means that we need a "remote desktop". Spoiler: NICE DCV will provide us with the perfect environment to actually do graphics programming on Linux instances. (We will talk about Windows in a later post when we cover DirectX.)
Also, we will proceed with running NICE DCV on a GPU instance: after figuring out that NICE DCV might be our solution, we read Prerequisites for Linux NICE DCV servers and saw that not only is AL2023 unsuitable for graphics programming, but attempting graphics programming without a GPU also requires installing extras such as the XDummy driver, which allows the X server to run with a virtual framebuffer when no real GPU is present.
Spinning Up a GPU
At this point we will go all in on running NICE DCV on a GPU machine. The following section outlines how we looked for AMIs compatible with GPUs - you always want an AMI with the NVIDIA drivers preinstalled instead of installing them yourself!
Finding a GPU AMI
While checking out what AMIs were recommended through the launch wizard, we came across the AMI ID ami-0296a329aeec73707, published by Amazon with the title "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0 (Amazon Linux 2) 20240521".
We can query info about it as follows:
./run-cmd-in-shell.sh aws ec2 describe-images --owners amazon --image-ids ami-0296a329aeec73707
We kept searching for AMIs with the following query
./run-cmd-in-shell.sh aws ec2 describe-images --owners 898082745236 --filters "Name=platform-details,Values=Linux/UNIX" "Name=architecture,Values=x86_64" "Name=name,Values=*Amazon Linux 2*" "Name=creation-date,Values=2024-05*" "Name=description,Values=*G4dn*" > out.json
and found this candidate
{
"Architecture": "x86_64",
"CreationDate": "2024-05-22T09:42:47.000Z",
"ImageId": "ami-0c4b8684fc96c1de0",
"ImageLocation": "amazon/Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2) Version 78.2",
"ImageType": "machine",
"Public": true,
"OwnerId": "898082745236",
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"State": "available",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"Iops": 3000,
"SnapshotId": "snap-0af15a9e4c4b2e59c",
"VolumeSize": 105,
"VolumeType": "gp3",
"Throughput": 125,
"Encrypted": false
}
}
],
"Description": "Supported EC2 instances: G4dn, G5, G6, Gr6, P4d, P4de, P5. PyTorch-2.1, TensorFlow-2.16. Release notes: https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2) Version 78.2",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"DeprecationTime": "2026-05-22T09:42:47.000Z"
},
Running the GPU
Once we had our AMI for G4dn GPUs, we went ahead and created an instance.
Again, the commands are:
./run-cmd-in-shell.sh terraform init
Then get your current IPv4 address,
export TF_VAR_my_ip=$(curl https://cloudflare.com/cdn-cgi/trace | grep ip | awk -F= '{print $2}')
(We could have also set it using -var my_ip="x.x.x.x".)
Then,
./run-cmd-in-shell.sh terraform plan -out a.plan
Then apply the plan
./run-cmd-in-shell.sh terraform apply a.plan
And to clean up
./run-cmd-in-shell.sh terraform destroy
You can find the user data logs with
cat /var/log/cloud-init-output.log
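You can also watch the user data execute in real time on first boot, which is handy given how long the DCV setup and reboot take:

ssh ec2-user@${EC2} tail -f /var/log/cloud-init-output.log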
Setting up NICE DCV
There are two good sources of docs here
- What is DCV. You’ll really need to read the docs though!
- Deploy an EC2 instance with NICE DCV
Read them both, but follow the instructions in the first link! We did that, and that's how we came up with the user data that was passed to the GPU EC2.
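To give you an idea of its shape, here is a hypothetical, abridged sketch of the DCV portion of that user data, following the Amazon Linux 2 install docs. Exact package lists and tarball versions go stale, so treat the AWS docs as the source of truth.

# A graphical desktop is a prerequisite; see the prerequisites page for the full package list.
sudo yum install -y gdm gnome-session xorg-x11-server-Xorg glx-utils
# Import the NICE GPG key and install the DCV server packages.
sudo rpm --import https://d1uj6qtbmh3dt5.cloudfront.net/NICE-GPG-KEY
wget https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-el7-x86_64.tgz
tar -xvzf nice-dcv-el7-x86_64.tgz
cd nice-dcv-*-el7-x86_64/
sudo yum install -y nice-dcv-server-*.rpm nice-xdcv-*.rpm nice-dcv-gl*.rpm
# Start DCV on boot, then reboot so the desktop and drivers come up cleanly.
sudo systemctl enable dcvserver
sudo reboot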
Once the EC2 is ready we will perform the following checks. First check, taken from Prerequisites for Linux NICE DCV servers
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') glxinfo | grep -i "opengl.*version"
Then we will perform a couple more commands from Post-Installation checks
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') xhost | grep "SI:localuser:dcv$"
This one is ok if it doesn’t return anything.
sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') xhost | grep "LOCAL:$"
This one should return something.
The next check should return no errors, maybe just an info item.
sudo dcvgldiag
To check that the DCV server is running do
sudo systemctl status dcvserver
And to get the fingerprint of its self-signed certificate (we'll need it when we actually sign in)
dcv list-endpoints -j
Now, we need to give ec2-user an actual password
sudo passwd ec2-user
And create a session.
dcv create-session dcvdemo
At this point we are ready to use NICE DCV.
We ended up storing the "health check" commands in a shell script called /home/ec2-user/dcv-diagnostics.sh, and the commands to get the DCV server's certificate fingerprint, create a session, and set the ec2-user password in a second script.
That way we could spin up the EC2, wait some 3 minutes for everything in the user data to execute and for the machine to reboot, then run those two scripts and be up and running.
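For reference, the second script can be as simple as the sketch below (dcv-setup.sh is our placeholder name; the commands are exactly the ones we just walked through):

#!/usr/bin/env bash
# Hypothetical dcv-setup.sh: bootstrap a DCV session after each boot.
sudo passwd ec2-user       # interactive; this is the password you will log in with
dcv list-endpoints -j      # shows the endpoint and self-signed cert fingerprint
dcv create-session dcvdemo # the session name you will enter in the DCV client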
If you were to try running the OpenGL test program from earlier (it relies on freeglut),
scp ./opengl_test.cpp ec2-user@${EC2}:/home/ec2-user
The compilation instructions are in the header of the file, but for completeness: you compile with
g++ -c opengl_test.cpp -I/usr/include/GL
and link with
g++ -o opengl_test opengl_test.o -lGL -lGLU -lglut
Running ./opengl_test inside a DCV session should now actually show you the triangle.
Configuring our EC2 to Learn OpenGL
Following the instructions on learnopengl.com/, we began by going over to github.com/Dav1dde/glad to obtain the OpenGL function loader we will use alongside GLFW.
We went to the author’s website and chose the following options:
- Language: C/C++
- API: gl Version 4.6
- Profile: Core
- Options: check the box for “Generate a loader”
- Click Generate
We stored the generated files under a local directory called learning-opengl and pushed them up to our EC2 as follows.
scp -r learning-opengl/ ec2-user@${EC2}:/home/ec2-user/src
The next step was to download, compile, and install github.com/glfw/glfw/releases. Through some trial and error, we found the process goes as follows:
mkdir glfw
cd glfw/
wget -O glfw.zip https://github.com/glfw/glfw/releases/download/3.4/glfw-3.4.zip
unzip glfw.zip
cd glfw-3.4/
To see the available generators (the possible arguments for cmake -S . -B build -G <generator>), you can run cmake --help.
In our case "Unix Makefiles" was the default generator, so we proceeded with
cmake -S . -B build
After some failures to compile, we eventually ended up installing these additional packages,
sudo yum install -y libX11-devel libXrandr-devel libXinerama-devel libXcursor-devel libXi-devel
sudo yum install -y wayland-devel wayland-protocols-devel libxkbcommon-devel
And finally, we could compile it,
cd build/
make
sudo make install
Compile the test program,
g++ -std=c++11 -I./glad/include -c test_glad.c
g++ -std=c++11 -I./glad/include -c glad.c
And link it,
g++ -o test_glad test_glad.o glad.o -lGL -lglfw3 -lX11 -lpthread -lXrandr -lXi -ldl
- -lGL: links against the OpenGL library.
- -lglfw3: links against the GLFW library.
- -lX11: links against the X11 library.
- -lpthread: links against the POSIX threads library.
- -lXrandr: links against the X11 RandR extension library.
- -lXi: links against the X11 XInput extension library.
- -ldl: links against libdl for dynamic loading (dlopen and friends).
Our glad code ended up as follows:
ls -R glad
include/ src/
glad/include:
KHR glad
glad/include/KHR:
khrplatform.h
glad/include/glad:
glad.h
glad/src:
glad.c
And we have been using a Makefile like this to compile our code for the book
CC := g++
# Compiler flags:
# -g adds debugging information to the executable file
# -Wall turns on most compiler warnings
# -Wextra https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wextra
# -Werror Make all warnings into errors.
GLAD_LIB := ./../glad
CFLAGS := -g -O2 -Wall -Wextra -Wshadow -Werror -std=c++17 -I$(GLAD_LIB)/include -I./../include/nothings-stb #-std=gnu++latest # std=c++20 -std=c++17 -std=c++14 -std=c++11
LFLAGS := -lGL -lglfw3 -lX11 -lpthread -lXrandr -lXi -ldl
TARGET = main.out
SOURCES = $(wildcard *.cpp)
GLAD_SOURCE = $(GLAD_LIB)/src/glad.c
GLAD_OBJ := $(notdir $(GLAD_SOURCE:.c=.o))
OBJECTS = $(SOURCES:.cpp=.o) $(GLAD_OBJ)
# Rule to link the program.
# $@ is the name of the target.
# $^ stands for all the items in the dependencies list.
$(TARGET): $(OBJECTS)
$(CC) -o $@ $^ $(LFLAGS)
# Rule to compile every .cpp to an .o
# The -c flag says to generate the object file,
# the -o $@ says to put the output of the compilation in the file named on the left side of the :,
# the $< is the first item in the dependencies list, and CFLAGS are the flags passed to the compiler.
%.o: %.cpp
$(CC) $(CFLAGS) -c $< -o $@
$(GLAD_OBJ): $(GLAD_SOURCE)
$(CC) $(CFLAGS) -c $< -o $(notdir $@)
clean:
rm -f *.o $(TARGET)
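With that Makefile in place, the day-to-day loop on the EC2 is just:

make          # compiles every .cpp plus glad.c and links main.out
./main.out    # run inside the DCV session so a window can open
make clean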
Day to Day Tips
To push code up to the EC2 you can use scp:
scp -r learning-opengl/ ec2-user@${EC2}:/home/ec2-user/src
or to pull it back down:
scp -r ec2-user@${EC2}:/home/ec2-user/src/learning-opengl/chapter-01-triangle/shaders_class learning-opengl/chapter-01-triangle/
But even easier is to do
rsync -rvzP learning-opengl ec2-user@${EC2}:/home/ec2-user/src
and
rsync -rvzP ec2-user@${EC2}:/home/ec2-user/src/learning-opengl/ learning-opengl
Jupyter Tips
This tip is extra, but we wanted to have it handy: if for some reason you end up needing some Jupyter in your life, there are ready-to-use images in quay.io/organization/jupyter. That repository is documented in jupyter-docker-stacks.
docker run -p 8888:8888 quay.io/jupyter/scipy-notebook:python-3.11
You can then view it locally by running
ssh -L 8888:127.0.0.1:8888 ubuntu@${EC2}
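The notebook server prints a tokenized login URL when it starts. If you launched the container detached, you can fish the URL out of the logs (the container ID below is a placeholder):

docker logs <container-id> 2>&1 | grep '127.0.0.1:8888'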
Downloading and Using Source Code in MacOS
There is this wonderful feature on Macs where any file downloaded from the internet gets a "quarantine" attribute to prevent it from being executed. For example, we ran into this when downloading the source code for developer.nvidia.com/cuda-example.
After unzipping it, we checked whether the quarantine attribute was attached to the files using the following command
ls -l@
And we removed it with the following command
xattr -r -d com.apple.quarantine cuda_by_example
Conclusion
The GPU lab we have created should allow you to work through the Learn OpenGL tutorials. You now have freeglut, OpenGL, and GLFW all installed on an EC2 via user data, along with NICE DCV. Plus, a Makefile to build the projects you write while reading through the tutorials.
In the next post we will talk about the Windows GPU instance you might have noticed is part of the gpu-sandbox Terraform workspace. That machine is there to help us learn DirectX/Direct3D by following the book "Introduction to 3D Game Programming with DirectX 12". That book was our other entry point into the GPU Gems books.