I've been using PyTorch with a GPU for my side projects for four years. Throughout that time, I hesitated to upgrade my graphics card drivers, since an upgrade could break the development environment for those projects. To address this, I started using Dev Containers with Docker, which supports GPU access from inside containers.
In the beginning, I wrote a separate but similar Dockerfile for each project. I quickly realized that the resulting container images were exhausting my disk space. They all contained the same Python packages, such as NumPy, Pandas, and PyTorch, yet each image stored its own copy, which wasted a lot of storage.
Goal
To avoid saving multiple copies of the same packages, I decided to create a Dev Container template for VS Code. This template should:
- Share common Python packages across multiple Dev Container projects without duplicating them in each project.
- Enable each Dev Container project to customize its own dependencies.
I am going to document my thoughts on building the template in the following sections. Feel free to visit my GitHub repository for this template if you would like to use it directly.
Repository: twbrandon7/pytorch-dev-container (updated Jan 28, 2025)
Basic Ideas
Use Poetry to Manage Dependencies
Poetry is a tool that simplifies project dependency management and packaging. It provides a poetry.lock file, which records the exact versions of all dependencies, including sub-dependencies. Similar to using npm install or yarn add to add dependencies to a Node.js project, we can simply use the poetry add command to install our desired Python packages, and Poetry will handle the tracking of package versions.
I use Poetry in this template to manage both shared and project-specific dependencies.
The Key to Saving Disk Space
Container images are actually a set of stacked layers. Each layer is immutable and records the changes to the filesystem. For example, a FROM debian instruction in a Dockerfile brings in the layers of the Debian base image, and each RUN instruction adds another layer on top. If multiple images share the same layers, there will only be a single copy of each layer on the disk. This is key to avoiding multiple copies of the same dependencies and saving disk space.
Layer stack, top to bottom: PyTorch and cuDNN, Python and pip, Debian base.
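The saving from shared layers can be estimated with a rough model. The sketch below is only an illustration: the layer names, the two project images, and all sizes are hypothetical, and it assumes both images are built on top of the same base image so that the lower layers really are identical.

```python
# Hypothetical layer sizes in MB; real sizes depend on the images you build.
LAYER_SIZES_MB = {
    "debian-base": 120,
    "python-and-pip": 60,
    "pytorch-and-cudnn": 5000,
    "project-a-deps": 40,
    "project-b-deps": 25,
}

# Layers assumed to be common to every project image.
SHARED = ["debian-base", "python-and-pip", "pytorch-and-cudnn"]

IMAGES = {
    "project-a": SHARED + ["project-a-deps"],
    "project-b": SHARED + ["project-b-deps"],
}


def size_without_sharing(images: dict[str, list[str]]) -> int:
    """Disk usage if every image stored its own copy of every layer."""
    return sum(LAYER_SIZES_MB[layer] for layers in images.values() for layer in layers)


def size_with_sharing(images: dict[str, list[str]]) -> int:
    """Disk usage when each distinct layer is stored only once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(LAYER_SIZES_MB[layer] for layer in unique_layers)


if __name__ == "__main__":
    print("separate copies:", size_without_sharing(IMAGES), "MB")
    print("shared layers:  ", size_with_sharing(IMAGES), "MB")
```

With these made-up numbers, two projects cost 10425 MB as fully separate images and 5245 MB once the three common layers are stored a single time. Each additional project then adds only its own small layer.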
Use docker-compose to Dynamically Create Named Volumes for Dev Containers
Adding Persistent Storage
- I use Docker Compose to define the Dev Container instead of specifying the image directly in devcontainer.json. If we define a named volume in docker-compose.yml, Docker creates an independent named volume for each new project we start with the Dev Container, even if the volume names are the same, because Compose prefixes each volume name with the project name.
- Explain how to configure volumes in the docker-compose.yml file.
- Showcase examples of mapping host directories or using named volumes.
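The per-project volume behavior can be sketched as follows. This is a small model of how Docker Compose scopes named volumes, not the template's actual configuration: the service name dev, the volume key venv, the mount paths, and the project names are all hypothetical. The dictionary mirrors what a docker-compose.yml with one bind mount and one named volume would look like once parsed.

```python
# Hypothetical parsed docker-compose.yml: one bind mount of the host
# project directory and one named volume for the virtual environment.
COMPOSE = {
    "services": {
        "dev": {
            "build": {"context": ".."},
            "volumes": [
                "..:/workspace",          # host directory mapped into the container
                "venv:/workspace/.venv",  # named volume declared below
            ],
        }
    },
    "volumes": {"venv": None},  # empty entry: Compose uses its default naming
}


def resolved_volume_names(compose: dict, project_name: str) -> dict[str, str]:
    """Return the volume names Docker would actually see for one project."""
    names = {}
    for key, options in (compose.get("volumes") or {}).items():
        custom = (options or {}).get("name")
        # Without an explicit name, Compose prefixes the key with the project name.
        names[key] = custom if custom else f"{project_name}_{key}"
    return names


if __name__ == "__main__":
    for project in ("project-a", "project-b"):
        print(project, resolved_volume_names(COMPOSE, project))
```

The same file therefore yields project-a_venv for one project and project-b_venv for another, so the two never collide. Compose derives the project name from the directory by default, or from settings such as the -p flag or the COMPOSE_PROJECT_NAME variable; setting an explicit name on the volume would opt out of the prefix and share one volume across projects.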
Advantages of Using This Template
- Consistency Across Projects: Ensures all team members work in identical environments.
- Scalability: Easily adapts to new projects with minor customizations.
- Resource Efficiency: Minimizes duplicate installations and disk space usage.
Conclusion
- Reinforce the benefits of the template for PyTorch developers.
- Invite readers to try the project and share feedback.
- Provide links to the GitHub repository and any relevant documentation.