Uploads development documentation
GitLab Workhorse has special rules for handling uploads. To avoid occupying a Ruby process with I/O operations, we process the upload in Workhorse, where it is cheaper. Workhorse can also upload directly to object storage.
The problem description
The following graph explains machine boundaries in a scalable GitLab installation. Without any Workhorse optimization in place, we can expect incoming requests to follow the numbers on the arrows.
graph TB
subgraph "load balancers"
LB(Proxy)
end
subgraph "Shared storage"
nfs(NFS)
end
subgraph "redis cluster"
r(persisted redis)
end
LB-- 1 -->workhorse
subgraph "web or API fleet"
workhorse-- 2 -->rails
end
rails-- "3 (write files)" -->nfs
rails-- "4 (schedule a job)" -->r
subgraph sidekiq
s(sidekiq)
end
s-- "5 (fetch a job)" -->r
s-- "6 (read files)" -->nfs
We have three challenges here: performance, availability, and scalability.
Performance
Rails processes are expensive in terms of both CPU and memory. The Ruby global interpreter lock adds to the cost too, because the Ruby process spends time on I/O operations during step 3, causing incoming requests to pile up.
To improve this, disk buffered upload was implemented. With it, Rails no longer deals with writing uploaded files to disk.
graph TB
subgraph "load balancers"
LB(HA Proxy)
end
subgraph "Shared storage"
nfs(NFS)
end
subgraph "redis cluster"
r(persisted redis)
end
LB-- 1 -->workhorse
subgraph "web or API fleet"
workhorse-- "3 (without files)" -->rails
end
workhorse -- "2 (write files)" -->nfs
rails-- "4 (schedule a job)" -->r
subgraph sidekiq
s(sidekiq)
end
s-- "5 (fetch a job)" -->r
s-- "6 (read files)" -->nfs
Availability
There's also an availability problem in this setup: NFS is a single point of failure.
To address this problem, highly available (HA) object storage can be used, which is supported by direct upload.
Scalability
Scaling NFS is outside of our support scope, and NFS is not a part of cloud native installations.
All features that require Sidekiq and do not use direct upload won't work without NFS. In Kubernetes, machine boundaries translate to pods, and in this case the uploaded file is written to the pod's private disk. Since the Sidekiq pod cannot reach into other pods, it fails to read the file.
How to select the proper level of acceleration?
Selecting the proper acceleration is a tradeoff between speed of development and operational costs.
We can identify three major use-cases for an upload:
- storage: if we are uploading to store a file (for example, artifacts, packages, or discussion attachments). In this case direct upload is the proper level, as it's the least resource-intensive operation. Additional information can be found in File Storage in GitLab.
- in-controller/synchronous processing: if we allow processing small files synchronously, using disk buffered upload may speed up development.
- Sidekiq/asynchronous processing: asynchronous processing must implement direct upload, because it's the only way to support Cloud Native deployments without a shared NFS.
For more details about currently broken features, see epic &1802.
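The three use cases above can be summarized as a simple lookup. This is a minimal sketch for illustration only: the UPLOAD_TECHNIQUES table and the recommended_upload_technique method are hypothetical and do not exist in the GitLab codebase.

```ruby
# Hypothetical helper (not part of GitLab): maps an upload use case to the
# acceleration level recommended above.
UPLOAD_TECHNIQUES = {
  storage: :direct_upload,            # least resource-intensive option
  synchronous: :disk_buffered_upload, # may speed up development for small files
  asynchronous: :direct_upload        # only option that works without shared NFS
}.freeze

def recommended_upload_technique(use_case)
  UPLOAD_TECHNIQUES.fetch(use_case) do
    raise ArgumentError, "unknown upload use case: #{use_case.inspect}"
  end
end

puts recommended_upload_technique(:asynchronous) # prints direct_upload
```

For synchronous processing the table records the faster-to-develop option; direct upload remains valid there too.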
Handling repository uploads
Some features involve Git repository uploads without using a regular Git client. Examples include uploading a repository file from the web interface and design management.
These uploads require the Rails controller to act as a Git client on behalf of the user. These operations fall into the in-controller/synchronous processing category, but we have no guarantees on the file size.
In the case of an LFS upload, the file pointer is committed synchronously, but the file upload to object storage is performed asynchronously with Sidekiq.
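For context, the committed pointer is a small text file that stands in for the real content. The sketch below builds one following the public Git LFS pointer format (version, oid, size lines). The lfs_pointer method name is hypothetical; this is not GitLab's implementation.

```ruby
require "digest"

# Builds a Git LFS pointer for the given content. Only this pointer is
# committed to the repository; the content itself is stored separately.
def lfs_pointer(content)
  oid = Digest::SHA256.hexdigest(content)
  <<~POINTER
    version https://git-lfs.github.com/spec/v1
    oid sha256:#{oid}
    size #{content.bytesize}
  POINTER
end

puts lfs_pointer("hello\n")
```

Because the pointer is tiny and independent of the file size, committing it synchronously is cheap, while the large object can be handled later.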
Upload encodings
By upload encoding we mean how the file is included within the incoming request.
We have three kinds of file encoding in our uploads:
- multipart: multipart/form-data is the most common; a file is encoded as a part of a multipart-encoded request.
- body: some APIs upload files as the whole request body.
- JSON: some JSON APIs upload files as base64-encoded strings. This will require a change to GitLab Workhorse, which is planned.
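To make the three encodings concrete, the sketch below builds a request body for each one using only the Ruby standard library. The boundary string, the field name file, and the JSON keys file_name and content are arbitrary examples, not what any particular GitLab API expects.

```ruby
require "json"

file_name = "report.txt"
file_data = "hello world\n"

# multipart: the file is one part of a multipart/form-data body.
boundary = "ExampleBoundary123" # arbitrary; real clients generate a random one
multipart_body = [
  "--#{boundary}",
  %(Content-Disposition: form-data; name="file"; filename="#{file_name}"),
  "Content-Type: text/plain",
  "",
  file_data,
  "--#{boundary}--",
  ""
].join("\r\n")
multipart_content_type = "multipart/form-data; boundary=#{boundary}"

# body: the whole request body is the file content.
raw_body = file_data

# JSON: the file is embedded as a base64 string ("m0" = strict base64).
json_body = JSON.generate({ file_name: file_name, content: [file_data].pack("m0") })

puts multipart_content_type
puts json_body
```

Workhorse can intercept the first two forms because the file bytes appear directly in the request; in the JSON form they are hidden inside a string, which is why that case needs a Workhorse change.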
Uploading technologies
By uploading technologies we mean how all the involved services interact with each other.
GitLab supports three kinds of uploading technologies. A brief description with a sequence diagram follows for each one. The diagrams are not meant to be exhaustive.
Rack Multipart upload
This is the default kind of upload, and it's the most expensive in terms of resources.
In this case, Workhorse is unaware of files being uploaded and acts as a regular proxy.
When a multipart request reaches the Rails application, Rack::Multipart leaves behind temporary files in /tmp and uses valuable Ruby process time to copy files around.
sequenceDiagram
participant c as Client
participant w as Workhorse
participant r as Rails
activate c
c ->>+w: POST /some/url/upload
w->>+r: POST /some/url/upload
r->>r: save the incoming file on /tmp
r->>r: read the file for processing
r-->>-c: request result
deactivate c
deactivate w
Disk buffered upload
This kind of upload avoids the resources wasted when Rails handles upload writes to /tmp.
This optimization is not active by default on REST API requests.
When enabled, Workhorse looks for files in multipart MIME requests, uploading any it finds to a temporary file on shared storage. The MIME data in the request is replaced with the path to the corresponding file before it is forwarded to Rails.
To prevent abuse of this feature, Workhorse signs the modified request with a special header, stating which entries it modified. Rails will ignore any unsigned path entries.
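The signing step can be sketched with an HMAC over the list of rewritten fields. This is a minimal sketch under stated assumptions: the shared secret, the field names, and the sign_fields and trusted_fields helpers are hypothetical, and HMAC-SHA256 stands in for whatever signature format Workhorse and Rails actually use.

```ruby
require "openssl"
require "json"

# Hypothetical shared secret; in practice only Workhorse and Rails know it.
SHARED_SECRET = "example-secret"

# Workhorse side, modelled in Ruby: sign the names of the multipart fields
# whose file content was replaced with a path on shared storage.
def sign_fields(rewritten_fields)
  payload = JSON.generate(rewritten_fields.sort)
  signature = OpenSSL::HMAC.hexdigest("SHA256", SHARED_SECRET, payload)
  { payload: payload, signature: signature }
end

# Rails side, modelled: keep only the path entries whose field names are
# listed in a payload with a valid signature; unsigned entries are ignored.
def trusted_fields(path_entries, header)
  expected = OpenSSL::HMAC.hexdigest("SHA256", SHARED_SECRET, header[:payload])
  # OpenSSL.secure_compare (openssl gem 2.2+) compares in constant time
  return {} unless OpenSSL.secure_compare(expected, header[:signature])

  signed = JSON.parse(header[:payload])
  path_entries.select { |name, _path| signed.include?(name) }
end

header = sign_fields(["file"])
path_entries = { "file" => "/shared/tmp/upload-123", "avatar" => "/etc/passwd" }
p trusted_fields(path_entries, header) # keeps only the "file" entry
```

A client that injects its own path (the "avatar" entry above) gains nothing, because it cannot produce a valid signature for that field name.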
sequenceDiagram
participant c as Client
participant w as Workhorse
participant r as Rails
participant s as NFS
activate c
c ->>+w: POST /some/url/upload
w->>+s: save the incoming file on a temporary location
s-->>-w: request result
w->>+r: POST /some/url/upload
Note over w,r: file was replaced with its location<br>and other metadata
opt requires async processing
r->>+redis: schedule a job
redis-->>-r: job is scheduled
end
r-->>-c: request result
deactivate c
w->>-w: cleanup
opt requires async processing
activate sidekiq
sidekiq->>+redis: fetch a job
redis-->>-sidekiq: job
sidekiq->>+s: read file
s-->>-sidekiq: file
sidekiq->>sidekiq: process file
deactivate sidekiq
end
Direct upload
This is the most advanced acceleration technique we have in place.
Workhorse asks Rails for temporary pre-signed object storage URLs and uploads directly to object storage.
In this setup, an extra Rails route must be implemented in order to handle authorization. Examples of this can be found in:
This falls back to disk buffered upload when direct_upload is disabled inside the object storage settings. In that case, the answer to the /authorize call contains only a file system path.
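The fallback can be sketched as a method that builds the /authorize answer. The keys (temp_path, remote_object, store_url), the presign helper, and the URL are hypothetical placeholders, not the real response format exchanged by Workhorse and Rails; the sketch only shows that a presigned URL is returned when direct upload is enabled, and a file system path alone otherwise.

```ruby
require "securerandom"

# Hypothetical stand-in for an object storage client that presigns URLs.
def presign(object_key)
  "https://objects.example.com/tmp/#{object_key}?signature=placeholder"
end

# Builds a hypothetical /authorize answer. Key names are illustrative only.
def authorize_response(direct_upload_enabled:, temp_dir: "/var/tmp/uploads")
  # direct_upload disabled: fall back to disk buffered upload, path only
  return { temp_path: temp_dir } unless direct_upload_enabled

  # Rails chooses the temporary object key; after Workhorse reports the
  # upload, Rails moves the object to its final destination.
  object_key = SecureRandom.hex(16)
  { remote_object: { id: object_key, store_url: presign(object_key) } }
end

p authorize_response(direct_upload_enabled: false)
```

Letting Rails pick the temporary key keeps control of object storage layout on the Rails side, while Workhorse only needs to perform the PUT.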
sequenceDiagram
participant c as Client
participant w as Workhorse
participant r as Rails
participant os as Object Storage
activate c
c ->>+w: POST /some/url/upload
w ->>+r: POST /some/url/upload/authorize
Note over w,r: this request has an empty body
r-->>-w: presigned OS URL
w->>+os: PUT file
Note over w,os: file is stored on a temporary location. Rails selects the destination
os-->>-w: request result
w->>+r: POST /some/url/upload
Note over w,r: file was replaced with its location<br>and other metadata
r->>+os: move object to final destination
os-->>-r: request result
opt requires async processing
r->>+redis: schedule a job
redis-->>-r: job is scheduled
end
r-->>-c: request result
deactivate c
w->>-w: cleanup
opt requires async processing
activate sidekiq
sidekiq->>+redis: fetch a job
redis-->>-sidekiq: job
sidekiq->>+os: get object
os-->>-sidekiq: file
sidekiq->>sidekiq: process file
deactivate sidekiq
end
How to add a new upload route
In this section, we'll describe how to add a new upload route accelerated by Workhorse for body and multipart encoded uploads.
Upload routes belong to one of these categories:
- Rails controllers: uploads handled by Rails controllers.
- Grape API: uploads handled by a Grape API endpoint.
- GraphQL API: uploads handled by a GraphQL resolve function.
CAUTION: Warning: GraphQL uploads do not support direct upload yet. Depending on the use case, the feature may not work on installations without NFS (such as GitLab.com or Kubernetes installations). Uploading to object storage inside the GraphQL resolve function may result in timeout errors. For more details, follow issue #280819.
Update Workhorse for the new route
For both Rails controller and Grape API uploads, Workhorse has to be updated to support the new upload route.
- Open a new issue in the Workhorse tracker describing precisely the new upload route:
  - The route's URL.
  - The upload encoding.
  - If possible, provide a dump of the upload request.
- Implement the change for this issue and get the MR merged.
- Ask the Workhorse maintainers to create a new release. You can do that in the MR directly during the maintainer review, or ask for it in the #workhorse Slack channel.
- Bump the Workhorse version file to the version you have from the previous points, or bump it in the same merge request that contains the Rails changes (see Implementing the new route with a Rails controller or Implementing the new route with a Grape API endpoint below).
Implementing the new route with a Rails controller
For a Rails controller upload, we usually have a multipart upload and there are a few things to do:
- The upload is available under the parameter name you're using. For example, it could be an artifact or a nested parameter such as user[avatar]. Let's say that we have the upload under the file parameter; reading params[:file] should get you an UploadedFile instance.
- Generally speaking, it's a good idea to check if the instance is from the UploadedFile class. For example, see how we checked that the parameter is indeed an UploadedFile.
CAUTION: Caution:
Do not call UploadedFile#from_params directly! Do not build an UploadedFile instance using UploadedFile#from_params! This method can be unsafe to use depending on the params passed. Instead, use the UploadedFile instance that multipart.rb builds automatically for you.
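The type check recommended above can be sketched without Rails. In this minimal sketch, UploadedFile is a hypothetical stand-in for GitLab's class (which multipart.rb builds for you), and handle_upload is an illustrative controller-like method; only the is_a? check reflects the recommendation in the text.

```ruby
# Hypothetical stand-in for GitLab's UploadedFile. In real code, never build
# it from params yourself; use the instance multipart.rb provides.
class UploadedFile
  attr_reader :original_filename, :path

  def initialize(original_filename, path)
    @original_filename = original_filename
    @path = path
  end
end

# Controller-like method: accept the upload only if the parameter is an
# UploadedFile instance, not a plain string that a client could forge.
def handle_upload(params)
  file = params[:file]
  return [400, "file must be an UploadedFile"] unless file.is_a?(UploadedFile)

  [201, "stored #{file.original_filename}"]
end

p handle_upload({ file: "/etc/passwd" }) # rejected with status 400
```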
Implementing the new route with a Grape API endpoint
For a Grape API upload, we can have a body or a multipart upload. Things are slightly more complicated: two endpoints are needed, one for the Workhorse pre-upload authorization and one for accepting the upload metadata from Workhorse:
- Implement an endpoint with the URL + /authorize suffix that will:
  - Check that the request is coming from Workhorse with require_gitlab_workhorse! from the API helpers.
  - Check user permissions.
  - Set the status to 200 with status 200.
  - Set the content type with content_type Gitlab::Workhorse::INTERNAL_API_CONTENT_TYPE.
  - Use your dedicated Uploader class (let's say that it's FileUploader) to build the response with FileUploader.workhorse_authorize(params).
- Implement the endpoint for the upload request that will:
  - Require all the UploadedFile objects as parameters.
    - For example, if we expect a single parameter file to be an UploadedFile instance, use requires :file, type: ::API::Validations::Types::WorkhorseFile.
    - Body upload requests have their upload available under the parameter file.
  - Check that the request is coming from Workhorse with require_gitlab_workhorse! from the API helpers.
  - Check the user permissions.
  - Run the remaining processing code. This is where the code reads the parameter (for our example, params[:file]).
CAUTION: Caution:
Do not call UploadedFile#from_params directly! Do not build an UploadedFile object using UploadedFile#from_params! This method can be unsafe to use depending on the params passed. Instead, use the UploadedFile object that multipart.rb builds automatically for you.
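The two-endpoint flow can be modelled in plain Ruby to show the order of the checks. This sketch does not use Grape: the Request struct, the workhorse_request? check, the header name, the content type value, the permission flag, and the response hashes are hypothetical simplifications of require_gitlab_workhorse!, the permission checks, and FileUploader.workhorse_authorize(params) described above.

```ruby
require "json"

# Hypothetical, simplified request model (not Grape's).
Request = Struct.new(:headers, :params, :user_can_upload, keyword_init: true)

INTERNAL_API_CONTENT_TYPE = "application/vnd.example+json" # hypothetical value
WORKHORSE_HEADER = "X-Example-Workhorse"                    # hypothetical header

# Simplified stand-in for require_gitlab_workhorse!
def workhorse_request?(request)
  request.headers.key?(WORKHORSE_HEADER)
end

# Step 1: POST <route>/authorize, called by Workhorse before storing the file.
def authorize_endpoint(request)
  return { status: 403 } unless workhorse_request?(request)
  return { status: 403 } unless request.user_can_upload

  # Stand-in for FileUploader.workhorse_authorize(params)
  body = JSON.generate({ temp_path: "/var/tmp/uploads" })
  { status: 200, content_type: INTERNAL_API_CONTENT_TYPE, body: body }
end

# Step 2: POST <route>, called by Workhorse with the file replaced by metadata.
def upload_endpoint(request)
  return { status: 403 } unless workhorse_request?(request)
  return { status: 403 } unless request.user_can_upload

  file = request.params[:file] # body uploads are also exposed as :file
  return { status: 400 } if file.nil?

  { status: 201, body: "processed #{file[:path]}" }
end
```

Both endpoints repeat the Workhorse and permission checks because each one is a separate HTTP request that could be sent directly by a client.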