2d sprites, part 2 : VBOs
Last time we built a simple particle system using immediate mode. Now we will replace the immediate mode calls with streaming Vertex Buffer Objects (VBOs), to avoid the overhead of calling gl:tex-coord and gl:vertex 4 times each for every particle.
VBOs let you specify a set of parameters for a group of vertices in advance, store it on the graphics card, then draw a bunch of geometry at once with a single function call. This is particularly important for 3d engines, where you could be saving thousands of calls per object. For our current purposes, we don't get quite as much benefit, since we need to update the positions of every particle every frame. We avoid the extra function calls though, so it should still be an improvement.
(We will be using cffi and some lower level parts of cl-opengl for this, since the current vertex array/vbo support in cl-opengl isn't a very good fit for what we want here.)
We'll start from the code from part 1, and make a variant of click-particles that uses VBOs instead of immediate mode for drawing.
(defclass vbo-click-particles (click-particles)
  ((vbo :initform (car (gl:gen-buffers 1)) :accessor vbo)))
gl:gen-buffers works like gl:gen-textures, and creates a name for a new VBO (or one of a few other types of buffer objects we won't worry about for now). We store a VBO for each instance, which might not be the best strategy, but is good enough for this case where we generally have a reasonable number of sprites per system. We wouldn't want to use just one VBO for the entire app, since then the GPU wouldn't be able to start drawing until we had filled the entire thing. At the other extreme, if the buffers were too small, managing them would start to get inefficient.
(defun vbo-rectangle (pointer offset x y width height
                      &optional (u1 0) (v1 0) (u2 1) (v2 1))
  (let* ((w/2 (/ width 2.0))
         (h/2 (/ height 2.0))
         (x1 (- x w/2))
         (x2 (+ x w/2))
         (y1 (- y h/2))
         (y2 (+ y h/2)))
    (macrolet ((store-values (&rest v)
                 `(progn
                    ,@(loop for i in v
                            collect `(setf (cffi:mem-aref pointer :float
                                                          (1- (incf offset)))
                                           (float ,i 0.0))))))
      (store-values u1 v2)
      (store-values x1 y1)
      (store-values u2 v2)
      (store-values x2 y1)
      (store-values u2 v1)
      (store-values x2 y2)
      (store-values u1 v1)
      (store-values x1 y2))))
This is pretty much the same as what we had before, except it stores the values at increasing offsets in a block of foreign memory instead of calling directly into GL. The layout is determined by how we set things up for the draw call. In this case we will set it up to match the order of the calls from the old version: for each vertex, 2 texture coordinates followed by 2 position coordinates.
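To make the layout concrete, here is a minimal sketch that writes the same 16 values into an ordinary Lisp vector instead of foreign memory. The function name rectangle-floats is hypothetical and not part of the tutorial code; it only mirrors the order used by vbo-rectangle.

```lisp
;; Hypothetical pure-Lisp stand-in for vbo-rectangle: same value order,
;; but it returns a Lisp vector instead of writing through a CFFI pointer.
(defun rectangle-floats (x y width height
                         &optional (u1 0) (v1 0) (u2 1) (v2 1))
  (let* ((w/2 (/ width 2.0))
         (h/2 (/ height 2.0))
         (x1 (- x w/2))
         (x2 (+ x w/2))
         (y1 (- y h/2))
         (y2 (+ y h/2)))
    (map 'vector
         (lambda (v) (float v 0.0))
         (list u1 v2 x1 y1      ; vertex 0: tex coord, then position
               u2 v2 x2 y1      ; vertex 1
               u2 v1 x2 y2      ; vertex 2
               u1 v1 x1 y2))))  ; vertex 3

(rectangle-floats 100 50 32 32)
;; => #(0.0 1.0 84.0 34.0 1.0 1.0 116.0 34.0
;;      1.0 0.0 116.0 66.0 0.0 0.0 84.0 66.0)
```

Each group of 4 floats is one vertex, so a sprite takes 16 floats, which is why the offset in the draw loop advances by 16 per particle.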
We could (and probably will in a later part) make this faster, though. Since the values for the texture coordinates never change, we could put them in a separate VBO and initialize that once, leaving us half as much data to send to the GPU every frame.
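As a rough sketch of that idea (not the code a later part will necessarily use), the constant texture coordinates for a number of sprites could be generated once like this. The name make-static-tex-coords is hypothetical, and the result is a plain Lisp vector; uploading it to a second VBO with :static-draw is left out.

```lisp
;; Hypothetical helper: build the texture coordinates for SPRITE-COUNT quads
;; once, in the same corner order vbo-rectangle uses (8 floats per sprite).
(defun make-static-tex-coords (sprite-count
                               &optional (u1 0) (v1 0) (u2 1) (v2 1))
  (let ((coords (make-array (* sprite-count 8)
                            :element-type 'single-float
                            :initial-element 0.0))
        (corner-uvs (mapcar (lambda (v) (float v 0.0))
                            (list u1 v2 u2 v2 u2 v1 u1 v1))))
    (dotimes (i sprite-count coords)
      ;; copy the 8 corner values into this sprite's slot
      (replace coords corner-uvs :start1 (* i 8)))))

(make-static-tex-coords 2)
;; => #(0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0)
```

With the texture coordinates split out, the per-frame buffer would only need 8 floats per sprite (4 vertices of x and y) instead of 16.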
(defmethod draw-object ((o vbo-click-particles))
  (let ((components-per-vertex 4)
        (size-of-float 4)
        (vertices-per-sprite 4))
    ;; activate the VBO
    (gl:bind-buffer :array-buffer (vbo o))
    ;; allocate space for the data
    (%gl:buffer-data :array-buffer
                     (* (/ (length (positions o)) 2)
                        vertices-per-sprite
                        components-per-vertex
                        size-of-float)
                     (cffi:null-pointer) ;; just allocating space
                     :stream-draw)
    ;; get a pointer to the VBO memory
    (gl:with-mapped-buffer (p :array-buffer :write-only)
      ;; copy the vertex data to the buffer
      (loop with positions = (positions o)
            for i below (length positions) by 2
            for offset from 0 by (* components-per-vertex vertices-per-sprite)
            for x = (aref positions i)
            for y = (aref positions (1+ i))
            do (vbo-rectangle p offset x y 32 32)))
    ;; draw the object
    (%gl:tex-coord-pointer 2 :float
                           (* components-per-vertex size-of-float)
                           (cffi:null-pointer))
    (%gl:vertex-pointer 2 :float
                        (* components-per-vertex size-of-float)
                        (cffi:make-pointer (* 2 size-of-float)))
    (gl:enable-client-state :texture-coord-array)
    (gl:enable-client-state :vertex-array)
    (%gl:draw-arrays :quads 0 (* vertices-per-sprite
                                 (/ (length (positions o)) 2)))
    (gl:disable-client-state :vertex-array)
    (gl:disable-client-state :texture-coord-array))
  ;; finally disable the buffer
  (gl:bind-buffer :array-buffer 0))
Lots of new things here.
The %gl: package is the part of cl-opengl with the low level CFFI bindings to the OpenGL C library. It requires managing pointers and types by hand more than the functions exposed in the gl: package, but is useful for situations where you need more control than the functions in gl: provide, or for when there just isn't a nice version implemented in gl: yet.
We start by calling gl:bind-buffer, which sets the current object to be modified by later calls until we deactivate it by binding 0. :array-buffer specifies that we are binding a VBO to be used for storing vertex data (as opposed to indices or one of various non-VBO buffers).
Next we call %gl:buffer-data, to tell GL to allocate some space for our vertices. If we already had the vertex data in the correct format, we could have passed it directly to %gl:buffer-data instead of passing (cffi:null-pointer). Calling it with (cffi:null-pointer) every frame allocates fresh storage each time, which lets GL keep the previous contents around if it hasn't finished drawing them yet, while we start filling the new buffer. The :stream-draw parameter tells GL that we intend to fill up the buffer, draw from it once, then throw it away, so it can decide where to allocate the buffer to handle that usage pattern efficiently. Another common option is :static-draw, for data that will be specified once and used for drawing repeatedly.
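The size passed to %gl:buffer-data is in bytes. As a quick check of the arithmetic used in draw-object, here is a small sketch; the function name stream-buffer-bytes is hypothetical, and it assumes the positions array holds an x and a y per particle, as in the code above.

```lisp
;; Hypothetical helper reproducing the size calculation from draw-object.
;; POSITIONS is assumed to hold 2 entries (x and y) per particle.
(defun stream-buffer-bytes (positions &key (vertices-per-sprite 4)
                                           (components-per-vertex 4)
                                           (size-of-float 4))
  (* (/ (length positions) 2)   ; number of particles
     vertices-per-sprite        ; 4 corners per quad
     components-per-vertex      ; u, v, x, y
     size-of-float))            ; bytes per single-float

;; 100 particles -> 100 * 4 * 4 * 4 bytes
(stream-buffer-bytes (make-array 200 :initial-element 0.0))
;; => 6400
```

So even a few thousand particles only need a few hundred kilobytes per frame.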
Once we have the buffer ready, we use gl:with-mapped-buffer to get a cffi:foreign-pointer to some memory into which we write the vertex data. We specify :write-only, so we should not try to read from that pointer, and GL doesn't need to waste time initializing it (or copying the old contents of the buffer into it if we hadn't reallocated it already).
After the body of gl:with-mapped-buffer finishes, cl-opengl takes care of unmapping the buffer, at which point the VBO is ready to use. We set up two pointers into the data with %gl:tex-coord-pointer and %gl:vertex-pointer. In both cases, we are passing 2 floats per vertex, with a stride of (* components-per-vertex size-of-float) bytes from the start of one vertex to the start of the next. The texture coordinates start at offset 0 (represented by (cffi:null-pointer)) and the vertex positions start 2 floats into the buffer (represented by (cffi:make-pointer (* 2 size-of-float))). We need to use pointers for the offsets for historical reasons: originally the API was intended for use with arrays managed by the user, in which case we would be specifying the actual pointer to the buffer there instead of an offset (that still works if no VBO is bound to :array-buffer, but is deprecated/removed in GL 3/3.1).
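The stride and start offsets can be checked with a little arithmetic. The following sketch uses a hypothetical function, attribute-byte-offset, to compute where GL will look for each attribute of a given vertex under the interleaved layout described above.

```lisp
;; Hypothetical helper: byte offset of an attribute of vertex VERTEX-INDEX
;; in the interleaved buffer (2 tex coord floats, then 2 position floats).
(defun attribute-byte-offset (vertex-index attribute)
  (let* ((size-of-float 4)
         (components-per-vertex 4)
         ;; bytes from the start of one vertex to the start of the next
         (stride (* components-per-vertex size-of-float))
         ;; where the attribute starts within a vertex
         (start (ecase attribute
                  (:tex-coord 0)
                  (:position (* 2 size-of-float)))))
    (+ start (* vertex-index stride))))

(attribute-byte-offset 0 :tex-coord) ; => 0
(attribute-byte-offset 0 :position)  ; => 8
(attribute-byte-offset 1 :tex-coord) ; => 16
(attribute-byte-offset 3 :position)  ; => 56
```

The values for vertex 0 are exactly the two offsets passed as pointers in draw-object, and the difference between consecutive vertices is the stride.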
Once everything is set up, we call gl:enable-client-state to tell GL we are using the data specified by those two pointers, and then use %gl:draw-arrays to actually draw the contents of the array, starting from vertex 0 and continuing for (* vertices-per-sprite (/ (length (positions o)) 2)) vertices. Finally, we disable the arrays, and bind buffer 0 to disable the VBO.
Since we store resources allocated from GL in the vbo-click-particles instances, we need to be a bit more careful about making sure we don't leave leftover particle systems around when the main loop exits, so let's clean that up as well.
(defgeneric delete-particle-system (object))

(defmethod delete-particle-system ((object vbo-click-particles))
  (gl:delete-buffers (list (vbo object))))

(defgeneric delete-particle-systems (manager))

(defmethod delete-particle-systems ((manager particle-manager))
  (mapc 'delete-particle-system (systems manager)))

(defmacro with-particle-manager (&body body)
  (let ((manager (gensym "PARTICLE-MANAGER-")))
    `(let* ((,manager (make-instance 'particle-manager))
            (*particle-manager* ,manager))
       (unwind-protect
            (progn ,@body)
         (delete-particle-systems ,manager)))))

(defun main-loop ()
  (sdl:with-init ()
    (sdl:window *nominal-screen-width* *nominal-screen-height*
                :flags (logior sdl:sdl-opengl sdl:sdl-resizable))
    (setf cl-opengl-bindings:*gl-get-proc-address*
          #'sdl-cffi::sdl-gl-get-proc-address)
    (let ((previous-tick (sdl:sdl-get-ticks)))
      (flet ((mx (x)
               ;; adjust mouse coordinates from screen to world
               (* x (/ (float *screen-width* 1.0) *actual-screen-width*)))
             (my (y)
               ;; adjust mouse coordinates from screen to world
               (* y (/ (float *screen-height* 1.0) *actual-screen-height*))))
        (with-texture-manager
          (with-particle-manager
            (init)
            (setup-ortho-projection *nominal-screen-width*
                                    *nominal-screen-height*)
            (sdl:with-events ()
              (:quit-event () t)
              (:video-resize-event (:w w :h h)
                (sdl:resize-window w h)
                (reload-textures *texture-manager*)
                (restartable (setup-ortho-projection w h)))
              (:key-down-event (:state state :scancode scancode :key key
                                :mod-key mod-key :unicode unicode)
                (restartable (key-down key state mod-key scancode unicode)))
              (:key-up-event (:state state :scancode scancode :key key
                              :mod-key mod-key :unicode unicode)
                (restartable (key-up key state mod-key scancode unicode)))
              (:mouse-button-up-event (:button button :state state :x x :y y)
                (restartable (mouse-up button state (mx x) (my y))))
              (:mouse-button-down-event (:button button :state state :x x :y y)
                (restartable (mouse-down button state (mx x) (my y))))
              (:mouse-motion-event (:x x :y y :x-rel delta-x :y-rel delta-y)
                (setf *mouse-x* (mx x)
                      *mouse-y* (my y))
                (restartable (mouse-move (mx x) (my y)
                                         (mx delta-x) (my delta-y))))
              (:idle ()
                #+(and sbcl (not sb-thread))
                (restartable (sb-sys:serve-all-events 0))
                (let ((delta-t (- (sdl:sdl-get-ticks) previous-tick)))
                  (setf previous-tick (sdl:sdl-get-ticks))
                  ;; we check for negative delta-t in case sdl's
                  ;; timer wraps after some amount of time, 0 for
                  ;; a frame is better than a large negative number
                  (restartable (update (if (minusp delta-t) 0 delta-t))))
                (restartable (draw))
                (restartable
                  (bt:with-lock-held (*next-frame-hook-mutex*)
                    (loop for i in *next-frame-hook*
                          do (funcall i))
                    (setf *next-frame-hook* nil)))))))))))
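The important part of with-particle-manager is the unwind-protect, which runs the cleanup form even if the body exits non-locally (an error, a restart, a throw). A minimal sketch of the same pattern without any GL involved, using the hypothetical names *cleanup-log* and with-logged-cleanup:

```lisp
;; Hypothetical stand-ins: the "cleanup" here just records that it ran.
(defvar *cleanup-log* '())

(defmacro with-logged-cleanup ((name) &body body)
  `(unwind-protect
        (progn ,@body)
     ;; runs on normal return and on non-local exit alike
     (push ,name *cleanup-log*)))

;; Leave the body early with THROW; the cleanup form still runs.
(catch 'quit
  (with-logged-cleanup (:particle-systems)
    (throw 'quit :early-exit)))
;; => :EARLY-EXIT

*cleanup-log*
;; => (:PARTICLE-SYSTEMS)
```

In the real macro, the cleanup form is the call to delete-particle-systems, so the VBOs get released however the main loop ends.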
Then we can modify the mouse-down handler to make the new type of particle system instead of the old one.
(defun mouse-down (button state x y)
  (declare (ignore button state x y))
  (push (make-instance 'vbo-click-particles) (systems *particle-manager*)))
Now we can click some more, and get the exact same thing as last time, but with possibly different performance characteristics.
Next time, benchmarking and optimizations.