So I’ve got this web-based image gallery. The images in it are organized with simple tags (there’s a many-to-many relationship between images and tags). Every image has at least one tag, and almost all images have several tags. When a user wants to look up an image in the gallery, they select one or more tags and are shown thumbnails for all images which have at least those tags. At first the image thumbnails were simply sorted alphabetically by file name, and this worked fine when the size of the collection was modest, but that didn’t last. Soon the collection grew to the point where most queries would give forty or more thumbnails, and it became pretty challenging to find your desired image in the pile. The search was even trickier if you had only a general idea of what you were looking for and couldn’t select lots of tags to narrow things down.
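The "at least those tags" lookup described above amounts to a set intersection over an inverted index. Here's a minimal sketch of that idea (the `TagIndex` type and `query` function are my own names, not from the gallery's actual code):

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Inverted index: tag -> ids of the images carrying that tag.
using TagIndex = std::map<std::string, std::set<int>>;

// Images that have at least all of the selected tags:
// intersect the id sets of every selected tag.
std::set<int> query(const TagIndex& index,
                    const std::vector<std::string>& tags) {
    std::set<int> result;
    bool first = true;
    for (const auto& tag : tags) {
        auto it = index.find(tag);
        if (it == index.end()) return {};  // unknown tag -> no matches
        if (first) { result = it->second; first = false; continue; }
        std::set<int> narrowed;
        std::set_intersection(result.begin(), result.end(),
                              it->second.begin(), it->second.end(),
                              std::inserter(narrowed, narrowed.begin()));
        result = std::move(narrowed);
    }
    return result;
}
```

Each additional selected tag only narrows the result, which is why a query with few tags returns such a large pile of thumbnails.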
To improve the usability of this thing I had to come up with a way to arrange the image thumbnails in a particular order such that images with similar characteristics were placed near each other. This would make it much, much easier to visually scan the mass of thumbnails. In this post I’ll elaborate on the technical details of the problem and how I developed the algorithm I’m currently using.
So let’s say you’ve got some particles in 2-dimensional space.
There they are. Now let's say you want any two particles which are very close to each other to interact. You need to know which pairs are close enough. You could take all possible pairs of particles, calculate their distances, and compare those to the minimum distance d. That's fine if you have 10 particles, but if you have 5,000 that adds up to over 12,000,000 pairs. That's too many pairs.
That complexity can be alleviated if for each particle P we can efficiently find a subset of all particles which includes all those within distance d of P, and which includes as few as possible which aren't within d. Then we can calculate the distances from P to the few particles in this subset and determine which ones are actually within distance d.
Enter: spatial hashing. I'll show how boost::unordered_multimap can be used to do this.
compcache. Use it.
The general idea is to set apart some physical memory as swap space, with compression on the pages going in. Much of what’s in RAM is very compressible, so a fast compression algorithm can do a lot. I ran a large portion of a swap file through LZO and got a ratio above 3:1. And it was fast. I’ve never seen compression that fast. The bottom line is: compressing back to RAM is orders of magnitude faster than copying to disk, in more ways than one.
I enabled this feature on an old Pentium III laptop and it works wonderfully. I found it was already present in CrunchBang 9.04, an extremely light Ubuntu variant; I just had to set COMPCACHE_SIZE in an init script. That machine has 256M of memory.
So I tried it on my workstation with its 2 gig of memory. With COMPCACHE_SIZE="30%", if the compression ratio holds out, I've got more like 3.2 gig, and I'm pushing the limits of my primitive ape address space. A 64-bit machine could have, say, 8 gig of RAM. A 50% compcache would be stellar: 4 gig of real RAM, and quick access to another 8 gig or more. At that point there's no need for the disk at all, and swapping is definitely a waste of SSD life.
To set it up in Ubuntu:
- set COMPCACHE_SIZE in its init script (specific instructions are there in the file)
- disable swap files (probably optional; I think compcache gets priority)
Switching over to OpenGL to do the scaling and presenting was almost criminally easy: rather than inheriting QWidget, the pixelated view object inherits QGLWidget. That's it. The CPU load is alleviated without any other changes.
With some simple benchmark animation going on (not accelerated with OpenGL) the total CPU load is fairly reasonable. I believe this configuration (software drawing with hardware pixelation-scaling) is efficient enough for Fishwars, but I intend to look into using OpenGL more. If it’s convenient enough I’ll use hardware acceleration in drawing the fishtank scene itself.
I implemented the context menu minimap in its most superficial form, but I'm having doubts about the utility of such a thing. One major issue is precision of panning control; in my current prototype each screen pixel in the minimap corresponds to 36 screen pixels in the main view. Also, the context menu must be called up to even access this minimap, making panning the scene a two or three step process at least. I think I'll remove the map from the context menu and make click-drag the primary panning control. The minimap could automatically appear in an appropriate window corner during panning to provide context for the user: what portion of the total scene is visible, whether any interesting action is out of view, and so on.
Source for the current prototype: qtinkygl20100715.py
I found that the scaled() method of the drawing surface could be replaced by a scaling factor set on the QPainter, improving performance slightly. The former creates a scaled-up copy of the entire source which is then copied directly to the window; the latter simply scales the source as it is copied to the window. I got a little more performance again by using a QImage source rather than a QPixmap, but I was still unsatisfied.
While looking for solutions I found this thread, wherein it’s suggested that X11 may be the main source of pain. I tested my program on Windows XP, and it was far faster, but the scaling was still too expensive. I’ve only scratched the surface of Qt’s API, but I like what I’ve seen so far. I don’t want to drop Qt. So even though it seems like overkill I’m going to try to use OpenGL to scale my drawing surface and deliver it to the screen.
It would be nice if I could use OpenGL to do the drawing too, but I have a few concerns. I need pixel-perfect primitives (individual dots, lines, circles, and so on), and I'm not sure OpenGL can easily deliver those.