Oh, that's why you should use Varnish and not Squid.

The really short answer is that computers do not have two kinds of storage any more.

It used to be that you had the primary store, and it was anything from acoustic delay lines filled with mercury via small magnetic doughnuts via transistor flip-flops to dynamic RAM.

And then there was the secondary store: paper tape, magnetic tape, disk drives the size of houses, then the size of washing machines, and these days so small that girls get disappointed if they think they got hold of something other than the MP3 player you had in your pocket.

And people program this way.

They have variables in "memory" and move data to and from "disk".

Take Squid for instance, a 1975 program if I ever saw one: you tell it how much RAM it can use and how much disk it can use. It will then spend inordinate amounts of time keeping track of which HTTP objects are in RAM and which are on disk, and it will move them back and forth depending on traffic patterns.

Well, today computers really only have one kind of storage, and it is usually some sort of disk. The operating system and the virtual memory management hardware have converted the RAM to a cache for the disk storage.

So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management, and like any civil war, that never gets anything done.

What happens is this: Squid creates an HTTP object in "RAM" and it gets used a few times soon after creation. Then after some time it gets no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something, and the kernel decides to push those unused pages of memory out to swap space and use the (cache) RAM more sensibly for some data which is actually used by a program. This, however, is done without Squid knowing about it. Squid still thinks that these HTTP objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive.

This is what Virtual Memory is all about.
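To make that concrete, here is a minimal sketch of a store that treats its objects as plain memory and leaves residency entirely to the kernel. It is an illustration only, not code from Squid or Varnish: the names `open_store`, `put` and `get` and the temporary backing file are assumptions made for the example.

```python
import mmap
import os
import tempfile


def open_store(size_bytes):
    """Map a backing file into memory; the kernel decides which pages sit in RAM."""
    fd, path = tempfile.mkstemp(prefix="toy_store_")  # hypothetical backing file
    try:
        os.ftruncate(fd, size_bytes)       # reserve the space on disk
        store = mmap.mmap(fd, size_bytes)  # read/write mapping of the whole file
    finally:
        os.close(fd)                       # the mapping keeps its own handle
    return store, path


def put(store, offset, payload):
    """Store an object with an ordinary memory write; no explicit disk I/O."""
    store[offset:offset + len(payload)] = payload
    return offset, len(payload)


def get(store, offset, length):
    """Read an object back; if its page was evicted, the kernel faults it in."""
    return bytes(store[offset:offset + length])
```

The program never asks whether an object is "in RAM" or "on disk". It reads and writes addresses, and the paging machinery described above does the rest.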

If Squid did nothing else, things would be fine, but this is where the 1975 programming kicks in.

After some time, Squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for busier data. So Squid goes out, creates a file, and then writes the HTTP objects to the file.

Here we switch to the high-speed camera: Squid calls write(2), the address it gives is a "virtual address", and the kernel has it marked as "not at home".

So the CPU hardware's paging unit will raise a trap, a sort of interrupt to the operating system telling it "fix the memory please".

The kernel tries to find a free page. If there are none, it will take a little-used page from somewhere, likely another little-used Squid object, and write it to the paging pool space on the disk (the "swap area"). When that write completes, it will read the data it "paged out" from another place in the paging pool into the now unused RAM page, fix up the paging tables, and retry the instruction which failed.

Squid knows nothing about this; for Squid it was just a single normal memory access.

So now Squid has the object in a page in RAM and written to the disk in two places: one copy in the operating system's paging space and one copy in the filesystem.

Squid now uses this RAM for something else, but after some time the HTTP object gets a hit, so Squid needs it back.

First Squid needs some RAM, so it may decide to push another HTTP object out to disk (repeat above), then it reads the filesystem file back into RAM, and then it sends the data on the network connection's socket.

Did any of that sound like wasted work to you?
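The walk-through above can be tallied as a rough count of disk operations for one object that goes idle and is later hit again. This is a toy accounting sketch, not a measurement: the step lists paraphrase the narrative, the names `SQUID_STYLE`, `KERNEL_ONLY` and `tally` are made up for the example, and steps the text describes as conditional (evicting another page when no free page exists) are flagged so they can be left out. Squid's own optional eviction of a second object ("repeat above") is not counted at all.

```python
from collections import Counter

# Each entry: (description, disk operation, happens unconditionally?)
SQUID_STYLE = [
    ("kernel pages the idle object out to swap", "write", True),
    ("write(2) faults; kernel evicts some other page to make room", "write", False),
    ("kernel reads the object back in from swap", "read", True),
    ("Squid writes the object to its cache file", "write", True),
    ("hit arrives; Squid reads the object back from the file", "read", True),
]

KERNEL_ONLY = [
    ("kernel pages the idle object out to swap", "write", True),
    ("hit arrives; fault may evict some other page to make room", "write", False),
    ("kernel reads the object back in from swap", "read", True),
]


def tally(steps, worst_case=True):
    """Count disk reads and writes; conditional steps count only in the worst case."""
    ops = Counter()
    for _description, op, always in steps:
        if always or worst_case:
            ops[op] += 1
    return dict(ops)
```

Under these assumptions, `tally(SQUID_STYLE)` counts three writes and two reads, against two writes and one read for `tally(KERNEL_ONLY)`. The extra write and read are the round trip through the filesystem that duplicates what swap already did.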

Hm, instructive.

7 responses
The Varnish Architect's Notes are to be studied and taken to heart.
My one concern with this approach is that it doesn't fully take into account some aspects of the way memory is organized, and the possible sizes of objects located in that memory. Consider a program with millions of ten-byte objects running on a computer with only ten user memory pages available. If the access pattern for these ten-byte objects is random, you are going to be doing a ton of thrashing, because every time you fault in a page for an object you need, you're also faulting in around four hundred objects you don't. I don't have the time to work out the exact number of seeks it's going to take to access all of this data given a random read pattern, but it's a lot. On the order of the number of objects in the set.

So it's not sufficient to just create objects if you want to take advantage of the swap file, at least not in a system where there are many small objects. You have to do it in some kind of locality aware fashion such that objects you access frequently are located nearby one another. It's not as simple as "leave them in RAM and let the hard drive take care of it for you". Of course if this guy is as good as he sounds, he's going to explain that in the remainder of the article, which I'm about to read. ;)
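The arithmetic in this comment can be checked with a small simulation. This is a sketch under stated assumptions: a 4096-byte page (which is what "around four hundred" ten-byte objects per page implies), strict LRU replacement (real kernels use approximations of it), and made-up helper names `count_faults`, `random_accesses` and `clustered_accesses`.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4096                        # assumed page size in bytes
OBJ_SIZE = 10                           # the comment's ten-byte objects
OBJS_PER_PAGE = PAGE_SIZE // OBJ_SIZE   # objects sharing one page


def count_faults(accesses, frames=10):
    """Count page faults for a sequence of object indices under strict LRU."""
    resident = OrderedDict()            # page number -> True, oldest first
    faults = 0
    for obj in accesses:
        page = obj // OBJS_PER_PAGE
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            resident[page] = True
            if len(resident) > frames:
                resident.popitem(last=False)  # evict the least recently used page
    return faults


def random_accesses(n_objects, n_accesses, seed=0):
    """Uniformly random object indices: the comment's worst case."""
    rng = random.Random(seed)
    return [rng.randrange(n_objects) for _ in range(n_accesses)]


def clustered_accesses(n_accesses, hot_pages=8, seed=0):
    """Random indices confined to a hot set that fits in the resident frames."""
    rng = random.Random(seed)
    return [rng.randrange(hot_pages * OBJS_PER_PAGE) for _ in range(n_accesses)]
```

With a million objects and ten frames, `count_faults(random_accesses(1_000_000, 10_000))` faults on nearly every access, while `count_faults(clustered_accesses(10_000))` faults at most once per hot page. That is the locality point in a nutshell.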

I'd also like to see a "least loaded of N" policy in their load-balancing system. http://varnish-cache.org/wiki/LoadBalancing
The source code is written like a kernel too. It's a problem when half of your implementation modules have three-letter abbreviations for file names.
(Hi James. Long time, no see.)

You make a theoretically valid point regarding memory thrashing under random workloads. I've never seen a random workload on the Internet.

To your point about the naming conventions used in the source code: what would you use instead of C? I don't ask to be combative, I just think it's too bad there isn't much better. C++ brings namespaces but also brings mangled symbols, terrible compiler errors, wonky new syntax and at least one extra way to do everything. Java and friends have the garbage collector. Go seems pretty sweet but I haven't given it enough of a test drive to say more.

I'm fine with C. :) I'd just like to see names more inviting to a newbie than vss.c. As best I can tell it's "varnish something socket," but that kind of brevity is false economy. There are bunches of other files in there with similarly unenlightening names. The code within is sometimes even worse -- whole headers filled with tens of variables and methods all named with four or fewer characters.

vmb.h (http://varnish-cache.org/browser/trunk/varnish-cache/include/vmb.h) is one of my . . . erm . . . "favorites." v_mem_barrier.h would be better. But all of them are similarly obtuse.

Sure, the workload is unlikely to be true-random. I'm only saying care needs to be taken to organize your allocations such that, if you have multiple objects on a page, they are likely to be used together. I'm certain Varnish does that, or else mostly caches objects that are larger than a page, or else doesn't have to fault in more than about fifty pages/disk/sec, or is expected to be backed by page files on non-magnetic media.

Probably never read InnoDB filenames, then. You won't like what you see!