The site currently runs on sitecore 6.2, although we have plans to move to 6.4. In production we currently have 2 cms servers ( one is a redundant backup ) and 4 front end servers sitting behind a load balancer just serving content.
To manage the state across the servers we use memcached to store session data. We have two memcached servers to provide redundancy.
Cache clearing across the front end servers is done using the staging module ( it does have its quirks admittedly).
The database server is a huge clustered beast to ensure we have some redundancy. Each web server has it's own web database, so that we can safely update one server at a time.
We also employ a CDN provider called limelight to handle the serving of media such as images, video, CSS & JS. This takes a huge strain off of the servers as sitecore usually stores the images in the database.
To help reduce the load on the database server, we store all of the news, videos ect in a lucene index. This saves us having to do lots of really slow GetItems() queries.
We have also set up some maintenance tasks to ensure the database indexes do not get fragmented. At one point, our indexes were 98% fragmented and our database server was taking a pounding. A quick rebuild on the main tables and it made a huge difference.
Below is a query to help you find out how fragmented a table is:
DECLARE @db_id SMALLINT;
DECLARE @object_id INT;
SET @db_id = DB_ID(N'ManCitySitecore_Web_Phase3');
SET @object_id = OBJECT_ID(N'Items');
IF @db_id IS NULL
BEGIN;
PRINT N'Invalid database';
END;
ELSE IF @object_id IS NULL
BEGIN;
PRINT N'Invalid object';
END;
ELSE
BEGIN;
SELECT * FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL , 'LIMITED');
END;
GO