At Adify, my responsibilities covered two of the company's core systems.
As ads were served, we received a feed of data describing what was served, where, and to whom. My primary responsibility was the system that processed these incoming logs. During my tenure the system evolved from a stovepipe application barely able to keep pace with the incoming data into one that could be extended simply, distributed cleanly across several machines, and would have kept pace with five times the traffic on cheaper hardware. By restructuring the project as a chain of responsibility, new tasks could be added simply and quickly, while the entire process remained controllable via configuration.
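The chain-of-responsibility shape described above can be sketched roughly as follows. This is a minimal illustration, not Adify's actual code: the stage names (ParseStage, EnrichStage), the record fields, and the configuration mechanism are all invented for the example.

```python
from abc import ABC, abstractmethod

class LogHandler(ABC):
    """One stage in the processing chain: each stage transforms a
    log record, then hands it to the next stage (if any)."""
    def __init__(self, next_handler=None):
        self.next = next_handler

    def handle(self, record):
        record = self.process(record)
        if self.next is not None:
            return self.next.handle(record)
        return record

    @abstractmethod
    def process(self, record):
        ...

class ParseStage(LogHandler):
    """Hypothetical stage: mark the raw record as parsed."""
    def process(self, record):
        record["parsed"] = True
        return record

class EnrichStage(LogHandler):
    """Hypothetical stage: attach derived data (e.g. geography)."""
    def process(self, record):
        record["geo"] = "US"  # placeholder enrichment value
        return record

def build_chain(stage_classes):
    """Assemble the chain from a configured ordering of stages,
    linking each stage to its successor (built back to front)."""
    chain = None
    for stage_cls in reversed(stage_classes):
        chain = stage_cls(chain)
    return chain

# The stage list stands in for the configuration that controlled
# the real process; adding a task is just adding a class here.
chain = build_chain([ParseStage, EnrichStage])
result = chain.handle({"event": "impression"})
```

Because each stage only knows about its successor, new processing tasks slot in without touching existing ones, which is what makes the pattern attractive for a pipeline driven by configuration.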
Once log processing was sorted out, I piloted the company's MapReduce project. I led the analysis of which framework to use, was heavily involved in the initial estimates of size, load, and capability, wrote a number of the MapReduce jobs and queries, and built the framework that controlled access to the cluster from our applications. Ultimately, applications could submit jobs as simple LINQ queries backed by the cluster's massive datastore. I also led development of all other aspects of the cluster. The production cluster ultimately held nearly a trillion unique events. The ongoing plan is for the cluster to become the source of aggregated data, rather than aggregating it in our traditional databases.
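The LINQ-style submission interface can be approximated with a small fluent query sketch. The real system used LINQ on .NET against the cluster; the Python `Query` class below, its `where`/`select` operators, and the sample event records are all hypothetical stand-ins meant only to show the shape of the API applications saw.

```python
class Query:
    """Toy LINQ-style query over an iterable: operators compose
    lazily, and nothing runs until the result is materialized."""
    def __init__(self, source):
        self.source = source

    def where(self, predicate):
        # Filter, analogous to LINQ's Where
        return Query(x for x in self.source if predicate(x))

    def select(self, projection):
        # Transform, analogous to LINQ's Select
        return Query(projection(x) for x in self.source)

    def to_list(self):
        # Materialize; in the real framework this is where a job
        # would be submitted to the cluster's datastore.
        return list(self.source)

# Hypothetical event records standing in for the cluster's data.
events = [
    {"type": "click", "cost": 2},
    {"type": "impression", "cost": 1},
    {"type": "click", "cost": 3},
]

click_costs = (
    Query(events)
    .where(lambda e: e["type"] == "click")
    .select(lambda e: e["cost"])
    .to_list()
)
```

The appeal of this style is that application code expresses what it wants declaratively, while the framework behind the operators decides how to translate that into jobs against the backing store.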