My Biased Coin: November 2009

Tuesday, November 24, 2009

4-year Masters

Harvard, like many other places, has an option by which students (with "Advanced Standing" from AP classes) can obtain a Master's (in some programs) as well as their undergraduate degree in 4 years. The School of Engineering and Applied Sciences, and CS in particular, offers this option.

Every year some students I advise are interested in and want advice about the program. My first question back is always why do they want to do it -- what do they think they'll get out of it? Often, they don't have a good reason; it seems they just see it as an opportunity to get an additional degree, which I don't think is a particularly good reason in and of itself. (Harvard students are, after all, high-achievers and trained to respond to such incentives.) Some possible reasons that come up include:

1) They're going into industry, and they believe that the Master's will give them a higher starting salary (which may, over a lifetime, translate into a substantial benefit).
2) They're going into graduate school, and believe the degree will allow them to move faster through their graduate program, or at least allow them to take fewer classes.
3) They're coming from another area, like economics or chemistry, but have found late in the game that they like computer science, and would like to have a credential in that area in case that is where they move in terms of their career.
4) Their parents have figured out that they can technically graduate in three years and are unwilling to pay for four, but can be convinced to pay for it if there is a Master's degree involved.

Are there other reasons? And how good are these reasons? Is there still a "Master's premium" in starting salaries, even for a "4th year" class-based Master's program? Does starting with a Master's of this kind really get you through graduate school faster anywhere? (Personally, I already think graduate students don't take enough classes, so I'm biased against reason 2.) The third reason -- a student coming in from another field -- is reasonable, but Harvard does now have minors (called "secondary concentrations" here) instead. Pressuring parents is not a reason I can throw support behind, but I can certainly understand it. Maybe it's the best reason on my current list.

Of course, it's important to consider what, if any, are the downsides. Here's my starting list:
1) A loss of flexibility in choosing classes. Doing the two degrees in four years saddles a student with so many requirements they lose the chance to take that art or history class, learn a language, or do some other exploratory things in college. Isn't that part of what college is supposed to be about?
2) It often ends up taking the place of other college experiences, like doing a senior thesis or other research. For students who want to go to graduate school, I'd personally recommend doing a senior thesis over a 4th year master's degree, so that they can begin to get some insight into how research works -- and if they'll really like it.
3) It can be hard. Most graduate courses have large-scale final projects; trying to take 6-8 such courses in two or three semesters can be a real challenge, and is certainly a time commitment.

As always, my biggest goal in advising students on such matters is to make sure they're well-informed and have thought through the various implications of their choices. I'd appreciate any thoughts anyone has on the matter, either from their own personal experience or their experience with students who have done such programs.

Friday, November 20, 2009

In Reverse

The Crimson is reporting that the Harvard Faculty of Arts and Sciences plans to decrease the size of the faculty* in response to budget woes. The key point seems to be that "There are now 720 associate professors and professors in FAS, an increase of 20 percent since 2000." The reductions will occur in the standard way -- not filling some positions after retirement, and offering some sort of early retirement package for faculty. (Harvard has, comparatively, a much older faculty than most institutions. Here's a 2004 Crimson article on the subject that pointed out that 7% of the tenured faculty was over age 70.)

I admit, I'd like to see some concrete statements that there's to be a similar if not more extensive effort to decrease the size of administration, although to be fair some of that has also been occurring.

* The first comment, by one menckenlite, to the article seems so funny I have to quote it here:
"Disappointed to learn that Harvard is just reducing the number of faculty members. I thought they were going to get smaller professors so that they did not overload the sidewalks and streets with their over sized egos and girths."

Wednesday, November 18, 2009

Job Advice?

Not for me, thank goodness.

A very talented graduating senior (who may or may not be Harvard...) has obtained a number of job offers, and asked me if I had any advice on what job would make the best place to start a career -- or look best on a resume. (Similar questions come up most every year.) I explained that besides having some potential conflicts of interest, I was removed enough from the job circuit these days to not have any useful advice. But I could ask others....

Let's consider a number of possible jobs a talented student might easily obtain:

Software Developer at Google
Program Manager at Microsoft
Developer at Facebook
Entry-level position at a tech-oriented "boutique" consulting firm
Something else you'd like to suggest

What advice would you give them on what to choose? Or how to choose, which is probably more useful?

A warning to students: free advice is often worth what you pay for it....

Monday, November 16, 2009

Conference Reviewing Update

A few weeks ago, I talked about the reviewing process for NSDI and LATIN. I suppose now is a reasonable time for an update.

LATIN is nearing the end of the reviewing process. I think it went well -- my top-ranked papers seem to be getting accepted, my low-ranked papers are not. There's been some electronic discussion of papers where there was wide disagreement, but we're not having an on-site PC meeting, and overall there's been surprisingly little discussion on my papers. Because LATIN is a "2nd tier" conference, I had previously suggested that I expected there would be some wide deviations among review scores, "corresponding to different opinions about how interesting something is". There were in fact some wide scoring discrepancies, though this may not have been the primary reason. I was a reviewer on multiple papers where one reviewer really didn't seem to "get" the paper -- in most cases, ranking it high when I thought the ranking should be lower. (I imagine the scores will change before the reviews come back to the authors.) I've seen similar problems even in other, stronger theory conferences -- selecting 3 reviewers who are expert on the subject of the paper in a broad theory conference is very difficult to consistently get right, especially when subreviewers come into play -- though I think it was more problematic here, where the papers are weaker on average in any case. Finally, I still don't like the easychair interface that much.

The NSDI reviews have been, for me, substantially more interesting, no doubt in part because the papers are more interesting (to me). The "first round" is nearing the end, and at least on my papers, the review scores are remarkably consistent. In cases where they aren't consistent, there's usually a clear reason for the discrepancy that comes out in the reviews, which tend to be longer and more detailed for systems conferences. While that's all very satisfying, at this point I'm hoping to be offered some dramatically more controversial papers to look at for Round 2, or I'll be finding the PC meeting pretty boring. (I should note I have a paper submitted to NSDI, so I reserve the right to either completely trash the reviewing system, or sing its praises ever-higher, depending on the eventual outcome.) Finally, I still like the hotcrp interface a lot.

I get asked to serve on a number of PCs, and usually, I make efforts to serve, because I believe such service is important to the community. But I must say, doing these two at roughly the same time has led me to think I'll be more circumspect in the future. The older I get, the more precious time seems to become, and perhaps I've just reached a point where I think I've done enough PC service that I can be choosier about what I agree to, aiming for things that are more enjoyable. At the same time, I wouldn't want everyone to start acting that way, since then I imagine it would be tough to get good PCs together for the many conferences we have.

Thursday, November 12, 2009

Surveys

This post is about surveys. It's motivated by one of my tasks last night, as I had to spend some time going over the final proofs for the survey Hash-Based Techniques for High-Speed Packet Processing, written with Adam Kirsch and George Varghese, which happily will finally be "officially" published (as part of a DIMACS book). The link is to a submitted version; I'll try to find a "final" version to be put up when possible, but the delta is small. I'll take the liberty of complimenting my co-authors on the writing; if you want a quick guide on the connection between hashing and routers, it should be a good starting point.

It occurs to me that I've written a number of surveys -- I mean, a lot*. Indeed, I'm sure in some circles I'm known mostly (and, perhaps, possibly only) for some survey I've written. That is not meant as self-promotion; indeed, I'm well aware that some people would view this as quite a negative. After all, as comments in a recent post felt it important to point out (and argue over), the mathematician Hardy wrote: Exposition, criticism, appreciation, is work for second-rate minds. I would disagree, and I would hope to encourage others to write surveys as well.

I've found writing surveys a useful tool in both doing and promoting my research. I've done surveys for multiple reasons. It's a good way to learn a new topic. It's a good way to bring some closure to a long line of work for oneself. It's a good way to frame and popularize a research direction or a set of open problems. And finally, I've found it's a good way to provide a bridge between the theoretical and practical communities.

Earlier in my career, there didn't seem to be much of a home for publishing surveys. I was fortunate that the journal Internet Mathematics started when it did, and was willing to take surveys. Otherwise, I'm not sure where my surveys on Bloom filters and power laws -- my two most cited (and, I would guess, most read) -- would have ended up. These days, surveys seem to have become more acceptable, thankfully. The Foundations and Trends series, in particular, has provided a natural outlet that has spurred a number of impressive and useful surveys. I admit these booklets tend to be a bit longer than what I have usually aimed for, but I was usually hoping just to find some journal (or conference) that would take a survey, so length was actually a negative. I imagine someday I'll get up the energy to write something for this series.

But perhaps there are now other mechanisms for producing and publishing surveys. I view Dick Lipton's blog as providing one or more well-written mini-surveys every week (a truly amazing feat). Wikipedia provides a means for what seem to be essentially collaborative mini-surveys to also be written on technical topics; perhaps some Wiki-based tool or archive could be developed that would allow for richer, growing and changing surveys with multiple contributing authors.

In any case, when somebody suggests to you that exposition is for a second-rate mind, keep in mind that not everybody agrees. Writing a survey has become downright respectable. If you feel like disagreeing strongly, in the most vocal way possible, then please, go ahead and write a survey as well.

* Here's a possibly complete list. Co-author information and other related information can be found on my List-of-Papers page. Current links are provided here for convenience.

Some Open Questions Related to Cuckoo Hashing
Hash-Based Techniques for High-Speed Packet Processing
A Survey of Results for Deletion Channels and Related Synchronization Channels
Human-Guided Search
Toward a Theory of Networked Computation
Network Applications of Bloom Filters: A Survey
Digital Fountains: A Survey and Look Forward
A Brief History of Generative Models for Power Law and Lognormal Distributions
The Power of Two Random Choices: A Survey of Techniques and Results

Tuesday, November 10, 2009

Graduate School? How to Decide...

What do people think about students going to work for a year or two and then applying to graduate school? Or applying but then deferring to work for a year or two?

It's that time of year when seniors are thinking about graduate school. (I have multiple requests for NSF letters pending...) So, naturally, the other day I talked with a student who, essentially, had the question, "Should I go to graduate school?"

In this case, the question wasn't one of talent; the student would, I'm sure, do very well in graduate school. But he also has a job offer from a top company in computing where he could do interesting work and, I'm sure, also do very well.

In these tough situations, I try my best not to give direct advice, but instead try to get a student to talk about their own concerns and issues to help them realize which way they really want to go. While I feel positive about the outcomes from my having gone to graduate school, I'm a very biased sample, and I know lots of others -- very bright, talented, capable people -- who found it wasn't worth it for them. I don't think I would attempt to give advice even if I thought I could perfectly distinguish those who would find great personal success from graduate school from those who won't, and it's perfectly clear to me that I'm far from a perfect distinguisher.

Where possible, I try to give facts. Inevitably, people who find both work and graduate school compelling options want to know how difficult it would be to switch from working back to school. My take was that at the application level, a year or two working generally, at worst, does minimal harm to an application. Your professors still remember who you are well enough to write useful and informative letters, and your academic skills are assumed to have not gotten rusty. Coming back after an extended period, however, might make the application harder to judge.

The greater difficulty in switching is that the longer you work, the harder it can become. You get used to a real paycheck instead of a subsistence wage. Who wants to move again, uprooting their life (friends, relationships, etc.)? And you probably start to become attached to your job and your co-workers in various ways. [Interestingly, the same sorts of issues can arise for people who are thinking about academic jobs vs. research labs/other jobs after their PhD.]

Happily, the student seemed to not need my most important advice -- that both possibilities offered him great opportunities for success and happiness, so he should not stress about making a choice that was "wrong".

Does anyone have further, general advice for those facing this decision?

Friday, November 06, 2009

Ranking

An anonymous commenter asked an insightful question, worthy of a real answer: "Hi Prof, Why are you so obsessed with ranking things?"*

Honestly, I don't think I am. I have 3 children, and I have thus far avoided assigning them a preference ordering.** If you asked me for my favorite TV shows (or movies, or songs, etc.) I could think of some off the top of my head, but I haven't ever thought hard about coming up with a list of favorites.*** Same with restaurants, food, vacation destinations, whatever. I don't spend my time giving rankings for Netflix or things like that. I could probably come up with rankings with some thought, but it's not like I go around ranking things constantly.

That is, I don't do that in my personal life. In my professional life, come to think about it, I spend an awful lot of my time ranking things. I serve on multiple program committees each year where I'm asked to rank papers. (And I send my papers to conferences, where they are in turn ranked, and my submission is, implicitly, a ranking of sorts on the conference.) I serve on NSF panels to rank grants. I write letters of recommendation which, implicitly or explicitly, provide a ranking of students (and, occasionally, faculty). I interview and evaluate faculty candidates. I grade and assign grades in my classes, and similarly grade senior theses. I serve on a Harvard committee that decides undergraduate thesis prizes. And I'm sure if I thought about it some more, I could come up with even more examples.

My blog is meant to be a professional blog, about my professional life. If it seems that I'm obsessed about ranking, that is a reflection of my professional life. I am asked to rank a lot as part of my job.

So I think I can turn the question back -- why are all of you so obsessed with ranking, that I end up having to spend so much time doing it?

* This comment came up in my last post about the possibility of FOCS/STOC asymmetry, where ranking was at most a tangential concern. But my previous post was on ranking networking conferences, so I can understand where the comment comes from.
** That's meant to be humorous.
*** Well, that's perhaps not quite true. Any undergraduate who has taken my algorithms class can correctly tell you that my favorite TV show of all time is Buffy the Vampire Slayer, so I'd best admit to it before it comes up in the comments.

FOCS/STOC and Asymmetry

I had a funny conversation with Madhu Sudan yesterday, with him relaying an idea he said he heard from Umesh Vazirani (and perhaps the trail goes on further from there) -- roughly that FOCS should double in size and STOC should halve in size. Or, I guess vice versa -- the point is that right now the two are pretty symmetric, and it's not clear that's the best setup.

The idea (or my interpretation of it) is that in theory we could use a more selective "top" conference -- one that people felt they should really try to go to, even if they didn't have a paper in it, because it would have the major results from the year. Hence we halve one of the conferences and make it more selective (and, naturally, make it single-session, maybe have some special tutorials or other activities). At the same time, we don't want to lessen the number of papers that currently are taken in FOCS/STOC -- indeed, since the community (or at least the number of papers being written) has expanded, we should probably accept more. (So maybe people wouldn't feel the need to start yet more conferences, like ICS.) So we double the other. Again, this would be a conference that, ideally, more people would attend, because more people would have papers in it. Indeed, this could help get papers out of the system faster (instead of papers being resubmitted quite so frequently). By introducing asymmetry, perhaps we could make both conferences more appealing and better attended.

I pointed out that one community I know of already does this -- this is very similar to SIGCOMM and INFOCOM in networking. I think that model works, though there are certainly tensions and problems with it -- as you can see in the comments on my recent post on Ranking Networking Conferences. (Bigger conferences are more variable in quality, primarily; also, they require large-scale parallel sessions.) Again, we'd have asymmetry -- the larger conference might become perceived as "weaker", but it would play the important role of bringing the community together and being an outlet for more papers.

Interesting though the idea is, I have trouble imagining the theory community moving in that direction. Big changes are always hard to get moving, and it's not clear how many people really think the current system is broken -- though the ICS movement clearly seemed to think something was wrong. I'd be willing to try it, myself, but of course I also like the "two-tiered" (or maybe 1.5-tiered) SIGCOMM/INFOCOM system.

Thursday, November 05, 2009

Harvard Financial Aid

This post will talk about Harvard's financial aid program, and why it's a perfectly good thing to give money to Harvard, despite what you might read in the New York Times.

I am motivated to write about this also because some weeks ago, I got into a blog-argument with some Chronicle of Higher Education writer who gave an incoherent argument that Harvard should have been spending its endowment increasing its undergraduate class size. (See the bottom of this post for the starting point if you want.) One point I argued was that Harvard had in fact been spending its endowment to make college more affordable through its financial aid program, and that that was probably doing more to open Harvard up to a wider talent pool than simply admitting more students would do.

Certainly one can argue whether teaching more students or making Harvard financially available to more students is a more important goal. But one thing that became clear is that that author, the author of the New York Times opinion piece, and I presume many other people, just don't understand the financial aid picture at Harvard. So I'll say something about it, that's actually based on facts and numbers.

Let me start with a back of the envelope calculation. (I recently got access to some official numbers, but they may be confidential, and the back of the envelope calculation is easy and accurate enough.) About 2/3 of Harvard undergraduates get financial aid from Harvard, and on average it covers about 2/3 of their tuition. That's approximately 4000 students, getting an average of about $35,000 per year in aid from Harvard, for about $140 million per year. Let's call it $125 million in case my numbers are off and to make the math easier.

Long-term endowment spending rates are about 5%. (This seems to be a standard rule across most major universities, but I haven't seen an economic analysis to explain this number. Please give pointers in the comments.) So Harvard's undergrad financial aid corresponds to roughly $2.5 billion of endowment money.

This is a much bigger proportion of the endowment than people realize. Usually people bandy about a figure of $27 billion or so post-crash for Harvard's endowment, but the endowment for the Faculty of Arts and Sciences -- that is, for the undergrads, as opposed to the law/business/medical/graduate/etc. schools -- is only about $11 billion. So Harvard is now using, by my estimates, well over 20% of its annual endowment spending (for FAS) for financial aid. I've argued in the past that Harvard should make itself tuition-free for undergraduates -- but even I'm impressed by and happy with these numbers.
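For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch in Python. The figures are the rough estimates from this post, not official data, and the variable and function names are made up for illustration. It also assumes the same 5% spending rate applies across the whole FAS endowment, so that a share of the endowment and a share of annual endowment spending are the same thing.

```python
# Rough estimates quoted in the post (not official figures).
STUDENTS_ON_AID = 4_000           # about 2/3 of undergraduates
AVG_AID_PER_STUDENT = 35_000      # dollars per year
ROUNDED_AID_BUDGET = 125_000_000  # the post rounds $140M down to $125M
SPENDING_RATE = 0.05              # long-term endowment spending rate
FAS_ENDOWMENT = 11_000_000_000    # post-crash FAS endowment estimate


def endowment_backing(annual_spend, spending_rate):
    """Endowment needed to fund annual_spend at the given spending rate."""
    return annual_spend / spending_rate


# Unrounded annual aid: students times average award.
raw_aid = STUDENTS_ON_AID * AVG_AID_PER_STUDENT

# Endowment that the rounded aid budget corresponds to.
aid_endowment = endowment_backing(ROUNDED_AID_BUDGET, SPENDING_RATE)

# Fraction of the FAS endowment (and, at a uniform rate, of FAS
# endowment spending) that goes to undergraduate financial aid.
aid_share = aid_endowment / FAS_ENDOWMENT

print(f"Unrounded annual aid: ${raw_aid / 1e6:.0f} million")     # $140 million
print(f"Endowment backing aid: ${aid_endowment / 1e9:.1f} billion")  # $2.5 billion
print(f"Share of FAS endowment: {aid_share:.1%}")                # 22.7%
```

Using the unrounded $140 million instead would push the share higher still, so the "well over 20%" figure is, if anything, conservative under these assumptions.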

Think of it this way: the projected deficit for FAS over the next few years, roughly speaking, could disappear entirely without any budget cuts if we just turned off financial aid. Of course that's a terrible idea, and financial aid is one area where Harvard, so far, is making sure not to cut. But that gives an idea of the scope.

So when I hear people say that Harvard isn't doing enough to open its educational doors, or suggesting that giving to Harvard is not morally sound, I admit I feel obliged to politely correct them. (Or, sometimes, less politely correct them.) If you believe that affordable education is important, there are of course many institutions deserving of support. Harvard remains on that list.

Tuesday, November 03, 2009

Conference Reviews

I promised at some point to get back to discussing the reviewing process for two conferences I am currently on the PC for, NSDI and LATIN. Since I happily just finished my "first drafts" of the reviews for both conferences, now seems like a good time. As usual, I've finished a bit early and most reviews are not yet in, so I'm writing this without benefit of seeing most of the other reviews yet.

I should point out that comparing NSDI and LATIN is definitely an apples and oranges comparison, and not just because one is systems and one is theory. LATIN is a "2nd tier" conference (and one would probably argue that was being polite), held every other year, with no specific theme other than theory; the acceptance rate is probably in the 25-35% range. That is not to say the papers are bad, but the papers generally utilize known techniques, and the question is whether the underlying question seems interesting, the paper was written well, etc. I'm not looking for papers that everyone would want to read; I'm looking for papers that I think somebody wants to read. Since interests vary greatly, I suspect there may be some substantial score deviations among reviewers, corresponding to different opinions about how interesting something is. I don't mean to sound negative about the conference; some very nice papers have appeared in LATIN, with my favorites including The LCA Problem Revisited, and On Clusters in Markov chains. But I don't think it's a first choice destination for many papers -- unless, of course, an author lives in Latin America or wants to go to Latin America.

NSDI is arguably a "1st tier" systems conference for networks/distributed systems. While it doesn't have the prestige of a SIGCOMM, it's certainly aiming at that level -- although I think perhaps even more than SIGCOMM there's a bit of bias at NSDI for concrete implementations demonstrating actual improvements. In the last two years the acceptance rate has dropped below 20% and I expect it to be there again. Generally I'm looking for a solid, well-explained idea or system design, with some experimental evidence to back up that the idea really could be useful. I admit I would prefer to have some definitions, equations, theorems, or at least well-structured arguments in these submissions -- this is something I push on regularly -- as for me these are highlights of having a well-explained idea, but a paper can still possibly be good without them (and sometimes a paper that is too theoretically oriented wanders too far off from reality, even for an open-minded idealist such as myself).

Now for concrete differences. For LATIN I only have 10 or so papers to review; there's a big PC and the meeting will all be electronic. I imagine I might get asked to read one or two more papers where the reviews don't agree but that's probably it. Most papers will probably have 3 reviews. There's a 0-5 point scale, from strong reject to strong accept, but no "percentages" assigned to the ratings. There's also a whole lot of other scores (originality, innovation, correctness, presentation, difficulty) I have to give that I think are overkill. Even though the number of papers is small, it seems a number of people are using outside reviewers. (I generally don't, unless I feel I'm so far from the area of the paper I need someone else to read it.) We're using Easychair, which these days seems passable, but is far from my favorite.

For NSDI, we have a first round of 20 or so papers. Each paper is getting 3 reviews in the first round, and then we'll probably cut the bottom X% (about 40-50%?). Everyone reviews their own papers. In the second round papers will probably get 1-2 more reviews (or more), and outside reviewers will be used if it's thought their expertise could help. (Usually the chairs, I believe, assign outside reviewers, often based on comments or suggestions by the first-round reviewers.) After the second round of reviews is in, we have a face-to-face PC meeting. We're using the standard 1-5 networking scale with 1 being "bottom 50%", and 5 being "top 5%". I've actually found that helpful; I was going over my scores, realized I had a bit less than 50% with scores of 1, and went back and decided that there were papers I was being a bit too generous to. (Giving scores of 1 is hard, but if everyone tries to follow the guidelines -- unless they really believe they had a well-above-average set of papers -- I find it makes things run much more smoothly.) We're using hotcrp, which I like much better than Easychair -- I can easily see from the first screen the other scores for each paper, the average over all reviews, how many other reviews have been completed, etc.

Once all the reviews are in, we'll see how things work beyond the mechanics.

Monday, November 02, 2009

ICS Papers Announced

As pointed out in many places, the papers for the (strangely named) new theory conference Innovations in Computer Science are out, with the list here and list with abstracts here.

I suppose the future will tell how "innovative" these papers are compared to, say, the normal collection at FOCS/STOC/SODA. I'm not surprised to see the trendy areas of game theory and quantum fairly heavily represented. I was a bit shocked, however, to see a number of papers on what I would consider "mainstream" coding/information theory, in that I wouldn't be at all shocked to see papers with similar abstracts (but different authors) at say an International Symposium on Information Theory. The example nearest and dearest to me would have to be

Global Alignment of Molecular Sequences via Ancestral State Reconstruction
Authors: Alexandr Andoni, Constantinos Daskalakis, Avinatan Hassidim, Sebastien Roch

which, while sounding all biological, is really just studying trace reconstruction problems on a tree. I'm a fan of the under-studied trace reconstruction problem, as it's tied closely to insertion and deletion channels; I was a co-author on a paper on a different variant of the problem back in SODA 2008. (I also cover the problem in my survey on insertion/deletion channels.) I guess I'm glad to see that work on this very challenging problem is considered "innovative".
