My Biased Coin: March 2009

Monday, March 30, 2009

PODS paper

I'm a co-author of a paper that will be appearing in PODS: An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets, by Kirsch, Mitzenmacher, Pietracaprina, Pucci, Upfal, and Vandin. This version isn't quite the final official version (which is due shortly), but should be much the same. I think this is my first official "database paper", which is very exciting. (I'm grateful to Eli Upfal for inviting me and Adam to work on this project.)

The motivation, roughly, is the following. The setting is a standard data mining problem: we have transactions, where each transaction is a set of items, and we are looking for interesting correlations. Suppose we have 1,000,000 transactions, on 1,000 items, where each item appears in 1/1,000 of the transactions. We observe that the pair of items (i,j) appears in 7 transactions. Is this a significant observation? If we compare to a "random model" where each item appears independently in each transaction with probability 1/1,000, then it might appear so; on average, the pair should appear in only 1 transaction. But the problem is there are almost 500,000 pairs of items. It's not surprising that some pair should appear 7 times; in fact, we'd expect about 50 such pairs. So if we were mining for common pairs, the count alone should not necessarily flag this pair as unusual; it would also matter how many such pairs with this count that we've seen.

The underlying issue is that we're doing multi-hypothesis testing. We're asking "do any pairs appear a statistically significant number of times", not "does this specific pair appear a statistically significant number of times." The first question is naturally harder to answer, since it also should involve the global behavior of all pairs of items, not just the counts from each pair.

Our theoretical starting point is therefore to look at the distributions we obtain on frequencies of subsets of items under our random model. Let Q_{k,s} be the number of itemsets of size k that appear together in at least s transactions in the random model. We can show that, for sufficiently large s (and under certain other conditions), Q_{k,s} is well-approximated by a Poisson random variable, using the Chen-Stein method. (As with all things probabilistic, that sounds very intuitive if we could just assume independence everywhere, but there are dependence issues to take care of -- hence the Chen-Stein method.) We can then leverage this to design statistical tests for finding interesting itemsets, based in part on how far the corresponding values in our observed data set are from the Poisson distribution.

The idea is to expand on the approach of "return all itemsets of size k that appear at least this threshold many times", both so that we have a more rigorous notion of what that threshold should be, and so that we can tell if there are enough such itemsets that what we're seeing is likely to be important. This appears to effectively cut down on the number of spurious itemsets we return, or what in the literature is called the False Discovery Rate. Hopefully, this approach will prove interesting enough for further exploration from data-miners.

Friday, March 27, 2009

FOCS Brief Descriptions

I too got Muthu's request to point to Umesh's writeup to motivate the brief descriptions for FOCS this year. But I was busy with family duties this morning, and didn't get a chance to say anything before all the other blogs did. And now it's all been said. But I'll link anyway, in case somebody's missed it.

Of all the posts, I'd say JeffE's should be required reading. Of course, I consider anything with the slightest hint of snarkiness from Jeff required reading, both for amusement value and to see what points he makes, and this post exceeds the minimum recommended daily allowance of snark.

Thursday, March 26, 2009

Parallel Session Madness

One of the last tasks I have as STOC chair is trying to organize the schedule, which involves the unpleasant subtasks of trying to group the talks into coherent sessions, and then figuring out what sessions go when, including what sessions will "compete" since we'll have two parallel sessions running. These tasks are frustrating since in some cases it's hard to find coherent subtasks, there are annoying constraints that arise (4-paper sessions should only run against other 4-paper sessions so there aren't holes; in general, an "algorithmic session" should run opposite a "complexity" session?) and decisions that people surely won't be happy with (nobody likes being opposite a "trendy" session or a "name" author, and nobody likes being placed in the last session of the day). And, of course, there aren't any tools for this (that I know of).

Last night, after already putting hours into this, it hit me -- slots should just be assigned randomly. Obviously this idea partly stems from my own frustration -- why am I spending time on a problem with no well-defined solution (and that I'll only hear complaints about afterward)? It has the advantage of making any "decisions" that people might find disagreeable an act of chance, instead of an act by me.

But I also thought of a good reason to do it this way. Categorizing and grouping talks in this way exacerbates the subdividing and splintering of theory into subgroups that know little to nothing about what everyone else is doing. Not working in property testing? Great, there's a whole session you can skip. Not so interested in computational geometry? Now you can sleep in for the morning session.

If FOCS/STOC are supposed to be flagship theory conferences each year, and they're the conferences where all these subgroups come together, maybe this session grouping is actually detrimental. Random mixing of papers could encourage people to stay in the room and see something new. As long as you're down there for the talk you wanted to see, you might stick around and see that next talk that would have been in a session you never would have gone to.

Yes, I know the price might be a few more times where two papers with a potential shared audience run at the same time. (Maybe we can still use that session information; find a random schedule that doesn't put two papers in the same nominal session at the same time.) But it seems like the payoff could outweigh this consideration.

Not that I'd really do this. I don't know of any conference that has. Although there is still time before I have to turn in the schedule. What would all of you think?

(PS -- yes, I know about the approach of having each talk given twice randomly, to help avoid conflicts -- nice in theory, not do-able in practice. :) )

Monday, March 23, 2009

SIGACT elections

SIGACT elections are coming up. Online voting starts April 1. It would seem very important to remind you that election day is coming up, except that apparently you'll have a couple of months to vote. (I admit I'm a bit confused about why the election site would "go live" three separate times, but that probably won't effect anything.)

I was surprised to see that we're actually having an election this year, in that there's more than 1 person running for chair, and many people running for Member at Large. (See all the various SIG slates here.) It's good to see that people are interested in serving, and I think it reflects a healthy community.

It would be fun to say I'm surprised to see that I'm running (for Member at Large), but it comes as no surprise. I did indeed volunteer myself. I'm not planning on campaigning other than mentioning it here. (It's not really the sort of job one campaigns for, is it?) There are seven people running, and it's an excellent list -- the others running are Cynthia Dwork, Anna Lysyanskaya, Toniann Pitassi, Michael Saks, Baruch Schieber, and Alan Selman -- I think SIGACT will be well served by any subset. If you think I'd be a good Member at Large, vote for me, and if not, vote for the people you think will do better.

As has been mentioned elsewhere, Madhu Sudan and Lance Fortnow are running for chair. Again, that's an excellent choice to have. Apparently, platforms will go up when the election site goes live, and I'll be curious to see what both of them have to say about possible plans for SIGACT in the future. I'd be happy to host a debate here, on the blog, though again, this isn't really a position where there's a big debate necessary. But that's a good issue for comments : what would you like to ask them? (I can always send them a list of questions/point them to the blog post, if only so they can see what is on people's minds.)

Friday, March 20, 2009

Midterm Blues

Yesterday, just before Harvard's spring break, I gave the midterm for my undergraduate class, and then we spent the afternoon grading it. Hopefully, students don't actually read my blog, so they won't be anxious and disappointed over break. Because on average, they didn't do well.

My midterms are commonly both difficult and long; most students don't finish, and historically the average has been around 65-70%. This year, the average was about 50%.

I'm looking for possible explanations. Here's what I've come up with so far.

1) The test was harder this year. There may be something to that -- some of the new questions I've never used before people did quite poorly on. On the other hand, I also think the class did statistically significantly worse on some of my old stand-by questions than in previous years. This may account for some of the discrepancy -- possibly the top students, who usually can do almost everything on the exam, might have obtained lower scores than usual because of this -- but I don't think it can explain this big a discrepancy.

2) I'm teaching worse this year. There may be something to that -- I am teaching two classes this semester and I do have a 9-month old to cope with -- but I doubt this is an explanation. (Obviously, I'm a bit biased here.) My class doesn't vary significantly from year to year -- same topics, same lecture notes this year as usual. Again, even if I think this had an effect, I don't think it can explain this big a discrepancy. (Similarly, I don't think the issue is the TAs I have this year -- they seem like a really great bunch.)

3) The students are doing less well this year. Unfortunately, here there's some evidence -- the average scores, on the whole, are noticeably lower this year than last year across all assignments so far. That could be variance in TA grading style, and the discrepancy on assignments is nowhere as large as it was with the midterm. But my thought is that perhaps this is an unfortunate flip side to class sizes increasing. The class enrollment has dropped from the mid-80s down to 80 -- but this is still over twice as many people as were taking it last year, and still my second largest class ever. Does larger class size (less self-selection) translate into lower averages?

4) Random chance. Seems unlikely.

5) External considerations outside my scope. Maybe more students are having to deal with other midterms the same day or week this year than usual. Small changes in the class schedule can have big effects this way, since many people have multiple classes in common.

Whatever the reason, it's a situation I'll have to pay some attention to. I may have to adjust the pace and style of the lectures, to better maximize what students learn. But for now, I've got the midterm blues. Yes, professors get them too.

Thursday, March 19, 2009

NSF Recovery Act

It's a good time to have proposals pending at the NSF. On the other hand, I suppose it will just be even more depressing for those who don't get funded this round.

There's also a chance your previously rejected proposals will get funded!

NSF also will consider proposals declined on or after October 1, 2008. The reversal of the decision to decline must be based on both the high quality of the reviews received on the initial submission and the lack of available funding at the time the original decision was made. The cognizant program officer will contact the institution when a reversal is being considered by NSF. Specific procedural information regarding this new process is available on the NSF Recovery website.

Anyone have any additional information to share on how (besides waiting) someone can get a rejected proposal reversed?

Tuesday, March 17, 2009

Would You Leave a Tenured Job?

As reported by Lance, Madhu Sudan is "leaving" MIT for Microsoft Research New England.

Of course, he's not really leaving his tenured position, but just going on leave. But it raises the question -- what would it take for someone (you?) to leave a tenured position? It does happen -- Micah Adler recently left U Mass to enjoy life as a serial entrepreneur -- but tenured positions are quite comfortable, and because of the apparent security tenure offers, it can be hard to leave a tenured position for anything except another one. Tenure is, in that sense, a double-edged sword; you have a position for life, but only if you stay where you are.

Mind you, there are other appealing aspects of academic life. And of course some less appealing ones. Perhaps part of the reason so few people ever give up tenure is that professor positions tend to attract people who enjoy being professors, who find the positives outweigh the negatives, and so they're not attracted to other possible positions or pursuits.

Sunday, March 15, 2009

Classes

Richard Lipton ends a recent blog post with the question:

My question today is how can we better educate students to solve problems, to apply knowledge across boundaries, and in general be better problem solvers? What do you think?

When talking with a visiting prospective graduate student, we discuss the fact that our requirements include taking 10 classes (recently reduced from 12). The student seemed to think that's a very large number of classes to have to take.

I'm a big believer in graduate students taking classes. I'm obviously biased having had to suffer through many requirements as a graduate student at Berkeley. And I freely admit, some of the classes felt like suffering.

But the classes provided me with a strong basis to build on. I found classes good for introducing me to the basics of an area, so that I could understand the problems and communicate with people in that area. Classes also introduced me a much larger group of people, working in a larger variety of areas, than I would have experienced on my own.

And while here I'm certainly talking here about theory people getting to know about things outside of theory (as well as systems people getting to know some theory and AI and other things as well), I think the argument holds as well solely for the increasingly balkanized theory community just knowing about what each other is doing. These days, there seem to be boundaries between different areas of theory that are as high as boundaries between theory and systems. (See, for example, this recent post and comments on "European theory".)

I've heard the arguments against classes. The biggest is that they take time away from research. There are plenty of counter-arguments. (Most students don't really spend 40+ hours a week on research. It's important to train for the long-term future career and not just focus on research. There are important other benefits of classes -- meeting people and being able, if needed, to teach in multiple areas.) Other complaints focus on specific implementations. (Why should I have to take a graduate OS class if I took one as an undergraduate? Fine, let's find something else worthwhile for you to take.)

I think one reason for the trend in reducing or even eliminating class requirements is students generally don't like taking classes, especially in areas not directly related to their research. Why should they? It's work they have to do right now, and the payoff is generally long-term. As educators, we should find ways to make this tradeoff better when possible -- but more importantly, as educators, we have to explain clearly to the students that this type of education really is in their long-term interest.

To my mind, classes (and class requirements) are still a powerful tool for educating students on how to solve problems, including across boundaries. The trend away from classes, which should include both "core knowledge" for computer science as well as opportunities (or even requirements) to explore connections between CS and other fields, does not benefit us.

Friday, March 13, 2009

Thinking Big (Memory)

I gave one of my standard programming assignments this year -- finding minimum spanning trees on big random graphs. It seems for many students this is the first time they've experienced the possibility of not having enough memory using a straightforward implementation; they actually have to think about memory issues.

Year by year I've slowly increased the size of the graphs I require from students. This year, I think I left it too small; some students got away without having to think too hard about the memory issues. I'll have to up it for next year. Part of the problem is that I leave implementation details -- like what language they program in -- up to them. My framework is that I care about them being able to get the results -- I don't care about the details like which language they use. On the other hand, this looseness leads to large variations in how much students have to worry about various things. Java can handle memory issues for you (at the expense of speed); your laptop might exhibit different performance (including whether or not it runs out of memory) than a campus account (which is where the code should eventually be able to run).

One issue that became big this year was students wanting to use Python. I tried to tell them it was a bad idea, but many students like the high-level language and the sleekness of the resulting code. Unfortunately, standard constructs in Python are memory hogs, causing crashes or impossible slowness. Many of the students who started in Python had to eventually rewrite their code in another language -- or failed to deal with graph sizes that the assignment asked for.

I didn't design the programming assignments in my algorithms class to teach students about programming. But I find these exercises really stretch some of their programming skills, making them think about questions (what's the best language to use? what's the bottleneck here -- time or memory? how can I debug/test my code effectively?) that they need to learn to think about sometime, and perhaps aren't dealt with sufficiently in introductory classes any more.

Wednesday, March 11, 2009

New Cuckoo Hashing Paper

Moni Naor nicely pointed me to a new preprint he has up on de-amortizing cuckoo hashing. (The paper is entitled De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results, by Arbitman, Naor, and Segev.)

The "starting point" of the paper was the technique for de-amortizing cuckoo hashing Adam Kirsch and I suggested in our 2007 Allerton paper, which involved using a content addressable memory as a queue for move operations when doing cuckoo hashing. (See the blog post I wrote way back when here.) Our work was experimental, as we didn't analyze the queueing process, which is based on the number of moves needed when doing an insertion.

Their new paper obtains provable results, showing that there a cuckoo hashing scheme that has constant worst-case insertion time (with high probability over any polynomial number of steps), by utilizing a logarithmic-sized spare space (to queue pending moves, and stash items that can't be placed in the table, a la the Kirsch/Mitzenmacher/Wieder stash technique described here). They even show that only poly-logarithmic independent hash functions suffice, using a new result from Braverman. The only big caveat is that they analyze only the case of 2-choice cuckoo hashing, where the structure of the underlying random graph related to the cuckoo hashing process is better understood. Because of this, the result only allows loads of up to 50%.

It's a very, very nice result with a lot of theoretical pieces to put together. The use of the Braverman result appears to be a general technique for lessening the randomness requirements of these sorts of problems that is interesting in its own right, never mind the useful advance in the theory of cuckoo hashing. I was also excited to see that they used simulations to enhance their theoretical results (for example, providing a conjecture on how small a constant number of moves would be required at each step -- apparently 3 suffices, a quite reasonable number!) and show off how practical the scheme is. Imagine, a theoretical paper with simulations! (Hooray!)

Overall I think the paper is a strong step forward in demonstrating the practical applicability of cuckoo hashing, while at the same time providing strong formal theoretical guarantees. I still believe cuckoo hashing (along with multiple-choice hashing) will become a (the?) standard approach for hashing in the years to come, so of course I'm excited to see results like this. There remain plenty of further open questions -- more than 2 choices? tight bounds on queue sizes? -- so don't worry, there are still interesting things left to do.

Tuesday, March 10, 2009

Liskov wins Turing Award

Here's the news from MIT.

Friday, March 06, 2009

STOC Registration server up

I'm told you can now register for STOC 2009. Please sign up! (Early date is April 28.)

Other STOC work behind the scenes at this point involves getting the ancillary stuff ready for the publisher (not a particularly pleasant part of the job) -- including the paper schedule. I hope to have a preliminary schedule up next week (or the week after) to help people plan their trips.

Thursday, March 05, 2009

A Note to Networking/Systems People

I am sure it sometimes seems on this blog that I'm "critical" of the theory community. I am! Because constructive criticism is, I think, a useful thing.

Today, I unleash some criticism of the networking community. This complaint is based on my own experience with my papers, as well as experience I have on the other side (as a PC member), on how for example data structure papers are thought of when submitted to these communities. Mind you, this criticism isn't universal -- several people seem willing to look at and accept more theoretical papers in these communities. But not everyone...

(Still, keep in mind, I'm taking an extreme point of view here, to make a point, which I understand may inspire some interesting comments.)

When a paper comes in to a systems/networks where the main contribution is a general data structure (in my papers, this has often been a hash table construction) that naturally could apply to various networking problems (e.g., hash tables are commonly used in MANY router-level applications), and the main point of the paper is how to better make these data structures work for these applications (i.e., we consider reducing accesses to slow memory, trying to get it down from say 2 to 1, or consider pin count/hardware parallelization tradeoffs), THIS IS A NETWORKING (OR SYSTEMS) PAPER. I often hear the argument "This is a general data structure, it belongs in a theory conference," and this is just wrong, for multiple reasons.

1) Please don't try to make generality a negative. I know in systems papers you are accustomed to seeing detailed results on a single application (and when a data structure paper also includes that, it can also be a very good thing). But generality itself is a good thing, not a bad thing. (For SIGCOMM this year, one of the review fields is "longevity" -- sometimes these papers aim to have an expected long lifetime, rather than focusing on the short-term impact of a particular application).
2) In the same vein, don't dismiss a paper just because in it's evaluation, it judges the data structure on raw performance (time per insertion/deletion/lookup in a hash table) and/or on simulated data. Yes, I know, by measuring performance at the data structure level instead of the application level we haven't shown this is better on "any particular" application -- but we have tried instead to show it's better for a wide range of present and future applications! (If we haven't demonstrated this effectively, then it makes more sense to reject.) Similarly, in many settings (e.g., hashing), all data is artificial. You may have to pick a good enough hash function for your data to look random (see e.g. Mitzenmacher/Vadhan), but then it doesn't matter what your initial data was.

The first two points can also be summarized the following way: your conference has page limits. We've decided to focus on using the pages available to show off the general, widely encompassing benefits of our approach; don't dismiss us just because we haven't dug into specific implementation details for a particular application (which can take pages to do, and months to program/gather data on, and really doesn't add nearly as much to the paper as what we've put in).

3) This type of work doesn't go to a theory conference: often, in such papers, there is little of "theoretical" interest -- these are engineering papers, not papers with new theoretical techniques or technically challenging theoretical proofs (when we have proofs in this type of paper). Theory people will wonder why we're caring about these details of the hardware/software/data model -- it's only interesting in the context of your applications.
4) Indeed, the point of writing such a paper is that we are looking (or trying to look) precisely at the application and systems issues. This paper is meant for you! In fact, sometimes on the theory side it seems the only way we can get you to notice a new and powerful technique is to write papers for your community, so please try to let us in -- it will speed up adoption of these new, powerful techniques, helping your community overall. (My examples are Bloom filters, d-left hashing, etc.)

One worry seems to be that you'll accept a paper that turns out not to have the payout you'd have expected. That will happen! (By the way, it's also happening with all the other less theoretical papers you're accepting now...) But you need to have a pipeline of these ideas coming in -- who knows where the next basic data structure idea that will have a lot of applications (e.g. a Bloom filter) will come from! I worry you're cutting off your pipeline.

I understand that these papers will face the same high bar that everyone else does to get into say SIGCOMM; that's not my complaint. But in multiple settings I've seen such papers essentially dismissed out of hand, which seems unfortunate. I often hear (and even promote!) the idea that theory people could be working on more practical things. When we do try to meet you halfway, leave the door open a little for us...

Wednesday, March 04, 2009

Journal Version : Two Choices + Bloom Filters

A paper that's been sitting around in the pipeline far too long is finally out in its journal version.

Using the Power of Two Choices to Improve Bloom Filters, by Lumetta/Mitzenmacher.

This paper pretty much came about from a dare. I was visiting my friend Steve Lumetta while at Allerton, and as he was ribbing me about all my papers being on either Two Choices or Bloom filters, he suggested I should be able to combine the two in one paper. It took a few minutes for me to work out that he wasn't entirely teasing me, he actually had an idea.

The basic idea is the following: instead of having one group of k hash functions we use to insert an item into a Bloom filter, we have two groups of hash functions. We insert items sequentially into the Bloom filter, but we use our power of two choices as follows: when we insert an item, we use the group of k hash functions that sets the smaller number of bits to 1 (since fewer 1s means fewer false positives). When checking if an item in the Bloom filter, we have to see if all corresponding bits are set to 1 for either group of k hash functions, since we don't know which group we used initially for each item.

It turns out you can analyze this scheme, and somewhat surprisingly to me, there are settings where it leads to improved false positive rates. That is, the gain you get from putting in fewer ones actually more than makes up for the fact that now you have to check two groups of k hash locations. We also explore even better schemes (based on better choosing which group to use for each item) that we can't analyze.

I admit I took this as a funny mathematical curiosity more than a practical scheme. And I couldn't resist the title. (If you know my work, it really does sound like a joke.) But someone saw the paper and ran with the idea: the follow-on SIGMETRICS paper Building high accuracy bloom filters using partitioned hashing gives a potentially practical variation. (It's not clear to me when you'd use this variation over a perfect-hash-function-based Bloom filter; I suppose their variation is more amenable to adding a small number of additional items on the fly.)

My hope is perhaps it will inspire other non-obvious valuable uses of the power of two choices.

Tuesday, March 03, 2009

Committee Games, Playing by the Rules, and Double-Blind

There's a phenomenon I've noticed in committee meetings. To avoid any issues related to discussing PCs, I'll discuss it in relation to a thesis prize committee I've served on multiple times at Harvard. It's very much like a PC -- usually about a 1/3 of the theses nominated receive the prize.

We're given a bunch of senior theses to read, and asked the score them on a 1-5 scale, with 3 being the nominal desired average. Obviously, it's very hard to give any of these theses a 1 or even a 2 -- faculty are generally good about being selective about nominating a thesis for this prize -- but if you want to give a 3 average, you have to make tough decisions.

At the meeting, your average on your theses is compared to the average of the other reviewers on your theses, and your scores are re-scaled accordingly. So if you worked to maintain a 3 average, but the other reviewers on average gave your theses a 3.5, those theses you read would get a higher "scaled score". Similarly, if you gave your theses a 4 average, they would get a lower scaled score.

Now, even with the scaled scores, I find in these meetings that when someone has given a much higher average than their peers, their scores are discounted even further. Perhaps they couldn't manage to be tough and give a low score to some theses, and they thought they were being nice by giving high scores (which the students never see) to everyone, but in the end it often makes things worse for the theses they've read. There seems to be a higher-level phenomenon going on here, where committee members who don't play by the rules have their theses suffer some extra punishment. Unconsciously or consciously, the mindset seems to be that if you couldn't be bothered to make the tough decisions before the meeting, why should your judgment be trusted as much as others who did?

There can also be a similar effect, where those who give scores close to a 3 average but find their theses overall have a much higher average get appear to have their theses get a slight bump even beyond the scaled score. The theses they read are sometimes rewarded for their playing by the rules.

[I should point out that I have no documented evidence of this effect. I could be imagining it all. I'm just relaying my perceived experience.]

I was originally going to bring this up as a point related to the 1-5 scale I used for STOC. The scale, I think, is naturally self-reinforcing; people who clearly go outside the rules (for whatever reason) by giving higher scores are easily seen, and unless evidence from other reviewers supports that they had exceptionally good papers, their papers will have no benefit -- in fact, they may even suffer a little.

But this also seems relevant to the discussion of what would happen if double-blind reviewing was introduced. People keep pointing out that, assuming papers were posted somewhere, double-blind reviewing wouldn't help anything. But I think it still could. PC members who tried to discuss papers making use of the authors' names would not be playing by the rules; my guess is there would be sufficient reaction from other PC members to squelch such behavior. Indeed, it may even be enough to discourage at least some people from going to look at who wrote the paper. While I'd understand if a PC member happened to know who wrote some of the papers based on their own personal knowledge, I admit I'd be troubled by a PC member who, knowing the rules were for a double-blind PC, felt the need to go outside the rules and find the authors for all of their papers. Would that cause me (and others) to discount their scores, to make up for the implicit bias they were bringing into the system? It would be interesting to find out.

[Side note: I'm greatly enjoying the debates on double-blind reviewing. And honestly, I can't say I think I know what the best answers to the underlying questions are. But I think it's great that the community is having an open, challenging discussion about conference reviewing processes; that's one of the things these blogs are for.]

Sunday, March 01, 2009

Double-blind reviewing

The issue of double-blind reviewing came up a number of times when I was blogging about the STOC PC. I had wanted to wait and write a well-thought out post on the subject, and happily, was beaten to it by Sorelle, who was written an excellent post outlining the various arguments and counter-arguments (with more in the comments). Sorelle has asked this conversation to continue, so I'm continuing it here.

My primary concern against double-blind reviewing is what restrictions is puts on author dissemination. I have limited experience with double-blind reviewing as an author, and my impression was that authors were not to disseminate their work in any public way when submitting to a conference with double-blind reviewing. Perhaps times have changed, or my memory/understanding was faulty, but such strict dissemination restrictions do not seem to be the default.

For SIGCOMM (which is double-blind) the important statements in the call seem to be:

"As an author, you are required to make a good faith effort to preserve the anonymity of your submission..."

but at the same time,

"The requirement for anonymity is not meant to extend beyond the submission process to the detriment of your research. In particular, you may circulate your submission among colleagues or discuss it on a mailing list if you see fit. "

I wish this second statement was clearer, but my understanding (and apparently the understanding of at least some others) is this wouldn't prevent an author from putting the paper on their web page or on arxiv or giving a talk somewhere about it.

If we have this understanding, we should move to double-blind reviewing. To try to limit myself to something new in the argument -- again, go read Sorelle's post and comments! as a baseline -- let me say that after serving as a PC chair, it's absolutely clear to me that biases based on the authors' names do regularly enter into the picture. Let me take a hypothetical situation. Three initial reviewers give bad reviews to a paper. Another PC member takes a look -- because they know the author -- and argues that they've heard a bit about the work, that the reviewers might not realize how this is tying together two areas (quantum and e-commerce auctions! -- again, that's hypothetical) in a novel way, and think an outside reviewer is in order.

At this point, bias has almost surely entered the picture -- unless you think with high probability the other PC member would have noticed this paper without the name on it. It's a limited and seemingly harmless entry of bias -- all that's being asked is someone else look at the paper! -- but what about all the other papers that aren't going to get a second (well, fourth) look because the "right person's" name isn't on the paper? In my mind, it's bias pure and simple. At this point, it seems to me, if you want to argue against double-blind reviewing you have to argue that the cost of this solution is outweighed by the actual benefit. (Hence my concern about the "costs" of limiting authors' ability to disseminate -- which for now I'm understanding is a less severe issue than I might have originally thought.)

And by the way, I think there are much more severe examples of bias that regularly come to play in PC meetings. Much of this readily admitted, by people who give the argument that we in fact should give more trust to "name" authors that unknown authors, since they have a potentially bigger loss of reputation to protect. (See comments at Sorelle's page for those that give this argument.) People presenting this argument are at least being open that in their mind they are considreing a cost-benefit analysis on the bias they admit is there. I don't think this argument holds up -- though we can leave that discussion for the comments if needed.

To wrap up, I should say that I personally don't recall a setting where I thought there was specific bias against female authors in a theory PC, which was an issue Sorelle brought up. I'm not saying it doesn't exist, but that unlike many other clear issues of bias that I have seen, I can't recall a specific one where I felt sexism was playing a role. Of course that doesn't change my opinion that double-blind reviewing, implemented suitably, would be of positive benefit for our major conferences. And if there is bias against female authors, an additional benefit is it would likely mitigate that as well.

My Biased Coin