Wednesday, December 30, 2009

New Year's Again

It's time for the annual New Year posting. Last year, I talked about the potential power of affirmations -- in the sense of thinking about concrete goals you hope to accomplish for the year. This year, I'll stick with the same theme, but be a bit more specific, with advice aimed at graduate students in particular.

Now is a good time for graduate students to ask themselves some important questions:

1) If you're planning on graduating this coming semester (or the end of summer), what do you need to do to get there? Do you have a timeline -- with a few extra weeks of padding for unforeseen delays or writer's block? Are you on top of your job search? What can you do to make the thesis and job search go more smoothly? (Hint -- how can you make things easier on your advisor, faculty readers, letter-writers, etc.?)

It's a challenging and scary time -- but you can and will do it, and graduating will feel good. Very, very good.

2) If you're not graduating this year, are you ready to commit to graduating next year? Now is a great time to talk to your advisor and plan what needs to be done to get you out the door within 18 months. (Almost surely your advisor will be thrilled to see such initiative -- even if they don't think you're ready...)

There's an old saying that there are two types of students -- those who are 6 months from graduating and those who are 2+ years from graduating. The point is that it's a matter of mindset -- when you start thinking you're ready to graduate, you'll start moving toward graduating. Are you ready to start thinking that way?

3) If you're early on in the process, what can you do to make this year a good one? Perhaps you can start a collaboration with someone somewhat outside your area -- generally a good experience -- through a class project this semester or by just talking to people about what they're up to. Figure out ways you can talk to people besides your advisor, like going to seminars or serving as a "grad student host" for visitors. Also, now is the time to be applying for summer internships.

4) Finally, if you've found yourself struggling with graduate school, and wondering if you've made the right choice, now is the time for some careful thinking. Think about what you want, possibly talk about it with friends and colleagues -- and then talk to your advisor. Maybe you and your advisor can come up with a plan to make things better. Or maybe you're better off leaving (with a useful Master's Degree), and now is the right time to look for jobs and finish off remaining projects before the end of the academic year. It can be much better to ponder this all now rather than wait until summer nears and realize your options are limited.

Whatever you're up to, I wish you all a happy, healthy, successful 2010.

Wednesday, December 23, 2009

New Result : Tight Asymptotic Bounds for the Deletion Channel with Small Deletion Probabilities

Posting will be light over winter break, in part because so little is going on, but more because I'm busy working on papers for upcoming deadlines. I'll describe a fun little nugget, which is being submitted to ISIT, joint work with Adam Kalai and Madhu Sudan. (The starting point for the result was my giving my survey talk on deletion channels at MSRNE... always nice when something like that works out!) The goal was to get better bounds on the capacity of the deletion channel. (If you haven't been reading the blog long enough: In a binary deletion channel, n bits are sent, and the channel deletes each bit independently with probability p. So, for example, the message sent might be 00110011 and the received message could be 010011 if the 2nd and 4th bits were deleted.) It was known that for deletion probability p the capacity was at least 1-H(p). We show an essentially tight upper bound of 1-(1-o(1))H(p), where the o(1) term goes to 0 as p goes to 0. Here's the draft (a full two weeks before the submission deadline!).
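
If it helps to see the channel in code, here is a minimal C sketch of a single channel use; the deletion probability and random seed below are arbitrary illustrative choices, not anything from the paper.

/* Minimal sketch: one use of the binary deletion channel.
   Each transmitted bit is deleted independently with probability p;
   the receiver sees only the surviving subsequence, with no markers
   showing where the deletions occurred. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *sent = "00110011";   /* the example string from the post */
    double p = 0.1;                  /* deletion probability (illustrative value) */
    srand(2010);                     /* arbitrary seed */
    printf("sent:     %s\n", sent);
    printf("received: ");
    for (const char *c = sent; *c != '\0'; c++) {
        if ((double)rand() / RAND_MAX >= p)   /* bit survives with probability 1-p */
            putchar(*c);
    }
    printf("\n");
    return 0;
}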

In English, the binary deletion channel looks very much like a standard binary symmetric error channel when p is small. This is not the case when p is larger. (Here's a link to my survey on deletion channels and related channels for more info.)

Here's an attempt at describing the intuition. Let's first look back at the error channel. Suppose we had a code of N codewords, each of n bits, and a perfect decoder for up to pn errors. Then here's a funny way I could store data -- instead of storing n bits directly, I could store a codeword with pn errors that I introduce into it. To get back my data, I decode. Notice that when I decode, I automatically also determine the locations where the errors were introduced. This gives me N*{n choose pn} \approx N2^{nH(p)} possibilities, each of which I can use to represent a different data sequence. Since I'm only storing n bits, I had better have N2^{nH(p)} <= 2^n, or else I've found a way to store more than n bits of data in n bits. So (log N)/n, or the rate, satisfies (log N)/n <= 1-H(p). This is a different way of thinking about the Shannon upper bound on capacity. Of course, it's sweeping away details -- like what if you don't have a perfect decoder -- but it gives the right sort of insight into the bound.

Now consider the deletion channel, and apply the same sort of reasoning. Suppose that we had a decoder for the deletion channel, and further, we had a method of determining which bits were deleted given the received string. Then we could use it to store data in the same way as above and obtain a similar 1-H(p) upper bound on the rate. Now we have to worry about the details -- like what to do when you don't have a perfect decoder. But more importantly, we have to show that, at least with non-trivial probability, you can use the decoder to guess which bits were deleted. (This is where we use the fact that p is going to 0.) The details work out.

Surprisingly, this doesn't seem to have been known before. The best (and only) previous upper bound I know of for this regime is the work by Fertonani and Duman, mentioned in this blog post. Their upper bound as p goes to 0 was of the form 1 - cp for some constant c, so it was different in kind.

Slowly but surely, the mysteries of the deletion channel become, well, less mysterious.

Friday, December 18, 2009

Text-book Algorithms at SODA (Guest Post, Mikkel Thorup)

Mikkel Thorup sent in the following guest post:

Text-book algorithms at SODA

This is a pitch for promoting good text-book algorithms at SODA. Erdos promoted book proofs, but book algorithms are in some sense far more important, in that they could end up being understood and used by every programmer with a degree in CS. This can yield a huge external impact, and I do think we do ourselves and the world a big favor by taking this work seriously. Instead of accepting papers on this theme (which would, incidentally, also be a great idea), perhaps the area could serve as the basis for a lighter afternoon entertainment session, providing cool stuff that one could take home and show students.

To me the greatest text-book algorithms work well in both theory and practice. They have a cool non-obvious idea that will impress the students, yet, once you get the idea, they are simple to understand and implement. Unfortunately, to get into a top conference, it is best if you also have 5-10 pages worth of complications. Typically the complications are not themselves that interesting, but they help make the paper look hard enough; otherwise some referees are bound to complain about a lack of meat (fat?).

Note that by insisting on complications, we narrow the contributors to the small pond of theoreticians. Simple efficient algorithms are sought by every smart practitioner, and it is no coincidence that many of the elegant algorithms theorists analyze were discovered outside theory. On the other hand, I do think theorists are the ones in the best position to develop great simple algorithms, thanks to our fundamental understanding, and I think we should celebrate it when it happens.

To be more clear, let me present a somewhat controversial example of what I consider a great text-book contribution which is buried in a paper [DHKP97] about very different issues. The example is for universal hashing (low collision probability) where the new scheme is simpler and much faster than the classic method.

Suppose we want universal hashing of w-bit keys to u-bit indices. The classic solution is to pick a prime p>2^w, and a random a in [p], and then use the hash function

h_a(x) = ((ax) mod p) mod 2^u --- math terminology.

The alternative from [DHKP97] is to let b be a random odd w-bit number and use

h_b(x) = ((bx) mod 2^w) div 2^(w-u) --- math terminology.

Proving that this is universal is a nice text-book exercise, using the fact that odd numbers are relatively prime to powers of two.

There may be no obvious advantage of one scheme over the other on an established theory model like the unit-cost RAM, but the difference is major in reality. Implementing the scheme from [DHKP97] is extremely simple with standard portable C-code. We exploit that C-multiplication (*) of unsigned w-bit numbers is done mod 2^w, and get

h_b(x) = (b*x) >> (w-u) --- C code.
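
For concreteness, here is a minimal self-contained C sketch of the scheme, with w = 64 and a table of 2^u entries; the particular odd constant below is an arbitrary choice for the demo, and in real use b should be drawn uniformly at random among the odd w-bit numbers.

/* Minimal sketch of the multiply-shift scheme from [DHKP97]:
   hash a w-bit key x to u bits as ((b*x) mod 2^w) div 2^(w-u), here with w = 64. */
#include <stdint.h>
#include <stdio.h>

static uint64_t b = 0x9E3779B97F4A7C15ULL;   /* any odd value works for the demo */

static inline uint64_t hash_u(uint64_t x, unsigned u) {
    /* uint64_t multiplication in C is already mod 2^64, so this is the whole scheme */
    return (b * x) >> (64 - u);
}

int main(void) {
    unsigned u = 10;   /* 2^10 = 1024 buckets */
    for (uint64_t x = 1; x <= 5; x++)
        printf("h(%llu) = %llu\n",
               (unsigned long long)x, (unsigned long long)hash_u(x, u));
    return 0;
}

Note that the entire inner loop is a single multiplication and a single shift.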

By comparison, the implementation of the classic scheme is problematic. One issue is that the mod operation is very slow, and has been so for more than 30 years. Already when Carter and Wegman introduced universal hashing at STOC'77, they were aware of the issue. They suggested using Mersenne primes (p=2^i-1), allowing us to bypass mod p with some faster bit-trick operations. Even using that, we still have the issue that the classic scheme requires us to compute ax exactly, and ax has more than 2w bits. Since w-bit multiplication is mod 2^w, we need 6 w-bit multiplications to compute ax in its full length, and that is even ignoring the issue of mod p. If 2w-bit multiplication is available, two multiplications suffice, but these are often more expensive than w-bit multiplications.
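
To illustrate just the Mersenne-prime part (and only that part -- computing the full-length product ax is a separate expense, as discussed above), here is a sketch with p = 2^61 - 1; the particular exponent is my own illustrative choice.

/* Sketch of reducing a 64-bit value mod the Mersenne prime p = 2^61 - 1
   without a division: since 2^61 = 1 (mod p), the high bits fold back
   onto the low bits. This handles only the "mod p" step of the classic scheme. */
#include <stdint.h>
#include <stdio.h>

#define P61 ((1ULL << 61) - 1)

static inline uint64_t mod_mersenne61(uint64_t x) {
    uint64_t y = (x & P61) + (x >> 61);   /* fold: x = hi*2^61 + lo, and 2^61 = 1 mod p */
    return (y >= P61) ? y - P61 : y;      /* at most one correction is needed */
}

int main(void) {
    uint64_t x = 0xFFFFFFFFFFFFFFFFULL;   /* 2^64 - 1; since 2^64 = 8 mod p, this prints 7 */
    printf("%llu mod (2^61 - 1) = %llu\n",
           (unsigned long long)x, (unsigned long long)mod_mersenne61(x));
    return 0;
}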

The impact of the scheme from [DHKP97] is big in that it unites theory and practice in what is probably the world's most common non-trivial inner loop. The classic prime-based scheme is so slow that practitioners have come up with all kinds of alternatives that are not even remotely universal, e.g., some combination of shifts and xors, hence with no entropy. The new scheme is faster than all these hacks, so now we can convince practitioners to use real universal hashing, often leading them to better, more robust results.

Is the above theory? It certainly involves a mathematical observation about the use of relative primality, and I like to think of algorithms as math with a mathematically well-defined impact on computing. To get a mathematically well-defined measure, we can, for example, look at how many operations are needed in C, which has been used for efficient portable code since 1972. A theoretical irritant is that we have a whole array of measures, e.g., depending on how we count 2w-bit multiplications and mod operations. However, the new scheme is clearly better: the variations only affect exactly how much better it is---some factor between 3 and 15.

It is a bit interesting to contrast the above situation with the more robust world of polytime approximation, where there is, say, a very well-defined difference between a worst-case factor of 2 and a factor of 3. Translating to reality, if the polytime algorithm is superquadratic, it is typically too slow to finish on large-scale problems. Moreover, one often gets much better results using simple heuristics with bad worst-case behavior.

For the hashing we are not just talking worst case but all cases (the same instructions are performed on all keys), and I have never tried a real computer on which the new scheme didn't gain at least a factor of 4 in speed compared with the classic scheme tuned with Mersenne primes. On top of that, the new scheme is much simpler to implement. While this difference is very convincing for anyone experienced with efficient programming, it may be a bit hard to appreciate for "THEORY of Computing" conferences like STOC/FOCS. However, I see algorithms and SODA more as a "Theory of COMPUTING" with a scope closer to the reality of computing, hence with a bigger interest in text-book algorithms that unite theory and practice. Highlighting such simple, but incredibly useful, practical computing algorithms would both increase the impact of SODA (and arguably of theory more generally) and provide a useful distinguishing characteristic for the conference.

[DHKP97] M. Dietzfelbinger, T. Hagerup, J. Katajainen, and M. Penttonen.
A reliable randomized algorithm for the closest-pair problem.
J. Algorithms, 25:19-51, 1997.

Wednesday, December 16, 2009

Avoiding Travel

I'm spending the morning as part of the committee for a thesis defense over in Europe. I'm watching the talk over Skype; we're using a conference call for sound (and as a backup); I have copies of the slides on my laptop.

Is it the same as being there? No. But it's probably 90+% as good in terms of the experience of listening to the defense. It saves me a full day of plane travel (never mind other time overhead), it saves the institution multiple thousands of dollars in air fare and other travel expenses, and if you happen to feel that flying has negative externalities due to greenhouse gas emissions, then it's good for other reasons as well.

If the timing had worked out better, I might have made the trip, and arranged to give some talks to amortize the "travel cost" over more than just the defense. But I'm glad to avoid the travel -- thank goodness for modern technology. And as much as I enjoyed the NSDI PC meeting Monday, if it hadn't been down the block at MIT, I would have enjoyed it much less. (Indeed, the location of the meeting was one of the incentives to accept the PC invitation.) I'm still waiting for the tech to get good enough that we can have an online PC meeting (with video, sound, etc., to mimic the discussions of a face-to-face meeting) that we don't have to travel to.

Monday, December 14, 2009

LiveBlogging NSDI PC Meeting

My post today is live-blogging the NSDI PC Meeting -- with a delay for security purposes, of course.

My take on the reviews (and from past experience) is that the NSDI PC is a very, very tough committee. People are looking for exciting and novel ideas, with clear and detailed experiments demonstrating real-world benefits (which usually means comparing against a real implementation from previous work). It's hard to get all that into one paper -- and to get everything so that the reviewers are all happy. And once in a while you can run into a reviewer like me, who expects your "good idea" to also have a suitable mathematical formulation when that makes sense. (If you're claiming to optimize something, I -- and others -- want a clear notion of what you're trying to optimize, and why your idea should help optimize it.)

So it's not surprising that, 4th paper in from the top, we've already hit our first paper where we're deferring our decision instead of accepting, and we're already getting some detailed discussions on whether a paper is good enough to accept. We'll have to speed up to make dinner....

As if to underscore that I did not have a great set of papers, I have just a few that I reviewed in our "DiscussFirst" pile, which takes us through lunch. Good thing I can keep busy with blog entries. And I have a review to write for another conference...

My submission appears (tentatively) accepted. Hooray! For this PC, we're not kicking people out of the room for conflicts -- people are just supposed to be keeping their mouths shut on papers where they have a conflict. For PC members with papers, however, you get kicked out of the room. So I've just spent a tense 15 minutes or so outside, but I'm happy to see the news once I'm back in. (More on this paper in another post....) Overall, I'd say (as expected) PC papers had no special treatment -- they were as harshly judged as all the other papers.

We're now having an interesting discussion about "experience" papers -- what do you learn after building/running a system after several years? A lot of people really think that having experience papers is a good idea, but there's some discussion of the bar -- what makes such papers interesting, and how interesting should they be? (Good anecdotes, but with quantifiable data to support the lessons.)

We're now about in the middle of the papers we're meant to discuss. Things here could go either way. Lots of technical discussion. As an aside, I can give my point of view on what the "hot topics" are. Data centers seem to be a big topic. There seemed to be a number of papers about scheduling/optimizing/choosing the right configuration in cloud computing systems -- how that could be done without making the programmer explicitly figure out what configuration to use (instead just giving hints, or having the tools figure it out automatically). There's also a significant number of EE-focused papers -- essentially, trying to get gains from some detailed, explicit look at the wireless signal, for example.

Headed to the end, more or less on schedule. Now we're assigning shepherds to all of the papers.

Sorry to say, I won't be revealing any information on specific papers -- everyone will find out when the "official" messages go out, or from their own connections on the PC...

I think the PC chairs have done a good job pushing us to keep on schedule; I think the discussions have been detailed and interesting. I think the committee is perhaps overly harsh (a target of 30 papers for about 175 submissions, or 17-18% acceptance; we ended up with 29). But I think we did a good job overall, and have earned our post-meeting dinner out.

Friday, December 11, 2009

Harvard Governance

Harry Lewis (and Fred Abernathy) take a stand against the Harvard Corporation. Required reading for anyone associated with Harvard, or interested in its current predicaments.

Thursday, December 10, 2009

News Items

The people at Microsoft Research New England are excited to announce that Boaz Barak will be joining them. I imagine the situation is similar to their hiring of Madhu Sudan. They continue to build up a remarkable collection of researchers.

Harvard's Allston complex is officially suspended.

JeffE doesn't post so much these days, so you might have missed this great post with his favorite useless theorem. (I hope this starts a series of "Favorite Useless Theorems" -- please send him, or me, examples of your own.)

Nela Rybowicz, staff senior editor for IEEE (with whom I've dealt for pretty much all of my Transactions on Information Theory papers), informed me that she'll soon be retiring. She'll be missed.

Wednesday, December 09, 2009

TSA Oops

You may have heard on the news that the TSA (temporarily) put up their Screening Management Standard Operating Procedure on the web. As pointed out on this blog (and elsewhere), they made a small error:

"So the decision to publish it on the Internet is probably a questionable one. On top of that, however, is where the real idiocy shines. They chose to publish a redacted version of the document, hiding all the super-important stuff from the public. But they apparently don’t understand how redaction works in the electronic document world. See, rather than actually removing the offending text from the document they just drew a black box on top of it. Turns out that PDF documents don’t really care about the black box like that and the actual content of the document is still in the file."

Oops. Apparently, somebody hasn't read Harry Lewis's book Blown to Bits; the start of Chapter 3 discusses several similar cases where somebody thought they had redacted something from a PDF file... but didn't.

Sunday, December 06, 2009

Faculty Ratios

Perhaps the most basic breakdown one can make of a college faculty is into the categories Natural Sciences / Social Sciences / Humanities. (Yes, one could naturally think of "engineering" as a fourth category separate from "natural sciences"; feel free to do so.) So here are some basic questions:

1. What is the faculty breakdown in your institution into these three categories?
2. Do you think that breakdown is appropriate?
3. What do you think the breakdown will be (or should be) ten years from now? Twenty years from now?
4. Do you think your administration is thinking in these terms? Do you think they have a plan to get there? (And if so, are they open about it -- is the faculty involved in these decisions?)

[These questions were inspired by a comment of Harry Lewis over at Shots in the Dark.]

Friday, December 04, 2009

Retirement

No, not for me. But Harvard has announced its plans to encourage faculty to retire. I won't call it an "early retirement" package, since it is for people over 65. Though I suppose that is early for academia.

Note that (according to the article) Harvard's Faculty of Arts and Sciences will make 127 offers to its 720 (junior and senior) faculty. So I conclude that (as of next year) about 1/6 of Harvard's faculty will be over 65. And the article states that the average age of tenured Harvard professors is 56. Can people from elsewhere tell me if this is unusual? Harvard has a reputation for having an older faculty; this seems to confirm it. Does this suggest a lack of planning somewhere along the line? Or is this a good thing?

I don't expect drastic changes arising from this plan; it will be interesting to see how many faculty take the offer. In general, however, is it a good idea for a university to "encourage" retirement for older faculty? And if so, what means should they use to do it?

Viewed as an optimization problem, one can ask what is the "best" distribution of faculty ages at a university, and what mechanisms could (and should) be used to maintain that distribution?

Thursday, December 03, 2009

Tight Thresholds for Cuckoo Hashing via XORSAT

As I promised sometime back, we now have a paper (up on the arxiv) giving the thresholds for cuckoo hashing, a problem that had been open and was part of my survey (pdf) on open problems in cuckoo hashing.

One interesting thing about our paper is that our (or at least "my") take is that, actually, the thresholds were right there, but people hadn't put all the pieces together. The argument is pretty easy to sketch without any actual equations.

In cuckoo hashing, we have n keys, m > n buckets, and each key is hashed to k possible buckets. The goal is to put each key into one of its k buckets, with each bucket holding at most one key. We can represent this as a random hypergraph, with each key being a hyperedge consisting of k buckets, represented by vertices; our goal is to "orient" each edge to one of its vertices. A natural first step is to repeatedly "peel off" any vertex that doesn't already have an edge oriented to it and has exactly one adjacent unoriented edge, and orient that edge toward it. At the end of this process, we're left with what is called the 2-core of the random hypergraph, in which every remaining vertex has at least 2 adjacent unoriented edges. Can we orient the remaining edges of the 2-core? In particular, we're interested in the threshold behavior: as m and n grow large, what initial ratio m/n is required for a suitable mapping of keys to buckets to exist with high probability? (This corresponds to the memory overhead of cuckoo hashing.)
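
Here is a small C sketch of the peeling step (the values of k, n, and m are arbitrary illustrative choices, and a real implementation would keep a queue of degree-one vertices rather than making repeated passes over the edges):

/* Minimal sketch: peel a random k-uniform hypergraph (keys = hyperedges,
   buckets = vertices) down to its 2-core, orienting each peeled edge toward
   the vertex that allowed it to be peeled. */
#include <stdio.h>
#include <stdlib.h>

#define K 3            /* bucket choices per key */
#define NKEYS 1000     /* n keys (hyperedges) */
#define MBUCKETS 1300  /* m buckets (vertices), m > n */

int edge[NKEYS][K];      /* buckets chosen for each key */
int deg[MBUCKETS];       /* number of unoriented edges touching each bucket */
int oriented_to[NKEYS];  /* bucket a peeled key was oriented to, or -1 */

int main(void) {
    srand(2009);
    for (int e = 0; e < NKEYS; e++) {
        oriented_to[e] = -1;
        for (int j = 0; j < K; j++) {
            edge[e][j] = rand() % MBUCKETS;
            deg[edge[e][j]]++;
        }
    }
    /* Repeatedly orient an edge toward a bucket whose only remaining
       unoriented edge it is; stop when no such bucket is left. */
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int e = 0; e < NKEYS; e++) {
            if (oriented_to[e] != -1) continue;
            for (int j = 0; j < K; j++) {
                if (deg[edge[e][j]] == 1) {   /* this bucket is wanted only by key e */
                    oriented_to[e] = edge[e][j];
                    for (int t = 0; t < K; t++) deg[edge[e][t]]--;
                    progress = 1;
                    break;
                }
            }
        }
    }
    int core = 0;
    for (int e = 0; e < NKEYS; e++) if (oriented_to[e] == -1) core++;
    printf("edges remaining in the 2-core: %d\n", core);
    return 0;
}

The interesting question is exactly what happens when this loop stops with edges left over -- that is, for which ratios m/n the leftover 2-core can still be oriented.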

Now consider the random k-XORSAT problem, where we have m variables and n clauses, with each clause randomly taken from the possible clauses
x_{a_1} + x_{a_2} + ... + x_{a_k} = b_a.
Here the x_{a_i} are distinct variables from the m variables, b_a is a random bit, and addition is modulo 2. The question is whether a random k-XORSAT problem has a solution. Again, we can put this problem in the language of random hypergraphs, with each clause being a hyperedge of k variables, represented by vertices. Again, we can start by orienting edges to vertices as we did with the cuckoo hashing representation; here, we orient a clause toward a variable when that variable is "free" to take on whatever value makes the clause satisfied. Is the remaining formula, represented by the 2-core, satisfiable?

The correspondence between the two problems is almost complete. To finish it off, we notice (well, Martin Dietzfelbinger and Rasmus Pagh noticed, in their ICALP paper last year) that if we have a solution to the k-XORSAT problem -- indeed, a solution for every choice of the right-hand side bits -- then there must be a valid assignment of keys to distinct buckets in the corresponding cuckoo hashing problem. This is because solvability for every right-hand side means the corresponding matrix for the graph has full row rank, so there must be a square submatrix with non-zero determinant, and hence somewhere in the expansion of that determinant there is a non-zero term -- which corresponds exactly to an appropriate mapping of keys to buckets.
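
To make the linear-algebra step concrete, here is a small C sketch that builds a random 3-uniform incidence matrix and computes its rank over GF(2) by Gaussian elimination; the dimensions are arbitrary illustrative choices, and collisions among a clause's three variable choices are simply ignored.

/* Minimal sketch: GF(2) row rank of an n x m clause/variable incidence matrix,
   with each row packed into a 64-bit word (so this assumes MVARS <= 64). */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NCLAUSES 40
#define MVARS 50

int main(void) {
    uint64_t row[NCLAUSES];
    srand(97);
    /* Build a random 3-XORSAT incidence matrix: up to 3 variables per clause. */
    for (int i = 0; i < NCLAUSES; i++) {
        row[i] = 0;
        for (int j = 0; j < 3; j++)
            row[i] |= 1ULL << (rand() % MVARS);
    }
    /* Gaussian elimination over GF(2): the rank is the number of pivots found. */
    int rank = 0;
    for (int col = 0; col < MVARS && rank < NCLAUSES; col++) {
        int pivot = -1;
        for (int i = rank; i < NCLAUSES; i++)
            if (row[i] & (1ULL << col)) { pivot = i; break; }
        if (pivot < 0) continue;
        uint64_t tmp = row[rank]; row[rank] = row[pivot]; row[pivot] = tmp;
        for (int i = 0; i < NCLAUSES; i++)
            if (i != rank && (row[i] & (1ULL << col)))
                row[i] ^= row[rank];
        rank++;
    }
    /* Full row rank (rank == NCLAUSES) means the system is solvable for every
       right-hand side, which is what drives the matching argument above. */
    printf("GF(2) row rank: %d of %d clauses\n", rank, NCLAUSES);
    return 0;
}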

And the threshold behavior of random k-XORSAT problems is known (using the second moment method -- see, for example, this paper by Dubois and Mandler.)

While in some sense it was disappointing that the result was actually right there all along, I was happy to nail down the threshold behavior. In order to add something more to the discussion, our paper does a few additional things.

First, we consider irregular cuckoo hashing (in the spirit of irregular low-density parity-check codes). Suppose that you allow keys to have, on average, 3.5 bucket choices. How should you distribute the number of choices across keys, and what's the threshold behavior? We answer these questions.

Second, what if you have buckets that can hold more than one key? We have a conjecture about the appropriate threshold, and provide evidence for it, using a new fast algorithm for assigning keys to buckets (faster than a matching algorithm).

I should point out that since I originally "announced" we had this result, two other papers have appeared that also nail down the cuckoo hashing threshold (Frieze/Melsted and Fountoulakis/Panagiotou). Each seems to take a somewhat different approach to the problem.