October 6, 2005

Google: Organizing the World’s Information… and Loving It!

October 6, 2005 - Categories: -Uncategorized

The Google recruitment talk was given by John Abd-El-Malek (jam@google.com, abdelMAHLik). Other engineers were also around for the question and answer: Amit Agarwal, Tim James, Jon McAlister, Peter Szulczewski, Joel Zacaharias. There were two women from HR whose names I didn't catch.

Google's mission statement is to organize the world's information and make it universally accessible and useful. The presentation covered the following points:

- build systems for scalability

- harness the power of data

- innovating new applications

- managing fast-paced growth

- looking ahead to the future

Google works on a massive scale

Google faces the following challenges:

- hardware and networking: build a basic computing platform with low cost

- big distributed systems: create reliable systems from many individual machines

- algorithms, data structures: process data efficiently and flexibly

- machine learning, information retrieval: improve quality of search by analyzing lots of data

- user interfaces: design effective UI for search and other products

Large data set, simple structure. Key insight: Google works with large data sets that have a simple structure: web page repositories, query logs, status records from thousands of machines, source code control and software build records, and so on. These aren't stored in SQL databases because they're too large for DBMSes (terabytes of data!) and they don't need the full complexity of a DBMS.

Simple statistical analysis. Analyses of this data often tend to be simple. General statistical analysis often requires computing only a small number of statistics, then performing more complex operations using only those statistics. For example, if we're trying to find the most popular query, we don't need to check all the queries.

Data as a sequence of records. For commutative operations, record order is irrelevant (example: addition). For associative operations, aggregation order is irrelevant (example: finding the maximum). This allows you to write parallel programs that take advantage of Google's distributed computing power.

For example, consider a week of code submissions. This short program calculates the minute for one entry and emits an instruction to add one to the record for that minute. The emit statements are delivered to an aggregator, which then combines the results into a graph. (As you can see, we do have weekends.)
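
I didn't copy down the program from the slide, so here is a rough Python sketch of the same idea rather than Google's actual code. The names `process`, `aggregate`, and `run`, and the sample timestamps, are my own assumptions. Each record is handled on its own and emits "add one to this minute"; the aggregator only sums, so the records can be split across machines in any order.

```python
from collections import Counter
from datetime import datetime


def minute_of_week(timestamp: datetime) -> int:
    """Map a timestamp to its minute within the week (Monday 00:00 is minute 0)."""
    return timestamp.weekday() * 24 * 60 + timestamp.hour * 60 + timestamp.minute


def process(record: datetime):
    """Per-record step: emit an instruction to add one to that minute's count."""
    yield (minute_of_week(record), 1)


def aggregate(emitted) -> Counter:
    """Aggregator: sum the emitted values for each key."""
    totals = Counter()
    for key, value in emitted:
        totals[key] += value
    return totals


def run(records) -> Counter:
    """Process every record independently and aggregate whatever was emitted."""
    return aggregate(pair for record in records for pair in process(record))


# Hypothetical submission times, made up for illustration.
submissions = [
    datetime(2005, 10, 3, 9, 15),   # Monday 09:15
    datetime(2005, 10, 3, 9, 15),   # Monday 09:15 again
    datetime(2005, 10, 4, 14, 30),  # Tuesday 14:30
    datetime(2005, 10, 8, 11, 0),   # Saturday 11:00
]

whole = run(submissions)

# Addition is commutative and associative, so two "machines" can each take
# part of the data and their partial counts can be merged afterwards.
merged = run(submissions[:2]) + run(submissions[2:])
```

Here `merged` and `whole` hold the same counts, which is the property that lets this kind of job run in parallel.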

(Demo followed by a totally awesome video of query traffic represented as points of light on a map of the Earth.)

Harnessing the power of data

The conventional wisdom is that given an order of magnitude increase in computational power, you can solve previously impractical problems.

Google's insight: Given an order of magnitude increase in data, you can solve previously unsolvable problems!

It's not just about getting a more robust solution. Some methods that appear to fail with limited data work with much larger data sets.

Consider spelling correction. The old way was to use a lexicon/dictionary - 100k words. This allows you to suggest correction words that have a short edit distance from unrecognized words. What's the challenge? Proper names, which are rarely in lexicons. Example: Kofi Annan.
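
To make the "old way" concrete, here is a minimal sketch of lexicon-based suggestion using Levenshtein edit distance. The tiny `LEXICON`, the `suggest` function, and the `max_distance` cutoff are assumptions of mine for illustration; a real lexicon would have on the order of 100k words.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for a standard dictionary.
LEXICON = {"annal", "brine", "briny", "spears"}


def suggest(word: str, lexicon=LEXICON, max_distance: int = 2):
    """Return lexicon entries close to an unrecognized word, nearest first."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


print(suggest("briney"))  # ['brine', 'briny']
print(suggest("annan"))   # ['annal']
```

The second call shows the proper-name problem: "annan" is spelled correctly, but because it isn't in the lexicon, the corrector offers "annal" instead.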

The set of terms on web is much larger than standard lexicons and changes regularly. People misspell queries, even popular ones such as "britney spears". Dictionary-based spelling correction has problems with context.

(Points out funny contrast between britney spears and briney spears (asparagus).)

Solution? Use the web as a contextual lexicon. Find misspellings based on contextual usage on web. Build a probabilistic model of term spellings. Context is key.
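
The talk didn't go into how the model is built, so this is only a toy sketch of the idea, not Google's method. It uses raw word and word-pair counts from a made-up `CORPUS` as a stand-in for a probabilistic model, and it picks the candidate spelling seen most often after the previous word. The `edits1` and `correct` names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for "the web": a few made-up phrases.
CORPUS = (
    "new britney spears album . new britney spears video . "
    "watch britney spears live . pickled briny spears of asparagus ."
)

tokens = CORPUS.split()
UNIGRAMS = Counter(tokens)                  # how often each term appears
BIGRAMS = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair appears
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set:
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(previous_word: str, word: str) -> str:
    """Pick the known spelling that is most common after previous_word."""
    candidates = {c for c in ({word} | edits1(word)) if c in UNIGRAMS}
    if not candidates:
        return word  # nothing similar was seen in the corpus; leave it alone
    # Prefer the spelling seen most often in this context, then the most common overall.
    return max(candidates, key=lambda c: (BIGRAMS[(previous_word, c)], UNIGRAMS[c]))


print(correct("watch", "briney"))    # britney
print(correct("pickled", "briney"))  # briny
```

The same misspelling gets two different corrections depending on the word before it, which is the "context is key" point. And because the vocabulary comes from the corpus itself, names like "britney" are covered without anyone adding them to a dictionary.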

You can also find interesting patterns in data. For example, here are the most popular queries from the past few Januarys. (Points out Superbowl, points out one year when Janet Jackson and "superbowl halftime" topped the Google queries.)

Google Maps

Example: Google Maps. Revolutionary update because of its dynamic, clean rendering. Open API for developers.

Making it all work

Plenty of crazy hacks to make it work across browsers:

- Mozilla/Safari/Opera don't support vector markup. Draw driving directions on the server in a PNG image and overlay it.

- IE does not support alpha transparency in PNGs. Use a little-known ActiveX control that's enabled by default.

- Safari and Opera don't support parsing XML strings, so we wrote an XML parser in JavaScript (no joke).

The benefit of DHTML: Simple API

- Putting map on page requires only two lines of JavaScript:

- Initially designed to integrate

- Developers figured this out before we published API

http://www.scipionus.com/katrina.html . Wow. Leaving messages overlaid on a map. Good idea.

Automatic machine translation

Goal: Provide automatic high-quality translations of text between different languages. This enables all text data on the web to be accessible in any language, no matter what the language of the original text.

Approach: statistical machine translation. Build a statistical model of translation. Use decision theory to make optimal decisions. Sentence-by-sentence level.

Use pre-translated pairs of text to learn the parameters of a log-linear model.
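
The slides didn't spell out the formula, so the following is the standard log-linear formulation from the statistical machine translation literature, which I'm assuming is roughly what was meant. Here f is the source sentence, e is a candidate translation, the h_m are feature functions (such as a language model score), and the λ_m are the weights learned from the pre-translated pairs. All of these symbols are my own labels, not the speaker's.

```latex
% Log-linear model of translation probability (standard textbook form, assumed here).
\Pr(e \mid f) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                     {\sum_{e'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)}

% Decision rule: choose the translation with the highest probability.
% The denominator does not depend on e, so it drops out of the argmax.
\hat{e} = \arg\max_{e} \Pr(e \mid f) = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f)
```

The second line is the "use decision theory to make optimal decisions" step: for each sentence, pick the candidate translation with the highest weighted feature score.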

Throw statistics at the problem! BLEU% score: how closely a machine translation resembles a human translation. Outperformed other systems in Chinese-English and Arabic-English translation. Why Chinese and Arabic? They're very different from English. If we can do these languages, then it would be easier to do Spanish and French.

(Chart: BLEU% for Arabic-English translation as the amount of data for the language model is increased.) weblm: language model trained on 219B words of web data! Seems roughly logarithmic.

Google Desktop

Google Desktop APIs: Indexer, Query, Sidebar, Event API. More info at http://desktop.google.com/developerguide.html . (Oooh. Event API. What is the user doing? They've done the grunt-work of hooking into the different applications! Sweeeeeeet! Awesome! Awesome! They have an event stream already going!)


Show useful information, not distracting, make efficient use of space. Write a cool panel, and depending on the number of users: limited edition Google Desktop t-shirt, adwords, iPod nano, internship!

Some ideas: local traffic, calendar, eBay, iTunes, sports scores, quicklaunch, TV guide, random Google Video, webcam, SMS...

Google work environment

Small teams of 3-5 people, problems that matter, with freedom to explore their ideas. Access to enormous computational resources. 20% time to explore your own ideas.

Froogle, orkut, news, desktop: all 20% products.

Not just about search

- hardware, mechanical engineering

- networking, distributed systems, fault tolerance

- compilers, programming languages

- data structures, algorithms

- machine learning, statistics, IR, AI

- user interfaces

- product design

Not just about engineering

- product management

- product marketing

- finance

- technical sales

- tech program management

- staffing

- online sales and operations

Hiring all over the world

Great benefits

- flexible work environment

- fun atmosphere

- free gourmet meals

- on-site massage, doctor, concierge, and dry-cleaning

- and all those "standard" things

(Still need an apartment to sleep in, though.)

Looking at the future

Sampling of Google's product suite: Google Web Search, Adsense for Search, Google News, Blogger, Froogle, Gmail, Google Earth, Google Search Appliance, Google Toolbar...

help users organize information

Google Labs: personalized search, video, suggest, sets

looking ahead: always room for improvement

- better systems: improving scalability and performance, providing new infrastructure to build services on

- better relevance: improving which pages are presented to the user, giving user access to more/new information

- better products/services: new product directions to pursue

Come join the fun! http://google.com/jobs , collegejobs@google.com

Questions and answers

- How does Google make money off Orkut? We never worry about profit for a product. We make it first, and then we see if we can make money off it.

- Is there any reality to a Google online office? Can't comment on any rumors.

- How many people are you looking to hire? No specific number in mind. As many great, talented people as are out there.

- Server count? Can't answer that.

- Does the majority of Google's revenue come from licensing technologies? Revenue statements are largely open now that Google is a public company. Most of it comes from Adsense. Some revenue from Google Earth and Google Search Appliance.

- There are only some publications from Google Labs. Is that something encouraged within Google, or does it just happen? Very fine line betwe... we want competitive advantage also. We have opened up software. Historically we haven't been a huge research company.

- Where do you stand on privacy? "Don't be evil." You need to get special permission to go through query logs, for example.

- What about Linux and Mac versions of things like Google Desktop? We want to focus on what will give us the most impact. Cross-platform thing is 20%-time stuff. Most Googlers use Linux, so it's frustrating having to borrow someone else's computer to try things out.

- What about linkspam? 50-100 people working on linkspam. Matt Cutts is one of the Googlers working on this.

- What about corporate structure? I've heard Google's supposed to be very democratic. Teams themselves figure out what features should be added. We just meet and figure out what to do. Engineers have a lot of power. More motivation to work on things.

- How many engineers do you have? 3000+ engineers.

- Why do you help out Firefox? What do you have planned? Sometimes Google just does things to help make the Web a better place. Part of philosophy of not being evil.

- What about UI design? UI designers really help us a lot. For example, sidebar. UI designers helped us do that.

- Software engineering? We have design documents and we review them. Testing. 20% projects are an exception; rules are looser. For most projects, there are design documents, all the code is reviewed before it's submitted, unit tests are encouraged...

- What are you looking for? Well-rounded bright individuals. We want to be able to learn something from you. We want to make sure you're a solid recruit for Google. We want to make sure we keep learning something. Something that wows us. "Wow, this guy is sharp."

Update: Also blogged by Alvin Chin. Also: http://www.the-gadgetman.com/files/Google%20tech%20talk.mp3

Google recruitment talk: Impressions

October 6, 2005 - Categories: -Uncategorized

Google is, of course, t3h k3wl. In fact, working at Google is probably cooler than studying at MIT, in terms of geek status. ;) This recruitment talk wasn't about convincing U of T students how cool Google is. That would've been preaching to the choir. Rather, the talk was about some of the interesting challenges people might get to work on at Google. This should help students think about their projects and their resumes...

I was a bit disappointed that there weren't any female engineers. The two women there were both from HR. They wore Google shirts with the second "o" replaced by the sign for woman, and that's something I want to think about further. I talked to one of the women after the presentation. She said that there was supposed to be one, but she got pulled into a project at the last minute. They do try to pay attention to these things, though, and occasionally have all-female events.

I confess. I loiter near the front during post-talk mingling not because I have burning questions to ask, but because I like eavesdropping on other people's questions. I learn a lot from other people's concerns. For example, like students around the world, U of T students are worried about their GPA and whether their grades will affect their admissions. They want to know what companies are looking for. They want to know about where the company's going. The usual HR stuff. I like watching out for the unusual questions, like the way someone asked "So, important question: vi or emacs?" (Wish I knew who asked that one!) And the person who asked about Python. Interesting.

Anyway, getting back to Google. Google's interesting. Here'd be my strategies for getting in:

- Resume, traditional job application? Right now? No way. I won't stand out in the crowd.

- Internship? International student; fat chance.

- Extracurricular projects? Promising. If I want to get into this stuff, it's a good time to learn AJAX and figure out how to use the Google APIs. Google Desktop looks _really_ interesting and it's right up my personal info/knowledge management alley, but it's Microsoft Windows-based. (That's another option, though; get something running on Linux...)

So if I want to boost my chances for next year's job application cycle, I should work on a project. Come to think of it, anyone can do that from anywhere in the world—so don't lose hope, people back home! =)

Next question. Do I want to work at Google?

I didn't need to see this presentation to know that Google is totally cool. It's every geek's dream company. Imagine hanging out with incredibly brilliant geeks, working on great projects, eating nice (and free!) food, and enjoying all the computing power you can throw at a problem.

Does it fit what I want to do?

Well, if I get in, it will certainly push me in terms of technical skills. I'll learn a _lot._ But I don't just want to work on my technical skills... I don't think I know enough about Google yet to like them immensely.

It's nice that Google matches employee donations, and it's great that they've got a motto of "Don't be evil." I need to learn more about them and how they might fit into my personal mission statement, though... I think I need a lot more user contact, a lot more involvement in people's lives.

And hah! yes, ego comes into it too. I want people to know me. Not just the systems I build, but to know _me_, and I want to know them not just as statistics but as people too. As much as I'm glad that those Googlers can keep Google running and can develop all sorts of cool new systems, they're still anonymous to me and to the millions of people who use Google without thinking.

There you go. I've confessed it. I'm egotistic. I want people to know me and I want to know them. I want to be within talking distance of users.

Is that something Google can let me do? I don't know. We'll see.

Ack! I can't believe I feel uncertainty about _the_ geek company of our time!

Does this mean I'm getting less geeky?

Oh no!

Geek girl T-shirts

October 6, 2005 - Categories: women

The two women from HR wore Google Women's Tees. From the website: "We originally designed this shirt for our efforts in recruiting women engineers." Seeing the shirt on them made me think about my geekwear, and why I found the Google Women's Tee a bit strange.

I like wearing tech shirts. They're a great way to identify myself to other people. They make it easier for geeks to talk to me. They provide instant conversation starters for people in the know.

I'm still not used to the Venus symbol, though, and that's probably because I think of the symbol in different contexts. It feels too serious for me. I guess I'm also more used to the "girl" aspect of my identity than I am to the "woman" aspect. That's why I self-identify as "geek girl".

Maybe it's a socialization thing. I'm more used to subtle gender signs, like the "geekette" in my signature. I like wearing baby tees with the same logos as the regular shirts. The logo connects me to other geeks, but the slightly more flattering cut makes a small difference.

Ah. That's probably it. I want my geekwear to connect me with other geeks, which is why I'd go for something generic like "emacs" over something like "geek. girl. goddess." I'd wear "emacs girl" if I want to point out that yes, I can _too_ be a girl _and_ be into Emacs, but I prefer focusing on what I have in common with other geeks.

This doesn't mean the T-shirts are bad, though. It just means I'd be more comfortable in a plain black Google women's T-shirt than in a Google Women's Tee.

It's pretty much a moot point, anyway, as they only had white long-sleeved men's style shirts earlier, and they ran out before I could get one. The swag would've been nice, but it wasn't essential. I learned enough from the conversations and the talk itself to make the time worthwhile. <laugh> I can understand why they probably wouldn't bring women's tees to a mixed talk. Still, I'm endlessly appreciative of conferences and tech sessions that actually have baby tees, like the totally cool open source conference I spoke at in Cebu and the blogging summit I attended in Manila right before I left. I left the blogging shirt at home, but I love my open source baby tee to pieces.

Ah, the trouble with being a geek girl in a guy's world... Swag rarely fits.

More about looking for geek role models

October 6, 2005 - Categories: geek

Michael Olson wrote:

I hope Google does a better job with your tech talk than they did at Purdue. A few things rankled me. Like bringing in an equal number of men and women to the talk, but no computer scientists among the women, just recruiters. All of the men, by contrast, were computer scientists.

You know, he has an interesting point there.

Research blog

October 6, 2005 - Categories: -Uncategorized

Yes, yes, yet another blog. http://blogs.imedia.mie.utoronto.ca/sacha/research/ will store my research notes. Really. Promise. Well, at least my research notes will stay there for maybe a week...

This replaces the boring one at http://blogs.mie.utoronto.ca . WordPress is so much cooler than Roller.

I need a little bit more organization than WordPress can give me, so I'll also be organizing http://blogs.imedia.mie.utoronto.ca/sacha/wiki/ sometime. If I can figure out how to properly blog on pmwiki, then I'll switch to that instead.

Tips for talking to other people

October 6, 2005 - Categories: -Uncategorized

I met Jessie at the Graduate Students Initiative lunch yesterday. She's a first-year grad student taking up a master's degree in chemical engineering while her husband takes an MBA. (Wow, that's tough!) She moved here around two months ago too, and is getting used to learning in English. We talked about how difficult it was to start conversations. I e-mailed her these tips afterwards. =)

- Take advantage of common ground. At a graduate student lunch, you know that everyone's a graduate student, so you can ask people the usual questions: What program are you taking? When did you start? Why University of Toronto? Do you have any tips for other grad students? If you're at the International Student Centre, ask about where people are from, when they moved here, what they learned while moving... In a club? Ask about how people got interested in the club and how the activities have been so far. =)

- Take advantage of the fact that you're new to Toronto. Ask about winter. Ask about places to shop or eat cheaply. Ask about things you're curious about. Most people love helping other people figure things out. It's a great way to get people into a conversation.

- Read the newspaper. If you don't have time, just read the headlines and the editorials. This'll give you plenty of stuff to talk about.

- Don't worry if people don't seem friendly. Maybe they're just having a bad day. When talking to someone, you can figure out if they're interested in talking to you or if they just want to be by themselves. If they smile, explain, and ask you questions, then even if you don't start off with any common interests, you're bound to find something interesting. On the other hand, if they sound distracted or they answer with very short sentences ("No. Yes. Fine."), maybe it's just not a good time to talk to them. Smile, thank them for their time, and move on.


Tips for time management

October 6, 2005 - Categories: emacs

Jessie and I also talked about the challenges of balancing the demands of research, studies, teaching, and life. She wanted to do an hour of exercise a week, but simply couldn't find the time for it. She felt overwhelmed with the things she needed to do.

I want to help her figure out how to gain control of her time. =) I sent her these tips to help her get started.

- Keep track of your time. For one week, write down everything that you do and how long it takes you to do it. You'll get an idea of where you're spending too much time and what you're not spending enough time on.

- Think about your priorities. What do you want to do with your life? Start from that and plan what you want and need to do this week. Schedule time in to work on things that are important to you. Then you can go through each day knowing that you've not only worked on the things that other people need you to do, but also the things that you want to do.

- Make the most of your time. Is whatever you're doing something you really need to do? Can you invest a little time in the beginning to save more time later on?

I have a spare academic planner that I'm no longer using because I have my own system for keeping track of my time. I'm thinking of giving it to her because I'm not using it anyway. =)

I'm also thinking of doing D*I*Y planner templates to help people do that kind of time analysis...