SpoonDrift: 2006-12

2006-12-31

WHY kill Doggy ?

Why kill dogs?
They are humanity's most faithful companions and have made so many outstanding contributions...

2006-12-29

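Thunderbird, Foxmail, Attachments

Attachments sent from my Thunderbird to Foxmail all arrived garbled.

Changing mail.strictly_mime.parm_folding from "2" to "0" in Thunderbird's configuration editor fixes it.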

2006-12-20

The Secret Source of Google's Power

Much is being written about Gmail, Google's free webmail system. There's something deeper to learn about Google from this product than the initial reaction to the product features, however. Ignore for a moment the observations about Google leapfrogging their competitors with more user value and a new feature or two. Or Google diversifying away from search into other applications; they've been doing that for a while. Or the privacy red herring.

No, the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building which makes it cheaper and easier for them to develop and run web-scale applications than anyone else.

I've written before about Google's snippet service, which required that they store the entire web in RAM. All so they could generate a slightly better page excerpt than other search engines.

Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. What is this platform that Google is building? It's a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.

Speculation: Gmail's Architecture and Economics

Let's make some guesses about how one might build a Gmail.

Hotmail has 60 million users. Gmail's design should be comparable, and should scale to 100 million users. It will only have to support a couple of million in the first year though.

The most obvious challenge is the storage. You can't lose people's email, and you don't want to ever be down, so data has to be replicated. RAID is no good; when a disk fails, a human needs to replace the bad disk, or there is risk of data loss if more disks fail. One imagines the old ENIAC technician running up and down the aisles of Google's data center with a shopping cart full of spare disk drives instead of vacuum tubes. RAID also requires more expensive hardware -- at least the hot swap drive trays. And RAID doesn't handle high availability at the server level anyway.

No. Google has 100,000 servers. If a server/disk dies, they leave it dead in the rack, to be reclaimed/replaced later. Hardware failures need to be instantly routed around by software.

Google has built their own distributed, fault-tolerant, petabyte filesystem, the Google File System (GFS). This is ideal for the job. Say GFS replicates user email in three places; if a disk or a server dies, GFS can automatically make a new copy from one of the remaining two. Compress the email for a 3:1 storage win, then store each user's email in three locations, and the raw storage need is approximately equal to the user's mail size.
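The storage arithmetic in that paragraph can be checked in a few lines. This is only a back-of-the-envelope sketch: the 3:1 compression ratio and the three replicas come from the text, while the function name and the 1 GB mailbox are made up for illustration.

```python
from fractions import Fraction

def raw_storage_gb(mail_gb, compression_ratio=3, replicas=3):
    """Raw disk needed for one user's mail after compression and replication."""
    compressed = Fraction(mail_gb) / compression_ratio
    return compressed * replicas

# A hypothetical 1 GB mailbox: 3:1 compression cancels out 3x replication.
print(raw_storage_gb(1))
```

With fewer replicas or a worse compression ratio the two factors no longer cancel, which is why the guess is only "approximately" the mail size.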

The Gmail servers wouldn't be top-heavy with lots of disk. They need the CPU for indexing and page view serving anyway. No fancy RAID card or hot-swap trays, just 1-2 disks per 1U server.

It's straightforward to spreadsheet out the economics of the service, taking into account average storage per user, cost of the servers, and monetization per user per year. Google apparently puts the operational cost of storage at $2 per gigabyte. My napkin math comes up with numbers in the same ballpark. I would assume the yearly monetized value of a webmail user to be in the $1-10 range.
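A toy version of that spreadsheet might look like the sketch below. The $2 per gigabyte figure and the $1-10 value range are from the text; the average storage figures are hypothetical, the text does not say what period the $2 covers (it is treated as yearly here purely for illustration), and server cost is left out entirely.

```python
def yearly_margin_per_user(avg_raw_gb, cost_per_gb=2.0, value_per_user=1.0):
    """Monetized value minus storage cost for one user, in dollars."""
    return value_per_user - avg_raw_gb * cost_per_gb

# avg_gb values are hypothetical; $2/GB and the $1-10 range are from the text.
for avg_gb in (0.1, 0.5, 1.0):
    low = yearly_margin_per_user(avg_gb, value_per_user=1.0)
    high = yearly_margin_per_user(avg_gb, value_per_user=10.0)
    print(f"{avg_gb:.1f} GB/user: margin ${low:.2f} to ${high:.2f}")
```

Under these assumptions a user worth only $1 a year stops paying for their own storage once they hold more than half a gigabyte.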

Cheap Hardware

Here's an anecdote to illustrate how far Google's cultural approach to hardware cost differs from the norm, and what that means as a component of their competitive advantage.

In a previous job I specified 40 moderately-priced servers to run a new internet search site we were developing. The ops team overrode me; they wanted 6 more expensive servers, since they said it would be easier to manage 6 machines than 40.

What this does is raise the cost of a CPU second. We had engineers that could imagine algorithms that would give marginally better search results, but if the algorithm was 10 times slower than the current code, ops would have to add 10X the number of machines to the datacenter. If you've already got $20 million invested in a modest collection of Suns, going 10X to run some fancier code is not an option.

Google has 100,000 servers.

Any sane ops person would rather go with a fancy $5000 server than a bare $500 motherboard plus disks sitting exposed on a tray. But that's a 10X difference to the cost of a CPU cycle. And this frees up the algorithm designers to invent better stuff.

Without cheap CPU cycles, the coders won't even consider algorithms that the Google guys are deploying. They're just too expensive to run.
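To make the 10X figure concrete, here is a rough sketch. The $5000 and $500 prices are from the text; the $1 million budget is hypothetical, and the sketch assumes (unrealistically) that every box delivers the same CPU throughput regardless of price.

```python
def servers_for_budget(budget, server_cost):
    """Number of whole servers a fixed budget buys."""
    return budget // server_cost

def affordable_slowdown(budget, cheap_cost, fancy_cost):
    """How many times more CPU the cheap option buys for the same money."""
    return servers_for_budget(budget, cheap_cost) / servers_for_budget(budget, fancy_cost)

budget = 1_000_000  # hypothetical budget in dollars
print(servers_for_budget(budget, 5000), "fancy servers")
print(servers_for_budget(budget, 500), "cheap servers")
print(affordable_slowdown(budget, 500, 5000), "x slower code is affordable")
```

In other words, on the cheap hardware an algorithm that runs ten times slower costs the same to operate as the fast one does on the expensive hardware.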

Google doesn't deploy bare motherboards on exposed trays anymore; they're on at least the fourth iteration of their cheap hardware platform. Google now has an institutional competence building and maintaining servers that cost a lot less than the servers everyone else is using. And they do it with fewer people.

Think of the little internal factory they must have to deploy servers, and the level of automation needed to run that many boxes. Either network boot or a production line to pre-install disk images. Servers that self-configure on boot to determine their network config and load the latest rev of the software they'll be running. Normal datacenter ops practices don't scale to what Google has.

What are all those OS Researchers doing at Google?

Rob Pike has gone to Google. Yes, that Rob Pike -- the OS researcher, the member of the original Unix team from Bell Labs. This guy isn't just some labs hood ornament; he writes code, lots of it. Big chunks of whole new operating systems like Plan 9.

Look at the depth of the research background of the Google employees in OS, networking, and distributed systems. Compiler Optimization. Thread migration. Distributed shared memory.

I'm a sucker for cool OS research. Browsing papers from Google employees about distributed systems, thread migration, network shared memory, and GFS makes me feel like a kid in Tomorrowland wondering when we're going to Mars. Wouldn't it be great, as an engineer, to have production versions of all this great research?

Google engineers do!

Competitive Advantage

Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

This computer is running the world's top search engine, a social networking service, a shopping price comparison engine, a new email service, and a local search/yellow pages engine. What will they do next with the world's biggest computer and most advanced operating system?

2006-12-19

Download Video From Youtube

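YouTube has lots of interesting videos, for entertainment or for learning. If I were an English teacher, I think some of them could serve as material for listening or speaking classes. But YouTube itself offers no download service, and it uses the FLV format, which has to be played with an FLV player (reportedly the latest version of Storm Player, 暴风影音, can play it too). Below we look at downloading YouTube videos, playing them locally, and converting the format.

Downloading YouTube videos:

Method 1: online service
Keepvid.com is a great site: just enter the URL of the YouTube video and it automatically extracts the FLV file for download, and it's fast too. Keepvid supports downloads from most popular video services, such as Google Video and China's Tudou.

Notes:
1. Sometimes the downloaded file is named get_video with no .flv extension, and you have to add the extension by hand (first make sure that "View -- Folder Options -- Hide extensions for known file types" is unchecked).
2. I've occasionally found that videos downloaded with keepvid have slightly wrong dimensions. Maybe it's just my bad luck.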
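The manual renaming in note 1 could also be scripted. The sketch below is one possible approach, not anything keepvid provides: it relies on the fact that FLV files begin with the three bytes "FLV", and the function name and folder argument are made up for illustration.

```python
import os

FLV_MAGIC = b"FLV"  # FLV files begin with these three bytes

def add_flv_extension(directory):
    """Append .flv to extension-less files in `directory` that look like FLV."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip folders and anything that already has an extension.
        if not os.path.isfile(path) or os.path.splitext(name)[1]:
            continue
        with open(path, "rb") as f:
            is_flv = f.read(3) == FLV_MAGIC
        new_path = path + ".flv"
        # Never overwrite an existing file of the same name.
        if is_flv and not os.path.exists(new_path):
            os.rename(path, new_path)
            renamed.append(new_path)
    return renamed
```

Calling add_flv_extension on the download folder would turn a file named get_video into get_video.flv and leave everything else alone.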

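Method 2: the YouTubio program
YouTubio is portable software (no installation needed) with a simple, rather crude interface. It works much like keepvid: enter the URL, press F5, and the download starts, yet it is surprisingly slower than keepvid.

Playing .FLV
Download FLV Player or Riva FLV Player.

Converting .FLV
Riva FLV Encoder can convert FLV files to more common formats such as avi, mpeg, and wmv. Riva FLV Encoder also bundles an FLV player, Riva FLV Player, but in terms of features, interface, and resource usage, FLV Player is somewhat better.

Basic use of Riva FLV Encoder:
1. Under input -- input video, open the FLV file to convert, or drag the file straight onto it
2. Under output -- output directory, choose the folder where the output file will be saved
3. Under output -- output video, set the output file's name and extension (e.g. avi, wmv)
4. [Optional] Adjust the output file's settings in the right-hand panel
5. Click "encode"; it is quite slow, so be patient

References:
1. Riva FLV Encoder Help
2. Flash video (FLV) encoding, conversion, recording, and playback solutions, all in one place
3. How to convert .flv (flash video) to .avi or .mpg
4. Keepvid: a download service for YouTube and other video sites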

2006-12-18

link suggestion - blog banned by Google

Hi. My name is Eugene Gershin. Perhaps we have met online, but more probably you don't know me from Adam. I monitor blogs for SamsonBlinded, and came across your post.

I'd like to invite you to look at Obadiah Shoher's blog. Obadiah - an anonymous Israeli politician - writes extremely controversial articles about Israel, Middle East politics, and terrorism.
Shoher is equally critical of Jewish and Muslim myths, and advocates political rationalism instead of moralizing.
Google banned our site from AdWords, Yahoo blocked most pages, and Amazon deleted all reviews of Obadiah's book, Samson Blinded: A Machiavellian Perspective on the Middle East Conflict.
Nevertheless, 170,000 people from 78 countries read the book.

Various Internet providers ban us periodically, but you can look up the site on search engines. The mirror www.terrorismisrael.net/blog currently works.

Please help us spread Obadiah's message, and mention the blog in one of your posts, or link to us from spoondrift.blogspot.com. I would greatly appreciate your comments.

Best wishes,
Eugene Gershin

2006-12-14

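News from the Free World

For months now the Central Propaganda Department has taken a clear stand against spoofing (恶搞), yet plenty of people keep offending in defiance of the crackdown: first the State Administration of Radio, Film and Television, then the Ministry of Information Industry and China Telecom, and now it is the turn of the State Council Information Office.

Having solemnly vowed that we do not censor the Internet, our officials have now gone on to declare that China's netizens are in fact the freest in the world. Here is the article to prove it:

This reporter learned yesterday at the just-concluded Fifth Asia-Pacific Seminar on Media, Technology and Social Development that China is currently studying a highly unified management model for the Internet. Cai Wu, director of the State Council Information Office, said in his report at the closing ceremony that China's netizens are in fact the freest in the world. According to a State Council Information Office survey of more than 20 countries, every country regulates the Internet and requires that it operate within that country's constitution and laws. "Because of privacy concerns, Britain has still not opened up blogging; South Korea requires real-name registration online, something China has only just proposed," Cai Wu said, adding that most countries do not allow follow-up comments online, whereas in China the practice is widespread. Cai said that a technology as highly convergent as the Internet needs highly unified management, and that the state is studying this as well.

Naturally I support the central authorities' statement as firmly as ever. I have no doubt whatsoever about the free world we live in, just as we should firmly believe that two-thirds of the world's people are still suffering and waiting for us to rescue them. We must believe that "hitting the wall" is only a hallucination from when we were not fully awake, and that the forums that suddenly vanished into thin air never actually existed in the first place. And the rogue software, SPs, and push ads that thrive here more vigorously than in any other country are living proof of the free world.

What is harmony? This is harmony. The ultimate secret of harmony lies not in how harmonious things really are, but in how firmly everyone believes the world is harmonious. In a harmonious world we are all flowers bathed in the sunshine of freedom, wa-ha-ha, wa-ha-ha, with a smile blooming on every face.

Updated: after a brief resurrection, Wikipedia has been blocked again, and this time even editing the hosts file doesn't help. Being a proud citizen of a harmonious nation sure is damn free.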

2006-12-05

About SYN Flood

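SYN flood is currently one of the most popular forms of DoS (denial of service) and DDoS (distributed denial of service) attack. It exploits a weakness in the TCP protocol by sending a large number of forged TCP connection requests, exhausting the victim's resources (CPU saturated or memory used up).

To understand the basic principle of this attack, we have to start with how a TCP connection is established:

As everyone knows, TCP differs from UDP in that it is connection-based. That is, before TCP data can be sent between server and client, a virtual circuit, the TCP connection, must first be established. The standard procedure is as follows:

First, the requesting side (the client) sends a TCP segment with the SYN flag set. SYN stands for synchronize; the SYN segment specifies the port the client is using and the initial sequence number of the TCP connection.

Second, on receiving the client's SYN segment, the server returns a SYN+ACK segment indicating that the client's request has been accepted, with the acknowledgment number set to the client's sequence number plus one. ACK stands for acknowledgement.

Third, the client returns an ACK segment to the server, likewise acknowledging the server's sequence number plus one. At this point the TCP connection is established.
In the TCP protocol this connection process is known as the three-way handshake.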
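The three steps can be traced with a small model of the sequence and acknowledgment numbers. This is only an illustrative sketch with made-up initial sequence numbers; it builds tuples rather than real packets, and the function name is hypothetical.

```python
import random

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as (sender, flags, seq, ack) tuples."""
    client_isn = random.getrandbits(32) if client_isn is None else client_isn
    server_isn = random.getrandbits(32) if server_isn is None else server_isn
    mod = 2 ** 32  # sequence numbers wrap around at 32 bits
    # Step 1: the client announces its initial sequence number.
    syn = ("client", "SYN", client_isn, None)
    # Step 2: the server acknowledges client_isn + 1 and announces its own number.
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % mod)
    # Step 3: the client acknowledges server_isn + 1.
    ack = ("client", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]

for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Until the third segment arrives, the server has to remember the connection in a half-open state, which is the detail the rest of this post turns on.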

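The problem lies in this three-way handshake. Suppose a user sends a SYN segment to the server and then suddenly crashes or drops offline. The server, having sent its SYN+ACK reply, will never receive the client's ACK (the third step of the handshake cannot complete). In this situation the server typically retries (resending SYN+ACK to the client) and, after waiting for a while, discards the unfinished connection. The length of this wait is called the SYN Timeout, and it is generally on the order of minutes (roughly 30 seconds to 2 minutes). One user's failure causing one server thread to wait a minute is no great problem, but if a malicious attacker simulates this situation in bulk, the server consumes a great deal of resources maintaining a very large list of half-open connections. With tens of thousands of half-open connections, merely storing and traversing the list costs a lot of CPU time and memory, to say nothing of the constant SYN+ACK retries to every IP on the list.

In practice, if the server's TCP/IP stack is not robust enough, the end result is often that it overflows and crashes. Even if the server's system is robust enough, it will be so busy handling the attacker's forged TCP connection requests that it has no time for legitimate client requests (after all, legitimate requests make up only a tiny fraction of the total). From a legitimate client's point of view, the server has stopped responding. We call this situation a SYN flood attack on the server.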
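The effect on legitimate clients can be seen in a toy simulation of the half-open connection list. This is a simplified sketch, not how any particular TCP/IP stack works: the queue size, timeout, and arrival times are hypothetical, retries are ignored, and legitimate clients are assumed to complete the handshake instantly.

```python
from collections import deque

def dropped_legitimate(syns, backlog_size, syn_timeout):
    """Count legitimate SYNs refused because the half-open queue is full.

    syns: list of (arrival_time, spoofed) pairs sorted by arrival_time.
    """
    half_open = deque()  # expiry times of half-open entries, oldest first
    dropped = 0
    for now, spoofed in syns:
        # Discard entries whose SYN Timeout has passed.
        while half_open and half_open[0] <= now:
            half_open.popleft()
        if len(half_open) >= backlog_size:
            if not spoofed:
                dropped += 1
            continue
        if spoofed:
            # The forged client never ACKs, so the slot is held until timeout.
            half_open.append(now + syn_timeout)
        # A legitimate client ACKs right away, so its slot frees immediately.
    return dropped

attack = [(0, True)] * 200                  # burst of forged SYNs at t=0
clients = [(5, False), (10, False), (15, False)]
print(dropped_legitimate(attack + clients, backlog_size=128, syn_timeout=60))
print(dropped_legitimate(clients, backlog_size=128, syn_timeout=60))
```

With the forged burst in place, every legitimate client arriving before the timeout expires is turned away; without it, none are.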

Other currently popular network attack techniques include:

SYN attack, ICMP flood, UDP flood, Ping of death, IP spoofing, Port scan, Land attack, Teardrop attack, Filter IP source route option, IP address sweep, WinNuke attack, Java/ActiveX/ZIP/EXE, Default packet deny, User-defined malicious URL, Per-source session limiting, SYN fragments, SYN and FIN bits set, No flags in TCP, FIN with no ACK, ICMP fragments, Large ICMP, IP source route, IP record route, IP security options, IP timestamp, IP stream, IP bad options, Unknown protocols, and so on.

2006-12-04

Windows Live Messenger 8.1

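Although Microsoft turned down my request to try Live Messenger 8.1, I managed to download it from Ideas.live.com anyway. I hadn't read any online reviews of version 8.1 beforehand, and after using it for several months I haven't noticed any changes apart from the contact cards...

Windows Live Messenger was officially launched on June 20, 2006. It includes all the features of MSN Messenger plus new ways to connect and share files, such as offline messages and offline file sharing, which MSN Messenger did not support. Microsoft also extracted an antivirus component from Windows Live OneCare specifically for MSN to scan received files.

Since July 13, 2006, Windows Live Messenger users have been able to send messages to Yahoo! Messenger users. They can also leave each other offline messages, send each other vibration alerts (called Nudge in WLM and Buzz in Yahoo! Messenger), add MSN users as contacts, and see each other's status, such as Online, Busy, and Offline.

I'm not a loyal WLM user; by comparison, I prefer Gaim.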
