July 31, 2008

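Found this rather interesting. Original link: http://zhiqiang.org/blog/posts/rotate-coin-games.html


Alice and Bob play a coin game on a 2×2 board with one coin on each square. In each round, Alice chooses to flip either one or two of the coins; Bob may then rotate the board by 90, 180 or 270 degrees, or leave it as it is.

The rounds continue until all coins on the board show heads or all show tails, at which point Alice wins.

Suppose Alice cannot see the coins during the game, does not know the starting position, and does not even know whether Bob rotated the board in any given round. Does Alice have a strategy that guarantees a win? What is her optimal strategy?

Next we generalize the game. There are n coins, placed on the vertices of a regular n-gon board. In each round Alice may flip any subset of the coins, and Bob may rotate the board in any one of n different ways (by a multiple of 360/n degrees). The game continues until all coins show heads or all show tails, at which point Alice wins.

Can Alice still win in this case?

The solution is here, but I strongly recommend thinking the problem through on your own, especially the n=4 case.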
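
A candidate blind strategy can be checked mechanically before looking at the solution. The sketch below is not from the original post. It assumes the coins are encoded as the bits of an integer, that a move is a bitmask of the positions Alice flips, and that Bob plays adversarially; the names `rotations` and `blind_strategy_wins` are made up for illustration. It only verifies a strategy and does not produce one, so the puzzle stays unspoiled.

```python
def rotations(state, n):
    """All n cyclic rotations of an n-bit board state."""
    mask = (1 << n) - 1
    return {((state << k) | (state >> (n - k))) & mask for k in range(n)}


def blind_strategy_wins(n, moves):
    """Return True if the fixed move list wins from every start, whatever Bob does.

    moves is a list of bitmasks; bit i set means "flip the coin at position i".
    For the 2x2 game, each mask should have one or two bits set.
    """
    full = (1 << n) - 1
    # Every starting position that is not already all heads or all tails.
    alive = set(range(1, full))
    for flip in moves:
        # Alice flips blindly; positions that become uniform are wins for her.
        after_flip = {state ^ flip for state in alive} - {0, full}
        # Bob may then apply any rotation, so keep every rotated position.
        alive = set()
        for state in after_flip:
            alive |= rotations(state, n)
        if not alive:
            return True
    return not alive


print(blind_strategy_wins(2, [0b01]))   # True: with two coins, flipping one always wins
print(blind_strategy_wins(3, [0b001]))  # False: this single move is not enough for n=3
```

To test an idea for n=4, pass a list of 4-bit masks, for example `blind_strategy_wins(4, [0b0001])`, and extend the list until the function returns True.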


Spam Detection in Social Bookmarking Systems

With the growing popularity of social bookmarking systems, spammers discovered this kind of service as a playground for their activities. Usually they pursue two goals: On the one hand, they place links in the system to attract people to advertising sites. On the other hand, they increase the PageRank of their sites by placing links in as many popular web 2.0 sites as possible, in order to increase their visibility in Google and other search engines. Usual counter-measures like captchas are not efficient enough to effectively prevent the misuse of the system. In the last year, we were able to collect data of more than 2,000 active users and more than 25,000 spammers by manually labeling spammers and non-spammers. The provided dataset consists of these users and of all their posts. This includes all public information such as the url, the description and all tags of the post. The goal of this challenge is to learn a model which predicts whether a user is a spammer or not. In order to detect spammers as early as possible, the model should make good predictions for a user when he submits his first post.

Dataset description

A general description of the dataset can be found here. For the spam detection task all provided files are relevant.

Evaluation

All participants can use the training dataset to fit the model. The training dataset contains flags that identify users as spammers or non-spammers. The test dataset will have the same format as the training dataset and can be downloaded two days before the end of the competition. It will contain users of a future period. All participants must send a sorted file containing one line per user, consisting of the user number and a confidence value separated by a tab. The higher the confidence value, the higher the probability that the user is a spammer. The highest confidence should come first.

    user    spam
    1234    1
    1235    0.987
    1236    0.765
    1239    0
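
A submission file in this shape could be produced along the following lines. This is a minimal sketch, not part of the challenge material: `write_submission` is a hypothetical helper, and leaving out the "user spam" header line is an assumption, since the text does not say whether the header belongs in the file.

```python
import io


def write_submission(scores, out):
    """Write one 'user<TAB>confidence' line per user, highest confidence first.

    scores maps a user number to a confidence value (higher = more likely spam).
    Assumption: no header line is written.
    """
    # Sort by confidence descending; ties are broken by user number.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    for user, confidence in ranked:
        out.write(f"{user}\t{confidence}\n")


buffer = io.StringIO()
write_submission({1236: 0.765, 1234: 1.0, 1239: 0.0, 1235: 0.987}, buffer)
print(buffer.getvalue(), end="")
```

For a real run, pass a file opened for writing instead of the in-memory buffer.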

If no prediction is provided we assume the user is not a spammer. The evaluation criterion is the AUC (the Area under the ROC Curve) value. We compare the submitted spammer predictions of the participants with the manually assigned labels on a user basis.
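
To check a model locally against the training labels, the AUC can be computed directly from its pairwise definition: the fraction of (spammer, non-spammer) pairs in which the spammer gets the higher confidence, with ties counted as half. The sketch below is an illustration only. The function name `auc` and the dictionary inputs are assumptions, and the quadratic loop is chosen for clarity rather than speed. Users without a prediction are scored 0.0, matching the rule that a missing prediction means "not a spammer".

```python
def auc(labels, predictions):
    """Area under the ROC curve from its pairwise definition.

    labels:      {user: 1 for spammer, 0 for non-spammer}
    predictions: {user: confidence}; users missing here are scored 0.0
    """
    scored = [(predictions.get(user, 0.0), label) for user, label in labels.items()]
    spam_scores = [score for score, label in scored if label == 1]
    ham_scores = [score for score, label in scored if label == 0]
    if not spam_scores or not ham_scores:
        raise ValueError("AUC needs at least one spammer and one non-spammer")
    wins = 0.0
    for spam in spam_scores:
        for ham in ham_scores:
            if spam > ham:
                wins += 1.0   # correctly ordered pair
            elif spam == ham:
                wins += 0.5   # a tie counts as half
    return wins / (len(spam_scores) * len(ham_scores))


labels = {1: 1, 2: 1, 3: 0, 4: 0}
predictions = {1: 0.9, 2: 0.4, 3: 0.5}   # user 4 has no prediction
print(auc(labels, predictions))  # 0.75
```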
———————————————–

1. Build an evaluation model for users based on their behavior over time, and use it to estimate the probability that a user is a spammer.
2. Some sites are spam websites, and the spam users work with these sites, so spam users tend to recommend spam websites.
3. Spammers resemble one another: their working hours, their posting frequency, the websites they recommend, etc.
4. I think artificial neural networks are not a good fit for this problem; a regression model may work. The aim is to distill some rules for identifying potential spammers.
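
As a starting point for ideas 1 and 3, per-user time-line features can be derived from the post dates. This is only a sketch under assumptions: the date format `%Y-%m-%d %H:%M:%S`, the function name `timeline_features`, and the particular features (post count, active span, smallest gap between posts, share of posts before 6 a.m.) are illustrative choices, not something the challenge prescribes.

```python
from collections import defaultdict
from datetime import datetime


def timeline_features(posts, date_format="%Y-%m-%d %H:%M:%S"):
    """posts: iterable of (user, date_string) pairs. Returns {user: feature dict}.

    Assumption: dates parse with date_format; adjust it to the real files.
    """
    times = defaultdict(list)
    for user, date_string in posts:
        times[user].append(datetime.strptime(date_string, date_format))

    features = {}
    for user, stamps in times.items():
        stamps.sort()
        # Seconds between consecutive posts; empty for single-post users.
        gaps = [(later - earlier).total_seconds()
                for earlier, later in zip(stamps, stamps[1:])]
        features[user] = {
            "posts": len(stamps),
            "active_hours": (stamps[-1] - stamps[0]).total_seconds() / 3600.0,
            "min_gap_seconds": min(gaps) if gaps else None,
            "night_fraction": sum(1 for t in stamps if t.hour < 6) / len(stamps),
        }
    return features


demo = timeline_features([
    (1, "2008-01-01 00:00:00"),
    (1, "2008-01-01 02:00:00"),
    (2, "2008-01-01 12:00:00"),
])
print(demo[1])
```

Feature dictionaries like these could then feed a regression model, as suggested in point 4.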

The dataset is a little complex: it consists of 4 tables.

The dataset consists of seven files:

These are tab-separated files which have the following columns:

Files tas and tas_spam

Tag ASsignments: Fact table; who attached which tag to which resource/content

  1. user (number; user names are anonymized)
  2. tag
  3. content_id (matches bookmark.content_id or bibtex.content_id)
  4. content_type (1 = bookmark, 2 = bibtex)
  5. date

Files bookmark and bookmark_spam

Dimension table for bookmark data

  1. content_id (matches tas.content_id)
  2. url_hash (the URL as md5 hash)
  3. url
  4. description
  5. extended description
  6. date

Files bibtex and bibtex_spam

Dimension table for BibTeX data

  1. content_id (matches tas.content_id)
  2. journal volume
  3. chapter
  4. edition
  5. month
  6. day
  7. booktitle
  8. howPublished
  9. institution
  10. organization
  11. publisher
  12. address
  13. school
  14. series
  15. bibtexKey (the bibtex key (in the @… line))
  16. url
  17. type
  18. description
  19. annote
  20. note
  21. pages
  22. bKey (the “key” field)
  23. number
  24. crossref
  25. misc
  26. bibtexAbstract
  27. simhash0 (hash for duplicate detection within a user; strict; obsolete)
  28. simhash1 (hash for duplicate detection among users; sloppy)
  29. simhash2 (hash for duplicate detection within a user; strict)
  30. entrytype
  31. title
  32. author
  33. editor
  34. year

File user

Mapping of non-spammer / spammer for each user. This file can be used for spam classification.

  1. user (matches tas.user)
  2. spam flag (0 = non-spammer, 1 = spammer)
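
The fact table and the label file can be joined on the user number. The sketch below reads both with Python's `csv` module and counts tag assignments per user. It assumes the files have no header row, that `tas` rows have exactly the five columns listed above, and that quoting should be disabled; the helper names and the in-memory sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Iterate over rows of a tab-separated file, treating quotes as plain text."""
    return csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def tags_per_user(tas_handle):
    """Count tag assignments per user from a tas-style file (5 columns assumed)."""
    counts = Counter()
    for user, tag, content_id, content_type, date in read_tsv(tas_handle):
        counts[int(user)] += 1
    return counts


def load_labels(user_handle):
    """Read the user file: {user: 0 for non-spammer, 1 for spammer}."""
    return {int(user): int(flag) for user, flag in read_tsv(user_handle)}


# Invented sample rows standing in for the real files.
tas_sample = io.StringIO(
    "1\tpython\t10\t1\t2008-01-01 00:00:00\n"
    "1\tweb\t10\t1\t2008-01-01 00:00:00\n"
    "2\tcasino\t11\t1\t2008-01-02 00:00:00\n"
)
user_sample = io.StringIO("1\t0\n2\t1\n")

counts = tags_per_user(tas_sample)
labels = load_labels(user_sample)
table = [(user, counts.get(user, 0), flag) for user, flag in sorted(labels.items())]
print(table)  # [(1, 2, 0), (2, 1, 1)]
```

With the real data, each handle would come from `open(path, newline="", encoding="utf-8")`; the encoding is an assumption.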

Size of Files

Number of lines in files:

  1. tas 816,197 / tas_spam 13,258,759
  2. bookmark 181,833 / bookmark_spam 2,059,991
  3. bibtex 219,417 / bibtex_spam 716
  4. user_spam 31,715

Deadline:

May 5, 2008 Tasks and datasets available online.
July 30th, 2008 Test dataset will be released (by midnight CEST).
August 1st, 2008 Result submission deadline (by midnight CEST).
August 4th, 2008 Workshop paper submission deadline.

Submit URL: http://www.kde.cs.uni-kassel.de/ws/rsdc08/upload/



July 26, 2008

SinRain, the quietly sentimental poet of Jiangnan,
wrote:

http://pic.yupoo.com/sinrain/088575ee3735/qzpzwyxz.jpg

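Oh, Pukou, go on your way
Your fallen leaves, along with this summer stirred up by memory
Crumble and rot together
There will surely be unfamiliar smiling faces here again
There will surely be no more young men full of stories here

Oh, Pukou, go on your way
I will not miss you one bit
In your lonely nights
No one will startle the night-wandering spirits anymore
No one will gaze up at the stars from Siyuan anymore, or eat grilled skewers
The heavy fog? It lifted long ago
Children, go home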

—————————

Brooding over: off to read..

July 25, 2008

Convex Optimization – Boyd and Vandenberghe

Convex Optimization
Stephen Boyd and Lieven Vandenberghe

Cambridge University Press

More material can be found at the web sites for EE364A (Stanford) or EE236B (UCLA), and our own web pages. Source code for almost all examples and figures in part 2 of the book is available in CVX (in the examples directory), in CVXOPT (in the book examples directory). Source code for examples in Chapters 9, 10, and 11 can be found here. Instructors can obtain complete solutions to exercises by request to solutions@cambridge.org.

If you find an error not listed in our errata list, please do let us know about it.

Stephen Boyd & Lieven Vandenberghe

Download

Copyright in this book is held by Cambridge University Press, who have kindly agreed to allow us to keep the book available on the web.

Prof. HE recommended this book to me, so now I have something to do besides reading papers…


The Earliest Fair Hiring in History

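Duke Mu of Qin said to Bole: "You are getting on in years. Is there anyone in your family who could be sent to look for horses?"

Bole replied: "A good horse can be judged by its build, its muscles and its bones. But the finest horse in the world is elusive, as if it vanishes and fades, as if it were lost; such a horse raises no dust and leaves no tracks. My sons are all of lesser talent: they can be taught to recognize a good horse, but not the finest horse in the world. There is a man who has carried loads and gathered firewood with me, Jiufang Gao. In judging horses he is in no way my inferior. Please grant him an audience."

Duke Mu received him and sent him off to find a horse. After three months he returned and reported: "I have found one. It is at Shaqiu." Duke Mu asked: "What kind of horse is it?" He answered: "A yellow mare." Men were sent to fetch it, and it turned out to be a black stallion.

Duke Mu was displeased. He summoned Bole and said: "It is a failure, the man you sent to look for horses! He cannot even tell a horse's color or sex. What can he possibly know about horses?"

Bole heaved a long sigh and said: "Has he really come this far! This is precisely why he surpasses me ten-thousandfold, beyond counting. What Gao observes is the inner workings of nature. He grasps the essence and forgets the coarse details; he dwells on the inside and forgets the outside. He sees what he needs to see and does not see what he need not see; he looks at what should be looked at and neglects what need not be looked at. A judge of horses like Gao has something more valuable than horses."

When the horse arrived, it indeed proved to be the finest horse in the world.

From 《列子·说符》 (Liezi, "Shuo Fu")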

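You could say he was the earliest HR consultant to select talent fairly, regardless of skin color or origin; only in this way can one pick out the finest horse in the world!

Being focused on a task means simply not noticing variables that have nothing to do with the result. That is why even Bole admired Jiufang Gao!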

July 13, 2008

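Eason has released another new album, 《不想放手》 (Don't Want to Let Go). I heard one of its songs, 《然后怎样》 (Then What), and it is just so very Eason Chan.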

http://music.sina.com.cn/yueku/m/974752.html

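Fulfilled the so-called ideal
Let emotions overflow unchecked
Sweat all run dry, the sky just turning light, and then what
Found a gap of time for travel
Yet lost the backpack for wandering
Along the tracks, wandering on and on, and then what
The holiday is over, what are the plans
Pass through one paradise, lose one direction
Who is rushing me to grow up, making me lose the nerve to go astray
Whom am I afraid to disappoint, for whom am I so busy
At first I only wanted to play, why has it become a burden
Why must my questions always wait for someone else's answers
I have sung "My Happy Era" to death
Only then did I realize how high the price was
Cannot be satisfied, dare not stop, and then what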

July 7, 2008

From now on, the sea of people grows ever more boundless… no one will know that we once brushed past each other…

July 6, 2008

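Recently a video has been hot on many video forums in China. The gist: a farmer built a helicopter just like that and flew it, at a cost of only 20,000 RMB.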

http://www.youtube.com/watch?v=O9SGUHU1s9E

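This is a marvelous thing, and marvelous things are usually an illusion caused by not having enough information.

Let's look at this so-called genius's own account:

He said that building a helicopter is far harder than building an ultralight aircraft, which is why no one in China had ever managed to get a homemade helicopter to fly: "The ones in China that can fly cost at least a million yuan or more, and all of them were bought."

Chen Zhaorong never went back to school after finishing junior high. As a young man he liked tinkering with electronic components. At first he had to model his work on the look of helicopters built abroad, and for this the "junior high graduate" had to search through dozens of all-English foreign websites for material. Of course, he admits that apart from the pictures he never understood what was written.

Two years ago, at age 40, Chen Zhaorong formally began building a helicopter in his village. From the start of assembly he kept tinkering in the open space in front of his house, hauling parts to other people's homes to have them welded, and so on. Six months ago he finished assembling the helicopter: "It looks different from any other helicopter. I made it myself; the engine is a second-hand one I bought." Chen Zhaorong estimates the total cost of building the aircraft at sixty to seventy thousand yuan.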

——————————-

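Here we find far too many holes:

  • An aviation motor cannot be bought for 20,000 yuan (the engine for a decent model helicopter costs around 10,000 yuan, and even an entry-level one runs over a thousand).
  • He would have to deal with complex fluid mechanics and aerodynamics, and the welding requires very good CNC machine tools to reach aircraft-grade precision. All of this is beyond the understanding of an older man who does not read English.
  • The aircraft's welding and assembly look nothing like a knockoff, and the paint job is very professional. In fact he said himself that there was only one, and that it crashed and was destroyed. But online the story grew more exaggerated with each retelling until it became "20,000 yuan to produce one". So just set up a production line and nobody needs to buy a QQ car anymore; buying a helicopter would be so much more convenient.

Now take a look at this aircraft:

Now take a look at this one: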

http://www.helis.com/timeline/cicare.php

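You will find it strikingly similar to the CH6, because they are exactly the same machine.

So what is impressive about the "Guangdong genius" is how, as a farmer, he managed to get hold of an almost complete CH6 helicopter, and that he had the patience and courage to take this presumably lightly repaired CH6 into the air.

That's all.

Even people who have only built aircraft models should know that a helicopter is extremely hard to build; all the parameters have to be tuned correctly before it stops spinning around by itself in the air. A genius building his own helicopter? Only a joke~!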

July 2, 2008

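Today was very, very satisfying.

Here is how it went. At around 7 this morning I played a game of Dota, then read an article in 计算机学报 (the Chinese Journal of Computers) that was possibly a translation. After that I lay on a very cool, clean bed reading a small book on Java, and drifted off to sleep.

At noon I was woken by a text message from Vivian, and only then remembered that today is July 1, time to say goodbye to Pukou. So I got up and took a shower with Coldplay playing, though I have no idea what they are singing about. The dorm room was a mess of books and clothes, socks, broken discs, New Oriental textbooks, or maybe math ones, colored pencils, envelopes written by who knows whom, tangled USB cables, three electric fans spinning idly, woven bags in LV colors stuffed with clothes that were unwashed or not washed clean, and lots of textbooks for major courses.

As I stepped out of the dorm room, Parl was taking photos and turned back with a sleazy grin. The light in the corridor was dim, with a window glowing in the distance. As I walked I thought: in the end I really do have to leave. The man who collects waste paper at the entrance was still waiting for students throwing out their books, and the air was stiflingly hot. Passing Minghu Lake, there were again people striking all sorts of hopelessly tacky poses for photos. I like these hopelessly tacky poses, these poses so full of green youth.

Yang Xiaomo phoned again to say she had come to PK, so I warmly invited her to join the lunch group. Ten minutes later Mo arrived, face all aglow. We also saw Chen Xiaocheng once, with some unknown girl; Mo seemed quite angry, though the consequences were not serious.

By the ugly ditch we ran into Wenzi and Xiaohua, the couple, so the lunch group grew to 5 people. We went to the Sichuan restaurant we had not visited in ages and ordered the dishes we had ordered countless times, only to find that the quality of the food is now extremely poor. The candied banana was really just banana mush in syrup! (This exclamation mark expresses my anger.)

After the meal, Mo and Cheng actually started holding hands as soon as we came out. Mark my words, this is only the first step.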

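I went to play ball, took a shower, and while tidying the dorm turned up 160 one-mao coins. I took them to the campus store, where the shop lady happily counted out 36 coins for a chocolate ice cream, then happily counted out another 29 coins for a bag of hawthorn slices. All in all, apart from the girl queuing behind me who was not pleased, everyone was happy.

Minghu Lake again, and again the men who are about to clear out of Pukou, sitting on the grass with bottles in hand, chatting at the hour when the stars are about to come out. Vivian was waiting at the gate, and Cheng was 20 minutes late, so we figured we were owed free chicken cutlets.

In the evening we had arranged to eat Mo's braised pork at Wenzi's place, with Mo and Xiaohua as head chefs. The rest of us chatted inside and ate nuts. For someone about to leave Pukou, eating meat means a great deal: the fullness of chewing, the pleasure of protein being absorbed. The background music was bossa nova songs in Spanish or Portuguese that none of us could understand. I have to praise Wenzi's little nest once more; it is simply too much of a paradise..

Then Mo and Cheng made their status public, which felt as wicked as bandits owning up. So everyone carried on gossiping and gossiping. I have kept Mo's text messages; given how transparent her little schemes are, trying to cover it up would have been a huge mistake. Next came Xiaohua's own confession, but sadly the story was too brief and ended before it had really begun. Then the talk turned to how Yu and Mo miraculously met on a train; in short, it was full of fantasy.

Then I got a text from Xiao 5 saying she had fallen in love. After congratulating her I meant to pry into who it was, but she hemmed and hawed and would not say, and since I was not all that interested anyway, I went back to reminiscing about these years of life in Pukou.

I am content to have a group of friends like this, with whom I can say whatever I want. I am content to have had this stretch of life of drinking and eating meat and studying on the big platform. I am very, very content with this time of mine, in which I truly loved and truly hated, worked hard, struggled, sank low, and lost my way.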
