Meta Ideas
Thursday, November 11, 2004
 
A solid balloon
Weird idea. I just had to post it somewhere. Think about it: carbon nanospheres are strong and light. Can they be vacuum-filled? (I'm not sure I can even write it like that, but "vacuum-emptied" reads worse.) A sealed vacuum sphere is lighter than a helium-filled sphere of the same size. The problem, of course, is withstanding the external pressure on an evacuated shell. I have no idea what it would take to build a sealed carbon nanosphere. But (and there is always a "but"), if built, such a device would let one build a solid balloon.
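A quick back-of-the-envelope check (assuming standard sea-level densities for air and helium, and ignoring the weight of the shell itself, which is the real unknown here) shows how much lift is at stake:

```python
# Back-of-the-envelope lift comparison for a 1 m^3 balloon at sea level.
# Densities are assumed standard values; the shell's own weight is ignored.
AIR = 1.225     # kg/m^3, air at roughly 15 degrees C
HELIUM = 0.179  # kg/m^3, helium at the same temperature and pressure

volume = 1.0  # m^3

lift_vacuum = AIR * volume              # displaced air, nothing inside to subtract
lift_helium = (AIR - HELIUM) * volume   # displaced air minus the helium filling

print(f"vacuum lift: {lift_vacuum:.3f} kg per m^3")    # ~1.225 kg
print(f"helium lift: {lift_helium:.3f} kg per m^3")    # ~1.046 kg
print(f"gain: {(lift_vacuum / lift_helium - 1):.0%}")  # roughly 17%
```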

What use is a solid balloon? Well, I guess that these beasts, besides being geeky, could be useful. Nanospheres could be used to fill an arbitrary shape (for example, a flying saucer, or a flying pan...). I suspect the volume-to-weight ratio would beat that of a similar full-sized structure, which needs stiff (and heavy) walls.

This is probably one of the weirdest ideas I've ever had. But it's neat, and it found its way into this blog...  
Saturday, October 23, 2004
Academic Research and CS Innovation
Phil Windley writes:
We've created an entire ecosystem based not on what is useful and good, but on whether or not we can convince a handful of other people that what we've written is sufficiently sophisticated to publish in their journals.

It's a sound point, and something I've been pondering for a long while. In my student days I liked to read CS papers, and I learned a lot by studying them. Since leaving school I've lost touch with academia and become involved with open-source projects. Now when I read academic papers, I find them heavy on mathematics and light on practice; open-source projects are the opposite, with working code but little or no theoretical foundation. Something seems to be missing in both areas. The term 'third way' has been somewhat abused, but it seems to be what is missing here.

As for solutions... Google is now very well positioned to provide one. They have the brand-name recognition and a strong presence in the academic field (several top universities are internally indexed by Google). Google could deploy an online scientific library of its own, and use its own page-ranking mechanisms to decide what should be 'published' on its main page. I don't know if they want to step into the editorial world themselves -- the recent launch of Google Books points in another direction -- but it's still an interesting idea.  
Monday, July 26, 2004
 
Richness of Social Interaction in the Internet Age
There is a whole bunch of things that could be discussed about the nature of social interaction on the Net. Social dynamics are very interesting. Most popular sites experience an early period of intense activity. As the community grows, there is at first a growth in the social involvement of its users, which can be measured by the average number of comments and messages left per user; but later, the sheer growth of the site dilutes its base to the point where the social interaction is gone. Why does this happen?

I have a few ideas of my own about this problem. A very simple one is the "novelty effect" thesis: people go to new places, forget them quickly, and move on to somewhere else. Although probably right to some extent, it's also fairly simple-minded and probably can't explain all the behavior involved. Another alternative is the "optimum size" thesis; it says that any community has an optimum size that maximizes social interaction. Cities are an example - small towns have too small a population, metropolises too big a one, and the optimum lies somewhere in between.

Another alternative is the "hierarchy of attractors". In this model, individuals are divided into two classes (using Internet-related analogies here): 'content generators' and 'readers'. In other words, some people like to write content, while others like to read and comment occasionally. Great content generators become 'attractors' and pull traffic away from lesser writers. But this also reduces the richness of the medium, and with it the level of social interaction. In this model, the more people you have, the bigger the difference between the top-level attractors and the low-level ones will be.
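A quick toy simulation of that last claim - just a sketch, nothing rigorous - where each new reader follows a writer with probability proportional to the audience that writer already has:

```python
import random
from statistics import median

def attractor_gap(num_writers, num_readers, seed=42):
    """Toy model: each new reader follows a writer chosen with probability
    proportional to that writer's current audience (preferential attachment).
    Returns the ratio between the top attractor and the median writer."""
    rng = random.Random(seed)
    audience = [1] * num_writers  # every writer starts with a single follower
    for _ in range(num_readers):
        winner = rng.choices(range(num_writers), weights=audience)[0]
        audience[winner] += 1
    return max(audience) / median(audience)

# In this toy model the gap between the top and the median writer tends to
# widen as the community grows.
for readers in (100, 1_000, 10_000):
    print(readers, round(attractor_gap(50, readers), 1))
```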

How can a big community handle these problems? By breaking itself up - pretty much like the biggest and most successful cities in the world do. New York is the prime example: a big city with lots of small (and lively) communities. The Internet has to do the same. However, this does not mean that it will become fragmented or balkanized. But that's a big enough topic for another story.  
Friday, July 23, 2004
 
Blogging, Wikis and Snippets - A new approach to PKM
We all know what blogs and Wikis are, but what the heck is PKM? The acronym means "Personal Knowledge Management", and it describes a new category of software, bigger, more dynamic and more interesting than old-fashioned PIMs. I have my own PKM dream, and that's why snippets are important.

For those rare souls who don't know what I'm talking about, some definitions. Despite being relatively new on the web scene, blogs are very popular now, and Wikis, albeit a little less known, are also hot with the hardcore Net community. Both are exceptionally easy-to-use tools for managing information. Blogs allow non-structured, personal information to be published chronologically, which is the logical approach for something meant for us humans to use; they are the "my diary" of the Net age. Wikis are perfect for structuring arbitrary networks of information in a fairly straightforward way; they let any user write about concepts and then relate them to each other, doing things that no physical card file could ever do. You can jump from one piece of information to another, define new terms, and relate everything with a minimum of syntax.

Having said that, what is a snippet? Well, a snippet is just a small piece of information. A blog entry can be a snippet. A Wiki entry can also be a snippet. Anything can be a snippet. The snippet is the smallest block of information in my dream PKM.

What I have in mind is a fairly flexible framework, where snippets of information can be stored and structured in different ways, depending on how you like them to be. Basic to the system is the snippet store, where all snippets are kept. To keep things easy, the design of the system must not be particular about the nature of the store. For now I'm thinking about using the filesystem, pretty much like simple Wikis do. One file per snippet, that's all.
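Something like this minimal sketch is what I have in mind for the store (the names and directory layout are placeholders; nothing is settled yet):

```python
import os

STORE_DIR = "snippets"  # placeholder name: one directory, one file per snippet

def save_snippet(name, text):
    """Write a snippet to its own file in the store directory."""
    os.makedirs(STORE_DIR, exist_ok=True)
    with open(os.path.join(STORE_DIR, name), "w", encoding="utf-8") as f:
        f.write(text)

def load_snippet(name):
    """Read a snippet back from the store."""
    with open(os.path.join(STORE_DIR, name), encoding="utf-8") as f:
        return f.read()

def list_snippets():
    """All snippet names, newest first; the file mtime gives the chronology."""
    key = lambda n: os.path.getmtime(os.path.join(STORE_DIR, n))
    return sorted(os.listdir(STORE_DIR), key=key, reverse=True)
```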

The snippet server just takes a snippet, formats it accordingly, and presents it. It's a small HTTP server, one that at first knows very little about the snippets. And that's where things start to get interesting.
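A first cut of the server could be as small as this sketch, built on Python's standard http.server and assuming the store layout above:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE_DIR = "snippets"  # same placeholder store directory as above

class SnippetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Map /SomeSnippet to a file in the store; basename() avoids path tricks.
        name = os.path.basename(self.path.strip("/"))
        path = os.path.join(STORE_DIR, name)
        if not os.path.isfile(path):
            self.send_error(404, "No such snippet")
            return
        with open(path, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), SnippetHandler).serve_forever()
```

At this point everything is served as plain text; the metainformation described next is what would make the server smarter.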

Every snippet has a little metainformation about it. Some of it can be inferred indirectly by looking at the snippet itself. For example, the file extension or contents can carry information about the type of the snippet: text files can be identified by the extension (.TXT, .HTML, and so on); the first line of a text file can also be used to infer some of the information; graphical images can be detected by the file extension, so .GIF or .JPG files are easily recognizable. When the snippet server is about to serve a snippet, it looks at the snippet type and then acts accordingly.
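The type inference could start as little more than an extension lookup; the table in this sketch is just a guess at a starting point:

```python
import os

# Rough mapping from file extension to how the server should present a snippet.
EXTENSION_TYPES = {
    ".txt":  "text/plain",
    ".html": "text/html",
    ".gif":  "image/gif",
    ".jpg":  "image/jpeg",
}

def snippet_type(filename, default="text/plain"):
    """Guess a snippet's type from its extension; fall back to plain text."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(ext, default)

print(snippet_type("notes.TXT"))   # text/plain
print(snippet_type("photo.jpg"))   # image/jpeg
```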

Up to this point, nothing is different. But snippets can be related to each other and classified, forming different things depending on the way you look at them. Snippets can be used to keep a blog, using chronological metainformation to order them. Snippets can also be structured as a Wiki; in this case, the snippets themselves contain the internal links, or cross-references, needed to keep the Wiki working.
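To make the point concrete, here is a sketch of how both views could be derived from the same store (treating CamelCase WikiWords as the cross-reference convention is just one possible choice):

```python
import os
import re

STORE_DIR = "snippets"  # same placeholder store directory as before

def blog_view():
    """Chronological view: snippet names ordered by file mtime, newest first."""
    key = lambda n: os.path.getmtime(os.path.join(STORE_DIR, n))
    return sorted(os.listdir(STORE_DIR), key=key, reverse=True)

def wiki_links(name):
    """Wiki view: the cross-references a snippet contains, here taken to be
    CamelCase WikiWords found in its text."""
    with open(os.path.join(STORE_DIR, name), encoding="utf-8") as f:
        text = f.read()
    return re.findall(r"\b(?:[A-Z][a-z]+){2,}\b", text)
```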

Depending on how you reference the information in the snippet server, you can have either a blog, or a Wiki - or both! That's the idea, but I'm still working out the details... I hope to have something working soon!
 
 
Posting by email is fun!
Well, nothing like a newbie with lots of enthusiasm! So that's my
first post by email. That's something that will make keeping my blog
up-to-date easier...
 
 
Back home
Well, I'm back at the House of Unfinished Projects. It's been a long time since my last post - almost one year! But I've been away from the Net, working on some entirely unrelated business; I had no Internet connectivity at my office and had to resort to LAN houses for my web surfing. I had no time for blogging during that period.

Now I'm back online. I'm still looking for the perfect blogging solution - something like a 'personal knowledge management' framework, where I can post comments, articles, short notes, rants, and so on. I like Kuroshin's community-oriented style, but it lacks a lot of the tools that I want. The same goes for Blogger. On the other hand, I've implemented my own personal Wiki on my own PC; it's not available to the public, but it's helping me put my own notes in perspective.

Well, that's enough for now. Now I need to go back to my real projects...
 
Tuesday, October 07, 2003
 
Comments on Spam and Natural Language Processing

I think there is a highly interesting link between research on natural language processing and the war on spam. I'm not sure both sides are aware of the extent of this relationship; this is both good and bad. Bad, because the two fields use different methodologies and, with a small number of exceptions, don't talk much or share information as well as they could. But it is also good, in the sense that different approaches are being sought, outside academia, by people who have a real problem on their hands and are very practical about it.

Natural language processing

Natural language processing is one of the still unfulfilled promises of the information age. During the 50's and 60's, it was considered safe to predict that we would have systems capable of holding a conversation well before the turn of the century. Well, we're in 2003 now, and "2001"'s HAL still seems like a distant dream. I'm not talking about the most advanced stuff, such as speech recognition or artificial reasoning; the problem is that we can't even build a system that understands the meaning of written language.

Early natural language theorists focused their work on the structural aspects of language. It was assumed to be the promising approach, as it would lead to deterministic algorithms to analyze and extract the meaning out of any text. However, the importance of structure was greatly exaggerated at first. Over time, the importance of context was recognized, and it became clear that understanding language, even in its written form, was a much more difficult problem than originally thought.

One of the effects of this realization was a continuous reduction of the goals. At first, it seemed possible to have computer systems do simple office work like reading and writing business letters. When it became clear that this was a really hard task for a computer - even with all the improvements in processing power - the goals were reduced; and we're still limited today to spell checking and fairly simple translation applications. This example helps to illuminate one of the main problems with the development of natural language processing systems: there is no usable intermediate step. Today's systems are so simple, by human standards, that they don't come anywhere near the 'intelligent' level. And we seem to be very far from any practical application for business purposes.

The lack of business applications makes it much more difficult to attract investment to the development of such systems; there is much ground to cover and limited chances to make money in the short term. Current research is now mostly limited to non-profit institutes and universities - where it is treated as pure science - or to big companies like IBM and Microsoft, which can invest with long-term profit in mind.

(Of course, Google now qualifies as one of the big companies, and they are moving fast in this direction. Starting as a search company, they're now looking far beyond that, and NLP applications are in sight - especially now that they are investing in things such as blogging and webmail applications.)

The anti-spam war

Right now, there is an arms race between spammers and their opponents, the anti-spam software developers. Eric Allman - the author of the widely used sendmail software - was quoted recently as saying that this is a war where everyone loses, except perhaps the arms dealers. While Mr. Allman's prediction may seem overly catastrophic, the situation is pretty much as stated - but we may be learning something very important in the process.

The war on spam is a show of Darwinian evolution at work. Every new generation of spam is matched by a new generation of anti-spam software. For every new technique in the anti-spam arsenal, spammers are quick to react with a new and improved attack. The main difference between this war and the one fought between virus and anti-virus writers is that the latter focuses on the evolution of code, while the former focuses on the evolution of text recognition. The goal of spammers is to craft messages that can be understood by human readers but not recognized by automatic anti-spam filters. Both sides also follow strict economic guidelines - doing only as much as needed, at the least possible cost. This approach is perfect for the evolutionary game, and will lead to a slow but constant flow of improvements on both sides.

Many of the techniques recently developed to filter spam messages are beginning to show improvements in our understanding of natural language processing. First-generation filters used simple word-filtering techniques, but this type of filter was prone to false positives (blocking legitimate email as if it were spam). Bayesian filters are an improvement, because they do not rely on static lists of words; instead, they can learn from both legitimate and spam messages. But now, spammers are improving their game once more. Some of the new trends: the use of random or incorrectly spelled words, and better-written messages that can pass as legitimate. The first technique was devised to exploit weaknesses of word-based filters, either static or Bayesian, while keeping the text readable. That we can read words with punctuation characters in the middle is not totally unexpected. But it has been shown recently that we can still read words even when some of the letters are shuffled. Recent spam messages make full use of this knowledge to avoid being filtered.
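For the curious, the core idea behind such a filter fits in a few lines; this is a bare-bones naive Bayes sketch of my own, not any particular product's implementation:

```python
import math
from collections import Counter

class BayesianFilter:
    """Bare-bones Bayesian spam filter: it learns word frequencies from
    labelled messages instead of relying on a static word list."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.messages[label] += 1
        self.words[label].update(text.lower().split())

    def spam_probability(self, text):
        """Estimated probability that the text is spam (Laplace-smoothed)."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (spam_total + 2)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))

f = BayesianFilter()
f.train("buy cheap pills now", "spam")
f.train("meeting notes for tuesday", "ham")
print(f.spam_probability("cheap pills"))       # high, around 0.8
print(f.spam_probability("tuesday meeting"))   # low, around 0.2
```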

But it is the second trend that is more interesting (if disturbing). Spammers are getting better at disguising their messages with text that reads like legitimate email. In some cases, the messages are subtle: we (the human readers) know that they are spam, but it is difficult to point out any particular feature of the message that gives it away. Some people have suggested that, instead of working on filters that recognize spam, we should be doing the opposite: writing filters that recognize legitimate email and discard everything else as spam. Either way, it is clear that future anti-spam filters will have to be much better (and cleverer) than the current generation. Natural language processing is now the way to go.

What surprises does the future reserve for us?

The war on spam presents a unique opportunity to watch the invisible hand of evolution as it guides things. It may also help shed some light on our understanding of the evolution of human language. How did we evolve such a complex and well-structured system? Nature does not work by designing a plan and following it; it works by trying random stuff zillions of times until it finds something that works, that is cheaper and more effective than the alternatives. If there is some structure in the end result, it is perhaps an accident of the way the solution evolved. Had we started from a different point, the final solution could have looked completely different.

Evolution is all about taking one step at a time. Today, anti-spam software development presents a unique intermediate step for research on natural language processing. Once the war has started, it is a matter of time until better solutions get developed. I'm sure that the next few years will see new techniques developed by both sides. By definition, spammers will always be slightly ahead, because they are the ones with the economic incentive; but anti-spam vendors will catch up quickly, and our knowledge will evolve at the same pace.

But there may be more problems in the future. As processing power and bandwidth grow cheaper, it may become feasible to send individually customized messages to each and every person. In the long term, we may end up in a situation where spam looks exactly like legitimate messages. For example, it is possible that spammers will plant AI bots at services such as Friendster with the sole purpose of chatting with other people and then trying to sell them something in a subtle way. How can we tell if we are meeting the date of our dreams, or being cheated by a bot? That will probably be the ultimate Turing Test - one that I'm not willing to take part in myself.

 
Monday, September 08, 2003
 
Teaching children how to deal with natural death
I read on Slashdot that two teenagers committed murder after playing Grand Theft Auto. This obviously led to the discussion: "aren't our children being exposed to too much violence in videogames, movies, etc.?"

My opinion on this matter is that yes - in a way. The real problem is that our children (and I can say 'our', as I'm the parent of a 2-year-old boy) are being raised too far from reality. In our own legitimate concern about their safety, we seek to protect children from what we call 'the dangers of the world'. But as a side effect, I think we end up hiding too much. Let me explain a little better.

I think that kids today have little experience with natural death. When I listen to stories told by my parents (and even more so my grandparents), they all had some experience of the natural death of close relatives and friends while they were still young. It was something that happened, and people had to learn how to deal with it. Death was then regarded as a natural (if sorrowful) part of life. People would sometimes revolt against God's will (as they still do today), but in the end, most people would learn how to mourn the dead, how to respect them and their families, and above all, how to keep living with a sense of the fragility of life.

Advances in health and sanitary conditions have changed this situation a lot. In developed countries, natural death is not nearly as common as it was a few decades ago. On the other hand, today's kids do have some experience - from the news, or first hand - with violent death. Dealing with violent death is much more difficult than dealing with natural death. Some of the reactions include the desire for revenge, or revolt against 'the system'; both can manifest themselves in several ways, most of them disturbing.

Anyway, the fact is that kids now have little or no experience with natural death. On the other hand, they have some experience - direct or indirect - with violence. In this sense, violent games, movies, and news shows are only part of the problem. Kids don't have a correct perspective on life, and can't understand exactly what death means. They have lost something fundamental: the sense of the importance of life. If life has no meaning, what does death mean? That helps to explain, at least in part, this casual approach to life - and death.

P.S. I reread and revised this post in July 2004; it is now something much more important to me, as I lost my father three months ago, after a long battle against cancer. My son is now 3, and he's struggling to deal with the fact that grandpa has died. I think that, while hard for us all, he'll learn something valuable for his life, despite his age. I hope he'll be able to understand what life and death mean in a healthy way, without any trauma.
 
Well, that's not my first blog attempt; in fact, I started another blog where I posted only a couple of entries. So this is just another try - I hope it will be a better one (or at least a little bit more successful).  
Ever had some strange idea that you thought was neat... but never had the time to think it through properly? And then, when you want to resume working on the subject, you can't remember exactly where you stopped, or even what that great insight you were so enthusiastic about just a few days ago was? Well, then you know what this blog is about - good ideas that need to be written down somewhere; projects that deserve to be shared, if only to be saved from forgetfulness.
