Tag Archives: Google

Another Fantastic Waste of Time

I can’t remember off the top of my head what it was, but sometime in 2009 or 2010, Google put up an interactive doodle and someone calculated how many hours were wasted at work playing with it.

I just looked it up and it was the Pac-Man game commemorating the 30th anniversary of its release.

Anyway, they have another interactive doodle today (June 9, 2011) in honor of Les Paul.  I wonder how many hours will be collectively spent today!  I figured out how to play the first seven notes of Glorious is Thy Name, and then I had to get back to work.

UPDATE: The doodle is no longer on the front page, but it has a permanent home here.

Random Thinking

I’ve just come to the conclusion, at 12:48 this morning, that I don’t write here much anymore because I really don’t care enough to do so.  I’m no longer passionate enough about anything to feel I should write about it.

Apparently I’m passionate enough about not being passionate, though.  I’ve just told you in a couple of sentences that I don’t care about anything enough to write a public discussion.  And look – you’re already growing bored of this post.  The fact is that I have had this blog here at benrehberg.com for over six years now and I have only posted 520 times.  I’ve nearly tweeted that much in 18 months.  And speaking of Twitter, I think I’m getting off of that train.  Facebook too.  Down with friends who only know me again through a social experiment and marketing shithole.  And fuck Mark Zuckerberg.

And lately, fuck Google too, and their sleazy one-night-stand Verizon.  I’m beginning to dislike those companies simply because they profit too much from the personal interactions of individuals.  It’s a sickness that wears one out from the outside in.  First it was search results, which were innocent enough.  Now it has come all the way to “push” advertising: in the near future, Google will know that I like pizza and that I’m near a pizza restaurant, and my phone will buzz to tell me the specials there.

No thanks.  I’m quitting Facebook, and I am seriously considering not continuing with Google and Android.  I do not live where that plethora of information is usable, and I am becoming increasingly afraid that we will become too dependent on this availability of data and personalization.  Like GPS has done for travelers – we no longer carry maps or ask for directions.

I realize that I am rambling.  It’s late and I have been drinking to counteract the early-afternoon coffee that punishes me when I close my eyes tonight.

"Too Many Stakeholders are Being Left Out of Discussions Over the Future of the Internet"

I think it’s time we started creating our own networks with IPv6 and sticking it to the corporations.  An ad-hoc network would work if everyone knew what was going on.  One tie-in to a public backbone and I can light up a community without touching the mainstream corporations.


Executive Decision

After toying with C# today, I’ve decided that it is way too process-intensive to write the application on a runtime environment like .NET or Java. What I need is a simple language that can download a page, rip through text like a bandit, write the necessary fields to the database, and move on. I can organize the data when the search engine extracts that data.

I can’t commit to anything yet, but my spidey-sense is telling me that the crawler will be written in Perl with LWP. I suppose I could look at Ruby, too, but I already have my Camel book and have worked with LWP before. I haven’t tied Perl to an RDBMS, but I have done it with PHP and it must be similar. Perl can also do some limited recursion from what I understand, and if it can’t, I might use a database back-end to save the stacks of URLs.
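The plan above – fetch a page, rip through the text for links, and keep the stack of URLs in a database back-end – isn’t sketched in the post, and the Perl/LWP code doesn’t exist yet. As a rough, language-agnostic illustration of the same idea, here is a sketch in Python using only the standard library (all names here are hypothetical, not part of any planned implementation):

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags, resolved against a base URL."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))


def make_frontier(path=":memory:"):
    """A database-backed stack of URLs to visit, so crawl state survives restarts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS frontier (url TEXT PRIMARY KEY, done INTEGER DEFAULT 0)"
    )
    return db


def push(db, url):
    # INSERT OR IGNORE: a URL already seen is silently skipped.
    db.execute("INSERT OR IGNORE INTO frontier (url) VALUES (?)", (url,))
    db.commit()


def pop(db):
    """Take the next unvisited URL and mark it done; None when the frontier is empty."""
    row = db.execute("SELECT url FROM frontier WHERE done = 0 LIMIT 1").fetchone()
    if row is None:
        return None
    db.execute("UPDATE frontier SET done = 1 WHERE url = ?", (row[0],))
    db.commit()
    return row[0]
```

The database-backed frontier is the interesting design choice: it replaces in-memory recursion entirely, and a crawl interrupted halfway can pick up where it left off.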

I was ready to buy books at O’Reilly today (I chickened out of spending the money) and found a book on writing spiders. From the preview I surmised my crawler/spider must be registered. That means I have to go mainstream, doesn’t it?

And now after some more reading, I have discovered that this crawler can be used to build an index for special purposes. I can build my own search engine for this site, for example, and get much better results than I can searching the Google index for benrehberg.com. I have searched for things I know I wrote about, but never found them with Google. Building my own search engine and maintaining my own index of the site can prove useful if I keep writing about programming.

Update: I have created a new label “Web Crawler” for all posts related to this project.

How to Write a Search Engine

It seems a bit strange to use the world’s best search engine to find out how to build your own. Google is my first resource in this project, though Google itself provides nothing but the idea. There is a paper at Stanford by Larry and Sergey, and that basically is the starting point. That is Google’s only contribution so far, aside from the many searches I will perform.

There are three main parts to the search engine: the crawler, which tirelessly captures data from the web; the database, which holds everything; and the actual search engine – the queries that put the data together in a meaningful format for you.
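The third part – the queries that put the data together – is the one the post doesn’t elaborate on. A toy illustration of how it usually works is an inverted index: instead of scanning documents at query time, you map each word to the set of pages that contain it, and a multi-word query becomes a set intersection. This is a minimal sketch, not the method from the Stanford paper:

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page ids containing it (a toy inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return page ids containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

For example, with `pages = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}`, searching for `"quick dog"` intersects the postings for each word and returns only page 3. Ranking the results is a separate (and much harder) problem.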

I could write a search engine that actually crawls the web looking for my search criteria, but that is very VERY inefficient. Google (and many others) have solved this inefficiency by effectively downloading the Web (that’s right – as much of it as they can) to their computers so they can search it much faster and have it all in one place. They’ve done a whole lot more to increase the efficiency and effectiveness of searches, but downloading the web was the first thing they did. It turns out they needed a lot of computers.

I’m going to start with two. I have three desktops that no one wants to buy, and I am really tired of looking at them. I will probably need more if I get this index working soon, but there will be software considerations to make too. You can’t fit the web on one computer, no matter how big. I will learn a lot.

I have always had an interest in distributed systems and cluster computing, so this will be fun. I have a lot to learn about distributed databases and algorithm analysis. But all that is later – I haven’t even really finished thinking out the preliminaries yet. So one development/crawling machine, and one database machine. After I figure out how to crawl the web, I will begin work on performing searches. If this project holds my interest long enough, I might publish statistics at 49times.com, so keep looking. I will be posting here if I come up with anything worth publishing. I’m going to try to journal my progress and decisions without publishing code, but I realize that I very well could lose interest in this. If I get started, I will likely enjoy it and keep going, but no one can say. If you have some confidence that I will continue, you can subscribe to this blog and get the updates. Beware, though, that you’ll get everything else I write too.

Never be Late Again

With Gmail’s Custom Time, just make up an event in the past and say it happened. It’s that easy!

You may even figure out a way to win last week’s lottery using the Custom Time API! I’m going to create an app for Android so you can even keep a little slice of your own time in your pocket (coming in the second half of 2008). But when that happens, I’ll have had it since 2005.

You guys are way behind!

Friends with Vista

After nearly a year, I finally decided to figure out what I could do to make my Vista laptop a bit faster. The memory is maxed out at 2 gigabytes and it has a dual-core AMD CPU. It had always been very, very slow at completing trivial tasks, like opening a browser or the Control Panel. Copying and moving files took way too long, and I just never approached my problem with logic.

A few weeks ago I was talking with a friend about my experience with Vista so far, and mentioned to him that I didn’t think it was a problem with Vista, but a hardware issue with my Gateway laptop. “It runs very hot,” I told him. “The hard drive activity never stops. I just don’t think the machine was designed well enough to support such a heavy OS.” I’d never seen Vista so slow on any other computer, so why the hell is it pokey on mine? And what in the dickens is going on with my hard drive?

Then it hit me. Constant hard drive activity is an indicator of (1) a virus or crapware, or (2) an indexing service. Google Desktop Search was deployed with the computer when I bought it, part of Gateway’s image, along with all the other garbage like BigFix, AOL, and the Office 2007 90-day trial.

Having been a student of Vista before and during its release, I remembered something about Google and Microsoft having fits about desktop search. It seems that Vista includes its own indexing service to speed up searching, and Google was having a hissy over users not being able to choose a desktop search engine. The Windows Indexing Service is on by default, and I don’t think any manufacturers have changed that in their production images. And it just so happens that Gateway included Google Desktop in every computer they released with Vista, and therein lies my problem: two indexing services, constantly running on my poor little 5400 RPM notebook hard drive.

After some thought, I decided I’m a fairly organized fellow and don’t often need to search for a document. Most of what I access is on the network anyway, and those locations aren’t indexed by default. So away went Google Desktop. Though I love Google, I have no need for that program on my mobile station.

And for that matter, I canceled the Windows Indexing service. No need to pick sides, you know?

Then for a final pick-me-up, I had Vista optimize the graphics for performance, which took away all the eye-candy and effectively made my desktop look like Windows 2000. I’m fine with that.

Oh, and one more thing: I shut off the UAC. Those pain-in-the-ass messages one gets when he tries to install a program, “Windows needs your permission to continue,” are gone. I can now run a command window without specifying to run it as Administrator. I can change IP settings with fewer mouse clicks. A little bubble message when I log on warning me that User Account Control is turned off is the only annoyance I have now, and I’m sure that with a simple registry edit I can get rid of that too. Maybe I’ll post it later.

I must say this little bottom-end laptop is pretty damn speedy these days. NetBeans opens in under 60 seconds. Outlook opens in under 5, and boot times are at their lowest since I got it. This doesn’t change anything about the inevitable change to a Mac when I can afford one, but it certainly makes me more comfortable in delaying it.

This is a test post.

This is a test post. I am sending this text from my phone via e-mail to a special address at Blogger.

Did it work?

Update (from a computer): I am limited to a 160-character message from that phone. Not too fantastic for blogging. But it did work, and I have a new way of blogging to the world from an underground imprisonment (if that were to ever happen). Good to know.