Tag Archives: projects

More Thinking

I have had some resistance to the living-in-my-car idea, as expected.  The opposition came from more than one direction, and I couldn’t fully justify the meager savings over the inconvenience (and that inevitable dank smell that would penetrate all the fabric in the vehicle).

So this weekend we tossed around the idea of a recreational vehicle.  At this time, however, it wouldn’t necessarily be the best move to drop $3-$5K on a crappy 10-or-more-year-old travel trailer and then find a park to put it in.  Plus the work needed on our pickup just to make it up to Atlanta could double that figure.  Chuck the truck isn’t doing so well these days.  We thought we might could use it in the future, as we like to travel, but outside of a park where I’d live in it, we don’t have the storage space for it.  And would we really use it?  I’m only guessing that most of the people selling on Craigslist thought the same thing and now they’re just trying to put it off on someone else.

So yet another thought experiment has dawned on me.  It’s another extreme, but a more justifiable and permanent idea nonetheless.  We (my lovely wife and I) have decided we wouldn’t mind living in North Georgia, as long as we can have what we want.  And what we want starts with land.  We need space to put all of our shit.  My projects, our house, my shop, farm animals, and nothing.  I think the nothing part is important.  We need a buffer of nothing to surround our place and give us some peace.

On with my mental exercise: I want to think about what I would need to start with, on a bare piece of land, to begin to live on it.  I’m talking about modern times, folks – I’m not roughing it in a tent.  What needs to go in first, before I build a small cabin?  What are the minimums for living?

I have thought a little bit about it and the first two things are whammies on the budget.  After the land purchase, the first two items on the agenda are water and sewage – a well and a septic tank.  I don’t have exact figures on what those things cost, but they must be in the several-thousand-dollar range each.  And now that I think about it, it will take electricity to run the well pump.  So I might have to look for land with an existing well.

Anyway, with those minimums in place, I could begin to build a small house and finish it to live in temporarily until we could get the main house built.  My dad did this very thing.  To combat theft, I would first build a shed to keep my tools and equipment in, and then begin construction on the cabin.

Perhaps I will write more on this later as it seems like a good idea right now.

What I Could Do

I’ve written quite often about how I don’t like where I live and how I should get out and go to a big city where people are more diverse and there are more opportunities.  I have an apartment in Atlanta now, and while I really hate admitting this, I was wrong.

There is an overwhelming number of assholes here.  I hadn’t thought about that.  I’m an asshole, but usually only in my head.  Too many of the people here are assholes out loud.  I am a social person and I can’t help that.  But I prefer to be a recluse when everyone around me can’t stop yapping about themselves.  The competition here is so fierce that nothing gets done and people get hung out to dry instead of properly informed and trained.

Today I’m reading my Eclipse IDE: Pocket Guide in preparation for beginning Android application development.  My plan is to have such a grasp of that platform that I could work for anyone, from anywhere – including my house in the country.  I am also working through Hello, Android and then moving on to two other books on the platform.  The last time I started working on Android I began coding on the first day of study and got locked into an all-night hacking session trying to work out my project while researching the SDK.  Not anymore.  I’m giving myself the fundamental education so I don’t have to do so much hacking and have so many problems at once.  If you’re interested in this progress, keep an eye on blog.twoleg.com.  I expect to release a simple app for free just to get the hang of it.  It probably won’t be anything groundbreaking – probably an enhanced flashlight application or something.  Not a whole lot of design considerations in that realm.  I will try to post progress at least weekly.


Okay, Okay…

There has always been this thought in the back of my mind, and here’s a little evidence leaning toward it.  I’ve been cautious about self-diagnosing because it’s just not something I do, but I scored a 77 on this 24-question test.  Maybe I’ll go ask a professional now.

Serious ADHD Likely!

Beth took the quiz and gave answers for me.  She scored an 81.

Some Books

I got my courage up Saturday and ordered the books from O’Reilly. This press has long been highly regarded by technologists, whether they are programmers, IT professionals, or just geeks. Go ahead – ask a geek if he/she has a camel book, and chances are they’ll know what you’re talking about (and it will be within reach). Don’t tell them what it is if they don’t know.

I’m posting this to chronicle my efforts to build a web crawler and eventually a search engine. I expect to make further posts about how this project develops, and perhaps what I’ve found in these books that helped.

I have ordered three books. I went there for one, but there’s always a deal to get three for the price of two, plus free shipping. And I can always find another book to get. So:

Perl & LWP. This one I’ve borrowed before, and it opened my eyes to the possibilities of automated web surfing using Perl. I built a small script one time that looked up my SMTP server’s IP at SpamCop, then e-mailed me if my mail server was ever blacklisted. It was fun and quite easy, but since I can’t find that script right now I’ll have to post it later.

Spidering Hacks. I ordered this one for obvious reasons. The excerpts from this book are where I found that little bit about needing my spider registered. I expect to learn a lot and become very frustrated with what I find here.

Perl Cookbook. This was the third choice because I needed three, and also because it’s $50 and I could use the discount. There is apparently a series of “cookbooks” that have really cool stuff (recipes) in them. There is also the PHP Cookbook, the C# 3.0 Cookbook, and more. I expect to find shortcuts and things I’d never thought of in this book.
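That old blacklist script is lost, so what follows is not it. It is a rough Python sketch of one common way to run that kind of check (the original used Perl and LWP and may well have queried SpamCop’s website instead): a DNS blocklist lookup. You reverse the IPv4 octets, append the blocklist zone (assumed here to be SpamCop’s bl.spamcop.net), and resolve the name. Any answer means the address is listed; a failed lookup is treated as not listed. The function names are made up for illustration, and the e-mail step is left out.

```python
import socket

SPAMCOP_ZONE = "bl.spamcop.net"  # assumption: SpamCop's DNS blocklist zone


def dnsbl_query_name(ip: str, zone: str = SPAMCOP_ZONE) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = SPAMCOP_ZONE) -> bool:
    """True if the zone answers for this IP, i.e. the IP is blocklisted."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN means "not listed"; note that a DNS outage also lands here,
        # so this check fails open.
        return False
```

For example, `dnsbl_query_name("192.0.2.25")` gives `25.2.0.192.bl.spamcop.net`; a cron job could call `is_listed()` on the mail server’s address and send mail (for instance with Python’s `smtplib`) whenever it returns `True`.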

Light Reading

I’m taking a class right now on software requirements engineering (does one actually engineer the requirements, or did they just want to make this class sound hard?) and I came across something I might use with the web crawler project.

In the chapter on “The Software Process,” which covers the processes necessary for an individual or team to succeed at building a quality piece of software or system, I came across the Personal Software Process, or PSP. The book simply states that every developer has a process, whether anyone can see it or not. Either way, there is a proper way to go about producing software at a personal level, and here is the gist (Pressman, 2005, p. 37):

Planning. This activity isolates requirements and, based on these, develops both size and resource estimates. In addition, a defect estimate (the number of defects projected for the work) is made. All metrics are recorded on worksheets or templates. Finally, development tasks are identified and a project schedule is created.
High-level design. External specifications for each component to be constructed are developed and a component design is created. Prototypes are built when uncertainty exists. All issues are recorded and tracked.
High-level design review. Formal verification methods… are applied to uncover errors in the design. Metrics are maintained for all important tasks and work results.
Development. The component level design is refined and reviewed. Code is generated, reviewed, compiled, and tested. Metrics are maintained for all important tasks and work results.
Postmortem. Using the measures and metrics collected (a substantial amount of data that should be analyzed statistically), the effectiveness of the process is determined. Measures and metrics should provide guidance for modifying the process to improve its effectiveness.
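To make the planning and postmortem steps concrete, here is a small Python sketch of the simplest comparison they imply: planned versus actual time, and estimated versus actual defects. The record fields and function names are hypothetical stand-ins, not Pressman’s actual PSP worksheets, and a percentage error is only the first step toward the statistical analysis the book describes.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One row of a hypothetical PSP-style worksheet."""
    name: str
    planned_minutes: float
    actual_minutes: float
    estimated_defects: int
    actual_defects: int


def percent_error(planned: float, actual: float) -> float:
    """Estimation error as a percentage of the plan (positive means overrun)."""
    if planned == 0:
        raise ValueError("planned value must be non-zero")
    return 100.0 * (actual - planned) / planned


def postmortem(records):
    """Compare the plan against what actually happened across all tasks."""
    planned = sum(r.planned_minutes for r in records)
    actual = sum(r.actual_minutes for r in records)
    est_defects = sum(r.estimated_defects for r in records)
    act_defects = sum(r.actual_defects for r in records)
    return {
        "time_error_pct": percent_error(planned, actual),
        "defect_error_pct": percent_error(est_defects, act_defects),
    }
```

With two tasks planned at 60 and 120 minutes that actually took 90 and 150, and 4 defects estimated against 6 found, `postmortem` reports a time overrun of about 33% and a defect underestimate of 50%, which is the kind of number the last step feeds back into the next plan.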

I’m not sure if what I’m doing will fit into this personal model of development, but it’s thought-provoking. Even if I don’t collect data about what my problems might be and then analyze the data about what actually went wrong, I can still hold myself to some kind of process. Even though I don’t have a deadline or an antsy customer to deliver this to, I can possibly eliminate shortfalls if I just think it out before delving into code.

But then what fun would that be?

Reference (in our favorite APA format):

Pressman, R. S. (2005). Software engineering: A practitioner’s approach. New York: McGraw-Hill.

Executive Decision

After toying with C# today, I’ve decided that it is way too process-intensive to write the application on top of a runtime environment like .NET or Java. What I need is a simple language that can download a page, rip through text like a bandit, write the necessary fields to the database, and move on. I can organize the data later, when the search engine extracts it.
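Here is a rough sketch of that download, rip, store, move-on loop. It uses Python’s standard library (`urllib.request`, `html.parser`, `sqlite3`) as a stand-in for the Perl and LWP the crawler will probably use; the table names `pages` and `links` and all function names are hypothetical. It keeps only the visible text and the outgoing links, resolved to absolute URLs, and leaves the organizing for later.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects absolute link targets and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS links "
               "(src TEXT, dst TEXT, PRIMARY KEY (src, dst))")
    return db


def fetch(url, timeout=10):
    """Download one page, assuming UTF-8 when the server doesn't say."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def store_page(db, url, html):
    """Parse one page, write its text and outgoing links, return the links."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    db.execute("INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
               (url, " ".join(parser.text)))
    db.executemany("INSERT OR IGNORE INTO links (src, dst) VALUES (?, ?)",
                   [(url, dst) for dst in parser.links])
    db.commit()
    return parser.links
```

On a live page the call would be `store_page(db, url, fetch(url))`; keeping `fetch` separate means the parsing and storage can be exercised on a plain string without touching the network.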

I can’t commit to anything yet, but my spidey-sense is telling me that the crawler will be written in Perl with LWP. I suppose I could look at Ruby, too, but I already have my Camel book and have worked with LWP before. I haven’t tied Perl to an RDBMS, but I have done it with PHP and it must be similar. Perl can also do some limited recursion from what I understand, and if it can’t, I may can use a database back-end to save the stacks of URLs.
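Whatever Perl’s recursion turns out to allow, here is a sketch of how the database back-end could replace it entirely, again in Python as a stand-in, with a hypothetical `frontier` table and made-up function names. The crawl becomes a plain loop: take the oldest unvisited URL from the table, mark it visited, and push its links back in. Ordering by `rowid` makes the table behave like a FIFO queue, so the crawl goes breadth-first, and the primary key keeps any URL from being queued twice. Since the queue lives in the database, crawl depth is limited by disk rather than by the call stack, and a file-backed frontier survives a restart.

```python
import sqlite3


def open_frontier(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS frontier ("
               "url TEXT PRIMARY KEY, visited INTEGER NOT NULL DEFAULT 0)")
    return db


def enqueue(db, urls):
    """Add URLs; any URL already in the table (visited or not) is ignored."""
    db.executemany("INSERT OR IGNORE INTO frontier (url) VALUES (?)",
                   [(u,) for u in urls])
    db.commit()


def next_url(db):
    """Pop the oldest unvisited URL, or return None if nothing is left."""
    row = db.execute("SELECT rowid, url FROM frontier WHERE visited = 0 "
                     "ORDER BY rowid LIMIT 1").fetchone()
    if row is None:
        return None
    db.execute("UPDATE frontier SET visited = 1 WHERE rowid = ?", (row[0],))
    db.commit()
    return row[1]


def crawl(db, seeds, get_links, limit=100):
    """Iterative crawl: a loop over the table instead of recursive calls."""
    enqueue(db, seeds)
    order = []
    while len(order) < limit:
        url = next_url(db)
        if url is None:
            break
        order.append(url)
        enqueue(db, get_links(url))
    return order
```

`get_links` stands for whatever fetches a page and returns its links. Passing it in keeps the loop testable against a small in-memory link graph.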

I was ready to buy books at O’Reilly today (I chickened out of spending the money) and found a book on writing spiders. From the preview I surmised my crawler/spider must be registered. That means I have to go mainstream, doesn’t it?
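Whatever that registration turns out to mean, the baseline courtesy most crawlers follow is the Robots Exclusion Protocol: fetch `/robots.txt` from each host and skip the paths it disallows for your user agent (the protocol is now written up as RFC 9309). Python’s standard library ships a parser for it, `urllib.robotparser`, and below is a minimal sketch built on it. The crawler name and helper names are made up. Here the robots.txt text is passed in directly; for a live site, `RobotFileParser.set_url()` followed by `read()` would download it instead.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler/0.1"  # hypothetical name the crawler identifies itself by


def make_rules(robots_txt: str) -> RobotFileParser:
    """Build a rule set from robots.txt text the crawler already downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def allowed(rules: RobotFileParser, url: str, agent: str = USER_AGENT) -> bool:
    """Ask the parsed rules whether this agent may fetch this URL."""
    return rules.can_fetch(agent, url)
```

Sending the same `USER_AGENT` string in the crawler’s HTTP requests lets site owners see who is crawling them and write rules aimed at it.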

And now after some more reading, I have discovered that this crawler can be used to build an index for special purposes. I can build my own search engine for this site, for example, and get much better results than I can searching the Google index for benrehberg.com. I have searched for things I know I wrote about, but never found them with Google. Building my own search engine and maintaining my own index of the site can prove useful if I keep writing about programming.
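If the index is only for this site, the crawler needs a rule for which links it may follow. Here is a minimal Python sketch, assuming “this site” means the hostnames benrehberg.com and www.benrehberg.com (an assumption; adjust the set as needed). It resolves each link against the page it came from, strips the `#fragment` so the same page isn’t queued twice under different anchors, and keeps only http(s) URLs on those hosts. The function names are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumption: the hostnames that count as "this site".
SITE_HOSTS = {"benrehberg.com", "www.benrehberg.com"}


def normalize(base: str, href: str) -> str:
    """Resolve a link against the page it appeared on and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base, href))
    return absolute


def in_site(url: str, hosts=SITE_HOSTS) -> bool:
    """Follow only http(s) links whose (lowercased) hostname is in the set."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and (parts.hostname or "") in hosts
```

Everything the crawler pulls off a page would pass through `normalize` and then `in_site` before going into the queue, which keeps the index limited to this blog.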

Update: I have created a new label “Web Crawler” for all posts related to this project.

How to Write a Search Engine

It seems a bit strange using the world’s best search engine to find out how to build your own. Google is my first resource in this project, though Google itself provides nothing but the idea. There is a paper from Stanford by Larry Page and Sergey Brin, and that is basically the starting point. That is Google’s only contribution so far aside from the many searches I will perform.
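The best-known idea in that paper (Brin and Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine”) is PageRank: a page ranks higher when highly ranked pages link to it. Below is a rough power-iteration sketch in Python. The damping factor d = 0.85 is the value the paper says it usually uses. Two choices here are common modern variants and not the paper’s exact formula, which is PR(A) = (1 − d) + d Σ PR(T_i)/C(T_i): the scores are divided by the number of pages so they sum to 1, and the rank held by pages with no outgoing links is spread evenly over all pages.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is shared evenly by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Every other page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank
```

For a three-page web where pages a and b both link to c and c links back to a, this ranks c highest, then a, then b, since nothing links to b.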

There are three main parts to the search engine: the crawler, which tirelessly captures data from the web; the database, which holds everything; and the actual search engine – the queries that put the data together in a meaningful format for you.
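The third part usually works off an inverted index built from what the crawler stored: a map from each word to the set of pages containing it, so a query never has to rescan every page. Here is a toy Python version with a simple AND query. The tokenizer (lowercase letters and digits only) and the function names are assumptions; a real engine would add ranking, phrase matching, and an index that lives on disk.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumption: words are runs of letters/digits


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_index(pages):
    """pages maps URL -> page text; returns term -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """AND query: sorted URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)
```

Intersecting the per-word sets is what makes the query cheap: the work depends on how many pages contain the words, not on how many pages were crawled.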

I could write a search engine that actually crawls the web looking for my search criteria, but that is very VERY inefficient. Google and many others have solved this inefficiency by effectively downloading the Web (that’s right – as much of it as they can) to their own computers, so they can search it much faster and have it all in one place. They’ve done a whole lot more to increase the efficiency and effectiveness of searches, but downloading the web was the first thing they did. It turns out they needed a lot of computers.

I’m going to start with two. I have three desktops that no one wants to buy, and I am really tired of looking at them. I will probably need more if I get this index working soon, but there will be software considerations to make too. You can’t fit the web on one computer, no matter how big. I will learn a lot.
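Once the data outgrows one machine, something has to decide which machine owns which URL. One common approach, sketched here in Python and not something the plan above commits to, is to hash the hostname and take the result modulo the number of machines, so every page from one site lands on the same box. That keeps per-site politeness delays and link bookkeeping local. The function name is hypothetical; `hashlib` is used because Python’s built-in `hash()` for strings is randomized per process and would send the same URL to different machines on different runs.

```python
import hashlib
from urllib.parse import urlsplit


def shard_for(url: str, num_machines: int) -> int:
    """Pick a machine index (0 .. num_machines - 1) by hashing the URL's host."""
    host = (urlsplit(url).hostname or "").encode("utf-8")
    digest = hashlib.sha1(host).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap around.
    return int.from_bytes(digest[:8], "big") % num_machines
```

Plain modulo reshuffles almost every host when a machine is added, so if the cluster is expected to grow, consistent hashing is the usual replacement.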

I have always had an interest in distributed systems and cluster computing, so this will be fun. I have a lot to learn about distributed databases and algorithm analysis. But all that is later – I haven’t even really finished thinking out the preliminaries yet. So one development/crawling machine, and one database machine. After I figure out how to crawl the web, I will begin work on performing searches. If this project holds my interest long enough, I might publish statistics at 49times.com, so keep looking. I will be posting here if I come up with anything worth publishing. I’m going to try to journal my progress and decisions without publishing code, but I realize that I very well could lose interest in this. If I get started, I will likely enjoy it and keep going, but no one can say. If you have some confidence that I will continue, you can subscribe to this blog and get the updates. Beware, though, that you’ll get everything else I write too.