Entries Tagged 'Web related' ↓

Podcasting talks and lectures

Recently I’ve been asked to put a number of talks and lectures online for our local Christians in Science group. The usual structure is to separate the lecture into a couple of audio files; providing a track for the main talk and another for the question and answer session. The last talk had 3 speakers and they wanted 3 separate MP3 files posted online, one for each speaker.

I provide an M3U playlist for people who want to listen to everything, but this time I thought I’d provide a podcast as well, widening the audience reach (hopefully, as talks on Science & Religion aren’t always people’s first choice to listen to ;)). At which point the separate tracks, so handy when downloading via the website, became a problem.

For those not familiar with podcasts, here’s an overview of the structure.

A podcast is effectively an RSS feed for a ‘radio show’ (for want of a better description). It’s also the name given to the audio within it - so “listen to a podcast” refers to the audio, not the RSS feed :)
Inside the podcast are many items: usually one item per show. Each item contains details such as the show’s title, a description of the show and other associated details.

Inside each item (yes it’s all confusingly nested) is an enclosure. The enclosure specifies the MP3 file which contains the audio show.

Right. It sounds more complex than it is, but that structure (RSS feed/podcast -> Item -> Enclosure) will be important for the discussion that follows.
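To make the nesting concrete, here is a minimal sketch in Python that builds a feed with that shape using the standard library. It is only an illustration: the function name `build_podcast`, the example.org URLs, the titles and the file sizes are all made-up placeholders, and a real feed would normally carry more fields (publication dates, iTunes tags and so on).

```python
import xml.etree.ElementTree as ET


def build_podcast(title, link, description, shows):
    """Return an RSS 2.0 document as a string: channel -> item -> enclosure."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for show in shows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = show["title"]
        ET.SubElement(item, "description").text = show["description"]
        # One enclosure per item: it points at the MP3 and gives its size in bytes.
        ET.SubElement(
            item,
            "enclosure",
            url=show["url"],
            length=str(show["length"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")


# Placeholder data: two shows, each with a single audio file.
feed_xml = build_podcast(
    title="Example talks",
    link="http://example.org/talks/",
    description="Talks and lectures (placeholder feed).",
    shows=[
        {
            "title": "Main talk",
            "description": "The lecture itself.",
            "url": "http://example.org/audio/talk.mp3",
            "length": 12345678,
        },
        {
            "title": "Question and answer session",
            "description": "Questions from the audience.",
            "url": "http://example.org/audio/questions.mp3",
            "length": 2345678,
        },
    ],
)
print(feed_xml)
```

Each `item` here carries exactly one `enclosure`, which is the shape podcast clients expect.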

My problem was that our local Christians in Science group puts on 4-5 events a year. That would suggest each event should be an item. However, there are several audio files for each event. Technically you can have many enclosures in an item, but iTunes doesn’t support this feature, which is a big problem if you care about a sizable chunk of podcast listeners.

So what can I do? Putting each audio file in a separate item was my first thought. In a way this makes more sense, as it allows a different description for each speaker. However, if you stumble across one of the files, how will you know it’s speaker 2 of 4 at the event?

This raised another question: what constitutes a podcast? I had originally planned to have one RSS feed for CiS-CS listing all events. With multiple items for one event, should I create a separate RSS feed (podcast) for each talk and lecture? Again, that makes sense at one level, but then there’s no way to keep up to date with new talks as they’re posted online, which defeats the point of podcasting in the first place!

So what have I done? I took the coward’s route out, and haven’t put up a podcast yet!

Which approach makes the most sense to you?

Google encourages (unobtrusive) link farms

I use the term ‘link farms’, but I could equally well have used ‘partner sites’, ‘featured links’ or ‘popular searches’, all of which are used instead.

You may be wondering what on Earth I’m talking about. Link farms are so last century; the search engines no longer respect them and will likely blacklist your site for using them (or gateway pages). However, link farms are all around and growing in usage. They’re back in a very sneaky form. You’ve probably seen them on sites, like these links along the bottom of an antique dealer’s site.

[Image: links on an antique dealer’s site]

With Google’s PageRank putting weight on the number of incoming links, every possible strategy is being used. A site without incoming links is pretty pointless now, particularly if you’re in a highly competitive market such as renting Florida villas. So the owners promoting these sites take the opportunity to purchase links from other sites. Sites with irrelevant content. Whole sites built just to provide links to other sites to increase their search engine ranking.

Today it’s nowhere near as obvious as the link farms of yesteryear. Oh no, these guys have learned their lesson. Nor are most of them computer-generated junk; instead a few of the sponsored links are added to proper-looking pages.

Let’s suppose the search engines agreed this was not a good way to measure popularity (he who buys the most links is the most popular). They would have a problem. How can you tell the difference between a page of content with random links placed at the bottom of the page to purposefully increase search engine ranking and a blog with a selection of random links in the linkblog?

Here’s an example I found which for a computer would (should?) be hard to distinguish automatically:

[Image: blog side-links totally unrelated to the content]

I don’t want to play that game. To me it feels like match fixing or fiddling the accounts - it just shouldn’t be done. I believe it’s an ethical problem - it may work, but it isn’t measuring whether the content of the site is good, or whether the site provides a relevant match. There’s also the wrinkle that, just to do well in search engine results, I would be paying people to clutter up the internet with more useless search engine fodder. Why should I?

As always, your thoughts are appreciated.

Improved WordPress related posts plugin

I’ve always liked the idea of having related entries shown - it’s a smart system of navigation and helps readers explore the knowledge available. Hence the first plugin I installed on this WordPress-based blog was Related Posts by Alexander Malov & Mike Lu.

The aim behind the related posts plugin is to show similar entries already made on your blog, regardless of their category. Think of it as displaying search results before the reader has searched. That’s how useful it can be if it works.

However, the WordPress plugin did not give very good results when used on this site. Upon investigating the code, I discovered it was only using the post’s URL to compare the current post with other entries. Since my weblog URLs are based on the page title, which on this site doesn’t always highlight keywords from the entry, this was unsurprisingly not proving effective.

The plugin provided the option for you to put hidden keywords in your post which would be used. However I didn’t want to go back and add keywords to my old posts. Not to mention I’d forget the standard keywords between writing posts ;) Tagging is a job for the computer, not a human…

So I’ve hacked up a modified version, which takes the post content and calculates word frequency, using weightings for different parts of the post (e.g. URL, title, content) that can easily be modified (lines 57-62 of the code). It then uses the most frequent words to query the MySQL full-text index, (hopefully) returning more relevant results.
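For the curious, the idea can be sketched roughly like this in Python. This is not the plugin’s actual code (that is PHP), just an illustration of the approach: the weights, the tiny stopword list, the minimum word length and the function names are all made up for the example, and the query assumes a FULLTEXT index exists on the `post_title` and `post_content` columns of `wp_posts`.

```python
import re
from collections import Counter

# Hypothetical weightings per part of the post; tune to taste.
WEIGHTS = {"url": 1.0, "title": 3.0, "content": 1.0}

# A deliberately tiny stopword list, just for the example.
STOPWORDS = {"this", "that", "with", "from", "have", "which", "there"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_keywords(parts, weights=WEIGHTS, top_n=10):
    """Score each word by how often it occurs, weighted by where it occurs."""
    scores = Counter()
    for name, text in parts.items():
        weight = weights.get(name, 0.0)
        for word in tokenize(text):
            # Skip very short words and stopwords.
            if len(word) > 3 and word not in STOPWORDS:
                scores[word] += weight
    return [word for word, _ in scores.most_common(top_n)]


def fulltext_query(keywords, limit=5):
    """Build a parameterised MySQL full-text query for the given keywords."""
    sql = (
        "SELECT ID, post_title, "
        "MATCH (post_title, post_content) AGAINST (%s) AS score "
        "FROM wp_posts "
        "WHERE MATCH (post_title, post_content) AGAINST (%s) "
        "ORDER BY score DESC LIMIT " + str(int(limit))
    )
    terms = " ".join(keywords)
    return sql, (terms, terms)


keywords = weighted_keywords(
    {
        "url": "podcasting-talks-and-lectures",
        "title": "Podcasting talks and lectures",
        "content": "Each item in the feed has one enclosure for the audio.",
    },
    top_n=5,
)
sql, params = fulltext_query(keywords)
print(keywords)
print(sql)
```

The query string and its parameters would then be handed to whatever MySQL client library is in use; no database connection is made here.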

The script currently does not use any stemming algorithm or match against category IDs. Those are the most obvious features to add, but the latter would require reworking the MySQL query, and it was bedtime!

At present this blog shows the MySQL full-text ranking next to each related posts link, to give you an idea of the certainty of each match. If you want to see the same on your site, uncomment line 161 in related-posts.php.

Download related-posts.php

The code is not supported and was hacked together because I was curious about recommendation algorithms. Improvements or suggestions are welcome. You are free to use and modify this code.

Question: Is there some way to manage file uploads like the one above from within WordPress? I’ve uploaded it manually, as the ‘upload’ box on the “Write Post” screen said a PHP file wasn’t a valid image!

Online co-ordination strategies

I don’t usually repost from other blogs, but this was too funny to pass over. “David Weinberger”:http://www.hyperorg.com/blogger/ was talking about “Recovery 2.0”:http://www.buzzmachine.com/index.php/2005/09/05/recovery-20-a-call-to-convene/ and the ways to co-ordinate to help those involved in Hurricane Katrina. “Here’s what he had to say”:http://www.hyperorg.com/blogger/mtarchive/004502.html:

How you think that coordination should happen says a lot about your view of the Web.

A Semantic Web approach would create an ontology of victims, relatives, disasters, relief efforts, locations, threats, supplies, routes, relief agencies, medical records, doctor appointment books, local bus schedules, and stock market data.

A Web 2.0 approach would create APIs among recovery services offered on the Web and wait for hackers to build something useful. Whatever the hackers create would include plotting something on Google Maps, a requirement for all Web 2.0 apps.

A microformats approach would spend a weekend coming up with a quick-and-dirty set of useful metadata, preferably modeled on Amazon.

The regulatory approach would ask the pharmaceutical, transportation and recording industries to come up with a set of guidelines for the distribution of relief supplies with the primary objective of making sure that they do not fall into the hands of terrorists.

Web 2.0 all the way :-)

Domain registration and web hosting companies

Friends keep asking me about registering domain names, setting up hosting and generally running website stuff, and this post collects the links together so I don’t have to keep telling them ;-)

h3. Domain names

I use “123-Reg”:http://www.123-reg.co.uk/affiliate.cgi?id=AF77350 for UK domain names. It has brilliant features in the domain name control panel and a very low price.

For .com/.org/.net registration I use Gandi.net. Originally one of the more eccentric domain registrars (being translated from French to English didn’t help), their site has just been redesigned and now looks fabulous. They aren’t the cheapest domain registrar but come under the stricter EU privacy laws, which is why I prefer them to a US registrar.

If you want something distinctive you can get a “.tv domain”:http://www.tkqlhce.com/click-1707693-10356519. One of my clients did this because it was the only way they could purchase the domain they wanted.

h3. Hosting

You pays your money, you takes your pick. I’m not going to give an all-out recommendation (if I did, no doubt the hosting company’s servers would collapse and my friends would become ex-friends :-)).

I currently host most sites with “DreamHost”:http://www.dreamhost.com/r.cgi?116334. Feature for feature, I don’t believe any other provider offers as much for such a low price. I can host unlimited domains, they provide over 20GB of disk space and basically “unlimited bandwidth”:http://www.dreamhost.com/r.cgi?116334 - which is very useful for photo galleries! They don’t lack control either - I can install my own PHP binary and most other programs I want (e.g. SVN, Rails, Django). It’s like the freedom of a dedicated server without the cost!

h3. Conclusion

I think that should answer most questions that I’ve been asked so far!

Page Revised 1 August 2006.

Filtered ADSL faceplates

Found via the ADSLguide forum: you can buy “filtered faceplates”:http://www.clarity.it/telecoms/adsl_faceplate.htm from Clarity which apparently help with ADSL connections on particular lines. Worth remembering in case this, rather than ISP problems, should ever come back to haunt me :-)

German email spam

Is anyone else suddenly getting spam written in German, which is slipping past their email filters? Both the server-side SpamAssassin and my local Bayesian filter were doing well until these started arriving a couple of days ago.

Aside from that, mapledesign.co.uk is being used as the From/Return-To address by a number of spammers at present. If you’re receiving them, my apologies. I can categorically state that (a) we’re not sending them, and (b) I’m not virus infected, so nothing is being sent from here without my knowledge.

Now to go and delete another round of bounced emails received…

h3. Update

See “this webpage”:http://www.clearswift.com/support/cs/email.aspx?ID=4404 for details about the origin of the spam and the subject lines you can filter on to block it.

If you’re having problems with the amount of spam you’re getting, you may want to try Cloudmark’s new SafetyBar.

Opera lover concedes defeat

I’ve been an “Opera”:http://www.opera.com/ lover for years. I bought my first license when version 6 came out and have been steadily upgrading ever since. Now, much against my wishes, I think I have to concede: “Firefox”:http://www.firefox.com/ is the better browser.

Why? Here are the reasons that forced me to this conclusion.

# Opera 8’s support for XMLHttpRequest is incomplete. Duh.
# I’ve got addicted to using “FCKEditor”:http://www.fckeditor.net/ in CMS, which only works in Firefox and IE.
# (This one surprised me) Firefox’s user interface, menu design and dialogue boxes are far less confusing than Opera’s. I know, I was stunned. But I found the default layout in Opera 8 really confusing, and it didn’t feel as clean.
# Opera’s lack of extensibility. Not a fault on its own, but I love Firefox’s “ScrapBook”:http://amb.vis.ne.jp/mozilla/scrapbook/, which allows you to save web pages. Great for when I find a design I like and want to save it ‘for future reference’ ;-) [Note: I know Opera has notes, but I never worked out why they're useful as they're text-only - surely that defeats the point of the 'net?]
# MathML rendering. Not really a biggie for me yet, but as a physics student it’s nice to have.
# Web standards support. Opera used to sell themselves on this; now… where’s the push gone?

It isn’t all negative though. On the plus side

# Opera has a built-in news reader (I’m not one for “buying extra software”:http://www.feeddemon.com/) and for some reason Thunderbird and I don’t get along.
# Opera saves open tabs. I know there are Firefox extensions to do this, but I haven’t found one which (a) doesn’t slow down/crash the browser and (b) doesn’t forget the saved tabs sometimes. Opera’s works perfectly.
# Memory usage with many open tabs appears to be lower in Opera.
# Some web pages just won’t load in Firefox. I’ve tried clean reinstalls of the browser, and nothing works. No headers appear in “LiveHTTPHeaders”:http://livehttpheaders.mozdev.org/, and the dots just go round and round until I kill the page loading.

Will I keep using Opera? The answer is yes, but alongside Firefox. For pages I find and want to read in the future, Opera’s ability to save open tabs is a godsend, and a feature I can’t live without. But it isn’t the best browser any more (I’d give my parents Firefox any day rather than try to get them to navigate round Opera).

AOL 9 and old email

I’ve put a new hard drive into my parents’ computer, and installed AOL 9 on it. Previously they were using AOL 7, but wanted the spam filtering of version 9 (besides, 7 is starting to not work smoothly as AOL change their system).

The problem is that they have a few thousand emails in the ‘Personal Filing Cabinet’ of AOL 7, and don’t want to lose these messages. I can’t find any way to get these emails into AOL 9. Is it possible? Google hasn’t been very helpful.

See if you remember the URL…

A couple of months ago I came across a site that took a keyword, searched both “Google”:http://www.google.com and “Yahoo”:http://www.yahoo.com and showed the sites appearing in the top 20 (?), linking from Google’s search results to the position in Yahoo’s.

The results were shown using a Flash movie generated using (if I remember correctly) Ming - or some other library.

Does anyone remember the URL?