Archive for the ‘General’ Category

More memcached

Thursday, November 20th, 2008

While playing with memcached I wanted to find a way to monitor what was getting set. You see, we are trying to pre-warm our cache with lots of colours (roughly 16.5 million of them :) ).

Anyway, I remembered having a play with expect a (long) while ago and decided it would be just the thing…

So here we are – a status report from memcached every 5 minutes.

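#!/bin/bash
# script to monitor memcached: print general and slab stats every 5 minutes

# quote 'EOF' so bash passes the Tcl script (and its \n escapes) through untouched
expect -f - << 'EOF'
set timeout 5
spawn telnet localhost 11211
while 1 {
    send "stats\n"
    # memcached ends every stats listing with a line reading END
    expect "END"
    send "stats slabs\n"
    expect "END"
    puts "sleeping for 5 mins"
    sleep 300
}
EOF

# ends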

Our pre-warming script has been running for almost an hour and has warmed the cache with 319,116 items, so about 52 hours to go…

In terms of memory usage, I am running memcached with 1 GB of RAM, and after the hour we are only using 4.7 MB of that :) . So my estimate is that we would need approximately 250 MB.

Target: 16,581,375 items (colours)
Items written in 1 hour: 319,116
Bytes written in 1 hour: 5,005,741
Time needed: 16,581,375 / 319,116 = 51.96 hours
Memory needed: 5,005,741 × 51.96 = 260,098,302 bytes = 248.05 MB
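Here is the same projection as a short Python sketch, which makes it easy to re-run with fresh numbers. It uses the figures from the post, and `project` is a made-up name.

The sketch keeps the unrounded 51.960… hours, so its byte total comes out slightly above 260,098,302. It still rounds to 248.05 MB, taking 1 MB as 1024 × 1024 bytes, which is what the 248.05 figure implies.

```python
# Project total pre-warm time and memory from a one-hour sample.
from typing import Tuple

TARGET_ITEMS = 16_581_375    # every colour we want in the cache
ITEMS_PER_HOUR = 319_116     # items written in the first hour
BYTES_PER_HOUR = 5_005_741   # bytes written in the first hour


def project(target: int, items_per_hour: int, bytes_per_hour: int) -> Tuple[float, float]:
    """Return (hours needed, memory needed in MB, where 1 MB = 1024 * 1024 bytes)."""
    hours = target / items_per_hour
    total_bytes = bytes_per_hour * hours
    return hours, total_bytes / (1024 * 1024)


if __name__ == "__main__":
    hours, mb = project(TARGET_ITEMS, ITEMS_PER_HOUR, BYTES_PER_HOUR)
    print(f"{hours:.2f} hours, {mb:.2f} MB")  # 51.96 hours, 248.05 MB
```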

And now I’ve done these calculations, something makes me want to revisit my memcached monitor script so that I can pass it a target and get a “time left” estimate :)
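Here is a sketch of that “time left” idea, in Python rather than expect. It works under assumptions and is not the script above:

- It talks to memcached over a plain socket instead of telnet.
- It reads the `uptime` and `curr_items` fields from two successive `stats` replies and extrapolates the rate between them.
- The function names are made up for illustration.
- `curr_items` counts what is currently stored, so any evictions would skew the estimate.

```python
# Estimate time left for the cache pre-warm from memcached's "stats" reply.
import socket
import time
from typing import Dict, Optional


def fetch_stats(host: str = "localhost", port: int = 11211) -> str:
    """Send 'stats' over the text protocol and read until the closing END line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            data += chunk
    return data.decode("ascii", errors="replace")


def parse_stats(reply: str) -> Dict[str, str]:
    """Turn 'STAT <name> <value>' lines into a dict; other lines (END) are skipped."""
    stats: Dict[str, str] = {}
    for line in reply.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hours_left(prev: Dict[str, str], curr: Dict[str, str], target: int) -> Optional[float]:
    """Hours until curr_items reaches target, at the rate seen between two snapshots."""
    elapsed = int(curr["uptime"]) - int(prev["uptime"])        # seconds
    added = int(curr["curr_items"]) - int(prev["curr_items"])  # items
    if elapsed <= 0 or added <= 0:
        return None  # no measurable progress, so no estimate
    rate = added / elapsed  # items per second
    return (target - int(curr["curr_items"])) / rate / 3600


def monitor(target: int, interval: int = 300) -> None:
    """Print a progress line every `interval` seconds (runs until interrupted)."""
    prev = parse_stats(fetch_stats())
    while True:
        time.sleep(interval)
        curr = parse_stats(fetch_stats())
        left = hours_left(prev, curr, target)
        status = "no progress" if left is None else f"about {left:.1f} hours left"
        print(f"{curr['curr_items']} items cached, {status}")
        prev = curr
```

Calling something like `monitor(16_581_375)` would then print a line every 5 minutes, mirroring the expect loop.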

Colourphon ontology for digital images

Monday, November 17th, 2008

I was recently talking about my experiences with OWL over on ITO, so I have decided to go ahead, put my OWL ontology out there and invite comment. I am quite open to criticism, as I am sure I will need to revise it, but here it is.

The aim of this ontology is to clarify some of the subclasses of owl:Thing that concern us here at Colourphon.

We are describing the colours found within digital images. Note: I use the term ‘digital image’ as opposed to just ‘image’ as we are specifically dealing with data gleaned from digital representations of images, and not images in any other format.

Currently the ontology deals with DigitalImage, the image itself; Pixel, a point within the image; Coordinate, the location of a pixel; RGBValue, the RGB value of the pixel; ColourName, the word or phrase used to describe the colour; and GuessedColourName, a word or phrase used to describe the colour that may not be accurate.
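To make the class list concrete, here is a rough Python mirror of it using dataclasses. It only illustrates how the pieces relate and is not the OWL ontology itself. The following are all assumptions:

- the field names
- the top-left coordinate origin
- the `source` field
- modelling GuessedColourName as a subclass of ColourName

```python
# Illustrative data model mirroring the ontology's classes (not the OWL file).
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Coordinate:
    """Location of a pixel; origin assumed to be the top-left corner."""
    x: int
    y: int


@dataclass(frozen=True)
class RGBValue:
    """RGB value of a pixel, each channel 0-255."""
    red: int
    green: int
    blue: int


@dataclass(frozen=True)
class ColourName:
    """Word or phrase used to describe a colour."""
    label: str


@dataclass(frozen=True)
class GuessedColourName(ColourName):
    """A colour name that may not be accurate (subclassing is an assumption)."""


@dataclass(frozen=True)
class Pixel:
    """A point within the image, with its location, value and (optional) name."""
    location: Coordinate
    value: RGBValue
    name: Optional[ColourName] = None


@dataclass
class DigitalImage:
    """The image itself; `source` is a hypothetical field for where it came from."""
    source: str
    pixels: List[Pixel]
```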

BTW, this ontology is modelled using Protégé 4, if you want to see it as I see it…

If you are aware of any work already in this area, then please do let us know.

memcached

Friday, November 7th, 2008

The Wolverhampton Hell’s Angels are shaking our windows and foundations with their stomach-churning firework display, which I would be watching now rather than typing this, but for the trees at the back of the garden that are in the way!

This post comes in two parts: the good news and the bad news.

First the good news. Rich and I had a chat during our lunchtime circuit of the business park, and we were talking about ways to make Colourphon quicker. We know we can do the analysis that takes a bunch of coloured pixels and puts a human-friendly name to the most frequently occurring colours in the image. The problem was that this was taking upwards of 30 seconds, so PHP, quite rightly, kept throwing back a maximum execution time error.

Rich has been thinking a lot about application architecture, and in particular about memcached.

So tonight I installed memcached on my Ubuntu development machine:

sudo apt-get install memcached

Then I installed the memcache PECL extension for PHP:

sudo pecl install memcache

Next I added a function to instantiate a Memcache object with an array of servers. Along the way I found a neat way to store an array as a constant (PHP constants could only hold scalar values): the constant holds a string of PHP code that returns the array when evaluated.

$arr = array(1, 2, 3);

// store PHP source code that rebuilds the array when passed to eval()
define("ARRAY_CONSTANT", "return " . var_export($arr, true) . ";");

Then when we want to use the constant we simply evaluate it.

foreach (eval(ARRAY_CONSTANT) as $val) {
    // do stuff with $val
}

The upshot is:
1st page load: 29.94 seconds – including several hundred calls to the class with most processing overhead.
2nd page load: 1.61 seconds – with not a single call to the class with most processing overhead!!

So if you need a massive performance boost, use memcached. It was originally designed for caching database calls, but you can cache any bunch of frequently used data, even objects: simply serialize them when you store them and unserialize them when you need them.
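The pattern described here is usually called cache-aside. You look in the cache first, and only on a miss do you do the expensive work, serialize the result and store it.

A minimal Python sketch follows. `DictCache` and `cached` are hypothetical names, and the dictionary merely stands in for a real memcached client exposing `get` and `set`.

```python
# Cache-aside with serialization; DictCache is an in-memory stand-in for memcached.
import pickle
from typing import Any, Callable, Dict, Optional


class DictCache:
    """Hypothetical stand-in for a memcached client: bytes stored under string keys."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = value


def cached(cache: DictCache, key: str, compute: Callable[[], Any]) -> Any:
    """Return the cached object for key, computing and storing it on a miss."""
    blob = cache.get(key)
    if blob is not None:
        return pickle.loads(blob)        # hit: deserialize and skip the work
    value = compute()                    # miss: do the expensive work once
    cache.set(key, pickle.dumps(value))  # serialize so any object can be stored
    return value
```

Only unpickle data from a cache you control, since `pickle.loads` can execute code embedded in the data.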

It has taken longer to write and proof read this post than it did to add the 12 lines of code to get it to work.

The bad news?  This isn’t live on a public machine yet :) .

LibraryThing gives away covers!

Friday, August 8th, 2008

Interesting news from LibraryThing:

LibraryThing: A million free covers from LibraryThing
A few days ago, just before hitting thirty million books, we hit one million user-uploaded covers. So, we’ve decided to give them away—to libraries, to bookstores, to everyone.

Excellent stuff. We’ll have to see if we can get a key :)


We’ve been spotted :)

Thursday, March 13th, 2008

Dave Pattern up in Huddersfield has reminded me that he has already done some work on identifying colours in book jackets and then retrieving them to good effect. Now that I have been prompted, I vaguely remember having looked at Dave’s work. Yet when Richard suggested we could do something interesting with book jackets… nope… blank… Thanks for reminding me, Dave! I see also that Ed Vielmetti has spotted us at work… Ed, a bit of the back story can be found on the Colourphon Wiki ;) .

Of course this little project is in very early days – we are not even storing anything yet!! – so come back later and see how we are getting on. And of course, if you have any ideas or things that you think might be rather cool applications, let us know, either here or on the aforementioned Wiki.

Progress

Tuesday, February 26th, 2008

After some early prototypes, proofs of concept and an all-round “learning” experience, Tim and I have overhauled everything we have written while keeping all the functionality. My uni days are long gone: every bit of 5-minute prototype code now easily takes 15 minutes to perfect… :)

Anyway, Tim has worked hard and put in some groovy functions, which means that what we originally set out to do is *almost* complete, with one major flaw: we’re not *yet* storing anything.

This was our original problem, mulled over lunch whilst walking around the Business Park …

So, someone walks into a library and says … ‘I saw this book last week, in the history section and it was blue … do you know where it is now?’

The librarian replies ‘Do you know the Author, or the Title?’

‘No,’ said the *now* frustrated customer…

Ah – but if we could somehow harvest the colours of a book, store them and allow users to search against them …

And this is how we were born!

Submit an ISBN

Saturday, February 16th, 2008

Wouldn’t it be great if you could submit an ISBN, and have Colourphon go away and find a jacket image for you?

Well now you can!

Try it out now! And if you can’t find an ISBN to test, try this one: 0764555871

Cracked it!

Friday, February 15th, 2008

Yeay! Fixed a rather niggly bug.

Displayed colour counts were inaccurate, although the displayed colours were correctly identified as the most prevalent in the image. This was a frustrating bug, as it meant that the ‘percentage of image of particular colour’ could not be calculated reliably. I mean, I am looking at 36,000-odd pixels, and the top colour is reported to have occurred 12 times. Hmm.

The bug was mainly caused by an incorrect comparison that assumed repeated colours arrived consecutively in the incoming array rather than in random order. All I had to do was check whether the key already existed, rather than comparing the current value with the previous one. Voilà.

Go on, try it…
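To illustrate the difference, here is a Python reconstruction of the two approaches. It is not the original PHP, and the function names are invented.

The buggy version only increments when a pixel matches the one before it, so a colour that reappears later has its count reset. The fixed version checks whether the key already exists.

```python
# Counting pixel colours: comparing with the previous pixel vs checking for the key.
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]


def count_colours_buggy(pixels: Iterable[RGB]) -> Dict[RGB, int]:
    """Flawed: assumes identical colours arrive one after another."""
    counts: Dict[RGB, int] = {}
    previous = None
    for colour in pixels:
        if colour == previous:
            counts[colour] += 1
        else:
            counts[colour] = 1  # resets the count whenever a colour reappears later
        previous = colour
    return counts


def count_colours(pixels: Iterable[RGB]) -> Dict[RGB, int]:
    """Fixed: check whether the colour has been seen before, in any order."""
    counts: Dict[RGB, int] = {}
    for colour in pixels:
        if colour in counts:
            counts[colour] += 1
        else:
            counts[colour] = 1
    return counts
```

In Python, `collections.Counter(pixels)` gives the same result as `count_colours`.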

Latest on what it will do…

Monday, January 14th, 2008

So we are pretty happy with the colour guessing now, although at this stage we are limited to a named palette of 254 colours. Try it.

The guessing is based on a sample of the 400 most frequently occurring colours. These are compared with each other to match those that are within 10% of one another, then de-duplicated at the named-colour stage. This seems to give a fairly accurate representation of the main colours in the image. The next stage is capturing that info in some sort of data model, so we are looking into the semantic web’s best friend, RDF.
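Here is one possible reading of that pipeline, sketched in Python. The details are assumptions:

- “Within 10%” is taken to mean a Euclidean RGB distance of at most 10% of the black-to-white distance.
- Each colour joins the first (more frequent) group it is close to.
- The `palette` argument stands in for the real 254 named colours.

```python
# Guess named colours: sample top colours, merge near-duplicates, map to palette names.
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

RGB = Tuple[int, int, int]
MAX_DISTANCE = math.sqrt(3) * 255  # Euclidean distance from black to white


def guess_colour_names(pixels: Iterable[RGB], palette: Dict[str, RGB],
                       sample_size: int = 400, tolerance: float = 0.10) -> List[str]:
    """Return palette names for the dominant colours, most frequent group first."""
    top = Counter(pixels).most_common(sample_size)
    # Merge each sampled colour into the first kept colour within tolerance.
    groups: List[Tuple[RGB, int]] = []
    for colour, count in top:
        for i, (rep, total) in enumerate(groups):
            if math.dist(colour, rep) <= tolerance * MAX_DISTANCE:
                groups[i] = (rep, total + count)
                break
        else:
            groups.append((colour, count))
    groups.sort(key=lambda item: item[1], reverse=True)
    # Map each group to its nearest named colour, keeping each name only once.
    names: List[str] = []
    for rep, _ in groups:
        name = min(palette, key=lambda n: math.dist(rep, palette[n]))
        if name not in names:
            names.append(name)
    return names
```

Take a palette of navy (0, 0, 128), blue (0, 0, 255) and white (255, 255, 255). Feed it 50 pixels of (0, 0, 250), 30 of (0, 0, 240) and 20 of (250, 250, 250), and the result is `["blue", "white"]`. The two blues merge into one group, whose nearest name is blue.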

Learning …

Friday, January 11th, 2008

This week has been mostly about learning, although it is worth mentioning that we have got better at analysing a particular image, and you now also get a best-guess shade and the colour that shade belongs to.

A few of the blogs we follow brought up something related that could be very useful in the future of Colourphon, namely this article about HTML imagemaps. Something like this could be used to focus on particular regions of the images that we analyse, meaning that a more accurate colour can be obtained. Development experiments have begun, but they are confined to localhost (I’m afraid).

Our main aim is to help build the semantic web with images. Our focus is on just getting the data at the moment, but with the potential for harvesting and storage (RDF) in the future. We found this, which really helped me understand some of the simpler parts of RDF (triples etc… I told you we were learning this week :D ).
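For anyone else getting to grips with triples: each statement is a subject, a predicate and an object. The sketch below writes a digital image and its guessed colour names as N-Triples lines in Python. The `example.org` URIs and the `guessedColourName` property are placeholders, not Colourphon's real vocabulary. Only `rdf:type` is a standard RDF term.

```python
# Write an image and its guessed colour names as N-Triples (placeholder vocabulary).
from typing import List, Tuple

CP = "http://example.org/colourphon#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def uri(term: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{term}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def image_triples(image_uri: str, colour_names: List[str]) -> List[Triple]:
    """One rdf:type triple for the image, then one triple per guessed colour name."""
    subject = uri(image_uri)
    triples = [(subject, uri(RDF_TYPE), uri(CP + "DigitalImage"))]
    for name in colour_names:
        triples.append((subject, uri(CP + "guessedColourName"), literal(name)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize each (subject, predicate, object) as one line ending in ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```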