Bookie Sprint – Aug 31st

It’s time for another Bookie sprint!

When – Saturday August 31st

What time – Starts at 11am

Where – my house! Ping me for address/map info if you’re coming along. Map out to Clarkston, MI.

What will we be working on?

The goal is to work on test coverage and breadability article parsing. Are you new to application testing? Come out and learn while helping out an open source project.

If you want to participate online please join our irc channel #bookie on freenode.net. If there’s something else you’d rather work on then please let me know and I’ll be happy to do whatever I can to aid in participation.

Bookie 0.4: one week retrospective

Phew, that was a whirlwind of a week. Just over one week ago I finally released Bookie 0.4 and published the blog post to reddit as an announcement. This introduced signups and I was eager to see if there was real interest in the project now that users could sign up and try things out.

By the numbers

Traffic definitely came.

  • The blog post picked up 800 visits over the two days in the weekend.
  • https://bmark.us grabbed 360 unique new visitors.
  • We went from 58 to 126 activated user accounts.
  • Those users brought us to over 26,000 bookmarks stored in the site.

Complications

Of course, any swarm of new users finds the holes in the system and Bookie was no different. There were a few issues. First, the celery task that sends out emails on signup wasn’t running because the email config wasn’t set up right. This was a pretty quick fix. Next, the import system wasn’t filling out the path for uploaded files correctly. This one was another pretty easy fix, but I managed imports manually until I got the fix deployed.

The big thing was that, for probably the first time, all three moving parts of the system were trying to store bookmarks at once: the celery backend, the web UI, and a cron script that looks for new bookmarks without readable content and fetches it for storing. All of these hit the Whoosh fulltext index and caused locking issues that broke both imports and saving new bookmarks from the web UI until I figured out the issue and just reset the fulltext index.

It was pretty bad timing as I could see users trying to add test bookmarks via the web interface. Google realtime analytics is pretty entrancing to watch. In the end I had to run to the Whoosh docs and change things up to use the async writer instead of the default locking mechanism. This got things running again, but the problem now is that I had to throw away the entire existing fulltext index. I’ve still got to finish a background job that will walk through all bookmarks and index them.
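To give a rough idea of what "async writer instead of locking" means, here is a minimal stdlib-only sketch of the pattern: try to take the index lock without blocking, and if something else is holding it, hand the write to a background thread that waits for the lock rather than failing. This is not Whoosh's code or API. `FakeIndex`, `AsyncishWriter`, and `demo` are made-up names for illustration only.

```python
import threading


class FakeIndex:
    """Hypothetical stand-in for a fulltext index guarded by one write lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.docs = []

    def write(self, doc):
        # The caller is expected to hold self.lock already.
        self.docs.append(doc)


class AsyncishWriter:
    """Write right away if the lock is free, otherwise finish in a thread."""

    def __init__(self, index):
        self.index = index
        self.pending = []
        self.thread = None

    def add_document(self, doc):
        # Buffer documents until commit() is called.
        self.pending.append(doc)

    def _flush(self):
        for doc in self.pending:
            self.index.write(doc)
        self.pending = []

    def _commit_when_free(self):
        # Runs in the background thread: block until the lock is available.
        with self.index.lock:
            self._flush()

    def commit(self):
        # Non-blocking attempt first, so the caller never stalls or errors.
        if self.index.lock.acquire(blocking=False):
            try:
                self._flush()
            finally:
                self.index.lock.release()
            return "sync"
        self.thread = threading.Thread(target=self._commit_when_free)
        self.thread.start()
        return "async"

    def join(self):
        if self.thread is not None:
            self.thread.join()


def demo():
    index = FakeIndex()
    index.lock.acquire()  # pretend the cron job is in the middle of a write
    writer = AsyncishWriter(index)
    writer.add_document({"url": "http://example.com"})
    mode = writer.commit()  # lock is busy, so the write is deferred
    index.lock.release()    # the "cron job" finishes
    writer.join()           # wait for the deferred write to land
    return mode, list(index.docs)
```

The trade-off in this kind of design is that a caller gets control back immediately, but the document may not be searchable until the background write completes.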

At some point I might need to move the fulltext indexing out of the current SqlAlchemy event hooks and into purely background celery jobs that I can control more easily from one place. This would remove the lock entirely from the cron job and the web ui.

Disappointments

While I could see the charts showing traffic, it was tough because it was pretty invisible traffic. There were only three new users into the #bookie irc channel, and only a few people left comments in the reddit thread. No one left a comment on the blog post. Both my Twitter account and the Bookie account gained fewer than 5 new followers. While the repository was starred many times, only two forks were created.

Going forward

There are a few new users active over the last week, and I’ve gotten a pair of pull requests. While the saving of new bookmarks was broken for a lot longer than I’d have liked, the site never went down. Imports were done in a semi-reasonable time frame. All of this felt pretty great and is encouraging for future work. I still need to finish fixing up the readable parsing. It’s the big selling point of Bookie, and the fact that fulltext search and readable parsed content for all bookmarks isn’t there is frustrating.

Here’s looking forward to great work and a more popular release announcement for Bookie 0.5.

Bookie 0.4 released into the wild!

Bookie is a Python based open source bookmark managing web application that includes content archiving, a Chrome extension, and much more.

Phew, that took a lot longer than expected. I’ve tagged Bookie 0.4 and the live site is updated to run it.

This brings a ton of work on getting an updated webui with some client side MVC, an API, Celery job running backend, some stats, and spin off projects such as breadability and a cli client.

The big thing is that signups are now there as well as a landing page. So hopefully this will spike up interest in new users checking out Bookie.

There are still a ton of long term ideas to work on with Bookie. I’d like to get a ‘reading’ view setup so that you can easily run through the bookmarks you’ve marked `toread`, especially in a mobile view. ❤ my N7. I also want to work on getting suggestions for related bookmarks, suggested tags based on content, and other interesting machine learning type problems.

If you're the type that takes your bookmarks seriously give it a try. If you don't want to run your own instance, sign up to https://bmark.us and try it out there.

You can get an idea of the roadmap we're working off of on the Trello board.

Bookie weekly status report: May 6th 2012

This week was spent on a big side project. I’ve been trying like mad to update the python-readability library and take it over to help use it in the Bookie project space. After spending a ton of time trying to do just this I gave up.

I now present the breadability package. It’s a fresh port from the arc90 readability.js using the knowledge I’ve gained from all the other work and trying to stick to the JS file that’s the original inspiration.

I’ve got a bunch more work to do to add tests, get it in the build server, etc.

If you’ve been using one of the other dozen ports out there give this a shot. There’s work to be done, but I’d love to get some real world use in there, so let me know what sites don’t work well, etc.

Weekly Status Report: April 29th

More hacking! Spent a big part of the week working on my Penguicon presentation so few commits.

Bookie Parser

  • Tweaked the readable view with some nice CSS, dark background, favicon support, etc. Much nicer to read articles with it now.
  • Got the tests running on the TravisCI service.
  • Updated the API to fill out and support all the bits of data I need for this to replace my readable parsing on the main Bookie project.
  • Some refactoring and cleaning up duplicate code.

Bookie

The big thing here was to start up some JS to use the Bookie Parser api in order to load the readable content of a website as you’re bookmarking it from the edit page. In this way, users of the bookmarklet will have a better experience as they can now see their article, but it’s shown in cleaned up readable form. I need to clean it up and catch some edge/error cases, but it’s a start. Once it’s solid we can then use that content to store the page content and have immediate readable results instead of waiting for the next cron job to run in the background.

Bookie Weekly Update: April 22nd 2012

Another week, another few lines of code, and yay for two weeks in a row!

Bookie

Not a ton here, just some CSS updates and updating the backup script for pulling the INI correctly.

Bookie Parser

I spent some time cleaning up the CSS. I did some research on the most readable fonts for screens and surprisingly, it seems that sans serif wins on digital displays. So I updated the CSS and combined with some work on the Bookie main CSS files to make the readable pages a bit nicer. I’ve still got some more cleanup to do, but it reads a bit nicer now.

I also fixed the html generated to not have the empty body tag. It was due to the way the readable parsing library was giving me a full html document of content. See the updates over there for some bigger updates.

Finally, I added a form on the main page so you can try it out on a url just by entering it. So if you’re just curious what it does, go try it out!

Bookie Api

Just added a ping command. It should help make sure that the configuration is correct for new users. It’s also a nice start to a non-admin specific api command. A little bit of cleanup aside from that, but nothing major.

readability_lxml

Currently, Bookie uses a library called decruft for parsing html pages for the actual important article content. The bookie_parser project is using a different fork of that called readability_lxml. The author is a bit open to merging changes in and actually says she’s in ‘maintenance mode’. Since I kind of want a really decent library for this, it’s an important feature, I started hacking on it. In the process, this is where my week of hacking went.

First I updated it to allow me to get back only a partial html document vs an entire <html> doc. I then fixed some bugs, started cleaning up the code (adding tests, making the command line client all nice and argparse’y) etc. In the process I noticed that there’s a big branch in Github that adds a ton of things like multiple page document support and such. I’ve started to try to pull his branch into my work and the original author’s code. It’s a LOT of git cherry-pick and really a pain since I want to clean up the code as I go. Unfortunately, this just means that Git gets confused on future merges since the code’s changed between commits. Ugh!

I’m about half way done though and I hope this will leave us with one solid library to do this parsing. I’m hoping to kind of take over stewardship of the library as I complete this work. It should hopefully make Bookie and bookie_parser all the more awesome.

The coming week

I’m giving a talk on the YUI JavaScript library at Penguicon. This means my hacking time will be a bit less since I’ve got a presentation to prepare for. Next week’s status report might be a bit light and boring, but hey, maybe I’ll scrounge up some more beta users of Bookie while at the conference.

Bookie Weekly Status Report Returns! – April 15 2012

Ok, I’m overdue for a ‘weekly’ status report. I’m going to try to kick this back into gear as it helps you out there track things and me feel like I’m moving forward by writing down all the little things I’ve done over the last bit.

Trello board to keep up to date: https://trello.com/board/bookie/4f18c1ac96c79ec27105f228

New Projects

In an effort to add some features to Bookie I’ve ended up starting two new repos of code meant to interact with Bookie.

  1. Bookie Parser

This is meant to start taking over the work of reading the page content and readable parsing the important content out. It was a chance to play with Tornado and Heroku. This also means that in the future I’ll be able to scale out the readable processing separately from the main Bookie website and host. It’s pretty bare bones right now and doesn’t directly talk to Bookie, but I’ll look at adding that integration soon as the API stabilizes and I get more tests going in it.

So far the Heroku bit has been pretty awesome. I have to deal with the fact that the app gets shut down and has to restart on first request, but hopefully that gets better as traffic and use picks up. You can tinker with it at http://readable.bmark.us

  2. Bookie Api

I’ve been wanting to start up a command line client for some of the Bookie work. The big thing is that I need tools to help manage invites and such. So it’s currently very admin centric, but eventually I’d like to get this into a cool ncurses command line interface to pull up recent bookmarks and even do some quick searches via the API. Aren’t APIs cool? This will also contain the reference Python API implementation, so we’ll have two implementations soon: one in JS and one in Python.

I’ve got a beta version (which is really an alpha) up on PyPi so you can

$ pip install bookie_api
$ bookie ping

Build baby build

I spent some quality time with http://build.bmark.us to get the JS tests running via grover and phantomjs and that’s awesome. I also added the new projects into the builder as well. So, while I don’t have all the tests I need, at least now the ones I do have run consistently.

Other little tweaks

  • Prettied up the new user invite email and landing page
  • Fixed a bug with dupe tags in the tagcontroller
  • Added more icons from the fontawesome set to pretty up the ui, especially the account page.
  • Lots of changes to the make/build steps for JS and CSS including actually doing the pyscss transition.
  • Everything is now on the final stable release of YUI 3.5. It’s been a good ride through the development releases.

Upcoming events

I’ll be giving a talk at Penguicon on using YUI for JS app development. If you’re in the area stop by. This is Friday April 27th, at 6pm. Then on Saturday I’ve got a Bookie mini-sprint going on. I’ll probably be hacking most of the weekend. Feel free to stop by and check things out.

A few ideas, quick ways to get on the Bookie contrib list

Quick ideas for improving Bookie

Ideas!

Well, with all the great stuff going on with Bookie, I’ve gotten a bit buried in some big changes. The background processing and importing updates are going to take a bit to get right.

This means, there’s a great chance for others to hack up the little tweaks that we need to really add some polish to Bookie. So below I’ve listed a few ideas that should be pretty simple things to add, but with a really good positive and visible effect on the site.

  • Add notification that user has invites

    Now that invites are there, we should highlight a user’s account navigation link to let them know they have invites available. I’ll periodically add them to the system, and we don’t want users to have to go to their account page each time to see they’ve got invites. I think a simple adding of one of the envelope/message type icons from our font-icon set would be perfect, with some sort of hover message to start. We might also want to highlight the block in their account page so it stands out that the invites are available.

  • Flash message system.

    We want to be able to let users know things have happened successfully after doing something that redirected them. Imports are going to be doing this, saving/updating bookmarks, etc. It’d be nice to have a consistent type of ui to drop flash messages in and have them show after a redirect.

  • Show new user message if self bookmarks page has no results

    When a new user starts up and logs in, they default to their own page of bookmarks…which is going to be empty. So we should detect this in our JS code that fetches the results and display a set of default content with links to things like importing instructions, where to get the chrome extensions, and other handy new user tips.

    Some of this might also be nice to use for the email that a new user gets when they’ve been invited to Bookie.

  • Add firefox bookmark importer

    Ok, so this one is a bit more involved, but really, it’s a single class and a couple of Python methods. The hard part is reading in and figuring out how to match bookmarks to tags in Firefox’s JSON dump of bookmarks. Once we get the Firefox extension rolling, it’ll be great to have a good import system for the browser as well.
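For anyone thinking about the Firefox importer, here is a rough sketch of the tree-walking part. It assumes the JSON dump is a nested tree where folders carry a `children` list and each bookmark carries a `uri`, a `title`, and an optional comma-separated `tags` string. Those field names are assumptions for illustration and a real export may store tags differently (for example in separate tag folders), so check an actual dump first. `iter_bookmarks` and `SAMPLE` are hypothetical and not part of Bookie.

```python
import json


def iter_bookmarks(node):
    """Walk the nested tree and yield (url, title, tags) for each bookmark."""
    if "uri" in node:
        # Assumed format: tags arrive as one comma-separated string.
        raw_tags = node.get("tags", "")
        tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
        yield node["uri"], node.get("title", ""), tags
    # Folders (and the root) hold their contents under "children".
    for child in node.get("children", []):
        for found in iter_bookmarks(child):
            yield found


# A tiny hand-written sample in the assumed shape, not a real Firefox dump.
SAMPLE = """
{"title": "", "children": [
  {"title": "Bookmarks Menu", "children": [
    {"title": "Bookie", "uri": "https://bmark.us", "tags": "bookmarks, python"},
    {"title": "Untagged", "uri": "http://example.com"}
  ]}
]}
"""

bookmarks = list(iter_bookmarks(json.loads(SAMPLE)))
```

Each tuple could then be handed to whatever the importer class uses to store a bookmark with its tags.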

Well, here are four things I’d love to see happen in the near future to help make the experience a level nicer for everyone. If you’re interested at all or have any questions, ping me in #bookie in irc or shoot me a comment below. I’d be happy to help walk anyone through these or any other ideas you might have.

Bookie PyCon 2012 Sprint Report

http://www.flickr.com/photos/66176388@N00/4592858614

Sprint like mad!

So PyCon sprints, what can you say? You go, you hack…and hack…and at some point you take a break for a beverage…and hack some more.

Last year I got the real movement behind Bookie. So this really marks the one year anniversary for Bookie. I’ve had it as a side project to hack on in my spare time for the last whole year. In that time, honestly it’s not crazy different from where it started. However, it’s gone through multiple JS rewrites, two different UI designs, and a whole lot more. I’ve really learned a lot about development, testing, and making some hard choices over the last year. I hope by this time next year I’ll have announced Bookie on some big site (reddit/hacker news?) and survived.

This sprint though wasn’t the time. So what did I get done?

tl;dr

Bookie got improvements

  • Better JS tests
  • Better PY tests
  • Start of HTML5 history
  • Invite system
  • Threaded content fetcher
  • Start of celery background runner

First, I started out by working on getting the html5 history stuff going. It’s not perfect yet, but it’s started and I really realized I needed to have a better way to do JS tests, so…

Next I redid the JS tests. I don’t want to have to fire up the application in order to run my JS tests. I also don’t want to have to hit the database and such. This means I had to change the API tests. Rather than making real requests/responses, I test that the classes build the right type of requests. I verify the url, data payload, etc are correct.
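The real tests are in JS, but the idea translates to any language. Here is a small Python sketch of the same approach: the class only describes the request it would send, and the test checks the url and payload without touching a server or database. `BookmarkSaveRequest`, the `/bmark` path, and the example values are all hypothetical and not taken from the Bookie API.

```python
import json
import unittest


class BookmarkSaveRequest:
    """Hypothetical API wrapper that only describes the request to send."""

    def __init__(self, base_url, username, bookmark):
        self.base_url = base_url.rstrip("/")
        self.username = username
        self.bookmark = bookmark

    def build(self):
        # Return what a transport layer would need; nothing is sent here.
        return {
            "method": "POST",
            "url": "{0}/{1}/bmark".format(self.base_url, self.username),
            "body": json.dumps(self.bookmark, sort_keys=True),
        }


class BookmarkSaveRequestTest(unittest.TestCase):
    def test_builds_expected_request(self):
        req = BookmarkSaveRequest(
            "http://example.com/api/",
            "admin",
            {"url": "http://bmark.us", "tags": "bookie"},
        ).build()
        # Verify the url and data payload instead of a live response.
        self.assertEqual(req["method"], "POST")
        self.assertEqual(req["url"], "http://example.com/api/admin/bmark")
        self.assertEqual(
            json.loads(req["body"]),
            {"url": "http://bmark.us", "tags": "bookie"},
        )
```

Tests written this way run fast and need no running app, at the cost of not catching a server that disagrees about the request format.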

Once the JS tests were redone, I realized that I hated how I had yXXX.js as the filenames and redid those as well. While I was cleaning up I dumped a bunch of old code we no longer needed. Basically tons of gardening cleaning out the weeds.

With that out of the way, the next day of sprints was all about getting an invite system underway. I originally wanted to do a throttled signup process, so anyone could sign up, but then I realized that really, invites will work better. The people using Bookie now will know who’d be interested in testing and if someone really wants to get in, they’ll contact me in some fashion.

With that up, I got to spend the next day fixing bugs in everything. Wheeee! What was cool was that I managed to get a few people at the sprints curious about Bookie and testing things out. Nothing exposes bugs like new users. During this process I spent some time cleaning house on the Python side of things and making tests easier to run/write.

Finally, I’ve started work on the background processing using Celery. I’ve got a big hurdle in that, but my cron’d stats processes are working and I’ve almost got imports running as out of process celery children. That should really help with new users. You know at some point they’ll come flooding in right? 🙂

Overall, while I didn’t get a ton of new user facing features going, I did a TON of clean up and maintenance. As one person expressed “Wow, there’s a lot more in here than I expected when you said it was a bookmark application”. Bookie has really grown over the last year and she needs me to spend some time giving things some love before moving forward too fast. The sprints really gave me a chance to do that, all while hanging out and chatting with really smart people. What more can I ask for?

PyCon 2012: What a ride!

Phew, tiring trip to PyCon this year. This was my second year after hitting up my first last year. The conference definitely felt larger than last year as they crossed 2,200 attendees. It’s unbelievable to see how large the Python community has gotten. I can’t stress enough what a great job the people that put this together did.

Last year I hardly knew anyone. This year, however, I got to put faces to people I’ve interacted with over the last year, welcome back those I met last year, and get some face to face time with new co-workers from Canonical. The social aspect was a larger chunk of my time this year for sure.

Side note, I listen to The Changelog podcast from time to time, and I love their question on who you’d love to pair up/hack with as a programming hero type question. I got to meet and greet mine at this PyCon by meeting up with Mike Bayer. He’s behind some great tools like SqlAlchemy and Mako. What I love is that, not only does he rock the code part, but the community part as well. I’m always amazed to see the time he puts into his responses to questions and support avenues. Highlight of my PyCon for sure.

I’ll post a separate blog post on my sprint notes. I feel that if you’re going to go, you might as well stay for sprints. I get as much out of that as the conference part itself. I think I made some good progress on things for Bookie this year. The big thing is that an invite system is in place, so if you’d like an account on Bmark.us let me know and I’ll toss an invite your way.

Notes

  • Introduction to Metaclasses
    • Basic but reminded me how the bits worked and had some good examples. I like this because I often write ‘the code I want to be writing’ and then write my modules/etc to fit and metaclasses help with this sometimes.
  • Fast Test, Slow Test
    • Just a reminder that fast tests are true unit tests and run during dev which helps make things easier/faster as you go vs the whole ‘mad code’ then wait for feedback on how wrong you are.
  • Practical Machine Learning in Python
    • mloss.org – check out for lots of notes/etc on ML in OSS
    • ml-class.org – teach me some ML please
    • sluggerml – app he built as a ML demo
    • scikit-learn : lots of potential, very active right now
  • Introduction to PDB
    • whoa…where have you been all my life ‘until’ command?
    • use ‘where’ more to move up stack vs adding more debug lines
  • Flexing SQLAlchemy’s Relational Power
  • Hand Coded Applications with SQLAlchemy
    • ❤ SqlAlchemy. Some really good examples of writing less code by automating the boilerplate with conventions.
  • Web Server Bottlenecks And Performance Tuning
    • lesson: if you think it’s apache’s fault think again. You’re probably doing it wrong.
  • Advanced Celery
    • check out cyme https://github.com/celery/cyme, possible way to more easily run/distribute celery work?
    • cool to see implementations of map/reduce using celery
    • chords and groups are good, check them out more
  • Building A Python-Based Search Engine
    • Good talk for an intro into terms and such for fulltext search
  • Lightning talks of note
    • py3 porting docs: http://docs.python.org/howto/pyporting
    • bpython rewind feature is full of win over ipython
    • ‘new virtualenv’ trying to get into stdlib for py3.3, cool!
    • asyncdynamo cool example of async boto requests for high performance working with AWS api (uses tornado)
    • I WANT the concurrent.futures library…but it’s Python 3 😦