How to Customize the Command Prompt


Posted on 20th June 2011 by Andrew Burgess in internet |Uncategorized


I’m a big fan of the terminal: whether you’re leveraging a handful of commands (or more!) to improve your development process, or just using it to quickly move around your drives and open files and folders, the command line is an awesome tool. However, if you use it often, you’ll want to customize it to your needs. I’ll show you how to do that today!

I’m often asked, “How did you get your command prompt to look like that?” Well, in this tutorial, I’ll show you exactly how to do it. It’s pretty simple, and won’t require too much of your time.

I should note that what I’m showing you is specifically for the bash shell; this is the default shell on Mac and most Linux systems. If you’d like a bash prompt on Windows, you might want to check out Cygwin.


How does it Work?

Before we get started, let’s talk for a minute about how you customize your bash prompt. It’s not quite the same as your average application: there’s no preferences panel. Your customizations are stored in a file. If you’re on Linux (or using Cygwin), that will be your .bashrc file; on Mac, that’s your .bash_profile file. In both cases, this file is kept in your home directory (if you aren’t sure where that is for a Cygwin install, run the command echo $HOME). Note that I’ll only refer to the .bashrc file from here on out, but use the .bash_profile if you’re on a Mac.

Note that on Macs, files that begin with a period are hidden by default in the Finder (on Linux, ls and most file managers hide them too; ls -a lists them). To show them in the Finder, run these two lines in the terminal:

defaults write com.apple.finder AppleShowAllFiles TRUE
killall Finder

So, what goes in that .bashrc file? Each line is actually a command that you could run on the command line. In fact, that’s how these config files work: When you open the console, all the commands you’ve written in the config file are run, setting up your environment. So, if you just want to try out some of what I’ll show below, just type it on the command line itself. The simplicity here is beautiful.


Customizing PS1

Let’s start with a definition. The prompt is what you see at the beginning of the line, each time you hit enter on the command line. Here’s what the default settings are for the Mac:

Mac Default Terminal

In this case, the prompt is andrews-macbook:~ screencast$. There are a few variables here: andrews-macbook is the name of this computer, ~ is the current directory (the home directory), and screencast is the username. Let’s customize this a bit.

Open up your .bashrc file. The PS1 variable controls what information is displayed in the prompt. Add this to the file:

PS1='->'

Notice that I don’t put spaces on either side of the equals sign; bash requires this for variable assignments. Save this file in your home directory and re-open a terminal window. Now, you should have a prompt that looks like this:

Mac Customized Terminal

I’ll note here that if you find it tedious to close and re-open your terminal each time you make a change to your .bashrc or .bash_profile, there’s a bit of a shortcut: You can load any bash customization file with the source command. Run this in your terminal:

source ~/.bashrc 

Still too long? Well, a single period (.) does the same thing as source, so . ~/.bashrc works too. Happy now? If you’re quick, you’ll realize that we can also use source to include other files within our .bashrc file, if you want to split it up to keep it under control.

Let’s customize our prompt a bit more. We can use special backslash sequences in the string we assign to PS1 to include helpful information in the prompt; here are a few useful ones:

  • \d: Date (for example, Mon Jun 20)
  • \h: Hostname, up to the first period
  • \n: Newline
  • \t: Time (24-hour HH:MM:SS)
  • \u: Username
  • \W: Name of the current working directory
  • \w: Full path to the current directory (your home directory shows as ~)

So, if you set your prompt to this:

PS1='\n\W\n[\h][\u]->'

you should see something like this:

Mac Customized Terminal

Notice a few things here: firstly, we’re using a bunch of the variables shown above to give us more information. But secondly, we’re including a few newlines in there, and getting a more interesting prompt: we have the current directory on one line, and then the actual prompt on the next line. I prefer my prompt this way, because I always have the same amount of space to write my commands, no matter how long the path to the current directory is. However, there’s a better way to do this, so let’s look at that now.


Customizing PROMPT_COMMAND

The better way to do this is to use the PROMPT_COMMAND variable. Unlike PS1, its contents aren’t just a string to display: they’re a command that bash executes before it displays the prompt. To give this a try, let’s add this to our .bashrc:

PROMPT_COMMAND='echo "comes before the prompt"'

We’re using the echo command here; if you aren’t familiar with it, you just pass it a string, and it will write it to the terminal. By itself, it’s not incredibly useful (although you can use it to view variables: echo $PS1), but it’s great when used with other commands to display their output. If you added the line above, you should see this:

Mac Customized Terminal

Let’s do something more useful here. Let’s write a bash function that we will assign to PROMPT_COMMAND. Try this:

print_before_the_prompt () {
    echo "comes before the prompt"
}

PROMPT_COMMAND=print_before_the_prompt

If you use this, you shouldn’t see a difference in your prompt from what we have above. Now, let’s make this useful.

print_before_the_prompt () {
  echo "$USER: $PWD"
}

PROMPT_COMMAND=print_before_the_prompt

PS1='->'

Here’s what you’ll get:

Mac Customized Terminal

That’s a good start, but I want to do a bit more. I’m going to use the printf command instead of echo because it makes including newlines and variables a bit easier. A quick background on the printf command: it takes several parameters, the first being a kind of template for the string that will be output. The other parameters are values that will be substituted into the template string where appropriate; we’ll see how this works.

So let’s do this:

print_before_the_prompt () {
    printf "\n%s: %s\n" "$USER" "$PWD"
}

See those %s parts in there? That means “interpret the value for this spot as a string”; for context, we could also use %d to format the value as a decimal integer. As you can see, we have two %s placeholders in the “template” string, and two other parameters. These are placed into the string where the %s placeholders are, in order. Also, notice the newlines at the beginning and end: the first just gives the terminal some breathing room. The last one makes sure that the prompt (PS1) will be printed on the next line, and not on the same line as the PROMPT_COMMAND output.

You should get a terminal like this:

Mac Customized Terminal
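As a rough analogy, Python’s % string formatting uses the same %s and %d placeholders as printf, filled in order from the values that follow. A short sketch (Python 3, not bash) with a made-up user name and path:

```python
# Same template as the printf call: two %s placeholders, filled in order.
template = "\n%s: %s\n"
user, cwd = "screencast", "/Users/screencast"  # hypothetical sample values

line = template % (user, cwd)
print(line)

# %d formats an integer in decimal, as it does for printf.
count_line = "%d files" % 42
print(count_line)   # 42 files
```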

Adding Some Color

Looking good! But let’s take it one step farther. Let’s add some color to this. We can use some special codes to change the color of the text in the terminal. It can be rather daunting to use the actual code, so I like to copy this list of variables for the color and add it at the top of my .bashrc file:

txtblk='\e[0;30m' # Black - Regular
txtred='\e[0;31m' # Red
txtgrn='\e[0;32m' # Green
txtylw='\e[0;33m' # Yellow
txtblu='\e[0;34m' # Blue
txtpur='\e[0;35m' # Purple
txtcyn='\e[0;36m' # Cyan
txtwht='\e[0;37m' # White

bldblk='\e[1;30m' # Black - Bold
bldred='\e[1;31m' # Red
bldgrn='\e[1;32m' # Green
bldylw='\e[1;33m' # Yellow
bldblu='\e[1;34m' # Blue
bldpur='\e[1;35m' # Purple
bldcyn='\e[1;36m' # Cyan
bldwht='\e[1;37m' # White

undblk='\e[4;30m' # Black - Underline
undred='\e[4;31m' # Red
undgrn='\e[4;32m' # Green
undylw='\e[4;33m' # Yellow
undblu='\e[4;34m' # Blue
undpur='\e[4;35m' # Purple
undcyn='\e[4;36m' # Cyan
undwht='\e[4;37m' # White

bakblk='\e[40m'   # Black - Background
bakred='\e[41m'   # Red
bakgrn='\e[42m'   # Green
bakylw='\e[43m'   # Yellow
bakblu='\e[44m'   # Blue
bakpur='\e[45m'   # Purple
bakcyn='\e[46m'   # Cyan
bakwht='\e[47m'   # White

txtrst='\e[0m'    # Text Reset

There’s some method to this madness: the first set turns on normal coloring, the second set turns on bold coloring, the third set turns on underlined coloring, and the fourth set turns on background coloring. The last one resets the coloring to normal. So, let’s use these!

print_before_the_prompt () {
    printf "\n $txtred%s: $bldgrn%s \n$txtrst" "$USER" "$PWD"
}

Here, I’ve added $txtred before the first %s, and $bldgrn before the second %s; then, at the end, I’ve reset the text color. You have to do this because once you set a color, it will hold until you either use a new color or reset the coloring. You’ll also notice that when setting a variable, we don’t prefix it with a dollar sign; but we do use the dollar sign when using the variable: that’s the way bash variables work. This gives us the following:

Mac Customized Terminal
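Each of those color variables is an ANSI escape sequence: the escape character (written \e), a [, one or more numbers separated by semicolons, and a final m. 0 resets attributes (so 0;31 is plain red), 1 is bold, 4 is underline, 30 to 37 pick a foreground color and 40 to 47 a background color. As a language-neutral sketch, here is the same scheme in Python 3; the helper names sgr, fg and bg are hypothetical:

```python
ESC = "\x1b"  # the character bash's printf writes for \e

# Color numbers shared by foreground (30 + n) and background (40 + n) codes.
COLORS = {"black": 0, "red": 1, "green": 2, "yellow": 3,
          "blue": 4, "purple": 5, "cyan": 6, "white": 7}

def sgr(*params):
    """Build an ANSI 'Select Graphic Rendition' sequence such as ESC[1;32m."""
    return ESC + "[" + ";".join(str(p) for p in params) + "m"

def fg(color, style=0):
    """Foreground color; style is 0 (normal), 1 (bold) or 4 (underline)."""
    return sgr(style, 30 + COLORS[color])

def bg(color):
    """Background color."""
    return sgr(40 + COLORS[color])

RESET = sgr(0)

txtred = fg("red")        # same characters as '\e[0;31m'
bldgrn = fg("green", 1)   # same characters as '\e[1;32m'
bakblu = bg("blue")       # same characters as '\e[44m'

print(txtred + "screencast" + RESET + ": " + bldgrn + "/Users/screencast" + RESET)
```

If you put these sequences directly into PS1 instead of printing them from PROMPT_COMMAND, wrap each one in \[ and \] so bash knows they are non-printing and can work out the prompt’s width correctly.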

Let’s move on to the final step: adding some scripting to give us even more information.


Adding Version Control Information

If you’ve seen the screencasts that come with my book Getting Good with Git (yes, a shameless plug), you might remember that I have some version control information showing in my prompt. I got this idea from the excellent PeepCode “Advanced Command Line” screencast, which shares this, as well as many other great tips.

To do this, we’re going to need to download and build the script that finds this information. Head on over to the repository for vcprompt, a script that outputs the version control information. If you’re familiar with the Mercurial version control system, you can use that to get the repo, but you’ll most likely want to hit that ‘zip’ link to download the script code as a zip file. Once you unzip it, you’ll have to build the script. To do this, just cd into the unzipped script folder and run the command make. Once this command runs, you should see a file named ‘vcprompt’ in the folder. This is the executable script.

So, how do we use this in our prompt? Well, this brings up an important rabbit-trail: how do we “install” a script (like this one) so that we can use it in the terminal? All the commands that you can run on the terminal are found in a defined list of folders, stored in the PATH variable as a colon-separated string. You can see the folders currently in your PATH by running echo $PATH. It might look something like this:

/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

What we need to do is put the executable script vcprompt in a folder that’s in our path. What I like to do (and yes, I learned this trick from that PeepCode screencast, too) is create a folder called ‘bin’ (short for ‘binary’) in my home directory and add that folder to my PATH. Add this to your .bashrc:

export PATH=~/bin:$PATH

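This sets PATH to ~/bin, plus whatever was already in the PATH variable; since ~/bin comes first, it is searched before the other folders. If we now put that vcprompt script into ~/bin (create the folder with mkdir ~/bin if it doesn’t exist yet), we will be able to execute it in any folder on the terminal.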
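The lookup the shell performs can be sketched in Python 3: split PATH on colons and take the first folder, in order, that contains an executable file with the requested name. This is a simplified model (it ignores aliases, functions, builtins and bash’s lookup cache), and find_on_path is a hypothetical helper; the standard library’s shutil.which does this job for real programs:

```python
import os

def find_on_path(command, path=None):
    """Return the first executable named `command` found on PATH, or None.

    Directories are checked in order, so an earlier entry such as ~/bin
    wins over later ones. An empty entry means the current directory.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):  # os.pathsep is ':' on Mac/Linux
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

print(find_on_path("vcprompt"))  # None until vcprompt is in a PATH folder
```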

So, now let’s add this to our prompt:

print_before_the_prompt () {
    printf "\n $txtred%s: $bldgrn%s $txtpur%s\n$txtrst" "$USER" "$PWD" "$(vcprompt)"
}

I’ve added $txtpur%s to the “template” string, and added a third value parameter, "$(vcprompt)". Wrapping a command in $( ) is called command substitution: bash runs the command and puts its output in its place. Now, you’ll get this:

Mac Customized Terminal

Notice that if the folder doesn’t use some kind of version control, nothing shows. But, if we’re in a repository, we get the version control system that’s being used (Git, in my case) and the branch name. You can customize this output a bit, if you’d like: check the Readme file that you downloaded with the source code for the vcprompt script.
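Command substitution also strips trailing newlines from the command’s output, which is why vcprompt’s output sits neatly inside the printf template. A rough Python 3 analogue (subprocess.run needs Python 3.5 or later), using a tiny Python child process as a hypothetical stand-in for vcprompt:

```python
import subprocess
import sys

def command_output(args):
    """Rough analogue of bash's $(command): run it, return its stdout
    with trailing newlines stripped, as command substitution does."""
    result = subprocess.run(args, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.rstrip("\n")

# Hypothetical stand-in for $(vcprompt): a child process printing one line.
branch_info = command_output([sys.executable, "-c", "print('git:master')"])
print("screencast: /Users/screencast %s" % branch_info)
```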


Moving On!

Here’s our complete .bashrc or .bash_profile file:

export PATH=~/bin:$PATH

txtblk='\e[0;30m' # Black - Regular
txtred='\e[0;31m' # Red
txtgrn='\e[0;32m' # Green
txtylw='\e[0;33m' # Yellow
txtblu='\e[0;34m' # Blue
txtpur='\e[0;35m' # Purple
txtcyn='\e[0;36m' # Cyan
txtwht='\e[0;37m' # White
bldblk='\e[1;30m' # Black - Bold
bldred='\e[1;31m' # Red
bldgrn='\e[1;32m' # Green
bldylw='\e[1;33m' # Yellow
bldblu='\e[1;34m' # Blue
bldpur='\e[1;35m' # Purple
bldcyn='\e[1;36m' # Cyan
bldwht='\e[1;37m' # White
undblk='\e[4;30m' # Black - Underline
undred='\e[4;31m' # Red
undgrn='\e[4;32m' # Green
undylw='\e[4;33m' # Yellow
undblu='\e[4;34m' # Blue
undpur='\e[4;35m' # Purple
undcyn='\e[4;36m' # Cyan
undwht='\e[4;37m' # White
bakblk='\e[40m'   # Black - Background
bakred='\e[41m'   # Red
bakgrn='\e[42m'   # Green
bakylw='\e[43m'   # Yellow
bakblu='\e[44m'   # Blue
bakpur='\e[45m'   # Purple
bakcyn='\e[46m'   # Cyan
bakwht='\e[47m'   # White
txtrst='\e[0m'    # Text Reset

print_before_the_prompt () {
    printf "\n $txtred%s: $bldgrn%s $txtpur%s\n$txtrst" "$USER" "$PWD" "$(vcprompt)"
}

PROMPT_COMMAND=print_before_the_prompt
PS1='->'

Well, that’s a crash-course on customizing your bash prompt. If you have any questions, be sure to drop them in the comments!

Python from Scratch: Variables, Data Types and Control Structure


Posted on 16th June 2011 by Giles Lavelle in internet |Uncategorized


Welcome back to Python from Scratch, where we’re learning Python…from scratch! In the last lesson, we installed Python and got set up. Today, we’re going to cover quite a bit, as we learn the essentials. We’ll review variables, operators, and then finish up by learning about control structures to manage the flow of your data.



Variables

Variables are the first thing you should learn in any new language. You can think of them as named containers for any kind of data. The syntax to declare them is name = value. You can name them anything you like (names must start with a letter or underscore, and a handful of keywords are reserved), and their values can be any type of data.
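A minimal sketch of the name = value syntax (written for Python 3; the print calls with one argument also work in 2.7). The variable names are illustrative:

```python
import keyword

# name = value: no declaration keyword or type is needed.
title = "Python from Scratch"
lesson = 2

# A name can later be rebound to a value of a different type.
lesson = "two"

# The reserved keywords can't be used as names; the keyword module lists them.
print(keyword.iskeyword("for"))     # True
print(keyword.iskeyword("title"))   # False
```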


Data Types

There are many data types, but the following four are the most important:

Numbers

Numbers can be either integers or floating point numbers.

  • Integers are whole numbers
  • Floats have a decimal point
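A quick sketch of the two number types (Python 3 output shown; 2.7 prints &lt;type 'int'&gt; instead of &lt;class 'int'&gt;):

```python
count = 42      # integer: a whole number
price = 19.99   # float: has a decimal point
ratio = 3.0     # still a float, even though the value is whole

print(type(count))   # <class 'int'>
print(type(price))   # <class 'float'>

# Mixing the two in arithmetic produces a float.
total = count + price
print(type(total))   # <class 'float'>
```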

Strings

Strings are sequences of text that can contain any characters. They can be declared with single or double quotes.

empty = ""
escaped = 'Can\'t'
greeting = "Hello World"
multiLine = "This is a long \n\
string of text"

Inside a string, you have to escape the quote character that delimits it with a backslash. Otherwise, Python will assume that you’re using it to end the string. (A single quote inside a double-quoted string, or vice versa, needs no escaping.) Insert line breaks with \n. Python also supports string interpolation using the percent symbol as follows:

name = "John Doe"
greeting = "My name is %s" % name

You can access individual characters in strings with indexes, and ranges of characters with slices; both use the square bracket notation:

"Hello"[2] #outputs "l"
"Hello"[1:4] #outputs "ell"
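A short sketch pulling together indexing, slicing, escaping and %-interpolation with more than one value (Python 3; the values are illustrative):

```python
greeting = "Hello"

print(greeting[0])     # H   -- indexes start at 0
print(greeting[-1])    # o   -- negative indexes count from the end
print(greeting[1:4])   # ell -- from index 1 up to, but not including, 4

# Only the quote character that delimits the string needs escaping.
a = 'Can\'t'
b = "Can't"
print(a == b)          # True

# With more than one value, % takes a tuple.
name, age = "John Doe", 30
intro = "My name is %s and I am %d" % (name, age)
print(intro)           # My name is John Doe and I am 30
```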

Booleans

Booleans represent either a True or False value. It’s important to note that you have to make the first letter capital. They represent data that can only be one thing or the other. For example:

isMale = True #Could be used in software with a database of users
isAlive = False #Could be used in a game, set when the character dies
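Booleans usually come from comparisons rather than being typed in directly. A minimal sketch (Python 3 output; variable names follow the camelCase used above and are illustrative):

```python
lives = 0
isAlive = lives > 0          # a comparison produces a Boolean: False here
hasSavedGame = True

# and / or / not combine Boolean values.
canContinue = hasSavedGame and not isAlive   # True

# bool() converts other values: empty strings, empty lists and 0 count as False.
print(bool(""), bool("text"), bool([]), bool(0))   # False True False False
```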

Lists

Lists are used to group other data. They are similar to arrays in many other languages. You can create a list with square brackets.

emptyList = []
numbersList = [1, 2, 3]
stringsList = ["spam", "eggs"]
mixedList = ["Hello", [1, 2, 3], False]

As you can see above, lists may contain any datatypes, including other lists or nothing at all.

You can access parts of lists just like strings with list indexes. The syntax is the same:

numbersList[1] #outputs 2
stringsList[0] #outputs "spam"
mixedList[1][2] #outputs 3

If you nest a list within another list, you can access them with multiple indexes.
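A minimal sketch of a few more list operations (Python 3; same lists as above). Unlike strings, lists can be changed in place:

```python
numbersList = [1, 2, 3]
mixedList = ["Hello", [1, 2, 3], False]

print(numbersList[-1])     # 3 -- the last element
print(mixedList[1][2])     # 3 -- index into the inner list

numbersList.append(4)      # now [1, 2, 3, 4]
numbersList[0] = 10        # now [10, 2, 3, 4]
print(len(numbersList))    # 4
```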


Comments

Comments are used to describe your code, in case you want to come back to it later or work on a project with someone else.

#This is a comment on its own line
#You create them with the hash symbol
var = "Hello" #They can be on the same line as code

Operators

You’ve seen operators before. They’re those things like plus and minus, and you use them in the same way that you learned in school.

2 + 3 #Addition, returns 5
8 - 5 #Subtraction, returns 3
2 * 6 #Multiplication, returns 12
12 / 3 #Division, returns 4 (4.0 in Python 3)
7 % 3 #Modulo, returns the remainder from a division, 1 in this case.
3**2 #Raise to the power, returns 9

You can also assign the result of an operation on a variable back to the same variable by combining the operator with an equals sign. For example, a += b is a more concise version of a = a + b.

x = 2
x += 4 #Adds 4 to x, it now equals 6
x /= 2 #Divides x by 2, it now equals 3 (3.0 in Python 3)
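Division is where Python 2.7 (the version this series uses) and Python 3 differ: in 2.7, / between two integers discards the remainder, while Python 3 always returns a float (2.7 can opt in to the Python 3 behavior with from __future__ import division). A small sketch whose results are the same in both versions:

```python
print(7 // 2)    # 3   -- // is floor division in both versions
print(7 % 2)     # 1
print(7 / 2.0)   # 3.5 -- a float operand gives true division in both

# Every arithmetic operator has an augmented form.
x = 10
x -= 3           # x is now 7
x *= 2           # x is now 14
x **= 2          # x is now 196
print(x)
```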

Control Structures

Once you’ve created and manipulated variables, control structures allow you to control the flow of data. The two types we’re learning today are conditionals and loops.

Conditionals

Conditionals allow you to run different blocks of code based on the value of data.

a = 2
b = 3

if a 
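The conditional example above is cut off after if a. As a minimal sketch with the same a and b, assuming a less-than comparison (the compare helper and its messages are hypothetical, not the original code), a complete if/elif/else looks like this. Note the colon after each condition and the indented block beneath it:

```python
def compare(a, b):
    # Hypothetical helper: only the first matching branch runs.
    if a < b:
        return "a is less than b"
    elif a == b:
        return "a is equal to b"
    else:
        return "a is greater than b"

a = 2
b = 3
print(compare(a, b))   # a is less than b
```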
            

Loops

The two types of loops we’re discussing here are for loops and while loops. for loops step through the items of a list (or any other sequence), and while loops repeat as long as a condition is true.

while loops

a, b = 0, 5

while a 
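The while example above is also cut off. A minimal sketch using the same starting values, assuming the loop counts a up to b (the seen list is a hypothetical addition that records each pass):

```python
a, b = 0, 5
seen = []

# Keep looping as long as the condition is True.
while a < b:
    seen.append(a)
    a += 1          # without this, a < b would stay True forever

print(seen)         # [0, 1, 2, 3, 4]
```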
            

for Loops

myList = [1, 2, 3, 4, 5]

for a in myList:
        print a
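The print a statement above is Python 2.7 syntax; in Python 3, print is a function, so it becomes print(a). A short sketch in Python 3 that does something with each item, plus range() for looping over numbers (it returns a list in 2.7 and a lazy range object in Python 3):

```python
myList = [1, 2, 3, 4, 5]

total = 0
for a in myList:       # a takes each value of the list in turn
    total += a
print(total)           # 15

squares = []
for n in range(1, 4):  # 1, 2, 3
    squares.append(n ** 2)
print(squares)         # [1, 4, 9]
```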

Conclusion

That’s it for today, but we’ve covered a bunch of techniques. Feel free to run through everything a few times until it makes sense. I’ll try and answer any more questions in the comments, and I hope you’ll join me for the rest of the series!

Even More Memorable Maintenance & 404 Error Pages


Posted on 23rd May 2011 by Lindsey in internet |Uncategorized


Way back in 2008, I wrote an article with tips on creating memorable maintenance/error pages and examples of some awesome pages from other websites. Maintenance and error pages are just as important today. Even more important than a …

How to Create a Web Service in a Matter of Minutes


Posted on 9th May 2011 by Christian Heilmann in internet |Uncategorized


Twice a month, we revisit some of our readers’ favorite posts from throughout the history of Nettuts+. This tutorial was first published in July, 2010.

Offering your content or logic as a service on the web is a great idea. For starters, it allows you to build numerous front-ends for your own information without having to access the databases all the time, which makes scaling your system much easier.

The even more practical upshot is that you allow people on the web to play with your information and build things you never even dreamed of doing. A lot of companies understand that this “crowd-sourced innovation” is a freebie that is too good to miss, which is why there are so many great APIs around.

Providing an API to the world is a totally different story, though. You need to know how to scale your servers, you need to be there to answer implementers’ questions, and you need to maintain good documentation to allow people to use your content. You also need to think about a good caching strategy to keep your servers from blowing up, and you need to find a way to limit access to your system to avoid people abusing it. Or do you?


Enter YQL

Yahoo offers a system for people to access their APIs called the Yahoo Query Language, or YQL. YQL is a SQL-style language that turns information on the web into virtual databases that can be queried by end users. So if you want to, for example, search the web for the term “elephant,” all you need to do is to use the following statement:

select * from search.web where query="elephant"

You send this statement to a data endpoint, and you get the results back as XML, JSON, or JSON-P. You can request more results, and you can filter them by defining what you want to get back. The endpoint and its parameters look like this:

http://query.yahooapis.com/v1/public/yql
?q={yql query}
&diagnostics={true|false}
&format={json|xml}
&callback={function name}
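The query has to be URL-encoded before it goes into the q parameter. A minimal Python 3 sketch that builds (but does not send) such a request URL with urllib.parse; yql_url is a hypothetical helper, and callback is only added when you want JSON-P:

```python
from urllib.parse import urlencode, urlparse, parse_qs

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

def yql_url(query, fmt="json", diagnostics=False, callback=None):
    """Build a request URL with the q, format, diagnostics and callback parameters."""
    params = {"q": query, "format": fmt, "diagnostics": str(diagnostics).lower()}
    if callback:
        params["callback"] = callback   # only needed for JSON-P
    return YQL_ENDPOINT + "?" + urlencode(params)

url = yql_url('select * from search.web where query="elephant"')
print(url)
```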

Mix and Match

All of Yahoo’s APIs are available through this interface, and you can mix and match services with sub-selections. For example, you could run a keyword analysis tool over the abstract of a web search to find relevant key terms. Using the unique() function, you can also easily remove duplicate results.

select * from search.termextract where context in (
  select abstract from search.web(50) where query="elephant")
| unique(field="Result")

See the results of this more complex query here.

Keywords extracted from the abstract of search results

The Console

The easiest way to play with YQL as a consumer is to use the console at http://developer.yahoo.com/yql/console/. There you can click on different tables to see a demo query showing how to use each one, and if you click the desc link, you’ll find out which options are available to you.


YQL Limits

The use of YQL has a few limits, which are described in the documentation. In essence, you can access the open data endpoint 1,000 times in an hour, per IP. If you authenticate an application with OAuth, you get 10,000 hits an hour. Each application is allowed 100,000 hits a day.

These limits, together with the caching of results that YQL does automatically, mean that the data only gets requested from its source when it has changed. In effect, YQL acts as a kind of firewall for requests to the data people offer with it.

Be careful when using jQuery’s “$.getJSON” with an anonymous function as its callback: jQuery then generates a unique callback name, and therefore a unique URL, for every request. This can bust YQL’s caching abilities and hinder performance.


Building Web Services with Open Tables

The really cool thing for you as a provider is that YQL is open for other data providers.

If you want to offer an API to the world (or just have one for yourself internally), you can easily do that by writing an “open table”, which is an XML schema pointing to a web service.

People do this a lot, which means that, if you click the “Show community tables” link in the YQL console, you will find that there are now 812 instead of 118 tables to play with (as of today – tomorrow there will probably be more).

To get your service into YQL and offer it to the world all you need to do is to point YQL to it. Let’s look at a simple example:


Real-World Application: Craigslist as an API

The free classified ad web site Craigslist has no public API – which is a shame, really. However, when you do a search on the site you will find that the search results have an RSS output – which is at least pointing towards API functionality. For example, when I search for “schwinn mountain bike” in San Francisco, the URL of the search would be:

http://sfbay.craigslist.co.uk/search/sss?format=rss&query=schwinn+mountain+bike

This can be changed into a URL with variables, with the variables being the location, the type of product you are looking for (which is the section of the site) and the query you searched for (in this case I wrapped the parameters in curly braces):

http://{location}.craigslist.co.uk/search/{type}?format=rss&query={query}

Once you’ve found a pattern like this, you can start writing your open table:


[Open table XML: author Yahoo! Inc.; documentation URL http://craigslist.org/; sample query select * from {table} where location="sfbay" and type="sss" and query="schwinn mountain bike"; description Searches Craigslist.org]

For a full description of what all that means, you can check the YQL documentation on open tables but here is a quick walkthrough:

  1. You start with the XML prologue and a table element pointing to the schema for YQL open tables. This allows YQL to validate your table.
  2. You add a meta element with information about your table: the author, the URL of your documentation and a sample query. The sample query is the most important here, as this is what will show up in the query box of the YQL console when people click on your table name. It is the first step to using your API, so make it worthwhile. Show the parameters you offer and how to use them. The {table} part will be replaced with the name of the table.
  3. The bindings element shows what the table is connected to and what keys are expected in a query.
  4. You define the path and the type of the output in the select element. The type can be XML or JSON, and the path lets you return only a certain section of the data returned from the URL you access.
  5. In the urls section, you define the URL endpoints of your service. In our case, this is the parameterised URL from earlier. YQL replaces the elements in curly braces with the information provided by the YQL user.
  6. In the inputs section, you define all the possible keys the end users can or should provide. Each key has an id and a paramType, which is either path, if the parameter is a part of the URL path, or query, if it is to be added to the URL as a parameter. You define which keys are mandatory by setting the required attribute to true.
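To make the six steps concrete, here is a Python 3 sketch that generates an open table with this structure using xml.etree.ElementTree. The meta values are the ones from the Craigslist table above; the element and attribute names (documentationURL, sampleQuery, itemPath, produces, paramType, required) are assumptions based on the YQL open table schema, and the itemPath value is purely hypothetical, so check both against the YQL documentation:

```python
import xml.etree.ElementTree as ET

def build_open_table():
    """Sketch of the Craigslist open table, following steps 1 to 6 above."""
    # Step 1: table element pointing at the open table schema.
    table = ET.Element("table", xmlns="http://query.yahooapis.com/v1/schema/table.xsd")

    # Step 2: meta information, including the all-important sample query.
    meta = ET.SubElement(table, "meta")
    ET.SubElement(meta, "author").text = "Yahoo! Inc."
    ET.SubElement(meta, "documentationURL").text = "http://craigslist.org/"
    ET.SubElement(meta, "sampleQuery").text = (
        'select * from {table} where location="sfbay" '
        'and type="sss" and query="schwinn mountain bike"')
    ET.SubElement(meta, "description").text = "Searches Craigslist.org"

    # Steps 3 and 4: bindings and select; itemPath is a hypothetical value.
    bindings = ET.SubElement(table, "bindings")
    select = ET.SubElement(bindings, "select", itemPath="rdf.item", produces="XML")

    # Step 5: the parameterised URL; path keys fill the curly braces.
    urls = ET.SubElement(select, "urls")
    ET.SubElement(urls, "url").text = "http://{location}.craigslist.co.uk/search/{type}?format=rss"

    # Step 6: the keys users provide; query is appended as a URL parameter.
    inputs = ET.SubElement(select, "inputs")
    for key_id, param_type in [("location", "path"), ("type", "path"), ("query", "query")]:
        ET.SubElement(inputs, "key", id=key_id, type="xs:string",
                      paramType=param_type, required="true")
    return table

xml_text = ET.tostring(build_open_table(), encoding="unicode")
print('<?xml version="1.0" encoding="UTF-8"?>\n' + xml_text)
```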

And that is it! By putting together this XML document, you did the first of three steps to get your web services to be part of the YQL infrastructure. The next step is to tell YQL where your web service definition is. Simply upload the file to a server, for example http://isithackday.com/craigslist.search.xml. You then point YQL to the service by applying the use command:

use "http://isithackday.com/craigslist.search.xml" as cl;
select * from cl where location="sfbay" and type="sss" and query="playstation"

You can try this out and you’ll see that you now find playstations for sale in the San Francisco Bay Area. Neat, isn’t it?
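Step 5 of the walkthrough says YQL replaces the curly-brace elements of the URL with the values from the query. A local Python 3 sketch of that substitution, using the template from earlier; craigslist_url is a hypothetical helper, and URL-encoding the search terms with quote_plus is an assumption about how they should be sent:

```python
from urllib.parse import quote_plus

# The parameterised URL from earlier, with {location}, {type} and {query} slots.
URL_TEMPLATE = "http://{location}.craigslist.co.uk/search/{type}?format=rss&query={query}"

def craigslist_url(location, section, query):
    """Fill the template the way YQL fills an open table's URL (a local sketch).

    quote_plus URL-encodes the search terms, turning spaces into '+'.
    """
    return URL_TEMPLATE.format(location=location, type=section, query=quote_plus(query))

url = craigslist_url("sfbay", "sss", "schwinn mountain bike")
print(url)
# http://sfbay.craigslist.co.uk/search/sss?format=rss&query=schwinn+mountain+bike
```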


Logic as a Service

Sometimes you have no web service at all, and all you want to do is offer a certain logic to the world. I found myself doing this very thing the other day. What I wanted to know is the distance between two places on Earth. For this, I needed to find the latitude and longitude of the places and then do very clever calculations. As I am a lazy person, I built on work that other people have done for me. In order to find the latitude and longitude of a certain place on Earth you can use the Yahoo Geo APIs. In YQL, you can do this with:

select * from geo.places(1) where text="paris"

Try this out yourself.

In order to find a function that calculates the distance between two places on Earth reliably, I spent a few minutes on Google and found Chris Veness’ implementation of the “Vincenty Inverse Solution of Geodesics on the Ellipsoid”.

YQL offers an executable block inside open tables which contains server-side JavaScript. Instead of simply returning the data from the service, you can use this to convert information before returning it. You can also do REST calls to other services and to YQL itself in these JavaScript blocks. And this is what I did:


[Open table XML: sample query select * from {table} where place1="london" and place2="paris"; author Christian Heilmann; documentation URL http://isithackday.com/hacks/geo/distance/; description Gives you the distance of two places on earth in miles or kilometers]

  1. The meta element is the same as any other open table.
  2. In the bindings, we don’t have a URL to point to, so we can omit that one. However, we now add an execute element, which ensures that the keys defined will be sent to the JavaScript defined in this block.
  3. As the Geo API of Yahoo returns namespaced XML, we need to tell the JavaScript which namespace that is.
  4. I execute two YQL queries from the script using the y.query() method with the place1 and place2 parameters to get the locations of the two places. The .results after the method call makes sure I get the results. I store them in res and res2 respectively.
  5. I then get the latitude and longitude for each of the results and call the distVincenty() method.
  6. distVincenty() returns the distance in meters, so I divide the result by 1000 to get kilometers and multiply the kilometers by the conversion factor (about 0.621371) to get miles.
  7. I end the script part by defining a response.object, which is what YQL will return. As this is server-side JavaScript with full E4X support, all I need to write is the XML I want to return, with the JavaScript variables I want to render out in curly braces.
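The calculation behind steps 5 and 6 can be sketched in Python 3. Note that this swaps in the simpler haversine formula, which treats the Earth as a sphere, instead of the Vincenty ellipsoid solution the open table uses, so results differ slightly; the coordinates are approximate values for London and Paris, as a geo.places lookup might return them:

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius for the spherical approximation
KM_TO_MILES = 0.621371

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Haversine on a sphere, NOT the Vincenty ellipsoid method.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

london = (51.5074, -0.1278)   # approximate coordinates
paris = (48.8566, 2.3522)

km = haversine_km(*london, *paris)
miles = km * KM_TO_MILES
print("%.0f km / %.0f miles" % (km, miles))
```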

Using this service and adding a bit of interface to it, I can now easily show the distance between Batman and Robin.

Showing the distance between two places on earth

Using server-side JavaScript you can not only convert data but also easily offer a service that only consists of calculations – much like Google Calculator does.


Turning an Editable Data Set into a Web Service

In most cases, though, what you really want to do is to allow people to edit the data that drives the web service in an easy fashion. Normally, we’d build a CMS, train people on it, and spend a lot of time getting the data from the CMS onto the web to access it through YQL. It can be done more easily, though.

A few months ago, I released a web site called winterolympicsmedals.com which shows you all the information about the Winter Olympics over the years.

The data that drives the web site was released for free by The Guardian in the UK on their Data Blog as an Excel spreadsheet. In order to turn this into an editable data set, all I had to do was save a copy to my own Google Docs repository. You can reach that data here. Google Docs allows sharing of Spreadsheets on the web. By using “CSV” as the output format, I get a URL to access in YQL:

Sharing a spreadsheet in Google docs

YQL can then use CSV as a data source:

select * from csv where
url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"

See the result of that in your own browser.

As you can see, the CSV table automatically adds rows and columns to the XML output. In order to make that a more useful and filterable web service, you can provide a columns list to rename the resulting XML elements:

select * from csv where
url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
and columns="year,city,sport,discipline,country,event,gender,type"

See the renamed columns in your browser.

This allows you to filter the information, which is exactly what I did to build winterolympicsmedals.com. For example, to get all the gold medals from 1924, you’d do the following:

select * from csv where
url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
and columns="year,city,sport,discipline,country,event,gender,type"
and year="1924" and type="Gold"

See the gold medals of 1924 in your browser.

So you can use the free storage of Google and the free web service infrastructure to convert free data into a web service. All you need to do is create a nice interface for it.
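To see what the columns list and the where clause do, here is a local Python 3 sketch of the same rename-then-filter step using the csv module. The sample rows are hypothetical placeholders, not the real medal data, and filter_rows is a hypothetical helper:

```python
import csv
import io

COLUMNS = ["year", "city", "sport", "discipline", "country", "event", "gender", "type"]

# Hypothetical placeholder rows standing in for the spreadsheet's CSV export.
SAMPLE_CSV = """1924,CityA,SportA,DisciplineA,CountryA,EventA,M,Gold
1924,CityA,SportA,DisciplineA,CountryB,EventA,M,Silver
1928,CityB,SportB,DisciplineB,CountryC,EventB,W,Gold
"""

def filter_rows(csv_text, **criteria):
    """Name the columns (like columns="..."), then keep rows matching every
    criterion (like and year="1924" and type="Gold")."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    return [row for row in reader
            if all(row[field] == value for field, value in criteria.items())]

golds_1924 = filter_rows(SAMPLE_CSV, year="1924", type="Gold")
for row in golds_1924:
    print(row["country"], row["event"])
```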


Adding your Service to YQL’s Community Tables

Once you’ve defined your open table, you can use it by hosting it on your own server, or you can go all in by adding it to the YQL table repository. To do this, all it needs is for you to add it to the YQL tables repository at GitHub, which can be found at http://github.com/yql/yql-tables/. Extensive help on how to use Git and GitHub can be found in their help section.

If you send a request to the YQL team to pull from your repository, they’ll test your table, and if all is fine with it, they’ll move it over to http://datatables.org/, which is the resource for the community tables in the YQL console.

This not only makes the lives of other developers more interesting, but is also very good promotion for you. Instead of hoping to find developers to play with your data, you bring the data to where developers already look for it.


Advanced YQL Topics

This introduction can only scrape the surface of what you can do with YQL. If you check the documentation, you’ll find that, in addition to these “read” open tables, you can also set up some services that can be written to, and YQL also offers cloud storage of your information. Check the extensive YQL documentation for more.


Summary

By combining open systems like YQL and Google Docs with some knowledge of XML and JavaScript, you can offer a web service to people in a matter of minutes. More generally, moving your development from accessing local files and databases to accessing services makes it much more versatile and allows you to switch providers any time in the future. With YQL, you can dip your toes into the water of web services without drowning, as most of the tough work has already been done for you. Thanks for reading!


About the Author

Christian Heilmann is an international Developer Evangelist who works for Mozilla.

Python from Scratch: Getting Started

Posted on 5th May 2011 by Giles Lavelle in internet |Uncategorized

Welcome to Python from Scratch, where I’m going to teach you the ins and outs of Python development… from scratch.

In this first lesson, we’re going to choose a version, install Python, and then create the obligatory “Hello world” script. If you’re already familiar with Python, feel free to skip ahead to a later lesson in the series.


Video Tutorial

Subscribe to our YouTube and Blip.tv channels to watch more screencasts.

Companion Article


Choosing a Version

There are two versions of Python that are currently being developed: 2.x and 3.x. It’s important to make the right choice, because there are several differences between the two. Once you’ve learned one, it can be a bit annoying to transition to the other. In this series, we’ll be working with version 2.7.1, so you may want to go this route in order to follow along with the videos and articles. That said, most things should work with either version. Version two has much more support from third-party libraries, whereas version three has more features, and plenty of bug fixes and refinements.

To make things easier, a lot of the features being added to version three have also been added to version two, so there’s less need to worry about the differences.


Installing the Interpreter

Once you’ve chosen a version, it’s time to install. Download the version of Python for your OS, and run the installer which will get it set up on your machine. There are three ways you can now use Python:

  • Python Shell – lets you run commands line by line.
  • IDLE GUI – lets you write more complex scripts, and run them in one go.
  • Text Editor – any text editor that runs on your system. Save your script with a .py extension, and run it from the shell.

For now, launch the shell to test if it works correctly. On Windows, navigate to the directory where you installed Python. By default, it should be C:\Python27. Once there, launch python.exe. If you’re on Mac or Linux, launch the Terminal, then type python.

I personally find IDLE unpleasant to use, so for most of this series, we’re going to be using a standard code editor. If you would rather use an IDE, there are plenty of good ones available.


Hello World!

Now that we’re all set up, let’s write your first bit of Python! Go to the shell, type print "Hello World!", and hit enter. The phrase Hello World! should appear.

And that’s it: it really is as simple as that. You’ve written your first Python program! Now is a good opportunity to highlight one of the differences between versions two and three: they changed print from a statement to a function. Don’t worry too much about what those words mean for now. All you need to know is that, if you chose to use version three, you need to write print("Hello World!"). The only difference is that you wrap the text in brackets. We’ll learn about the technical reasons behind this in a future lesson.
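If you want a script that behaves the same whichever version you installed, here’s a minimal sketch, saved under a hypothetical name, hello.py. It relies on one detail: in Python 2, parentheses around a single value are just grouping, so print("Hello World!") prints the same text there as in Python 3. With two or more arguments, Python 2 would print a tuple instead; putting from __future__ import print_function at the very top of a Python 2 script is the usual way to opt into the version three behaviour.

```python
# hello.py: prints the same line under Python 2.7 and Python 3.
# With a single argument, Python 2 reads the parentheses as plain
# grouping, so this is the print statement applied to one string.
message = "Hello World!"
print(message)
```

Run it from the shell with python hello.py, and you should see the same phrase as before.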


Conclusion

So in this lesson, we set ourselves up with a fresh Python installation, discussed the best tools to write code, and wrote our first program. If you have any questions, I’m happy to answer them in the comments, and I hope you’ll join me for the rest of this series.

Vim Essential Plugin: PeepOpen

Posted on 2nd May 2011 by Jeffrey Way in internet |Uncategorized

In this episode of our “Vim Essential Plugins” quick tip series, we’ll take a look at the only application in our list that isn’t free: PeepOpen. But don’t let this deter you from picking up one of the most useful applications available for Vim users.

Usage

Like the other plugins that have been featured in this series, PeepOpen is essentially a file explorer, but a beautiful and incredibly fast one at that. Somewhat uniquely, PeepOpen is an application that lives in your menu bar.

PeepOpen Menu

You’ll find that this application isn’t limited to Vim users, though. From Coda, to TextMate, to Emacs, all the best Mac editors are supported. Sorry, Windows users: you’re out in the cold on this one.

Preferences

Upon viewing the preferences section of the app, you’ll find a helpful tab that allows you to specify which file types to ignore, such as .DS_Store, .swp, etc. The PeepOpen team have already taken care of the most pertinent file types for you.

Within MacVim, PeepOpen can be triggered by typing Control + o. At this point, you can type any sequence of characters to identify the file that you wish to access. Note that this is fuzzy searching, so the characters you type don’t have to be adjacent in the file’s path.
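PeepOpen’s own matching and ranking rules aren’t described here, so the following is only a generic sketch of what fuzzy (subsequence) matching usually means: a file matches when the characters you type appear in its path in the same order, though not necessarily next to each other. The function names and sample paths are hypothetical.

```python
def fuzzy_match(query, path):
    """Return True if every character of query appears in path, in order.

    Matching is case-insensitive; the characters need not be adjacent.
    """
    remaining = iter(path.lower())
    # `ch in remaining` consumes the iterator up to and including the match,
    # so each later character must be found further along the path.
    return all(ch in remaining for ch in query.lower())


def fuzzy_filter(query, paths):
    """Keep only the paths that fuzzy-match the query, preserving order."""
    return [p for p in paths if fuzzy_match(query, p)]


if __name__ == "__main__":
    files = ["app/models/user.rb", "app/views/users/index.html.erb", "README.md"]
    # "mur" picks m(odels), u(ser), r(b) out of the first path only.
    print(fuzzy_filter("mur", files))
```

A real file finder would also rank the matches (for example, favouring consecutive characters or matches in the file name), which this sketch leaves out.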

And it’s really as simple as that. Rather than opening the file explorer, or wasting time searching through your buffers, simply use PeepOpen. It’ll save you an incredible amount of time over the course of the year.

An In Depth Guide to mod_rewrite for Apache

Posted on 26th April 2011 by Joseph Pecoraro in internet |Uncategorized

Twice a month, we revisit some of our readers’ favorite posts from throughout the history of Nettuts+. This tutorial was first published last September.

When people think of .htaccess configuration, the first thing that might pop into their minds is URL manipulation with mod_rewrite. But they’re often frustrated by mod_rewrite’s complexity. This tutorial will walk you through everything you need to know for the most common mod_rewrite tasks.

Mod_rewrite Rants

Thoughts on mod_rewrite vary quite a bit. To gain a quick feel for what the world thinks, I just ran a Twitter search on “mod_rewrite”. Here’s a sample of what was returned.

mldk: Aargh! .htaccess and mod_rewrite can be such a pain in the —!

bsterzenbach: Man do I love mod_rewrite. I could work with it the rest of my life and still not master it – so powerful

mikemackay: Still loving the total flexibility of mod_rewrite – coming to the rescue again. Often so overlooked…and easier than you might think too!

hostpc: I hate mod_rewrite. Can’t get this dang application to work properly :(

awanderingmind: Oh WordPress and Apache, how thou dost vex me. Mod_rewrite be damned!

danielishiding: Why won’t mod_rewrite work! Damn it!

A few things stand out: people clearly recognize the power of mod_rewrite, but are often frustrated by the syntax. That’s not surprising, considering the front page of Apache’s mod_rewrite documentation says essentially the same thing:

“Despite the tons of examples and docs, mod_rewrite is voodoo. Damned cool voodoo, but still voodoo.” – Brian Moore

What a turn off! So, in this article, I’m really going to bring things down a notch. We’ll address not only mod_rewrite’s syntax, but I’ll also provide a workflow that you can use to debug and solve your mod_rewrite problems. Along the way, we’ll review a few useful real-world examples.

Before we begin, a note of caution: as with many subjects, and this one in particular, you won’t learn unless you try on your own! That is one of the reasons why I’m going to focus on teaching a debug workflow. As usual, I’ll demonstrate how to get your system set up if you don’t already have the module loaded. I urge you to work through the examples on your own server, preferably in a test environment.


What is mod_rewrite?

mod_rewrite is an Apache module that allows for server-side manipulation of requested URLs. Incoming URLs are checked against a series of rules. The rules contain a regular expression to detect a particular pattern. If the pattern is found in the URL, and the proper conditions are met, the pattern is replaced with a provided substitution string or action. This process continues until there are no more rules left or the process is explicitly told to stop.

This is summarized in these three points:

  • There is a list of rules that are processed in order.
  • If a rule matches, it checks the conditions for that rule.
  • If everything is a go, it makes a substitution or action.
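The three points above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not how Apache implements mod_rewrite: it ignores flags, sub-requests and condition back-references, and it hands any query string on to the next rule as part of the string, whereas Apache sets the query string aside. The Rule class and its fields are hypothetical names for illustration, and Python’s re module stands in for Apache’s regex engine.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical stand-in for one RewriteRule plus its RewriteConds."""
    pattern: str                                     # regex tested against the URL part
    substitution: str                                # may use $1, $2, ... back-references
    conditions: list = field(default_factory=list)  # (variable, regex) pairs


def apply_rules(url_part, rules, env):
    """Run url_part through the rules in order and return the final URL part."""
    for rule in rules:
        match = re.search(rule.pattern, url_part)
        if not match:
            continue  # pattern didn't match: its conditions are never checked
        # Every condition must pass (conditions are ANDed by default).
        if not all(re.search(regex, env.get(var, "")) for var, regex in rule.conditions):
            continue
        if rule.substitution == "-":
            continue  # "-" means: leave the URL unchanged (flags are ignored here)
        # Expand $0..$9 from this rule's capture groups; the result feeds the next rule.
        url_part = re.sub(r"\$(\d)",
                          lambda ref: match.group(int(ref.group(1))) or "",
                          rule.substitution)
    return url_part


if __name__ == "__main__":
    rules = [Rule(r"^user/(\w+)/?$", "user.php?id=$1")]
    print(apply_rules("user/joe/", rules, {}))
```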

Advantages of mod_rewrite

There are some obvious advantages to using a URL rewriting tool like this, but there are others that might not be as obvious.

mod_rewrite is most commonly used to transform ugly, cryptic URLs into what are known as “friendly URLs” or “clean URLs.”

As an added bonus, these URLs are also more search engine friendly. Consider the following example:

Not so friendly: http://example.com/user.php?id=4512
Much friendlier: http://example.com/user/4512/
Even better:     http://example.com/user/Joe/

Not only is the final link easier on the eyes, it’s also possible for search engines to extract semantic meaning from it. This basic kind of URL rewriting is one way that mod_rewrite is used. However, as you will see, it can do a whole lot more than just these simple transformations.

Expanding on the same example, some people claim there are security benefits to having mod_rewrite transform your URLs. Imagine the following attack on the user id:

http://example.com/user.php?id=AHHHHHH

http://example.com/user/AHHHHHH/

In the first example, the PHP script is explicitly being invoked and must handle the invalid id number. A poorly written script would likely fail, and, in a more extreme case (in a poorly written web application), bad input could cause data corruption. However, if the user is only ever shown the friendlier URLs, they would never know that the user.php page existed.

Trying the same attack in that case would likely fail before it even reaches the PHP script. This is because regular expression pattern matching is at the core of mod_rewrite. In the example case above, you would have been expecting a number, for example (\d+), not letters like a-z. This extra layer of abstraction is nice from a security perspective.
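Here is a quick way to see that in action, using Python’s re module as a stand-in for Apache’s regex engine (the two agree on a pattern this simple). The pattern ^user/(\d+)/?$ is an assumed rule for the numeric URLs, and rewrite_user_url is a hypothetical helper; with this pattern, the AHHHHHH URL never produces a rewrite, so user.php is never reached with that value.

```python
import re

# Assumed rule pattern: only ASCII digits are accepted as a user id.
USER_PATTERN = re.compile(r"^user/(\d+)/?$", re.ASCII)


def rewrite_user_url(url_part):
    """Return the internal URL for a friendly one, or None if it doesn't match."""
    match = USER_PATTERN.match(url_part)
    if match is None:
        return None  # no rewrite: the request never reaches user.php
    return "user.php?id=" + match.group(1)


if __name__ == "__main__":
    for candidate in ["user/4512/", "user/AHHHHHH/"]:
        print(candidate, "->", rewrite_user_url(candidate))
```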


Enabling mod_rewrite on the Server

Just like enabling .htaccess support, enabling mod_rewrite or any Apache module must be done from the global configuration file (httpd.conf). Just as before, since mod_rewrite usage is so widespread, hosting companies almost always have it enabled. However, if you suspect that your hosting company does not – and we will test for that below – contact them and they will likely enable it.

If you rolled your own Apache installation, it’s worth noting that mod_rewrite needs to be included when Apache is compiled, as it is not included by default. However, it’s so common that nearly all installation guides, including Apache’s, show how to do this in their examples.

If you’re the administrator for your web server, and you want to make sure that you load the module, you should look in the httpd.conf file. In the configuration file, there will be a large section which loads a bunch of modules. The following line will likely appear somewhere within the file. If it does, great! If it’s commented out, meaning there is a # symbol at the start of the line, then uncomment it by removing the #:

LoadModule rewrite_module modules/mod_rewrite.so

Older versions of Apache (1.3) may require you to add the following directive after the LoadModule directive.

# Only in Apache 1.3
AddModule mod_rewrite.c

However, this seems to have disappeared in Apache 2 and later. Only the LoadModule directive is required.

If you had to modify the configuration file at all (not likely), then you will need to restart the web server. As always, you should remember to make a backup of the original file in case you need to revert to it later.


Testing for mod_rewrite

You can test if mod_rewrite is enabled/working in a number of ways. One of the simplest methods is to view the output from PHP’s phpinfo function. Create this very simple PHP page, open it in your browser, and search for “mod_rewrite” in the output.

<?php phpinfo(); ?>

mod_rewrite should show up in the “Loaded Modules” section of the page like so:

Good, mod_rewrite enabled

If you’re not using PHP (although I will for the rest of the tutorial), there are other ways to check. Apache comes with a number of command-line tools, such as apachectl and httpd, that you can use to test for the module directly. They have command-line switches that let you list all of the modules loaded in the existing installation. You can execute the following to get a listing of all of the loaded modules.

 shell> apachectl -t -D DUMP_MODULES 

This command will display a list of the loaded modules. I ran it and searched for “rewrite” in the results, and it showed a line of output that matched!

apache test
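If you’d rather script this check, here is a small Python sketch that runs the same apachectl command and looks for rewrite_module in its output. It assumes apachectl is on your PATH (on some systems the tool is called apache2ctl, or you call httpd directly) and that modules are listed one per line, as in “ rewrite_module (shared)”; the function names are hypothetical.

```python
import subprocess


def module_listing():
    """Run apachectl's module dump and return everything it printed."""
    try:
        result = subprocess.run(
            ["apachectl", "-t", "-D", "DUMP_MODULES"],
            capture_output=True,
            text=True,
        )
    except OSError:
        return ""  # apachectl isn't installed or isn't on the PATH
    # Some Apache builds write part of this output to stderr, so keep both.
    return result.stdout + result.stderr


def has_rewrite_module(listing):
    """True if any line of the listing names rewrite_module."""
    return any(line.strip().startswith("rewrite_module")
               for line in listing.splitlines())


if __name__ == "__main__":
    print("mod_rewrite loaded:", has_rewrite_module(module_listing()))
```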

Finally, if you are still unsure if it’s enabled, just give it a shot! The following .htaccess file will redirect any request in the given folder to the good.html file. That means, if mod_rewrite is working, you should see good.html.

# Redirect everything in this directory to "good.html"
RewriteEngine on
RewriteRule .* good.html
Good, mod_rewrite worked
Bad, mod_rewrite didn't work

Inside .htaccess

As always, anything that you can put in a .htaccess file can also be placed inside the global configuration file. With mod_rewrite, however, there are small differences depending on which of the two you put a rule in. Most notably:

If you’re putting […] rules in an .htaccess file […] the directory prefix (/) is removed from the REQUEST_URI variable, as all requests are automatically assumed to be relative to the current directory. – Apache Documentation

This is something to keep in mind if you see examples online or if you’re trying an example yourself: beware of the leading slash. I will attempt to clarify this below when we work through some examples together.
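To make the leading-slash difference concrete, here is a small check with Python’s re module standing in for Apache’s. It assumes a .htaccess file in the document root, where the stripped prefix is just “/” (in a subdirectory, that directory’s path is stripped instead); the sample path is hypothetical. The same rule needs ^/user/ in httpd.conf but only ^user/ in .htaccess.

```python
import re

request_path = "/user/joe/"                # path as it appears in the request
htaccess_view = request_path.lstrip("/")   # what a root-level .htaccess rule sees

server_config_pattern = re.compile(r"^/user/(\w+)/?$")  # written for httpd.conf
htaccess_pattern = re.compile(r"^user/(\w+)/?$")        # written for .htaccess

# Each pattern only works in its own context.
print(bool(server_config_pattern.match(request_path)))  # True
print(bool(htaccess_pattern.match(request_path)))       # False: leading slash
print(bool(htaccess_pattern.match(htaccess_view)))      # True
```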


Regular Expressions

This tutorial does not intend to teach you regular expressions. For those of you who are familiar with them, note that the regular expressions used in mod_rewrite seem to vary between versions of Apache. In Apache 2.0, they’re Perl Compatible Regular Expressions (PCRE). This means that many of the shortcuts you are used to, such as \w for [A-Za-z0-9_] and \d for [0-9], do exist. However, my particular hosting company uses Apache 1.3, and its regular expressions are more limited.

Helpful RegEx Resources

If you don’t know regular expressions, here are some useful tutorials that will bring you up to speed quickly.

And a few references that everyone should know about:

If you haven’t yet taken the time to learn regular expressions, I highly suggest doing so. It’s an incredibly helpful tool to have. As is usually the case, they are not quite as complex as some might think. I selected the links above from my years of experience working with regular expressions. I feel that these guides do a very good job of getting the basics across.

Regular expression knowledge is a necessity if you want to effectively use mod_rewrite.


Getting a Feel for It

Okay, you’ve waited patiently enough; let’s run through a quick example. This is included in the linked source files. Here is the code from the .htaccess file:

# Enable Rewriting
RewriteEngine on

# Rewrite user URLs
#   Input:  user/NAME/
#   Output: user.php?id=NAME
RewriteRule ^user/(\w+)/?$ user.php?id=$1

Before I can explain any of the code above, we should quickly review the other files in the directory.

The directory contains an index.php and a user.php file. The index only has some links, of various formats, to the user page. The PHP code is used purely for debugging purposes to confirm that the page was accessed and what the given “id” parameter contained. Here are the contents of user.php:

<html>
<head>
    <title>Simple mod_rewrite example</title>
</head>
<body>
    <h1>You Are on user.php!</h1>
    <p>Welcome: <?php echo htmlspecialchars(isset($_GET['id']) ? $_GET['id'] : ''); ?></p>
</body>
</html>

This example has a few different sections. First, notice that URL rewriting must be enabled via the RewriteEngine directive! If your .htaccess file is going to use rewrite rules, you should always include this line; otherwise, you can’t be sure whether it’s enabled or not. As a rule of thumb, always include it. The string “on” is case-insensitive.

The RewriteRule is for handling the user.php page. As the comments indicate, we are rewriting the friendly URL into the format of the normal URL: when the friendly URL comes in as input, we transform it into the standard query-string URL. Breaking it down, we get:

The Rule:
RewriteRule ^user/(\w+)/?$ user.php?id=$1

Pattern to Match:
^              Beginning of Input
user/          The REQUEST_URI starts with the literal string "user/"
(\w+)          Capture any word characters, put in $1
/?             Optional trailing slash "/"
$              End of Input

Substitute with:
user.php?id=   Literal string to use.
$1             The first (capture) noted above.

Here are some examples and an explanation for each:

User.php
Incoming          Match  Capture  Outgoing          Result
user.php?id=joe   No              user.php?id=joe   Normal
user/joe          Yes    joe      user.php?id=joe   Good
user/joe/         Yes    joe      user.php?id=joe   Good
user/joe/x        No              user/joe/x        Fail

The first example goes through unaffected by the RewriteRule and works just fine. The second and third examples match the RewriteRule, are rewritten accordingly and end up working fine, as well. The last example does not match the rule and proceeds untouched. The server doesn’t have a user directory and fails trying to find it. This is as expected, because user/joe/x is a bad URL in the first place!
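The four rows of the table can be checked with a short Python script that applies the same pattern and substitution. It models only this single rule, and Python’s re stands in for Apache’s regex engine; the rewrite helper is a hypothetical name. Because a query string such as ?id=joe is never part of what the pattern sees, the first row is tested on user.php alone.

```python
import re

# Same pattern as: RewriteRule ^user/(\w+)/?$ user.php?id=$1
PATTERN = re.compile(r"^user/(\w+)/?$")


def rewrite(url_part):
    """Apply the user rewrite rule to a URL part (no leading slash)."""
    match = PATTERN.match(url_part)
    if match is None:
        return url_part  # no match: the URL passes through untouched
    return "user.php?id=" + match.group(1)


if __name__ == "__main__":
    for url in ["user.php", "user/joe", "user/joe/", "user/joe/x"]:
        print(f"{url:12} -> {rewrite(url)}")
```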

This example was rather easy to understand. However, there were a lot of minute details that I glossed over. Before writing more complex rules, we should clarify exactly what is happening above. In the following section, I’m going to walk through every step in the cycle.

NOTE: If this example above didn’t work for you, it’s possible that your Apache or mod_rewrite versions are not PCRE compatible. Try changing ^user/(\w+)/?$ into ^user/([a-z]+)/?$. Notice that I did not use the \w shorthand. If this version works for you, then you will have to avoid the regex shortcuts and instead use their longer equivalents (see the Regular Expressions section above).
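Before swapping \w for [a-z]+, it’s worth checking how much narrower the replacement is. In the sketch below, Python’s re stands in for Apache’s, with re.ASCII so that \w means exactly [A-Za-z0-9_]; the sample names are made up. [a-z]+ rejects names containing capitals, digits or underscores (unless you also add the [NC] flag, which makes the match case-insensitive), while [A-Za-z0-9_]+ is the full long-hand equivalent of \w.

```python
import re

shorthand = re.compile(r"^user/(\w+)/?$", re.ASCII)
lowercase_only = re.compile(r"^user/([a-z]+)/?$")
longhand = re.compile(r"^user/([A-Za-z0-9_]+)/?$")

# Columns: name, \w+ matches, [a-z]+ matches, [A-Za-z0-9_]+ matches
for name in ["joe", "Joe", "joe_2"]:
    url = f"user/{name}/"
    print(name,
          bool(shorthand.match(url)),
          bool(lowercase_only.match(url)),
          bool(longhand.match(url)))
```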


Flow of Execution in Detail

The flow of execution through the rewrite rules is simple, though not exactly straightforward. So, I’m going to break it down in painful detail.

It all begins with the user making a request to your server. They type a URL into their browser’s address bar, their browser translates that into an HTTP request to send to the server, Apache receives that request, and then parses it into pieces. Here is an example:

Full URL Analysis

Note that whenever I mention one of Apache’s variables, I use an odd-looking syntax: %{APACHE_VAR}. I only do so because it’s similar to the syntax that mod_rewrite uses to access its variables. However, it is the name inside the braces that is important.

So what part does mod_rewrite deal with? If you’re working inside a .htaccess file, then you’re working with the REQUEST_URI portion, but without the leading slash. I made note of this before; it tends to confuse most people when they start out. If you’re working from inside the global configuration file, however, then the leading slash is left in.

To be as specific as possible, buried in the Apache Documentation is this description of the “URL Part” that mod_rewrite acts on:

The Pattern is always a regular expression matched against the URL-Path of the incoming request (the part after the hostname but before any question mark indicating the beginning of a query string). – Apache Documentation

To remove any ambiguity, highlighted in gold in these two URLs below is the “URL Part” that mod_rewrite acts on inside a .htaccess file:

The Rewrite Portion of the URL

For the rest of this section, I’ll be using these two URLs to describe the flow of execution. I’ll also refer to the first URL as the “green” URL and the second as the “blue” URL. I will be using “URL Part” throughout this analysis to mean the REQUEST_URI without the leading slash.
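As a rough illustration of which piece of a URL that is, the sketch below uses Python’s urllib.parse to split a hypothetical URL and strip the leading slash from the path. It assumes the .htaccess file sits in the document root (in a subdirectory, Apache strips that directory’s prefix instead), and it skips details such as the URL-decoding Apache applies to the path before matching.

```python
from urllib.parse import urlsplit


def url_part(url):
    """Return the portion a root-level .htaccess RewriteRule pattern sees."""
    parts = urlsplit(url)
    # Scheme, host, query string (?...) and fragment are all excluded;
    # only the path counts, minus its leading slash in .htaccess context.
    return parts.path.lstrip("/")


if __name__ == "__main__":
    print(url_part("http://example.com/profile/joe/?tab=posts"))
```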


URL vs. URI

For the pedantic readers: these two things that I am calling URLs are, strictly speaking, URIs. A Uniform Resource Identifier (URI) and a Uniform Resource Locator (URL) are defined differently.

  • URI: a string that identifies a resource. It names the resource, but doesn’t necessarily say how to reach it.
  • URL: a kind of URI that also says where the resource is and how to access it (for example, a scheme such as http:// plus a host name). This subtle difference has blurred over time, to the point that hardly anybody cares about it. I will continue to use the term URL, because people are more comfortable with it.

Now, we know what the rewrite rules are going to be acting on. Once Apache has parsed the request, it translates that to the file it thinks is needed and proceeds to fetch that file. At this point, it will traverse directories and encounter the .htaccess files. Assuming this file enables the RewriteEngine, any RewriteRule could change the URL. A drastic enough change (such as one that points Apache to another directory instead of the original directory it was heading toward) will cause Apache to issue a sub-request and proceed to fetch the new file.

In most cases, sub-requests are invisible to you. This implementation detail is not important to know for the majority of the simple rewrites that you will ever write or use. What is more important to know is how Apache processes the rewrite rules inside a .htaccess file.

The rules in a .htaccess file are processed in the order that they appear. Note that each RewriteRule is acting on the “URL Part” that is similar to the REQUEST_URI. When a rule makes a substitution, the modified “URL Part” will be handed to the next rule. This means that the URL that a rule is processing may have been edited by a previous rule! The URL is continually being updated by each rule that it matches. This is important to remember!

Flow Chart

Here is a flow chart that tries to provide a visualization of the generic flow of execution across multiple rules in a .htaccess file:

mod_rewrite flow chart

Note that, at the top of the flow chart, the value going into the rewrite rules is that “URL Part” and if the substitution is successful, the modified part proceeds into the next rule.

I referred to rewrite conditions earlier, but didn’t go into detail. Each RewriteCond is associated with a single RewriteRule. The conditions appear before the rule they are associated with, but they only get evaluated if the rule’s pattern matched. As the flow chart illustrates, if a rewrite rule’s pattern matches, then Apache will check to see if there are any conditions for that rule. If there aren’t, then it will make the substitution and continue. If there are conditions, on the other hand, then it will only make the substitution if all of the conditions are true. Let’s visualize this in a concrete example.

The URLs that I’m working with are part of the “Profile Example” that I’ve included in the source code download in the “profile_example” directory. This is similar to the previous example with the user.php but it now has a profile.php page, an added rewrite rule, and a condition!

Let’s take a look at the code and Apache’s flow of execution through it:

Profile Rewrite Rules

Here, there are two rules. Rule #1 is the same as the user example we reviewed previously. Rule #2 is new; notice that it has a condition. The “URL Part” we have been discussing goes through the rules in order, top to bottom.

The key to understanding this example is to first understand the goal. I am going to allow friendly profile URLs, but I’m actually going to explicitly forbid access to the PHP page directly. Note that some people might argue that this is a bad idea. They might say that, as a developer, this will make things harder for you to debug. That’s certainly true; I don’t actually recommend doing a trick like this, but it makes for an excellent example! More practical uses for mod_rewrite will show up later in this tutorial.

With that in mind, let’s see what happens with our green URL. We want this one to be successful.

Green URL Execution

At the top, you’ll see Apache’s THE_REQUEST variable. I put this at the top because, unlike many of the Apache variables we will deal with, this variable’s value never changes for the duration of the request! That is one of the reasons why Rule #2 uses %{THE_REQUEST}. Underneath THE_REQUEST, we see the green “URL Part” going into the first rule:

  • The URL matches the pattern.
  • There are no conditions, so it continues.
  • The substitution is made.
  • There are no flags, so it continues.

After making it through the first rule, the URL has changed. The total URL has been rewritten to profile.php?id=joe, which Apache then breaks down, updating many of its variables. The ?id=joe portion gets hidden from us, and profile.php, the new “URL Part”, continues into the second rule. This is our first encounter with conditions:

  • The URL matches the pattern.
  • There are conditions, so we will try the conditions.
  • THE_REQUEST does not contain profile.php, so the condition fails.
  • Because a condition failed, we ignore the substitution and flags.
  • The URL is unchanged by this rule.

At this point, we made it through all the rewrites and the profile.php?id=joe page will be fetched properly.


Here is how the execution looks for the blue URL – the one we want to fail:

Blue URL Execution

Again I place the THE_REQUEST value at the top. The blue “URL Part” enters Rule #1:

  • The URL does not match the pattern.
  • Everything else is ignored and the URL proceeds unchanged.

The first rule was easy. As is often the case, a URL that you have won’t match a rule’s pattern and will proceed untouched. Next, it enters Rule #2:

  • The URL matches the pattern.
  • There are conditions, so we will try the conditions.
  • THE_REQUEST contains profile.php, so the condition passes.
  • We can make the substitution.
  • “-” is a special substitution that means: don’t change anything.
  • There are flags on the rule, so we process the flags.
  • There is an F flag, which means return a forbidden response.
  • A 403 Forbidden response is sent to the client.

A few things are worth reiterating. In order for the substitution to work, all of the conditions have to pass. In this case, there is only one; it passes, so the substitution occurs. Note that - is a special substitution that doesn’t change anything. This is useful when you want to use flags to do something for you, which is exactly what we want to do in this case.

Here is the familiar table breakdown of example URLs and their responses:

Profile.php
Incoming            Match     Capture  Outgoing            Result
profile.php?id=joe  Yes (#2)           profile.php?id=joe  Forbidden
profile/joe         Yes (#1)  joe      profile.php?id=joe  Good
profile/joe/        Yes (#1)  joe      profile.php?id=joe  Good
profile/joe/x       No                 profile/joe/x       Fail
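The profile rules themselves appear only in the screenshot above, so the Python sketch below reconstructs them from the walkthrough, and the exact patterns should be read as assumptions: Rule #1 rewrites ^profile/(\w+)/?$ to profile.php?id=$1, and Rule #2 matches ^profile\.php$, requires THE_REQUEST to contain profile.php, and uses - with [F]. The handle function is a hypothetical name, and “404” simply stands for “no such file on disk”.

```python
import re

RULE1 = re.compile(r"^profile/(\w+)/?$")   # assumed Rule #1 pattern
RULE2 = re.compile(r"^profile\.php$")      # assumed Rule #2 pattern
RULE2_COND = re.compile(r"profile\.php")   # assumed RewriteCond on THE_REQUEST


def handle(the_request, url_part):
    """Return (final URL, outcome) after both profile rules have run.

    the_request is the raw request line, e.g. "GET /profile/joe HTTP/1.1";
    unlike url_part, it never changes while the rules run.
    """
    # Rule #1: friendly URL -> profile.php?id=NAME. Only the path
    # (profile.php) is handed to the next rule; the query is set aside.
    query = ""
    match = RULE1.match(url_part)
    if match:
        url_part, query = "profile.php", "id=" + match.group(1)

    # Rule #2: "-" leaves the URL alone; [F] answers 403 if the condition passes.
    if RULE2.match(url_part) and RULE2_COND.search(the_request):
        return url_part, "403 Forbidden"

    full = url_part + ("?" + query if query else "")
    return full, "served" if url_part == "profile.php" else "404 (no such file)"
```

For example, handle("GET /profile/joe HTTP/1.1", "profile/joe") gives ("profile.php?id=joe", "served"), while a direct request for profile.php?id=joe gets the 403.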

Syntax

Before going over the syntax of RewriteRule and RewriteCond, I suggest that you first download the AddedBytes Cheatsheet. This cheatsheet lists the most useful server variables and flags, has regular expression tips, and even a few examples.

Let’s start with RewriteRule. You can always visit Apache’s Documentation on RewriteRule if you require more information or instruction.

Syntax of RewriteRule

The cheatsheet, linked to above, displays the various flags that are available to you. While many tutorials cover these in detail, we’ll keep things simple and review the ones that I see most commonly used in real world projects.

Syntax of RewriteCond

Debug Workflow

When working with mod_rewrite and creating new rules, always begin with a simple, dumbed down version of the rule, and work your way up to the final version. Resist the urge to do everything at once. The same applies for conditions. Add rules and conditions one at a time. Test often!

The key concept that I am trying to get across with this approach is that it lets you know quickly if a change you made doesn’t work, or breaks something that used to work. Otherwise, you’ll inevitably run into some form of error, and will have to revert all of the changes you made to track down the problem. This roller-coaster approach will likely lead to frustration. However, if you’re always advancing steadily, moving from one working checkpoint to the next, you’ll be in much, much better shape.

People often ignore this advice, create a complex rule, and it ends up not working. Hours later they find out the problem was not in the complex portion, but instead was a simple mistake in the regular expression that could have been caught much earlier had they carefully constructed the rule like I’ve explained above. The same goes for deconstructing a rule to reverse engineer a problem. This approach will seriously reduce frustration!
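One way to follow this advice is to keep a small list of sample URL parts and the result you expect for each, and re-run it after every change to a pattern. The sketch below does that in Python; check_pattern and the sample cases are hypothetical, and Python’s re only approximates Apache’s regex engine, so confirm the final rule on your test server too.

```python
import re


def check_pattern(pattern, cases):
    """Print PASS/FAIL for each (url_part, should_match) pair; return failures."""
    compiled = re.compile(pattern)
    failures = []
    for url_part, should_match in cases:
        matched = compiled.search(url_part) is not None
        status = "PASS" if matched == should_match else "FAIL"
        if status == "FAIL":
            failures.append(url_part)
        print(f"{status}  {url_part!r:20} matched={matched}")
    return failures


CASES = [
    ("user/joe", True),
    ("user/joe/", True),
    ("user/joe/x", False),
    ("user.php", False),
]

if __name__ == "__main__":
    # Start with a deliberately simple pattern, then tighten it step by step.
    check_pattern(r"^user/", CASES)           # too loose: user/joe/x slips through
    check_pattern(r"^user/(\w+)/?$", CASES)   # final version: no failures
```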


In the Examples

In the examples below, we will always assume that the website’s domain is example.com. This domain name is important because it affects the HTTP_HOST variable, and it appears in redirect URLs that point to another file on your website. Keep this in mind if you intend to modify any of the following examples for your own website: simply replace “example.com” with your domain. For example, Nettuts+ would replace “example.com” with “nettuts.com”.


Removing www

This is the most classic rewrite rule. The following script will listen for anyone who comes to your website via http://www.example.com. Those who do will receive a hard redirect, and, thus, the location bar in their browser will update accordingly.

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L] 

The RewriteRule above matches anything, and saves it as $1, as specified by the wrapping parens. The important part in this example, though, is the RewriteCond. This condition checks whether the HTTP_HOST variable is www.example.com (case-insensitively, because of the [NC] flag). If this condition is true, the rewrite occurs:

  • The substitution is a full URL (it starts with http://)
  • The substitution contains $1, which was captured earlier
  • The [R=301] flag redirects the browser to the rewritten URL. This is a hard redirect in the sense that it forces the browser to load the new page and update its location bar with the new URL.
  • The [L] flag indicates that this is the last rule to parse. Beyond this line, the rewrite engine should stop.

If the incoming URL had been “http://www.example.com/user/index.html”, then HTTP_HOST would have been set to www.example.com and the rewrite would trigger, creating http://example.com/user/index.html.

On the other hand, if the incoming URL had been “http://example.com/user/index.html”, then HTTP_HOST would have been example.com, the condition would fail, and the rewrite engine would proceed with the URL unchanged.
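Both cases can be traced in Python. This sketch models only this one rule: Python’s re stands in for Apache’s, re.IGNORECASE plays the role of [NC], and www_redirect is a hypothetical name. It returns the 301 target, or None when the condition fails and no redirect happens; url_part is the .htaccess view of the path, without its leading slash.

```python
import re

HOST_CONDITION = re.compile(r"^www\.example\.com$", re.IGNORECASE)  # like [NC]


def www_redirect(http_host, url_part):
    """Return the 301 target for a www request, or None if no redirect applies."""
    if not HOST_CONDITION.match(http_host):
        return None  # RewriteCond failed: the URL continues unchanged
    # RewriteRule ^(.*)$ captures the whole URL part as $1.
    return "http://example.com/" + url_part
```

For instance, www_redirect("www.example.com", "user/index.html") returns "http://example.com/user/index.html", matching the first case above.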


Forbid Hotlinking

Hotlinking, referred to as Inline Linking on Wikipedia, is the term used to describe one site leeching off of another site. Usually one site (the Leecher) will include a link to some media file (let’s say an image or video) that is hosted on another site, the Content Host. In this scenario, the Content Host’s servers are wasting bandwidth serving content to some other website.

The most common and basic approach to preventing hotlinking is to whitelist a specified number of websites, and block everything else. To determine who is requesting the content from your site, you can check the referrer.

The HTTP_REFERER variable holds the Referer header, which is set by the browser or client that is requesting the resource.

Ultimately, it is not 100% reliable, since clients can omit or forge it; however, it is generally more than effective at stopping the majority of hotlinking. So, in our script, we need to verify whether the referrer is included in a whitelist of acceptable referrers. If not, then we should send them a 403 Forbidden response:

# Give Hotlinkers a 403 Forbidden warning.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://example\.net/?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://example\.com/?.*$ [NC]
# A hyphen (-) as the substitution leaves the URL unchanged; [F] returns 403 Forbidden.
RewriteRule \.(gif|jpe?g|png|bmp)$ - [F,NC]

Above, the RewriteRule is checking for the request of a file with any popular image extension, such as .gif, .png, or .jpg. Feel free to add other extensions to this list if you want to protect .flv, .swf, or other files.

The domains that are allowed to access this content are “example.net” and “example.com”. For a referrer from either of these two domains, one of the RewriteCond conditions will fail and the rule won’t be applied. If any other domain makes an attempt, however (let's say “sample.com”), then all the RewriteCond conditions will pass, the rule will be applied, and the [F] forbidden action will trigger.
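To see how the negated conditions combine, here is a rough Python model of the rule set above. It assumes the referrer arrives as a plain string and uses the hypothetical helper name `hotlink_status`; Apache evaluates the real rules, and this only mirrors their regex logic.

```python
import re

# Each pattern mirrors one negated RewriteCond on %{HTTP_REFERER} with [NC]
ALLOWED_REFERERS = [
    re.compile(r"^http://example\.net/?.*$", re.IGNORECASE),
    re.compile(r"^http://example\.com/?.*$", re.IGNORECASE),
]
# Mirrors the pattern of: RewriteRule \.(gif|jpe?g|png|bmp)$ - [F,NC]
PROTECTED = re.compile(r"\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)


def hotlink_status(path: str, referer: str) -> int:
    """Return 403 if the rule set would forbid the request, else 200."""
    # RewriteConds are ANDed by default. Each one is negated ("!"), so they
    # all pass only when the referer matches none of the allowed patterns.
    all_conditions_pass = not any(p.match(referer) for p in ALLOWED_REFERERS)
    if all_conditions_pass and PROTECTED.search(path):
        return 403  # the [F] flag
    return 200  # rule not applied: the file is served as usual
```

Note what happens with an empty referrer: `hotlink_status("photos/cat.png", "")` returns 403, because an empty string matches none of the whitelist patterns. With these exact rules, clients that send no Referer header are blocked too; adding `RewriteCond %{HTTP_REFERER} !^$` is a common way to let them through.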


Give Hotlinkers a Warning Image

The previous example returns a 403 Forbidden response when someone attempts to hotlink content from your server. You can actually go one step further and send the hotlinker any resource of your choice! For instance, you can return a warning image with text stating, “hotlinking is not allowed”. This way, the abuser will realize their mistake and host a copy on their own server. The only required change is to follow through with the rewrite substitution and provide your chosen image instead of the one being requested:

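# Redirect Hotlinkers to "warning.png"
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://example\.net/?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://example\.com/?.*$ [NC]
# Don't redirect the warning image itself, or requests for it would redirect back to itself
RewriteCond %{REQUEST_URI} !warning\.png$
RewriteRule \.(gif|jpe?g|png|bmp)$ http://example.com/warning.png [R,NC]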

Note that this is an example of what I call a “hard” or “external” redirect. The RewriteRule has a URL in the substitution portion and it also has the [R] flag.
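One subtlety is worth modeling: the warning image is itself a .png, and when the hotlinking page follows the redirect, the browser requests warning.png with the same foreign referrer. Unless warning.png is excluded (for example with a condition such as `RewriteCond %{REQUEST_URI} !warning\.png$`), the rule would redirect that request to itself. The Python sketch below models the redirect with that exclusion; `hotlink_response` is a hypothetical name, and the model covers only the regex logic.

```python
import re
from typing import Optional, Tuple

# Negated RewriteCond patterns on %{HTTP_REFERER}; [NC] -> re.IGNORECASE
ALLOWED_REFERERS = [
    re.compile(r"^http://example\.net/?.*$", re.IGNORECASE),
    re.compile(r"^http://example\.com/?.*$", re.IGNORECASE),
]
PROTECTED = re.compile(r"\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)
# Exclusion mirroring a condition such as: RewriteCond %{REQUEST_URI} !warning\.png$
WARNING_IMAGE = re.compile(r"warning\.png$")
WARNING_URL = "http://example.com/warning.png"


def hotlink_response(path: str, referer: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location); Location is None when the file is served normally."""
    if WARNING_IMAGE.search(path):
        return 200, None  # never redirect the warning image to itself
    referer_allowed = any(p.match(referer) for p in ALLOWED_REFERERS)
    if not referer_allowed and PROTECTED.search(path):
        # A bare [R] flag sends a 302 (temporary) redirect by default
        return 302, WARNING_URL
    return 200, None
```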


Custom 404

One neat trick that you can do with htaccess is to determine if the current “URL Part” leads to an actual file or directory on the web server. This is an excellent way to create a custom 404 “File not Found” page. For example, if a user tries to fetch a page in a particular directory that doesn’t exist, you can redirect him to any page you wish, such as the index page or a custom 404 page.

# Generic 404 to show the "custom_404.html" page
# If the requested page is not a file or directory
# Silent Redirect: the user's URL bar is unchanged.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* custom_404.html [L]

This is a great example of mod_rewrite’s file test operators. They work like the -f and -d file tests in bash shell scripts and even Perl scripts. Above, the conditions check whether REQUEST_FILENAME is not a file and not a directory. When it is neither, no file exists for the request.

If the requested filename can’t be found, then this script loads the “custom_404.html” page. Note that there is no [R] flag: this is a silent (internal) rewrite, not a hard redirect. The user’s location bar will not change, but the page will show the contents of “custom_404.html”.
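The two file tests can be mirrored in Python with `os.path.isfile` and `os.path.isdir`. This sketch assumes REQUEST_FILENAME is simply the document root joined with the URL path, which ignores aliases and other mappings Apache may apply; `resolve` is a hypothetical helper, and the file-test functions are parameters so the logic can be checked without touching the disk.

```python
import os
from typing import Callable


def resolve(
    document_root: str,
    url_path: str,
    fallback: str = "custom_404.html",
    is_file: Callable[[str], bool] = os.path.isfile,
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> str:
    """Return the filesystem path that would be served for url_path.

    Mirrors:
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule .* custom_404.html [L]
    """
    # Assumption: REQUEST_FILENAME = document root + URL path.
    request_filename = os.path.join(document_root, url_path.lstrip("/"))
    # Both negated conditions must pass: not a file AND not a directory.
    if not is_file(request_filename) and not is_dir(request_filename):
        return os.path.join(document_root, fallback)
    return request_filename
```

One side effect worth knowing: because this is a rewrite rather than an error, Apache serves custom_404.html with a 200 status. If you need a real 404 status code, the `ErrorDocument 404` directive is the usual alternative.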


Safety First

If you have various mod_rewrite snippets that you want to easily distribute to other servers or environments, you might want to be careful. Any invalid directive in a .htaccess file will likely trigger an internal server error. So, if an environment you move the snippet to doesn’t support mod_rewrite, you could temporarily break it.

One solution to this problem is to “check” for the mod_rewrite module first. This is possible with any module; simply wrap your mod_rewrite code in an <IfModule> block and you’ll be all set:

<IfModule mod_rewrite.c>
  # Turn on
  RewriteEngine on

  # Always remove www (with a hard redirect)
  RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
  RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

  # Generic 404 for anyplace on the site
  # ...
</IfModule>

Conclusion

I hope that this tutorial has proven that mod_rewrite isn’t too scary. In fact, its quirks and speed bumps can be avoided with careful development practices. Let me know if you have any questions!

Quick Tip: The Power of Google Analytics Custom Variables


Posted on 21st April 2011 by Lukasz Koszela in internet |Uncategorized

Tags: google analytics, Tips


Familiarity with your users and their needs is the most important aspect of any successful site or web campaign. Google Analytics is the perfect tool to gather this sort of information. But there’s much more you can find out about your users when you begin using custom variables to make better decisions.


What are Custom Variables?

You can treat custom variables as your own extension of the standard metrics and dimensions. They let you gather non-standard, detailed data that is not otherwise available in the Google Analytics panel.

Using custom variables carries huge potential, as it allows you to acquire information about the behavior of your visitors, which, taken together, can contribute significantly to a higher ROI (return on investment) for a website or e-shop.

For example, custom variables make it possible to differentiate the activities of logged-in users from those of users who never signed in. This provides an opportunity to observe how a particular target group behaves on the website. For instance, we can learn which page of our website is visited most frequently by males aged between 20 and 30. And this is just a tiny piece of the information that can be stored with the help of custom variables.


So How do Custom Variables Work?

Custom variables work in a wonderfully simple way: when the user performs a predefined activity, they are labeled, and the label is then stored in a cookie. Based on this label, we can create a new segment of the statistics in the Google Analytics panel.

Custom variables can be used in one of three ways:

  • page-level – this is the lowest level used to monitor particular ways in which the visitor interacts with the web page (e.g. AJAX, watching a video etc.)
  • session-level – a label attached to this level is added to the actions of the visitor throughout the session, and is deleted when the session cookie expires.
  • visitor-level – this is the highest level; the label is permanent and remains attached during subsequent visits until the visitor deletes the cookie or the value is overwritten.

How Do I Configure Custom Variables?

Custom variables are quite easy to configure; you only need to add one line of code before the _trackPageview call.

_gaq.push(['_setCustomVar', INDEX, NAME, VALUE, OPT_SCOPE]);
  • INDEX (required) – the slot for the custom variable in Google Analytics. There are 5 slots available, numbered 1 to 5. For the variables to function correctly, each variable must be placed in its own slot.
  • NAME (required) – the name of the variable, as it will appear in the Custom Variables report of the Google Analytics panel.
  • VALUE (required) – the actual value of the variable, paired with NAME. A variable can take different values; for instance, if NAME=country, VALUE might be US, GB, PL, etc.
  • OPT_SCOPE (optional) – the level at which the custom variable functions. As I described above, there are three levels: 1 (visitor-level), 2 (session-level), 3 (page-level). When this parameter is not provided, it defaults to page-level.
In context, the call sits between _setAccount and _trackPageview:

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-xxxxxxxx-x']);
_gaq.push(['_setCustomVar', INDEX, NAME, VALUE, OPT_SCOPE]);
_gaq.push(['_trackPageview']);
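The argument rules above can be captured in a small validator. This Python sketch builds the array that gets pushed onto `_gaq` and serializes it as a JavaScript line; the helper name `set_custom_var` is hypothetical, and the checks encode only the constraints stated above (slots 1 to 5, scopes 1 to 3, page-level default).

```python
import json

SCOPES = {1: "visitor-level", 2: "session-level", 3: "page-level"}


def set_custom_var(index: int, name: str, value: str, scope: int = 3) -> str:
    """Build a `_gaq.push([...])` line for _setCustomVar, checking its arguments.

    `scope` defaults to 3, the same page-level scope used when OPT_SCOPE is omitted.
    """
    if not 1 <= index <= 5:
        raise ValueError("index must be a slot from 1 to 5")
    if not name:
        raise ValueError("name is required")
    if scope not in SCOPES:
        raise ValueError("scope must be 1 (visitor), 2 (session) or 3 (page)")
    # json.dumps yields a valid JavaScript array literal for strings and ints
    command = json.dumps(["_setCustomVar", index, name, value, scope])
    return f"_gaq.push({command});"
```

For example, `set_custom_var(1, "user-type", "visitor", 2)` returns `_gaq.push(["_setCustomVar", 1, "user-type", "visitor", 2]);`, which is equivalent to the single-quoted form used in this article.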

Some Practice

Now, let’s review how custom variables function in practice. Suppose that we want to keep track of our website’s visitors, distinguishing between those who did and did not log in. To do so, we insert code describing the user before the _trackPageview call.

_gaq.push(['_setCustomVar',
           1,              // first slot
           'user-type',    // custom variable name
           'visitor',      // custom variable value
           2               // custom variable scope - session-level
          ]);

Once the visitor logs into your website, we change this code accordingly:

_gaq.push(['_setCustomVar',
           1,              // first slot
           'user-type',    // custom variable name
           'regular-user', // custom variable value
           2               // custom variable scope - session-level
          ]);
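If your pages are generated on the server, the switch between the two snippets can be driven by the login state. The Python sketch below assumes a hypothetical template helper that already knows whether the current user is logged in; `gaq_commands` and `render` are illustrative names, and 'UA-xxxxxxxx-x' stays a placeholder account ID.

```python
import json
from typing import List


def gaq_commands(account_id: str, logged_in: bool) -> List[list]:
    """Return the _gaq command arrays for one page view, in push order."""
    user_type = "regular-user" if logged_in else "visitor"
    return [
        ["_setAccount", account_id],
        # Slot 1, name 'user-type', scope 2 (session-level), as in the snippets above
        ["_setCustomVar", 1, "user-type", user_type, 2],
        # The custom variable has to be queued before the pageview is tracked
        ["_trackPageview"],
    ]


def render(commands: List[list]) -> str:
    """Serialize each command as a `_gaq.push(...)` line of JavaScript."""
    return "\n".join(f"_gaq.push({json.dumps(c)});" for c in commands)
```

Keeping the order in one function makes it hard to accidentally emit `_trackPageview` before `_setCustomVar`.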

But What Follows?

It’s time to present the results of the described script. After the script had been running for a week, we created advanced segments in the Google Analytics panel. Their aim is to split the data for each metric into three views: all visits, logged-in users, and users who didn’t log in.

The segment itself is created through Advanced Segments => Create a new advanced segment. Then set its dimensions as follows.

The variable that we defined using JavaScript was in the first slot, so we have to select Key 1 and Value 1. Then we set the key that we are interested in (user-type) and the value for that key (visitor), joining the two conditions with “and”. Next, we name and test the advanced segment. The test calculates the number of visits in the segment during the selected period.

We define the second segment, covering logged-in users, in the same way; the only difference is that the custom variable value is set to regular-user.

After establishing the two segments, we can activate them. The resulting set of data is a great basis for an in-depth analysis of the activities on a webpage.
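Conceptually, each advanced segment is a filter that keeps the visits whose slot-1 key and value both match. The Python sketch below reproduces that filtering over a hypothetical list of visit records; the `custom_vars` field and its tuple format are assumptions for illustration, not the shape of any Google Analytics export.

```python
from typing import Dict, List, Tuple

# Hypothetical record: {"custom_vars": {slot: (key, value)}}
Visit = Dict[str, Dict[int, Tuple[str, str]]]


def segment(visits: List[Visit], key: str, value: str, slot: int = 1) -> List[Visit]:
    """Keep the visits whose custom variable in `slot` has this key AND this value."""
    return [v for v in visits if v.get("custom_vars", {}).get(slot) == (key, value)]


def summarize(visits: List[Visit]) -> Dict[str, int]:
    """Count visits for the three views: all, not logged in, logged in."""
    return {
        "total": len(visits),
        "visitor": len(segment(visits, "user-type", "visitor")),
        "regular-user": len(segment(visits, "user-type", "regular-user")),
    }
```

Visits with no value in slot 1 still count toward the total but fall into neither segment, which is why the two segments need not add up to the total.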


It’s Worth Remembering…

  • Don’t duplicate the names of custom variables between the slots.
  • Set custom variables before the pageview call.
  • You cannot use more than five custom variables in a single request.
  • Consider event tracking instead of custom variables in some cases, since it doesn’t generate additional (false) pageviews in the panel.
  • You can determine whether custom variables work by observing requests within Firebug, or with the help of the Chrome extension, Google Analytics Tracking Code Debugger.

This is Only the Beginning

The example presented in this article only illustrates the process of using a single custom variable and determining the best way to manage a website according to the type of visitor. Of course, this is only the beginning. Custom variables can become incredibly powerful when we combine several of them at the same time. As a quick example, with the applicable data from a website’s registration process, we can track not only the sex of the visitor (session-level), but also his or her age bracket (session-level). Further, we could divide visitors into groups who have made purchases in our fictional eShop, or even track those who took a specific action, such as clicking on a Facebook button.

These techniques translate into more justified and accurate site decisions.

Vim Essential Plugin: NERDTree


Posted on 21st April 2011 by Jeffrey Way in internet |Uncategorized

Tags: Tips, Videos, vim, vim essential plugins


In this episode of Vim Essential Plugins, we’ll review the fantastic NERDTree plugin, which is a much improved replacement for the traditional file explorer.


Usage

Begin by downloading the plugin to your Desktop (or any directory, really), and installing it.

cd ~/Desktop
git clone https://github.com/scrooloose/nerdtree.git
cd nerdtree
rake

With those few lines of code, the plugin is now installed! To open a NERDTree panel, in normal mode, we call :NERDTree.


At this point, we can open any file by typing o, or with the more convenient alias, the Enter key. Unlike the default file browser, this will open the new file directly into the buffer to the right of NERDTree, similar to what you might be used to in a program like TextMate.

If you ever forget what the correct key is for a particular action, press ? to display a quickie help buffer.

Bookmarks

To expedite the process of navigating through your directory structures, NERDTree allows you to conveniently create bookmarks. Do so by moving the cursor to the directory that you wish to bookmark, and then typing:

:Bookmark

With this in place, you can now bring up your list of available bookmarks by pressing B. It’s a huge help!


Menu

Press the letter m to bring up a menu that will allow you to quickly add, move, copy, and delete nodes (or files). So, for example, if I want to create a new html file within a particular directory, I can type ma newfile.html.


There are plenty more useful shortcuts available, but I’ll leave it to you to discover them. Hint – research the cd and C commands; I use them religiously. Additionally, refer to the screencast above for more shortcuts.

The Perfect Workflow, with Git, GitHub, and SSH


Posted on 18th April 2011 by Jeffrey Way in internet |Uncategorized

Tags: Command-line, git, github, service hooks, ssh, terminal, Videos


In this lesson, we’ll focus on workflow. More specifically, we’ll use the helpful GitHub service hooks to automatically update a project on our personal server whenever we push updates to a GitHub repo.



Step 1 - Create a Git Repo

We certainly need some sort of project to play around with, right? Let’s do that right now. Using whichever tool you prefer (I’d recommend Structurer), create a new directory, called awesomeProject, and add an index.html file. Feel free to populate this with some gibberish markup for the time being.

With our test directory in place, let’s create our first Git commit.

If you’re unfamiliar with Git, I highly recommend that you first review “Easy Version Control with Git.”

Open the command line:

cd path/to/awesomeProject
git init
git add .
git commit -m 'First commit'

Those familiar with Git should feel right at home. We’re creating a Git repo, adding all the files to the staging area, and then creating our first commit.


Step 2 - Uploading to GitHub

The next step is to upload our project to GitHub. That way, we can easily clone it, or run git pull, to download this project onto any computer or server we wish.

Again, if you’re not familiar with GitHub, and haven’t yet created an account, read Terminal, Git, and GitHub for the Rest of Us.

Begin by creating a new repository on GitHub.

Next, you’ll need to fill in some details about your project. That’s simple.

And finally, since we’re already working with an existing Git repo, we only need to run:

git remote add origin git@github.com:Your-Username/awesomeProject.git
git push -u origin master

With that out of the way, our awesomeProject is now available on GitHub. That was easy!


Step 3 - SSH

Now, we certainly need some sort of live preview for our project, ideally stored on our own server. But this can be a pain sometimes: push your updates to GitHub, log in to your server, manually transfer the updated directory, etc. Granted, this only takes a moment or so, but when you make multiple changes throughout the day, it can quickly become a burden.

But one step at a time. We’ll tackle this dilemma in Step 4. For now, let’s simply pull in our Git repo to our server. To do so, we need to SSH in.

Depending upon your host, your SSH credentials will vary slightly. Search Google for “your-host-name SSH,” and you’ll surely find the necessary instructions. Once you’re ready, let’s move along:

We’ll use my personal server as an example:

ssh [email protected]

And with those two lines, we’re in!


Next, we cd to the parent directory of where we wish to store awesomeProject. For me, this will be: cd domains/demo.jeffrey-way.com/html/. Of course, modify this according to your own directory structure.


Let’s clone the GitHub repo now.

git clone git@github.com:Your-Username/awesomeProject.git

Give that command a few seconds and, before you know it, the directory will be available on your server and, in my case, viewable at: http://demo.jeffrey-way.com/awesomeProject.


Step 4 - Creating a Connection

The inherent problem at this point is that there’s no specific connection between our GitHub repo and the directory stored on our server — at least not an automated connection. For example, if we update our source files on our local machine, and then push the changes to GitHub:

git add index.html
git commit -m 'Added photo of dancing chicken'
git push origin master

These changes will certainly not be reflected on our server. Of course they won’t! To update it, we must (once again) SSH into our server, cd to the awesomeProject directory, and perform another git pull to bring in the updated source files.

Wouldn’t it be great if, every time we pushed updates to GitHub, those new source files were automatically updated on our live preview server?

As it turns out, we can do this quite easily with GitHub service hooks.

You can access GitHub’s Service Hooks page by pressing the “Admin” button from within your GitHub repo, and then clicking “Service Hooks.” The “Post-Receive URL” option will instruct GitHub to send a POST request to the specified page every time you push to your GitHub repo. This is exactly what we need!

“We’ll hit these URLs with POST requests when you push to us, passing along information about the push.”

To make this work, we’ll need to create one more file that will handle the process of performing the git pull. Add a new file called github.php (or anything you wish; preferably something more vague), and add:

<?php `git pull`;

So now you’re thinking: “Jeff’s gone crazy. You can’t put a Bash script into a PHP string.” Well…yes you can, once you realize that those aren’t single quotes above, they’re back-ticks.

When you wrap a command in back-ticks in PHP, it is executed as a shell command. In fact, it’s identical to calling the shell_exec() function.

Save that file, and upload it to the awesomeProject directory on your server. When finished, copy the URL of that file, and paste it into the “Post-Receive URL” textbox. In my case, the URL would be http://demo.jeffrey-way.com/awesomeProject/github.php.

With this in place, every single time you push to your GitHub repo, that file will be called, and the awesomeProject directory on your server will auto-update without you having to lift a finger. Pretty nifty, eh?
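For comparison, here is the same idea sketched in Python instead of PHP, using only the standard library. It assumes the script runs on your server, that REPO_DIR (a hypothetical placeholder) points at your clone, and that the account running it can `git pull` without a password prompt, for example via an SSH key. Like the PHP one-liner, it ignores the payload and does not verify that the request really came from GitHub.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

REPO_DIR = "/path/to/awesomeProject"  # hypothetical: where you cloned the repo


def pull(repo_dir: str, run=subprocess.run) -> int:
    """Run `git pull` inside repo_dir and return git's exit status."""
    result = run(["git", "pull"], cwd=repo_dir, capture_output=True, text=True)
    return result.returncode


class HookHandler(BaseHTTPRequestHandler):
    """Handles the POST that GitHub sends after each push."""

    def do_POST(self):
        # Read (and, in this sketch, ignore) the push payload GitHub sends.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        status = 200 if pull(REPO_DIR) == 0 else 500
        self.send_response(status)
        self.end_headers()
    # No do_GET: BaseHTTPRequestHandler answers other methods with 501.


def serve(port: int = 8000) -> None:
    """Listen for hook requests on `port` until the process is stopped."""
    HTTPServer(("", port), HookHandler).serve_forever()
```

Calling `serve(8000)` on the server and pointing the Post-Receive URL at that port would complete the loop. Because any POST triggers a pull, anyone who knows the URL can trigger one, which is one more reason to keep the URL vague.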

