Set - 5

Question 1 :

How do I make a Python script executable on Unix? 

Answer :

You need to do two things: the script file's mode must be executable and the first line must begin with #! followed by the path of the Python interpreter. 
The first is done by executing chmod +x scriptfile or perhaps chmod 755 scriptfile. 
The second can be done in a number of ways. The most straightforward way is to write

#!/usr/local/bin/python

as the very first line of your file, using the pathname for where the Python interpreter is installed on your platform. 
If you would like the script to be independent of where the Python interpreter lives, you can use the "env" program. Almost all Unix variants support the following, assuming the python interpreter is in a directory on the user's $PATH:

#! /usr/bin/env python

Don't do this for CGI scripts. The $PATH variable for CGI scripts is often very minimal, so you need to use the actual absolute pathname of the interpreter. 
Occasionally, a user's environment is so full that the /usr/bin/env program fails; or there's no env program at all. In that case, you can try the following hack (due to Alex Rezinsky): 

#! /bin/sh
""":"
exec python $0 ${1+"$@"}
"""

The minor disadvantage is that this defines the script's __doc__ string. However, you can fix that by adding

__doc__ = """...Whatever..."""
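The chmod step can also be done from Python itself. The following is only a sketch, written for Python 3; the helper name make_executable and the throwaway script are made up for illustration, and it assumes a Unix-like file system.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add execute permission for user, group and others (roughly chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Write a throwaway script whose first line uses the env form shown above.
fd, script_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("#!/usr/bin/env python\nprint('hello')\n")

make_executable(script_path)
is_executable = bool(os.stat(script_path).st_mode & stat.S_IXUSR)
print("executable:", is_executable)

os.remove(script_path)  # clean up the temporary script
```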

 


Question 2 :

Why don't my signal handlers work?

Answer :

The most common problem is that the signal handler is declared with the wrong argument list. It is called as

handler(signum, frame)

so it should be declared with two arguments:

def handler(signum, frame):
    ...
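As a minimal sketch of a correctly declared handler, assuming Python 3.8 or later (for signal.raise_signal) and code running in the main thread; the list named received is only there to make the effect visible:

```python
import signal

received = []  # hypothetical record of the signal numbers delivered

def handler(signum, frame):
    # Both parameters must be accepted, even though frame is unused here.
    received.append(signum)

# Install the handler, remembering whatever was installed before.
previous = signal.signal(signal.SIGINT, handler)

# Send SIGINT to this process; the handler runs instead of KeyboardInterrupt.
signal.raise_signal(signal.SIGINT)
print("signals received:", received)

# Put the earlier handler back.
signal.signal(signal.SIGINT, previous)
```

A handler declared with no arguments, or with only one, would raise a TypeError when the signal arrives.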

 


Question 3 :

How do I test a Python program or component? 

Answer :

Python comes with two testing frameworks. The doctest module finds examples in the docstrings for a module and runs them, comparing the output with the expected output given in the docstring. 
The unittest module is a fancier testing framework modelled on Java and Smalltalk testing frameworks. 
For testing, it helps to give the program a good modular design. Almost all functionality should be encapsulated in functions or class methods. This sometimes has the surprising and delightful effect of making the program run faster, because local variable accesses are faster than global accesses. The program should also avoid depending on mutating global variables, since this makes testing much more difficult.

The "global main logic" of your program may be as simple as

if __name__ == "__main__":
    main_logic()

at the bottom of the main module of your program. 

Once your program is organized as a tractable collection of functions and class behaviours you should write test functions that exercise the behaviours. A test suite can be associated with each module which automates a sequence of tests. This sounds like a lot of work, but since Python is so terse and flexible it's surprisingly easy. You can make coding much more pleasant and fun by writing your test functions in parallel with the "production code", since this makes it easy to find bugs and even design flaws earlier. 
"Support modules" that are not intended to be the main module of a program may include a self-test of the module.

if __name__ == "__main__":
    self_test()
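One possible shape for such a self-test, sketched for Python 3 with doctest. The function average and the body of self_test are invented for illustration. A real module would more commonly just call doctest.testmod(); the finder and runner are used here only so that a single function is checked.

```python
import doctest

def average(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> average([1, 2, 3])
    2.0
    >>> average([10])
    10.0
    """
    return sum(values) / len(values)

def self_test():
    """Run the examples in average()'s docstring and return the totals."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(average, "average", globs={"average": average}):
        runner.run(test)
    return runner.summarize(verbose=False)

if __name__ == "__main__":
    print(self_test())
```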

Even programs that interact with complex external interfaces may be tested when the external interfaces are unavailable by using "fake" interfaces implemented in Python.
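To illustrate the "fake interface" idea, here is a small unittest sketch for Python 3. Everything in it (FakeMailer, notify, NotifyTest, run_tests) is hypothetical: the fake records calls instead of talking to a real mail service, so the test runs without the external system.

```python
import unittest

class FakeMailer:
    """Hypothetical stand-in for an external mail service."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        # Record the message instead of delivering it.
        self.sent.append((to, body))

def notify(mailer, user):
    """Hypothetical production function that depends on the interface."""
    mailer.send(user, "Hello, %s" % user)

class NotifyTest(unittest.TestCase):
    def test_notify_sends_one_message(self):
        fake = FakeMailer()
        notify(fake, "alice")
        self.assertEqual(fake.sent, [("alice", "Hello, alice")])

def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NotifyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

if __name__ == "__main__":
    run_tests()
```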


Question 4 :

None of my threads seem to run: why? 

Answer :

As soon as the main thread exits, all threads are killed. Your main thread is running too quickly, giving the threads no time to do any work. 
A simple fix is to add a sleep to the end of the program that's long enough for all the threads to finish:

import threading, time

def thread_task(name, n):
    for i in range(n): print name, i

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10) # <----------------------------!

But now (on many platforms) the threads don't run in parallel, but appear to run sequentially, one at a time! The reason is that the OS thread scheduler doesn't start a new thread until the previous thread is blocked. 

A simple fix is to add a tiny sleep to the start of the run function: 

def thread_task(name, n):
    time.sleep(0.001) # <---------------------!
    for i in range(n): print name, i

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)

Instead of trying to guess a good delay value for time.sleep(), it's better to use some kind of semaphore mechanism. One idea is to use the Queue module to create a queue object, let each thread append a token to the queue when it finishes, and let the main thread read as many tokens from the queue as there are threads.
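The token idea might look like the sketch below. It is written for Python 3, where the module is named queue, and the extra done parameter is an addition made for this example.

```python
import queue
import threading

def thread_task(name, n, done):
    for i in range(n):
        print(name, i)
    done.put(name)  # token: this thread has finished its work

done = queue.Queue()
thread_count = 10

for i in range(thread_count):
    t = threading.Thread(target=thread_task, args=(str(i), i, done))
    t.start()

# Block until one token per thread has arrived; no sleep is needed.
finished = [done.get() for _ in range(thread_count)]
print("all threads finished:", sorted(finished))
```

Keeping the Thread objects in a list and calling join() on each is another common way to wait for them.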


Question 5 :

How do I parcel out work among a bunch of worker threads?

Answer :

Use the Queue module to create a queue containing a list of jobs. The Queue class maintains a list of objects with .put(obj) to add an item to the queue and .get() to return an item. The class will take care of the locking necessary to ensure that each job is handed out exactly once. 
Here's a trivial example:

import threading, Queue, time

# The worker thread gets jobs off the queue. When the queue is empty, it
# assumes there will be no more work and exits.
# (Realistically workers will run until terminated.)
def worker ():
    print 'Running worker'
    time.sleep(0.1)
    while True:
        try:
            arg = q.get(block=False)
        except Queue.Empty:
            print 'Worker', threading.currentThread(),
            print 'queue empty'
            break
        else:
            print 'Worker', threading.currentThread(),
            print 'running with argument', arg
            time.sleep(0.5)

# Create queue
q = Queue.Queue()

# Start a pool of 5 workers
for i in range(5):
    t = threading.Thread(target=worker, name='worker %i' % (i+1))
    t.start()

# Begin adding work to the queue
for i in range(50):
    q.put(i)

# Give threads time to run
print 'Main thread sleeping'
time.sleep(5)

When run, this will produce the following output: 

Running worker
Running worker
Running worker
Running worker
Running worker
Main thread sleeping
Worker <Thread(worker 1, started)> running with argument 0
Worker <Thread(worker 2, started)> running with argument 1
Worker <Thread(worker 3, started)> running with argument 2
Worker <Thread(worker 4, started)> running with argument 3
Worker <Thread(worker 5, started)> running with argument 4
Worker <Thread(worker 1, started)> running with argument 5
...
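The comment in the example notes that realistic workers run until terminated. One way to arrange that, sketched here for Python 3 (module queue), is to block on get() and stop on a sentinel value. The None sentinel, the results list and the squaring "job" are assumptions made for this illustration.

```python
import queue
import threading

results = []
results_lock = threading.Lock()

def worker(q):
    while True:
        job = q.get()            # blocks until a job is available
        if job is None:          # sentinel: no more work for this worker
            q.task_done()
            break
        with results_lock:
            results.append(job * job)   # stand-in for real work
        q.task_done()

q = queue.Queue()

# Start a pool of 5 workers
workers = [threading.Thread(target=worker, args=(q,)) for _ in range(5)]
for t in workers:
    t.start()

# Add the work, then one sentinel per worker
for i in range(50):
    q.put(i)
for _ in workers:
    q.put(None)

q.join()                 # returns once every queued item has been processed
for t in workers:
    t.join()
print("jobs completed:", len(results))
```

Because the main thread waits on q.join() and then joins each worker, no time.sleep() guess is required.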


Question 6 :

How do I delete a file? (And other file questions...)

Answer :

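Use os.remove(filename) or os.unlink(filename). The two functions are identical; unlink() is simply the name of the Unix system call for this operation.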
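A short sketch for Python 3, using a temporary file so that nothing important is deleted; the variable names are illustrative only. It also shows that removing a file that no longer exists raises an exception.

```python
import os
import tempfile

# Create a scratch file to delete.
fd, path = tempfile.mkstemp()
os.close(fd)

existed_before = os.path.exists(path)
os.remove(path)
existed_after = os.path.exists(path)
print(existed_before, existed_after)

# A second attempt fails because the file is already gone.
try:
    os.remove(path)
    second_attempt = "removed"
except FileNotFoundError:
    second_attempt = "already gone"
print(second_attempt)
```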

 


Question 7 :

How do I copy a file? 

Answer :

The shutil module contains a copyfile() function.
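For instance, in Python 3 (the file names and the temporary directory are made up for this sketch):

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.txt")
dst = os.path.join(workdir, "copy.txt")

with open(src, "w") as f:
    f.write("some data\n")

# Copy the contents of src to dst.
shutil.copyfile(src, dst)

with open(dst) as f:
    copied = f.read()
print(repr(copied))

shutil.rmtree(workdir)  # remove the scratch directory
```

Note that copyfile() copies only the file contents. shutil.copy() also copies permission bits, and shutil.copy2() additionally tries to preserve other metadata such as timestamps.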


Question 8 :

How do I read (or write) binary data?

Answer :

For complex data formats, it's best to use the struct module. It allows you to take a string containing binary data (usually numbers) and convert it to Python objects; and vice versa.
For example, the following code reads two 2-byte integers and one 4-byte integer in big-endian format from a file:

import struct 
f = open(filename, "rb") # Open in binary mode for portability
s = f.read(8)
x, y, z = struct.unpack(">hhl", s)

The '>' in the format string forces big-endian data; the letter 'h' reads one "short integer" (2 bytes), and 'l' reads one "long integer" (4 bytes) from the string.
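The reverse direction uses struct.pack() with the same format string. The sketch below is for Python 3, where binary data is a bytes object; the values are arbitrary, and io.BytesIO stands in for a file opened in "rb" mode.

```python
import io
import struct

# Pack two 2-byte integers and one 4-byte integer, big-endian.
packed = struct.pack(">hhl", 1, -2, 100000)
print(len(packed), packed)

# Read the 8 bytes back and unpack them again.
f = io.BytesIO(packed)
s = f.read(8)
x, y, z = struct.unpack(">hhl", s)
print(x, y, z)
```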


Question 9 :

How do I run a subprocess with pipes connected to both input and output?

Answer :

Use the popen2 module. For example:

import popen2
fromchild, tochild = popen2.popen2("command")
tochild.write("input\n")
tochild.flush()
output = fromchild.readline()
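The popen2 module does not exist in Python 3, where the subprocess module covers the same ground. A rough equivalent is sketched below, assuming Python 3.7 or later. The child here is a small Python one-liner chosen only so that the example does not depend on any particular external command.

```python
import subprocess
import sys

# Hypothetical child program: echo one line of input back in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.readline().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# communicate() writes the input, closes the pipe and reads all output,
# which avoids the deadlocks that manual write/flush/read can run into.
output, _ = proc.communicate("input\n")
print(repr(output))
```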

 


Question 10 :

How can I mimic CGI form submission (METHOD=POST)? 
I would like to retrieve web pages that are the result of POSTing a form. Is there existing code that would let me do this easily? 

Answer :

Yes. Here's a simple example that uses httplib:

#!/usr/local/bin/python 
import httplib, sys, time 
### build the query string
qs = "First=Josephine&MI=Q&Last=Public"
### connect and send the server a path
httpobj = httplib.HTTP('www.some-server.out-there', 80)
httpobj.putrequest('POST', '/cgi-bin/some-cgi-script')
### now generate the rest of the HTTP headers...
httpobj.putheader('Accept', '*/*')
httpobj.putheader('Connection', 'Keep-Alive')
httpobj.putheader('Content-type', 'application/x-www-form-urlencoded')
httpobj.putheader('Content-length', '%d' % len(qs))
httpobj.endheaders()
httpobj.send(qs)
### find out what the server said in response...
reply, msg, hdrs = httpobj.getreply()
if reply != 200:
    sys.stdout.write(httpobj.getfile().read())
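httplib.HTTP is an old Python 2 interface. As a sketch only, the same request could be prepared in Python 3 with urllib.request. The host and path are the placeholders from the example above, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build the form body from the same three fields.
fields = {"First": "Josephine", "MI": "Q", "Last": "Public"}
body = urlencode(fields).encode("ascii")

req = Request(
    "http://www.some-server.out-there/cgi-bin/some-cgi-script",
    data=body,
    headers={"Accept": "*/*"},
)

# Supplying data makes this a POST request.
print(req.get_method(), req.full_url)
print(body)

# To actually send it (placeholder host, so left commented out):
# from urllib.request import urlopen
# with urlopen(req) as response:
#     print(response.status, response.read())
```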

Note that in general for URL-encoded POST operations, query strings must be quoted by using urllib.quote(). For example to send name="Guy Steele, Jr.":

>>> from urllib import quote
>>> x = quote("Guy Steele, Jr.")
>>> x
'Guy%20Steele%2C%20Jr.'
>>> query_string = "name="+x
>>> query_string
'name=Guy%20Steele%2C%20Jr.'
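In Python 3 these functions live in urllib.parse. The sketch below shows quote(), quote_plus() and urlencode() on the same value; note that Python 3 percent-encodes the comma as %2C.

```python
from urllib.parse import parse_qs, quote, quote_plus, urlencode

name = "Guy Steele, Jr."

quoted = quote(name)            # spaces become %20
plus_quoted = quote_plus(name)  # spaces become +, as in form submissions
query_string = urlencode({"name": name})

print(quoted)
print(plus_quoted)
print(query_string)

# Decoding the query string recovers the original value.
decoded = parse_qs(query_string)
print(decoded)
```

urlencode() quotes each key and value itself, so it is usually simpler than assembling the query string by hand.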