Python's class mechanism adds classes to the language with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. As is true for modules, classes in Python do not put an absolute barrier between definition and user, but rather rely on the politeness of the user not to ``break into the definition.'' The most important features of classes are retained with full power, however: the class inheritance mechanism allows multiple base classes, a derived class can override any methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain an arbitrary amount of private data.
In C++ terminology, all class members (including the data members) are public, and all member functions are virtual. There are no special constructors or destructors. As in Modula-3, there are no shorthands for referencing the object's members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call. As in Smalltalk, classes themselves are objects, albeit in the wider sense of the word: in Python, all data types are objects. This provides semantics for importing and renaming. Unlike C++ and Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++ but unlike in Modula-3, most built-in operators with special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances.
Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. (I would use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers have heard of it.)
I also have to warn you that there's a terminological pitfall for object-oriented readers: the word ``object'' in Python does not necessarily mean a class instance. Like C++ and Modula-3, and unlike Smalltalk, not all types in Python are classes: the basic built-in types like integers and lists are not, and even somewhat more exotic types like files aren't. However, all Python types share a little bit of common semantics that is best described by using the word object.
Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has an (intended!) effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most types representing entities outside the program (files, windows, etc.). This is usually used to the benefit of the program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change -- this eliminates the need for two different argument passing mechanisms as in Pascal.
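A minimal sketch of aliasing, written for Python 3. The names append_item, shopping and alias are made up for illustration. Two names bound to the same list see each other's changes, and a function that modifies its argument changes the caller's object.

```python
def append_item(items, item):
    """Modify the list that was passed in; no copy is made."""
    items.append(item)

shopping = ["eggs"]
alias = shopping              # a second name for the same list object
alias.append("milk")          # visible through both names

append_item(shopping, "tea")  # the caller sees the change

# Immutable objects can be aliased too, but since they cannot be
# modified, the aliasing is never observable.
point = (1, 2)
other = point
other = other + (3,)          # builds a new tuple; point is unchanged
```

After this runs, shopping and alias still name one and the same list, while point and other name two different tuples.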
Before introducing classes, I first have to tell you something about Python's scope rules. Class definitions play some neat tricks with namespaces, and you need to know how scopes and namespaces work to fully understand what's going on. Incidentally, knowledge about this subject is useful for any advanced Python programmer.
Let's begin with some definitions.
A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries, but that's normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the set of built-in names (functions such as abs(), and built-in exception names); the global names in a module; and the local names in a function invocation. In a sense the set of attributes of an object also forms a namespace. The important thing to know about namespaces is that there is absolutely no relation between names in different namespaces; for instance, two different modules may both define a function ``maximize'' without confusion -- users of the modules must prefix it with the module name.
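As a rough illustration of namespaces as mappings, the sketch below uses Python 3, where the module holding the built-in names is called builtins. The two modules mod_a and mod_b are hypothetical stand-ins, built with types.ModuleType only so that the example fits in one file. Each defines its own maximize function without any clash.

```python
import builtins
import types

# Two stand-in modules, each with its own namespace.
mod_a = types.ModuleType("mod_a")
mod_b = types.ModuleType("mod_b")
mod_a.maximize = lambda values: max(values)
mod_b.maximize = lambda values: max(values, key=abs)

data = [3, -7, 5]
result_a = mod_a.maximize(data)   # largest value
result_b = mod_b.maximize(data)   # value with the largest magnitude

def show_locals():
    answer = 42
    return locals()               # local namespace of this invocation

local_names = show_locals()

has_abs = "abs" in vars(builtins)            # built-in names
has_maximize = "maximize" in vars(mod_a)     # a module's global names
```

The same name, maximize, refers to two unrelated functions here; which one is meant is decided by the module prefix alone.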
By the way, I use the word attribute for any name following a dot -- for example, in the expression z.real, real is an attribute of the object z. Strictly speaking, references to names in modules are attribute references: in the expression modname.funcname, modname is a module object and funcname is an attribute of it. In this case there happens to be a straightforward mapping between the module's attributes and the global names defined in the module: they share the same namespace!
Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable: you can write "modname.the_answer = 42". Writable attributes may also be deleted with the del statement. For example, "del modname.the_answer" will remove the attribute the_answer from the object named by modname.
Namespaces are created at different moments and have different lifetimes. The namespace containing the built-in names is created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the module definition is read in; normally, module namespaces also last until the interpreter quits. The statements executed by the top-level invocation of the interpreter, either read from a script file or interactively, are considered part of a module called __main__, so they have their own global namespace. (The built-in names actually also live in a module; this is called __builtin__.)
The local namespace for a function is created when the function is called, and deleted when the function returns or raises an exception that is not handled within the function. (Actually, forgetting would be a better way to describe what actually happens.) Of course, recursive invocations each have their own local namespace.
A scope is a textual region of a Python program where a namespace is directly accessible. ``Directly accessible'' here means that an unqualified reference to a name attempts to find the name in the namespace.
Although scopes are determined statically, they are used dynamically. At any time during execution, there are at least three nested scopes whose namespaces are directly accessible: the innermost scope, which is searched first, contains the local names; the namespaces of any enclosing functions, which are searched starting with the nearest enclosing scope; the middle scope, searched next, contains the current module's global names; and the outermost scope (searched last) is the namespace containing built-in names.
If a name is declared global, then all references and assignments go directly to the middle scope containing the module's global names. Otherwise, all variables found outside of the innermost scope are read-only.
Usually, the local scope references the local names of the (textually) current function. Outside of functions, the local scope references the same namespace as the global scope: the module's namespace. Class definitions place yet another namespace in the local scope.
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module's namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time -- however, the language definition is evolving towards static name resolution, at ``compile'' time, so don't rely on dynamic name resolution! (In fact, local variables are already determined statically.)
A special quirk of Python is that assignments always go into the innermost scope. Assignments do not copy data -- they just bind names to objects. The same is true for deletions: the statement "del x" removes the binding of x from the namespace referenced by the local scope. In fact, all operations that introduce new names use the local scope: in particular, import statements and function definitions bind the module or function name in the local scope. (The global statement can be used to indicate that particular variables live in the global scope.)
9.3 A First Look at Classes
Classes introduce a little bit of new syntax, three new object types, and some new semantics.
9.3.1 Class Definition Syntax
The simplest form of class definition consists of a header line, "class ClassName:", followed by an indented block of statements.
Class definitions, like function definitions (def statements), must be executed before they have any effect. (You could conceivably place a class definition in a branch of an if statement, or inside a function.)
In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and sometimes useful -- we'll come back to this later. The function definitions inside a class normally have a peculiar form of argument list, dictated by the calling conventions for methods -- again, this is explained later.
When a class definition is entered, a new namespace is created, and used as the local scope -- thus, all assignments to local variables go into this new namespace. In particular, function definitions bind the name of the new function here.
When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of the namespace created by the class definition; we'll learn more about class objects in the next section. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header (ClassName in the example).
9.3.2 Class Objects
Class objects support two kinds of operations: attribute references and instantiation.
Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class's namespace when the class object was created. So, if a class named MyClass has the docstring "A simple example class", binds the name i to an integer, and defines a function f, then MyClass.i and MyClass.f are valid attribute references, returning an integer and a method object, respectively. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class: "A simple example class".
Class instantiation uses function notation. Just pretend that the class object is a parameterless function that returns a new instance of the class. For example (assuming the above class), "x = MyClass()" creates a new instance of the class and assigns this object to the local variable x.
The instantiation operation (``calling'' a class object) creates an empty object. Many classes like to create objects in a known initial state. Therefore a class may define a special method named __init__().
When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance. So, for a class whose __init__() takes no arguments besides the instance, a new, initialized instance can still be obtained by "x = MyClass()".
Of course, the __init__() method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to __init__().
9.3.3 Instance Objects
Now what can we do with instance objects? The only operations understood by instance objects are attribute references. There are two kinds of valid attribute names.
The first I'll call data attributes. These correspond to ``instance variables'' in Smalltalk, and to ``data members'' in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to. For example, if x is the instance of MyClass created above, code can create a new data attribute on x by assignment, use it in a computation that prints the value 16, and then remove it again with the del statement, without leaving a trace.
The second kind of attribute references understood by instance objects are methods. A method is a function that ``belongs to'' an object. (In Python, the term method is not unique to class instances: other object types can have methods as well. For example, list objects have methods called append, insert, remove, sort, and so on. However, below, we'll use the term method exclusively to mean methods of class instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object depend on its class. By definition, all attributes of a class that are (user-defined) function objects define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function, but x.i is not, since MyClass.i is not. But x.f is not the same thing as MyClass.f -- it is a method object, not a function object.
9.3.4 Method Objects
Usually, a method is called immediately, as in x.f(). In our example, this will return the string 'hello world'. However, it is not necessary to call a method right away: x.f is a method object, and can be stored away and called at a later time. For example, after storing x.f in a variable, a loop that repeatedly calls the stored method object and prints the result will continue to print "hello world" until the end of time.
What exactly happens when a method is called? You may have noticed that x.f() was called without an argument above, even though the function definition for f specified an argument. What happened to the argument? Surely Python raises an exception when a function that requires an argument is called without any -- even if the argument isn't actually used...
Actually, you may have guessed the answer: the special thing about methods is that the object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method's object before the first argument.
If you still don't understand how methods work, a look at the implementation can perhaps clarify matters. When an instance attribute is referenced that isn't a data attribute, its class is searched. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, it is unpacked again, a new argument list is constructed from the instance object and the original argument list, and the function object is called with this new argument list.
9.4 Random Remarks
Data attributes override method attributes with the same name; to avoid accidental name conflicts, which may cause hard-to-find bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts. Possible conventions include capitalizing method names, prefixing data attribute names with a small unique string (perhaps just an underscore), or using verbs for methods and nouns for data attributes.
Data attributes may be referenced by methods as well as by ordinary users (``clients'') of an object. In other words, classes are not usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding -- it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.)
Clients should use data attributes with care -- clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the methods, as long as name conflicts are avoided -- again, a naming convention can save a lot of headaches here.
There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method.
Conventionally, the first argument of methods is often called self. This is nothing more than a convention: the name self has absolutely no special meaning to Python. (Note, however, that by not following the convention your code may be less readable by other Python programmers, and it is also conceivable that a class browser program be written which relies upon such a convention.)
Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok. For example, suppose a class C binds the names f, g and h to function objects, whether through def statements in the class body or by assigning existing function objects to those names. Then f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of C.
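A small sketch of the remark about f, g and h, written for Python 3. The function bodies and the helper name f1 are assumptions for illustration only: f1 is defined outside the class and assigned to the class attribute f, g is defined in the class body, and h is a second name for g.

```python
def f1(self, x, y):
    """Function defined outside the class."""
    return min(x, x + y)

class C:
    f = f1                      # assign an existing function object

    def g(self):
        return "hello world"

    h = g                       # second name for the same function

c = C()
via_f = c.f(3, -1)              # same as f1(c, 3, -1)
via_g = c.g()
via_h = c.h()
```

All three names behave as ordinary methods of the instance c, regardless of where the underlying function was written.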
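To tie section 9.3 together, here is a hedged reconstruction of its running example in Python 3. The text only says that MyClass has the docstring "A simple example class", an integer attribute i, and a function f that returns 'hello world'. The value 12345, the attribute name counter, and the Point class with its __init__() arguments are assumptions added for illustration.

```python
class MyClass:
    """A simple example class"""
    i = 12345                   # assumed value; the text only says "an integer"

    def f(self):
        return "hello world"

x = MyClass()                   # instantiation uses function notation

# Data attributes spring into existence on first assignment.
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
final_count = x.counter
del x.counter                   # leaves no trace on the instance

# A method object can be stored away and called later.
xf = x.f
greeting = xf()

# Calling x.f() is equivalent to calling MyClass.f(x).
same = x.f() == MyClass.f(x)

class Point:
    """Hypothetical class whose __init__() takes arguments."""
    def __init__(self, px, py):
        self.px = px
        self.py = py

p = Point(3.0, -4.5)            # arguments are passed on to __init__()
```

The loop doubles counter from 1 until it reaches 16, which matches the value mentioned in section 9.3.3. Under Python 3, MyClass.f is a plain function rather than a method object, while x.f is still a method object bound to x.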