Learning ZooKeeper
ZooKeeper enables coordination tasks for distributed systems. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata.
When programming with ZooKeeper,
developers design their applications as a set of clients that connect to ZooKeeper servers
and invoke operations on them through the ZooKeeper client API.
• Strong consistency, ordering, and durability guarantees
• The ability to implement typical synchronization primitives
• A simpler way to deal with many aspects of concurrency that often lead to incorrect
behavior in real distributed systems
It exposes a file system-like API composed of a small
set of calls that enables applications to implement their own primitives.
We typically
use recipes to denote these implementations of primitives. Recipes include ZooKeeper
operations that manipulate small data nodes, called znodes, that are organized hierarchically
as a tree, just like in a file system.
create /path data
delete /path
exists /path
setData /path data
getData /path
getChildren /path
ZooKeeper does not allow partial writes or reads of the znode data.
Different Modes for Znodes
Persistent and ephemeral znodes
A persistent znode /path can be deleted
only through a call to delete. An ephemeral znode, in contrast, is deleted if the client
that created it crashes or simply closes its connection to ZooKeeper.
Sequential znodes
A sequential znode is assigned a unique, monotonically
increasing integer. This sequence number is appended to the path used to
create the znode.
There are four options for the mode of a znode: persistent, ephemeral,
persistent_sequential, and ephemeral_sequential.
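The naming rule for sequential znodes can be sketched with a toy in-memory model. This is not the ZooKeeper client API: the class SequentialNames and its create method are hypothetical, and the sketch assumes the counter is kept per parent znode and rendered as a 10-digit, zero-padded suffix, which matches names such as task-0000000000 used later in these notes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of sequential znode naming; not the ZooKeeper client API.
class SequentialNames {
    // Per-parent counters, standing in for the counter kept on the parent znode (assumption).
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the actual path a sequential create would produce for the requested path.
    String create(String requestedPath) {
        String parent = requestedPath.substring(0, requestedPath.lastIndexOf('/') + 1);
        int seq = counters.merge(parent, 1, Integer::sum) - 1; // 0, 1, 2, ...
        return requestedPath + String.format("%010d", seq);
    }

    public static void main(String[] args) {
        SequentialNames tree = new SequentialNames();
        System.out.println(tree.create("/tasks/task-")); // /tasks/task-0000000000
        System.out.println(tree.create("/tasks/task-")); // /tasks/task-0000000001
    }
}
```

Because the suffix is appended by the service, two clients that request the same path still end up with distinct znodes.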
Watches and Notifications
Clients register with ZooKeeper to receive notifications
of changes to znodes. Registering to receive a notification for a given znode
consists of setting a watch. A watch is a one-shot operation, which means that it triggers
one notification. To receive multiple notifications over time, the client must set a new
watch upon receiving each notification.
Because notifications
are one-shot operations, it is possible that new changes will occur to a znode between
a client receiving a notification for the znode and setting a new watch.
To observe such a change to, say, /tasks, a client c1 has to actually read the state of /tasks, which it
does when setting the watch because we set watches with operations that read the state
of ZooKeeper. Consequently, c1 does not miss any changes.
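The one-shot behaviour, and why re-reading while setting the next watch loses nothing, can be illustrated with a small model. This is only a sketch: WatchedChildren, getChildren, and addChild here are hypothetical stand-ins for a znode's child list, not the real client calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of one znode's children with one-shot watches; not the ZooKeeper client API.
class WatchedChildren {
    private final List<String> children = new ArrayList<>();
    private final List<Consumer<String>> watches = new ArrayList<>();

    // Reads the current children and, in the same call, registers a one-shot watch.
    List<String> getChildren(Consumer<String> watch) {
        if (watch != null) watches.add(watch);
        return new ArrayList<>(children);
    }

    // Adds a child and fires every pending watch exactly once.
    void addChild(String name) {
        children.add(name);
        List<Consumer<String>> toFire = new ArrayList<>(watches);
        watches.clear(); // one-shot: a fired watch is gone until it is set again
        for (Consumer<String> w : toFire) w.accept("children changed");
    }

    public static void main(String[] args) {
        WatchedChildren tasks = new WatchedChildren();
        List<String> events = new ArrayList<>();
        tasks.getChildren(events::add);   // set the watch
        tasks.addChild("task-1");         // fires the watch
        tasks.addChild("task-2");         // no watch set: no notification
        // Re-reading while setting the new watch still reveals task-2.
        System.out.println(tasks.getChildren(events::add)); // [task-1, task-2]
        System.out.println(events.size());                  // 1
    }
}
```

The client received a single notification, yet the read that sets the second watch returns both children, so no change goes unobserved.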
One important guarantee of notifications is that they are delivered to a client before any
other change is made to the same znode.
Notifications preserve the order of
updates the client observes. Although changes to the state of ZooKeeper may end up
propagating more slowly to any given client, we guarantee that clients observe changes
to the ZooKeeper state according to a global order.
Versions
Each znode has a version number associated with it that is incremented every time its
data changes. A couple of operations in the API can be executed conditionally: setData
and delete.
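A conditional update behaves much like a compare-and-set on the version number. The sketch below models a single znode in memory; VersionedZnode is a hypothetical class, not part of the client library, and it returns false where a real server would report a bad-version error. It assumes, as the delete calls later in these notes do, that passing -1 skips the version check.

```java
// Toy model of a single versioned znode; not the ZooKeeper client API.
class VersionedZnode {
    private String data = "";
    private int version = 0;

    int version() { return version; }
    String data() { return data; }

    // Conditional write: succeeds only if expectedVersion matches, or is -1 (match any).
    boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // a real server would report a bad-version error instead
        }
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedZnode z = new VersionedZnode();
        int seen = z.version();                        // client A reads version 0
        System.out.println(z.setData("from B", 0));    // true, version becomes 1
        System.out.println(z.setData("from A", seen)); // false, A's version is stale
        System.out.println(z.setData("forced", -1));   // true, -1 skips the check
    }
}
```

The failed write tells client A that the znode changed since it was read, so A can re-read and decide whether its update still makes sense.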
ZooKeeper servers run in two modes: standalone and quorum.
Standalone mode is
pretty much what the term says: there is a single server, and ZooKeeper state is not
replicated. In quorum mode, a group of ZooKeeper servers, which we call a ZooKeeper
ensemble, replicates the state, and together they serve client requests.
ZooKeeper Quorums
In ZooKeeper, a quorum is the minimum number
of servers that have to be running and available in order for ZooKeeper to work. This
number is also the minimum number of servers that have to store a client’s data before
telling the client it is safely stored.
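As a rough illustration of the arithmetic, the sketch below assumes the quorum is a simple majority of the ensemble, which is the usual setup. QuorumMath and its method names are made up for this example.

```java
// Majority-quorum arithmetic; assumes the ensemble uses simple majorities.
class QuorumMath {
    // Smallest number of servers that forms a majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still available.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Under this assumption, four servers tolerate no more failures than three, which is one reason ensembles tend to have an odd number of servers.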
Sessions
All operations a client submits to ZooKeeper are associated
with a session. When a session ends for any reason, the ephemeral nodes created during
that session disappear.
The client initially connects to any server in the ensemble,
and only to a single server.
Sessions offer order guarantees, which means that requests in a session are executed in
FIFO (first in, first out) order.
dataDir=/users/me/zookeeper
bin/zkServer.sh start
zkServer.sh start-foreground
bin/zkServer.cmd
We are now ready to start a client.
zkCli.sh
ls /
create /workers ""
delete /workers
zkServer.sh stop
ZooKeeper determines freshness by ordering updates in the service: when a client reconnects, transaction identifiers (zxids) let it avoid a server whose state is older than what the client has already seen.
ZooKeeper with Quorums
dataDir=./data
clientPort=2181
server.1=127.0.0.1:2222:2223
server.2=127.0.0.1:3333:3334
server.3=127.0.0.1:4444:4445
When we start up a server, it needs to know which server it is. A server figures out its
ID by reading a file named myid in the data directory.
echo 1 > z1/data/myid
zkServer.sh start-foreground
We then create configurations z2/z2.cfg and z3/z3.cfg by changing clientPort
to be 2182 and 2183, respectively.
zkServer.sh start ./z1.cfg
zkCli.sh -server 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
Implementing a Primitive: Locks with ZooKeeper
To acquire a lock, each process p
tries to create a znode, say /lock. If p succeeds in creating the znode, it has the lock and
can proceed to execute its critical section.
Other processes that try to create /lock fail so long as the znode exists. So, they watch
for changes to /lock and try to acquire the lock again once they detect that /lock has
been deleted. Upon receiving a notification that /lock has been deleted, if a process p'
is still interested in acquiring the lock, it repeats the steps of attempting to create /lock
and, if another process has created the znode already, watching it.
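The create-or-wait logic of this recipe can be mimicked with an in-memory map. This is a sketch only: ToyLock is hypothetical, the map stands in for the znode tree, and watch notifications are left out. Making /lock ephemeral in a real deployment is assumed here so that a crashed holder releases the lock.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the /lock recipe; the map stands in for the znode tree.
class ToyLock {
    private final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

    // Like create: succeeds only if /lock does not exist yet.
    boolean tryAcquire(String owner) {
        return znodes.putIfAbsent("/lock", owner) == null;
    }

    // Like delete, or like an ephemeral /lock vanishing when its owner's session ends.
    void release(String owner) {
        znodes.remove("/lock", owner); // removes only if this owner holds the lock
    }

    public static void main(String[] args) {
        ToyLock lock = new ToyLock();
        System.out.println(lock.tryAcquire("p"));  // true: p created /lock
        System.out.println(lock.tryAcquire("p'")); // false: /lock exists, p' must wait
        lock.release("p");                         // /lock deleted, watchers would be notified
        System.out.println(lock.tryAcquire("p'")); // true: p' retried and won
    }
}
```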
Implementation of a Master-Worker Example
The Master Role
create -e /master "master1.example.com:2223"
get /master
Set a watch on the /master node:
stat /master true
The stat command gets the attributes of a znode and allows us to set a watch on the
existence of the znode.
Having the parameter true after the path sets the watch.
Workers, Tasks, and Assignments
create /workers ""
create /tasks ""
create /assign ""
Watch for changes in the children of /workers and /tasks:
ls /workers true
ls /tasks true
The Worker Role
create -e /workers/worker1.example.com "worker1.example.com:2224"
create /assign/worker1.example.com ""
ls /assign/worker1.example.com true
The Client Role
create -s /tasks/task- "cmd"
We make the task znode sequential to create an order for the tasks added, essentially
providing a queue.
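To treat the children of /tasks as a queue, a consumer can sort the names it reads. The sketch below uses sample names and a hypothetical helper, TaskQueueOrder. It assumes all children share the same prefix, so the zero-padded suffixes sort lexicographically in creation order, and that the child list may arrive in any order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Orders sequential task names as a FIFO queue; the names are sample data.
class TaskQueueOrder {
    // Returns the oldest task. Zero-padded suffixes of equal width sort
    // lexicographically in creation order when all names share one prefix.
    static String oldest(List<String> children) {
        return Collections.min(children);
    }

    public static void main(String[] args) {
        // The child list is not assumed to be ordered, so sort on the client side.
        List<String> children = new ArrayList<>(Arrays.asList(
                "task-0000000002", "task-0000000000", "task-0000000001"));
        System.out.println(oldest(children)); // task-0000000000
        Collections.sort(children);
        System.out.println(children);
    }
}
```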
Watch for the creation of the status znode:
ls /tasks/task-0000000000 true
The master next checks the new task, gets the list of available workers, and assigns it to
worker1.example.com:
ls /tasks
ls /workers
create /assign/worker1.example.com/task-0000000000 ""
The worker receives a notification that a new task has been assigned:
ls /assign/worker1.example.com
create /tasks/task-0000000000/status "done"
The client receives a notification and checks the result:
get /tasks/task-0000000000
get /tasks/task-0000000000/status
Setting the ZooKeeper CLASSPATH
ZOOBINDIR="<path_to_distro>/bin"
. "$ZOOBINDIR"/zkEnv.sh
Watcher
An object we need to create that will receive session events.
Clients use the Watcher interface to monitor the health of the session with ZooKeeper. Events will be generated when a connection
is established or lost to a ZooKeeper server. They can also be used to monitor
changes to ZooKeeper data. Finally, if a session with ZooKeeper expires, an event
is delivered through the Watcher interface to notify the client application.
telnet 127.0.0.1 2181
stat
dump
Often ZooKeeper is used within a trusted environment, so an open ACL is used.
ZooDefs.Ids.OPEN_ACL_UNSAFE gives all permissions to everyone.
Getting Mastership Asynchronously
zk.create("/master", serverId.getBytes(), Ids.OPEN_ACL_UNSAFE,
CreateMode.EPHEMERAL, masterCreateCallback, null);
Setting Up Metadata
The Admin Client
Stat stat = new Stat();
byte masterData[] = zk.getData("/master", false, stat);
Date startDate = new Date(stat.getCtime());
System.out.println("Master: " + new String(masterData) +
" since " + startDate);
Dealing with State Change
The primary mechanism ZooKeeper provides to
deal with changes is watches. With watches, a client registers its request to receive a one-time
notification of a change to a given znode.
The event type is None when the watched event is for
a change of the state of the ZooKeeper session.
When creating a ZooKeeper object, we need to pass a default Watcher
object. The ZooKeeper client uses this watcher to notify the application of changes to
the ZooKeeper state, in case the state of the session changes.
The Stat structure contains
information about the znode, such as the zxid of the last transaction that
changed this znode and the number of children in the znode.
An Alternative Way: Multiop
Multiop enables the execution of multiple ZooKeeper operations in a block atomically.
The execution is atomic in the sense that either all operations in a multiop block succeed
or all fail.
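The all-or-nothing behaviour can be sketched with a toy model that applies a list of deletes to a scratch copy and commits only if every one is valid. ToyMulti and multiDelete are hypothetical names, not the client API, and the only rule modelled is that a znode with children cannot be deleted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of all-or-nothing deletes; not the ZooKeeper client API.
class ToyMulti {
    final TreeSet<String> paths = new TreeSet<>();

    // Deletes every path in order, or changes nothing if any single delete is invalid.
    boolean multiDelete(List<String> toDelete) {
        TreeSet<String> scratch = new TreeSet<>(paths); // work on a copy
        for (String p : toDelete) {
            boolean hasChild = scratch.stream().anyMatch(q -> q.startsWith(p + "/"));
            if (!scratch.contains(p) || hasChild) {
                return false; // abort: the real state was never touched
            }
            scratch.remove(p);
        }
        paths.clear();
        paths.addAll(scratch); // commit all changes at once
        return true;
    }

    public static void main(String[] args) {
        ToyMulti zk = new ToyMulti();
        zk.paths.addAll(Arrays.asList("/a", "/a/b"));
        // Wrong order: /a still has a child, so the whole block fails.
        System.out.println(zk.multiDelete(Arrays.asList("/a", "/a/b"))); // false
        System.out.println(zk.paths);                                    // [/a, /a/b]
        // Child first, then parent: both deletes apply together.
        System.out.println(zk.multiDelete(Arrays.asList("/a/b", "/a"))); // true
        System.out.println(zk.paths);                                    // []
    }
}
```

After the failed call the tree still contains both znodes, which is the property that makes a multiop block safe to retry.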
Op deleteZnode(String z) {
return Op.delete(z, -1);
}
List<OpResult> results = zk.multi(Arrays.asList(deleteZnode("/a/b"), deleteZnode("/a")));
Transaction is a wrapper for multi with a simpler interface. We can create an instance
of Transaction, add operations, and commit the transaction.
Transaction t = zk.transaction();
t.delete("/a/b", -1);
t.delete("/a", -1);
List<OpResult> results = t.commit();
public void commit(MultiCallback cb, Object ctx);
Watches as a Replacement for Explicit Cache Management
Ordering Guarantees
Order of Writes
ZooKeeper state is replicated across all servers forming the ensemble of an installation.
The servers agree upon the order of state changes and apply them using the same order.
Order of Notifications
ZooKeeper orders notifications with respect to other notifications and asynchronous
replies, respecting the order of updates to the system state.