Planet Fellowship (en)
Friday, 24 May 2013
JAX: Project Nashorn
Inductive Bias | 20:41, Friday, 24 May 2013
The last talk I went to was on project Nashorn - demonstrating the capability
to run dynamic languages on the JVM by writing a JavaScript implementation as a
proof of concept that is fully ECMA compliant and still performs better than
Mozilla’s project Rhino.
It was nice to see Lisp, created in 1962, referenced as being the first
language that featured a JIT compiler as well as garbage collection. It was
also good to see Smalltalk referenced as pioneering class libraries, visual GUI
driven IDEs and bytecode.
As such Java essentially stands on the shoulders of giants. Now dynamic
language writers can themselves use the JVM to boost their productivity by
profiting from the VM’s memory management, JIT optimisations and native threading.
The result could be a smaller code base and more time to concentrate on
interesting language features (of course another result would be that the JVM
becomes interesting not only for Java developers but also to people who want to
use dynamic languages instead).
The invokedynamic project as well as the Da Vinci Machine project are both
interesting areas to follow for people who are interested in running dynamic
languages on the JVM.
Thursday, 23 May 2013
JAX: Tales from production
Inductive Bias | 20:38, Thursday, 23 May 2013
In a second presentation Peter Roßbach together with Andreas Schmidt provided
some more detail on what logging entails in real world projects.
Development messages turn into valuable information needed to uncover issues
and downtime of systems, capacity planning, measuring the effect of software
changes, analysing resource usage under real world usage. In addition to these
technical use cases there is a need to provide business metrics.
When dealing with multiple systems you deal with correlating values across
machines and systems, providing meaningful visualisations to draw the correct
decisions.
When thinking of your log architecture you might want to consider storing not
only log messages. In addition facts like release numbers should be tracked
somewhere - ready to join in when needed to correlate behaviour with release
version. To do that also track events like rolling out a release to production.
Launching in a new market, switching traffic to a new system could be other
events. Introduce not only pure log messages but also provide aggregated
metrics and counters. All of these pieces should be stored and tracked
automatically to free operations for more important work.
Have you ever thought about documenting not only your software, its interfaces
and input/output format? What about documenting the logged information as well?
What about the fields contained in each log message? Are they documented or do
people have to infer their meaning from the content? What about valid ranges
for values - are they noted down somewhere? Did you store whether a specific
field can only contain integers or whether some day it also could contain
letters? What about the number format - is it decimal, hexadecimal?
For a nice piece of architecture documentation, check out
“Winning the metrics battle” on the BBC dev blog.
There’s an abundance of tools out there to help you with all sorts of logging
related topics:
- For visualisation and transport: Datadog, kibana, logstash, statsd,
graphite, syslog-ng
- For providing the values: JMX, metrics, Jolokia
- For collection: collectd, statsd, graphite, newrelic, datadog
- For storage: typical RRD tools including RRD4j, MongoDB, OpenTSDB based
on HBase, Hadoop
- For charting: Munin, Cacti, Nagios, Graphite, Ganglia, New Relic, Datadog
- For profiling: Dynatrace, New Relic, Boundary
- For events: Zabbix, Icinga, OMD, OpenNMS, HypericHQ, Nagios, JBoss RHQ
- For logging: splunk, Graylog2, Kibana, logstash
Make sure to provide metrics consistently and be able to add them with minimal
effort. Self-adaptation and automation are useful for this. Make sure developers,
operations and product owners are able to use the same system so there is no
information gap on either side. Your logging pipeline should be tailored to
provide easy and fast feedback on the implementation and features of the
product.
To reach a decent level of automation a set of tools is needed for:
- Configuration management (where to store passwords, urls or ips, log
levels etc.). Typical names here include Zookeeper, but also CFEngine, Puppet
and Chef.
- Deployment management. Typical names here are UC4, udeploy, glu, etsy
deployment.
- Server orchestration (e.g. what is started when during boot). Typical
names include UC4, Nolio, Marionette Collective, rundeck.
- Automated provisioning (think “how long does it take from server failure
to bringing that service back up online?”). Typical names include kickstart,
vagrant, or typical cloud environments.
- Test driven/ behaviour driven environments (think about adjusting not
only your application but also firewall configurations). Typical tools that
come to mind here include Server spec, rspec, cucumber, c-puppet, chef.
- When it comes to defining the points of communication for the whole
pipeline there is no tool you can use that is better than traditional pen and
paper, socially getting both development and operations into one room.
The tooling to support this process goes from simple self-written bash scripts
in the startup model to frameworks that support the flow partially, up to
process based suites that help you. No matter which path you choose the goal
should always be to end up with a well documented, reproducible step into
production. When introducing such systems problems in your organisation may
become apparent. Sometimes it helps to just create facts: It’s easier to ask for
forgiveness than permission.
Wednesday, 22 May 2013
JAX: Logging best practices
Inductive Bias | 20:37, Wednesday, 22 May 2013
The ideal outcome of Peter Roßbach’s talk on logging best practices was to have
attendees leave the room thinking “we know all this already and are applying
it successfully” - most likely though the majority left thinking about how to
implement even the most basic advice discussed.
From his consultancy and fire fighter background he has a good overview of what
logging in the average corporate environment looks like: No logging plan, no
rules, dozens of logging frameworks in active use, output in many different
languages, no structured log events but a myriad of different quoting,
formatting and bracketing standards instead.
So what should the ideal log line contain? First of all it should really be a
log line instead of a multi line something that cannot be reconstructed when
interleaved with other messages. The line should not only contain the class
name that logged the information (actually that is the least important piece of
information), it should contain the thread id, server name, a (standardised and
always consistently formatted) timestamp in a decent resolution (hint: one new
timestamp per second is not helpful when facing several hundred requests per
second). Make sure to have timing aligned across machines if timestamps are
needed for correlating logs. Ideally there should be context in the form of
request id, flow id, session id.
When thinking about logs, do not think too much about human readability - think
more in terms of machine readability and parsability. Treat your logging system
as the db in your data center that has to deal with most traffic. It is what
holds user interactions and system metrics that can be used as business
metrics, for debugging performance problems, for digging up functional issues.
Most likely you will want to turn free text that provides lots of flexibility
for screwing up into a more structured format like json, or even some binary
format that is storage efficient (think protocol buffers, thrift, avro).
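To make the advice on single-line, machine-readable log events more concrete, here is a minimal sketch in Java. The class name LogLine, the field names (ts, level, host, thread, requestId, msg) and the hand-rolled JSON escaping are assumptions made for illustration only; they were not part of the talk, and a real project would more likely rely on its logging framework's JSON layout.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the talk): renders one log event as a single JSON line.
class LogLine {

    // Escapes characters that would break a one-line JSON string value.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Field names are illustrative assumptions; the order is kept stable for easy parsing.
    static String format(Instant ts, String level, String host, String thread,
                         String requestId, String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ts", ts.toString());          // ISO-8601, UTC, sub-second resolution
        fields.put("level", level);
        fields.put("host", host);
        fields.put("thread", thread);
        fields.put("requestId", requestId);       // context for correlating across systems
        fields.put("msg", message);

        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // A multi-line message still ends up on exactly one output line.
        System.out.println(format(Instant.now(), "INFO", "web-01",
                Thread.currentThread().getName(), "req-42", "checkout\ncompleted"));
    }
}
```

Because newlines inside the message are escaped, interleaved output from several threads can still be split back into individual events.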
In terms of log levels, make sure to log development traces on trace, provide
detailed problem analysis information on debug, and put normal behaviour onto
info. In case of degraded functionality, log to warn. Things you cannot
easily recover from go on error. When it comes to logging hierarchies -
do not only think in class hierarchies but also in terms of use cases: Just
because your http connector is used in two modules doesn’t mean that there
should be no way to turn logging on just for one of the modules alone.
When designing your logging make sure to talk to all stakeholders to get clear
requirements. Make sure you can find out how the system is being used in the
wild, be able to quantify the number of exceptions; max, min and average
duration of a request and similar metrics.
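As a small illustration of the max, min and average request duration mentioned above, the following sketch summarises a handful of measured durations with the JDK's LongSummaryStatistics. The class name RequestStats and the sample values are hypothetical; in a real system the numbers would come from the logging or metrics pipeline rather than from a hard-coded array.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical helper: summarises request durations given in milliseconds.
class RequestStats {

    static LongSummaryStatistics summarise(long... durationsMillis) {
        // One pass over the values yields count, min, max, sum and average.
        return LongStream.of(durationsMillis).summaryStatistics();
    }

    public static void main(String[] args) {
        // Sample durations are made up for the example.
        LongSummaryStatistics stats = summarise(12, 30, 18);
        System.out.println("requests: " + stats.getCount());
        System.out.println("min ms:   " + stats.getMin());
        System.out.println("max ms:   " + stats.getMax());
        System.out.println("avg ms:   " + stats.getAverage());
    }
}
```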
Tools you could look at for help include but are not limited to splunk, jmx,
jconsole, syslog, logstash, statsd, redis for log collection and queuing.
As a parting exercise: Look at all of your own logfiles and count the different
formats used for storing time.
JAX: Java performance myths
Inductive Bias | 20:37, Wednesday, 22 May 2013
This talk was one of the famous talks on Java performance myths by Arno Haase.
His main point - supported with dozens of illustrative examples - was for
software developers to stop trusting the word of mouth, cargo cult like myths
that are abundant among engineers. Again the goal should be to write readable
code above all - for one, the Java compiler and JIT are great at optimising. In
addition many of the myths being spread in the Java community that are claimed
to lead to better performance are simply not true.
It was interesting to learn how many different aspects of both software and
hardware contribute to code performance. Micro benchmarks are considered
dangerous for a reason - creating a well controlled environment that matches
what the code will encounter in production is hard, because results are
influenced by things like just in time compilation, cpu throttling, etc.
Some myths that Arno proved wrong include final making code faster (for
method parameters it makes no difference, the bytecode being identical
with and without it) and inheritance always being expensive (even with an
abstract class between the interface and the implementation Java 6 and 7 can
still inline the method in question). Another one was on often wrongly scoped
Java vs. C comparisons. One myth revolved around the creation of temporary
objects - since Java 6 and 7 even these can be optimised away in simple cases.
When it comes to (un-)boxing and reflection there is a performance penalty. For
the latter it is mostly for method lookup, not so much for calling the method.
What we are talking about however are penalties in the range of about 1000
compute cycles, which is dwarfed by the cost of any remote call. Reflection on
fields is even cheaper.
One of the more widespread myths revolved around string concatenation being
expensive - writing “A” + “B” in code will be turned into “AB” in
bytecode. Even concatenation involving a variable is compiled into
StringBuilder calls, which the JIT has optimised further ever since
-XX:+OptimizeStringConcat was turned on by default.
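The constant folding of literal concatenation can be observed from within Java itself. The sketch below relies on the language rule that compile-time constant strings are interned, so a folded "A" + "B" is the very same object as the literal "AB", whereas a concatenation involving a variable produces a new object at runtime. The class and method names are made up for this example, and comparing strings with == is done here purely to expose object identity, not as a recommended practice.

```java
// Hypothetical demo class: uses reference comparison only to show what the compiler did.
class ConcatDemo {

    static boolean constantFolded() {
        // Both operands are compile-time constants, so the compiler emits the
        // single literal "AB"; literals are interned, hence the same object.
        return ("A" + "B") == "AB";
    }

    static boolean runtimeConcatIsNewObject(String a) {
        // 'a' is not a constant, so the concatenation happens at runtime and
        // yields a newly created String that is equal to, but not the same
        // object as, the literal "AB".
        String joined = a + "B";
        return joined != "AB" && joined.equals("AB");
    }

    public static void main(String[] args) {
        System.out.println("literal concat folded at compile time: " + constantFolded());
        System.out.println("variable concat built at runtime:      " + runtimeConcatIsNewObject("A"));
    }
}
```

Running javap -c on the compiled class is another way to inspect what the concatenation was turned into.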
The main message here is to stop trusting your intuition when reasoning about a
system’s performance and performance bottlenecks. Instead the goal should be to
go and measure what is really going on. The myths above are simple examples
where your average Java intuition goes wrong. Make sure to stay on top of what
the JVM turns your code into and how that is then executed on the hardware you
have rolled out if you really want to get the last bit of speed out of your
application.
Tuesday, 21 May 2013
JAX: Does parallel equal performant?
Inductive Bias | 20:34, Tuesday, 21 May 2013
In general there is a tendency to equate parallel implementations with
performant implementations. Except in the really naive case there is always
going to be some overhead due to scheduling work, managing memory sharing and
network communication. Essentially that knowledge is reflected in
Amdahl’s law (the amount of serial work limits the benefit from running parts
of your implementation in parallel, http://en.wikipedia.org/wiki/Amdahl's_law),
and Little’s law (http://en.wikipedia.org/wiki/Little's_law) in case of queuing
problems.
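Both laws are short enough to write down as code. The following sketch uses the textbook formulations: Amdahl's speedup 1 / ((1 - p) + p / n) for a parallelisable fraction p on n workers, and Little's law L = λ · W relating arrival rate and time in the system to the number of items in flight. The class name, method names and sample numbers are assumptions for illustration and did not appear in the talk.

```java
// Hypothetical helper class illustrating the two laws with their textbook formulas.
class ScalingLaws {

    // Amdahl's law: speedup = 1 / ((1 - p) + p / n)
    // p = fraction of the work that can run in parallel, n = number of workers.
    static double amdahlSpeedup(double parallelFraction, int workers) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / workers);
    }

    // Little's law: L = lambda * W
    // lambda = average arrival rate, W = average time an item spends in the system.
    static double littleInFlight(double arrivalsPerSecond, double secondsInSystem) {
        return arrivalsPerSecond * secondsInSystem;
    }

    public static void main(String[] args) {
        // With 90% parallel work the speedup can never reach 10, however many workers are added.
        for (int workers : new int[] {2, 10, 100, 10_000}) {
            System.out.printf("workers=%d speedup=%.2f%n", workers, amdahlSpeedup(0.9, workers));
        }
        // 200 requests per second at 50 ms each keep about 10 requests in flight on average.
        System.out.printf("in flight=%.1f%n", littleInFlight(200, 0.05));
    }
}
```

The serial 10% in the example caps the achievable speedup just below a factor of ten, which is the point the law makes about serial work.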
When looking at current Java optimisations there is quite a bit going on to
support better parallelisation: Work is being done to reduce lock
contention, the GC adaptive sizing policy has been improved to
a usable state, there is added support for parallel arrays and lambda’s
splittable interface.
When it comes to better locking optimisations what is most notable is work
towards coarsening locks at compile and JIT time (essentially moving locks from
the inside of a loop to the outside); eliminating locks if objects are being
used in a local, non-threaded context anyway; and support for biased locking
(that is, forcing locks only when a second thread is trying to access an
object). All three taken together can lead to performance improvements that
make StringBuffer and StringBuilder perform almost equally in a single
threaded context.
For pieces of code that suffer from false sharing (two variables used in
separate threads independently that end up in the same CPU cacheline and as a
result are both flushed on update) there is a new annotation: Adding the
“@Contended” annotation can help the compiler decide where to add
cacheline padding (or re-arrange fields entirely) to avoid that false sharing from
happening. One other way to avoid false sharing seems to be to look for class
cohesion - coherent classes where methods and variables are closely related
tend to suffer less from false sharing. If you would like to view the resulting
layout use the “-XX:PrintFieldLayout” option.
Java 8 will bring a few more notable improvements including changes to the
adaptive sizing GC policy, the introduction of parallel arrays that allow for
parallel execution of predicates on array entries, changes to the concurrency
libraries, and internalised iterators.
Introducing TeleBT
Bits from the Basement | 19:55, Tuesday, 21 May 2013
Keith and I are pleased to announce the immediate availability of TeleBT, a new Altus Metrum ground station product providing the equivalent of TeleDongle plus Bluetooth.
TeleBT working with AltosDroid on an Android device provides everything needed to monitor a rocket in flight, record telemetry, and know how to walk right to the airframe after it's back on the ground.
The Bluetooth capability of TeleBT is also supported by AltosUI on Linux, and with a micro USB cable TeleBT works just like TeleDongle on Windows, Mac, and Linux systems running AltOS version 1.2.1 or later.
Free commenting disabled
emergency exit | 16:57, Tuesday, 21 May 2013
Since I have had more and more spam coming through the filters and annoying me no end, I decided to restrict commenting to registered users (which are only FSFE fellows). I regret to do this, but the ratio of spam to content is 5,245 to 22 (do the math yourself) and I don’t want to spend what time I have for the blog with removing spam. I hope to change it in the future again, once a better anti-spam solution has been found.
Google Talk discontinued
you can't do that online anymore » English | 13:18, Tuesday, 21 May 2013
Will Google keep its promise and give xmpp users a way out?
As you may have seen, Google announced at their Google I/O conference that they were discontinuing their XMPP service, Google Talk. It’s very unfortunate, because XMPP is the most deployed open standard for instant messaging. It gave Google users the ability to communicate instantly with anyone using an XMPP federated service (like FSFE’s fellows XMPP server). Even Microsoft recently enabled its users to communicate to the outside world through XMPP. Now, Google is “replacing” Google Talk with Google+ Hangouts which will no longer support XMPP¹:
Note: We announced a new communications product, Hangouts, in May 2013. Hangouts will replace Google Talk and does not support XMPP.
What we know is that Google stops XMPP federation. Soon, Google users won’t be able to chat with anybody but other Google users. If I were paranoid, I’d say this makes their recent move on Google Talk look suspicious. But enough whining. What can we do about this? Well, there might be a way out for those of you who were using Google Talk as their XMPP service and who had a lot of non-Google contacts. Did you read Google’s Terms of Service? I bet you didn’t. No worries, we sum it up for you at Terms of Service; Didn’t Read. So, you might have noticed this interesting bit:
Google enables you to get your information out when a service is discontinued Discussion Google gives you reasonable advance notice when a service is discontinued and “a chance to get information out of that Service.”
The full terms state:
We believe that you own your data and preserving your access to such data is important. If we discontinue a Service, where reasonably possible, we will give you reasonable advance notice and a chance to get information out of that Service.
So far, the only notice I have seen is on a developer page, so I don’t think that counts as “reasonable advance notice”; we have yet to see Google announce to its users that it is discontinuing Google Talk. Or maybe Google’s going to argue that they don’t “discontinue” a Service because Talk is replaced by Hangouts (which does not support XMPP and which isn’t federated). I’d argue it’s not true and that XMPP chat is discontinued. Hence Google should give users a way out. Let’s hope that those who have decided to pay allegiance to Google will be able to get their chat contact list out of Google Talk, with a way to import them into XMPP providers which are federated.
¹ It remains unclear whether XMPP support is entirely gone for xmpp-client-to-server, according to Ars.
Network from laptop to Android device over USB
Losca | 14:23, Tuesday, 21 May 2013
If you're running an Android device with GNU userland Linux in a chroot and need full network access over USB cable (so that you can use your laptop/desktop machine's network connection from the device), here's a quick primer on how it can be set up. When doing Openmoko hacking, one always first plugged in the USB cable and forwarded network, or like I did later forwarded network over Bluetooth. It was mostly because the WiFi was quite unstable with many of the kernels.
I recently found myself using a chroot on a Nexus 4 without working WiFi, so instead of my usual WiFi usage I needed network over USB... trivial, of course, except that there's Android in the way and I'm an Android newbie. Thanks to ZDmitry on Freenode, I got the bits for the Android part so I got it working.
On the device, have e.g. data/usb.sh with the following contents:
    #!/system/xbin/sh
    CHROOT="/data/chroot"
    ip addr add 192.168.137.2/30 dev usb0
    ip link set usb0 up
    ip route delete default
    ip route add default via 192.168.137.1
    setprop net.dns1 8.8.8.8
    echo 'nameserver 8.8.8.8' >> $CHROOT/run/resolvconf/resolv.conf
On the host, execute the following:
    adb shell setprop sys.usb.config rndis,adb
    adb shell data/usb.sh
    sudo ifconfig usb0 192.168.137.1
    sudo iptables -A POSTROUTING -t nat -j MASQUERADE -s 192.168.137.0/24
    echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
    sudo iptables -P FORWARD ACCEPT
This works at least with Ubuntu saucy chroot. The main difference in some other distro might be whether the resolv.conf has moved to /run or not. You should be now all set up to browse / apt-get stuff from the device again.
Update: Clarified that this is to forward the desktop/laptop's network connection to the device so that network is accessible from the device over USB.
Monday, 20 May 2013
JAX: Pigs, snakes and deaths by 1k cuts
Inductive Bias | 20:32, Monday, 20 May 2013
In his talk on performance problems Rainer Schuppe gave a great introduction to
which kinds of performance problems can be observed in production and how to
best root-cause them.
Simply put, performance issues usually arise due to a difference in either data
volume, concurrency levels or resource usage between the dev, qa and production
environments. The tooling to uncover and explain them is pretty well known:
Starting with looking at logfiles, ARM tools, using aspects, bytecode
instrumentation, sampling, watching JMX statistics, and PMI tools.
All of these tools have their own unique advantages and disadvantages. With
logs you get the most freedom, however you have to know what to log at
development time. In addition logging is i/o heavy, so doing too much can slow
the application down itself. In a common distributed system logs need to be
aggregated somehow. A simple example of what can go wrong is cascading
exceptions spilled to disk that cause machines to run out of disk space one
after the other. When relying on logging make sure to keep transaction
contexts, in particular transaction ids across machines and services to
correlate outages. In terms of tool support, look at scribe, splunk and flume.
A tool often used for tracking down performance issues in development is the
well known profiler. Usually it creates lots of very detailed data. However it
is most valuable in development - in production profiling a complete server
stack produces way too much load and data to be feasible. In addition there’s
usually no transaction context available for correlation again.
A third way of watching applications do their work is to watch via JMX. This
capability is built in for any Java application, in particular for servlet
containers. Again there is no transaction context. Unless you take care of it
there won’t be any historic data.
When it comes to diagnosing problems, you are essentially left with fixing
either the “it does not work” case or the “it is slow” case.
For the “it is slow” case there are a few incarnations:
- It was always slow, we got used to it.
- It gets slow over time.
- It gets slower exponentially.
- It suddenly gets slow.
- There is a spontaneous crash.
In the case of “it does not work” you are left with the following observations:
- Sudden outages.
- Always flaky.
- Sporadic error messages.
- Silent death.
- Increasing error rates.
- Misleading error messages.
In the end you will always be spinning in a loop: Look at symptoms, eliminate
non-causes, identify suspects, confirm and eliminate by comparing to normal. If
not done after that: lather, rinse, repeat. When it comes to causes for errors
and slowness you will usually run into one of the following: bad coding
practices, too much load, missing backends, resource conflicts, memory and
resource leakage as well as hardware/networking issues.
Some symptoms you may observe include foreseeable lock ups (it’s always slow
after four hours, so we just reboot automatically before that), consistent
slowness, sporadic errors (it always happens after a certain request came in),
getting slow and slower (most likely leaking resources), sudden chaos (e.g.
someone pulling the plug or someone removing a hard disk), and high utilisation
of resources.
Linear memory leak
In case of a linear memory leak, the application usually runs into an OOM
eventually, getting ever slower before that due to GC pressure. Reasons could
be linear structures being filled but never emptied. What you observe are
growing heap utilisation and growing GC times. In order to find such leakage
make sure to turn on verbose GC logging, do heapdumps to find leaks. One
challenge though: It may be hard to find the leakage if the problem is not one
large object, but many, many small ones that lead to a death by 1000 cuts
bleeding the application to death.
In development and testing you will do heap comparisons. Keep in mind that
taking a heap dump causes the JVM to stop. You can use common profilers to look
at the heap dump. There are variants that help with automatic leak detection.
A variant is the pig in a python issue where sudden unusually large objects
cause the application to be overloaded.
Resource leaks and conflicts
Another common problem is leaking resources other than memory - not closing
file handles can be one incarnation. Those problems cause a slowness over time,
they may lead to having the heap grow over time - usually that is not the most
visible problem though. If instance tracking does not help here, your last
resort should be doing code audits.
In case of conflicting resource usage you usually face code that was developed
with overly cautious locking and data integrity constraints. The way to go is
to take thread dumps to uncover threads in block and wait states.
Bad coding practices
When it comes to bad coding practices what is usually seen is code in endless
loops (easy to see in thread dumps), cpu bound computations where no result
caching is done. Also layeritis with too much (de-)serialisation can be a
problem. In addition there is a general “the ORM will save us all” problem that
may lead to massive SQL statements, or to using the wrong data fetch strategy.
When it comes to caching - if caches are too large, access times of course grow
as well. There could be never ending retry loops, ever blocking networking
calls. Also people tend to catch exceptions but not do anything about them
other than adding a little #fixme annotation to the code.
When it comes to locking you might run into dead-/live-lock problems. There
could be chokepoints (resources that all threads need for each processing
chain). In a thread dump you will typically see lots of wait instead of block
time.
In addition there could be internal and external bottlenecks. In particular
keep those in mind when dealing with databases.
The goal should be to find an optimum for your application between too many too
small requests that waste resources getting dispatched, and one huge request
that everyone else is waiting for.
Sunday, 19 May 2013
JAX: Java HPC by Norman Maurer
Inductive Bias | 20:31, Sunday, 19 May 2013
For slides see also: Speakerdeck: High performance networking on the JVM
Norman started his talk clarifying what he means by high scale: Anything above
1000 concurrent connections is considered high scale in his talk, anything
below 100 concurrent connections is fine to be handled with threads and blocking
IO. Before tuning anything, make sure to measure if you have any problem at
all: Readability should always go before optimisation.
He gave a few pointers as to where to look for optimisations: Get started by
studying the socket options - TCP_NODELAY as well as the send and receive
buffer sizes are most interesting. When under GC pressure (check the GC logs
to figure out if you are) make sure to minimise allocation and deallocation of
objects. In order to do that consider making objects static and final where
possible. Make sure to use CMS or G1 for garbage collection in order to
maximise throughput. Size areas in the JVM heap according to your access
patterns. The goal should always be to minimise the chance of running into a
stop the world garbage collection.
When it comes to using buffers you have the choice of using direct or heap
buffers. While the former are expensive to create, the latter come with the
cost of being zero’ed out. Often people start buffer pooling, potentially
initialising the pool in a lazy manner. In order to avoid memory fragmentation
in the Java heap, it can be a good idea to create the buffer at startup time
and re-use it later on.
In particular when parsing structured messages like they are common in
protocols it usually makes sense to use gathering writes and scattering reads
to minimise the number of system calls for reading and writing. Also try to
buffer more if you want to minimise system calls. Use slice and duplicate to
create views on your buffers to avoid mem copies. Use a file channel when
copying files without modifications.
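The advice on using duplicate and slice to create views instead of copying can be sketched with java.nio.ByteBuffer. The example below assumes a made-up message layout with a fixed four byte header followed by a payload; the class name BufferViews and the method payloadView are hypothetical. It only illustrates views on a buffer, not gathering writes or scattering reads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: exposes the payload of a message as a view, without copying bytes.
class BufferViews {

    // Assumes the message starts with a header of the given length (an example layout).
    static ByteBuffer payloadView(ByteBuffer message, int headerLength) {
        // duplicate() shares the content but has its own position and limit,
        // so the caller's buffer is left untouched.
        ByteBuffer dup = message.duplicate();
        dup.position(headerLength);
        // slice() creates a view that starts at the current position.
        return dup.slice();
    }

    public static void main(String[] args) {
        // Allocate once (for example at startup) and re-use the buffer later on.
        ByteBuffer message = ByteBuffer.allocateDirect(16);
        message.put("HDR:".getBytes(StandardCharsets.US_ASCII));
        message.put("payload".getBytes(StandardCharsets.US_ASCII));
        message.flip();

        ByteBuffer payload = payloadView(message, 4);
        System.out.println("payload bytes: " + payload.remaining());

        // Writes through the view are visible in the original buffer: same memory.
        payload.put(0, (byte) 'P');
        System.out.println("byte 4 of message: " + (char) message.get(4));
    }
}
```

Since the view shares memory with the original buffer, it should be treated as read-only unless the sharing is intended.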
Make sure you do not block - think of DNS servers being unavailable or slow as
an example.
As a parting note, make sure to define and document your threading model. It
may ease development to know that some objects will always only be used in a
single threaded context. It usually helps to reduce context switches as well as
keeping data in the same thread to avoid having to use synchronisation and the
use of volatile.
Also make a conscious decision about which protocol you would like to use for
transport - in addition to tcp there’s also udp, udt, sctp. Use pipelining in
order to parallelise.
Saturday, 18 May 2013
JAX: Hadoop overview by Bernd Fondermann
Inductive Bias | 20:29, Saturday, 18 May 2013
After breakfast was over the first day started with a talk by Bernd on the
Hadoop ecosystem. He did a good job selecting the most important and
interesting projects related to storing data in HDFS and processing it with Map
Reduce. After the usual “what is Hadoop”, “what does the general architecture
look like”, “what will change with YARN” Bernd gave a nice overview of which
publications each of the relevant projects rely on:
- HDFS is mainly based on the paper on GFS.
- Map Reduce comes with its own publication.
- The big table paper mainly inspired Cassandra (to some extent), HBase,
Accumulo and Hypertable.
- Protocol Buffers inspired Avro and Thrift, and is available as free
software itself.
- Dremel (the storage side of things) inspired Parquet.
- The query language side of Dremel inspired Drill and Impala.
- Power Drill might inspire Drill.
- Pregel (a graph database) inspired Giraph.
- Percolator provided some inspiration to HBase.
- Dynamo by Amazon kicked off Cassandra and others.
- Chubby inspired Zookeeper, both are based on Paxos.
- On top of Map Reduce today there are tons of higher level languages,
starting with Sawzall inside of Google and continuing with Pig and Hive at
Apache. We are now left with added languages like Cascading, Cascalog, Scalding
and many more.
- There are many other interesting publications (Megastore, Spanner, F1 to
name just a few) for which there is no free implementation yet. In addition
with Storm, Hana and Haystack there are implementations lacking canonical
publications.
After this really broad clarification of names and terms used, Bernd went into
some more detail on how Zookeeper is being used for defining the namenode in
Hadoop 2, how high availability and federation work for namenodes. In
addition he gave a clear explanation of how block reports work on cluster
bootup. The remainder of the talk was reserved for giving an intro to HBase,
Giraph and Drill.
Friday, 17 May 2013
BigDataCon
Inductive Bias | 20:29, Friday, 17 May 2013
Together with Uwe Schindler I had published a series of articles on Apache
Lucene at Software and Support Media’s Java Mag several years ago. Earlier this
year S&S kindly invited me to their BigDataCon - co-located with JAX - to give a
talk of my choosing that at least touches upon Lucene.
Thinking back and forth about what topic to cover what came to my mind was to
give a talk on how easy it is to do text classification with Mahout when
relying on Apache Lucene for text analysis, tokenisation and token filtering.
All classes essentially are in place to integrate Lucene Analyzers with Mahout
vector generation - needed e.g. as a pre-processing step for classification or
text clustering.
Feel free to check out some of my sandbox code over at github:
http://github.org/MaineC/sofia
After attending the conference I can only recommend everyone interested in Java
programming and able to understand German to buy a ticket for the conference.
It’s really well executed, great selection of talks (though the sponsored
keynotes usually aren’t particularly interesting), tasty meals, interesting
people to chat with.
Formatted a drive but “you are not the owner, you cannot…”
anna.morris's blog | 19:42, Friday, 17 May 2013
FairPhone starting a movement
Jens Lechtenbörger » English | 13:31, Friday, 17 May 2013
Currently, our phones are stained with blood, literally, as exemplified by the well-known Foxconn suicides or the less well-known role of conflict minerals to sustain war and murder in Congo. FairPhone is an endeavor to produce smartphones in a fairer way, and after three years of preparatory work, the Dutch start-up just started its pre-order campaign (restricted to Europe). This page explains their approach in detail, while technical specs are available here.
Today I ordered my FairPhone (two, actually). If FairPhone receives 5000 orders, production will start in June (delivery is estimated for October). The phone will come with Android pre-installed but installation of alternative OSes should be possible. I’ll certainly free my devices once I receive them.
If you think about buying a new phone, why not start a movement?
Zermatt, Matterhorn and Gornergrat
DanielPocock.com | 09:46, Friday, 17 May 2013
The DebConf13 registration deadline for developers requesting sponsorship has been extended up to Sunday, so for those still undecided or anybody else thinking about a visit to .ch, I'm sharing some more pictures today.

The Matterhorn is one of the iconic symbols of Switzerland's natural beauty and appears on many postcards. The car-free town of Zermatt is at the bottom and is the final stop on the Matterhorn Gotthard Bahn railway, so it is really easy to get there with one or two trains every hour of the day.

One of the most exciting places to view the Matterhorn is from the nearby observatory at Gornergrat, which is 3,089m above sea level. It is great for a single day trip of hiking and there is a convenient train station there too.

The scenic train to Zermatt is included in any of the Swiss rail passes, but the train up to Gornergrat is a private railway and a special ticket must be purchased. Discounts are sometimes offered at rail stations in Swiss cities or online or it is possible to hike up and down.

Hacking Firefox OS Developer Phone
nikos.roussos » libre | 09:20, Friday, 17 May 2013
You probably already know about Geeksphone’s Firefox OS Developer Phones. A couple of days ago I received mine (the Keon version) as a Mozilla Rep for testing and showcasing Firefox OS at upcoming events.
Keon comes with an old Firefox OS build, which means that it misses many cool features that have already landed on the current release branch (for instance most of the contacts import options). It also makes bug reporting difficult, since you have to determine whether a bug you’ve found has already been resolved before reporting it.
So with a little help from (Flash)Fredy here are some quick steps to get your Keon device updated :-)
Flash a new Firefox OS build
On the Geeksphone forum you’ll find a relevant thread with unofficial recent builds and detailed guidelines on how to flash them on your device. Nothing else to add here, besides the fact that flashing a build comes with a certain amount of risk :-)
Update Gaia
If you are feeling less adventurous you could just update the Firefox OS UI (Gaia). The steps are really easy. Plug in your device and run:
git clone https://github.com/mozilla-b2g/gaia.git
cd gaia
git checkout v1-train
make reset-gaia
This will restart your device and when it comes back you’ll have the new Gaia.
The tools you need
In order for all the above steps to work you need adb, which stands for Android Debug Bridge. It is a simple command line tool that helps you communicate with a connected Android/Linux phone.
Here are the steps needed to get adb working (at least on Fedora). First download the SDK tools from Android. Let’s assume that you uncompressed the archive to ~/android-sdk/.
cd ~/android-sdk/tools
./android
Then select the “Tools” entry for installation; it will probably show a “Not Installed” status on your system. This will download and install everything you need under ~/android-sdk/platform-tools.

If you want to use adb from everywhere you have to add it to your system’s PATH. In this case:
export PATH=$PATH:~/android-sdk/platform-tools
You can add this to your ~/.bash_profile for permanent effect.
One last step is to add a udev rule for the specific device. Running lsusb in a terminal gives you the vendor identifier. For Keon this is “05c6”. So you have to add the line below to /etc/udev/rules.d/51-android.rules
SUBSYSTEM=="usb", ATTR{idVendor}=="05c6", MODE="0666", GROUP="plugdev"
and tell udev to re-read the rules:
sudo udevadm control --reload-rules
If you did everything correctly then plug in your Keon device and run:
adb devices
You should see a line for keon.
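For devices other than Keon, the udev step above boils down to reading the vendor ID from lsusb and filling it into the rule. A minimal sketch of that, assuming the usual `ID vendor:product` layout of lsusb lines; the helper names and the sample line (including its product ID) are made up for illustration:

```python
import re

def vendor_ids(lsusb_output):
    """Return the vendor IDs found in lsusb output, in order of appearance."""
    return re.findall(r"ID ([0-9a-f]{4}):[0-9a-f]{4}", lsusb_output)

def udev_rule(vendor_id, mode="0666", group="plugdev"):
    """Format a udev rule line like the one shown above."""
    return ('SUBSYSTEM=="usb", ATTR{idVendor}=="%s", MODE="%s", GROUP="%s"'
            % (vendor_id, mode, group))

# Made-up sample line; the product ID after the colon is a placeholder.
sample = "Bus 001 Device 004: ID 05c6:0000 Example Vendor\n"
rule = udev_rule(vendor_ids(sample)[0])
print(rule)
```

The printed line is what would go into /etc/udev/rules.d/51-android.rules for that vendor.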
Hack
If you are developing an Open Web App then you could just use Firefox OS Simulator to push it on the Keon device. See how simple that is. The only thing you’ll need for this to work is the udev rule above. You won’t need adb, since the simulator comes with its own copy.
If you want to hack around Gaia and the core apps, then just fork the code, do all the changes you want and use your repo to flash Gaia on the phone as described above.
Happy Hacking :-)
Important: Don’t ever choose to reset your phone to Factory defaults. It seems that currently you get stuck if you do it and the only way to undo it is by flashing a new build (I speak from experience :P).
Thursday, 16 May 2013
Hadoop Summit Amsterdam
Inductive Bias | 20:27, Thursday, 16 May 2013
About a month ago I attended the first European Hadoop Summit, organised by
Hortonworks in Amsterdam. The two day conference brought together both vendors
and users of Apache Hadoop for talks, exhibition and after conference beer
drinking.
Russel Jurney kindly asked me to chair the Hadoop applied track during
Apache Con EU. As a result I had a good excuse to attend the event. Overall
there were at least three times as many submissions as could reasonably be
accepted. Accordingly accepting proposals was pretty hard.
Though some of the Apache community aspect was missing at Hadoop summit it was
interesting nevertheless to see who is active in this space both as users as
well as vendors.
If you check out the talks on YouTube make sure not to miss the two sessions by
Ted Dunning as well as the talk on handling logging data by Twitter.
Debian to rescue Skype users?
DanielPocock.com | 07:16, Thursday, 16 May 2013
Last year at DebConf12 and the Paris mini-DebConf I mentioned some of the sophisticated techniques that the likes of Microsoft and Facebook are using to monitor their customers.
So when Skype was busted spying on the content of chat messages, it was no surprise for many people in the Debian community.
People are already rushing to find alternatives like XMPP and Jitsi. Debian 7 has been released just in time, with powerful features like TURN support that finally allow users to make free calls and chats with seamless NAT traversal. Sadly, Debian's built-in VoIP/RTC client, Empathy, only uses Google's TURN servers and not native Debian servers. Hopefully a solution will come soon; in the meantime it is easy enough to install Jitsi instead and configure it to use any of the free TURN server software on Debian.
It should be emphasized that Skype does not just spy on URLs in chat - it has simply been possible to detect this form of spying by detecting when the URL is accessed. Microsoft has taken out various patents for secretive monitoring of Internet phone calls and the analysis of speech patterns to detect both the content and emotions during a conversation. This allows them to get a very thorough analysis of the state of mind of every user at almost every moment and fine-tune the type of advertising and branding that is delivered to that person through conventional means and also through biased `news' reporting and other means.
Wednesday, 15 May 2013
ApacheConNA: Misc
Inductive Bias | 20:26, Wednesday, 15 May 2013
In his talk on SPDY Mathew Steele explained how he implemented the SPDY protocol
as an Apache httpd module - working around most of the safety measures and
design decisions in the current httpd version. Essentially to get httpd to
support the protocol all you need now is mod_spdy plus a modified version of
mod_ssl.
The keynote on the last day was given by the Puppet founder. Some interesting
points to take away from that:
- Though hard in the beginning (and half way through, and after years) it
is important to learn to give up control: It usually is much more productive and
leads to better results to encourage people to do something than to be
restrictive about it. A single developer only has so much bandwidth - by
farming tasks out to others - and giving them full control - you substantially
increase your throughput without having to put in more energy.
- Be transparent - it’s ok to have commercial goals with your project. Just
make sure that the community knows about it and is not surprised to learn about
it.
- Be nice - not many succeed at this, not many are truly able to ignore
religion (vi vs. emacs). This also means to be welcoming to newbies, to hustle
at conferences, to engage the community as opposed to announcing changes.
Overall good advice for those starting or working on an OSS project and seeking
to increase visibility and reach.
If you want to learn more on what other talks were given at ApacheCon NA or want to follow up in more detail on some of the talks described here check out the slides archive online.
Tuesday, 14 May 2013
Make it like Facebook…or not?! Or: From WordPress to Drupal
mkesper's blog » English | 20:30, Tuesday, 14 May 2013
My colleague and I were given the task of creating an intranet site “like Facebook”. OK, so we checked alternatives and installed a site with WordPress and BuddyPress.
Guess what, after we showed the result to our boss, friendships had to be removed, groups had to be removed etc. etc.
After realising we had disabled virtually every BuddyPress feature and after struggling with getting some sort of rights system into the site, we finally recognized WordPress was not the right base for our site and went looking for alternatives again. Imagine we had settled on a proprietary platform: we’d have been stuck!
Then I discovered Drupal. “Oh look, they’ve got something like a structure. Oh, wow, permissions are in core modules. Cool!”
Drupal just is much more structured and generalized than WordPress. The downside is it takes you longer to figure out how things work. Maybe I should have looked into “Understanding Drupal” earlier but I just “had no time for that”.
Community support is super helpful in irc or managing issues. Translation process is really cool, making it possible to edit translations in-site and (after a little configuration) also giving back to the Drupal community.
So, if you want to hack together a simple page in short time, take WordPress, but if you need a little bit more structure, I’d recommend Drupal!
Side note: As a Python fan I also tried https://www.django-cms.org/ but alas, it’s very hard to install compared to the two other systems. And if users won’t get your system installed, they won’t use it! And no, it isn’t helpful to first give you a toy server and then let you figure out how you turn it into a production system via several unconnected help documents.
ApacheConNA: Hadoop metrics
Inductive Bias | 20:25, Tuesday, 14 May 2013
Have you ever measured the general behaviour of your Hadoop jobs? Have you
sized your cluster accordingly? Do you know whether your work load really is IO
bound or CPU bound? Legend has it no one except Allen Wittenauer over at
LinkedIn, formerly Y!, ever did this analysis for his clusters.
Steve Watt gave a pitch for actually going out into your datacenter measuring
what is going on there and adjusting the deployment accordingly: In small
clusters it may make sense to rely on raided disks instead of additional
storage nodes to guarantee “replication levels”. When going out to vendors to
buy hardware don’t rely on paper calculations only: Standard servers in Hadoop
clusters are 1U or 2U. This is quite unlike the beefy boxes being sold otherwise.
Figure out what reference architecture is being used by partners, run your
standard workloads, adjust the configuration. If you want to, run the 10TB
TeraSort to benchmark your hardware and system configuration. Make sure to
capture data during all your runs - have Ganglia or SAR, watch out for
interesting behaviour in io rates, cpu utilisation, network traffic. The goal is
to get the cpu busy, not wait for network or disk.
After the instrumentation and trial run look for over- and underprovisionings,
adjust, lather, rinse, repeat.
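To make the "is it IO bound or CPU bound" question concrete, here is a rough sketch that labels a run from captured samples. It assumes each sample is a pair of (cpu busy %, iowait %), as a tool like SAR could provide; the function name and both thresholds are arbitrary assumptions, not values from the talk.

```python
def classify(samples, busy_threshold=70.0, wait_threshold=20.0):
    """Label a run from (cpu_busy_pct, iowait_pct) samples.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    n = len(samples)
    avg_busy = sum(s[0] for s in samples) / n
    avg_wait = sum(s[1] for s in samples) / n
    if avg_wait >= wait_threshold:
        # The cpu spends a lot of time waiting for disk or network.
        return "io-bound"
    if avg_busy >= busy_threshold:
        return "cpu-bound"
    return "underutilised"

print(classify([(30, 40), (20, 50)]))
print(classify([(90, 2), (85, 5)]))
```

Comparing the label before and after a configuration change gives a crude indication of whether the adjustment moved the bottleneck.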
Also make sure to talk to the datacenter people: There are floor space, power
and cooling constraints to keep in mind. Don’t let the whole datacenter go down
because your cpu intensive job is drawing more power than the DC was designed
for. There are also power constraints per floor tile due to cooling issues -
those should dictate the design.
Take a close look at the disks you deploy: SATA vs. SAS can make a 40%
performance difference at a 20% cost difference. Also the number of cores per
machine dictates the number of disks needed to spread the likelihood of random read
access. As a rule of thumb - in a 2U machine today there should be at least
twelve large form factor disks.
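As a quick back-of-the-envelope check of the disk trade-off above: assuming the faster option delivers 40% more performance for 20% more money (the direction is an assumption, the text only gives the two percentages), performance per unit of cost works out as follows. The function name is made up for illustration.

```python
def perf_per_cost(perf_gain, cost_increase):
    """Relative performance per unit of cost compared to the baseline (1.0)."""
    return (1.0 + perf_gain) / (1.0 + cost_increase)

# 40% more performance for 20% more cost, as quoted above.
ratio = perf_per_cost(0.40, 0.20)
print(round(ratio, 3))
```

A ratio above 1.0 means the more expensive option still gives more performance per unit of money spent.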
When it comes to controllers the goal should be to get a dedicated lane to disk,
save one controller if price is an issue. Trade off compute power against power
consumption.
Designing your network keep in mind that one switch going down means that one
rack will be gone. This may be a non-issue in a Y! size cluster, in your
smaller scale world it might be worth the money investing in a second switch
though: Having 20 nodes go black isn’t a lot of fun if you cannot farm out the
work and re-replication to other nodes and racks. Also make sure to have enough
ports in rack switches for the machines you are planning to provision.
Avoid playing the ops whack-a-mole game by having one large cluster in the
organisation rather than many different ones where possible. Multi-tenancy in
Hadoop is still premature though.
If you want to play with future deployments - watch out for HP, currently
packing 270 servers into the space that today holds just two via system on a
chip designs.
Monday, 13 May 2013
ApacheConNA: Monitoring httpd and Tomcat
Inductive Bias | 20:23, Monday, 13 May 2013
Monitoring - a task generally neglected - or over done - during development.
But still vital enough to wake up people from well earned sleep at night when
done wrong. Rainer Jung provided some valuable insights on how to monitor Apache httpd and Tomcat.
Of course failure detection, alarms and notifications are all part of good
monitoring. However so is avoidance of false positives and metric collection,
visualisation, and collection in advance to help with capacity planning and
uncover irregular behaviour.
In general the standard pieces being monitored are load, cache utilisation,
memory, garbage collection and response times. What we do not see from all that
are times spent waiting for the backend, looping in code, blocked threads.
When it comes to monitoring Java - JMX is pretty much the standard choice. Data
is grouped in management beans (MBeans). Each Java process has default beans,
on top there are beans provided by Tomcat, on top there may be application
specific ones.
For remote access, there are Java clients that know the protocol - the server
must be configured though to accept their connection. Keep in mind to open the
firewall in between as well if there is any. Well known clients include
JVisualVM (nice for interactive inspection), jmxterm as a command line client.
The only issue: Most MBeans encode source code structure, where what you really
need is change rates. In general those are easy to infer though.
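Inferring a change rate from a monotonically growing counter (a request count, say) only needs two samples. A minimal sketch, assuming each sample is a (timestamp in seconds, counter value) pair; the function name is made up for illustration:

```python
def rate(prev, curr):
    """Per-second change rate between two (timestamp_seconds, counter) samples."""
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if c1 < c0:
        # Counter went backwards, e.g. after a restart: no meaningful rate.
        return None
    return (c1 - c0) / (t1 - t0)

# 150 additional requests within 10 seconds.
print(rate((0, 100), (10, 250)))
```

Returning None on a counter reset avoids plotting a large negative spike after every restart.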
On the server side for Tomcat there is the JMXProxy in Tomcat manager that
exposes MBeans. In addition there is Jolokia (including JSON serialisation) or
the option to roll your own.
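As a rough illustration of the HTTP/JSON route (Jolokia), reading a single MBean attribute is a GET request against a URL of the form `<base>/read/<mbean>/<attribute>`, answered with a JSON document that carries the result in a `value` field. The sketch below only builds the URL and parses a response; the base URL and the sample response are assumptions, and MBean names containing a slash would need Jolokia's own escaping, which is not handled here.

```python
import json
from urllib.parse import quote

def jolokia_read_url(base_url, mbean, attribute):
    """Build a Jolokia GET URL of the form <base>/read/<mbean>/<attribute>."""
    return "%s/read/%s/%s" % (base_url.rstrip("/"),
                              quote(mbean, safe=":=,"),
                              quote(attribute))

def extract_value(response_text):
    """Return the 'value' field of a Jolokia response, raise on error status."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise RuntimeError("Jolokia error: %s" % payload.get("error"))
    return payload["value"]

# Assumed agent location; adjust to wherever the agent is deployed.
url = jolokia_read_url("http://localhost:8080/jolokia",
                       "java.lang:type=Threading", "ThreadCount")

# Hand-written sample response, not captured from a real server.
sample = '{"request": {"type": "read"}, "value": 42, "timestamp": 1368460000, "status": 200}'
print(url, extract_value(sample))
```

Fetching `url` with any HTTP client and passing the body to `extract_value` would complete the round trip.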
So what kind of information is in MBeans:
- OS - load, process cpu time, physical memory, global OS level
stats. As an example: Here dividing cpu time by elapsed time gives you the average cpu
concurrency.
- Runtime MBean gives uptime.
- Threading MBean gives information on count, max available threads etc
- Class Loading MBean should get stable unless you are using dynamic
languages or have enabled class unloading for jsps in Tomcat.
- Compilation contains HotSpot compiler information.
- Memory contains information on all regions thrown in one pot. If you need
more fine grained information look out for the Memory Pool and GC MBeans.
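The cpu concurrency example from the list above can be sketched as follows. It assumes two samples of the process cpu time (reported by the JVM in nanoseconds) and of the uptime (reported in milliseconds); the function and parameter names are made up for illustration.

```python
def cpu_concurrency(cpu_ns_start, cpu_ns_end, uptime_ms_start, uptime_ms_end):
    """Average number of cores kept busy by the process between two samples.

    Process cpu time is in nanoseconds and uptime in milliseconds, so both
    deltas are converted to seconds before dividing.
    """
    cpu_seconds = (cpu_ns_end - cpu_ns_start) / 1e9
    wall_seconds = (uptime_ms_end - uptime_ms_start) / 1e3
    if wall_seconds <= 0:
        raise ValueError("second sample must be taken later than the first")
    return cpu_seconds / wall_seconds

# 30 seconds of cpu time consumed within 10 seconds of wall clock time.
print(cpu_concurrency(0, 30_000_000_000, 0, 10_000))
```

A value of 3.0 here means that on average three cores were busy on behalf of the process during the interval.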
As for Tomcat specific things:
- Threadpool (for each connector) has information on size, number of busy
threads.
- GlobalRequestProc has request counts, processing times, max time, bytes
received/sent, error count (those that Tomcat notices that is).
- RequestProcessor exists once per thread, it shows if a request is
currently running and for how long. Nice to see if there are long running
requests.
- DataSource provides information on Tomcat provided database connections.
Per Webapp there are a couple of more MBeans:
- ManagerMBean has information on session management - e.g. session
counter since start, login rate, active sessions, expired sessions, max active
sessions since restart (here a restart is possible), number of rejected
sessions, average alive time, processing time it took to clean up sessions,
create and expire rate for last 100 sessions
- ServletMBean contains request count, accumulated processing time.
- JspMBean (together with activated loading/unloading policy) has
information on unload and reload stats and provides the max number of loaded
jsps.
For httpd the goals with monitoring are pretty similar. The only difference is
the protocol used - in this case provided by the status module. As an
alternative use the scoreboard connections.
You will find information on
- restart time, uptime
- serverload
- total number of accesses and traffic
- idle workers and number of requests currently processed
- cpu usage - though that is only accurate when all children are stopped
which in production isn’t particularly likely.
Lines that indicate what threads do contain waiting, request read, send reply
- more information is documented online.
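The status module also offers a machine-readable variant of its page, made of `Key: value` lines plus a scoreboard string with one character per worker. Below is a small parsing sketch; the sample text is hand-written rather than captured from a real server, and only a few scoreboard characters are mapped.

```python
def parse_status(text):
    """Parse 'Key: value' lines from the machine-readable status output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields

def scoreboard_counts(scoreboard):
    """Count workers per state for a few scoreboard characters."""
    names = {"_": "waiting", "R": "reading request",
             "W": "sending reply", ".": "open slot"}
    counts = {}
    for ch in scoreboard:
        label = names.get(ch, "other")
        counts[label] = counts.get(label, 0) + 1
    return counts

# Hand-written sample, not real server output.
sample = ("Total Accesses: 1200\n"
          "BusyWorkers: 3\n"
          "IdleWorkers: 5\n"
          "Scoreboard: __W_R_W_....\n")

status = parse_status(sample)
counts = scoreboard_counts(status["Scoreboard"])
print(status["BusyWorkers"], counts)
```

Polling this periodically and feeding the numbers into the same graphs as the Tomcat metrics keeps both layers comparable.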
When monitoring make sure to monitor not only production but also your stress
tests to make meaningful comparisons.
Setting up GPG keys on second machine: importing your existing key
anna.morris's blog | 16:31, Monday, 13 May 2013
I had a little trouble today while trying to set up my GPG encryption on a second computer, using Enigmail. The key importing process is rather unintuitive. Once you install Enigmail on your second machine, the natural thing to do is run the Set-up Wizard, which appears to give you the option to import your public and private keys. Once I had found out how to export my keys from my current set-up, I discovered that they get exported as one file, not two, but the set-up wizard wants you to import using separate files, one for your public and one for your private key. After a while I found I needed to import them using a separate process. Here is what I did:
To export current GPG pair: In your email client go to Open-GPG > Key Management. I found my key by clicking the “display all keys” box on the window, but un-clicking “Display keys from other people” in the View menu. Select your key by clicking on it, so it is highlighted in blue, and go to File > Export keys to File. Click the option to “include your private key” and save the file to a memory stick or external drive (don’t email it to yourself!)
To Import current GPG pair on a second machine: Go to that same dialogue on your new machine, under Open-GPG > Key Management. Go to File > Import keys from file. Choose your file and import the keys. You should now be set up. You can check by trying to read an encrypted mail – if you don’t have one, send one to yourself from your other machine.
All done!
However, I think the set up wizard needs some work!!
Anna
xxxx
Get WebRTC going fast
DanielPocock.com | 15:05, Monday, 13 May 2013
A question that comes up more and more these days: what's the quickest way to try WebRTC and see it working? How can a web developer start experimenting with WebRTC in their blog or demo site?
Good news: it's no longer necessary to compile anything from source - and many of the components are available on Debian-based systems (including Ubuntu) or RPM-based solutions like Fedora.
A quick look at how easy it is, explanation below:
# apt-get update
# apt-get install -t experimental repro resiprocate-turn-server
# apt-get install -t unstable chromium sipml5-web-phone
# cd /var/www && mkdir jssip && cd jssip
# wget -r -nH http://tryit.jssip.net
# vi /etc/repro/repro.config
# vi /etc/reTurnServer.config
# vi /var/www/jssip/js/custom.REMOVE_THIS.js
and then try browsing to /jssip or /sipml5-web-phone
Start with a SIP proxy
As explained in the RTC Quick Start guide for regular RTC, a SIP proxy is a clean and simple component to start with. The same is true for WebRTC: start with a proxy. There are two I'll emphasize here:
- repro from reSIProcate is quick and easy to set up and has built in TLS support. A 1.9 alpha release with WebSocket support for WebRTC has just been uploaded to Debian experimental and is ready to use on wheezy. RPM users just need to download the alpha release tarball and use rpmbuild to get packages from it.
- Kamailio provides very good WebRTC support too. The packages are available but due to GPL license issues must be recompiled with TLS, see the README.Debian file for details. Also feel free to try the upstream package repository for binaries that do include TLS
Get a TURN server
TURN servers help media streams traverse NAT. They are very easy to set up, but must have real IP addresses.
- reTurn from reSIProcate is in Debian packages and it is in Fedora too
- TurnServer.org from the makers of Jitsi is also packaged on Debian
- RFC5766 TURN server is another open-source TURN server with some advanced features that is in the final stages of Debian package, on the mentors site for review
Put the JavaScript in the web server
Adding WebRTC to a web site can be as simple as cutting and pasting some JavaScript code into the HTML.
Three working samples to start with:
- Use the SIPml5 web phone package on Debian and add it to a virtual host, browse to /sipml5-web-phone
- Use wget to fetch a copy of the JsSIP sample page from tryit.jssip.net and edit the provided custom.js file to hard-code your SIP proxy address
- QoffeeSIP is another alternative - this email gives details how to get started with it
Browser
Users need a recent browser.
The latest Chromium packages in Debian are based on Google Chrome M26 code and this should work.
Help and support
Please come and ask on the users mailing lists or IRC channels for any of the packages mentioned here.
To Gnome or not Gnome
hesa's Weblog » Free Software | 08:28, Monday, 13 May 2013
Hired a trailer yesterday and saw this sticker on it.
Someone doesn’t seem to like Gnome. Compare with Gnome’s official logo.
Will KDE start using it?
Sunday, 12 May 2013
ApacheConNA: On Security
Inductive Bias | 20:22, Sunday, 12 May 2013
During the security talk at Apache Con a topic commonly glossed over by
developers was covered in quite some detail: With software being developed that
is being deployed rather widely online (over 50% of all websites are powered
by the Apache webserver) naturally security issues are of great concern.
Currently there are eight trustworthy people on the foundation-wide security
response team, subscribed to security@apache.org. The team was started by
William A. Rowe when he found a vulnerability in httpd. The general work mode -
as opposed to the otherwise “all things open” way of doing things at Apache -
is to keep the issues found private until fixed and publicise widely
afterwards.
So when running Apache software on your servers - how do you learn about
security issues? There is no such thing as a priority list for specific
vendors. The only way to get an inside scoop is to join the respective
project’s PMC list - that is to get active yourself.
So what is being found? 90% of all security issues are found by security
researchers. The remaining 10% are usually published accidentally - e.g. by
users submitting the issue through the regular public bug tracker of the
respective project.
In Tomcat so far no issue has been disclosed w/o letting the project know. httpd
still is the prime target - even of security researchers who are in favour of
a full disclosure policy - the PMC cannot do a lot here other than fix issues
quickly (usually within 24 hours).
As a general rule of thumb: Keep your average release cycle time in mind - how
long will it take to get fixes into people’s hands? Communicate transparently
which version will get security fixes - and which won’t.
As for static analysis tools - many of those are written for web apps and as
such not very helpful for a container. What is highly dangerous in a web app
may just be the thing the container has to do to provide features to web apps.
As for Tomcat, they have had good experiences with Findbugs - most others have
too many false positives.
When dealing with a vulnerability yourself, try to get guidance from the
security team on what is actually a security vulnerability - though the final
decision is with the project.
Dealing with the tradeoff of working in private vs. exposing users affected by
the vulnerability to attacks is up to the PMC. Some work in public but call the
actual fix a refactoring or cleanup. Given enough coding skills on the attacker
side this of course will not help too much as sort of reverse engineering what
is being fixed by the patches is still possible. On the other hand doing
everything in private on a separate branch isn’t public development anymore.
After this general introduction Mark gave a good overview of the good, the bad
and the ugly way of handling security issues in Tomcat. See his slides for
details (including an anecdote of what according to the timing and topic looks
like it was highly related to the 2011 Java Hash Collision talk at Chaos
Communication Congress).
Saturday, 11 May 2013
ApacheConNA: On documentation
Inductive Bias | 20:20, Saturday, 11 May 2013
In her talk on documentation on OSS Noirin gave a great wrap up of the topic of
what documentation to create for a project and how to go about that task.
One way to think about documentation is to keep in mind that it fulfills
different tasks: There is conceptual, procedural and task-reference
documentation. When starting to analyse your docs you may first want to debug
the way it fails to help its users: “I can’t read my mail” really could mean
“My computer is under water”.
A good way to find awesome documentation can be to check out Stackoverflow
questions on your project, blog posts and training articles. Users today really
are searching instead of browsing docs. So where to find documentation actually
is starting to matter less. What does matter though is that those pages with
relevant information are written in a way that makes it easy to find them
through search engines: Provide a decent title, stable URLs, reasonable tags
and descriptions. By the way, both infra and docs people are happy to talk to
*good* SEO guys.
In terms of where to keep documentation:
For conceptual docs that need regular review it’s probably best to keep them in
version control. For task documentation steps should be easy to upgrade once
they fail for users. Make sure to accept bug reports in any form - be it on
Facebook, Twitter or in your issue tracker.
When writing good documentation always keep your audience in mind: If you don’t
have a specific one, pitch one. Don’t try to cater for everyone - if your docs
are too simplistic or too complex for others, link out to further material.
Understand their level of understanding. Understand what they will do after
reading the docs.
On a general level always include an about section, a system overview, a
description of when to read the doc, how to achieve the goal, provide
examples, provide a trouble shooting section and provide further information
links. Write breadth first - details are hard to fill in without a bigger
picture. Complete the overview section last. Call out context and
prerequisites explicitly, don’t make your audience do more than they really
need to do. Reserve the details for a later document.
In general the most important and most general stuff as well as the basics
should come first. Mention the number of steps to be taken early. When it comes
to providing details: The more you provide, the more important the reader will
deem that part.
Experiment: There may be confidential content in your search results. Please do not share outside Google. – Missing Q and A’s?
anna.morris's blog | 14:18, Saturday, 11 May 2013
As a follow-up to the recent “Experiment: There may be confidential content in your search results. Please do not share outside Google.” incident which I blogged about here, I searched for more info using Google. Found a few odd little things…
Some* questions about it on Yahoo have been removed.
I wonder which part of their Community Guidelines forbids discussing GoogleBugs?
Perhaps stranger still, looks like some user replies have been deleted and some redirects have been made too.
A little later in this thread, someone named epontius said that other threads had been redirected to this single thread because “This one is being monitored by staff… so the other threads concerning this issue have been redirected to this one to keep things in one place.”
Given that Google recently proclaimed search results should be impartial (and so should not (necessarily) be included in the “right to be forgotten” protection), I really trust that they wouldn’t mess about with links and users’ questions for something so trivial as a technical hiccup….
jus’sayin…
* (one or more, hard to tell: several links in search results but all leading one URL)
Friday, 10 May 2013
ApacheConNA: On delegation
Inductive Bias | 20:19, Friday, 10 May 2013
In her talk on delegation Deb Nicholson touched upon a really important topic in
OSS: Your project may live longer than you are willing to support it yourself.
The first important point about delegation is to delegate - and to not wait
until you have to do it. Soon you will realise that mentoring and delegation
actually is a way to multiply your resources.
In order to delegate, people to delegate to are needed. To find those it can be
helpful to understand what motivates people to work in general as well as on
open source in particular: Sure, fixing a given problem and working on great
software projects may be part of it. As important though is recognition
individually and in groups of people.
Keeping that in mind, “Thanking” is actually a license to print free money in
the open source world. Do it in a verbose manner to be believable, do it in
public and in a way that makes your contributors feel a little bit of glory.
Another way to lead people in is to help out socially: Facilitate connections,
suggest connections, introduce people. Based on the diversity of the project
you are working on you may be in a way larger network and have access to much
more corporations and communities than any peer who is not active. Use that
potential.
Also when leading OSS projects keep an eye on people being rude: Your project
should be accessible to facilitate participation.
In case of questions treat them as a welcome opportunity to pull a new
community member in: Answer quickly, answer on your list, delegate to middle
seniors to pull them in. Have training missions for people who want to get
started and don’t know your tooling yet. Have prepared documents to provide
links to in case questions occur.
In Apache we tend to argue people should not fall victim of volunteeritis.
Another way to put that is to make sure to avoid the licked cookie syndrome:
When people volunteer to do a task and never re-appear that task is tainted
until explicitly marked as “not taken” later on. One way to automate that is
to have a fixed deadline after which tasks are automatically marked as free to
take and tackle by anyone.
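The fixed-deadline idea can be sketched in a few lines of Python. This is
only an illustration, not something shown in the talk: the Task record, the
release_stale_claims function and the 30-day CLAIM_TTL are all assumed names
and values, and a real issue tracker would supply its own data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed deadline; pick whatever suits your project.
CLAIM_TTL = timedelta(days=30)


@dataclass
class Task:
    """Hypothetical task record; a real tracker has its own data model."""
    title: str
    claimed_by: Optional[str] = None
    claimed_at: Optional[datetime] = None


def release_stale_claims(tasks: List[Task], now: datetime,
                         ttl: timedelta = CLAIM_TTL) -> List[Task]:
    """Mark tasks as free again once their claim is older than ttl.

    Returns the tasks that were released so they can be announced.
    """
    released = []
    for task in tasks:
        is_claimed = task.claimed_by is not None and task.claimed_at is not None
        if is_claimed and now - task.claimed_at > ttl:
            task.claimed_by = None
            task.claimed_at = None
            released.append(task)
    return released
```

Run periodically (for example from a nightly job), this would return the
tasks whose claims have expired, so they can be announced on the list as
free to take again.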
When it comes to the question of when to write documentation: There really is
no point in time that should stop you from contributing docs - all the way from
just above getting started level (writing the getting started docs for those
following you) up to the “I’m an awesome super-hacker” mode for those trying
to hack on similar areas.
Especially when delegating to newbies make sure to set the right expectations:
How long is it going to take to fix an issue, what is the task complexity, tell
them who is going to be involved, who is there to help out in case of
roadblocks.
In general make sure to be a role model for the behaviour you want in your
project: Ask questions yourself, step back when you have taken on too much,
appreciate people stepping back.
Understand the motivation of your newcomers - try to talk to them one on one
to understand their motivation and help to align work on the project with their
life goals. When starting to delegate, start with tasks that seem too small to
delegate at all to get new people familiar with the process - and to get
yourself familiar with the feeling of giving up control. Usually you will need
to pull apart tasks that were previously done by one person. Don’t look for a
replacement person - instead look for separate tasks and how people can best
perform these.
Make visible and clear what you need: Is it code or reviews? Documentation or
translations, UX helpers? Incentivise what you really need - have code sprints,
gamify the process of creating better docs, put the logo creation under a
challenge.
All of this is great if you have only people who all contribute in a very
positive way. What if there is someone whose contributions are actually
detrimental to the project? How to deal with bad people? They may not even do
so intentionally… One option is to find a task that better suits their
skills. Another might be to find another project for them that better fits
their way of communicating. Talk to the person in question, address head on
what is going on. Talking around or avoiding that conversation usually only
delays and enlarges your problem. One simple but effective strategy can be to
tell people what you would like them to do in order to help them find out that
this is not what they want to do - that they are not the right people for you
and should find a better place.
More on this can be found in material like “How assholes are killing your
project” as well as the “Poisonous people talk” and the book “Producing
open source software”.
On the how of dealing with bad people: make sure to criticise privately first,
and check in a backchannel with other committers for their opinion - otherwise
you might be lonely very quickly. Keep to criticising the behaviour instead of
the person. Most people really do not want to be a jerk.