Chris Hager
Programming, Technology & More

How to get 4 to 5 stars in the Android market (Appirater for Android)

Short answer: Ask your most engaged users to rate the app (e.g. with a tool such as AppRater).

iPhone developer Amro Mousa published a post yesterday with recommendations on how to reach a high average rating on the App Store. Many app users are only reminded to rate an app when uninstalling it, which naturally leads to less-than-optimal reviews and ratings.

The reality is some developers pay for downloads and reviews to get higher rankings on the App Store. It’s tough for Apple to do much about it since the sales look legitimate. It’s easy to be frustrated by this sort of thing but there are two things you can do to beat it (ymmv, of course):

1) Release a good app
2) Use Appirater

Appirater asks your users to review your app after some conditions are met. …

More specifically, the iOS library Appirater prompts users to rate the app once it has been launched at least 15 times and was installed at least 30 days ago. This targets particularly engaged users, who are more likely to give a great rating.
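The prompt condition described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the Appirater code or the Android helper class; the function name `should_prompt` and its arguments are assumptions made for this sketch, and only the two thresholds come from the text.

```python
import time

# Thresholds taken from the description above.
DAYS_UNTIL_PROMPT = 30
LAUNCHES_UNTIL_PROMPT = 15

SECONDS_PER_DAY = 24 * 60 * 60


def should_prompt(launch_count, first_launch_ts, now=None):
    """Return True once both the launch and the age threshold are met.

    launch_count    -- number of app launches so far (hypothetical counter)
    first_launch_ts -- Unix timestamp of the first launch (hypothetical)
    """
    if now is None:
        now = time.time()
    days_installed = (now - first_launch_ts) / SECONDS_PER_DAY
    return (launch_count >= LAUNCHES_UNTIL_PROMPT
            and days_installed >= DAYS_UNTIL_PROMPT)
```

An app would persist the launch counter and the first-launch timestamp between runs and evaluate this check on every start.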

After seeing a number of people ask for an Android equivalent, I wrote up a simple helper class that does this.

Usage is simple: After adding your APP_TITLE and APP_PNAME and perhaps adjusting DAYS_UNTIL_PROMPT and LAUNCHES_UNTIL_PROMPT, simply call app_launched() from your Activity’s onCreate method:


The dialog’s user interface is still very basic. If you have improvements for it, please let me know and I will add them!


Scaling Python Servers with Worker Processes and Socket Duplication

Developing servers that scale is usually quite tricky, even more so with Python and its lack of worker threads that can run on multiple CPU cores [1]. A possible solution is worker processes that duplicate the client’s socket, a technique that allows the workers to process requests and send responses directly to the client socket. This approach is particularly useful for long-lasting connections with more than one request per session.

You can skip the introduction to jump directly to the Python section. [Update: With this approach I was not able to cleanly close all sockets. Be sure to check with lsof.]

Basics: TCP Servers

A TCP server binds to a specific local port and starts listening for incoming connections. When a client connects to this port, the server accepts the connection and waits for incoming data. Data usually arrives in chunks, and the server tries to find the end of a request by looking for delimiter characters or by a specified length (which might be indicated in the first few bytes, as in protocol buffers).
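As a minimal sketch of length-based framing: the function below pulls complete requests out of a receive buffer, assuming each request is preceded by a 4-byte big-endian length. That prefix format is chosen only for illustration and is not tied to any particular protocol; the name `extract_messages` is likewise made up for this example.

```python
import struct

HEADER_SIZE = 4  # assumed framing: 4-byte big-endian length prefix


def extract_messages(buffer):
    """Split a receive buffer into complete messages and the unread rest."""
    messages = []
    while len(buffer) >= HEADER_SIZE:
        (length,) = struct.unpack("!I", buffer[:HEADER_SIZE])
        end = HEADER_SIZE + length
        if len(buffer) < end:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[HEADER_SIZE:end])
        buffer = buffer[end:]
    return messages, buffer
```

The server appends every received chunk to the buffer, calls the function, and keeps the returned remainder for the next chunk.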

Often one request contains a number of values, which may have to be deserialized with a protocol such as JSON, Protocol Buffers, Avro, Thrift or any of the other established or self-invented serialization protocols. After the incoming bytestream has been deserialized, the request can be processed.

Finally, the server may or may not respond to the client’s request, and may close the socket connection at any time.


Now, what happens if 10,000 clients try to connect to your server concurrently and start sending requests and, in particular, how will it impact the response time? The C10K Problem is a classic resource which discusses this exact situation and provides a few general approaches, which boil down to:

  1. One thread, one client
  2. One thread, many clients
  3. Build the server code into the kernel

We usually don’t want to do the last of these, and the general advice is to avoid running thousands of concurrent threads. Therefore the question becomes how to handle many clients within one thread.

Asynchronous Servers

For one thread to manage multiple clients, the sockets have to be handled in an asynchronous fashion. That is, the server provides the operating system with a list of file descriptors, and receives a notification as soon as any socket has an event (read, write, error) ready to process. Operating system calls that provide this functionality include select, poll, epoll and kqueue. This approach makes it possible to develop scalable single-threaded servers, such as Facebook’s Tornado web server (free software, written in Python).
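A minimal sketch of this event loop, using the standard library `selectors` module (which picks epoll, kqueue, poll or select depending on the platform). The echo behaviour and the helper names `make_server` and `poll_once` are assumptions for illustration; this is not how Tornado is implemented.

```python
import selectors
import socket


def make_server(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket and register it for reads."""
    sel = selectors.DefaultSelector()
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, data="accept")
    return sel, server


def poll_once(sel, timeout=1.0):
    """Handle every socket the operating system reports as readable."""
    for key, _events in sel.select(timeout):
        if key.data == "accept":
            conn, _addr = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data)  # echo back; fine for small payloads
            else:
                sel.unregister(conn)  # client closed the connection
                conn.close()
```

A real server would call `poll_once` in an endless loop and buffer outgoing data instead of calling `sendall` on a non-blocking socket.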

The critical point is processing time per request. If one request takes 1 ms to process and answer, a single-threaded server is limited to 1,000 requests per second.

Distributing the Load

There are various approaches for distributing incoming connections in order to reach a higher number of concurrently processed requests.

Load balancers distribute incoming requests across servers, usually proxying all traffic in both directions.

Reverse proxies allow you to run multiple server processes on different local ports, and distribute incoming connections across them. This works very well, in particular for short-lived connections such as HTTP requests. Well-known reverse proxies include Nginx, Varnish and Squid.

Socket accept preforking is a technique that allows multiple processes to listen on the same port and, in turn, pick up new incoming connections and handle the client sockets independently. This works by having one process open the server socket and then forking itself, which copies all existing file descriptors for the children to use.

  • man fork: “… The child inherits copies of the parent’s set of open file descriptors.”

Socket accept preforking works very well; popular projects that successfully employ this approach include Gunicorn, Tornado and Apache.
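A minimal sketch of accept preforking on a Unix system: the parent opens the listening socket, then forks, and every child calls accept() on the inherited descriptor. For brevity each child here serves exactly one connection and replies with its own process id; the function name `prefork` and that reply format are assumptions for this example, not taken from any of the projects named above.

```python
import os
import socket


def prefork(num_workers):
    """Open a listening socket, then fork workers that accept on it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(16)
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: the listening socket was inherited through fork().
            try:
                conn, _addr = server.accept()
                conn.sendall(str(os.getpid()).encode("ascii"))
                conn.close()
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    return server, pids
```

A production server would let each child loop over accept() and have the parent restart children that die.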

Worker threads process requests in the background while the socket thread can get back to waiting for events from the operating system. They usually listen on a queue from the main thread and receive the client socket’s file descriptor or also the incoming bytestream, process it and send the response back to the client. Developers need to be very careful with locking issues when accessing shared state/variables.

Naturally, worker threads are one of the best explored ways of distributing the work and having the main thread concentrate on the socket events.
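The worker thread pattern can be sketched with the standard library `queue` and `threading` modules. The request format (a client id plus a payload), the upper-casing "work" and the function names are assumptions for illustration; a real server would send the response to the client socket instead of putting it on a second queue.

```python
import queue
import threading


def worker(requests, responses, stats, stats_lock):
    """Take jobs from the queue until a None sentinel arrives."""
    while True:
        job = requests.get()
        if job is None:
            break
        client_id, payload = job
        response = payload.upper()  # stand-in for real request processing
        with stats_lock:            # shared state must be locked
            stats["handled"] += 1
        responses.put((client_id, response))


def process_with_workers(jobs, num_workers=4):
    """Run the jobs through a small pool of worker threads."""
    requests, responses = queue.Queue(), queue.Queue()
    stats, stats_lock = {"handled": 0}, threading.Lock()
    threads = [
        threading.Thread(target=worker,
                         args=(requests, responses, stats, stats_lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:
        requests.put(job)
    for _ in threads:
        requests.put(None)  # one stop sentinel per worker
    for thread in threads:
        thread.join()
    results = {}
    while not responses.empty():
        client_id, response = responses.get()
        results[client_id] = response
    return results, stats["handled"]
```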


Python’s interpreter lets only one thread execute Python code at a time, thereby forcing all threads of a process to share a single CPU core. Context switches between the threads take place at the latest after 100 Python instructions. Inside Python the controlling mechanism is called the Global Interpreter Lock (GIL) [1].

In a server, that means that as soon as one worker thread uses the CPU, it steals processing time from the socket handling thread, which makes a traditional worker thread architecture infeasible. But there is an alternative: worker processes.


[Update: With this approach I was not able to cleanly close all open sockets. Be sure to check with lsof.]

Duplicating Sockets across Processes

Given the ability to pickle a socket, worker processes can do the same thing as worker threads: receive a client’s socket file descriptor and possibly the bytestream, process it, and send a response without requiring a callback to the main socket handler process.

Sockets can be pickled and reconstructed with an almost hidden, undocumented feature of Python’s multiprocessing module: multiprocessing.reduction.


I discovered this module recently through an inline comment in one of the examples in the multiprocessing docs, and wanted to find out more:

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing.reduction
>>> help(multiprocessing.reduction)

Help on module multiprocessing.reduction in multiprocessing:





# Module to allow connection and socket objects to be transferred
# between processes
# multiprocessing/
# Copyright (c) 2006-2008, R Oudkerk --- see COPYING.txt

Not much information, but at least we know where to look: multiprocessing/

And there it is, starting at line 122:

# Functions to be used for pickling/unpickling objects with handles

def reduce_handle(handle):
    if Popen.thread_is_spawning():
        return (None, Popen.duplicate_for_child(handle), True)
    dup_handle = duplicate(handle)
    sub_debug('reducing handle %d', handle)
    return (_get_listener().address, dup_handle, False)

def rebuild_handle(pickled_data):
    address, handle, inherited = pickled_data
    if inherited:
        return handle
    sub_debug('rebuilding handle %d', handle)
    conn = Client(address, authkey=current_process().authkey)
    conn.send((handle, os.getpid()))
    new_handle = recv_handle(conn)
    return new_handle

Example Code

Putting it all together, this is how to use the above functions to share sockets with another process in Python:

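# Main process
from multiprocessing.reduction import reduce_handle
h = reduce_handle(client_socket.fileno())
# Assumed step: pass the reduced handle to the worker, for example
# over the main process's end of a multiprocessing Pipe.
pipe.send(h)

# Worker process
import socket
from multiprocessing.reduction import rebuild_handle
h = pipe.recv()                 # the worker's end of the pipe
fd = rebuild_handle(h)          # file descriptor valid in this process
client_socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
client_socket.send("hello from the worker process\r\n")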
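For comparison, here is a rough Python 3 sketch of the same idea on a Unix system. In Python 3, multiprocessing.reduction provides send_handle() and recv_handle(), which pass a file descriptor over a connection object. The socket pair stands in for an accepted client connection, and the names `worker`, `pass_socket_to_worker` and `GREETING` are made up for this example; the "fork" start method is assumed so the sketch runs without a `__main__` guard.

```python
import multiprocessing
import socket
from multiprocessing.reduction import recv_handle, send_handle

GREETING = b"hello from the worker process\r\n"


def worker(conn):
    """Receive a socket file descriptor and answer on it directly."""
    fd = recv_handle(conn)
    with socket.socket(fileno=fd) as client_socket:
        client_socket.sendall(GREETING)


def pass_socket_to_worker():
    ctx = multiprocessing.get_context("fork")  # assumption: Unix with fork
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn,))
    proc.start()
    # Created after the fork, so the worker does not simply inherit it.
    # remote_end plays the client; client_socket is "our" accepted socket.
    remote_end, client_socket = socket.socketpair()
    try:
        send_handle(parent_conn, client_socket.fileno(), proc.pid)
        client_socket.close()  # the worker now holds its own duplicate
        remote_end.settimeout(5)
        return remote_end.recv(1024)
    finally:
        remote_end.close()
        proc.join()
```

Closing the main process's copy right after sending matters: the connection only shuts down once every duplicate of the descriptor has been closed, which is worth checking with lsof.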


  1. Python, Threads and the Global Interpreter Lock (GIL)
  2. The C10K Problem
  3. Global Interpreter Lock


Thanks to Brian Jones for reading drafts of this post.



Unicode and UTF Overview

This post is a brief technical overview of Unicode, a widely used standard for multilingual character representation, and the family of UTF-x encoding algorithms. First a brief introduction to Unicode:

Unicode is intended to address the need for a workable, reliable world text encoding.

Unicode could be roughly described as “wide-body ASCII” that has been stretched to 16 bits to encompass the characters of all the world’s living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.

Character Representation: Code Points and Planes

The reference to a specific character is called a code point. ASCII, for example, uses 7 bits per character, which allows for 2^7 = 128 different characters (code points); 8-bit extensions such as Latin-1 allow for 2^8 = 256.

Unicode addresses a code point with 16 bits (2 bytes) within a plane, and furthermore associates each code point with one of 17 planes. Unicode therefore provides 2^16 = 65,536 unique code points per plane, for a maximum of 2^16 * 17 = 1,114,112 unique code points in total.

At the time of writing, only 6 of the 17 available planes are used:

Plane    Unicode repr.          Description
0        U+0000 … U+FFFF        Basic Multilingual Plane
1        U+10000 … U+1FFFF      Supplementary Multilingual Plane
2        U+20000 … U+2FFFF      Supplementary Ideographic Plane
14       U+E0000 … U+EFFFF      Supplementary Special-purpose Plane
15-16    U+F0000 … U+10FFFF     Private Use Area

Code points of the first plane fit into two bytes; code points of all other planes require a third byte to indicate the plane.
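The arithmetic behind planes is easy to check in code. This small sketch (the names `plane_of`, `PLANE_SIZE` and `NUM_PLANES` are made up for illustration) derives the plane of a code point by dropping its lower 16 bits:

```python
PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
NUM_PLANES = 17


def plane_of(code_point):
    """Return the plane number (0..16) a code point belongs to."""
    if not 0 <= code_point < PLANE_SIZE * NUM_PLANES:
        raise ValueError("not a valid Unicode code point")
    return code_point >> 16  # everything above the lower 16 bits
```

For example, `plane_of(0x20AC)` (the euro sign) gives plane 0, while `plane_of(0x10FFFF)` gives plane 16, the last one.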

Code points U+0000 to U+00FF (0-255) are identical to the Latin-1 values, so converting between them simply requires converting code points to byte values. In fact, any document containing only the 128 characters of the ASCII character map (code points 0-127) is a perfectly valid UTF-8 encoded Unicode document.

Character Encoding: UTF-8, 16 and 32


>>> u = u"€"
>>> u
>>> bytearray(u)
Traceback (most recent call last):
  File "", line 1, in
TypeError: unicode argument without an encoding

This is where the Unicode Transformation Formats (UTF) come into play. The UTF-8, UTF-16 and UTF-32 encodings store any given Unicode code point as a variable number of 8-bit blocks, as one or two 16-bit blocks, or as one 32-bit block, respectively.


UTF-8 is a variable-width encoding, with each Unicode character represented by one to four bytes. A main advantage of UTF-8 is backward compatibility with the ASCII charset, which allows us to use the same decoding function for both ASCII text and UTF-8 encoded Unicode text.

If the character is encoded into just one byte, the high-order bit is 0 and the other bits represent the code point (in the range 0..127). If the character is encoded into a sequence of more than one byte, the first byte has as many leading ’1′ bits as the total number of bytes in the sequence, followed by a ’0′ bit, and the succeeding bytes are all marked by a leading “10″ bit pattern. The remaining bits in the byte sequence are concatenated to form the Unicode code point value.
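The bit layout described above can be written out as a small hand-rolled encoder. This is a sketch for illustration only (the name `utf8_encode` is made up, and it does not reject surrogate code points the way a full implementation would); in practice you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(code_point):
    """Encode one code point into UTF-8 bytes, following the bit patterns."""
    if code_point < 0x80:                      # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:                 # 11110xxx and three 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("not a valid Unicode code point")
```

The euro sign U+20AC falls into the three-byte branch and becomes the bytes E2 82 AC.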


UTF-16 uses two bytes (one 16-bit unit) for each code point of the “Basic Multilingual Plane” (U+0000 to U+FFFF). Code points of the other planes do not fit into 16 bits, so UTF-16 encodes each of them as two 16-bit units, called a surrogate pair (four bytes in total).
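How a code point outside the Basic Multilingual Plane is split into a surrogate pair can be sketched as follows. The function name `to_surrogate_pair` is made up for this example; the constants 0xD800 and 0xDC00 are the starts of the high and low surrogate ranges.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low
```

For U+1F600 this gives the pair 0xD83D, 0xDE00, which matches what Python's own "utf-16-be" codec produces.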


UTF-32 always uses exactly four bytes for encoding each Unicode code point. If no endianness is specified, Python additionally prepends a byte order mark to the output.


    • UTF-8 can encode any code point of any plane, and compresses lower code points into fewer bytes (e.g. the ASCII charset into 1 byte). UTF-8 furthermore shares a common encoding with the 128 code points of the ASCII character set. Recommended for everything related to text.
    • UTF-16 always saves 16-bit blocks without compression. A Unicode character of a higher plane than 0 does not fit into 16 bits, so UTF-16 needs two 16-bit groups to represent it. The euro € sign in the example below lies in plane 0 and therefore fits into a single 16-bit group.
    • UTF-32 encodes all Unicode code-points, but always saves 32 bit groups with no compression
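The summary above can be checked by comparing encoded lengths. The sketch below is written for Python 3 (unlike the Python 2 sessions in this post) and uses the explicit little-endian codecs so that no byte order mark is counted; the helper name `encoded_lengths` is made up for this example.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_lengths(text):
    """Return the number of bytes each encoding needs for the text."""
    return {name: len(text.encode(name)) for name in ENCODINGS}


for char in ("a", "\u20ac", "\U0001F600"):  # ASCII, euro sign, plane 1
    print("U+%04X" % ord(char), encoded_lengths(char))
```

An ASCII letter takes 1, 2 and 4 bytes respectively, the euro sign 3, 2 and 4, and a plane 1 character 4 bytes in all three encodings.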



>>> u = u"a"
>>> u
>>> repr(u.encode("utf-8"))
>>> repr(u.encode("utf-16"))    # no endianess specified
>>> repr(u.encode("utf-16-le")) # little endian byte order
>>> repr(u.encode("utf-16-be")) # big endian byte order
>>> repr(u.encode("utf-32"))
>>> repr(u.encode("utf-32-le"))
>>> repr(u.encode("utf-32-be"))

>>> u = u"€"
>>> u
>>> repr(u.encode("utf-8"))
>>> repr(u.encode("utf-16"))
"'\xff\xfe\xac '"
>>> repr(u.encode("utf-16-le"))
"'\xac '"
>>> repr(u.encode("utf-16-be"))
"' \xac'"
>>> repr(u.encode("utf-32"))
"'\xff\xfe\x00\x00\xac \x00\x00'"
>>> repr(u.encode("utf-32-le"))
"'\xac \x00\x00'"
>>> repr(u.encode("utf-32-be"))
"'\x00\x00 \xac'"


Please leave a comment if you have feedback or questions!
