[governance] Towards a data analytic society

Yehuda Katz yehudakatz at mailinator.com
Wed Jun 11 19:51:58 EDT 2008


This article held an interesting insight, and as I am 'famous' for posting items
that may not appear relevant to IG (as Suresh will undoubtedly point out ;-)
in the current realm of IG relevance, I was compelled to post it for its future
value.

Article Excerpt:

"... The world is, of course, always changing and has always done so.
Sometimes, however, it changes in a way that it becomes a different world:
agrarian, capitalist, industrial revolutions, or the invention of printing, are
examples. I’m fairly sure that future historians will point to the early 21st
century as a time when the ways in which both societies and their individual
members think about themselves went through that sort of radical shift within a
very short time. And they will identify the transfer of data analytic
approaches from scientific computing specialist to general population as the
responsible agent. ..."

--

Towards a data analytic society
Felix Grant on the use of statistics in the analysis of society

Scientific Computing World / DATA ANALYSIS: SOCIETY April/May 2008
http://www.scientific-computing.com/features/feature.php?feature_id=192

A city of 600k, drug-tested in one fell swoop. No exceptions, no consent
sought. Dosage of everything from sugar to crystal meth recorded, tabulated,
sorted and analysed.

Not a scene from a paranoid TV drama, but a good example of scientific research
interacting noninvasively with social policy. The project has not, so far as I
know, yet been formally published at the time of writing, though it was
presented to the American Chemical Society in August[1] and there has been
plenty of press commentary[2][3]. Forty communities in Oregon were initially
tested, with more planned, analysing small samples of water entering sewage
treatment plants. And, interestingly, this ‘community urinalysis’ won at
least guarded acceptance across the spectrum of opinion, from enforcement
agencies to drug users.

Shift up several levels, for example to the UNAIDS[4] mapping of AIDS
incidence[5] in terms of continental populations, and there is little public
attention at all. Shift down to the level where those citizens are themselves
units of analysis, and they become less happy – but, despite all those
paranoid TV dramas, less so than would once have been the case. 

There is increasing acceptance at all levels of the data analytic as a default
approach to life. The fact of analysis occurring in societal contexts, in and
of itself, has social impact. Computing in general, and scientific computing in
particular, have brought many changes, but a data analytic view of the world is
the one which will most separate the future from the past. We are in the middle
of a revolution in the way populations and individuals think about the world,
and computerised science is the trigger.

There are three main components here: application of scientific methods to
social policy, subliminal acceptance by individuals of a statistical view of
themselves, and adoption of such views by individuals in looking outward onto
their world.

The AIDS mapping seems, to most people not affected by the issue, to have
little to do with them at all. Most would say, if asked, that the figures
illustrated are too big and too far away to handle. But such illustrations
have, through campaign posters and public education leaflets, become part of
the background informational wallpaper of life and this, in itself,
acclimatises us to the normality of data analytic presentation. This acceptance
of the top (social policy) layer of analysis is what progressively extends the
middle layer where community urinalysis is now cautiously accepted.

That acceptance could not always have been taken for granted. The apparent
generality of the results, describing whole communities in broad brush, makes
the whole exercise seem abstract – not so very different from the
international AIDS mapping. Not so long ago, though, it would have seemed an
invasion. Civil rights concerns would have seemed more of an issue. The
perception has shifted: statistical aggregate scientific description of a
social state has moved down from the international to the intranational or even
lower, because it looks much the same. 

Community urinalysis makes it possible to very closely define and compare the
level of (for example) cocaine use in Portland and Salem. Better still, those
levels can be tracked very precisely over time, using frequent sample
extraction. Daily sampling, or even several times a day, has already been
suggested as a way to track the spread of new substances – methamphetamines
being a topical chronic example; localised batches of badly cut heroin with
toxic fillers are a recurrent acute case. In the Oregon study, fine-grained
analysis of this kind showed methamphetamine usage to be geographically
heterogeneous to a high degree and comparatively stable over time, while cocaine
use peaks and troughs on a weekly cycle.

All of this, while staying general, produces specific changes that impact
individuals. The fact that information on distribution of proscribed substance
usage exists will inevitably influence the distribution of infrastructure
funding. At least one European health authority is looking at the work in
relation to sociospatial resource targeting. At least one police force has, as
a direct result of the Oregon study reports, expensively seconded an officer to
commercial premises with the broader agenda of acquiring expertise in the use
of Sanitas groundwater analysis software.

At least one military intelligence agency, with existing in-house analytic
expertise in using SAS software to explore the strategic implications of water
shortage, is interested in taking urinalysis down to smaller units than the
city. 

Historically, there is a Darwinian social entropy in application of
technologies and scientific methods. As long as they yield results, they
diffuse through a society and become part of its fabric. The process is
generally irreversible: the society becomes dependent, and withdrawal would be
too traumatic. This is not doom-mongering: the diffusions are usually, at least
in the long run and the broad picture, advantageous, and this one will be no
different. There are cases (DDT comes to mind) where reversals do occur, but
the principle is there: once community urinalysis has been accepted, it will
progress and is unlikely to be abandoned. This principle applies also to the
extending acceptance of statistical approaches to issues, and in particular to
computer driven data analysis. 

Not only does it now underpin everything we do, but it propagates through what
we are at a rate which far outpaces any other in history. The world is, of
course, always changing and has always done so. Sometimes, however, it changes
in a way that it becomes a different world: agrarian, capitalist, industrial
revolutions, or the invention of printing, are examples. I’m fairly sure that
future historians will point to the early 21st century as a time when the ways
in which both societies and their individual members think about themselves
went through that sort of radical shift within a very short time. And they will
identify the transfer of data analytic approaches from scientific computing
specialist to general population as the responsible agent.

The penetration of individual social perception by scientific approaches
begins, naturally enough, with scientists. Sam Roberts, application engineer at
the MathWorks with a previous background in big pharma, recently mentioned to
me in passing that he went into that field partly out of concern over the
ethics of animal testing. He didn’t go onto the streets, nor into politics:
he sought to shift the area of his concern from physical to conceptual realms.
The microarray is, perhaps, the best symbol of modern science – which would
have seemed quite extraordinary to practitioners of even 30 years ago.


Looking at the vertical markets of a company like the MathWorks, and the
products which have evolved to serve them, is instructive from a sociodynamic
viewpoint. Matlab, as I’ve discovered over the last year or two, is as likely
to be used in finance as in automotive and aeronautic control engineering. 

Indeed, control engineering is a concept that has escaped from its box, a term
as likely to be used by molecular chemists or biologists as by the designers of
jet aircraft. SimBiology, a market-specific Matlab extension into life science,
applies Gillespie-style discrete event simulation modelling and other
stochastic approaches to the study of biological systems and their components.
If this all seems commonplace and obvious to you, ask yourself two questions:
when did it become so, and what does your answer tell you about the rate of
change in unconscious habits of thought in the society of which you are part?

This move from directly thought models of the world to statistically evolved
ones, though a direct result of scientific method, is adopted and adapted by
those who would never think of themselves as scientists or even scientific.
Computerisation doesn’t just make data analytic approaches ever more rapid
and efficient; it drives their osmotic spread throughout human society.
Gradually, such approaches become more and more commonly accepted as bases for
decisions and judgements – not only by governments, but the governed too. In
itself, this can only be a good thing, but it does release social forces that
are not always predictable. I could follow the chain on from pharmaceuticals
through industry to general commerce and into politics, the place of personal
cellular communications, and so on, but space doesn’t permit it. Let’s,
instead, make a leap straight down to the bottom of the pyramid. 

Scientists are also people, with families and friends and children: the
conversation with Sam Roberts, above, started from a shared interest in
out-of-hours work with teachers who want to encourage scientific thinking in
children. Teaching is, as Postman and Weingartner[6] told us, a subversive
activity; it is now the infection vector bringing together analytic thinking
with the spread of cheap high technology.

In SCW’s website education pages, last year[7], a teacher described an
experiment in which 10- and 11-year-olds analytically considered development
options within their school, relating funding options to costs and benefits.
She commented that they were ‘interested in... using such methods to explore
problem solving choices’, and ‘was astonished at... the degree of
sophistication in their handling’. This is welcome news to those, like me,
who are exercised by lack of critical thinking skills in new undergraduates; it
may be seen as a mixed blessing to governments making such funding allocations
on behalf of electorates.

In the last couple of weeks I have seen this approach applied closer to the
bone, by children of the same age using personal computers to evaluate through
administrative spreadsheets the effectiveness of their own teachers. It’s
not easy to hand over power in that way, but it’s certainly a good way to
start building a critically-aware citizenry of the future.

From computer access to a connected machine in every school bag, 24 hours a day
and seven days a week, is a huge leap – but thoroughly serious programmes are
under way to bring it about. This has many implications, but I’m concerned
here with those that flow from resulting universal access to easily used data
stores and analytic tools. For most people, the first doorway through the
wardrobe into data analysis is the ubiquitous spreadsheet.

In affluent societies the initial running is being made by ‘netbook’
machines based on Intel’s Classmate pattern. One manifestation of this is a
range of durable subnotebook machines from Asus (see the separate review on the
SCW website[8] for more detail), with prices starting from roughly €200. A
public falling out with the ‘One Laptop Per Child’ (OLPC) project[9],
dedicated to supplying children in the developing world with laptop computers,
has intensified rather than cooled the race to put a processor in every pocket.


OLPC machines are rugged, use low power consumption processors (supplied, in
the absence of Intel, by AMD), charge from hand-cranked generators, have
built-in wireless internet, and cost just over €130 (target cost, as economies
of scale kick in, about €70). And their bundled office suite includes the
all-important spreadsheet. The social implications of this are incalculable.
Many rural African children have no paper, sharing slates between students;
books and teachers may be in short supply; classes may be large and the student
centred ideal is an impossible dream. The arrival of OLPC in such places will
be an even greater revolution than in industrialised societies, leapfrogging
more than a century of educational evolution. 

How governments buying these machines will be affected by their future results
is anybody’s guess. M, a teacher in a developing world school that he
doesn’t want identified, describes how he used a new supply of laptops in
teaching the principles of simple bookkeeping for sole trader businesses. The
following week, the students returned to him with their own spreadsheets
applying the same principles to national economics and regional investment
imbalances. 

Over the past few months, I’ve been exploring the impact of ICT saturation on
small groups of both children and the adults (teachers, relatives, neighbours)
in direct contact with them. I’ve had a set of Asus machines to play with,
moving them around small projects with spreadsheets and SysQuake installed. One
class, in cooperation with the local water company, conducted a community
urinalysis of their school’s sewage outflow – though they tested for
dietary and biological byproducts rather than proscribed substances. I’ve
also been fortunate in my access to M’s school, where machines have arrived
in quantity. In each case, the result of leaving the same machine in the same
child’s hands 24 hours a day, seven days a week, has been a marked increase
in data analytic approaches across all activity boundaries, at school and
outside it.

Having tools is not, of course, the same as having the material on which to use
them; increasingly sophisticated analytic views mean increasingly
sophisticated data access.

This, too, trickles down from the top tier where data extraction runs into much
the same problems as in the hard sciences – and reaches for the same
solutions. Admire (Advanced Data Mining and Integration Research for Europe), a
three-year project coordinated from the University of Edinburgh, is doing for
social data what University of Portsmouth’s Helen Xiang was doing through the
NGS for astronomy[10] in the last issue: unifying queries on disparate,
distributed and heterogeneous data sources.

This sort of query currently involves ruinously expensive time spent on
minutely detailed specification of strategies, sources, and mechanisms. Admire
seeks to subsume all that under a structure of internet and grid gateways,
communicating through Infrastructure Service Bus-mediated services under high
level language control. Crucial to this are semantic technologies, which are
key to all future data access at my two lower levels. One of the initial
proving ground scenarios returns to the theme of water, with an integrated
application to make flood predictions from meteorological forecasts.

The flow of information, and the restriction or shaping of that flow, have
always been crucial to balances of power. In the 15th century, movable type
paved the way for the Enlightenment; mass literacy and numeracy were engines of
social change; the internet is the bane of repressive states, and rapidly
modernising societies struggle to maintain equilibrium as they come to terms
with it. In the long run, extensions of data access will work their way down to
individual level and meet the universally available computing power. In a time
of globalisation, that sets the scene for data analytic outlooks to produce a
similar revolution in social structure whose outcome is impossible to guess.

--

References
1. Field, J., Sewage chemicals reveal evidence of illegal drug use, in News
Service Weekly Press Package. 2007, American Chemical Society.

2. Thompson, C., The 7th Annual Year in Ideas. (Magazine). The New York Times
Magazine, 2007: p. 62(L). 0028-7822

3. Service, R.F., New York, Have You Ever Smoked Pot? ScienceNOW, 2007.
2007(822): p. 3 et seq.

4. UNAIDS. UNAIDS: The Joint United Nations Programme on HIV/AIDS.
http://unaids.org/.

5. UNAIDS. UNAIDS Knowledge Centre HIV Data.
http://www.unaids.org/en/KnowledgeCentre/HIVData/default.asp.

6. Postman, N. and C. Weingartner, Teaching as a subversive activity. 1969: New
York, Delacorte Press. 

7. Beyond the Prisoner’s Dilemma. 2007 2007-06-20, 17:07
http://www.scientificcomputing.com/education/archives/42 

8. Grant, F. Asus EEE PC. 2007 2008-04-14 [Review].
http://www.scientificcomputing.com/products/review_details.php?review_id=34.

9. One Laptop Per Child. http://laptop.org/

10. Grant, F., Beyond the skies, in Scientific Computing World. 2008, Europa
Science:Cambridge. p. 16-18. 1356-7853


Sources
Advanced Micro Devices AMD processors Tel: +44-1276-803100 Fax:+44-1276-803227

RM Asus netbooks http://www.rm.com/ContactUs/Default.asp (Research Machines,
Oxford)

Intel Classmate specification
http://www.intel.com/intel/worldahead/classmatepc/

Asus Eee PC http://asus.com 

One Laptop Per Child Laptops for developing world education
information at laptop.org 

MathWorks Matlab. Simulink. Simbiology info at mathworks.co.uk 

Sanitas Technologies Sanitas software http://www.sanitastech.com/contact.html 

Calerga Sysquake info at calerga.com

---

-30-