
A bright and shiny hell


(Apologies for blogging so infrequently this month. I'm currently up to my elbows in The Labyrinth Index, with a tight deadline to hit if the book's going to be published next July. Blogging will continue to be infrequent, but hopefully as provocative as usual.)

Remember Orwell's 1984 and his description of the world ahead—"if you want a vision of the future, imagine a boot stamping on a human face, forever"?

This is the 21st century, and we can do better.

George got the telescreens and cameras and the stench of omnipresent surveillance right, but he was writing in the age of microfilm and 3x5 index cards. Data storage was prodigiously expensive and mass communication networks were centralized and costly to run — it wasn't practical for amateurs to set up a decentralized, end-to-end encrypted shadow network tunnelling over the public phone system, or to run private anonymous blogs in the classified columns of newspapers. He was also writing in the age of mass-mobilization of labour and intercontinental warfare. Limned in the backdrop to 1984 is a world where atom bombs have been used in warfare and are no longer used by the great powers, by tacit agreement. Instead, we see soldiers and machine-guns and refugees and the presentation of inevitable border wars and genocides between the three giant power blocs.

Been there, done that.

What we have today is a vision of 1984 disrupted by a torrent of data storage. Circa 1972-73, total US manufacturing volume of online computer storage — hard drives and RAM and core memory, but not tape — amounted to some 100GB/year. Today, my cellphone has about double that capacity. I'm guessing that my desk probably holds the equivalent of the entire planetary installed digital media volume of 1980. (I'm looking at about 10TB of disks ...) There's a good chance that anything that happens in front of a camera, and anything that transits the internet, will be preserved digitally into the indefinite future, for however long some major state or corporate institution considers it of interest. And when I'm talking about large-scale data retention, just to clue you in, Amazon AWS already offers a commercial data transfer and storage service using AWS Snowmobile, whereby a gigantic trailer full of storage will drive up to the loading bay of your data center and download everything. It's currently good for up to 100PB per Snowmobile load. (1PB is a million gigabytes; 1EB is a billion gigabytes; ten Snowmobile loads is 1EB, or about 10,000,000 years' worth of 1973's US online storage manufacturing.) Folks, Amazon wouldn't be offering this product if there wasn't a market for it.
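The storage arithmetic above is easy to check. This sketch uses decimal (SI) units, as the post does when it defines a petabyte as a million gigabytes, and takes the 100 GB/year and 100 PB figures straight from the text; the variable names are just illustrative.

```python
# Decimal (SI) units, as in the post: 1 PB = a million GB, 1 EB = a billion GB.
GB = 10**9
PB = 10**15
EB = 10**18

US_ONLINE_STORAGE_1973 = 100 * GB   # per year, circa 1972-73 (figure from the post)
SNOWMOBILE_LOAD = 100 * PB          # one AWS Snowmobile load (figure from the post)

ten_loads = 10 * SNOWMOBILE_LOAD
years_of_1973_output = ten_loads // US_ONLINE_STORAGE_1973

print(f"Ten Snowmobile loads: {ten_loads / EB:g} EB")
print(f"That is {years_of_1973_output:,} years of 1973-era US output")
```

Integer division is used so the ratio stays exact: 10^18 / 10^11 is exactly ten million.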

These heaps and drifts of retained data (and metadata) can be subjected to analytical processes not yet invented — historic data is still useful. And some of the potential applications of neural network driven deep learning and machine vision are really hair-raising. We've all seen video of mass demonstrations over the past year. A paper to be presented at the IEEE International Conference on Computer Vision Workshops (ICCVW) introduces a deep-learning algorithm that can identify an individual even when part of their face is obscured. The system was able to correctly identify a person concealed by a scarf 67 percent of the time against a "complex" background. Police already routinely record demonstrations: now they'll be able to apply offline analytics to work out who was there and track protestors' activities in the long term ... and coordinate with public CCTV and face recognition networks to arrest them long afterwards, if they're so inclined.

It turns out that facial recognition neural networks can be trained to accurately recognize pain! The researchers were doubtless thinking of clinical medical applications — doctors are bad at objectively evaluating patients' expressions of pain and patients often don't self-evaluate effectively — but just think how much use this technology might be to a regime bent on using torture as a tool of social repression (like, oh, Egypt or Syria today). Such networks also appear to be better than human beings at evaluating the sexual orientation of a subject, which might be of interest in President Pence's Republic of Gilead, or Chechnya, or Iran. (There's still a terrible false positive rate, but hey, you can't build an algorithmic dictatorship without breaking heads.)

(Footnote: it also turns out that neural networks and data mining in general are really good at reinforcing the prejudices of their programmers, and embedding them in hardware. Here's a racist hand dryer — its proximity sensor simply doesn't work on dark skin! Engineers with untested assumptions about the human subjects of their machines can wreak havoc.)

All of this is pretty horrific — so far, so 2017 — but I'd like to throw two more web pages in your face. Firstly, the Gerasimov Doctrine, which appears to shape Russian infowar practices against the west. We've seen glaring evidence of Russian tampering in the recent US presidential election, including bulk buying of micro-targeted Facebook ads, not focussing on particular candidates but on party-affiliated hot-button issues such as race, gay rights, gun control, and immigration. (I'm not touching the allegations about bribery and Trump with a barge pole — that way lies the gibbering spectre of Louise Mensch — but the evidence for the use of borderline-illegal advertising to energize voters and prod them in a particular direction looks overwhelming.) Here's a translation of Gerasimov's paper, titled The Value of Science Is in the Foresight: New Challenges Demand Rethinking the Forms and Methods of Carrying out Combat Operations. As he's the Russian army Chief of General Staff, what he says can be taken as gospel, and he's saying things like, "the focus of applied methods of conflict has altered in the direction of the broad use of political, economic, informational, humanitarian, and other nonmilitary [my emphasis] measures — applied in coordination with the protest potential of the population". This isn't your grandpa's ministry of propaganda. Our social media have inadvertently created a swamp of "false news" in which superficially attractive memes outcompete the truth because humans are lousy at distinguishing between lies which reinforce their existing prejudices and an objective assessment of the situation. And this has created a battlefield where indirect stealth attacks on elections have become routine to the point where savvy campaigns pre-emptively place bait for hackers.

There are a couple of rays of hope, however. The United Nations Development Program recently released a report, Journey to extremism in Africa: drivers, incentives and the tipping point for recruitment, that pointed out the deficiencies in the Emperor's wardrobe with respect to security services. Religion and ideology are post-hoc excuses for recruitment into extremist groups: the truth is somewhat different. "The research specifically set out to discover what pushed a handful of individuals to join violent extremist groups, when many others facing similar sets of circumstances did not. This specific moment or factor is referred to as the 'tipping point'. The idea of a transformative trigger that pushes individuals decisively from the 'at-risk' category to actually taking the step of joining is substantiated by the Journey to Extremism data. A striking 71 percent pointed to 'government action', including 'killing of a family member or friend' or 'arrest of a family member or friend', as the incident that prompted them to join. These findings throw into stark relief the question of how counter-terrorism and wider security functions of governments in at-risk environments conduct themselves with regard to human rights and due process. State security-actor conduct is revealed as a prominent accelerator of recruitment, rather than the reverse." In fact, the best defenses against generating recruits for extremist organizations seemed to be things like reduced social and economic exclusion (poverty), improved education, having a family background (peer pressure), and not being on the receiving end of violent repression. Because violence breeds more violence — who knew? (Not the CIA and USAF with their typical "oops" response whenever a drone blows up a wedding party they've mistaken for Al Qaida Central.)

So, let me put some stuff together.

We're living in a period where everything we do in public can be observed, recorded, and will in future provide the grist for deductive mills deployed by the authorities. (Hideous tools of data-driven repression are emerging almost daily without much notice, whether through malice or because they have socially useful applications and the developers are blind to the potential for abuse.) Foreign state-level actors and non-state groupings (such as the new fascist international and its hive of internet-connected insurgents) are now able to use data mining techniques to target individuals with opinions likely to appeal to their prejudices and inflame them into activism. Democracy is directly threatened by these techniques and may not survive in its current form, although there are suggestions that what technology broke, technology might help fix (TLDR: blockchain-enabled e-voting, from the European Parliament Think Tank). And there are some signs that our existing transnational frameworks are beginning to recognize that repressive policing is one of the worst possible shields against terrorism.

Social solidarity. Tolerance. Openness. Transparency that runs up as well as down the personal-institutional scale. And, possibly, better tools for authenticating public statements such as votes, tweets, and blog essays like this one. These are what we need to cleave to if we're not going to live out our lives in a shiny algorithmic big data hellscape.

9 days ago

Panoramic Eclipse Composite with Star Trails


What was happening in the sky during last week's total solar eclipse?

21 days ago

Trapped in the wrong trouser-leg of time


So it's time I faced facts: I've been writing this blog for seventeen years and it is getting bloody difficult to come up with stuff to say. (At least, right now.)

My usual book launch promo stuff last month was derailed totally by family circumstances (that won't recur). I really don't feel like kvetching about politics, either the ongoing UK-specific slow-motion train wreck that is Brexit, or the equally bizarre theatre of the absurd and evil that is the current incumbent of the White House. The global neo-nazi resurgence might be another angle, but I'm not the ideal person to write a "why Nazis are bad, 101" for folks who haven't already got the message—I'm not patient enough and the subject strikes much too close to home for comfort. (I grew up attending a synagogue with older members who had numbers tattooed on their arms; I'm pretty sure that if I lived in the US right now then I'd be a gun owner by now, and stockpiling ammunition and escape plans.)

These are dangerous times in the anglophone lands, and worse is coming; the UK seems to be rushing headlong towards a private debt crisis (largely due to nearly a decade of misguided austerity policies, but with insane ramping of student loan debt on top) and the economic uncertainty induced by the Brexit-triggered recession we're entering isn't helping ... and the Tangerine Shitgibbon in Chief seems to have decided that, in comparison with a short victorious war with North Korea, sending the US army back into Afghanistan is a vote-winner.

Against such news headlines I don't much feel like prognosticating about the near future right now.

I'd like to be able to take comfort by speculating about how things might have turned out differently in another time-line, but that's not so good either. Imagine the Brexit referendum and the US Presidential election results were flipped: where would we be now?

Let's tackle the UK first. David Cameron would still in all probability be Prime Minister, Theresa May would still be Home Secretary, and Boris Johnson would still be a joke. I see no way the UK wouldn't have been hit by several terrorist attacks—Manchester, London Bridge, the same sorry litany—so the likely political response from Dave and Theresa would be the same (kiss your civil rights goodbye, oh, and we're going to censor the internet while we're about it). Osborne would still be Chancellor, so a continuation of his austerity program would be on-going, albeit with an economy not sinking into recession and a currency that isn't crashing to a 30-year low. So it'd all be fucking depressing for those of us on the "let's not starve poor people to death" left, but at least it'd be a familiar kind of depressing instead of an "oh god and by god I mean Cthulhu why are they flooring the accelerator towards that cliff edge?" depressing.

In the USA, let's suppose Hillary Clinton took the Electoral College—just—but the House and Senate seats landed the same way. By now we would for a certainty have a Kenneth Starr 2.0 investigating the Clinton White House on some pretext or other ("but her emails!" would be a good start, even if "Benghazi!" flopped), while a drunk and angry Donald Trump would be tweeting up a storm about how he was robbed and threatening to sue Crooked Hillary in the Supreme Court over those rigged votes she bought from (... insert nonsensical Trumpian rant here). There would probably be deadlock between the executive branch and legislature over Clinton's choice of a new Supreme Court justice, but the exploding clown car attempts at repealing the ACA would have broken down immediately on the inconvenient problem of a Democrat president. The US government would have competent civil service leadership in place, mostly inherited from the Obama administration. There'd be none of the chaotic misrule we've seen this year. But there would still be angst and drama and threats of impeachment, and a President tempted to use foreign military adventurism as a tool of distraction ... and unlike Trump, this alternate-45th POTUS would know exactly how to make that happen. I'm calling it for a US/Russian clash in Syrian airspace, or a disastrous North Korean miscalculation. (What doesn't happen is Clinton going after Iran: she was part of the team that brokered the deal. It's probably too early for a presidential visit and a formal apology for Operation AJAX, at least unless she makes it into a second term, but at least that particular pot would be off the boil.) And the neo-Nazis would still be rebranding themselves as the alt-right and getting their fangs into pop culture via social media and the Republican party via Breitbart Media and Fox.

Tentative diagnosis: we're in a deviant time-line, careering towards a catastrophe. But the time-line we branched off between last June and November held all the seeds of our current doom and we'd have ended up here sooner or later. The root cause is the breakdown of the beige dictatorship at a point where wholly new and frightening tools of propaganda have become available and the social media many people trust are themselves in thrall to toxic agendas. The progressive opposition is chaotic and scattered and racist rabble-rousers have pulled their jackboots on and gotten marching, and they seem to have a first-mover advantage (if only because most of our mass media is owned by chancreous cockstains like Rupert Murdoch).

31 days ago
28 days ago
Never let it be said that C. Stross doesn't have a way with words.

Sean Whitton: The knowledge that one has an unread message is equivalent to a 10 point drop in one's IQ


According to Daniel Pocock’s talk at DebConf17’s Open Day, hearing a ping from your messaging or e-mail app or seeing a visual notification of a new unread message has an equivalent effect on your ability to concentrate as

  • a 10 point drop in your IQ; or

  • drinking a glass of wine.

This effect is probably at least somewhat mitigated by reading the message, but that is a context switch, and we all know what those do to your concentration. So if you want to get anything done, be sure to turn off notifications.

35 days ago

Five Privileges of English Speakers, part 1


It’s very common today on progressive blogs to urge people to check their privilege.

Being an English speaker, native or non-native, is a privilege.

It’s not discussed as often as other forms of privilege, such as white, male, cis, hetero, or rich privilege. The reason for this is simple: The world’s media is dominated by the English language. English-language movies are more popular in many countries than movies in these countries’ own languages, English-language news networks are quoted by the rest of the world, the world’s most popular social networks are based in the U.S. and are optimized for U.S. audiences, etc.

So, when English speakers discuss privilege among each other, English is not much of an issue, and they dedicate more time to race, gender, wealth, religion, and other factors that differentiate between people in English-speaking countries.

Despite this, I am not the first one to describe English as a privilege. A simple Google search for “english language privilege” will yield many interesting results.

What I do want to try to do in this series of posts is to list the particular nuances that make English such a privilege in as much detail as possible. I wanted to write this for a long time, but there are many such nuances, so I’ll just do it in batches of five, in no particular order:

1. Keyboard

If you speak English, congratulations: A keyboard on which your language can be written is available on all electronic devices.

All of them.

All desktops, laptops, phones, tablets, watches. The only notable exception I can think of is typewriters, which only makes the point more tragic: technology moved forward and made writing easier in English, but harder in many other languages, where local-language typewriters were replaced with computers with English-only keyboards.

At the very worst case, writing English on a computer will be slightly inconvenient in countries like Germany, France, or Turkey, where the placement of the Latin letters on the keys is slightly different from the U.S. and U.K. QWERTY standard. Oh, poor American tourists.

On a more serious note: although many languages use the Latin alphabet, many of them also use extra diacritics and special characters, and English is one of the very few that doesn’t. Of the world’s top 100 languages by number of native speakers, only Malay, Kinyarwanda, Somali, and Uzbek have standardized orthographies that can be written in the basic 26-letter Latin alphabet without any extra characters. We can also add Swahili, which has a large number of non-native speakers, but that’s it. With other languages you can get stuck and not be able to write your language at all (Hindi, Chinese, Russian, etc.), or you may have to write in a substandard orthography because you can’t type letters like é or ł (French, Vietnamese, Polish, etc.).
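Whether a piece of text fits in the basic 26-letter alphabet is easy to check mechanically. The Python sketch below flags any letter outside A-Z/a-z; the function name and sample phrases are illustrative, and it only inspects the letters in a sample, not a language’s whole orthography or the availability of a keyboard layout.

```python
import string

# The 26 basic Latin letters, in both cases.
BASIC_LATIN = frozenset(string.ascii_letters)

def needs_extra_characters(text: str) -> bool:
    """True if any letter in `text` falls outside the basic 26-letter Latin alphabet."""
    # Digits, spaces, and punctuation are ignored; only letters are checked.
    return any(ch.isalpha() and ch not in BASIC_LATIN for ch in text)

samples = {
    "English": "The quick brown fox",
    "Swahili": "Habari ya asubuhi",
    "French": "Ça va très bien",
    "Polish": "Zażółć gęślą jaźń",
    "Russian": "Привет, мир",
}
for language, text in samples.items():
    print(language, needs_extra_characters(text))
```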

The above is just the teeny-tiny tip of the iceberg; the keyboard problem will be explored in more points later.

2. Spell-checking

English word morphology is laughably simple.

There’s -s for plurals and for third person present tense verbs, there’s -’s for possession, and there are -ed and -ing verb forms. There are also some contractions (’d, ’s, ’ll, ’ve), and a long but finite list of irregular verb forms, and an even shorter list of irregular plural noun forms. And that’s it.
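To make the point concrete, here is a minimal Python sketch of regular English verb inflection, under simplifying assumptions: it ignores consonant doubling (stop → stopped) and the y → ies rule (carry → carries), and handles irregular verbs with a hand-written table that, for illustration, holds only “drive”. A handful of lines covers the whole regular paradigm.

```python
# Irregular verbs have to be listed by hand; this illustrative table holds one.
IRREGULAR_VERBS = {
    "drive": ["drive", "drives", "drove", "driven", "driving"],
}

def english_verb_forms(base: str) -> list[str]:
    """Return the inflected forms of an English verb (simplified spelling rules)."""
    if base in IRREGULAR_VERBS:
        return IRREGULAR_VERBS[base]
    # Third person singular: -es after sibilants, otherwise -s.
    third = base + ("es" if base.endswith(("s", "sh", "ch", "x", "z")) else "s")
    if base.endswith("e"):
        past = base + "d"
        # Drop a silent final "e" before -ing, but keep "ee" (free -> freeing).
        progressive = base + "ing" if base.endswith("ee") else base[:-1] + "ing"
    else:
        past, progressive = base + "ed", base + "ing"
    return [base, third, past, progressive]

for verb in ["walk", "bake", "push", "free", "drive"]:
    print(english_verb_forms(verb))
```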

Most languages aren’t like that. In most languages words change with prefixes, suffixes, infixes, clitics, and so on, according to their role in the sentence.

Beyond the fact that English writing is (arguably) easier for children and foreigners to learn, this means that software tools for processing a language are easy to develop for English and hard to develop for other languages.

The first simple example is spell-checking.

English has had not just spelling, but also grammar and style checkers built into common word processors for decades, and many languages of today don’t even have spelling checkers, not to mention grammar, or style, or convenient searching. (See below.)

So in English, when you type “kinh”, most word processors will suggest correcting it to “king”, but then, some of them may also suggest replacing this word with “monarch” to be more inclusive for women, and this is just one of the hundreds of style improvement suggestions that these tools can make. For a lot of other languages, even simple spell-checking of single words hasn’t been developed yet, and grammar checking is a barely-imaginable dream.
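A common textbook approach to single-word suggestions, popularized by Peter Norvig’s spelling-corrector essay, is to generate every string one edit away from the typo and pick the most frequent known word. This is a sketch of that technique, not of how any particular word processor works, and the word counts are made-up toy values. Note that the dictionary has to contain every inflected form, which is cheap for English and expensive for languages with rich morphology.

```python
import string

# Hypothetical toy word counts; a real checker would derive these from a corpus.
WORD_COUNTS = {"king": 120, "kind": 90, "kiln": 3, "ink": 40, "monarch": 10}

def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    return deletes | transposes | replaces | inserts

def suggest(word: str) -> str:
    """Return `word` if known, else the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print(suggest("kinh"))
```

Both “king” and “kind” are one substitution away from “kinh”; the frequency table breaks the tie.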

3. Autocompletion

Simpler morphology has many other effects.

Even though Russian is my first native language and I speak it more fluently than I speak English, I am much slower when I’m typing in Russian on my phone. In English, the autocompleting keyboard makes it possible to write just two or three letters of a word and let the software complete the rest. In Russian, the ending of the word must be typed, and autocompletion rarely guesses it correctly. Typing an incorrect ending will make a sentence convey incorrect information, or just make it completely ungrammatical.
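A rough way to see the autocompletion problem is to count how many inflected forms share a prefix. This sketch uses the English noun “book” (ignoring possessives) and the Russian noun “книга” (“book”), whose nine distinct written forms across case and number are listed by hand. A completer given “кни” has to pick among all nine, and even the full nominative “книга” is still a prefix of three other forms.

```python
# Every written form a completer has to choose between.
ENGLISH_FORMS = ["book", "books"]  # possessives ignored
RUSSIAN_FORMS = [                  # the noun "книга" ("book") in all cases and numbers
    "книга", "книги", "книге", "книгу", "книгой",
    "книг", "книгам", "книгами", "книгах",
]

def completions(prefix: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words that start with `prefix`, sorted."""
    return sorted(word for word in vocabulary if word.startswith(prefix))

print(completions("boo", ENGLISH_FORMS))       # both English forms
print(len(completions("кни", RUSSIAN_FORMS)))  # all nine Russian forms
print(completions("книга", RUSSIAN_FORMS))     # still four candidates
```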

4. Searching

Another consequence of the previous point: English’s very simple morphology makes searching easier.

For example, word processors have a search-and-replace function. For English, it will likely find all forms of the word, because there are so few of them anyway. But in Hebrew and Arabic, letters are often inserted or changed in the middle of the word according to its grammatical state, and you need to search for each form, which is quite agonizing. It’s comparable to “man” vs. “men” in English, except that in English such changes are very rare, while in many other languages they happen in almost every word.

With search engines that must find words across thousands of documents, it gets even harder. Google can easily figure out that if you’re searching for “drive”, you may also be interested in “driving”, “drove”, and “driven”, but the Russian word for “drive” has dozens of forms. A few languages are lucky: special support was developed for them in search engines, and tasks of this kind are automated, but most languages are just left out in the cold. And English barely needs extra support like this in the first place.
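One way to find all forms of a word is to map each inflected form back to its dictionary form (lemma) before comparing. The sketch below hard-codes the forms of “drive” mentioned above; the table and function names are illustrative assumptions. For English a table like this stays small, but for a language where a word has dozens of forms, every one of them has to be listed or generated by a morphological analyzer.

```python
import re

# Hypothetical, hand-written form-to-lemma table. A search engine would
# generate this with a morphological analyzer for each supported language.
LEMMA_OF = {
    "drive": "drive", "drives": "drive", "drove": "drive",
    "driven": "drive", "driving": "drive",
}

def find_forms(text: str, lemma: str) -> list[str]:
    """Return every word in `text` whose dictionary form is `lemma`."""
    words = re.findall(r"\w+", text.lower())
    # Words missing from the table are treated as their own lemma.
    return [w for w in words if LEMMA_OF.get(w, w) == lemma]

print(find_forms("She drove home; he was driving too. Drivers drive.", "drive"))
```

“Drivers” is not matched because it is a different word (a noun), not a form of the verb.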

5. Very little gender

A lot can be said about gendered language, but as far as basic grammar goes, English has very little in the area of gender. “He” and “She”, and that’s about it. There are also man/woman, actor/actress, boy/girl, etc., but these distinctions are rarely relevant in technology.

In many other languages gender is far more pervasive. In Semitic and Slavic languages, a lot of verb forms have gender. In English, the verb “retweeted” is the same in “Helen retweeted you” and “Michael retweeted you”, but in Hebrew the verb is different. Because Twitter doesn’t know that Helen needs a different verb, it uses the masculine verb there, which sounds silly to Hebrew speakers.

I asked Twitter developers about this many times, and they always replied that there’s no field for gender in the user profile. It has become more and more amusing lately, now that it is so common (and for good reasons!) to mention one’s preferred pronouns in the Twitter profile bio. So people see it, but computers don’t.

On a more practical note, in the relatively rare cases when third person pronouns must be used in software strings, English will often use the singular “they” instead of “he” or “she”. So English-speaking developers do notice it, but not as often as they should, and when they do, they just use the lazy singular-they solution, which is socially acceptable and doesn’t require any extra coding. If only they’d notice it more often, using their software in other languages would be much more convenient for people of all genders.

The only software packages that I know of that have reasonably good support for grammatical gender are MediaWiki and Facebook’s software. I once read that Diaspora had a very progressive solution for that, but I don’t know anybody who actually uses it. There may be other software packages that do, but probably very few.
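The general fix is to store a grammatical gender (or “unknown”) per user and choose a message variant from it, which is roughly what MediaWiki’s {{GENDER:}} parser function does. The Python sketch below is hypothetical (the message catalogue, keys, and function name are invented for illustration) and uses Russian, where past-tense verbs agree with the subject’s gender: “Елена ответила вам” vs. “Михаил ответил вам” (“… replied to you”). The fallback rephrases the message around the masculine noun “Пользователь” (“user”), so the verb agrees with that noun rather than with the person.

```python
from typing import Optional

# Hypothetical message catalogue: a variant per grammatical gender where the
# language needs one, plus an "any" variant used when gender is unknown.
MESSAGES = {
    "en": {
        "any": "{name} replied to you",
    },
    "ru": {
        "female": "{name} ответила вам",  # feminine past-tense verb
        "male": "{name} ответил вам",     # masculine past-tense verb
        # "User {name} replied to you": the verb agrees with the masculine
        # noun "Пользователь", so it is grammatical for any user.
        "any": "Пользователь {name} ответил вам",
    },
}

def render(lang: str, name: str, gender: Optional[str] = None) -> str:
    """Pick the message variant for the user's gender, falling back to "any"."""
    variants = MESSAGES[lang]
    template = variants.get(gender or "any", variants["any"])
    return template.format(name=name)

print(render("ru", "Елена", "female"))
print(render("ru", "Саша"))
```

English needs only one variant, which is exactly why English-speaking developers rarely notice the missing field.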

These are just the first five examples of English-language privilege I can think of. There will be many, many more. Stay tuned, and send me your ideas!

Filed under: English, Hebrew, language, Russian, software, translation, Wikipedia
36 days ago

Laughing ORES to death with regular expressions and fake threads


At 1100 UTC on June 23rd, ORES started to struggle. Within a half hour, it had fully choked and could no longer respond to any requests. It took us 10 hours to diagnose the problem, solve it, and consider it solved. We learned some valuable lessons when studying and addressing this issue.

You can't prevent bad things from happening. Something will always go wrong. So you do the best that you can to handle bad things gracefully. In a distributed processing environment like ORES' cluster, the worst thing that could happen is to have a process block forever. So, preparing for bad things means you use timeouts for just about everything. So far, this has been a great strategy, and it means that, at worst, only a few requests out of many fail when something goes wrong. Regretfully, for this downtime event, we had one of the worst bad things happen, and at the same time, we discovered that our timeouts were not capable of stopping processes that go rogue in a specific way. In this blog post, I'll explain what happened.

Recursive backtracking in a regular expression

Many of the models deployed in ORES use regular expressions to extract signal about the quality of an edit. For example, we use them to match "badwords" (curse words, racial slurs, and other words that are commonly used to cause offense) and "informals" (linguistic colloquialisms like "haha" or "lol" or "wtf"). One such regular expression that we used to match informal laughter in Spanish looked like this: /j+[eaiou]+(j+[aeiou]*)*/. It is intended to match strings like "jajajaja" or "jijiji".

In this edit of Spanish Wikipedia, an IP editor added a very long string of repeated "JAJAJJAJAJAJAJAJAJ" to the article for "Terrain". This is exactly what the regular expression was designed to match. But there was a problem: the regular expression was poorly designed and triggered catastrophic backtracking. Every time it matched the entire sequence of "JAJAJJAJAJAJAJAJAJ" and then failed when it encountered "...JAJAJlentos...", it would re-attempt the entire match dropping just one "JA" from the middle. This doesn't really matter for short sequences. But for one very long one (this one was 4,155 characters long, about 230 repetitions of "JAJAJJAJAJAJAJAJAJ"), it would have taken days to finish. The plot below demonstrates how badly things break down at only 14 repetitions.
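The failure is easy to reproduce in Python. The sketch below assumes the pattern was applied with word boundaries and case-insensitive matching (the post doesn't show how it was compiled) and pairs it with a hypothetical rewrite that accepts the same strings but leaves the engine only one way to split each run of letters. It is an illustration, not the fix that was actually deployed.

```python
import re

# The pattern quoted in the post. The \b word boundaries and case-insensitive
# flag are assumptions about how it was applied; the boundary is what makes
# "...JAJAJlentos" fail and forces the engine to backtrack.
ORIGINAL = re.compile(r"\bj+[eaiou]+(j+[aeiou]*)*\b", re.IGNORECASE)

# Hypothetical rewrite that accepts the same strings. Every repetition must
# end in a vowel, and a trailing run of j's is matched once at the end, so a
# run of letters can only be split one way and a failed match gives up quickly.
SAFER = re.compile(r"\bj+[aeiou]+(?:j+[aeiou]+)*j*\b", re.IGNORECASE)

def laughter_attack(repetitions: int) -> str:
    """Build input shaped like the problem edit: laughter glued onto a word."""
    return "JAJAJJAJAJAJAJAJAJ" * repetitions + "lentos"

for text in ["jajaja", "jijiji", "JAJAJ", "jajalentos"]:
    print(text, bool(ORIGINAL.search(text)), bool(SAFER.search(text)))

# SAFER rejects the full-size input almost instantly.
print(SAFER.search(laughter_attack(230)))
# ORIGINAL must only be tried on tiny inputs: every "JJ" can be split as
# (J)(J...) or (JJ...), so the number of ways to split doubles with each one.
print(ORIGINAL.search(laughter_attack(3)))
```

Do not call `ORIGINAL` on the 230-repetition input; even a few dozen repetitions will run for an impractically long time.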

Where were the timeouts?

So things like this happen. When operating in a distributed processing environment, you should always have timeouts on everything so if something goes haywire, it doesn't take everything down. Regretfully, matching a regular expression is not just a special opportunity for pathological backtracking, but also an opportunity to learn hard lessons about safe timeouts.

So, we have timeouts in ORES in a few strategic places. For example, if a single scoring job takes longer than 15 seconds (extracting informal "JAJAJA" is part of a scoring job), then it is supposed to time out. But for some reason, we weren't timing out during regular expression matching. So, I started digging into the library we use to implement execution timeouts. What I learned was horrifying.

So, most timeouts in Python are implemented with "threads". I put "threads" in quotes because threads in Python are a convenient abstraction and not true parallelism: Python's Global Interpreter Lock (GIL) is an internal mutex that lets only one thread execute Python code at a time. To get real parallelism, Python programs generally use separate processes instead. I'm not going to get into the details of the GIL or process-based concurrency. But suffice it to say, when a regular expression match runs inside the compiled C code of the regex engine, any thread that is trying to implement a timeout is going to get locked up and totally fail to do what it is supposed to do!

Because our threading-based timeouts were completely disabled by this long regular expression match, our "precaching" system (which makes sure we score every edit and put the score in the cache ASAP) was slowly taking us down. Every time the problematic diff was requested, it would render yet another worker unreachable. Because ORES would just fail to respond, our precaching system registered a Connection Timeout and would simply retry the request. Eventually, capacity decayed as our ~200 workers locked up at 100% CPU one by one.

Luckily, there's an easy solution to this problem in Unix signals. By having the operating system help us manage our timeouts, we could stop relying on Python threads to behave sanely in order for us to recover from future rogue processes.
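Here is a minimal sketch of a signal-based timeout, assuming a Unix host and code running in the main thread (Python only runs signal handlers there). It uses the standard `signal.setitimer` and `SIGALRM`; the `ScoringTimeout` and `signal_timeout` names are invented for illustration, and this is not ORES' actual code. It works against a runaway regex because CPython's regex engine periodically checks for pending signals during a long match, so the handler's exception surfaces from inside `re.search`.

```python
import re
import signal
import time
from contextlib import contextmanager

class ScoringTimeout(Exception):
    """Raised when a guarded block runs past its time budget."""

@contextmanager
def signal_timeout(seconds: float):
    """Raise ScoringTimeout if the enclosed block runs longer than `seconds`.

    Unix-only, and must be used from the main thread: Python runs signal
    handlers (and only allows installing them) in the main thread.
    """
    def on_alarm(signum, frame):
        raise ScoringTimeout(f"exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot real-time timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer if still pending
        # signal.signal() returns None if the old handler wasn't set from Python.
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)

# The laughter pattern from the post, with assumed word boundaries, run on
# input with far too many repetitions to ever finish.
LAUGHTER = re.compile(r"\bj+[eaiou]+(j+[aeiou]*)*\b", re.IGNORECASE)
attack = "JAJAJJAJAJAJAJAJAJ" * 40 + "lentos"

started = time.monotonic()
try:
    with signal_timeout(0.5):
        LAUGHTER.search(attack)
    outcome = "finished"
except ScoringTimeout:
    outcome = "timed out"
print(f"{outcome} after {time.monotonic() - started:.1f} s")
```

Because the operating system delivers the alarm, the timeout fires even though no other Python thread could have run during the match.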

So, you fixed it right?

First, I should thank @ssastry for his quick work identifying the pathological backtracking problem and submitting a fix. We also completed an emergency deployment of ORES that implemented the use of Unix signals and we've been humming along, scoring all of the things, ever since.

36 days ago