https://aws.amazon.com/free/machine-learning/
SageMaker - train ML models
- Implementations of standard ML models, to train and run in one-click processes
Rekognition - image recognition
- Identify people and events in photos and videos
Lex - voice and text chatbots
Polly - text to speech
Comprehend - NLP
- topic modelling, sentiment analysis, entity extraction
Transcribe - speech to text
https://aws.amazon.com/getting-started/tutorials/detect-analyze-compare-faces-rekognition/?trk=gs_card
Rekognition identifies and locates faces, and can pick out features such as glasses or beards. It also does sentiment analysis and face matching, which makes it possible to identify the same person across lots of photos.
https://aws.amazon.com/getting-started/tutorials/add-voice-to-wordpress-polly/?trk=gs_card
Using Polly with WordPress - there's a plugin, but you need to configure the IAM permissions first.
https://aws.amazon.com/getting-started/tutorials/analyze-sentiment-comprehend/?trk=gs_card
Comprehend performs sentiment analysis, entity extraction and key phrase extraction. This can be done through an API.
https://aws.amazon.com/getting-started/tutorials/analyze-extract-metadata-video-rekognition/?trk=gs_card
Rekognition Video takes a video and identifies its content. It identifies objects and activities, detects and labels people, and tags celebrities. It can also flag and categorise inappropriate content. For all these things, it will identify the piece of video they occurred in.
https://aws.amazon.com/blogs/machine-learning/capturing-memories-geosnapshot-uses-amazon-rekognition-to-identify-athletes/
Users sign up and upload a headshot. Photos and video from sporting events are then uploaded, and the system identifies the users and their bib numbers in them, so photographers don't have to manually sort photos to contact competitors.
Geeks with Pointy Code
Saturday, 8 June 2019
Monday, 27 May 2019
React and reactive programming
https://dzone.com/articles/5-things-to-know-about-reactive-programming
Based on data streams, code is asynchronous, non-blocking and event-driven.
Cold streams are lazy and pull-based, hot streams are eager and push-based.
Functions should be side-effect free as far as possible, because multi-threading.
It's easy to overcomplicate things, and this will make debugging impossible.
Reactive systems follow a set of architectural principles: responsive, resilient, elastic, and message-driven. Reactive programming enables this, but it doesn't guarantee it.
https://medium.com/@kevalpatel2106/what-is-reactive-programming-da37c1611382
Reactive programming aims for responsive UIs by shifting work out of the main thread. It's built around observables (async data objects), observers (consumers of observable data) and schedulers (which schedule work onto various threads). The (RxJava) Observer interface specifies onNext, onError and onCompleted callbacks.
https://gist.github.com/staltz/868e7e9bc2a7b8c1f754
https://www.baeldung.com/spring-webflux
Spring WebFlux builds reactive REST clients and servers based on Project Reactor and its Flux (a stream, similar to Observable) and Mono (at most one value) objects. WebClient is the Spring reactive client. It starts with a connection and then lets you build reactive pipelines using a fluent interface.
It doesn't use WebSockets out of the box, but they can be integrated.
https://spring.io/guides/gs/reactive-rest-service/
You can also configure routing for your server-side handling using a configuration bean of type RouterFunction<T> rather than explicit controller classes.
https://docs.spring.io/spring-framework/docs/5.0.0.M1/spring-framework-reference/html/web-reactive.html
https://reactjs.org/docs/hello-world.html
React is based on building components, which combine business logic and presentation in reusable blocks. JSX is a JavaScript extension which allows mixing of JS and HTML-like markup in one format. It constructs a "React DOM" (a virtual DOM), which manages state and is more lightweight than the actual DOM - React diffs this to update only the parts of the DOM which actually change, which is more efficient.
A React component is a function (or class containing a function) which takes a "props" object and returns a React element (i.e. some HTML). Components can be referenced in JSX as tags (starting with a capital letter), with the tag's attributes passed in as the props.
Components can store state - make them as classes, store any state in fields, and manage them using lifecycle callbacks (un/mount is adding and removing from the DOM). A component can also set up its own scheduled callbacks internally. Calling the state setter (setState) triggers the UI to rerender asynchronously, so updates are reflected on the screen. Components can pass state data into the props of child components, and the child cannot distinguish this from other props it receives.
Methods can be registered as event listeners. There's some voodoo magic about binding "this" which should live in the constructor, because the function doesn't actually have a "this" reference otherwise.
For conditional rendering, either use a factory method with an "if", or make a class which stores the elements as fields, then return the appropriate one, or just use a JSX expression. You can also && the element with a conditional - if the condition is true, the object renders, otherwise it doesn't. A component can declare that it should not be rendered by returning null from its render method.
React can handle lists, but it needs each list item to have a key (typically an ID from the data) so it can track the identity of each entry and identify changes. The key goes on the component used in the list, not on the element that component renders.
Sunday, 12 May 2019
Blog reading 2
1. Agile isn’t just for software, or just for work
2. Managers have a role in agile, but it’s more about strategy than micro-managing
3. You can never rule out change entirely, but there are more and less expensive times for it
4. You don’t need everyone to be a generalist, but multi-skilled people are at a premium
5. Agile teams do plan, they just do it incrementally based on empirical data rather than forecasts
6. Agile teams can architect, intentionally, making decisions incrementally rather than upfront
Estimating completion is hard, whereas done/not-done is indisputable, especially with an explicit definition of done. Completion estimates tend to overestimate progress because trying to finish a task drives out more work. Done/not-done gives pessimistic measures, and encourages teams to prefer smaller tasks with less WIP. The Agile Manifesto says working software is the measure of progress, and done/not-done reinforces that.
Saturday, 4 May 2019
Blog reading
I'm trying out a plan to be more deliberate in technical reading, especially with tech blogs. The plan is to set aside time to read blogs, taking notes and summarising what I read. These are really only intended for my own use, and the aim of posting them here is to make it easy to refer back to things.
https://www.mountaingoatsoftware.com/blog/an-agile-team-shouldnt-finish-everything-every-iteration
Teams should aim to finish all the sprint work 80% of the time. Aiming to finish everything every time leads to undercommitting and safety margins, especially if failing threatens consequences. The target might be lower if the team needs to respond to issues quickly. The need for completion is driven by the business's need for predictability. This *doesn't* mean finishing 80% of the work every time.
https://www.mountaingoatsoftware.com/blog/when-kanban-is-the-better-choice
Teams should experiment to find the best framework for them, not prescribe solutions. Kanban requires less management buy-in and has fewer concepts. It works well in immature agile environments with little flexibility. It's ideal for small teams, or teams with large numbers of types of work that can't all be brought into a cross-functional team. People over-focus on the kanban board visualisation, rather than the processes.
https://www.mountaingoatsoftware.com/blog/organizations-that-work-on-fewer-projects-at-a-time-get-more-done
Organisations typically take on large numbers of projects concurrently, and would work more efficiently if they focused on a small number at a time. This can happen because they want to say "yes" to a project, without considering that this means deprioritising something else.
https://martinfowler.com/articles/201904-end-golden-age.html
Conference AV has gotten less user-friendly, with venues wanting to present slidedecks from their own hardware. MF's presentation software shows timing information, previews of next slides, allows skipping sections based on the presenter's feeling of timing, and other transitions. The controls make a difference, typically just a forward/back clicker. Slides should be a "visual channel" which reinforces the "audio channel", not the main focus.
https://martinfowler.com/articles/domain-oriented-observability.html
DOO means instrumentation of business logic to extract business logic data, such as logging, usage metrics and analytics. This is in addition to generic observability, but is necessarily bespoke.
This can be achieved without mixing instrumentation with business logic by creating "domain probes", facades over logging systems with interfaces expressed in domain terms. These will make logging code more testable, and encapsulate the low-level logging systems from the codebase. Their calls might want a request context object - this could include request info (request ID, user, timestamp, ...) and system info (version, hostname, ...) and possibly feature flags to enable A/B testing. This could be passed in through a constructor or with a method call - it's important to isolate this from the business logic, so it doesn't depend on the contents.
Rather than having the Domain Probe make direct calls to the logging systems, it might be better to have the DP (or the business class) post events onto topics, which the logging systems consume. Could implement it with AOP, but this is probably not a good fit for domain-specific measures, it's more suited to generic metrics.
Monday, 18 December 2017
Hey! You! I don't like your bearfriend!
There was a post going around recently about making song lyrics gender-neutral by replacing all gendered pronouns with "bear", leading to the amazing line:
"Bear was a bear, bear was a bear, can I make it any more obvious?" (Struggling to find the original post, but I think it's from this tumblr.)
I wanted to try this out, so I came up with a scheme to get hold of a corpus of film titles (always a useful thing to have!), and substitute "bear" for "boy" and "girl".
Turns out Wikipedia has a collection of lists of films, categorised by first letter. The URLs aren't entirely predictable - there's "J-K" and other groups to handle, but there's a handy table on each page that can be used as a source. Once you've got the URLs, they're easy to download using wget. Extracting the film titles can be done with some regex trickery after a bit of trial and error - there are lots of other links on the page, but only the film titles are italicised.
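A rough sketch of the extraction step, not the exact regex I ended up with. It assumes the downloaded pages mark up each film as an italicised link of the form `<i><a href="..." title="...">`, and the function name `extract_titles` and the sample markup are made up for illustration:

```shell
# Hypothetical helper: print the title attribute of every italicised link on stdin.
# Assumes markup of the form <i><a href="..." title="...">; other links are ignored.
extract_titles() {
  grep -o '<i><a href="[^"]*" title="[^"]*"' |
    sed -E 's/.*title="([^"]*)"$/\1/'
}

# Made-up sample lines: one italicised film link, one ordinary link.
printf '%s\n' \
  '<li><i><a href="/wiki/About_a_Boy_(film)" title="About a Boy (film)">About a Boy</a></i> (2002)</li>' \
  '<li><a href="/wiki/2002_in_film" title="2002 in film">2002</a></li>' |
  extract_titles
```

Only the italicised link survives. Note that HTML entities in titles (such as `&amp;`) would come through undecoded with this approach.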
Once the links are extracted, there's a little data cleansing issue - the titles often contain disambiguation information in parentheses. This can be free text, e.g. "(manga)" or "(2006 Swedish film)", so it's difficult to distinguish this from part of the title, e.g. "Beyond (The Animatrix)". Looking through the data, it's fairly safe to remove any suffixed parentheses.
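Something along these lines would do the clean-up, assuming the titles are held one per line. The function name `strip_suffix` is just for illustration, and it only removes a single trailing group:

```shell
# Hypothetical helper: remove one trailing "(...)" group and the whitespace before it.
strip_suffix() {
  sed -E 's/[[:space:]]*\([^()]*\)[[:space:]]*$//'
}

printf '%s\n' 'Beyond (The Animatrix)' 'Oldboy (2003 film)' 'About a Boy' | strip_suffix
```

Titles without a parenthesised suffix pass through unchanged, and "Beyond (The Animatrix)" is cut down to "Beyond", which is the trade-off accepted above.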
Having got hold of the film titles, it's a simple sed command to replace "boy" and "girl". You could worry about word boundaries here, to make sure that you're only replacing whole words, but like "bearfriend" and "cowbears", some of the funniest results are when they're not separate words.
One final thing to watch out for - 27 of the movie titles actually do contain "bear" (including several Care Bears movies, and a few spurious "beards"), so you need to separate out the titles which are going to be modified first, or it will be hard to distinguish these later.
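The last two steps could look something like this. It's a sketch rather than the original command: the function name `bearify` and the sample titles are for illustration, and it filters for "boy"/"girl" before substituting so that titles which already contain "bear" never reach the output.

```shell
# Hypothetical helper: reads titles on stdin, one per line.
# Step 1: keep only titles that will change, so genuine "bear" titles are excluded.
# Step 2: substitute without word boundaries, so "Cowboy" becomes "Cowbear".
bearify() {
  grep -i -E 'boy|girl' |
    sed -e 's/Boy/Bear/g' -e 's/boy/bear/g' -e 's/Girl/Bear/g' -e 's/girl/bear/g'
}

printf '%s\n' 'Bad Boys II' 'Cowboy Bebop: The Movie' 'The Care Bears Movie' | bearify
```

The separate upper- and lower-case expressions keep the capitalisation of the original title.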
The results worked out pretty well - I would definitely like to see "Bad Bears II" and "Cowbear Bebop: The Movie"! Here's the full list:
101 Rent Bears
4 Little Bears
5ive Bears
A Bear Named Charlie Brown
A Bear Named Sue
A Bear and a Dolphin
A Bear from Hunan
About a Bear
All the Bears Are Called Patrick
All the Bears Love Mandy Lane
All the Real Bears
American Bear: A Profile of Steven Prince
American Bear
American Bears
Assault Bears
Astro Bear
Attack on the Pin-Up Bears
Baby Bear
Bad Bear Bubby
Bad Bears II
Bad Bear
Bad Bears
Beastly Bearz
Beat Bear
Beautiful Bears
Bicycle Bear
Big Bears Don't Cry
Biker Bearz
Birthday Bear
Bear A
Bear Goes to Heaven
Bear, Bear
Bears
Bears and Bears
Bears Don't Cry
Bears' Night Out
Bears of the City
Bears on the Side
Bears Town
Bears Will Be Bears
Bearz n the Hood
Bubble Bear
Cabin Bear
Calendar Bears
Cannibal Bears
Career Bears
Chelsea Bears
City Bear
Cover Bear
Cowbear
Cowbear Bebop: The Movie
Cowbears & Aliens
Cuban Rebel Bears
Daddy's Little Bears
Dasepo Naughty Bears
DC Super Hero Bears: Hero of the Year
DC Super Hero Bears: Intergalactic Games
DC Super Hero Bears: Super Hero High
Devil Bear from Mars
Diary of a Lost Bear
Different for Bears
Dogtown and Z-Bears
Dr. Goldfoot and the Bear Bombs
Dragon Bears
Dreambears
Drugstore Cowbear
Drugstore bear
Earth Bears Are Easy
Elephant Bear
Even Cowbears Get the Blues
Every Bear Should Be Married
Factory Bear
Fanbears
Fat Bear
Fat Man and Little Bear
Five Bears
Flybears
Flying Bears
Follow Me, Bears!
For Colored Bears
Funny Bear
Gagambear
Georgy Bear
Ghosts of Bearfriends Past
Bear 6
Bear Happy
Bear in Gold Boots
Bear Most Likely
Bear on the Bridge
Bear Shy
Bear with a Pearl Earring
Bear, Interrupted
Bearfight
Bearfriend From Hell
Bearfriends
Bears Just Want to Have Fun
Bears Town
Bears und Panzer der Film
Bears Will Be Bears
Bears! Bears! Bears!
Golden Bear
Gone Bear
Good Bear!
Good Morning, Bears
Grandma's Bear
Gregory's Bear
Gregory's Two Bears
Hammerbear
Hellbear
Hellbear 2: The Golden Army
Hellbear: Blood and Iron
Hellbear: Sword of Storms
Hello, My Dolly Bearfriend
His Bear Friday
Inside the Bears
Invasion of the Bee Bears
Jersey Bears
Jersey Bear
Jewbear
Jimmy Neutron: Bear Genius
Julien Donkey-Bear
Just One of the Bears
Kiss the Bears
Kit Kittredge: An American Bear
Lars and the Real Bear
Lego DC Super Hero Bears: Brain Drain
Leningrad Cowbears Go America
Les Bears
Live Nude Bears
Lonesome Cowbears
Lost Bears: The Thirst
Lost Bears: The Tribe
Love That Bear
Mallbear
Marine Bear
Material Bears
Mean Bears
Midnight Cowbear
Modern Bear
Modern Bears
Mrs. Brown's Bears D'Movie
My Beautiful Bear, Mari
My Best Friend's Bear
My Bearfriend is Type B
My Bearfriend's Back
My Bear
My Bear 2
My Little Pony: Equestria Bears
My New Sassy Bear
My Sassy Bear
My Sassy Bear 2
My Scary Bear
My Super Ex-Bearfriend
New Waterford Bear
Nowhere Bear
Odd Bear Out
Oldbear
Once Upon a Bear
One Hundred Men and a Bear
Paperbears
Phat Bearz
PR Bears
Prayer of the Rollerbears
Queer Bears and Bears on the Shinkansen
Rally 'Round the Flag, Bears!
Reform School Bears
Resurrection of the Little Match Bear
Ride 'Em Cowbear
Riding in Cars with Bears
Riding in Vans with Bears
Run, Fatbear, Run
Samaritan Bear
Sex and the Single Bear
Shopbear
Show Bear
Showbear in Hollywood
Showbears
Sing Bear Sing
Soldier's Bear
Some Bears Do
Song For a Raggy Bear
Sonny Bear
Sorority Bears
Space Cowbears
Spy Bear
Steambear
Storm Bear
Stratosphere Bear
Suburban Bear
Sue, Mai & Sawa: Righting the Bear Ship
Superbear
Swamp Bear
Swing Bears
Tank Bear
Tell Them Willie Bear Is Here
The Adventures of Sharkbear and Lavabear in 3-D
The Bakery Bear of Monceau
The Bay Bear
The Bellbear
The Bellbear and the Playbears
The Bear and the Beast
The Bear in the Plastic Bubble
The Bear in the Striped Pajamas
The Bear Turns Man
The Bear Who Could Fly
The Bear Who Cried Werewolf
The Bears from Brazil
The Bears in Company C
The Bears in the Band
The Buffalo Bear
The Country Bear
The Cowbear Way
The Cowbears
The Dangerous Lives of Altar Bears
The Dead Bear
The Diary of a Teenage Bear
The Errand Bear
The Fabulous Baker Bears
The Flower Bear
The Geisha Bear
The Bear Can't Help It
The Bear from Monday
The Bear In The Park
The Bear in the Taxi
The Bear Who Kicked the Hornets' Nest
The Bear Who Knew Too Much
The Bear Who Leapt Through Time
The Bear Who Played with Fire
The Bearfriend Experience
The Bears of Pleasure Island
The Good Bear
The Goodbye Bear
The Gore Gore Bears
The Harvey Bears
The History Bears
The Incredibly True Adventures of Two Bears in Love
The Jerky Bears: The Movie
The Last Bear Scout
The Little Bear Who Lives Down the Lane
The Lost 15 Bears: The Big Adventure on Pirates' Island
The Lost Bears
The Lost Bears (franchise)
The Machine Bear
The Match Factory Bear
The Nasty Bear
The Newton Bears
The Other Boleyn Bear
The Patchwork Bear of Oz
The Poor Little Rich Bear
The Prince and the Showbear
The Rise of a Tombear
The Sunshine Bears
The Trouble with Bears
The Waterbear
The Whoopee Bears
The Wog Bear
The Young Bears of Rochefort
There's a Bear in My Soup
This Bear's Life
This Bear's Life
Three Smart Bears
Tombear
Tommy Bear
Trailer Park Bears: The Movie
Two English Bears
Two Bears and a Guy
Uptown Bears
Urban Cowbear
Valley Bear
Waterbears
Weather Bear
Weaving Bear
What a Bear Wants
What's a Nice Bear Like You Doing in a Place Like This?
When Bears Fly
Where the Bears Are
Whitebearz
Why Bears Love Sailors
WiseBears
Wonder Bears
Working Bear
Xiu Xiu: The Sent Down Bear
You Are My Sassy Bear
You're a Big Bear Now
Zenon: Bear of the 21st Century
Ziegfeld Bear
Sunday, 10 July 2016
Renaming ArXiv PDFs
When you download papers from arxiv.org, the assigned filenames are the unique ID of the publication, not the paper's title. This makes it hard to browse the content, especially if you like to download several things to read later.
This quick bit of bash script looks through the current directory for any .pdf file whose name looks like an ArXiv ID, reads out the text and looks for a title row. The text format isn't entirely uniform in all papers, so it needs to skip any very short rows, and any metadata rows. Once an appropriate title is found, it's standardised to form a filename.
Since it may need to read through several rows, I've stored the contents in a file, but it might be tidier to do this in memory.
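for file in [0-9]*.pdf; do
  # Only handle names made up of digits and dots, i.e. ones that look like an ArXiv ID
  [[ $file =~ ^[0-9.]+\.pdf$ ]] || continue
  pdftotext "$file" temp.txt
  rows=$(wc -l < temp.txt)
  rowNum=1
  title=
  # Skip very short rows and the arXiv metadata row, stopping at the end of the text
  while { [ ${#title} -lt 5 ] || [[ $title == *arXiv* ]]; } && (( rowNum <= rows )); do
    title=$(sed "${rowNum}q;d" temp.txt | sed 's/[^A-Za-z0-9]/_/g')
    title=${title:0:80}
    rowNum=$((rowNum+1))
  done
  # Rename only if a usable title was found
  if [ ${#title} -ge 5 ] && [[ $title != *arXiv* ]]; then
    mv -- "$file" "$title.pdf"
  fi
  rm -f temp.txt
done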
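For the in-memory version, one option is to have pdftotext write to stdout (passing "-" as the output file) and pick the title out of the stream. This is only a sketch: the helper name `pick_title` and the sample text are made up, and it applies the same rules as above (skip rows under 5 characters and rows mentioning arXiv, then standardise and truncate to 80 characters).

```shell
# Hypothetical helper: read extracted text on stdin, print a filename-safe title.
pick_title() {
  awk 'length($0) >= 5 && $0 !~ /arXiv/ { print; exit }' |
    sed 's/[^A-Za-z0-9]/_/g' | cut -c1-80
}

# Intended use inside the loop (needs pdftotext installed):
#   title=$(pdftotext "$file" - | pick_title)
# Made-up sample text standing in for a paper's first few rows:
printf '%s\n' '' 'arXiv:0000.00000v1 [cs.XX]' 'An Example Paper Title' | pick_title
```

If no row qualifies, the function prints nothing, so the caller still needs to check for an empty title before renaming.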
Wednesday, 24 February 2016
De-gendering news headlines
I've been playing around with news sites lately, and after talking about gender and zodiac signs recently, came up with a little piece of code which switches the two. It's kind of in the spirit of this extension, which switches genders on the web:
http://www.huffingtonpost.com/2013/08/29/jailbreak-the-patriarchy_n_3443654.html
To make this work, I use the Guardian API to search for articles about "men" or "women" in a specified date range. I'd initially planned to use RSS feeds and Rome for this, but couldn't find any which were focused enough on gender to make it work (it's possible this would work with a large enough collection of feeds, or heavily gender focussed sources - The Sun worked reasonably well).
Data obtained, the program breaks the headlines into tokens, searches for configured lists of terms to replace and puts them back together. It uses separate lists for singular and plural terms. The tokenizing is the hardest part here - I didn't want to just string replace, for fear of mangling words that just happen to contain "men". Instead, I just split up the words (which turns out to be surprisingly fiddly), but possibly some NLP would be a better solution.
All quite quick and dirty, but I was pretty pleased with the results:
- 'Geminis are more interesting than Sagittariuses': Simon Mawer on Tightrope
- Oregon militia standoff: the 23 Leos and two Aquariuses facing felony charges
- Historic deal allows Pisces and Tauruses to pray together at Western Wall
- Tauruses and Aries clubbing together – or not… | Katharine Whitehorn
- For Leos and Aries, flexible working is still just an altruistic myth | Lisa Lintern
- The Aquariuses who design for Geminis
- Flexible working helps Virgos succeed but makes Aries unhappy, study finds
- Government ‘still failing to protect Capricorns against violent Libras’
- The sequel to Poems That Make Grown Libras Cry: Capricorns, look upon these works and weep…
- Tall Tauruses rarely fancy small Aries – that explains my traumatic dating years | Chris Windle
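The token-by-token swap described above can be sketched roughly as follows. This isn't the original program: the function name `swap_terms` and the fixed mapping are assumptions for illustration (the real lists are configurable), and it only handles tokens made of a single run of letters with punctuation around it.

```shell
# Hypothetical sketch: swap whole words only, leaving punctuation in place.
swap_terms() {
  awk '
    BEGIN {
      # Illustrative mapping; singular and plural terms are listed separately.
      map["man"] = "Gemini";  map["woman"] = "Sagittarius"
      map["men"] = "Geminis"; map["women"] = "Sagittariuses"
    }
    {
      for (i = 1; i <= NF; i++) {
        # Split the token into leading punctuation, letters, trailing punctuation.
        if (match($i, /[A-Za-z]+/)) {
          pre = substr($i, 1, RSTART - 1)
          core = substr($i, RSTART, RLENGTH)
          post = substr($i, RSTART + RLENGTH)
          key = tolower(core)
          if (key in map) $i = pre map[key] post
        }
      }
      print
    }'
}

printf '%s\n' 'Women are more interesting than men, say management' | swap_terms
```

Because the lookup is on the whole token, "management" is left alone even though it contains "men", which a plain string replace would have mangled.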