nlp - What text processing tool is recommended for parsing screenplays?
I have some plain-text, kinda-structured screenplays, formatted like the example at the end of this post. I would like to parse each one into some format where:
- It will be easy to pull up just the stage directions that deal with a specific place.
- It will be easy to pull up just the dialogue belonging to a particular character.
The most obvious approach I can think of is using sed or perl or php to put div tags around each block, with classes representing character, location, and whether it's stage directions or dialogue. Then, I could open it up as a web page and use jQuery to pull out whatever I'm interested in. But this sounds like a roundabout way to do it, and maybe it only seems like a good idea because these are the tools I'm accustomed to. I'm sure this is a recurring problem that's been solved before, so can anyone recommend a more efficient workflow that can be used on a Linux box? Thanks.
Here is some sample input:

    SOMEWHERE CORPORATION - OPTIONAL COMMENT
    A guy named Bob is sitting at his computer.

    BOB
    Mmmm. Stackoverflow. I like.

    Footsteps are heard approaching.

    ALICE
    Where's that report you said you'd have for me?

    Closeup of clock ticking.

    BOB (looking up)
    Huh? What?

    ALICE
    Some more dialogue.

    Some more stage directions.
Here is what the sample output might look like:

    <div class='scene somewhere_corporation'>
      <div class='comment'>OPTIONAL COMMENT</div>
      <div class='direction'>A guy named Bob is sitting at his computer.</div>
      <div class='dialogue bob'>Mmmm. Stackoverflow. I like.</div>
      <div class='direction'>Footsteps are heard approaching.</div>
      <div class='dialogue alice'>Where's that report you said you'd have for me?</div>
      <div class='direction'>Closeup of clock ticking.</div>
      <div class='comment bob'>looking up</div>
      <div class='dialogue bob'>Huh? What?</div>
      <div class='dialogue alice'>Some more dialogue.</div>
      <div class='direction'>Some more stage directions.</div>
    </div>
I'm using the DOM as an example, again, because that's something I understand. I'm open to whatever is considered best practice for this type of text-processing task if, as I suspect, roll-your-own regexps and jQuery are not the best practice. Thanks.
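For comparison, the same tagging can be sketched directly in a scripting language, without the detour through a browser. The Python below is only a rough sketch under assumptions the sample does not confirm: blocks are separated by blank lines, character cues are upper-case names on their own line (optionally followed by a parenthetical), and a scene heading is an upper-case line of the form `PLACE - COMMENT`. A heading without a comment would need an extra rule. The names `Block`, `CUE`, `parse` and `to_html` are made up for this illustration.

```python
import re
from html import escape
from typing import NamedTuple, Optional

SAMPLE = """\
SOMEWHERE CORPORATION - OPTIONAL COMMENT
A guy named Bob is sitting at his computer.

BOB
Mmmm. Stackoverflow. I like.

Footsteps are heard approaching.

ALICE
Where's that report you said you'd have for me?

Closeup of clock ticking.

BOB (looking up)
Huh? What?

ALICE
Some more dialogue.

Some more stage directions.
"""

# Assumed cue shape: an upper-case name, optionally followed by "(parenthetical)".
CUE = re.compile(r"^([A-Z][A-Z0-9 .']*?)\s*(?:\(([^)]*)\))?$")


class Block(NamedTuple):
    kind: str                 # "comment", "direction" or "dialogue"
    scene: Optional[str]      # slug of the current scene heading
    character: Optional[str]  # lower-cased speaker, if any
    text: str


def parse(text):
    """Split the screenplay into blank-line-separated blocks and tag each one."""
    blocks, scene = [], None
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue
        first, rest = lines[0], lines[1:]
        cue = CUE.match(first)
        if cue and rest:
            # Character cue followed by one or more lines of dialogue.
            who = cue.group(1).strip().lower()
            if cue.group(2):
                blocks.append(Block("comment", scene, who, cue.group(2)))
            blocks.append(Block("dialogue", scene, who, " ".join(rest)))
            continue
        if first.isupper():
            # Upper-case line that is not a cue: treat it as a scene heading.
            place, _, comment = first.partition(" - ")
            scene = place.strip().lower().replace(" ", "_")
            if comment:
                blocks.append(Block("comment", scene, None, comment.strip()))
            lines = rest
        if lines:
            blocks.append(Block("direction", scene, None, " ".join(lines)))
    return blocks


def to_html(blocks):
    """Render the blocks as nested divs, one outer div per scene."""
    out, scene = [], None
    for b in blocks:
        if b.scene != scene:
            if scene is not None:
                out.append("</div>")
            out.append(f"<div class='scene {b.scene}'>")
            scene = b.scene
        classes = b.kind if b.character is None else f"{b.kind} {b.character}"
        out.append(f"  <div class='{classes}'>{escape(b.text, quote=False)}</div>")
    if scene is not None:
        out.append("</div>")
    return "\n".join(out)


if __name__ == "__main__":
    blocks = parse(SAMPLE)
    # Dialogue belonging to one character, no DOM needed.
    print([b.text for b in blocks if b.kind == "dialogue" and b.character == "bob"])
    print(to_html(blocks))
```

Once the blocks are plain records, "all stage directions for one place" is the same kind of list filter on `kind` and `scene`, and the HTML output becomes just one possible export.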
You could use Celtx to import plain text scripts and export them as HTML (and RDF/XML metadata) (see the related thread and the blog post, which describes the file structure).
Other screenplay editors like Trelby might offer such a feature, too.
There is also Fountain, a plain text markup language for screenwriting. They offer libraries which you might be able to use for your cause (I did not check whether they offer importing and converting):
Fountain is free and open-source, with libraries that make it easy to add support in your apps.
Even if these projects can't be used for your cause, you could at least reuse their format for your output.
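To illustrate what reusing the format could look like, here is a small, untested-against-real-tools Python sketch that writes already-tagged blocks out as Fountain-style text. It relies on my reading of the Fountain syntax: a leading period forces a scene heading, a character cue is an upper-case line directly above the dialogue, and a parenthetical sits in parentheses between the two. `Block` and `to_fountain` are hypothetical names, and the hard-coded blocks stand in for whatever your own parser produces.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    kind: str                            # "scene", "direction" or "dialogue"
    text: str
    character: Optional[str] = None      # speaker, dialogue blocks only
    parenthetical: Optional[str] = None  # e.g. "looking up"


def to_fountain(blocks):
    """Join blocks into Fountain-style paragraphs separated by blank lines."""
    paragraphs = []
    for b in blocks:
        if b.kind == "scene":
            # The leading period forces a scene heading; needed here because
            # the heading does not start with INT. or EXT.
            paragraphs.append("." + b.text.upper())
        elif b.kind == "dialogue":
            lines = [b.character.upper()]
            if b.parenthetical:
                lines.append(f"({b.parenthetical})")
            lines.append(b.text)
            paragraphs.append("\n".join(lines))
        else:
            # Anything else is written as a plain action paragraph.
            paragraphs.append(b.text)
    return "\n\n".join(paragraphs) + "\n"


EXAMPLE = [
    Block("scene", "Somewhere Corporation - Optional comment"),
    Block("direction", "A guy named Bob is sitting at his computer."),
    Block("dialogue", "Mmmm. Stackoverflow. I like.", character="bob"),
    Block("direction", "Closeup of clock ticking."),
    Block("dialogue", "Huh? What?", character="bob", parenthetical="looking up"),
]

if __name__ == "__main__":
    print(to_fountain(EXAMPLE))
```

If the existing Fountain libraries can read the result, you would get their parsers and HTML exporters for free instead of maintaining your own div markup.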