This script generates an interactive HTML visualization of intersecting texts, featuring word counters, hover effects, and dynamic highlighting. It now supports multiple file formats and URL-based input.
https://genaforvena.github.io/tikziod-thinking/
- Scum Manifesto - The only thing worth reading. The rest are paste.
- Scum Manifesto - degendered - Removing all gender related words from Scum Manifesto make it even more true.
- Tolstoy's Gospel - A visualization of the Gospel by Tolstoy in correct form .
- Minima Moralia - A visualization of the book "Minima Moralia" by Theodor W. Adorno.
- LP Tractatus - Young Wittgenstein's Tractatus Logico-Philosophicus.
- Worstward Hoe by Beckett - A visualization of the book "Worstward Hoe" by Samuel Beckett.
- Илья Масодов - Мрак твоих глаз - Илья Масодов "Мрак твоих глаз" (a masterpiece in russian, yet to be translated into English).
- Ilya Masodov - The darkness of your eyes - The masterpiece from above in English, roughly translated with small llms (low quality, yet to be edited).
- Anti-Oedipus to check the visualization of the book "Anti-Oedipus" by Deleuze and Guattari. It is large and may take a while to load.
- Finnegans Wake - A visualization of the book "Finnegans Wake" by James Joyce.
The tool supports the following file formats:
- Plain text (.txt)
- PDF (.pdf)
- Microsoft Word (.docx)
- Markdown (.md)
- HTML (.html, .htm)
The script supports three input methods:
-
Prepare your input file in any of the supported formats.
-
Run the script with the file input option:
python main.py -f input.txt
Replace
input.txt
with your file name and appropriate extension.
You can provide texts directly as command-line arguments:
python main.py -t "First text here" "Second text here" "Third text here"
You can now provide a URL to download and process a file:
python main.py -u https://example.com/path/to/document.pdf
The script will download the file, detect its format, and process it accordingly.
After running the script:
-
An interactive HTML file named
index.html
will be generated in thedocs
folder. -
Open this file in any modern web browser to view the visualization.
-
The visualization offers a rich set of interactive features:
- Word Selection: Click on any word to highlight all its occurrences across the text(s).
- Word Controls: For each selected word, a control panel appears with options to remove, strike out, or navigate between occurrences.
- Frequency Slider: Use the slider to hide less frequent words, dynamically updating the visualization.
- Hidden Words Popup: View a list of words hidden by the frequency slider.
- Search Functionality: Use the search bar to find specific words or phrases in the text.
- Shareable State: Generate a shareable link that captures the current state of your visualization.
-
Additional Interactive Elements:
- Hover over words to see their frequency across all texts.
- The font size of words reflects their frequency or importance in the text.
- Multi-format Support: The tool can process various text formats, automatically detecting and handling the file type.
- Remote File Processing: Ability to download and process files from URLs, expanding the range of accessible texts.
- Natural Language Processing: Utilizes NLTK for advanced text tokenization and analysis.
- LaTeX Integration: Uses Jinja2 for potential LaTeX template rendering, useful for academic or scientific texts.
You can customize the visualization by modifying the interactive.js
file:
- Adjust the color scheme for highlighted words
- Modify the behavior of word selection and navigation
- Add new interactive features or buttons
- Customize the styling of various elements (words, control panels, popups)
To run this script with all features, you need to have the following Python libraries installed:
- numpy (1.21.0): For numerical computations
- matplotlib (3.4.2): For data visualization
- nltk (3.6.2): For natural language processing
- PyPDF2 (3.0.1): For PDF file support
- python-docx (0.8.11): For DOCX file support
- markdown (3.3.4): For Markdown file support
- beautifulsoup4 (4.9.3): For HTML parsing
- requests (2.25.1): For downloading files from URLs
- jinja2 (3.0.1): For template rendering
Development dependencies:
- pylint (2.8.3): For code linting
- black (21.6b0): For code formatting
You can install the required dependencies using the provided requirements.txt
file:
pip install -r requirements.txt
- If you encounter issues with PDF processing, ensure you have the correct version of PyPDF2 installed.
- For NLTK-related functions, you may need to download additional NLTK data. Refer to the NLTK documentation for details.
- When processing files from URLs, ensure you have a stable internet connection and the URL is accessible.
- If a specific file format fails to process, check that you have the necessary dependencies installed and the file is not corrupted.
For any further questions or issues, please refer to the script comments or reach out to the project maintainers.